
added support for SparseArray input in Dense layer #987

Closed
wants to merge 1 commit into from

Conversation

@ajprabhu09 commented Jan 6, 2020

Fixes issue #965.
The Dense layer is now able to take a SparseArray as input.

@MikeInnes
Member

Can you describe why this is needed a bit more?

It seems like it's the same definition that's already covered by (a::Dense)(x::AbstractArray), since sparse arrays are abstract arrays (and the method body hasn't changed).
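
For reference, a quick REPL check confirms that claim (a sketch; SparseMatrixCSC is the standard-library sparse matrix type):

using SparseArrays

# SparseMatrixCSC subtypes AbstractArray, so the existing
# (a::Dense)(x::AbstractArray) method already covers sparse inputs.
sparse(randn(2, 2)) isa AbstractArray           # true
SparseMatrixCSC{Float64,Int} <: AbstractMatrix  # true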

@ajprabhu09
Author

For some reason, taking gradients of a Dense layer with a dense Array as its weights and a SparseMatrix as input fails.


using Flux, SparseArrays

md = Dense(2, 2)                                    # dense weights
ms = Dense(sparse(randn(2, 2)), sparse(randn(2)))   # sparse weights
xd = randn(2, 2)                                    # dense input
xs = sparse(randn(2, 2))                            # sparse input

gradient(() -> sum(md(xd)), Flux.params(md))  # works
gradient(() -> sum(ms(xs)), Flux.params(ms))  # works
gradient(() -> sum(ms(xd)), Flux.params(ms))  # works
gradient(() -> sum(md(xs)), Flux.params(md))  # fails

I tried using sum(md.σ.(md.W * xs .+ md.b)) in its place, which works. Maybe it has something to do with BLAS calls and SparseArrays in the backward pass?
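
Written out as a runnable check, continuing the session above (md.W, md.b and md.σ are the field names of Dense at the time):

# The workaround from the comment above: applying the weights, bias and
# activation by hand instead of calling md(xs) avoids whatever the Dense
# call method does differently, and the gradient goes through.
gradient(() -> sum(md.σ.(md.W * xs .+ md.b)), Flux.params(md))  # works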

@MikeInnes
Member

Right you are, but this error actually comes from trying to convert the sparse array to Float32 to match the layer:

julia> gradient(x -> sum(Float32.(x)), sparse(randn(2)))
ERROR: MethodError: no method matching zero(::Type{Any})

Your patch avoids the conversion, but really we should just fix this error in Zygote.
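
For reference, a guess at what the patch looks like (the actual diff isn't shown here; this is a hypothetical reconstruction, with the same body as the AbstractArray method):

using Flux, SparseArrays

# Hypothetical sketch of the PR's method: dispatching on sparse arrays
# directly sidesteps the method that converts the input to the weights'
# element type, so no Float32.(x) cast of the sparse array is attempted.
(a::Dense)(x::SparseArrays.AbstractSparseArray) = a.σ.(a.W * x .+ a.b)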

@ajprabhu09
Author

So what should be defined for zero(::Type{Any})? As I understand it, the zero function is simply missing for the type Any?

@MikeInnes
Member

Well, at some point we're calling zero(Any). That can't be right, so we need to dig in and find out why it's happening.
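
One way to start digging (a sketch, assuming the error surfaces in the reverse pass):

using Zygote, SparseArrays

# Split the forward and reverse passes to see which one throws.
y, back = Zygote.pullback(x -> Float32.(x), sparse(randn(2)))
back(ones(Float32, 2))  # if zero(::Type{Any}) appears here, the broadcast
                        # pullback over the sparse array is the culprit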

@racinmat
Contributor

The main problem is caused by mixing Float32 and Float64, which is what leads to the zero(Any) call.
When the types are consistent, it works:

W = randn(Float32, 2, 2)
b = randn(Float32, 2)
md = Dense(W, b)
xs = sparse(randn(Float32, 2, 2))   # Float32 everywhere
gradient(() -> sum(md(xs)), Flux.params(md))

W = randn(Float64, 2, 2)
b = randn(Float64, 2)
md = Dense(W, b)
xs = sparse(randn(Float64, 2, 2))   # Float64 everywhere
gradient(() -> sum(md(xs)), Flux.params(md))

Both of these pass.

But

W = randn(Float32, 2, 2)
b = randn(Float32, 2)
md = Dense(W, b)
xs = sparse(randn(Float64, 2, 2))   # Float64 input, Float32 weights
gradient(() -> sum(md(xs)), Flux.params(md))

W = randn(Float64, 2, 2)
b = randn(Float64, 2)
md = Dense(W, b)
xs = sparse(randn(Float32, 2, 2))   # Float32 input, Float64 weights
gradient(() -> sum(md(xs)), Flux.params(md))

both throw the above-mentioned error.
The main question is: how should we approach it? Flux uses Float32 by default, but Julia itself usually defaults to Float64, so it's very easy to run into this inconsistency.
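
Until that's resolved, one workaround (a sketch; broadcasting Float32 over a sparse array preserves sparsity) is to make the eltypes match before taking gradients:

using Flux, SparseArrays

md = Dense(2, 2)                    # Float32 parameters by default
xs = Float32.(sparse(randn(2, 2)))  # still sparse, now eltype Float32
gradient(() -> sum(md(xs)), Flux.params(md))  # eltypes match, so this works
# Alternatively, convert the model instead, e.g. with Flux.f64(md),
# if your Flux version provides f64.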

@racinmat
Contributor

Zygote itself handles this fine; see:

W = randn(Float32, 2, 2)
b = randn(Float32, 2)
xs = sparse(randn(Float32, 2, 2))
gradient(() -> sum(W*xs .+ b), Params([W,b]))

W = randn(Float32, 2, 2)
b = randn(Float32, 2)
xs = sparse(randn(Float64, 2, 2))
gradient(() -> sum(W*xs .+ b), Params([W,b]))

W = randn(Float64, 2, 2)
b = randn(Float64, 2)
xs = sparse(randn(Float32, 2, 2))
gradient(() -> sum(W*xs .+ b), Params([W,b]))

W = randn(Float64, 2, 2)
b = randn(Float64, 2)
xs = sparse(randn(Float64, 2, 2))
gradient(() -> sum(W*xs .+ b), Params([W,b]))

All of them pass, so I don't think this is a problem with Zygote.

@racinmat
Contributor

Hm, it seems the conversion is the problem. The Dense layer explicitly casts the input data to the parameters' element type in https://github.com/FluxML/Flux.jl/blob/master/src/layers/basic.jl#L138.
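
For context, the linked method is roughly the following (a paraphrased sketch, not the verbatim source; see the link for the exact code):

# When the input's element type doesn't match the Float32/Float64 weights,
# the layer calls itself again on a converted copy T.(x); that broadcasted
# conversion is what fails for sparse inputs.
(a::Dense{<:Any,W})(x::AbstractArray{<:AbstractFloat}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
  a(T.(x))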

And Zygote can't take the pullback through that cast; replicating it by hand, the following code


W = randn(Float32, 2, 2)
b = randn(Float32, 2)
xs = sparse(randn(Float64, 2, 2))   # Float32 weights, Float64 sparse input
gradient(() -> sum(W*Float32.(xs) .+ b), Params([W,b]))

W = randn(Float64, 2, 2)
b = randn(Float64, 2)
xs = sparse(randn(Float32, 2, 2))   # Float64 weights, Float32 sparse input
gradient(() -> sum(W*Float64.(xs) .+ b), Params([W,b]))

crashes. The question is: should this be fixed in Zygote or in Flux?

@ajprabhu09
Author

I guess if mixed-precision computation were supported, this wouldn't be an issue.

@racinmat
Contributor

I opened an issue on Zygote with an MWE; we'll see: FluxML/Zygote.jl#810

@ToucheSir
Member

Since this turned out to be about eltype mismatches instead of sparse array support, I think it can be safely closed.
