
Method errors with Flux.jl and Knet.jl #47

Open
outlace opened this issue Jun 12, 2018 · 8 comments

@outlace

outlace commented Jun 12, 2018

I'm trying to use TensorOperations to make a tensor regression. Basically it's a sequence of (in my case) 3rd-order tensor contractions. But I need to treat some of the tensors as trainable parameters and then use gradient descent to optimize them. I get method errors with both Knet and Flux, because these use their own wrapped versions of Arrays.

Here's an example of what I'm trying to do:

using TensorOperations, Knet   # `grad` comes from Knet/AutoGrad; on Julia ≥ 0.7, `mean` also needs `using Statistics`

A = randn(28^2, 100, 100)
B = randn(100, 100, 10)
predict(input) = @tensor C[d,s] := (input[g,s] * A[g,b,c]) * B[b,c,d]
loss(x, y) = mean(abs2, y - predict(x))
lossgradient = grad(loss)      # AutoGrad traces arguments by wrapping them in AutoGrad.Rec

But I get an error with Knet trying to do gradient calculations:

julia> lossgradient(X, Y)
ERROR: MethodError: no method matching numind(::AutoGrad.Rec{Array{FixedPointNumbers.Normed{UInt8,8},2}})

Similar error when I use Flux.
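
For reference, a hedged guess at what the Flux attempt might look like, with A and B wrapped as trainable TrackedArrays via param; this is a reconstruction for illustration, not the original poster's code:

    using Flux, TensorOperations

    A = param(randn(28^2, 100, 100))   # trainable 3rd-order tensors, wrapped as TrackedArrays
    B = param(randn(100, 100, 10))
    predict(input) = @tensor C[d,s] := (input[g,s] * A[g,b,c]) * B[b,c,d]
    loss(x, y) = sum(abs2, y .- predict(x)) / length(y)   # same as mean(abs2, ...) above
    # Flux.back!(loss(X, Y))   # reportedly fails with a similar MethodError on the tracked types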

@outlace outlace changed the title Method errors with Flux.jl or Knet.jl Method errors with Flux.jl and Knet.jl Jun 12, 2018
@Jutho
Owner

Jutho commented Jun 12, 2018

It seems that these are not subtypes of AbstractArray. As such, TensorOperations cannot work with them out of the box. If the array/tensor types used by these packages can be made compatible with TensorOperations (i.e. they represent a form of data that has a strided memory layout), it should not be that hard to do so by adding the relevant methods.

@mcabbott

I had a go at making this happen, and have an almost-working implementation which acts on Flux's TrackedArrays and defines gradients for each of the three basic functions (add!, trace!, contract!) by calling those same functions again.

Would you be interested in having something like this?

I meant to make a PR, but then I found one bug in my gradients (hence "almost") and put it aside. However, this was for v0.7, and I see now you're busy re-writing things... I had a quick look at updating it, and I think that I would probably need to delete three lines of existing code (after which your tests still pass) to re-use the same approach.

@Jutho
Owner

Jutho commented Dec 21, 2018

I am pretty new to the topic of computation graphs and automatic differentiation, but I am very interested in it and would like to explore it further. In particular, how does it deal with mutating operations like mul! or all of the TensorOperations methods?

That being said, yes, it would be great if TensorOperations could be useful for the machine learning packages. However, I would think that the design is such that most of the additional code should be defined in e.g. Flux, no? It would just be a matter of overloading a few methods from TensorOperations for the TrackedArray type. The documentation of TensorOperations v1.0 on this topic has been updated and extended (but is still far from perfect).

If there is anything that could be done in TensorOperations.jl to facilitate this overloading, I am certainly open to suggestions.

@mcabbott

I am new to all this too, but I am beginning to understand a little of how Flux's model works. The essence is to overload functions like add!(α, A::TrackedArray, ... so that they un-wrap their input, calculate as usual, and then return a re-wrapped output which includes a function to be called on the reverse pass.
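
A minimal sketch of that pattern, using the pre-Zygote Flux Tracker API (track, @grad, data). The gradient rule for A shown here is only illustrative (it ignores β and gets the permutation/conjugation conventions only roughly right); the actual implementation differs.

    using Flux.Tracker: TrackedArray, track, @grad, data
    import TensorOperations: add!

    # intercept the tracked case: un-wrap, compute as usual, re-wrap with a pullback
    add!(α, A::TrackedArray, CA, β, C, p1, p2) = track(add!, α, A, CA, β, C, p1, p2)

    @grad function add!(α, A, CA, β, C, p1, p2)
        out = add!(α, data(A), CA, β, data(C), p1, p2)   # the plain-Array method
        function back(Δ)
            p  = (p1..., p2...)
            ∇A = α .* permutedims(data(Δ), invperm(collect(p)))  # rough gradient w.r.t. A
            CA == :C && (∇A = conj(∇A))
            return (nothing, ∇A, nothing, nothing, nothing, nothing, nothing)
        end
        return out, back
    end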

For this to work, what I need to change in tensormacro.jl is to return not the mutated C=$dst but instead what the function returns -- I believe these two are always the same for what you do, and thus I think changes like this don't break things:

        return quote
            $initex
            add!($α * $α2, $src, $conjarg, $β, $dst, $p1, $p2) # ::TrackedArray
            # $dst # mutated C isa Array
        end

Is that accurate? Perhaps there is a reason to return C which I've missed. For the := macros, I believe this should be enough -- at least it was in the old version. I haven't thought much about the mutating case; I'm not sure it fits well into this model, but perhaps there are tricks.

Besides such changes, there are two parts to write: a set of reverse functions (e.g. if the forward pass was a trace, then the reverse is some contraction of a delta with the upstream gradient), which are not specific to Flux, and a set of overloadings, which are. I was thinking that the latter could be loaded here via @require Flux, but they could equally well live elsewhere.
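
To illustrate the first kind of reverse function: if the forward pass computed y[a] = Σ_i x[a,i,i], the reverse pass spreads the upstream gradient Δ back out along a Kronecker delta, ∂L/∂x[a,i,j] = Δ[a] * δ[i,j]. The name ∇trace is made up for this sketch and is not from the package.

    using TensorOperations, LinearAlgebra

    function ∇trace(Δ::AbstractVector, n::Integer)
        δ = Matrix{eltype(Δ)}(I, n, n)          # the delta / identity tensor
        @tensor dx[a, i, j] := Δ[a] * δ[i, j]   # outer product: gradient w.r.t. x
        return dx
    end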

@Jutho
Owner

Jutho commented Dec 27, 2018

I am a bit confused or missing something: if the overloaded method accepts both A and C as wrapper arrays, i.e. TrackedArray, then the $dst variable will be the TrackedArray, no? In the overloaded methods (add!, trace!, contract!), both the input arrays and the destination array would be of the custom type?

@mcabbott

The issue is that TrackedArrays are immutable. Perhaps one could still cheat... My strategy was instead that add! etc. act on the Array which will become C.data, but return a TrackedArray C, which differs in having the tracking information glued on.

I put today's version in this gist; maybe it's clearer to see it there? Something is still failing for one contract! case which I swear used to work, and I'm unhappy about some other details.

@Jutho
Owner

Jutho commented Dec 27, 2018

I agree that removing the final $dst or $symC is harmless, or at least should be.

In what sense are TrackedArrays immutable? Many array wrappers are struct types, i.e. immutable, but they still have mutable data as they wrap some plain array. I would argue that in that case it is fine, or even standard Julia, to have mutating methods (e.g. scalar multiplication with lmul! and rmul!) modify the data but still return the result, which is a struct type; i.e. I have many definitions like this:

rmul!(A::MyWrappedArray, alpha) = (rmul!(A.data, alpha); return A)

and I would think that for add! and contract!, similar definitions would be needed, where furthermore similar_from_indices would return a TrackedArray.
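
A hedged sketch of those "similar definitions", for a hypothetical wrapper MyWrappedArray with a plain array in its .data field. The add! argument list mirrors the call quoted earlier in this thread, and the similar_from_indices signature is a guess that may not match the v1.0 method exactly; both are illustrative only.

    import TensorOperations: add!, similar_from_indices

    struct MyWrappedArray{T,N}
        data::Array{T,N}
    end

    # mutate the underlying data, but hand back the wrapper itself
    add!(α, A::MyWrappedArray, CA, β, C::MyWrappedArray, p1, p2) =
        (add!(α, A.data, CA, β, C.data, p1, p2); return C)

    # allocate via the plain-array method, then re-wrap the result
    similar_from_indices(T::Type, p1, p2, A::MyWrappedArray, CA::Symbol=:N) =
        MyWrappedArray(similar_from_indices(T, p1, p2, A.data, CA))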

Not sure if TrackedArrays are any different; as I said, I am not really familiar (yet) with Flux and the wider machine learning and automatic differentiation tools in Julia. Is there something fundamental that forces you to think about TrackedArrays as completely immutable?

@mcabbott

OK, re $dst that is good to hear.

In addition to the data, they have tracking information, param(rand(2)).tracker, which Flux's track and @grad know how to update. Or rather, they know how to append the reverse function you hand them to the end of the graph, but I don't understand all the details. If we made C::TrackedArray then we could certainly mutate C.data, but I don't know if we could copy this information back in, since Flux.Tracker.Tracked and Flux.Tracker.Call are themselves structs. Returning D = add!(A::TrackedArray, C, ...) ensures that we keep the correct information.

One could make C::TrackedArray for the sake of consistency, but still return D with the correct D.tracker. One reason not to do this is that, as it stands, ∇add can distinguish based on types whether it needs to compute the gradient for C. Notice also that Z = zero(param(rand(2))) is an ordinary Array -- I think it's designed that way because Z is a dead-end in the backward pass, so tracking gradients leading to it will always be a waste.

The other slightly odd issue is that the eltype of tracked arrays is TrackedReal. At present this results in constants α being promoted, which means that my ∇add function cannot tell whether you really wished to track this or not. I have not worked out gradients for α and β in all the functions, and would propose not to bother, at least at first, but it would be nicer for this to be an error than a silent mistake. The previous version was less eager to promote. I haven't yet looked at how easy it would be to change this.
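
For anyone reading along, the two behaviours described above can be checked directly with the old Tracker-based Flux; the exact types printed may vary between Flux versions.

    using Flux

    W = param(rand(2))
    eltype(W)   # a Flux.Tracker.TrackedReal element type, not plain Float64
    zero(W)     # an ordinary Array, i.e. an untracked dead end for the backward pass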
