
update Embedding layer #1656

Open · manikyabard wants to merge 9 commits into master
Conversation

manikyabard
Contributor

Updates the Embedding layer to use gather for AbstractVector inputs, which earlier had an issue with repeated indices. Also adds a special case for outputsize.
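
A minimal usage sketch of the fixed behaviour (assuming the `Embedding(in => out)` constructor added in this PR; exact values depend on the random weights):

julia> using Flux

julia> m = Embedding(26 => 2);

julia> size(m([6, 15, 15]))  # repeated index 15 now goes through NNlib.gather
(2, 3)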

@@ -168,3 +168,6 @@ for (fn, Dims) in ((:conv, DenseConvDims), (:depthwiseconv, DepthwiseConvDims))
end
end
end

(m::Embedding)(x::AbstractVector{<:Nil}) = fill(nil, size(m.weight, 1), length(x))
(m::Embedding)(x::AbstractArray{<:Nil}) = fill(nil, size(m.weight, 1), last(size(x)))
Member

These methods should not be here.

Member

This is exactly where they should be (see the other code in the file).

@DhairyaLGandhi (Member) Jul 9, 2021

I mean, these methods should not be defined at all, since this is what we would expect outputsize to already know how to do. If outputsize cannot find the expected size, then it should be fixed there.

Member

outputsize's underlying mechanism was designed for numerical array operations, which hits the ~95% case even with custom layers. There will always be some layers that we might need to special-case. A layer that uses indexing as the underlying operation is nonsensical with outputsize: how would you know the result of indexing an array with Nil?
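
For context, outputsize works by pushing an array of Nil through the model; with the special cases above in place, a sketch of the expected behaviour (assuming those methods are kept) is:

julia> Flux.outputsize(Embedding(26 => 5), (3,))  # a length-3 Vector{Nil} is passed through
(5, 3)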

Member

You only need the size, so maybe outputsize should know what to do with getindex.

Member

I don't know whether it's useful, but the ordinary function above accepts any array. They should surely match.

Member

Good to specialise on OneHot, but what happens if by some chance you collect it? Best case is that this gives you the same result and it's just an optimisation. (Indexing often treats an array of Bools differently from other integers.)

Second best is for it to be an error, which is what I get now, but is this guaranteed? (Perhaps it is without offset arrays?)

julia> m(Flux.OneHotMatrix([6, 15, 15], 26))
2×3 Matrix{Float32}:
 4.01  22.01  22.01
 5.01  23.01  23.01

julia> m(collect(Flux.OneHotMatrix([6, 15, 15], 26)))
ERROR: BoundsError: attempt to access 2×26 Matrix{Float32} at index [1:2, 78-element Vector{Bool}]

Member

I don't know whether it's useful, but the ordinary function above accepts any array. They should surely match.

Yeah, but my intent was that the >2D case hits the reshape route in the Embedding layer source. We only need to specify the Nil path for what finally gets called, which is AbstractVecOrMat.

Good to specialise on OneHot, but what happens if by some chance you collect it?

This looks related to your other suggestions. Should we constrain to AbstractVector{<:Integer} and AbstractVecOrMat{<:Bool} separately? In the latter case, should it be a matmul (mathematically) or m.weight[:, idx]?

Contributor Author

Would something like this make more sense for AbstractArray?

(m::Embedding)(x::AbstractArray{<:Nil}) = fill(nil, size(m.weight, 1), size(x)...)

Member

Personally, I would keep the Nil overrides to a minimum and only do the vector case. Let the rest go through the normal routing for all arrays.

@darsnack (Member) left a comment

We'll need some tests, but this looks good.

@DhairyaLGandhi (Member)

Needs tests

@CarloLucibello (Member)

ref. #1516

@CarloLucibello (Member)

I rebased #1516 onto master since there were some conflicts, so I guess you should do git rebase -i origin/cl/embed here.

@CarloLucibello (Member)

Or maybe it is just easier if I merge #1516 and then this targets master

@manikyabard (Contributor Author)

Or maybe it is just easier if I merge #1516 and then this targets master

Yeah sure that works

@mcabbott (Member) commented Jul 10, 2021

Maybe labels other than integers starting at 1 deserve some thought. Flux's one-hot encoding lets you specify the set of labels, but they are not stored; onecold returns integers starting at 1.

julia> Flux.onehotbatch(collect("hello"), 'a':'z') 
26×5 Flux.OneHotArray{26,2,Vector{UInt32}}:
 0  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
 0  1  0  0  0
 0  0  0  0  0
 0  0  0  0  0
 1  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
 ⋮           

julia> Flux.onecold(ans) # does not remember
5-element Vector{Int64}:
  8
  5
 12
 12
 15

Should this do something similar? Then m = Embedding(0:9 => 5) would have to store something, so that m(0) can give the first vector. Or is this an unnecessary complication, better handled by composing onehot and Embedding?

@darsnack (Member) commented Jul 10, 2021

Mapping to an index or one-hot space is a standard data transformation when using a layer like Embedding, so I would say it's not a necessary feature for the layer. And we would always need to support 1:N, which could get tricky if the "labels" are also an integer range.

If we did want to include it, I would store a "transform" function within the layer.

PS: we do support onecold(x, labels)

@mcabbott (Member)

Ok, indeed this should probably be written something like Chain(onehot('a':'z'), Embedding(26 => 5)) rather than duplicating this.
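
For concreteness, a hedged sketch of that composition (onehotbatch is not curried, so an anonymous function stands in for onehot('a':'z')):

julia> encoder = Chain(s -> Flux.onehotbatch(s, 'a':'z'), Embedding(26 => 5));

julia> size(encoder(collect("hello")))
(5, 5)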

@darsnack (Member)

Should this be re-targeted for master?

@CarloLucibello (Member)

Yes, probably filing a new PR is easier.

@manikyabard changed the base branch from cl/embed to master on July 14, 2021 13:17
@manikyabard (Contributor Author)

I changed the base branch to master.

@darsnack (Member)

Looks like you need a rebase too


Embedding(dims::Pair{<:Integer, <:Integer}; init = randn32) = Embedding(init(last(dims), first(dims)))

(m::Embedding)(x::Union{OneHotVector, OneHotMatrix}) = m.weight * x # equivalent to m.weight[:,onecold(x)]
@mcabbott (Member) Jul 14, 2021

This is where I think treating collect(onehot(...)) the same as onehot() would be nice. They are both arrays of booleans, so something like this may work:

Suggested change
(m::Embedding)(x::Union{OneHotVector, OneHotMatrix}) = m.weight * x # equivalent to m.weight[:,onecold(x)]
(m::Embedding)(x::AbstractVecOrMat{Bool}) = m.weight * x # handles OneHotVector, OneHotMatrix

(Or it may require a bit more thought which is more specific.)

Member

I guess it comes down to whether the Vector{Bool} case should be a matmul or index operation.

@mcabbott (Member) Jul 14, 2021

Those should agree when it's actually one-hot (if not OneHot) up to a dropdims:

julia> [1 2 3; 4 5 6] * [false, true, false]
2-element Vector{Int64}:
 2
 5

julia> [1 2 3; 4 5 6][:, [false, true, false]]
2×1 Matrix{Int64}:
 2
 5

But not when it's not:

julia> [1 2 3; 4 5 6] * [false, true, true]
2-element Vector{Int64}:
  5
 11

julia> [1 2 3; 4 5 6][:, [false, true, true]]
2×2 Matrix{Int64}:
 2  3
 5  6

Should the latter case be an error?

Maybe indexing would be faster although I'm not sure we care about optimising this. Logical indexing can't give you aliasing problems, if I'm thinking correctly.

Another possible strategy would just be to make it an error on AbstractArray{Bool}: only permit OneHot, or counting numbers.

Member

I was mostly speaking from an optimization standpoint. If we don't care about this case being fast (it is rare), then the suggested code works for me.

Contributor Author

Would it be better to change the current onehot paths from

function (m::Embedding)(x::Union{OneHotLikeVector{T,L}, OneHotLikeMatrix{T,L,I}}) where {T,L,I}
  size(m.weight, 2) == L || throw(DimensionMismatch("Matrix column must correspond with OneHot size: $(size(m.weight, 2)) != $L"))
  return m(onecold(x))
end

back to a matmul?

Member

Yes definitely, the current implementation is just code duplication.
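
In other words, something along these lines (a sketch only, keeping the dimension check from the current code and reusing the matmul from the suggestion above):

function (m::Embedding)(x::Union{OneHotLikeVector{T,L}, OneHotLikeMatrix{T,L,I}}) where {T,L,I}
  size(m.weight, 2) == L || throw(DimensionMismatch("Matrix column must correspond with OneHot size: $(size(m.weight, 2)) != $L"))
  return m.weight * x
end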

Comment on lines 473 to 526
(m::Embedding)(x::AbstractVector) = NNlib.gather(m.weight, x)
(m::Embedding)(x::AbstractArray) = reshape(m(vec(x)), :, size(x)...)
Member

I would also restrict these to integers

Suggested change
(m::Embedding)(x::AbstractVector) = NNlib.gather(m.weight, x)
(m::Embedding)(x::AbstractArray) = reshape(m(vec(x)), :, size(x)...)
(m::Embedding)(x::AbstractVector{<:Integer}) = NNlib.gather(m.weight, x)
(m::Embedding)(x::AbstractArray{<:Integer}) = reshape(m(vec(x)), :, size(x)...)

Member

Maybe only for AbstractVector so that Nil-arrays can go through the reshape?

Member

Oh, I wasn't thinking about Nil, good point. Do you want Array{Nil} to behave like Array{Int} or like Array{Bool}?

One other weird feature is that if you make a OneHotTensor, like a boolean 3-array, this will reshape it to a vector. Maybe that's too weird to worry about. Or maybe it should reshape AbstractArray{Bool} not with vec but with m(reshape(x, size(x, 1), :)).
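
Spelled out, that suggestion might look something like this (an illustrative sketch, assuming the AbstractVecOrMat{Bool} matmul method suggested above; not code in this PR):

(m::Embedding)(x::AbstractArray{Bool}) = reshape(m(reshape(x, size(x, 1), :)), :, size(x)[2:end]...)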

Member

Yes, I think you are correct. Reshaping should be different when the first dimension encodes one-hot information. This definitely presents a problem for Nil, because a Nil array is agnostic about whether the original input is one-hot or indices. In other words, the output size is input-type dependent for this layer.

Contributor Author

In that case, is the only way to get the correct output from outputsize to ignore the first dimension of the one-hot input when passing the inputsizes argument?

@darsnack (Member)

@manikyabard are you still up for rebasing this and moving forward? We ought to close up this loose end.

@manikyabard (Contributor Author) commented Jan 28, 2022

@manikyabard are you still up for rebasing this and moving forward? We ought to close up this loose end.

Yeah, I can continue working on this, although I am not sure about the approach we should take for outputsize. Maybe we can discuss this further in the next community call.

@darsnack (Member)

Maybe we can discuss this further in the next community call.

Yeah, sounds good!

@darsnack (Member) commented Feb 1, 2022

Summarizing what was discussed on the call:

  • the issue is that the paths for AbstractVector{<:Integer} that do getindex calls don't make sense, since Vector{Nil} is not a vector of indices
  • Michael's PR making Nil a subtype of Real can help with the cases where Nil should only be hitting the AbstractArray{<:Real} paths
  • it still won't solve the issue when outputsize is used for a model utilizing the AbstractVector{<:Integer} path
    • this will need an outputsize override rule for Embedding + AbstractVector{Nil}
    • (this wasn't brought up during call but I just thought of it) a better solution would be to define NNlib.gather for Nil which will cover all indexing cases beyond just Embedding

@manikyabard (Contributor Author)

  • (this wasn't brought up during call but I just thought of it) a better solution would be to define NNlib.gather for Nil which will cover all indexing cases beyond just Embedding

You mean something like this?

NNlib.gather!(dst::AbstractArray, ::AbstractArray, ::AbstractArray{<:Nil}) = fill(nil, size(dst)...)
(m::Embedding)(x::AbstractVector{<:Nil}) = NNlib.gather(m.weight, x)

@darsnack (Member) commented Feb 1, 2022

That looks right, but you won't need to special-case Embedding anymore. It should go through the NNlib.gather rule automatically.

@CarloLucibello (Member)

This needs a rebase onto master.

@codecov-commenter

Codecov Report

Merging #1656 (de43bf5) into master (3cc9067) will decrease coverage by 0.07%.
The diff coverage is 60.00%.


@@            Coverage Diff             @@
##           master    #1656      +/-   ##
==========================================
- Coverage   84.58%   84.51%   -0.08%     
==========================================
  Files          21       21              
  Lines        1486     1485       -1     
==========================================
- Hits         1257     1255       -2     
- Misses        229      230       +1     
Impacted Files Coverage Δ
src/onehot.jl 95.29% <ø> (-0.06%) ⬇️
src/outputsize.jl 82.05% <0.00%> (-2.16%) ⬇️
src/layers/basic.jl 80.99% <75.00%> (-0.16%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@darsnack (Member)

Can you add a test for outputsize of Embedding?
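
A minimal test sketch (hypothetical; it assumes the gather rule for Nil discussed above, so outputsize can flow through the ordinary Embedding methods):

@testset "Embedding outputsize" begin
  m = Embedding(26 => 5)
  @test Flux.outputsize(m, (3,)) == (5, 3)       # vector of indices
  @test Flux.outputsize(m, (3, 7)) == (5, 3, 7)  # higher-rank index array via the reshape path
end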

weight::W
end

@functor Embedding

Embedding(in::Integer, out::Integer; init = randn32) = Embedding(init(out, in))
Embedding(dims::Pair{<:Integer, <:Integer}; init = randn32) = Embedding(init(last(dims), first(dims)))
Member

The old constructor should be deprecated.
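
A possible deprecation, written out by hand (a sketch only; Flux usually collects these in src/deprecations.jl, and the exact message is up to the maintainers):

function Embedding(in::Integer, out::Integer; init = randn32)
  Base.depwarn("Embedding(in, out) is deprecated, use Embedding(in => out) instead", :Embedding)
  return Embedding(in => out; init = init)
end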


julia> vocab_idxs = [1, 722, 53, 220, 3]
```jldoctest
julia> m = Embedding(reshape(-6:45, 2, 26) .+ 0.01f0)
Member

The old example was much clearer.
Also, this constructor (Embedding(weight)) is not even part of the docstring; we should add it.

@mcabbott (Member) Feb 15, 2022

Indeed on the constructor.

The virtue of this example is that it doesn't have random numbers, so it can be a doctest. My hope is that onehotbatch("foo", 'a':'z') might connect with 26 well enough to be easy to follow. Maybe it can be made clearer somehow?

gpu_gradtest("Embedding OneHotMatrix index", embedding, OneHotMatrix([1,2,3], 5), 5, 2)
gpu_gradtest("Embedding OneHotMatrix repeated indices", embedding, OneHotMatrix([1,2,2], 5), 5, 2)
gpu_gradtest("Embedding", embedding, [1,3,5], 5 => 2)
gpu_gradtest("Embedding repeated indices", embedding, rand(1:50, 10^6), 50 => 2)
Member

No need to test such large arrays; CI already takes a lot of time. The previous test was fine.

Member

10^6 seems quite big, but maybe there's some value in going bigger than 3, in case e.g. 3 is never parallelised. No idea if there's a particular cutoff.

Contributor Author

I have updated it to be 10^3. Would this also add a lot of time to CI?

gpu_gradtest("Embedding 2d index", embedding, [1 2; 3 4], 5 => 2)
gpu_gradtest("Embedding OneHotVec index", embedding, OneHotVector(1, 5), 5 => 2)
gpu_gradtest("Embedding OneHotMatrix index", embedding, OneHotMatrix([1,2,3], 5), 5 => 2)
gpu_gradtest("Embedding OneHotMatrix repeated indices", embedding, OneHotMatrix(rand(1:50, 10^6), 50), 50 => 2)
Member

same as above

@@ -168,3 +168,5 @@ for (fn, Dims) in ((:conv, DenseConvDims), (:depthwiseconv, DepthwiseConvDims))
end
end
end

NNlib.gather!(dst::AbstractArray, ::AbstractArray, ::AbstractArray{<:Nil}) = fill(nil, size(dst)...)
Member

To understand what's going on, this is using Nil to stand in as an index, right? Maybe that warrants a comment. This is not otherwise handled:

julia> rand(3)[Flux.nil]
ERROR: ArgumentError: invalid index: Flux.NilNumber.Nil() of type Flux.NilNumber.Nil
Stacktrace:
 [1] to_index(i::Flux.NilNumber.Nil)
   @ Base ./indices.jl:300
 [2] to_index(A::Vector{Float64}, i::Flux.NilNumber.Nil)

Also, is the fact that this doesn't mutate dst and returns something else going to bite us someday?

Member

Probably. We can modify the non-mutating variant, NNlib.gather, which is what Flux layers will end up calling.
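
For instance, a sketch of the non-mutating override (an assumption, not the final code; the output shape mirrors what gather would normally allocate for scalar indices):

NNlib.gather(src::AbstractArray, idx::AbstractArray{<:Nil}) =
  fill(nil, size(src)[1:end-1]..., size(idx)...)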

manikyabard and others added 9 commits March 1, 2022 17:21
Co-authored-by: Michael Abbott <32575566+mcabbott@users.noreply.github.com>
Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>
Updated Embedding constructor to use `=>` and added OneHotLikeVector and OneHotLikeMatrix consts.
Co-authored-by: Michael Abbott <32575566+mcabbott@users.noreply.github.com>
Co-authored-by: Michael Abbott <32575566+mcabbott@users.noreply.github.com>