
update Embedding layer #1656

Open · manikyabard wants to merge 9 commits into master
Conversation

manikyabard
Contributor

Updates the Embedding layer to use gather for AbstractVector inputs, which earlier had an issue with repeated indices. Also adds a special case for outputsize.
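
A minimal usage sketch of the fixed behaviour (assuming the `Embedding(in => out)` constructor added in this PR; exact values depend on the random weights):

julia> using Flux

julia> m = Embedding(26 => 2);

julia> size(m([6, 15, 15]))  # repeated index 15 now goes through NNlib.gather
(2, 3)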

@@ -168,3 +168,6 @@ for (fn, Dims) in ((:conv, DenseConvDims), (:depthwiseconv, DepthwiseConvDims))
end
end
end

(m::Embedding)(x::AbstractVector{<:Nil}) = fill(nil, size(m.weight, 1), length(x))
(m::Embedding)(x::AbstractArray{<:Nil}) = fill(nil, size(m.weight, 1), last(size(x)))
Member

These methods should not be here.

Member

This is exactly where they should be (see the other code in the file).

@DhairyaLGandhi (Member) Jul 9, 2021

I mean, these methods should not be defined at all, since this is what we would expect outputsize to already know how to do. If outputsize cannot find the expected size, then it should be fixed there.

Member

outputsize's underlying mechanism was designed for numerical array operations, which hits the ~95% case even with custom layers. There will always be some layers that we might need to special-case. A layer that uses indexing as the underlying operation is nonsensical with outputsize: how would you know the result of indexing an array with Nil?
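
For context, outputsize works by pushing an array of Nil through the model; with the special cases above in place, a sketch of the expected behaviour (assuming those methods are kept) is:

julia> Flux.outputsize(Embedding(26 => 5), (3,))  # a length-3 Vector{Nil} is passed through
(5, 3)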

Member

You only need the size, so maybe outputsize should know what to do with getindex.

Member

I don't know whether it's useful, but the ordinary function above accepts any array. They should surely match.

Member

Good to specialise on OneHot, but what happens if by some chance you collect it? Best case is that this gives you the same result and it's just an optimisation. (Indexing often treats an array of Bools differently from other integers.)

Second best is for it to be an error, which is what I get now, but is this guaranteed? (Perhaps it is without offset arrays?)

julia> m(Flux.OneHotMatrix([6, 15, 15], 26))
2×3 Matrix{Float32}:
 4.01  22.01  22.01
 5.01  23.01  23.01

julia> m(collect(Flux.OneHotMatrix([6, 15, 15], 26)))
ERROR: BoundsError: attempt to access 2×26 Matrix{Float32} at index [1:2, 78-element Vector{Bool}]

Member

I don't know whether it's useful, but the ordinary function above accepts any array. They should surely match.

Yeah, but my intent was that the >2D case hits the reshape route in the Embedding layer source. We only need to specify the Nil path for what finally gets called, which is AbstractVecOrMat.

Good to specialise on OneHot, but what happens if by some chance you collect it?

This looks related to your other suggestions. Should we constrain to AbstractVector{<:Integer} and AbstractVecOrMat{<:Bool} separately? In the latter case, should it be a matmul (mathematically) or m.weight[:, idx]?

Contributor Author

Would something like this make more sense for AbstractArray?

(m::Embedding)(x::AbstractArray{<:Nil}) = fill(nil, size(m.weight, 1), size(x)...)

Member

Personally, I would keep the Nil overrides to a minimum and only do the vector case. Let the rest go through the normal routing for all arrays.

@darsnack (Member) left a comment

We'll need some tests, but this looks good.

@DhairyaLGandhi (Member)

Needs tests

@CarloLucibello (Member)

ref. #1516

@CarloLucibello (Member)

I rebased #1516 onto master since there were some conflicts, so I guess you should do git rebase -i origin/cl/embed here.

@CarloLucibello (Member)

Or maybe it is just easier if I merge #1516 and then this targets master

@manikyabard (Contributor Author)

Or maybe it is just easier if I merge #1516 and then this targets master

Yeah sure that works

@mcabbott (Member) commented Jul 10, 2021

Maybe labels other than integers starting at 1 deserve some thought. Flux's one-hot encoding lets you specify the set of labels, but they are not stored; onecold returns integers starting at 1.

julia> Flux.onehotbatch(collect("hello"), 'a':'z') 
26×5 Flux.OneHotArray{26,2,Vector{UInt32}}:
 0  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
 0  1  0  0  0
 0  0  0  0  0
 0  0  0  0  0
 1  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
 ⋮           

julia> Flux.onecold(ans) # does not remember
5-element Vector{Int64}:
  8
  5
 12
 12
 15

Should this do something similar? Then m = Embedding(0:9 => 5) would have to store something, so that m(0) can give the first vector. Or is this an unnecessary complication, better handled by composing onehot and Embedding?

@darsnack (Member) commented Jul 10, 2021

Mapping to an index or one-hot space is a standard data transformation when using a layer like Embedding, so I would say it's not a necessary feature for the layer. And we would always need to support 1:N, which could get tricky if the "labels" are also an integer range.

If we did want to include it, I would store a "transform" function within the layer.

PS: we do support onecold(x, labels)

@mcabbott (Member)

Ok, indeed this should probably be written something like Chain(onehot('a':'z'), Embedding(26 => 5)) rather than duplicating this.
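
For concreteness, a hedged sketch of that composition (onehotbatch is not curried, so an anonymous function stands in for onehot('a':'z')):

julia> encoder = Chain(s -> Flux.onehotbatch(s, 'a':'z'), Embedding(26 => 5));

julia> size(encoder(collect("hello")))
(5, 5)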

@darsnack (Member)

Should this be re-targeted for master?

@CarloLucibello (Member)

Yes, probably filing a new PR is easier.

@manikyabard changed the base branch from cl/embed to master on July 14, 2021 13:17
@manikyabard (Contributor Author)

I changed the base branch to master.

@darsnack (Member)

Looks like you need a rebase too


Embedding(dims::Pair{<:Integer, <:Integer}; init = randn32) = Embedding(init(last(dims), first(dims)))

(m::Embedding)(x::Union{OneHotVector, OneHotMatrix}) = m.weight * x # equivalent to m.weight[:,onecold(x)]
@mcabbott (Member) Jul 14, 2021

This is where I think treating collect(onehot(...)) the same as onehot() would be nice. They are both arrays of booleans, so something like this may work:

Suggested change
(m::Embedding)(x::Union{OneHotVector, OneHotMatrix}) = m.weight * x # equivalent to m.weight[:,onecold(x)]
(m::Embedding)(x::AbstractVecOrMat{Bool}) = m.weight * x # handles OneHotVector, OneHotMatrix

(Or it may require a bit more thought which is more specific.)

Member

I guess it comes down to whether the Vector{Bool} case should be a matmul or index operation.

@mcabbott (Member) Jul 14, 2021

Those should agree when it's actually one-hot (if not OneHot) up to a dropdims:

julia> [1 2 3; 4 5 6] * [false, true, false]
2-element Vector{Int64}:
 2
 5

julia> [1 2 3; 4 5 6][:, [false, true, false]]
2×1 Matrix{Int64}:
 2
 5

But not when it's not:

julia> [1 2 3; 4 5 6] * [false, true, true]
2-element Vector{Int64}:
  5
 11

julia> [1 2 3; 4 5 6][:, [false, true, true]]
2×2 Matrix{Int64}:
 2  3
 5  6

Should the latter case be an error?

Maybe indexing would be faster although I'm not sure we care about optimising this. Logical indexing can't give you aliasing problems, if I'm thinking correctly.

Another possible strategy would just be to make it an error on AbstractArray{Bool}: only permit OneHot, or counting numbers.

Member

I was mostly speaking from an optimization standpoint. If we don't care about this case being fast (it is rare), then the suggested code works for me.

Contributor Author

Would it be better to change the current onehot paths from

function (m::Embedding)(x::Union{OneHotLikeVector{T,L}, OneHotLikeMatrix{T,L,I}}) where {T,L,I}
  size(m.weight, 2) == L || throw(DimensionMismatch("Matrix column must correspond with OneHot size: $(size(m.weight, 2)) != $L"))
  return m(onecold(x))
end

back to a matmul?

Member

Yes definitely, the current implementation is just code duplication.
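
In other words, something along these lines (a sketch only, keeping the dimension check from the current code and reusing the matmul from the suggestion above):

function (m::Embedding)(x::Union{OneHotLikeVector{T,L}, OneHotLikeMatrix{T,L,I}}) where {T,L,I}
  size(m.weight, 2) == L || throw(DimensionMismatch("Matrix column must correspond with OneHot size: $(size(m.weight, 2)) != $L"))
  return m.weight * x
end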

Comment on lines 473 to 526
(m::Embedding)(x::AbstractVector) = NNlib.gather(m.weight, x)
(m::Embedding)(x::AbstractArray) = reshape(m(vec(x)), :, size(x)...)
Member

I would also restrict these to integers

Suggested change
(m::Embedding)(x::AbstractVector) = NNlib.gather(m.weight, x)
(m::Embedding)(x::AbstractArray) = reshape(m(vec(x)), :, size(x)...)
(m::Embedding)(x::AbstractVector{<:Integer}) = NNlib.gather(m.weight, x)
(m::Embedding)(x::AbstractArray{<:Integer}) = reshape(m(vec(x)), :, size(x)...)

Member

Maybe only for AbstractVector so that Nil-arrays can go through the reshape?

Member

Oh, I wasn't thinking about Nil, good point. Do you want Array{Nil} to behave like Array{Int} or like Array{Bool}?

One other weird feature is that if you make a OneHotTensor, like a boolean 3-array, this will reshape it to a vector. Maybe that's too weird to worry about. Or maybe it should reshape AbstractArray{Bool} not with vec but with m(reshape(x, size(x, 1), :)).
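
Spelled out, that suggestion might look something like this (an illustrative sketch, assuming the AbstractVecOrMat{Bool} matmul method suggested above; not code in this PR):

(m::Embedding)(x::AbstractArray{Bool}) = reshape(m(reshape(x, size(x, 1), :)), :, size(x)[2:end]...)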

Member

Yes, I think you are correct. Reshaping should be different when the first dimension encodes one-hot information. This definitely presents a problem for Nil, because a Nil array is agnostic about whether the original input is one-hot or indices. In other words, the output size is input-type dependent for this layer.

Contributor Author

In that case, is the only way to get the correct output from outputsize to ignore the first dimension of the one-hot input when passing the inputsizes argument?

@darsnack (Member)

@manikyabard are you still up for rebasing this and moving forward? We ought to close up this loose end.

@manikyabard (Contributor Author) commented Jan 28, 2022

@manikyabard are you still up for rebasing this and moving forward? We ought to close up this loose end.

Yeah, I can continue working on this, although I am not sure about the approach we should take for outputsize. Maybe we can discuss this further in the next community call.

@darsnack (Member)

Maybe we can discuss this further in the next community call.

Yeah, sounds good!

@darsnack (Member) commented Feb 1, 2022

Summarizing what was discussed on the call:

  • the issue is that the paths for AbstractVector{<:Integer} that do getindex calls don't make sense, since Vector{Nil} is not a vector of indices
  • Michael's PR making Nil a subtype of Real can help with the cases where Nil should only be hitting the AbstractArray{<:Real} paths
  • it still won't solve the issue when outputsize is used for a model utilizing the AbstractVector{<:Integer} path
    • this will need an outputsize override rule for Embedding + AbstractVector{Nil}
    • (this wasn't brought up during call but I just thought of it) a better solution would be to define NNlib.gather for Nil which will cover all indexing cases beyond just Embedding

@manikyabard (Contributor Author)

  • (this wasn't brought up during call but I just thought of it) a better solution would be to define NNlib.gather for Nil which will cover all indexing cases beyond just Embedding

You mean something like this?

NNlib.gather!(dst::AbstractArray, ::AbstractArray, ::AbstractArray{<:Nil}) = fill(nil, size(dst)...)
(m::Embedding)(x::AbstractVector{<:Nil}) = NNlib.gather(m.weight, x)

@darsnack (Member) commented Feb 1, 2022

That looks right, but you won't need to special-case Embedding anymore. It should go through the NNlib.gather rule automatically.

@CarloLucibello (Member)

This needs a rebase onto master.

@codecov-commenter

Codecov Report

Merging #1656 (de43bf5) into master (3cc9067) will decrease coverage by 0.07%.
The diff coverage is 60.00%.


@@            Coverage Diff             @@
##           master    #1656      +/-   ##
==========================================
- Coverage   84.58%   84.51%   -0.08%     
==========================================
  Files          21       21              
  Lines        1486     1485       -1     
==========================================
- Hits         1257     1255       -2     
- Misses        229      230       +1     
Impacted Files Coverage Δ
src/onehot.jl 95.29% <ø> (-0.06%) ⬇️
src/outputsize.jl 82.05% <0.00%> (-2.16%) ⬇️
src/layers/basic.jl 80.99% <75.00%> (-0.16%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@darsnack (Member)

Can you add a test for outputsize of Embedding?
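
A minimal test sketch (hypothetical; it assumes the gather rule for Nil discussed above, so outputsize can flow through the ordinary Embedding methods):

@testset "Embedding outputsize" begin
  m = Embedding(26 => 5)
  @test Flux.outputsize(m, (3,)) == (5, 3)       # vector of indices
  @test Flux.outputsize(m, (3, 7)) == (5, 3, 7)  # higher-rank index array via the reshape path
end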

weight::W
end

@functor Embedding

Embedding(in::Integer, out::Integer; init = randn32) = Embedding(init(out, in))
Embedding(dims::Pair{<:Integer, <:Integer}; init = randn32) = Embedding(init(last(dims), first(dims)))
Member

The old constructor should be deprecated.
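
A possible deprecation, written out by hand (a sketch only; Flux usually collects these in src/deprecations.jl, and the exact message is up to the maintainers):

function Embedding(in::Integer, out::Integer; init = randn32)
  Base.depwarn("Embedding(in, out) is deprecated, use Embedding(in => out) instead", :Embedding)
  return Embedding(in => out; init = init)
end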


julia> vocab_idxs = [1, 722, 53, 220, 3]
```jldoctest
julia> m = Embedding(reshape(-6:45, 2, 26) .+ 0.01f0)
Member

The old example was much clearer.
Also, this constructor (Embedding(weight)) is not even part of the docstring; we should add it.

@mcabbott (Member) Feb 15, 2022

Indeed on the constructor.

The virtue of this example is that it doesn't have random numbers, so it can be a doctest. My hope is that onehotbatch("foo", 'a':'z') might connect with 26 well enough to be easy to follow. Maybe it can be made clearer somehow?

gpu_gradtest("Embedding OneHotMatrix index", embedding, OneHotMatrix([1,2,3], 5), 5, 2)
gpu_gradtest("Embedding OneHotMatrix repeated indices", embedding, OneHotMatrix([1,2,2], 5), 5, 2)
gpu_gradtest("Embedding", embedding, [1,3,5], 5 => 2)
gpu_gradtest("Embedding repeated indices", embedding, rand(1:50, 10^6), 50 => 2)
Member

No need to test such large arrays; CI already takes a lot of time. The previous test was fine.

Member

10^6 seems quite big, but maybe there's some value in going bigger than 3, in case e.g. 3 is never parallelised. No idea if there's a particular cutoff.

Contributor Author

I have updated it to be 10^3. Would this also add a lot of time to CI?

gpu_gradtest("Embedding 2d index", embedding, [1 2; 3 4], 5 => 2)
gpu_gradtest("Embedding OneHotVec index", embedding, OneHotVector(1, 5), 5 => 2)
gpu_gradtest("Embedding OneHotMatrix index", embedding, OneHotMatrix([1,2,3], 5), 5 => 2)
gpu_gradtest("Embedding OneHotMatrix repeated indices", embedding, OneHotMatrix(rand(1:50, 10^6), 50), 50 => 2)
Member

same as above

@@ -168,3 +168,5 @@ for (fn, Dims) in ((:conv, DenseConvDims), (:depthwiseconv, DepthwiseConvDims))
end
end
end

NNlib.gather!(dst::AbstractArray, ::AbstractArray, ::AbstractArray{<:Nil}) = fill(nil, size(dst)...)
Member

To understand what's going on, this is using Nil to stand in as an index, right? Maybe that warrants a comment. This is not otherwise handled:

julia> rand(3)[Flux.nil]
ERROR: ArgumentError: invalid index: Flux.NilNumber.Nil() of type Flux.NilNumber.Nil
Stacktrace:
 [1] to_index(i::Flux.NilNumber.Nil)
   @ Base ./indices.jl:300
 [2] to_index(A::Vector{Float64}, i::Flux.NilNumber.Nil)

Also, is the fact that this doesn't mutate dst and returns something else going to bite us someday?

Member

Probably. We can modify the non-mutating variant, NNlib.gather, which is what Flux layers will end up calling.
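
For instance, a sketch of the non-mutating override (an assumption, not the final code; the output shape mirrors what gather would normally allocate for scalar indices):

NNlib.gather(src::AbstractArray, idx::AbstractArray{<:Nil}) =
  fill(nil, size(src)[1:end-1]..., size(idx)...)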

manikyabard and others added 9 commits March 1, 2022 17:21
Co-authored-by: Michael Abbott <32575566+mcabbott@users.noreply.github.com>
Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>
Updated Embedding constructor to use `=>` and added OneHotLikeVector and OneHotLikeMatrix consts.
Co-authored-by: Michael Abbott <32575566+mcabbott@users.noreply.github.com>
Co-authored-by: Michael Abbott <32575566+mcabbott@users.noreply.github.com>