
add Embedding layer #1516

Merged (14 commits into master on Jul 13, 2021)

Conversation

@CarloLucibello (Member, author) commented Feb 22, 2021:

Basic implementation.

Maybe this could be improved when FluxML/NNlib.jl#255 lands.
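For reference, a minimal sketch of what such an Embedding layer might look like (illustrative only; the merged implementation may differ in details such as initialization and supported input types):

```julia
# Minimal sketch of an Embedding layer; names and initialization are
# illustrative, not necessarily what this PR merged.
struct Embedding{W<:AbstractMatrix}
    weight::W
end

# `in` is the vocabulary size, `out` the embedding dimension.
Embedding(in::Integer, out::Integer) = Embedding(randn(Float32, out, in))

# Integer indices select columns of the weight matrix.
(m::Embedding)(x::Union{Integer,AbstractVector{<:Integer}}) = m.weight[:, x]

m = Embedding(10, 4)       # 10 tokens, 4-dimensional embeddings
size(m([3, 3, 7]))         # (4, 3): one column per index
```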

PR Checklist

  • Tests are added
  • Entry in NEWS.md
  • Documentation, if applicable
  • Final review from @dhairyagandhi96 (for API changes).

@DhairyaLGandhi (Member):

Needs GPU tests

@DhairyaLGandhi (Member):

For such a straightforward routine, do we need a layer? I guess people are used to seeing them elsewhere, but why should that make it a necessary thing?

Might just make it a necessary evil. Oh well.

@darsnack (Member) left a comment:

Nice work

darsnack previously approved these changes Feb 22, 2021
@darsnack (Member) left a comment:

Seems simple, makes sense to add.

@DhairyaLGandhi (Member):

Yeah, I don't think we want to take the charge on mat here. That, the hard typing for the same logic, and changes to dense aside, it's looking neat.

@CarloLucibello (Member, author):

Since I need to define and use mat here for the Embedding layer, it is needlessly painful to factor out the change to the Dense layer, so please let me keep that change here. We don't need to document or export mat.

@darsnack (Member):

mat seems useful for writing N-dimensional behavior for layers. I am fine leaving as is and undocumented/unexported. If there is protest, let's just rename it to something else (it's basically a sibling to flatten). Could just have a kwarg in flatten.
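The distinction being discussed can be sketched as two reshape patterns (a sketch of the idea; mat stayed internal and undocumented):

```julia
# flatten keeps the batch (last) dimension and collapses the rest;
# mat keeps the feature (first) dimension and collapses the rest.
flatten(x::AbstractArray) = reshape(x, :, size(x)[end])
mat(x::AbstractArray) = reshape(x, size(x, 1), :)

x = rand(2, 3, 4)
size(flatten(x))  # (6, 4)
size(mat(x))      # (2, 12)
```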

@DhairyaLGandhi (Member):

please take care of the rest of the comments in the meantime?

I don't see how mat was something needed to be defined. The reshape api is pretty clever so as to be self documenting, but I can deal with that later.

@CarloLucibello (Member, author) commented Feb 22, 2021:

please take care of the rest of the comments in the meantime?

I don't see anything to take care of.

I don't see how mat was something needed to be defined. The reshape api is pretty clever so as to be self documenting, but I can deal with that later.

It is defined because it is convenient, same as flatten: when a code pattern is used in multiple spots, it is typically factored out into a function; this is how people write programs.

@DhairyaLGandhi (Member):

hard typing for the same logic, and changes to dense aside

These ones.

@CarloLucibello (Member, author):

hard typing for the same logic,

I don't know what you refer to here

and changes to dense aside

I asked to keep the change here

@CarloLucibello (Member, author):

@mcabbott given the aliasing problem, I think the thing to do here is to restrict the input to CPU only (i.e. error out on CuArrays) for the time being, so that we can merge this safely. When we have the gather/scatter infrastructure in place in NNlib.jl and CUDA.jl, we can switch the backend here.

@jeremiedb (Contributor):

Is there a reason to favor relying on ScatterNNlib for the embedding operations rather than fixing the incorrect gradient when indices have repeats?
I understand that ScatterNNlib has the benefit of doing the correct calculation, but it seems to leave in place a nasty silent bug. Having an improper gradient when indexing an array with repeating indices looks like the elephant in the room.

It's my understanding that JuliaLang/julia#31407 was to address it, but it looks stalled. Any idea how to raise awareness on this issue would be welcome!
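The repeated-index case can be made concrete with a small Zygote sketch (on the CPU the adjoint of getindex accumulates correctly; the bug under discussion was that GPU indexing could miss this accumulation and silently produce a wrong gradient):

```julia
using Zygote

# Column 1 is selected twice, so each entry of its gradient should be 2.0.
W = ones(2, 3)
g, = gradient(w -> sum(w[:, [1, 1]]), W)
g[:, 1]  # [2.0, 2.0] on the CPU; the bug was that parallel GPU indexing
         # could yield 1.0 per entry when indices repeat.
```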

@CarloLucibello (Member, author):

Yes, it would be good to fix FluxML/Zygote.jl#821.
Since the CPU gradient is working correctly, maybe it is enough to address JuliaGPU/CUDA.jl#89.

@CarloLucibello (Member, author):

This can now be implemented on top of the scatter/gather functions in NNlib.
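A sketch of what a gather-based forward pass might look like (hypothetical; the code actually merged may differ):

```julia
using NNlib

# Forward pass via NNlib.gather; its gradient is defined with scatter,
# which accumulates correctly over repeated indices on both CPU and GPU.
struct Embedding{W<:AbstractMatrix}
    weight::W
end

(m::Embedding)(x::AbstractVector{<:Integer}) = NNlib.gather(m.weight, x)

# gather(W, [3, 3, 7]) selects the same columns as W[:, [3, 3, 7]].
```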

@darsnack (Member) commented Jun 25, 2021:

This layer needs a special case in Flux.outputsize (like the normalization layers), since indexing with Nil is nonsensical. We ran into this issue trying to use the layer in FastAI.jl.

Something like

(m::Embedding)(x::AbstractVector{<:Nil}) = fill(nil, size(m.weight, 1), length(x))
(m::Embedding)(x::AbstractArray{<:Nil}) = fill(nil, size(m.weight, 1), last(size(x)))
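A self-contained sketch of the mechanism (the Nil type here mimics Flux's internal one, and names are hypothetical): a Nil singleton stands in for data, and the special-cased method returns a Nil-filled array of the right shape instead of trying to index with Nil values.

```julia
# Minimal stand-ins for Flux internals, for illustration only.
struct Nil <: Real end
const nil = Nil()

struct Embedding{W<:AbstractMatrix}
    weight::W
end

# The special case from the comment above: propagate shapes, not values.
(m::Embedding)(x::AbstractVector{<:Nil}) = fill(nil, size(m.weight, 1), length(x))

m = Embedding(randn(4, 10))
size(m(fill(nil, 5)))  # (4, 5): shape inferred without real indices
```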

@CarloLucibello (Member, author):

The GPU failure seems unrelated.

@CarloLucibello (Member, author):

@DhairyaLGandhi can you dismiss your change request?

bors bot (Contributor) commented Jul 13, 2021:

This PR was included in a batch that successfully built, but then failed to merge into master. It will not be retried.

Additional information:

{"message":"1 review requesting changes and 1 approving review by reviewers with write access.","documentation_url":"https://docs.github.com/articles/about-protected-branches"}

@DhairyaLGandhi (Member) left a comment:

I've left a couple thoughts.

There seem to be a lot of changes around code not intended to be changed in this PR (around the field names of Dense) which should go in a different PR.

@DhairyaLGandhi (Member):

Also needs GPU tests with one-hot and integers.

@mcabbott (Member):

If the plan is to make this perfect, then #1656 should be closed and all comments there addressed here. Alternatively, if the plan is to merge this as a stepping stone to #1656, as mentioned above, then it's a waste of time trying to make sure it doesn't have a hair out of place.

CarloLucibello and others added 13 commits July 13, 2021 20:22
Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>
Co-authored-by: Manikya <manikyabard@gmail.com>
Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>
@DhairyaLGandhi (Member):

My comment pointed out adding tests for the features this PR wants to bring in.

#1656 aims to build on top of this, and has its own opinions about API, not implementation.

DhairyaLGandhi previously approved these changes Jul 13, 2021
@CarloLucibello (Member, author):

Need an approval again.

@DhairyaLGandhi (Member) left a comment:

bors r+

bors bot (Contributor) commented Jul 13, 2021:

Build succeeded:

bors bot merged commit 9931730 into master Jul 13, 2021
CarloLucibello deleted the cl/embed branch April 7, 2022 07:01