EmbedLayer #1872

Closed
wants to merge 86 commits into from

Conversation

jeffdonahue
Contributor

Based on #1486 (N-D blobs) and #1663 (parameter gradient accumulation). This adds EmbedLayer (the name should probably change to EmbeddingLayer for consistency with PoolingLayer etc.), which essentially learns a lookup table for integer inputs, useful for language modeling and the like. Its computation is equivalent to an InnerProductLayer whose inputs are "one-hot" vectors, but instead of explicitly representing the one-hot vectors (which wastes a lot of memory), it assumes the input itself is the index of the "hot" entry of each one-hot vector (like the label inputs for the categorical losses). This should probably be replaced with SparseInnerProduct (#937) once that's merged, assuming that's faster -- this is a more lightweight change (or at least it will be once #1486 is merged) that continues the unfortunate trend of casting floats to ints as labels.
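To make the equivalence concrete, here is a minimal CPU-only sketch of the forward pass described above (an illustration, not the actual layer code; the function and parameter names are made up): each input value is cast to an integer index, and the corresponding row of the weight table is copied to the output, which is exactly what multiplying a one-hot vector by the weight matrix would produce.

```cpp
#include <vector>

// Hypothetical sketch: `bottom` holds float-encoded indices (one per example),
// `weight` is the input_dim x num_output lookup table stored row-major,
// and `top` receives batch_size x num_output outputs.
void embed_forward_cpu(const std::vector<float>& bottom,
                       const std::vector<float>& weight,
                       int num_output,
                       std::vector<float>* top) {
  const int batch_size = static_cast<int>(bottom.size());
  top->resize(batch_size * num_output);
  for (int n = 0; n < batch_size; ++n) {
    // Cast the float input to an int index, as with label inputs.
    const int index = static_cast<int>(bottom[n]);
    // A one-hot row vector times the weight matrix selects a single row,
    // so just copy row `index` of the table.
    for (int c = 0; c < num_output; ++c) {
      (*top)[n * num_output + c] = weight[index * num_output + c];
    }
  }
}
```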

shelhamer and others added 9 commits February 20, 2015 11:21
set the right rpath for tools and examples respectively

thanks for the report @mees!
[build] fix dynamic linking of tools
… was overwritten with symlink created at build time and installed with install(DIRECTORY ...)
… systems).

This commit specifies Python2 with which cpp_lint.py works :-)
shelhamer and others added 14 commits March 3, 2015 22:27
Blobs are N-D arrays (for N not necessarily equal to 4)
When setting the mean, assert that it is either one pixel or an array with
shape equal to the input data size.
(With layers whose backward passes accumulate gradients), this effectively
decouples the computational batch from the SGD minibatch. Each
iteration accumulates gradients over iter_size batches, then the parameters
are updated.
(double impl from NVIDIA dev docs; float impl included in CUDA as
"atomicAdd")
@jzhang533

EmbedLayer has a blob that stores the vocabulary_size x embedding_size embeddings. During Forward/Backward, only the involved words' embeddings are used for computation, but all the embeddings (the whole blob) are updated during solving (Solver::ComputeUpdateValue, Blob::Update).

Is my understanding correct?

@jeffdonahue
Contributor Author

@jzhang533 yes that's correct; it has the same behavior as other parameter layers (InnerProductLayer, ConvolutionLayer).
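(For anyone following along: a rough, hypothetical sketch of why the whole table is touched, assuming the solver's update amounts to an element-wise data -= diff over the entire parameter blob, regardless of which rows received nonzero gradients.)

```cpp
// Hypothetical simplification of the parameter update (not the verbatim
// Caffe source): the update runs over every element of the embedding blob,
// so rows whose diff is zero are still read and written.
void blob_update(float* data, const float* diff, int count) {
  // count == vocabulary_size * embedding_size for EmbedLayer's weight blob
  for (int i = 0; i < count; ++i) {
    data[i] -= diff[i];
  }
}
```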

@jzhang533

@jeffdonahue thanks for clarifying -- I'm trying to learn embeddings for a large vocabulary and will try to figure out a way to avoid needless computation during solving.
