EmbedLayer #1872
Conversation
Commits referenced in this pull request:
* Blobs are N-D arrays (for N not necessarily equal to 4); num/channels/height/width indexing remains valid.
* When loading parameters from a saved NetParameter, keep the param Blob shape the layer has set rather than necessarily adopting the one from the saved net (e.g., keep a new 1D bias shape rather than the (1 x 1 x 1 x D) shape from a legacy net).
* Check shape of input mean: when setting the mean, assert that it is either one pixel or an array with shape equal to the input data size.
* With layers whose backward passes accumulate gradients, each iteration accumulates gradients over iter_size batches before the parameters are updated, effectively decoupling the computational batch from the SGD minibatch (see the sketch after this list).
* Add a double-precision atomicAdd implementation (double impl from the NVIDIA dev docs; the float impl is included in CUDA as "atomicAdd").
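The iter_size behavior can be illustrated outside Caffe. Below is a minimal NumPy sketch (all names are illustrative, not Caffe's API): gradients from iter_size computational batches are summed into one accumulator and the parameters are updated once per accumulation, so the effective SGD minibatch is iter_size times the computational batch.

```python
import numpy as np

# Toy linear model y = x @ w with squared-error loss. Gradients accumulate
# over `iter_size` computational batches before a single parameter update,
# so the effective SGD minibatch is iter_size * batch_size.
rng = np.random.default_rng(0)
w = np.zeros(5)
true_w = rng.normal(size=5)
iter_size, batch_size, lr = 4, 8, 0.1

for iteration in range(100):
    grad = np.zeros_like(w)              # accumulator, analogous to the param diff blob
    for _ in range(iter_size):
        x = rng.normal(size=(batch_size, 5))
        y = x @ true_w
        err = x @ w - y
        grad += x.T @ err / batch_size   # backward pass adds into the accumulator
    w -= lr * grad / iter_size           # one update per iter_size batches
```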
EmbedLayer has a blob that stores the vocabulary_size x embedding_size embeddings. During Forward/Backward, only the involved words' embeddings are used for computation, but all the embeddings (the whole blob) are updated during solving (Solver::ComputeUpdateValue, Blob::Update). Is my understanding correct?
@jzhang533 yes, that's correct; it has the same behavior as other parameter layers.
@jeffdonahue thanks for clarifying. I'm trying to learn embeddings for a large vocabulary and will try to figure out a way to avoid needless computation during solving.
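To make the point above concrete, here is a small NumPy sketch (not Caffe code; the variable names are illustrative) of why the whole embedding blob participates in the update even though only the looked-up rows receive nonzero gradients: a dense solver update touches every row of the param and diff blobs regardless of which rows were used in the batch.

```python
import numpy as np

vocabulary_size, embedding_size = 10, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(vocabulary_size, embedding_size))    # the parameter blob
indices = np.array([2, 5, 2])                             # words in the batch

# Forward: only the involved rows are read.
out = W[indices]

# Backward: scatter-add the top gradient into a dense diff blob.
top_diff = np.ones_like(out)
W_diff = np.zeros_like(W)
np.add.at(W_diff, indices, top_diff)    # rows 2 and 5 are nonzero, the rest stay 0

# Solver-style dense update (cf. Blob::Update): every row is touched, even
# rows whose diff is zero -- which is why a large vocabulary still pays for
# a full update each iteration.
lr = 0.1
W -= lr * W_diff
```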
Based on #1486 (N-D blobs) and #1663 (parameter gradient accumulation). This adds `EmbedLayer` (the name should probably change to `EmbeddingLayer` for consistency with `PoolingLayer`, etc.), which essentially learns a lookup table for integer inputs, useful for language modeling and the like. Its computation is equivalent to an `InnerProductLayer` whose inputs are "one-hot" vectors, but instead of explicitly representing the one-hot vectors (which wastes a lot of memory), it assumes the input itself gives the index of the "hot" entry of each one-hot vector (like the label inputs for the categorical losses). This should probably be replaced with SparseInnerProduct (#937) once that's merged, assuming that's faster -- this is a more lightweight change (or at least it will be once #1486 is merged) that continues the unfortunate trend of casting floats to ints as labels.
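The one-hot equivalence described above can be checked in a few lines of NumPy (a sketch under assumed shapes and names, not the layer implementation):

```python
import numpy as np

vocabulary_size, embedding_size, batch_size = 6, 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(vocabulary_size, embedding_size))       # learned lookup table
labels = rng.integers(0, vocabulary_size, size=batch_size)   # integer "hot" indices

# EmbedLayer-style forward: index directly into the table.
embed_out = W[labels]

# InnerProductLayer-style forward: materialize one-hot vectors and multiply.
one_hot = np.zeros((batch_size, vocabulary_size))
one_hot[np.arange(batch_size), labels] = 1.0
ip_out = one_hot @ W

assert np.allclose(embed_out, ip_out)   # same result, without the one-hot memory cost
```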