Embed layer #2032

Merged

merged 4 commits into BVLC:master from jeffdonahue:embed-layer Aug 25, 2015

jeffdonahue commented Mar 4, 2015

(Replaces #1872)

Based on #1977 (parameter gradient accumulation). This adds EmbedLayer (should probably change the name to EmbeddingLayer for consistency with PoolingLayer etc.), which essentially learns a lookup table for integer inputs, useful for language modeling and such. Its computation is equivalent to an InnerProductLayer with "one-hot" vector inputs, but instead of explicitly representing the one-hot vectors (which wastes lots of memory), this assumes the input itself is the indices of the "hot" index of those one-hot vectors (like the label inputs for the categorical losses). This should probably be replaced with SparseInnerProduct (#937) once that's merged, assuming that's faster -- this is a more lightweight change that continues the unfortunate trend of casting floats to ints as labels.

shelhamer added the JL and ES labels Mar 7, 2015

jeffdonahue commented Jun 6, 2015

Rebased and ready for review. (Previously depended on gradient accumulation PR #1663.)

shelhamer added a commit that referenced this pull request Aug 25, 2015:

80579b8  Merge pull request #2032 from jeffdonahue/embed-layer ("Embed layer for lookup table of one hot encodings")

shelhamer merged commit 80579b8 into BVLC:master Aug 25, 2015

1 check passed

continuous-integration/travis-ci/pr: The Travis CI build passed

shelhamer referenced this pull request Aug 25, 2015: TileLayer #2083 (merged)

jeffdonahue deleted the jeffdonahue:embed-layer branch Aug 26, 2015

beniz commented Dec 8, 2015 (edited)

I am very confused by this Embed layer. My hunch is that no one uses it outside of the RNN/LSTM branch, so I doubt I'll get an answer, but let's try just in case.

I've tried to use it in a simple MLP, with a prototxt section like this:

layer {
  name: "embed"
  type: "Embed"
  bottom: "data"
  top: "embed_data"
  embed_param {
    input_dim: 5454
    num_output: 200
    weight_filler {
      type: "uniform"
      min: -0.08
      max: 0.08
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "ip0"
  type: "InnerProduct"
  bottom: "embed_data"
  top: "ip0"
  inner_product_param {
    num_output: 200
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  } 
}

Filling the data elements with the vocabulary indices of the words in a sentence, I naturally get an error from the data_transformer, since the datum channels now have varying sizes. I then tried padding the remaining elements with 0, as I understand is done in https://github.com/BVLC/caffe/pull/1873/files

But in that case there seems to be no memory advantage over one-hot vectors, since the input dim is the same. Thus I am confused :)

Needless to say, any help is highly appreciated at this point!

Edit: understood, of course the padding is there to fix the input sequence length, not the vocabulary dimension.
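To spell out why the memory advantage survives the padding, a rough NumPy sketch (illustrative only; max_len is a made-up padded sentence length, the other sizes come from the prototxt above): the padded Embed input stores one integer per position, while the one-hot equivalent stores a full vocabulary-sized vector per position.

import numpy as np

vocab_size = 5454   # input_dim from the prototxt above
max_len = 50        # hypothetical padded sentence length

# Input to the Embed layer: one vocabulary index per position, padded with 0.
indices = np.zeros(max_len, dtype=np.int32)

# Equivalent explicit one-hot input for an InnerProduct layer.
one_hot = np.zeros((max_len, vocab_size), dtype=np.float32)

print(indices.size)  # 50 values
print(one_hot.size)  # 272700 values: the factor the Embed layer saves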
