How to get token embedding and the weighted vector in ELMo? #2245

Closed
ShomyLiu opened this Issue Dec 27, 2018 · 4 comments

@ShomyLiu
Contributor

ShomyLiu commented Dec 27, 2018

System (please complete the following information):

  • OS: Ubuntu 18.04
  • Python version: 3.6.1
  • AllenNLP version: v0.8.0
  • PyTorch version: 1.0

Question
I have two questions about the output of allennlp.modules.elmo, which returns two outputs:

  • elmo_representations
  • mask

(1) How can I get the token embedding $x_k$? In practice, we need to combine the token embedding with the outputs of the different LSTM layers.
(2) In the paper, the final ELMo representation is a weighted linear combination of the different layer representations of a token, but the output above only contains the vectors separately. How can I get the final weighted embedding?

Thanks

@nelson-liu

Member

nelson-liu commented Dec 27, 2018

Hi, maybe this document helps? https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md#using-elmo-as-a-pytorch-module-to-train-a-new-model

If you just want per-token embeddings in an AllenNLP model, you can use the elmo TokenEmbedder (https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md#using-elmo-with-existing-allennlp-models). If you want per-token embeddings with the Elmo module, you can use:

from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json"
weight_file = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5"

# Note the "1", since we want only 1 output representation for each token.
elmo = Elmo(options_file, weight_file, 1, dropout=0)

# use batch_to_ids to convert sentences to character ids
sentences = [['First', 'sentence', '.'], ['Another', '.']]
character_ids = batch_to_ids(sentences)

embeddings = elmo(character_ids)

Then len(embeddings["elmo_representations"]) == 1, and embeddings["elmo_representations"][0].shape == torch.Size([2, 3, 1024]), corresponding to (batch_size, max_seq_len, elmo_embedding_dim).
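For reference, here is a small sketch continuing the snippet above that pulls out the per-token embeddings and the padding mask (the exact printed values depend on the weights, but the shapes and mask pattern follow from the two example sentences):

token_embeddings = embeddings["elmo_representations"][0]  # (batch_size, max_seq_len, 1024)
mask = embeddings["mask"]                                  # (batch_size, max_seq_len)

print(token_embeddings.shape)  # torch.Size([2, 3, 1024])
print(mask)                    # 1 for real tokens, 0 for the padded position in the second sentence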

(1) The token embedding $x_k$ is calculated with a learned weighted average (the ScalarMix class); see https://github.com/allenai/allennlp/blob/master/allennlp/modules/elmo.py#L171-L184

(2) The output of the Elmo class is the final weighted embedding of the 3 layers (character-convnet output, first LSTM layer output, second LSTM layer output).
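If it helps, here is a rough sketch of inspecting the learned mixture weights. It assumes the first scalar mix is exposed as the submodule scalar_mix_0 with scalar_parameters and gamma attributes (names taken from the linked elmo.py, so they may differ across versions):

import torch

# Assumes `elmo` was built as above with num_output_representations=1.
scalar_mix = elmo.scalar_mix_0                              # submodule name is an assumption
raw = torch.cat([p for p in scalar_mix.scalar_parameters])  # one scalar per layer (char-CNN, LSTM 1, LSTM 2)
weights = torch.softmax(raw, dim=0)                         # the normalized per-layer weights (the s_j in the paper)
print(weights)                                              # roughly uniform before any training
print(scalar_mix.gamma)                                     # the gamma scaling factor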

Hope that helps! Closing this for now, but feel free to re-open if you have further questions.

nelson-liu closed this Dec 27, 2018

@ShomyLiu

Contributor Author

ShomyLiu commented Dec 28, 2018

@nelson-liu Thanks for your nice reply!

The parameter 1 in Elmo(options_file, weight_file, 1, dropout=0) means one output representation for each token, according to your comments.

That is to say, this representation is exactly the final weighted embedding of the 3 layers (character-convnet output, first LSTM layer output, second LSTM layer output) mentioned in the paper (Eq. (1)):

[image: Eq. (1) from the paper, $\mathrm{ELMo}_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, \mathbf{h}_{k,j}^{LM}$]

However, according to the docstring of Elmo:

num_output_representations: The number of ELMo representation layers to output.

So another question: what does it mean when num_output_representations is greater than 1?

Does each output representation use a different set of weights for the linear combination of the 3 layers (char-CNN, LSTM 1, LSTM 2)?

Thanks very much!

@nelson-liu

Member

nelson-liu commented Dec 28, 2018

Yeah, exactly: multiple output representations mean that you're learning a different linear combination of the 3 layers for each one. If you want to use ELMo in two different places in your model, it's inefficient to run the forward pass of the LSTM twice just so you can re-weight the layers differently.

For instance, the BiattentiveClassificationNetwork uses ELMo in two parts of the model. To handle this in the code, we create an Elmo instance with num_output_representations=2, then pop from the list to get the embeddings for each part of the model (https://github.com/allenai/allennlp/blob/master/allennlp/models/biattentive_classification_network.py#L219-L227).
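As a minimal sketch of that pattern (variable names here are just illustrative), reusing options_file, weight_file, and character_ids from the earlier snippet:

# Two independently weighted combinations computed from a single biLM forward pass.
elmo = Elmo(options_file, weight_file, num_output_representations=2, dropout=0)
embeddings = elmo(character_ids)

reps = embeddings["elmo_representations"]
assert len(reps) == 2
input_elmo = reps[0]   # e.g. mixed into the model's input embeddings
output_elmo = reps[1]  # e.g. mixed in at a later layer of the model
# Both tensors have the same shape, but each uses its own learned ScalarMix weights.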

Does that clarify things? I agree that the overloaded use of the word "layers" in the documentation is confusing; PRs to clarify it would be great.

@ShomyLiu

Contributor Author

ShomyLiu commented Dec 28, 2018

@nelson-liu Yes, that's a very clear explanation, thank you very much.
I will open a PR to clarify this usage in the documentation.

Thanks again.
