
How to concatenate an ELMo vector with the corresponding context-independent token representation? #71

Closed
choym0098 opened this issue Jul 26, 2018 · 4 comments


choym0098 commented Jul 26, 2018

As described in the paper "Deep contextualized word representations", before being fed into downstream NLP tasks, the ELMo vectors are concatenated with the context-independent token representations X like this: [X; ELMo]

But how exactly are they concatenated? Is it element-wise, or do we just join the two vectors end-to-end?

I saw in the source code that the LSTM layers' outputs in the biLM are concatenated with tf.concat([lstm_output1, lstm_output2], axis=-1), so I feel like the combination of ELMo and X should be element-wise as well.
But if it is combined element-wise, does X always have to match the dimension of ELMo's internal LSTM layers?
For example, given 2 sentences and a maximum sentence length of 10, I see that the vectors created by weight_layers have shape (2, 10, 32), where 32 is the concatenation of the two LSTM directions (forward and backward), each of dimension 16 (16 x 2 = 32). However, if we were to combine ELMo with X element-wise as introduced in the paper, X would also need to have shape (num_sentences, max_sentence_length, 32), which rules out an embedding dimension for X other than 32 (a toy sketch of what I mean is below).
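To illustrate the shape problem with toy values (a numpy sketch, not the actual bilm code; the 300 for X is just an arbitrary GloVe-like dimension):

import numpy as np

num_sentences, max_len = 2, 10
elmo = np.random.rand(num_sentences, max_len, 32)   # forward + backward LSTM, 16 + 16
x = np.random.rand(num_sentences, max_len, 300)     # context-independent embedding, e.g. GloVe

# Joining end-to-end along the feature axis works for any pair of dims:
joined = np.concatenate([x, elmo], axis=-1)         # shape (2, 10, 332)

# An element-wise combination (e.g. addition) only works if the dims match:
# x + elmo                                          # -> broadcast error, 300 != 32
elmo + np.random.rand(num_sentences, max_len, 32)   # fine: both are 32-dimensional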

If I understand the options.json file correctly, the "projection_dim" hyperparameter determines the internal LSTM layer dimension.
Is there any way to change the LSTM layer dimension in the biLM (possibly through lstm { ... projection_dim = ? ... } in the options.json file)? Or am I missing something?
(I ask because when I tried changing projection_dim and running, I came across the following error.)

======================================================================
ERROR: test_weighted_layers (__main__.TestWeightedLayers)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "elmo.py", line 136, in test_weighted_layers
    self._check_weighted_layer(1.0, do_layer_norm=True, use_top_only=False)
  File "elmo.py", line 36, in _check_weighted_layer
    bilm_ops = model(character_ids)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 97, in __call__
    max_batch_size=self._max_batch_size)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 286, in __init__
    self._build()
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 290, in _build
    self._build_word_char_embeddings()
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 415, in _build_word_char_embeddings
    dtype=DTYPE)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1317, in get_variable
    constraint=constraint)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1079, in get_variable
    constraint=constraint)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 417, in get_variable
    return custom_getter(**custom_getter_kwargs)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 275, in custom_getter
    return getter(name, *args, **kwargs)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
    use_resource=use_resource, constraint=constraint)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 786, in _get_single_variable
    use_resource=use_resource)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2220, in variable
    use_resource=use_resource)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2210, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2193, in default_variable_creator
    constraint=constraint)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 235, in __init__
    constraint=constraint)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 343, in _init_from_args
    initial_value(), name="initial_value", dtype=dtype)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 770, in <lambda>
    shape.as_list(), dtype=dtype, partition_info=partition_info)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 246, in ret
    varname_in_file, shape, weights.shape)
ValueError: Invalid shape initializing CNN_proj/W_proj, got [124, 8], expected (124, 16)

----------------------------------------------------------------------
Ran 1 test in 0.099s

FAILED (errors=1)

I'm currently studying CNNs, so it was kind of hard for me to trace back through this error, but it looks like projection_dim depends on some other value.

To sum up, all I want to know is how to manipulate ELMo's embedding dimension so that the size of ELMo matches that of the context-independent token representations.

Please correct me, or ask, if any of my questions are unclear or mistaken.
Thank you for any help you can provide!

khashei commented Jul 30, 2018

Thanks for asking this question; I had exactly the same one. I am pretty sure that the example given in the TensorFlow implementation doesn't follow the paper's recommendation to concatenate the input (X) with the embedding.

khashei commented Jul 30, 2018

Looking more into the code, the context-independent embedding is already accessible as a separate operation in the model, like this:
embeddings_op = BidirectionalLanguageModel(options_file, weight_file)(character_ids_placeholder)
context_independent_embedding = embeddings_op["token_embeddings"]

It is already projected to match the LSTM dimension. It includes tokens for the bos and eos markers, which have to be dropped, like this:

tf.concat([embeddings_op["token_embeddings"][:, 1:-1, :], weight_layers('input', embeddings_op, l2_coef)["weighted_op"]], axis=2)
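For reference, a fuller sketch of the above, assuming the bilm-tf API (BidirectionalLanguageModel and weight_layers) and hypothetical file paths:

import tensorflow as tf
from bilm import BidirectionalLanguageModel, weight_layers

# Hypothetical paths; substitute your own trained biLM files.
options_file = 'options.json'
weight_file = 'weights.hdf5'

# Character ids from the Batcher: (batch, num_tokens_with_bos_eos, 50).
character_ids_placeholder = tf.placeholder('int32', shape=(None, None, 50))

bilm = BidirectionalLanguageModel(options_file, weight_file)
embeddings_op = bilm(character_ids_placeholder)

# Context-independent token embeddings, already projected to the LSTM
# dimension; drop the <S>/</S> positions so the lengths line up.
token_embeddings = embeddings_op['token_embeddings'][:, 1:-1, :]

# Learned scalar mix of the biLM layers (the "ELMo" vector).
elmo_input = weight_layers('input', embeddings_op, l2_coef=0.0)

# Concatenate along the feature axis.
combined = tf.concat([token_embeddings, elmo_input['weighted_op']], axis=2)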

choym0098 commented Jul 30, 2018

@khashei So we literally just replace the existing word embedding (like word2vec or GloVe) with the ELMo embedding (= weight_layers( ... )["weighted_op"])?

(Added)
In fact, I tried all three options on a sentiment analysis task (with 5 different emotions): element-wise combination, end-to-end concatenation, and simple replacement with the ELMo embedding. Only the replacement with the ELMo embedding outperformed the existing embedding (GloVe) on the training set (though not on the dev set), so I think you may be right!
However, my data was very small (300 examples for training, 60 for testing), so if anyone can clarify this, that would be great!

matt-peters (Contributor) commented
The code samples don't include any additional pretrained word embeddings such as GloVe.

In our paper, we used both GloVe and ELMo and concatenated them end-to-end. The ELMo representations have dimension 1024, while GloVe has dimension 50, 100, 200, or 300, so it isn't possible to concatenate element-wise without a projection layer somewhere. Instead of introducing additional parameters for a projection, we just concatenated end-to-end. When using the 300-dimensional GloVe vectors, this gives a total embedding dimension of 1324 (= 1024 + 300).
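Concretely, the end-to-end concatenation is just a concat along the last axis; a sketch with made-up tensors (the batch size and sequence length here are arbitrary):

import tensorflow as tf

batch_size, max_len = 32, 25
glove = tf.random.normal([batch_size, max_len, 300])    # pretrained GloVe vectors
elmo = tf.random.normal([batch_size, max_len, 1024])    # ELMo representations

# No projection needed; the feature dims simply add up.
combined = tf.concat([glove, elmo], axis=-1)            # shape (32, 25, 1324)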
