
How to concatenate an ELMo vector with the corresponding context-independent token representation? #71

Closed
choym0098 opened this issue Jul 26, 2018 · 4 comments


choym0098 commented Jul 26, 2018

As described in the paper "Deep contextualized word representations", before being fed into downstream NLP tasks, the ELMo vectors are concatenated with the context-independent token representations X like this: [X; ELMo]

But how exactly are they concatenated? Is it element-wise, or do we just join the two vectors end-to-end?

I saw in the source code that the LSTM layers' outputs in the biLM are concatenated with tf.concat([lstm_output1, lstm_output2], axis=-1), so I feel like the combination of ELMo and X should be element-wise as well.
But if it is combined element-wise, does X always have to match the dimension of ELMo's internal LSTM layers?
For example, given 2 sentences and a maximum sentence length of 10, I see that the vectors created by weight_layers have shape (2, 10, 32), where 32 is the concatenation of the two LSTM directions (forward and backward), each of dimension 16 (16 x 2 = 32). However, if we were to combine ELMo with X element-wise as introduced in the paper, X would also need to have shape (num_sentences, max_sentence_length, 32), which rules out an embedding dimension for X other than 32 (a toy sketch of what I mean is below).
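To illustrate the shape problem with toy values (a numpy sketch, not the actual bilm code; the 300 for X is just an arbitrary GloVe-like dimension):

import numpy as np

num_sentences, max_len = 2, 10
elmo = np.random.rand(num_sentences, max_len, 32)   # forward + backward LSTM, 16 + 16
x = np.random.rand(num_sentences, max_len, 300)     # context-independent embedding, e.g. GloVe

# Joining end-to-end along the feature axis works for any pair of dims:
joined = np.concatenate([x, elmo], axis=-1)         # shape (2, 10, 332)

# An element-wise combination (e.g. addition) only works if the dims match:
# x + elmo                                          # -> broadcast error, 300 != 32
elmo + np.random.rand(num_sentences, max_len, 32)   # fine: both are 32-dimensional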

If I understand the options.json file correctly, the "projection_dim" hyperparameter determines the internal LSTM layer dimension.
Is there any way to change the LSTM layer dimension in the biLM (possibly through lstm { ... projection_dim = ? ... } in the options.json file)? Or am I missing something?
(I ask because when I tried changing projection_dim and running, I came across the following error.)

======================================================================
ERROR: test_weighted_layers (__main__.TestWeightedLayers)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "elmo.py", line 136, in test_weighted_layers
    self._check_weighted_layer(1.0, do_layer_norm=True, use_top_only=False)
  File "elmo.py", line 36, in _check_weighted_layer
    bilm_ops = model(character_ids)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 97, in __call__
    max_batch_size=self._max_batch_size)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 286, in __init__
    self._build()
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 290, in _build
    self._build_word_char_embeddings()
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 415, in _build_word_char_embeddings
    dtype=DTYPE)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1317, in get_variable
    constraint=constraint)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1079, in get_variable
    constraint=constraint)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 417, in get_variable
    return custom_getter(**custom_getter_kwargs)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 275, in custom_getter
    return getter(name, *args, **kwargs)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
    use_resource=use_resource, constraint=constraint)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 786, in _get_single_variable
    use_resource=use_resource)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2220, in variable
    use_resource=use_resource)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2210, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2193, in default_variable_creator
    constraint=constraint)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 235, in __init__
    constraint=constraint)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 343, in _init_from_args
    initial_value(), name="initial_value", dtype=dtype)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 770, in <lambda>
    shape.as_list(), dtype=dtype, partition_info=partition_info)
  File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 246, in ret
    varname_in_file, shape, weights.shape)
ValueError: Invalid shape initializing CNN_proj/W_proj, got [124, 8], expected (124, 16)

----------------------------------------------------------------------
Ran 1 test in 0.099s

FAILED (errors=1)

I'm currently studying CNNs, so it was kind of hard for me to trace back through this error, but it looks like projection_dim depends on some other value.

To sum up, all I want to know is how to manipulate ELMo's embedding dimension so that the size of ELMo matches that of the context-independent token representations.

Please correct me, or ask, if any of my questions are unclear or mistaken.
Thank you for any help you can provide!

khashei commented Jul 30, 2018

Thanks for asking this question; I had exactly the same one. I am pretty sure that the example given in the TensorFlow implementation doesn't follow the paper's recommendation to concatenate the input (X) with the embedding.

khashei commented Jul 30, 2018

Looking more into the code, the context-independent embedding is already accessible as a separate operation in the model, like this:
embeddings_op = BidirectionalLanguageModel(options_file, weight_file)(character_ids_placeholder)
context_independent_embedding = embeddings_op["token_embeddings"]

It is already projected to match the LSTM dimension. It includes tokens for the bos and eos markers, which have to be dropped, like this:

tf.concat([embeddings_op["token_embeddings"][:, 1:-1, :], weight_layers('input', embeddings_op, l2_coef)["weighted_op"]], axis=2)
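For reference, a fuller sketch of the above, assuming the bilm-tf API (BidirectionalLanguageModel and weight_layers) and hypothetical file paths:

import tensorflow as tf
from bilm import BidirectionalLanguageModel, weight_layers

# Hypothetical paths; substitute your own trained biLM files.
options_file = 'options.json'
weight_file = 'weights.hdf5'

# Character ids from the Batcher: (batch, num_tokens_with_bos_eos, 50).
character_ids_placeholder = tf.placeholder('int32', shape=(None, None, 50))

bilm = BidirectionalLanguageModel(options_file, weight_file)
embeddings_op = bilm(character_ids_placeholder)

# Context-independent token embeddings, already projected to the LSTM
# dimension; drop the <S>/</S> positions so the lengths line up.
token_embeddings = embeddings_op['token_embeddings'][:, 1:-1, :]

# Learned scalar mix of the biLM layers (the "ELMo" vector).
elmo_input = weight_layers('input', embeddings_op, l2_coef=0.0)

# Concatenate along the feature axis.
combined = tf.concat([token_embeddings, elmo_input['weighted_op']], axis=2)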

choym0098 commented Jul 30, 2018

@khashei So we literally just replace the existing word embedding (like word2vec or GloVe) with the ELMo embedding (= weight_layers( ... )["weighted_op"])?

(Added)
In fact, I tried all three options on a sentiment analysis task (with 5 different emotions): element-wise combination, end-to-end concatenation, and simple replacement with the ELMo embedding. Only the replacement with the ELMo embedding outperformed the existing embedding (GloVe) on the training set (though not on the dev set), so I think you may be right!
However, my data was very small (300 examples for training, 60 for testing), so if anyone can clarify this, that would be great!

matt-peters (Contributor) commented
The code samples don't include any additional pretrained word embeddings such as GloVe.

In our paper, we used both GloVe and ELMo and concatenated them end-to-end. The ELMo representations have dimension 1024, while GloVe has dimension 50, 100, 200, or 300, so it isn't possible to concatenate element-wise without a projection layer somewhere. Instead of introducing additional parameters for a projection, we just concatenated end-to-end. When using the 300-dimensional GloVe vectors, this gives a total embedding dimension of 1324 (= 1024 + 300).
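Concretely, the end-to-end concatenation is just a concat along the last axis; a sketch with made-up tensors (the batch size and sequence length here are arbitrary):

import tensorflow as tf

batch_size, max_len = 32, 25
glove = tf.random.normal([batch_size, max_len, 300])    # pretrained GloVe vectors
elmo = tf.random.normal([batch_size, max_len, 1024])    # ELMo representations

# No projection needed; the feature dims simply add up.
combined = tf.concat([glove, elmo], axis=-1)            # shape (32, 25, 1324)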
