This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Order dependence in ElmoEmbedder? #1169

Closed
ngoodman opened this issue May 2, 2018 · 6 comments

Comments

@ngoodman

ngoodman commented May 2, 2018

Apologies for opening an issue with what is likely a conceptual misunderstanding on my part!

I'm playing around with the pre-trained ELMo embeddings (which are cool, thanks!) and noticing that the embedder seems to be stateful. That is, if I embed the same sentence twice, I don't get the same result:

import numpy as np
import scipy as sp
import scipy.spatial.distance  # scipy does not import submodules automatically
from allennlp.commands.elmo import ElmoEmbedder

ee = ElmoEmbedder()

# embed_sentence returns a (num_layers, num_tokens, dim) array;
# [2, 0, :] selects the top layer's vector for the first token
v1 = np.squeeze(ee.embed_sentence("hello my name is mud .".split())[2, 0, :])
v2 = np.squeeze(ee.embed_sentence("hello my name is mud .".split())[2, 0, :])

print("embed twice test: ", sp.spatial.distance.cosine(v1, v2))

This gives a cosine distance of about 0.02: not huge, but problematic for the same sentence!

Where does the statefulness come from? Am I misusing the embedder?

@schmmd
Member

schmmd commented May 2, 2018

Hi @ngoodman, that's expected behavior. ELMo has internal state and adapts to your domain over time. We've been thinking about how to make the output more consistent, as this is unexpected behavior for our users.

@ngoodman
Author

ngoodman commented May 2, 2018

Oh, I see! So this is giving me the embedding in the context of the "corpus" of sentences I've asked it to embed so far?

Testing my understanding, I tried the above test using two separate instances of ElmoEmbedder, and indeed got identical embeddings. This seems infeasible in practice for lots of sentences, though, because constructing an ElmoEmbedder() takes a long time... Any workaround?

It's certainly true that this was unexpected for me, but a few small changes would have clued me in, e.g. if the call had been ee.embed_next_sentence(...) and/or there were a note in the docs.

At any rate, thanks for the super fast response, and the nice open software!

@schmmd
Member

schmmd commented May 2, 2018

@ngoodman here's an attempt at improving the docs. I'll also see what I can do about a more comprehensive solution. #1169

@schmmd schmmd closed this as completed May 2, 2018
@matt-peters
Contributor

@ngoodman - I added a longer description of the statefulness to this PR: #1167

The TL;DR is that the stateful aspect is a consequence of how the biLM was originally trained.

Except for the first few batches, the predictions won't vary much from batch to batch, assuming you are using the same ElmoEmbedder instance. The recommended usage is to load a single ElmoEmbedder instance (it holds the internal states) and send all batches through it. If you are concerned about the non-determinism, run a batch or two through it first to "warm up" the states, then start making predictions on your data.

For example, modifying your code to run the same sentence multiple times, the embeddings are constant after the first batch:

distances = []
v1 = np.squeeze(ee.embed_sentence("hello my name is mud .".split())[2, 0, :])
for k in range(5):
    v2 = np.squeeze(ee.embed_sentence("hello my name is mud .".split())[2, 0, :])
    distances.append(sp.spatial.distance.cosine(v1, v2))
    v1 = v2

print(distances)

Displays [0.02286398410797119, 3.7550926208496094e-06, 0.0, 5.960464477539063e-08, 0.0].
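The convergence matt-peters describes can be mimicked with a self-contained toy (a hypothetical StatefulCell below, not the actual biLM): because the hidden state persists across calls and each output depends on it, feeding the same input repeatedly drives the state toward a fixed point, so only the first call or two differ noticeably.

```python
import numpy as np

# Toy illustration of a stateful recurrent cell, analogous to (but much
# simpler than) the persistent LSTM states inside ElmoEmbedder.
class StatefulCell:
    def __init__(self, size=4, seed=0):
        rng = np.random.RandomState(seed)
        self.W = rng.randn(size, size) * 0.1  # small recurrent weights
        self.h = np.zeros(size)               # persistent hidden state

    def forward(self, x):
        # The new hidden state depends on the previous one, so repeated
        # identical inputs give different outputs until h converges.
        self.h = np.tanh(self.W @ self.h + x)
        return self.h.copy()

cell = StatefulCell()
x = np.ones(4)
outs = [cell.forward(x) for _ in range(5)]
diffs = [float(np.linalg.norm(outs[i + 1] - outs[i])) for i in range(4)]
print(diffs)  # differences shrink toward zero after the first call
```

This mirrors the pattern in the distances above: a noticeable gap on the first pass, then near-identical outputs once the state has "warmed up".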

@zpaines

zpaines commented Oct 24, 2018

@matt-peters
Is the statefulness simply a consequence of calling _get_initial_states in encoder_base.py#sort_and_run_forward?

These states simply represent the memory and output for each timestep in the batch, correct? What does it mean that they "adapt to the domain"? Is there a human-understandable version of the information they are storing, or is it simply some weighted product of internal states?

@zpaines

zpaines commented Oct 26, 2018

@schmmd is there a human-understandable description of what this "context" describes? I think I understand what the LSTMs are doing: they essentially try to predict the next word in a given sentence given the previous (or, in the backward case, following) words. But what's not clear to me is how this adapts to the given domain (beyond just predicting 'x' after 'y' if it tends to appear that way in past inputs).
