
add sampled softmax loss #2042

Merged
merged 63 commits into allenai:master on Nov 15, 2018
Conversation

joelgrus (Contributor)

this is largely just a port / cleanup of the corresponding calypso code. two important things:

(1) after talking with mattp, I removed all the tie-embeddings code paths. he said they were not necessary for an MVP language model, and they were making the code vastly more complicated. Possibly it makes sense to add them later, possibly it doesn't.

(2) I included the cython code for the fast sampler, but not the setup.py rules to build it (so it's in essence not there). I wasted my entire day Friday trying to get the cython to compile (including but not limited to upgrading my operating system, my Xcode, and my Xcode command line tools) and I essentially made no progress. According to the internet there are many other people with similar issues, and none of their proposed solutions worked for me.

I suspect it compiles fine on linux, so possibly the solution is to make a setup.py rule that only compiles it there (and then carefully imports it), but I didn't do that here.
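
For what it's worth, a platform-gated rule along those lines might look roughly like the sketch below. This is an illustration only, not something this PR adds; the extension path, module name, and the fallback behavior are assumptions.

# setup.py (sketch): only attempt to build the cython sampler on linux,
# leaving macOS installs (where the compile was failing) untouched.
import sys
from setuptools import setup, Extension

ext_modules = []
if sys.platform.startswith("linux"):
    try:
        from Cython.Build import cythonize
        ext_modules = cythonize([
            Extension("allennlp.modules._fast_sampler",
                      ["allennlp/modules/_fast_sampler.pyx"]),
        ])
    except ImportError:
        # cython isn't installed; skip the extension entirely
        pass

setup(
    name="allennlp",  # other setup() arguments omitted for brevity
    ext_modules=ext_modules,
)

The "carefully imports it" half would then be a guarded import at the call site (try/except ImportError) that falls back to the pure-python sampler whenever the extension was never built.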

brendan-ai2 (Contributor) left a comment

Oof, getting the cython to build sounds rough. I really wish Macs were better as a dev environment. :(

Left some initial comments. I'll try to give sampled_softmax_loss.py a closer pass tomorrow. Thanks for the PR, @joelgrus!

allennlp/modules/_fast_sampler.pyx
allennlp/tests/models/bidirectional_lm_test.py
from allennlp.modules.sampled_softmax_loss import _choice, SampledSoftmaxLoss


class TestSampledSoftmax(AllenNlpTestCase):
Contributor

s/TestSampledSoftmax/TestSampledSoftmaxLoss/ ?


NOTE num_words DOES NOT include padding id.
NOTE: In all cases except (tie_embeddings=True and use_character_inputs=False)
the weights are dimensioned as num_words and do not include an entry for the padding (0) id.
Contributor

Is this correct in our current setup?

Contributor Author

I believe so. The second comment sort of doesn't apply, since tie_embeddings isn't implemented, but I don't necessarily want to delete it in case we do implement it later.

Contributor

Sorry, I might have been unclear. I was thinking that since tie_embeddings=False for us, that would mean we don't have an entry for padding, which struck me as strange. But if that's expected, great!

brendan-ai2 (Contributor) left a comment

LGTM. There is some incidental (IIUC) breakage and I left a few minor comments for you to address as you see fit, but after that ship it! Thanks again for the PR!

Breakage: http://build.allennlp.org/viewLog.html?buildId=7529&buildTypeId=AllenNLP_AllenNLPPullRequests&tab=buildLog&_focus=3239


def _choice(num_words: int, num_samples: int) -> Tuple[np.ndarray, int]:
"""
Calls np.random.choice(n_words, n_samples, replace=False),
Contributor

I think this comment is stale. There does not appear to be any call to np.random.choice.

Contributor

Ideally this comment would clarify what it's sampling (indices up to num_words - 1 I think?) and what a "try" is. Bit hazy as-is.
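
For readers following along: my reading of the signature is that the function returns num_samples distinct word ids plus the number of draws ("tries") it took to collect them. The sketch below only illustrates that contract with a plain uniform sampler; it is not the actual _choice implementation.

import numpy as np
from typing import Tuple

def _choice_sketch(num_words: int, num_samples: int) -> Tuple[np.ndarray, int]:
    """
    Illustration only: draw ids uniformly (with replacement) from
    [0, num_words) until num_samples distinct ids have been collected.
    Returns the distinct ids and the total number of draws ("tries").
    """
    sampled_ids = set()
    num_tries = 0
    while len(sampled_ids) < num_samples:
        sampled_ids.add(int(np.random.randint(0, num_words)))
        num_tries += 1
    return np.array(sorted(sampled_ids)), num_tries

Under this reading num_tries is always >= num_samples, which is also why the second return value unpacked in the test below is arguably misnamed.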


class TestSampledSoftmaxLoss(AllenNlpTestCase):
    def test_choice(self):
        sample, num_samples = _choice(num_words=1000, num_samples=50)
Contributor

Nit: num_samples should really be num_tries. As best I understand it, there will definitely be 50 samples, but there may be more tries.


# Should be really close
pct_error = (sampled_loss - full_loss) / full_loss
assert abs(pct_error) < 0.01
Contributor

Great, thanks!

Contributor

Just looking at this PR now, this test actually shouldn't pass because the SampledSoftmaxLoss and the _SoftmaxLoss have different parameters! I'm guessing the < 0.01 percent error is far too loose a check; the values should be much closer.

To implement this test, it's necessary to manually set the weights on one of the softmax instances to match the weights on the other after initializing.

Then we should observe:

  • in eval mode, the loss should be exactly the same for SampledSoftmax and Softmax (to near machine precision, e.g. absolute difference < 1e-6 for float32)
  • the loss for SampledSoftmax in eval mode is > SampledSoftmax loss in train mode

To check SampledSoftmax in train mode, I'd recommend stubbing out the sampler (choice function) with something deterministic that returns a list of known ids. Then set all of the bias parameters for the ids not in this list to something very small (-10000) so their contribution to the full softmax term is zero. Then, using the deterministic sampler, the loss in train mode should be the same as the loss in eval mode.
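
A rough sketch of the procedure described above follows. The constructor arguments, the parameter names (softmax_w, softmax_b), their orientations, and the forward signature are assumptions rather than something taken from this PR, so treat it as pseudocode for the test plan rather than a drop-in test.

import torch
# assumed import path and class names
from allennlp.modules.sampled_softmax_loss import SampledSoftmaxLoss, _SoftmaxLoss

def test_sampled_matches_full_softmax_in_eval_mode():
    num_words, dim, batch = 1000, 12, 64
    sampled = SampledSoftmaxLoss(num_words=num_words, embedding_dim=dim, num_samples=50)
    full = _SoftmaxLoss(num_words=num_words, embedding_dim=dim)

    # (1) tie the parameters so both modules score identically
    #     (a transpose may be needed if the two modules store softmax_w
    #     in different orientations)
    full.softmax_w.data.copy_(sampled.softmax_w.data.t())
    full.softmax_b.data.copy_(sampled.softmax_b.data)

    embeddings = torch.randn(batch, dim)
    targets = torch.randint(1, num_words, (batch,))

    # (2) in eval mode the sampled loss should fall back to the full softmax,
    #     so the two losses agree to near machine precision
    sampled.eval()
    full.eval()
    assert torch.abs(sampled(embeddings, targets) - full(embeddings, targets)) < 1e-6

    # (3) the train-mode loss only sums over the sampled negatives, so it
    #     should come out below the eval-mode (full softmax) loss
    sampled.train()
    assert sampled(embeddings, targets) < full(embeddings, targets)

    # (4) for an exact train-mode check, stub the choice function with a fixed
    #     id list and push the biases of all other ids to -10000, as described
    #     above; then the train-mode loss should equal the eval-mode loss.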

Contributor

Right! Thanks for catching this, @matt-peters.

joelgrus merged commit 86da880 into allenai:master on Nov 15, 2018
3 participants