
add sampled softmax loss #2042

Merged
merged 63 commits into allenai:master on Nov 15, 2018
Conversation

joelgrus (Contributor)

this is largely just a port / cleanup of the corresponding calypso code. two important things:

(1) after talking with mattp, I removed all the tie-embeddings code paths. he said they were not necessary for an MVP language model, and they were making the code vastly more complicated. Possibly it makes sense to add them later, possibly it doesn't.

(2) I included the cython code for the fast sampler, but not the setup.py rules to build it (so it's in essence not there). I wasted my entire day Friday trying to get the cython to compile (including but not limited to upgrading my operating system, my Xcode, and my Xcode command line tools) and I essentially made no progress. According to the internet there are many other people with similar issues, and none of their proposed solutions worked for me.

I suspect it compiles fine on linux, so possibly the solution is to make a setup.py rule that only compiles it there (and then carefully imports it), but I didn't do that here.
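
For what it's worth, a platform-gated rule along those lines might look roughly like the sketch below. This is an illustration only, not something this PR adds; the extension path, module name, and the fallback behavior are assumptions.

# setup.py (sketch): only attempt to build the cython sampler on linux,
# leaving macOS installs (where the compile was failing) untouched.
import sys
from setuptools import setup, Extension

ext_modules = []
if sys.platform.startswith("linux"):
    try:
        from Cython.Build import cythonize
        ext_modules = cythonize([
            Extension("allennlp.modules._fast_sampler",
                      ["allennlp/modules/_fast_sampler.pyx"]),
        ])
    except ImportError:
        # cython isn't installed; skip the extension entirely
        pass

setup(
    name="allennlp",  # other setup() arguments omitted for brevity
    ext_modules=ext_modules,
)

The "carefully imports it" half would then be a guarded import at the call site (try/except ImportError) that falls back to the pure-python sampler whenever the extension was never built.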

brendan-ai2 (Contributor) left a comment

Oof, getting the cython to build sounds rough. I really wish Macs were better as a dev environment. :(

Left some initial comments. I'll try to give sampled_softmax_loss.py a closer pass tomorrow. Thanks for the PR, @joelgrus!

allennlp/modules/_fast_sampler.pyx
allennlp/tests/models/bidirectional_lm_test.py
from allennlp.modules.sampled_softmax_loss import _choice, SampledSoftmaxLoss


class TestSampledSoftmax(AllenNlpTestCase):
Contributor

s/TestSampledSoftmax/TestSampledSoftmaxLoss/ ?


NOTE num_words DOES NOT include padding id.
NOTE: In all cases except (tie_embeddings=True and use_character_inputs=False)
the weights are dimensioned as num_words and do not include an entry for the padding (0) id.
Contributor

Is this correct in our current setup?

Contributor Author

I believe so. The second comment sort of doesn't apply, since tie_embeddings isn't implemented, but I don't necessarily want to delete it in case we do implement it later.

Contributor

Sorry, I might have been unclear. I was thinking that since tie_embeddings=False for us, that would mean we don't have an entry for padding, which struck me as strange. But if that's expected, great!

brendan-ai2 (Contributor) left a comment

LGTM. There is some incidental (IIUC) breakage and I left a few minor comments for you to address as you see fit, but after that ship it! Thanks again for the PR!

Breakage: http://build.allennlp.org/viewLog.html?buildId=7529&buildTypeId=AllenNLP_AllenNLPPullRequests&tab=buildLog&_focus=3239


def _choice(num_words: int, num_samples: int) -> Tuple[np.ndarray, int]:
"""
Calls np.random.choice(n_words, n_samples, replace=False),
Contributor

I think this comment is stale. There does not appear to be any call to np.random.choice.

Contributor

Ideally this comment would clarify what it's sampling (indices up to num_words - 1 I think?) and what a "try" is. Bit hazy as-is.
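
For readers following along: my reading of the signature is that the function returns num_samples distinct word ids plus the number of draws ("tries") it took to collect them. The sketch below only illustrates that contract with a plain uniform sampler; it is not the actual _choice implementation.

import numpy as np
from typing import Tuple

def _choice_sketch(num_words: int, num_samples: int) -> Tuple[np.ndarray, int]:
    """
    Illustration only: draw ids uniformly (with replacement) from
    [0, num_words) until num_samples distinct ids have been collected.
    Returns the distinct ids and the total number of draws ("tries").
    """
    sampled_ids = set()
    num_tries = 0
    while len(sampled_ids) < num_samples:
        sampled_ids.add(int(np.random.randint(0, num_words)))
        num_tries += 1
    return np.array(sorted(sampled_ids)), num_tries

Under this reading num_tries is always >= num_samples, which is also why the second return value unpacked in the test below is arguably misnamed.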


class TestSampledSoftmaxLoss(AllenNlpTestCase):
    def test_choice(self):
        sample, num_samples = _choice(num_words=1000, num_samples=50)
Contributor

Nit: num_samples should really be num_tries. As best I understand it, there will definitely be 50 samples, but there may be more tries.


# Should be really close
pct_error = (sampled_loss - full_loss) / full_loss
assert abs(pct_error) < 0.01
Contributor

Great, thanks!

Contributor

Just looking at this PR now, this test actually shouldn't pass because the SampledSoftmaxLoss and the _SoftmaxLoss have different parameters! I'm guessing the < 0.01 percent error is far too loose a check; the values should be much closer.

To implement this test, it's necessary to manually set the weights on one of the softmax instances to match the weights on the other after initializing.

Then we should observe:

  • in eval mode, the loss should be exactly the same for SampledSoftmax and Softmax (to near machine precision, e.g. absolute difference < 1e-6 for float32)
  • the loss for SampledSoftmax in eval mode is > SampledSoftmax loss in train mode

To check SampledSoftmax in train mode, I'd recommend stubbing out the sampler (choice function) with something deterministic that returns a list of known ids. Then set all of the bias parameters for the ids not in this list to something very small (-10000) so their contribution to the full softmax term is zero. Then, using the deterministic sampler, the loss in train mode should be the same as the loss in eval mode.
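
A rough sketch of the procedure described above follows. The constructor arguments, the parameter names (softmax_w, softmax_b), their orientations, and the forward signature are assumptions rather than something taken from this PR, so treat it as pseudocode for the test plan rather than a drop-in test.

import torch
# assumed import path and class names
from allennlp.modules.sampled_softmax_loss import SampledSoftmaxLoss, _SoftmaxLoss

def test_sampled_matches_full_softmax_in_eval_mode():
    num_words, dim, batch = 1000, 12, 64
    sampled = SampledSoftmaxLoss(num_words=num_words, embedding_dim=dim, num_samples=50)
    full = _SoftmaxLoss(num_words=num_words, embedding_dim=dim)

    # (1) tie the parameters so both modules score identically
    #     (a transpose may be needed if the two modules store softmax_w
    #     in different orientations)
    full.softmax_w.data.copy_(sampled.softmax_w.data.t())
    full.softmax_b.data.copy_(sampled.softmax_b.data)

    embeddings = torch.randn(batch, dim)
    targets = torch.randint(1, num_words, (batch,))

    # (2) in eval mode the sampled loss should fall back to the full softmax,
    #     so the two losses agree to near machine precision
    sampled.eval()
    full.eval()
    assert torch.abs(sampled(embeddings, targets) - full(embeddings, targets)) < 1e-6

    # (3) the train-mode loss only sums over the sampled negatives, so it
    #     should come out below the eval-mode (full softmax) loss
    sampled.train()
    assert sampled(embeddings, targets) < full(embeddings, targets)

    # (4) for an exact train-mode check, stub the choice function with a fixed
    #     id list and push the biases of all other ids to -10000, as described
    #     above; then the train-mode loss should equal the eval-mode loss.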

Contributor

Right! Thanks for catching this, @matt-peters.

joelgrus merged commit 86da880 into allenai:master on Nov 15, 2018
3 participants