Conversation
Oof, getting the cython building sounds rough. I really wish Macs were better as a dev environment. :(
Left some initial comments. I'll try to give sampled_softmax_loss.py a closer pass tomorrow. Thanks for the PR, @joelgrus!
from allennlp.modules.sampled_softmax_loss import _choice, SampledSoftmaxLoss

class TestSampledSoftmax(AllenNlpTestCase):
s/TestSampledSoftmax/TestSampledSoftmaxLoss/ ?
NOTE num_words DOES NOT include padding id.
NOTE: In all cases except (tie_embeddings=True and use_character_inputs=False)
the weights are dimensioned as num_words and do not include an entry for the padding (0) id.
Is this correct in our current setup?
I believe so. The second comment sort of doesn't apply, since tie_embeddings isn't implemented, but I don't necessarily want to delete it in case we do implement it later.
Sorry, I might have been unclear. I was thinking that since tie_embeddings=False for us, that would mean we don't have an entry for padding, which struck me as strange. But if that's expected, great!
LGTM. There is some incidental (IIUC) breakage and I left a few minor comments for you to address as you see fit, but after that ship it! Thanks again for the PR!
def _choice(num_words: int, num_samples: int) -> Tuple[np.ndarray, int]:
    """
    Calls np.random.choice(n_words, n_samples, replace=False),
I think this docstring is stale. There does not appear to be any call to np.random.choice.
Ideally this docstring would clarify what it's sampling (indices up to num_words - 1, I think?) and what a "try" is. Bit hazy as-is.
class TestSampledSoftmaxLoss(AllenNlpTestCase):
    def test_choice(self):
        sample, num_samples = _choice(num_words=1000, num_samples=50)
Nit: num_samples should really be num_tries. Best I understand, there will definitely be 50 samples, but there may be more tries.
# Should be really close
pct_error = (sampled_loss - full_loss) / full_loss
assert abs(pct_error) < 0.01
Great, thanks!
Just looking at this PR now: this test actually shouldn't pass, because the SampledSoftmaxLoss and the _SoftmaxLoss have different parameters! I'm guessing the < 0.01 percent error is far too loose a check; the values should be much closer.
To implement this test, it's necessary to manually set the weights on one of the softmax instances to match the weights on the other after initializing. Then we should observe:
- in eval mode, the loss should be exactly the same for SampledSoftmax and Softmax (to near machine precision, e.g. absolute difference < 1e-6 for float32)
- the loss for SampledSoftmax in eval mode is > SampledSoftmax loss in train mode
To check SampledSoftmax in train mode, I'd recommend stubbing out the sampler (the choice function) with something deterministic that returns a list of known ids. Then set all of the bias parameters for the ids not in this list to something very small (-10000) so their contribution to the full softmax term is zero. Then, using the deterministic sampler, the loss in train mode should be the same as the loss in eval mode.
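The key idea of this recipe can be sketched framework-agnostically in numpy. This is a hypothetical illustration with made-up shapes and names (the real test would exercise SampledSoftmaxLoss and _SoftmaxLoss directly): with a deterministic "sampler" and biases of all unsampled ids pushed to -10000, the full softmax loss and the sampled softmax loss coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
num_words, dim, batch = 8, 4, 5
W = rng.normal(size=(num_words, dim))
b = rng.normal(size=num_words)
embeddings = rng.normal(size=(batch, dim))

# Deterministic stand-in for the sampler: always return these ids.
sampled_ids = np.array([0, 1, 2, 3])
# Draw targets from the sampled set so both losses see the same ids.
targets = sampled_ids[rng.integers(0, len(sampled_ids), size=batch)]

# Push the biases of every id OUTSIDE the deterministic sample to -10000,
# so those ids contribute ~0 to the full-softmax normalizer.
mask = np.ones(num_words, dtype=bool)
mask[sampled_ids] = False
b[mask] = -10000.0

def nll(logits, target_pos):
    # summed negative log-likelihood via a numerically stable log-sum-exp
    maxes = logits.max(axis=1, keepdims=True)
    log_z = np.log(np.exp(logits - maxes).sum(axis=1)) + maxes[:, 0]
    return (log_z - logits[np.arange(len(target_pos)), target_pos]).sum()

full_logits = embeddings @ W.T + b                                # (batch, num_words)
full_loss = nll(full_logits, targets)

sampled_logits = embeddings @ W[sampled_ids].T + b[sampled_ids]   # (batch, k)
target_pos = np.array([np.where(sampled_ids == t)[0][0] for t in targets])
sampled_loss = nll(sampled_logits, target_pos)

# exp(-10000 - max) underflows to exactly 0, so the two losses agree.
assert np.isclose(full_loss, sampled_loss, atol=1e-6)
```

The same logic is why the suggested stubbed-sampler test works: once the unsampled ids contribute exactly zero to the normalizer, train-mode (sampled) loss must match eval-mode (full) loss.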
Right! Thanks for catching this, @matt-peters.
This is largely just a port/cleanup of the corresponding calypso code. Two important things:
(1) After talking with mattp, I removed all the tie-embeddings code paths. He said they were not necessary for an MVP language model, and they were making the code vastly more complicated. Possibly it makes sense to add them later, possibly it doesn't.
(2) I included the cython code for the fast sampler, but not the setup.py rules to build it (so it's in essence not there). I wasted my entire day Friday trying to get the cython to compile (including but not limited to upgrading my operating system, my XCode, and my XCode command line tools) and I essentially made no progress. According to the internet there are many other people with similar issues, and none of their proposed solutions worked for me.
I suspect it compiles fine on linux, so possibly the solution is to make a setup.py rule that only compiles it there (and then carefully imports it), but I didn't do that here.
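The "compile only on linux, then carefully import" idea could be sketched roughly as below. The module path and names here are hypothetical, purely for illustration; the setup.py side (shown only in a comment) would gate the cython extension on `sys.platform`.

```python
import sys

# Hypothetical guarded import: prefer the compiled cython sampler when the
# platform-gated setup.py rule actually built it; otherwise fall back to
# the pure-python _choice. The module path is illustrative, not real.
try:
    from allennlp.modules.fast_sampler import fast_choice  # type: ignore
    HAS_FAST_SAMPLER = True
except ImportError:
    fast_choice = None
    HAS_FAST_SAMPLER = False

# Per the discussion above, only linux is expected to build the extension.
# The (hypothetical) setup.py gate might read:
#     if sys.platform.startswith("linux"):
#         ext_modules = cythonize(["allennlp/modules/fast_sampler.pyx"])
EXPECT_COMPILED = sys.platform.startswith("linux")
```

Callers would then branch on `HAS_FAST_SAMPLER` (or just check `fast_choice is not None`), so macOS developers get the slower pure-python path instead of a broken build.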