Sampling chooses vocab index that does not exist with certain random seeds #866
Comments
Interesting, thanks for opening this issue. Can you disable hybridization and check whether the out-of-bounds index is actually present in the sampled output?
@fhieber I assume you mean something like this?
Yes, 7525 is actually sampled there.
Very strange, what is the input shape of the distributions you pass to `multinomial`?
Which I think is correct, and should mean that 7525 cannot be sampled. Btw, I did not find a minimal reproducible example for this yet, since this only happens for certain models and only 1 random seed out of 30. Also, this index only appears a single time in the sampled output.
That points very much to a bug in the implementation of `multinomial`. You could consider trying MXNet 1.7 to see whether the operator behaves differently there: https://pypi.org/project/mxnet-cu102/
Sure, I'll try to isolate the problem a bit more (= strip away Sockeye) and then open an MXNet issue.
I am at a loss (not a benign cross-entropy one!) with this bug. Here is a gist that I can run on a clean instance and that throws the error: https://gist.github.com/bricksdont/58dfa0964201c83961a30f23406baa5d. I would be glad to know if someone can reproduce this error at all.
Is there a way to store the random generator state of MXNet somehow? I guess you cannot reproduce it in isolation because it depends on the previous inputs. If you were to run `multinomial` on smaller random input data, 1000k times, always checking that the sampled index is not out of bounds, would you see the error?
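The bounds-check loop suggested here can be sketched with a small NumPy stand-in for the sampler. This is illustrative code only, not MXNet's or Sockeye's actual implementation; `sample_categorical` is a hypothetical helper that mimics how multinomial samplers are commonly implemented.

```python
import numpy as np

def sample_categorical(probs, u):
    # Inverse-CDF sampling: return the first index whose cumulative
    # probability exceeds the uniform draw u.
    return int(np.searchsorted(np.cumsum(probs), u, side="right"))

rng = np.random.default_rng(0)
vocab_size = 8
out_of_bounds = 0
for _ in range(100_000):
    p = rng.random(vocab_size)
    p /= p.sum()                      # properly normalized distribution
    idx = sample_categorical(p, rng.random())
    if idx >= vocab_size:
        out_of_bounds += 1
print(out_of_bounds)
```

With properly normalized float64 inputs, this loop never reports an out-of-bounds index; the interesting question is what happens when the inputs are not quite distributions.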
Finally found the problem, I think: the values passed to `multinomial` do not always sum to 1, and MXNet also does not check whether its inputs are indeed distributions. If I renormalize the inputs with softmax right before sampling, then the problem never occurs. Does that make sense as an explanation / is applying softmax in that spot a good solution?
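To illustrate why an input whose mass falls slightly short of 1 can produce an out-of-range index, here is a minimal sketch. The inverse-CDF sampler is an assumption about how `multinomial` works internally (not MXNet's actual code), and the numbers are made up; the softmax renormalization mirrors the fix described above.

```python
import numpy as np

def sample_categorical(probs, u):
    # Inverse-CDF sampling: first index whose cumulative mass exceeds u.
    return int(np.searchsorted(np.cumsum(probs), u, side="right"))

def softmax(logits):
    # Numerically stable softmax: shift by the max before exponentiating.
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

# A "distribution" whose mass falls slightly short of 1.0,
# e.g. due to rounding somewhere upstream.
p = np.array([0.3, 0.3, 0.3999])

u = 0.99995                      # uniform draw past the total mass
print(sample_categorical(p, u))  # 3 == len(p): out of bounds!

q = softmax(np.log(p))           # renormalize so the mass sums to 1.0
print(sample_categorical(q, u))  # 2: the last valid index
```

The uniform draw lands past the total mass of `p`, so the search falls off the end of the cumulative sum and returns an index one past the last valid one, exactly the failure mode seen here.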
That's a good find! This is actually also part of the documentation for `multinomial`. But what I don't understand yet is why the inputs would not already be normalized.

Edit: are you somehow skipping the softmax operation earlier (beam-size 1)?

Edit2: or this is some VERY unlikely issue with the numerical precision of the softmax output.
Yes, it's a somewhat confusing implicit assumption of `multinomial` that the inputs are already normalized. I haven't looked at the code (that I wrote :)), but it seems clear to me that you want to sample from the output distribution at the current step, and you should not be incorporating the summed scores so far. This might not actually be what I wrote. If that is the case, I don't know why I did it that way.
Thanks @mjpost. The code is correct, as far as I can tell. Given that this seems to be such an unlikely event, I would think this is a numerical precision issue. |
"but it seems clear to me that you want to sample from the output distribution at the current step, and you should not be incorporating the summed scores so far" @mjpost The shape of the distributions as input for multinomial sampling is exactly "Edit: are you somehow skipping the softmax operation earlier (beam-size 1)?" @fhieber " I would think this is a numerical precision issue." After I re-normalize with softmax, some distributions still do not sum to 1.0 exactly, but |
@bricksdont Another thing to check: are the values sorted from highest to lowest? If the algorithm works the way I suspect, they would have to be sorted.

Edit: the documentation says nothing about sorting, and the examples suggest it's not necessary. If so, then I'd be curious to see the CDF of the slice. Output distributions are typically pretty peaked, so it seems very unlikely the last item would get selected.
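For what it's worth, a quick empirical check with the same NumPy stand-in for the sampler (again an assumption about the implementation, not the actual MXNet operator) suggests sorting is not needed: each index is drawn with its own probability mass regardless of order.

```python
import numpy as np

def sample_many(probs, rng, n):
    # Inverse-CDF sampling of n draws at once.
    return np.searchsorted(np.cumsum(probs), rng.random(n), side="right")

rng = np.random.default_rng(0)
n = 100_000
p_sorted = np.array([0.7, 0.2, 0.1])     # peaked, sorted high to low
p_shuffled = np.array([0.1, 0.7, 0.2])   # same masses, different order

freq_sorted = np.bincount(sample_many(p_sorted, rng, n), minlength=3) / n
freq_shuffled = np.bincount(sample_many(p_shuffled, rng, n), minlength=3) / n

print(freq_sorted)    # close to [0.7, 0.2, 0.1]
print(freq_shuffled)  # close to [0.1, 0.7, 0.2]
```

The empirical frequencies track the input masses in both orderings, so a cumsum-based sampler has no sortedness requirement.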
@mjpost It's not the last item that gets selected; the sampled index does not exist in the vocabulary at all. I can still check whether the values are sorted, of course.
So, can you confirm that most distributions in your case significantly do not sum up to 1? Could you add a line that asserts on the sum with some epsilon? |
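The suggested assertion could look like the following sketch; the helper name and the epsilon value are illustrative choices, not Sockeye's actual code.

```python
import numpy as np

def assert_is_distribution(probs, eps=1e-4):
    # Fail loudly before sampling if the input does not sum to ~1.
    total = float(np.sum(probs))
    assert abs(total - 1.0) < eps, f"probabilities sum to {total}, not 1.0"

assert_is_distribution(np.array([0.25, 0.25, 0.5]))    # passes silently
try:
    assert_is_distribution(np.array([0.3, 0.3, 0.3]))  # sums to 0.9
except AssertionError as e:
    print(e)
```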
All distributions are within a small tolerance of 1.0.
Sorry for jumping to conclusions about this! A different explanation is that re-normalizing the values (applying softmax right before sampling) only makes the problem less likely, rather than fixing it.

Edit: the problem re-appeared even with distributions I have re-normalized right before sampling.
Update: the behaviour is exactly the same with MXNet 1.7.0.
I came across the same error in Sockeye-1.18.115. I had started to write an issue/ticket when I realized this one was about the same thing. I'll add what I was in the process of writing here in case it helps.

### Invalid token id during n-best translation

Hi,

### Observations

While writing the bug report, I noticed that the target vocabulary has no entry for the sampled index.

### Translation Command

### Error Message
I still believe this is an MXNet bug, but I don't know how to reduce the problem to the single RNG state and input that cause it. @KellenSunderland we could use some MXNet expertise here, if you are interested in tackling this.
Closing for now as this applies to an older version of Sockeye. |
Running into an error while sampling with certain random seeds. I am calling Sockeye with a script that varies only `--seed` between runs.

Sockeye and MXNet versions: `mxnet-cu102mkl==1.6.0.post0`.

Details that may be relevant: the vocabulary does not have the sampled index.

I suspect that the sampling procedure somehow assumes 1-based indexing, whereas the vocabulary is 0-indexed. This would mean that there is a small chance that `max_vocab_id + 1` is picked as the next token. Looking at the inference code, I am not sure yet why this happens.
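The off-by-one failure mode hypothesized above can be pictured with a toy vocabulary; the size and token names here are made up for illustration.

```python
# Toy 0-indexed vocabulary: valid ids are 0 .. vocab_size - 1.
vocab_size = 10
id_to_token = {i: f"tok{i}" for i in range(vocab_size)}

max_vocab_id = vocab_size - 1
bad_id = max_vocab_id + 1   # the id a 1-based-indexing bug would hand back

print(max_vocab_id in id_to_token)  # True
print(bad_id in id_to_token)        # False: lookup fails downstream
```

Any code path that returns `max_vocab_id + 1` would therefore fail at vocabulary lookup time, which matches the reported error.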