Inconsistent results of forward (training) and step (inference) #52

Closed

cycycycywu opened this issue Jun 29, 2022 · 10 comments

Comments

@cycycycywu

Hi, I did a simple test to verify the difference between forward and step (mode="dense") on a single unidirectional S4 layer. Given a random sequence, there is a difference: the absolute error is around 1e-2 and the squared error is around 1e-4. I suspect these results are wrong. My verification follows test_step() in src/models/sequence/ss/kernel.py. I'd love to know if you have examples that clearly compare the two. Thanks :)
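
For reference, here is a self-contained sketch of that kind of check (it is not the repo's test_step(); the toy diagonal SSM below, with random A, B, C, is purely illustrative). The same system is evaluated once as a convolution ("forward"/CNN mode) and once as a recurrence ("step"/RNN mode), and the two outputs are compared at the working precision:

```python
import math
import torch

torch.manual_seed(0)
N, L = 64, 1024                                   # state size, sequence length
rdtype, cdtype = torch.float32, torch.complex64   # try float64/complex128 to compare

# Random stable diagonal SSM: x_k = A x_{k-1} + B u_k,  y_k = Re(sum(C * x_k))
A = torch.exp(torch.complex(-torch.rand(N, dtype=rdtype),
                            2 * math.pi * torch.rand(N, dtype=rdtype)))  # |A| <= 1
B = torch.randn(N, dtype=cdtype)
C = torch.randn(N, dtype=cdtype)
u = torch.randn(L, dtype=rdtype)

# "Forward" (CNN) mode: materialize the kernel K_m = Re(sum(C * A^m * B)) and
# convolve it with the input via FFT.
powers = torch.exp(torch.log(A)[:, None] * torch.arange(L, dtype=rdtype))  # A^m, (N, L)
K = ((C * B)[:, None] * powers).sum(0).real                                # (L,)
y_fwd = torch.fft.irfft(torch.fft.rfft(K, 2 * L) * torch.fft.rfft(u, 2 * L), 2 * L)[:L]

# "Step" (RNN) mode: unroll the same recurrence one sample at a time.
x = torch.zeros(N, dtype=cdtype)
y_step = torch.empty(L, dtype=rdtype)
for k in range(L):
    x = A * x + B * u[k]
    y_step[k] = (C * x).sum().real

print("max abs err:", (y_fwd - y_step).abs().max().item())
print("mean sq err:", (y_fwd - y_step).pow(2).mean().item())
```

Switching rdtype/cdtype to double precision makes the two paths agree far more closely, which is the precision pattern discussed later in this thread.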

@cycycycywu (Author) commented Jun 29, 2022

I suspect the Cauchy kernel causes this precision issue.

@albertfgu (Contributor)

Thanks for the report! To clarify, your test shows that there may be numerical issues, but the code should be correct assuming high precision? Has this resulted in downstream performance issues?

@cycycycywu (Author)

Thanks, Albert. It is likely a numerical issue, but I am not sure whether it comes from the fft/ifft, the Cauchy kernel, or both; I will let you know when I have updates. For downstream performance, I have no numbers yet, but this level of mismatch is likely to influence the final softmax. Did you use unidirectional S4 on some task and compare the downstream performance of step and forward?

@albertfgu (Contributor)

Yes, we used the CNN mode (forward) for training and RNN mode (step) for inference in Sashimi. We generated sequences of over 100000 samples in step mode, so it has worked fine in practice. Perhaps the random inputs in your synthetic tests are an edge case.

@cycycycywu (Author)

Thanks, Albert. Regarding "it has worked fine in practice": is the test NLL in the paper computed with forward or with step? The difference between the waveforms produced by the two schemes can be marginal under human evaluation.

@albertfgu (Contributor)

The test NLL is calculated the same way training is done, with "teacher forcing", so it uses forward. Step only makes sense for autoregressive generation, in which case it doesn't make sense to calculate NLL the usual way.
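
Schematically (with a hypothetical model that maps a token prefix to next-token logits; this is not the repo's code, just an illustration of the two regimes):

```python
import torch
import torch.nn.functional as F

def teacher_forced_nll(model, x):
    """NLL over a ground-truth sequence x of shape (B, L): one parallel forward pass,
    every position conditioned on the true prefix ("forward"/CNN mode)."""
    logits = model(x[:, :-1])                         # (B, L-1, V)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), x[:, 1:].reshape(-1))

@torch.no_grad()
def generate(model, x0, steps):
    """Autoregressive generation from a (B, 1) seed: each new sample is fed back in.
    (The real step mode carries a recurrent state instead of re-running the prefix;
    this sketch only shows the data flow.)"""
    xs = [x0]
    for _ in range(steps):
        logits = model(torch.cat(xs, dim=1))[:, -1]   # prediction for the next token
        xs.append(torch.multinomial(logits.softmax(-1), num_samples=1))
    return torch.cat(xs, dim=1)
```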

@cycycycywu (Author) commented Jul 5, 2022

I found the reason for the mismatch. It is a precision issue in the fast Cauchy kernel: the symmetric trick you developed causes errors in float32 but not in float64. A quick way to verify it: let A = z - w_half and B = z - w_half.conj(); then

v_half/A + v_half.conj()/B ≠ (v_half*B + v_half.conj()*A) / (A*B)

in float32, where the mismatch is about 1e-3. In float64 there is no difference at all.
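
A quick way to reproduce that comparison numerically (a generic check with random v_half, w_half and evaluation points z on the unit circle; it does not call the repo's cauchy_mult kernels):

```python
import math
import torch

torch.manual_seed(0)

def max_mismatch(cdtype, N=64, L=1024):
    # Half of a conjugate-symmetric set of poles/residues, plus evaluation points z.
    w_half = torch.randn(N, dtype=cdtype)
    v_half = torch.randn(N, dtype=cdtype)
    theta = 2 * math.pi * torch.rand(L)
    z = torch.complex(torch.cos(theta), torch.sin(theta)).to(cdtype)

    A = z[:, None] - w_half          # (L, N)
    B = z[:, None] - w_half.conj()

    direct   = (v_half / A + v_half.conj() / B).sum(-1)               # term-by-term sum
    combined = ((v_half * B + v_half.conj() * A) / (A * B)).sum(-1)   # common-denominator form
    return (direct - combined).abs().max().item()

print("float32:", max_mismatch(torch.complex64))
print("float64:", max_mismatch(torch.complex128))
```

The two expressions are algebraically identical, so any gap is purely floating-point error from the extra multiplications and larger intermediate magnitudes in the combined form.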

@albertfgu (Contributor)

Just to double check, are you talking about the pykeops kernel or the CUDA extension?

@cycycycywu (Author) commented Jul 6, 2022

The CUDA extension. But you can verify the mismatch with cauchy_mult_torch as well.

@albertfgu (Contributor)

Cool, good to know!
