Inconsistent results of forward (training) and step (inference) #52

Closed

cycycycywu opened this issue Jun 29, 2022 · 10 comments

Comments

@cycycycywu

Hi, I did a simple test to verify the difference between forward and step (mode="dense") on a single unidirectional S4 layer. Given a random sequence, there is a difference: the absolute error is around 1e-2 and the squared error is around 1e-4. I suspect these results are wrong. My verification follows test_step() in src/models/sequence/ss/kernel.py. I'd love to know if you have examples that clearly compare the two. Thanks :)
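
For reference, here is a self-contained sketch of that kind of check (it is not the repo's test_step(); the toy diagonal SSM below, with random A, B, C, is purely illustrative). The same system is evaluated once as a convolution ("forward"/CNN mode) and once as a recurrence ("step"/RNN mode), and the two outputs are compared at the working precision:

```python
import math
import torch

torch.manual_seed(0)
N, L = 64, 1024                                   # state size, sequence length
rdtype, cdtype = torch.float32, torch.complex64   # try float64/complex128 to compare

# Random stable diagonal SSM: x_k = A x_{k-1} + B u_k,  y_k = Re(sum(C * x_k))
A = torch.exp(torch.complex(-torch.rand(N, dtype=rdtype),
                            2 * math.pi * torch.rand(N, dtype=rdtype)))  # |A| <= 1
B = torch.randn(N, dtype=cdtype)
C = torch.randn(N, dtype=cdtype)
u = torch.randn(L, dtype=rdtype)

# "Forward" (CNN) mode: materialize the kernel K_m = Re(sum(C * A^m * B)) and
# convolve it with the input via FFT.
powers = torch.exp(torch.log(A)[:, None] * torch.arange(L, dtype=rdtype))  # A^m, (N, L)
K = ((C * B)[:, None] * powers).sum(0).real                                # (L,)
y_fwd = torch.fft.irfft(torch.fft.rfft(K, 2 * L) * torch.fft.rfft(u, 2 * L), 2 * L)[:L]

# "Step" (RNN) mode: unroll the same recurrence one sample at a time.
x = torch.zeros(N, dtype=cdtype)
y_step = torch.empty(L, dtype=rdtype)
for k in range(L):
    x = A * x + B * u[k]
    y_step[k] = (C * x).sum().real

print("max abs err:", (y_fwd - y_step).abs().max().item())
print("mean sq err:", (y_fwd - y_step).pow(2).mean().item())
```

Switching rdtype/cdtype to double precision makes the two paths agree far more closely, which is the precision pattern discussed later in this thread.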

@cycycycywu (Author) commented Jun 29, 2022

I suspect the Cauchy kernel causes this precision issue.

@albertfgu (Contributor)

Thanks for the report! To clarify, your test shows that there may be numerical issues, but the code should be correct assuming high precision? Has this resulted in downstream performance issues?

@cycycycywu (Author)

Thanks, Albert. It is likely a numerical issue, but I am not sure whether it comes from the fft/ifft, the Cauchy kernel, or both; I will let you know when I have updates. For downstream performance, I have no numbers yet, but this level of mismatch is likely to influence the final softmax. Did you use unidirectional S4 on some task and compare the downstream performance of step and forward?

@albertfgu (Contributor)

Yes, we used the CNN mode (forward) for training and RNN mode (step) for inference in Sashimi. We generated sequences of over 100000 samples in step mode, so it has worked fine in practice. Perhaps the random inputs in your synthetic tests are an edge case.

@cycycycywu (Author)

Thanks, Albert. Regarding "it has worked fine in practice": is the test NLL in the paper computed with forward or with step? The difference between the waveforms produced by the two schemes can be marginal under human evaluation.

@albertfgu (Contributor)

The test NLL is calculated the same way training is done, with "teacher forcing", so it uses forward. Step only makes sense for autoregressive generation, in which case it doesn't make sense to calculate NLL the usual way.
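
Schematically (with a hypothetical model that maps a token prefix to next-token logits; this is not the repo's code, just an illustration of the two regimes):

```python
import torch
import torch.nn.functional as F

def teacher_forced_nll(model, x):
    """NLL over a ground-truth sequence x of shape (B, L): one parallel forward pass,
    every position conditioned on the true prefix ("forward"/CNN mode)."""
    logits = model(x[:, :-1])                         # (B, L-1, V)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), x[:, 1:].reshape(-1))

@torch.no_grad()
def generate(model, x0, steps):
    """Autoregressive generation from a (B, 1) seed: each new sample is fed back in.
    (The real step mode carries a recurrent state instead of re-running the prefix;
    this sketch only shows the data flow.)"""
    xs = [x0]
    for _ in range(steps):
        logits = model(torch.cat(xs, dim=1))[:, -1]   # prediction for the next token
        xs.append(torch.multinomial(logits.softmax(-1), num_samples=1))
    return torch.cat(xs, dim=1)
```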

@cycycycywu (Author) commented Jul 5, 2022

I found the reason for the mismatch. It is a precision issue in the fast Cauchy kernel: the symmetric trick you developed causes errors in float32 but not in float64. A quick way to verify it: let A = z - w_half and B = z - w_half.conj(); then

v_half/A + v_half.conj()/B ≠ (v_half*B + v_half.conj()*A) / (A*B)

in float32, where the mismatch is about 1e-3. In float64 there is no difference at all.
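
A quick way to reproduce that comparison numerically (a generic check with random v_half, w_half and evaluation points z on the unit circle; it does not call the repo's cauchy_mult kernels):

```python
import math
import torch

torch.manual_seed(0)

def max_mismatch(cdtype, N=64, L=1024):
    # Half of a conjugate-symmetric set of poles/residues, plus evaluation points z.
    w_half = torch.randn(N, dtype=cdtype)
    v_half = torch.randn(N, dtype=cdtype)
    theta = 2 * math.pi * torch.rand(L)
    z = torch.complex(torch.cos(theta), torch.sin(theta)).to(cdtype)

    A = z[:, None] - w_half          # (L, N)
    B = z[:, None] - w_half.conj()

    direct   = (v_half / A + v_half.conj() / B).sum(-1)               # term-by-term sum
    combined = ((v_half * B + v_half.conj() * A) / (A * B)).sum(-1)   # common-denominator form
    return (direct - combined).abs().max().item()

print("float32:", max_mismatch(torch.complex64))
print("float64:", max_mismatch(torch.complex128))
```

The two expressions are algebraically identical, so any gap is purely floating-point error from the extra multiplications and larger intermediate magnitudes in the combined form.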

@albertfgu (Contributor)

Just to double check, are you talking about the pykeops kernel or the CUDA extension?

@cycycycywu (Author) commented Jul 6, 2022

The CUDA extension. But you can verify the mismatch with cauchy_mult_torch as well.

@albertfgu (Contributor)

Cool, good to know!
