Fix wrong csum for non-first ipv4 fragments #13476
Thanks for the contribution!
Upon first review, I was confused about why you would avoid NATing the second and subsequent fragments, since the L3 addresses still need to be translated. On a closer read, however, this PR only avoids port translation (and the corresponding L4 checksum calculation) on the second and subsequent fragments. Please update the commit message and PR description to reflect this; it will help avoid confusion in the future.
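For readers following along, the distinction between first and later fragments can be sketched as below. This is a minimal illustration, not the code from this PR: the helper name `frag_has_l4_header` and the macro are hypothetical, and the sketch assumes the 16-bit flags/fragment-offset field of the IPv4 header has already been converted to host byte order. Only a fragment with offset zero carries the L4 header, so only that fragment has ports and an L4 checksum to rewrite.

```c
#include <stdbool.h>
#include <stdint.h>

/* The low 13 bits of the IPv4 flags/fragment-offset field hold the
 * fragment offset, expressed in 8-byte units. (Macro name is illustrative.) */
#define FRAG_OFFSET_MASK 0x1FFF

/* Hypothetical helper: returns true when the packet carries the L4 header,
 * i.e. it is unfragmented or it is the first fragment (offset == 0).
 * frag_off_host is the flags/fragment-offset field in HOST byte order. */
static inline bool frag_has_l4_header(uint16_t frag_off_host)
{
    return (frag_off_host & FRAG_OFFSET_MASK) == 0;
}
```

Under these assumptions, a NAT path would always translate the L3 addresses, and would touch the ports and the L4 checksum only when `frag_has_l4_header()` returns true. In a non-first fragment, the bytes at the usual port offsets are payload, so rewriting them corrupts the datagram.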
Do you have a reproducer to regression-test this issue? It seems like something we would want tests in the codebase for.
Misc additional cosmetic feedback below.
Done
I was testing manually on a cluster running a udp_echo application. I don't know about regression tests. Do you have more info?
df2b533 to 643923f
@liuyuan10 We have fragmentation tests under |
test-me-please
I see. Let me look into that. Can we merge this PR first?
Yep, once it passes the full CI. Feel free to ignore the Cilium-Ginkgo-GKE job failure for now; there is an infrastructure issue. When we get an update on that being fixed, we can re-trigger that job.
Great find! I'm wondering why it was not caught by the testIPv4FragmentSupport in test/k8sT/Services.go? /cc @qmonnet
The 4.9 build is failing because of a new complexity issue:
In the tests, we check that fragments are effectively processed and sent to the backend by looking for updates on the counters in the CT map:
We don't check for valid checksums on the receiving end at the moment.
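As a rough illustration of what a receiving-end check would involve, here is a minimal sketch of the Internet checksum (RFC 1071). It is not taken from the Cilium tests, and the function names are hypothetical. For UDP, a real check would run over the pseudo-header plus the reassembled datagram; the sketch only shows the generic sum over one contiguous buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum (RFC 1071) over a byte buffer,
 * reading the data as big-endian 16-bit words. Hypothetical helper name. */
static inline uint16_t inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)                      /* odd trailing byte, padded with zero */
        sum += (uint32_t)data[i] << 8;
    while (sum >> 16)                 /* fold carries back into the low 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* A buffer that already contains its checksum field is valid when the
 * checksum computed over the whole buffer comes out as zero. */
static inline int csum_is_valid(const uint8_t *data, size_t len)
{
    return inet_csum(data, len) == 0;
}
```

With a check along these lines on the receiver, a fragment whose payload bytes were rewritten as if they were ports would make the reassembled datagram fail validation, whereas the CT map counters would still increase.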
Not sure the revalidate_data() calls are necessary, but the rest of the code looks good, thank you!
Before this commit, natting are blindly changing l4 ports to non-first ipv4 fragments, causing wrong csum.
Also fixes the same issue during the rev natting for dsr.
Signed-off-by: Yuan Liu <liuyuan@google.com>
Let's see if the complexity is still a concern first:
test-me-please
Thanks for the changes!