Jool 4.1.7/Ubuntu 22.10 (kernel 5.19): dst_output() returned errcode 1 #400
The problem looks similar to #382, but when I cherry-picked 8980f79 and built Jool from the latest git source, I get some really strange
|
Well, for one thing, jool.mx is now a complete liability. I lost control of it about a year ago. Please use the GitHub version instead and update your bookmarks (if any). The latest Jool is 4.1.9, not 4.1.7. Looking at the compatibility table, 4.1.7 is indeed not necessarily expected to run properly on kernel 5.19. Might want to try updating first. |
@ydahhrk thank you. It might not have been clear from my last message, however I have already tried building the code from the latest git master + the cherry-picked commit to print packet details. It still misbehaves. |
Ah. Sorry about that. Well, reading #382 again, we stopped looking at those packet lengths because 8980f79 had a bug I later patched in 492e462, which was (presumably) causing them to break. (Although it IS very strange that they're the only packet fields that break.) But either way, I'd recommend sticking to the latest commit of the debug branch, rather than cherry-picking individual debug commits. But, assuming those packet lengths are still breaking... well, I haven't been able to find where this is happening, but it should be relatively easy to pinpoint by monitoring them. I just uploaded a commit (to the same debug branch) that checks the length in a bunch of locations. Try getting the error, then provide lines from
These lines are printed as soon as the code detects the packet lengths have been corrupted. Each errored packet should only print each line once, so it shouldn't flood the logs too badly. |
I have built the code in the
But now I'm getting
|
Now I have tried to update the NAT64 prefix to
|
Just letting you know: It seems I'm now reproducing the problem reliably. I'll need to compile a custom kernel, so the patch might take a day, but it's probably coming. |
Wait. What is this? Everything works when I remove it. |
Well... here's what I gathered: It seems the "Fair Queue" packet scheduler is giving up here. And that would be because Jool's IPv4 outgoing packet timestamp (
And the reason why Jool's IPv4 outgoing packet timestamp is very large is... because the incoming IPv6 packet timestamp is very large. Jool simply copies that number. So I'm very lost. Does any of this make sense to you? |
Thank you for your detailed investigation. I only had time to follow up on this issue today. I do confirm that when I use a different From what I see in your post and in that kernel commit, It doesn't make any sense to me to compare an epoch timestamp to the kernel timer. That comparison will never succeed.
This sounds interesting. From where does Jool take the timestamp, or in which function does it copy it over (I may be bad at grepping, but I couldn't find it)? If it's provided by the kernel, then perhaps there's a bug in the kernel code somewhere. |
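To make the mismatch concrete, here is a minimal userspace sketch in C of the comparison being discussed. It is not kernel or Jool code: `sketch_pkt`, `beyond_horizon` and `translate_copy_tstamp` are hypothetical names, and the 10-second horizon used in the usage note is an assumption. It only illustrates why a wall-clock (epoch) receive timestamp, copied as-is into the outgoing packet, always looks far in the future to a scheduler that compares it against a monotonic clock.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for the single packet field that matters here. */
struct sketch_pkt {
	int64_t tstamp_ns;
};

/*
 * Assumed shape of the scheduler's check: a packet whose timestamp lies
 * more than horizon_ns past the current monotonic time is rejected.
 */
static bool beyond_horizon(const struct sketch_pkt *pkt,
			   int64_t now_mono_ns, int64_t horizon_ns)
{
	return pkt->tstamp_ns > now_mono_ns + horizon_ns;
}

/*
 * What the thread describes the translator doing: the outgoing packet
 * simply inherits the incoming packet's timestamp.
 */
static void translate_copy_tstamp(const struct sketch_pkt *in,
				  struct sketch_pkt *out)
{
	out->tstamp_ns = in->tstamp_ns;
}
```

With an incoming timestamp of roughly 1.68e9 seconds since the epoch and a monotonic clock reading of one hour, `beyond_horizon()` returns true for the copied packet, whereas a packet with a zero timestamp passes.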
I suppose the IPv4 timestamp uses a different epoch than the IPv6 timestamp. I'll try printing a few, see if the pattern is consistent.
Here. Specifically, |
Theory doesn't stand:
|
This patch series implies Maybe simply clearing the timestamp ( |
actually |
Makes sense. Previous to 5.18, both of the kernel's forwarding functions employed So yes, it would seem the kernel is expecting Jool to do the same.
It seems to me the whole reason why the author of the patch states it's inefficient is because it breaks Then again, I uploaded the patch to branch issue400. Might want to try it out. It now works in my Debian 11, kernel 5.10, with FQ enabled. (Didn't before the patch.) I don't know if it compiles in the other kernels; will test that tomorrow. |
It works for me on RHEL 8.7 (Kernel 4.18.0-425.19.2.el8_7.x86_64), thanks! |
My interpretation is it is failure to clear tstamp at all that causes The inefficiency was from setting tstamp = 0 when it doesn't need to be reset, i.e., it is already on an egress path and contains a delivery_time type of timestamp, causing unnecessary recalculations. The patchset adds the necessary logic to avoid resetting in those cases, and So I don't agree that this "might not the best solution". |
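The distinction drawn above can be sketched in a few lines of userspace C. This is an illustration under stated assumptions, not the kernel's or Jool's actual code: `fwd_pkt`, `is_delivery_time` and `fwd_clear_tstamp` are hypothetical names standing in for "a packet", "the timestamp already holds a delivery time" and "reset the timestamp before forwarding".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet model: a timestamp plus a flag describing its meaning. */
struct fwd_pkt {
	int64_t tstamp_ns;
	bool is_delivery_time; /* true: tstamp is an egress delivery time */
};

/*
 * Reset a receive timestamp before the packet is forwarded, but leave the
 * field alone when it already carries a delivery time, so nothing needs
 * to be recalculated further down the egress path.
 */
static void fwd_clear_tstamp(struct fwd_pkt *pkt)
{
	if (pkt->is_delivery_time)
		return;
	pkt->tstamp_ns = 0;
}
```

A translator that forwards packets would call something like this on the outgoing packet before handing it to the output path, so that a receive-time value never reaches the packet scheduler.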
I do confirm that the patched version from the |
Guess it's ready for release then. |
4.1.10 released; closing. |
I am running a Jool.mx Netfilter instance in a separate namespace. This configuration had been working for several years, but suddenly it is unable to translate packets. I have tested a self-built DKMS module as well as the prebuilt 4.1.7 Debian packages, still no luck.
The error I am seeing in dmesg when enabling jool global update logging-debug true is:
The full packet dump from dmesg is:
The setup:
[IPv6, internal NAT64-enabled LAN] -> [IPv6, router main namespace, LAN port] -> [IPv6, namespace joolns] -> [jool] -> [IPv4, namespace joolns] -> [IPv4, router main namespace, NAT to the egress interface] -> [the Internet]
I see the IPv6 packets incoming in the joolns namespace, but the IPv4 packets never appear leaving the namespace (the in/out interface is just one).
The init script:
(Outgoing NAT on the router is handled outside of this script.)
Interface config within the namespace:
Jool status:
I have tried setting rp_filter to 0, but I still have no luck and packets get dropped.
uname -a:
Is there any known issue with these newer kernels? I have checked and there is no newer version of Jool available on Jool.mx.
Thank you.