Switching frameworks might immediately solve several other issues #140

Open
ydahhrk opened this Issue Mar 13, 2015 · 36 comments

@ydahhrk
Member
ydahhrk commented Mar 13, 2015 edited

(As you will see, I still haven't finished writing this. I would, however, like this in the public domain in case someone has something interesting to say. I will come back and analyse this further once I've finished a lot of post-release and planning paperwork I need to flush from my desk.)

Being in the middle of Netfilter, we break Netfilter's assumptions.

As far as I can tell, the people who preceded me decided it would make sense for Jool to be a Netfilter/iptables module, because it's similar to NAT, and NAT is an iptables module.

Personally, I feel like we've hit a wall when it comes to pushing Netfilter's versatility, and we should find a way to more elegantly merge Jool with the kernel.

We seem to have the following options:

  1. Become a network (pseudo-)device driver (ie. look like an interface).
  2. Move over to userspace (follow Tayga's steps).
  3. Become an iptables module.
  4. Remain a Netfilter module and find workarounds for our compliance issues.

Both 1) and 2) appear to solve all of the following current annoyances:

  1. Filtering. Because the iptables documentation discourages filtering on mangle, I'm reluctant to ask users to do so (even though I don't know what the problem with mangle filtering is, other than it looking somewhat counter-intuitive).
    Because Jool would look like an interface (1) or some userspace daemon (2), packets would not skip either the INPUT or the FORWARD chain, and therefore they would be filtered normally.
    This was already fixed using namespaces.
  2. Host-Based Edge Translation. 1) and 2) will naturally let the kernel know a route towards the RFC6052 prefix/EAM records/etc, so packets will survive ingress filtering.
    Currently, Jool cannot post a packet for local reception because it switches the layer-3 protocol of the packet. Linux goes "This is an IPv6 packet, but it came from an IPv4-only interface. Dropping."
    This can maybe currently be forced to work, but I don't think it's going to be pretty.
    This was already implemented using namespaces.
  3. --minMTU6. We can't ask the kernel to fragment to a particular size; ip_fragment() infers the MTU from the cached route, which is not --minMTU6-sensitive (though whether that's not better than --minMTU6 is still to be looked into - another TODO).
    I decided to start deferring fragmentation to the kernel because the code is tricky to get right by ourselves and atrocious to learn and maintain.
    If we left Netfilter we would be free from the kernel's fragment representation and would be able to do it a lot more easily.
    (though it would be best if the kernel exported a fragmentation function which received MTU as an argument, but that's not going to happen, particularly for old kernels.)
  4. Perhaps we would get rid of the need for two separate IPv4 addresses in stateful NAT64 mode. Not sure on this one; I need to think this more thoroughly - TODO pool4 port ranges fix this.

Less important but still worth mentioning:

  1. blacklist would be able to stop returning loopback and other evil addresses since, being far from pre-routing, Jool would naturally stop seeing these packets.

In my opinion, 1) is the most elegant option. This is because Host-Based Edge Translation forces the other options to include a dummy interface (so processes have an IPv4 address to snap themselves to). If an interface is necessary no matter the configuration, it would be cleanest if Jool itself "were" the interface.

Perhaps by adopting 2) we would attract new users who would not trust their kernels to us. On the other hand, it looks like a lot more work (I do not know to what extent Jool is married to kernel-only routines). It's also bound to make Jool somewhat slower, since packets need to be copied whenever they get in or out of kernelspace.

Other than perhaps getting rid of the pools, I think there's not much to be earned from 3). Though we will look more like NAT, we will probably face roughly the same limitations as a Netfilter module (or perhaps more, since I'm not sure how NF_HOOK_THRESH() would behave when called from an iptables module).

3) and 4) sound like the most performance-friendly options (since there's less routing and no copying), and I feel like their symmetry with the kernel's NATting would make them the most elegant solutions in the eyes of the kernel devs (which is important if we ever want to push Jool into Linux). I'm just wildly guessing, though. Perhaps they want to keep Netfilter free of any more hacks and they'd prefer some of the other options - TODO ask them.

Due to lack of experience, we're currently not aware of any roadblocks we might run into. More planning is necessary - TODO.

Criticism (on this post) and more ideas welcome.

@ydahhrk
Member
ydahhrk commented Mar 13, 2015

Fifth option:

  1. All (or several) of the above. Interface to any of the other frameworks via wrappers. Let the user decide which should be compiled.

Most work, more complicated for the user to install, maximum versatility.

@ydahhrk ydahhrk modified the milestone: 4.4.0, 4.0.0 Mar 13, 2015
@toreanderson
Contributor

Performance is an important concern. Make sure to go for an approach that lets you make use of all the CPU cores in the machine. I'm wondering if today's framework might be the best performing since a packet only has to make one pass through the routing system. Going in and out of a virtual interface (either a device driver or connected to a user-space process) would probably mean the packet would be routed twice.

On the other hand, using DPDK in user-space is supposedly how you really push the envelope of how fast you can make a machine push packets. Maybe that would be something worth looking into, too.

When it comes to operational convenience (installation, setup, etc): Having it in the upstream kernel (i.e., the distro packages) is preferable to having it in user-space, which in turn is preferable to having it a stand-alone kernel module.

Finally I'd like to point out that if you solve the Host-Based Edge Translation use case, you've certainly solved the 464XLAT CLAT use case, too.

@mcr
mcr commented Mar 14, 2015

My concern is that it go upstream, that it be integrated with ip/nffilter,
and that problem of sharing IP address with the host will go away.
(I tried to use 192.168.2.1, and then use iptables to MASQUERADE that, but
that doesn't work)
I will blog my solution for getting a second IP using macvlan, but
there are a number of situations where a second IP won't be available.

While someone might want to put this into DPDK, the more interesting
situations will be getting it into NAT hardware.

@mcr
mcr commented Mar 14, 2015

Having a virtual interface as the way to route traffic into jool
would be clearer conceptually. I think that it more clearly deals
with MTU issues.
I don't know what Host Based Edge Translation means.
I don't think that anyone cares if it's in-kernel or not. One would have to
have root, and be able to hook stuff up anyway to get it to work...

I think that having an iptables module which is attached (-i jool0) to
a dummy interface which handles the MTU and routing would be the best.
Perhaps one could overload the ipv4 address list of the dummy interface
to provide the pool of v4. That might screw up the IPv4 routing table, so
maybe it's a bad idea.

@ydahhrk
Member
ydahhrk commented Mar 17, 2015

Going in and out of a virtual interface (either a device driver or connected to a user-space process) would probably mean the packet would be routed twice.

Correct.

On the other hand, using DPDK in user-space is supposedly how you really push the envelope of how fast you can make a machine push packets. Maybe that would be something worth looking into, too.

Thank you :)

My concern is that it go upstream, that it be integrated with ip/nffilter, and that problem of sharing IP address with the host will go away.

AFAIK there is very little difference between being a Netfilter/iptables module (ie. Jool now) and being integrated into Netfilter/iptables. It seems like the second address is a result of us doing something wrong, but I can't put my finger on what it is ATM.

It's something I've wanted to truly sit down and think about for a long time, but I've always had more pressing matters to attend to.

I will blog my solution for getting a second IP using macvlan, but there are a number of situations where a second IP won't be available.

Thank you :)

Having a virtual interface as the way to route traffic into jool would be clearer conceptually.

Thank you :). I guess it'd be easier to explain to operators if it feels more natural.

I don't know what Host Based Edge Translation means.

It's an SIIT within an end node, and it's similar to 464XLAT's "Wireless 3GPP" Network setup. Jool's 464XLAT tutorial complains about Jool not supporting it:

There are rather several ways to do this. Unfortunately, one of them (making n6 the CLAT) is rather embarrassingly not yet implemented by Jool.

The point, I gather, is to not depend on an SIIT service elsewhere when you need translation.

I don't think that anyone cares if it's in-kernel or not. One would have to have root, and be able to hook stuff up anyway to get it to work...

I think it's mostly a problem with stability. If a userspace service crashes, it dies alone. If a kernel module crashes, it compromises the entire system.

Of course, we aim to never crash, but we're humans.

@ydahhrk ydahhrk added a commit that referenced this issue Jul 8, 2015
@ydahhrk ydahhrk Fixing bug reported by Andrew Yourtchenko: When translating a packet born in a separate network namespace, Jool would spit an incorrect layer-4 checksum. (Actually, the incoming packet already had an incorrect checksum.)

There were two problems:

1. The checksum was incorrect because it was unset. Jool wasn't handling CHECKSUM_PARTIAL differently; it started with an unset incorrect checksum, and ended with a set incorrect checksum.
2. Jool was intercepting packets in all namespaces. This triggered fake hairpinning symptoms, which in turn yielded misled packet drops. I fixed this poorly by making Jool only global-namespace sensitive. This solution is a very dirty patch, but I can't solve this better until #140 is fixed.
e726a87
@toreanderson
Contributor

I'm toying with the idea of integrating SIIT-DC into OpenStack. In case you're familiar with OpenStack, what I'm thinking of doing is to integrate stateless translator support (SIIT/SIIT-EAM) in the virtual routers created by the Neutron L3 Agent. However, since these virtual routers live inside their own dedicated Linux network namespace, I can't do it with Jool as far as I can tell. I can with TAYGA, but Jool would of course be preferred... :-)

I don't know if you've decided yet on how the new framework will work, but I'm hoping you'll take this use case into consideration. The requirement would simply be to be able to start a distinct instance of Jool inside each network namespace (i.e., one per virtual router). It would also be useful to be able to run a Jool instance in Stateful NAT64 mode and another Jool instance in stateless mode inside a single network namespace at the same time.

@ydahhrk
Member
ydahhrk commented Jul 22, 2015

Hmmm, no. I'm not familiar with OpenStack. Need me to read up on the subject?

I don't know if you've decided yet on how the new framework will work

I'm waiting for the 3.4 code to be ready to start making decisions on this.

That said, as far as SIIT goes, my current thinking is that options 1 and 2 (network (pseudo-)device driver and userspace) are dominant strategies hands down, performance notwithstanding. These solutions would also solve your first requirement (what with being able to have any number of Jools per namespace).

NAT64 is more fuzzy. There's actually a sixth option:

  1. Drop the NAT64 code and make a really good tutorial on how to mix SIIT and NAT to pull NAT64 off.

This is probably best in the long run, and I'm thinking it would also address your problem. RFC6146 compliance would have to be tested all over again, though.

The requirement would simply be to be able to start a distinct instance of Jool inside each network namespace (i.e., one per virtual router).

Yes, this might prove important whether Jool switches frameworks or not.

Recognizing a packet's namespace shouldn't be too hard, so if you're in a hurry, I could assign this to my new coworker as his first assignment, and release this in Jool 3.4. It would most likely work completely differently from how it will in Jool 4.0, though.

It would also be useful to be able to run a Jool instance in Stateful NAT64 mode and another Jool instance in stateless mode inside a single network namespace at the same time.

Hmmm. The inability to have a SIIT and a NAT64 simultaneously is the Netlink socket's fault. This should probably be considered a bug.

@ydahhrk
Member
ydahhrk commented Jul 22, 2015

It would also be useful to be able to run a Jool instance in Stateful NAT64 mode and another Jool instance in stateless mode inside a single network namespace at the same time.

Which instance should intercept packets earlier?

@toreanderson
Contributor

I don't think you need to read up on OpenStack unless you feel like it. As long as it can work with network namespaces it should work with OpenStack. If I can spin up multiple instances that are each connected to their own virtual network device (much like a TAYGA process is connected to its own TUN interface), that ought to do the trick. Then I could do something like this:

jool --create-instance jool123
ip netns create virtualrouter42
ip link set jool123 netns virtualrouter42

Or by creating the instance directly in the namespace:

ip netns create virtualrouter42
ip netns exec virtualrouter42 jool --create-instance jool123

With regards to dropping NAT64, don't do that - you can't simply mix SIIT + iptables NAPT44 to create a fully featured NAT64. For starters, you have 2^128 potential IPv6 clients accessing the NAT64, so you simply cannot map them into an IPv4 source address in a stateless manner.

If you're going down the virtual network device path the answer to your question on which instance should go first is easy - the routing table will decide what goes where. For example:

jool --create-instance jnat64 --mode nat64
jool --create-instance jsiit --mode siit
ip route add 64:ff9b::/96 dev jnat64
ip route add 2001:db8::/96 dev jsiit
jool --instance jnat64 --pool6 64:ff9b::/96
jool --instance jsiit --pool6 2001:db8::/96
[....]
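
To illustrate the point that the routing table decides which instance sees a packet, here is a minimal Python sketch of longest-prefix matching over the two routes above. It is only a model: the `ROUTES` table and the `pick_device` helper are hypothetical names made up for this example, and nothing here reflects how Jool or the kernel actually implements routing.

```python
import ipaddress

# Hypothetical routing table mirroring the two `ip route add` lines above.
ROUTES = {
    ipaddress.ip_network("64:ff9b::/96"): "jnat64",
    ipaddress.ip_network("2001:db8::/96"): "jsiit",
}


def pick_device(destination):
    """Return the device of the longest matching prefix, or None if no route matches."""
    addr = ipaddress.ip_address(destination)
    matches = [
        net for net in ROUTES
        if addr.version == net.version and addr in net
    ]
    if not matches:
        return None
    # Longest prefix wins, as in a real routing table lookup.
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]


if __name__ == "__main__":
    for dst in ("64:ff9b::192.0.2.1", "2001:db8::198.51.100.7", "2001:db8:1::1"):
        print(dst, "->", pick_device(dst))
```

With this model, neither instance has to "go first": a destination either falls inside one instance's prefix or it is not translated at all.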

I'm not in a hurry. :-) BTW: I'm at the IETF93 meeting at the moment and I saw that there are two people from NIC Mexico attending too: Julio Cossio and Jorge Cano. Are they involved in Jool development? If so I'd like to locate them and say hi...

@JAORMX
Contributor
JAORMX commented Jul 22, 2015

The reason this was initially implemented as a kernel-space tool was mostly because of performance. We knew there existed a userland tool but at the time it didn't meet the performance requirements. Dr. Nolazco might recall something of that. Anyway, seems to me like those performance issues would now be solved using DPDK, though that would tie the project to x86. Your call though.

@ydahhrk
Member
ydahhrk commented Jul 22, 2015

With regards to dropping NAT64, don't do that - you can't simply mix SIIT + iptables NAPT44 to create a fully featured NAT64. For starters, you have 2^128 potential IPv6 clients accessing the NAT64, so you simply cannot map them into an IPv4 source address in a stateless manner.

Oh yeah, I had a NAT66 in mind without realizing it. How silly. Scratch that, then :)

I'm not in a hurry. :-) BTW: I'm at the IETF93 meeting at the moment and I saw that there are two people from NIC Mexico attending too: Julio Cossio and Jorge Cano. Are they involved in Jool development? If so I'd like to locate them and say hi...

Wanna jabber this?

The reason this was initially implemented as a kernel-space tool was mostly because of performance.

Thank you. Standards compliance takes precedence, though.

Not that I'd get angry if a way to fix the issues without having to switch frameworks appeared.

Anyway, seems to me like those performance issues would now be solved using DPDK, though, that would tie the project to x86. Your call though.

Well, they seem to be wanting to increase their supported architectures, so this annoyance might hopefully be temporary.

(On the other hand, DPDK's installation procedure looks bananas. Sounds like efforts towards #163 will be in vain.)

Hmmm.

@toreanderson
Contributor

I just wanted to add here a discussion I recently had with @fingon and @sbyx from the OpenWrt project about the possibility of adding support for Stateful NAT64. It would appear that they have some problems with the current framework that prevent them from implementing that using Jool in a sensible manner. I was thinking that when deciding on an approach for the new framework, you might want to reach out to them to ensure the chosen new approach resolves their issues.

At least I think it would have been really nice to have Jool in OpenWrt, which could then be used for 464XLAT (both PLAT/NAT64 and CLAT functions) as well as for MAP-T (probably).

< tore_> (that would actually have been a cool feature for folks like me, the ability to do nat64/dns64 on the internet-connected router instead of nat44 and keep the LAN v6only)
< tore_> oh well
< tore_> (is it possible to force v4 off even though isp gives dhcpv4 /32?)
< cyrusff> no, each router decides on its own if it likes to introduce a v4 prefix
< cyrusff> but you could tell indidivudal routers to not assign v4 prefixes on certain interfaces via config
< cyrusff> nat64 is interesting
< cyrusff> though i'm still in need of a useful kernel implementation
< cyrusff> tayga is meh since its userspace and thus slowish
< tore_> cyrusff: I'm very happy with jool for my nat64 needs
< tore_> just replaced a few tayga+iptables-based boxes
< cyrusff> tore_: problem with jool for me is that its "all or nothing"
< cyrusff> i can only have one instance and it catches all traffic
< cyrusff> since it hooks into netfilter
< cyrusff> ideally i need an interface which i can "route" to or a netfilter action which does the magic which i can apply selectively
< tore_> v3.4.0 will allow you to specify port ranges of pool4
< tore_> but yeah, they're thinking about changing the framework
< idli> oddly enough just yesterday someone requested dns64 + nat64 feature for homenet stuff from me :)
< idli> he considered ipv4 legacy kept outside home

@ydahhrk
Member
ydahhrk commented Sep 21, 2015

Question

I can easily see SIIT moving over to the interface model, but NAT64 is weird (from IPv4 it looks more like NAT than SIIT).

Since each interface is normally connected to different networks, won't it mean the user will have to define a separate address block for pool4?
I sort of see the user thinking about using private addresses [I don't anymore, unless they're NAT'd again], but it sounds like awkward/more configuration. I guess it won't be strange if users are used to this kind of thing, but are they?

@sbyx
sbyx commented Sep 22, 2015

Well, my point is that ideally I would be able to have one NAT64 instance per outgoing (IPv4) interface that I want to NAT to, and that I would by some means be able to decide which incoming interfaces are NAT64'ed and to which outgoing interface.

@fingon
fingon commented Sep 22, 2015

As discussed on IRC, ideally NAT64 = NAT66 + SIIT + NAT44. BTW: 'move to userspace' option noted in original post kills performance, so I do not consider it an option.

@mcr
mcr commented Sep 22, 2015

As discussed on IRC, ideally NAT64 = NAT66 + SIIT + NAT44. BTW: 'move to

Can you explain each step? I don't see what the NAT66 step does.

] Never tell me the odds! | ipv6 mesh networks [
] Michael Richardson, Sandelman Software Works | network architect [
] mcr@sandelman.ca http://www.sandelman.ca/ | ruby on rails [

@sbyx
sbyx commented Sep 22, 2015
  1. NAT66 public IPv6 source address to some private IPv4-mapped IPv6 address (e.g. ::ffff:192.168.x.y)
  2. SIIT from IPv4-mapped IPv6 address to actual private IPv4 address
  3. Route to your v4-uplink (where it might get NAT44ed like regular outgoing IPv4 traffic)

Step 3 is especially important, since it lets you use a shared NAT state / port space for the IPv4 NAT: you don't have to worry about distinct port spaces for regular NAT44 and NAT64, and you don't have to worry about what happens if you don't have a "full" IPv4 address (i.e. MAP-E / MAP-T / LW4over6) or if the ISP does the NAT for you (DS-Lite).
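
The first two steps can be sketched at the address level in Python, purely as an illustration. The `Nat66` class, the `siit` function and the 192.168.0.0/24 pool are assumptions invented for this example; ports, sessions and timeouts are ignored, and step 3 (routing to the v4 uplink, where NAT44 may happen) is left to the normal IPv4 path.

```python
import ipaddress


class Nat66:
    """Step 1 (hypothetical model): map each public IPv6 source address to a
    private IPv4-mapped IPv6 address such as ::ffff:192.168.0.1."""

    def __init__(self, private_net="192.168.0.0/24"):
        self._free = iter(ipaddress.ip_network(private_net).hosts())
        self._table = {}

    def translate(self, src6):
        src6 = ipaddress.IPv6Address(src6)
        if src6 not in self._table:
            v4 = next(self._free)  # raises StopIteration when the pool is exhausted
            self._table[src6] = ipaddress.IPv6Address(f"::ffff:{v4}")
        return self._table[src6]


def siit(mapped6):
    """Step 2: stateless translation from the IPv4-mapped IPv6 address to the
    private IPv4 address it embeds (None if the address is not IPv4-mapped)."""
    return mapped6.ipv4_mapped


if __name__ == "__main__":
    nat66 = Nat66()
    for client in ("2001:db8::1", "2001:db8::2", "2001:db8::1"):
        mapped = nat66.translate(client)
        print(client, "->", mapped, "->", siit(mapped))
```

Note that only step 1 keeps state in this model; step 2 is a pure function of the address, which is what makes it SIIT rather than NAT.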

@ydahhrk
Member
ydahhrk commented Sep 22, 2015

ideally NAT64 = NAT66 + SIIT + NAT44. BTW: 'move to userspace' option noted in original post kills performance, so I do not consider it an option.

So I guess it's not strange.

Good, I guess. :-)

This is the current direction of this development, then.

Doesn't all that routing also hamper performance, though?

  1. Packet appears. Route to NAT66 interface.
  2. Mask (Binding Information Base lookup included).
  3. Route from NAT66 interface to SIIT interface.
  4. Translate.
  5. Route from SIIT to NAT44.
  6. Translate (Binding Information Base lookup included).
  7. Route outside.
@fingon
fingon commented Sep 22, 2015

'route' is probably not the correct word here. Or well, it could be, but I would not design it that way.

You could even chain these 3 steps as a single netfilter chain ('NAT64' = NAT66 + SIIT + MASQUERADE(ish) steps; in the other direction, there would probably be de-NAT, and then the SIIT+NAT66 steps), so there would be just one netfilter match (dst=/96 given to NAT64 for IPv4 mapped to IPv6) and then just a bunch of rules without their own matching.

The correct design would probably be something slightly less efficient and more generic; I haven't really thought it through, but in general, even if you do a lookup or two more in the kernel, it is much cheaper than going to userland and back. Separating the steps would probably result in better modularity/configurability.

@toreanderson
Contributor
  1. NAT66 public IPv6 source address to some private IPv4-mapped IPv6 address (e.g. ::ffff:192.168.x.y)
  2. SIIT from IPv4-mapped IPv6 address to actual private IPv4 address
  3. Route to your v4-uplink (where it might get NAT44ed like regular outgoing IPv4 traffic)

I think you'll end up with a kind of mongrel NAT64 this way. RFC6146 compliance will most likely go out the window.

One obvious example: A NAT64 is supposed to have a Binding Information Base for each protocol it supports. Each entry contains the address (X') and source port (x) of the IPv6 client, and the IPv4 transport address (T; «SNAT address») and transport port (t) it is mapped to. Thus: (X',x) <--> (T,t). However, in this stacked approach only step 1 is aware of the value of (X',x) and only step 3 is aware of the value of (T,t). So given the above approach, how and where can you query the BIB contents à la jool --bib?

tore@nat64gw1-osl2:~$ jool --bib -n | head -5
TCP:
[Dynamic] 192.0.2.240#1024 - 2001:db8:402:2:216:3eff:feba:3cd#48832
[Dynamic] 192.0.2.240#1029 - 2001:db8:202:2:216:3eff:febb:bd63#37221
[Dynamic] 192.0.2.240#1032 - 2001:db8:402:2:216:3eff:fe36:c893#50971
[Dynamic] 192.0.2.240#1034 - 2001:db8:202:a:18:59ff:fe3a:3953#52116
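
For readers unfamiliar with the BIB, the (X',x) <--> (T,t) mapping can be modelled in Python as a pair of dictionaries per protocol. This is a rough sketch only: `TransportAddr`, `Bib` and their method names are hypothetical and are not Jool's data structures; the sample entry is the first line of the output above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransportAddr:
    """An address plus a port, i.e. (X', x) on the IPv6 side or (T, t) on the IPv4 side."""
    address: str
    port: int


class Bib:
    """Hypothetical single-protocol BIB: one entry is visible from both directions."""

    def __init__(self):
        self._by_ipv6 = {}
        self._by_ipv4 = {}

    def add(self, ipv6, ipv4):
        self._by_ipv6[ipv6] = ipv4
        self._by_ipv4[ipv4] = ipv6

    def to_ipv4(self, ipv6):
        return self._by_ipv6.get(ipv6)

    def to_ipv6(self, ipv4):
        return self._by_ipv4.get(ipv4)


# One BIB per supported protocol.
bibs = {"TCP": Bib(), "UDP": Bib(), "ICMP": Bib()}
bibs["TCP"].add(
    TransportAddr("2001:db8:402:2:216:3eff:feba:3cd", 48832),
    TransportAddr("192.0.2.240", 1024),
)
```

The point of the objection is that a single entry holds both sides of the mapping. In the stacked approach the two halves would live in two unrelated state tables, so a listing like the one above would have to be synthesized by joining them.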
@fingon
fingon commented Sep 23, 2015

I do not like RFC6146 anyway - e.g. SIIT defines better fragment handling semantics. You could synthesize BIB-like information out of NAT66 + SIIT + NAT44 state if it was helpful (for user experience), but obviously the implementation would not follow RFC6146 processing rules etc., as they are defined in terms of the BIB and not in terms of what actually needs to be done.

For the end user, though, the result would not be different; packets would come in via IPv6 and wind up IPv4 :-) (And fragmentation would actually work better, or at least, in case of NAT64, it is underspecified but SIIT defines relatively sane handling for it, including ICMP blackhole logic.)

@toreanderson
Contributor

Might be that a home user won't care, but the situation might be different for people who operate NAT64 that serve other environments like ISPs, data centres, or enterprise networks....I do care that my NAT64 gateways operate in a compliant manner that's easy to understand if I need to debug anything.

I think you'd be hard pressed to correctly implement Address-Independent Filtering with the stacked approach too, and it wouldn't surprise me if it significantly complicated the implementation of ALGs (#114).

Also static BIB entries (i.e., port forwards for IPv4-to-IPv6 traffic) would be more complicated as you'd need to install them both in the NAPT44 and NAPT66 parts. Which reminds me, the NAPT44 and NAPT66 component would need to maintain their own separate session tables, so you'll end up keeping much more dynamic state than you really need to. In my experience keeping too much state is something that often becomes a bottleneck (I assume we've all seen the dreaded nf_conntrack: table full, dropping packet Linux kernel error on several occasions).

So I think you'd gain a lot of complexity while losing features and compliance by such an approach. While I don't know the OpenWrt internals, keeping NAT64 and NAPT44 as separate functions and assigning them non-overlapping IPv4:port-range pools to work with (or simply do not allow them to co-exist), does seem to me like the cleanest approach.

@sbyx
sbyx commented Sep 23, 2015

Cleaner? maybe. Practical? definitely not. As noted before we have to deal with a variety of possible IPv4 uplink scenarios, including ones where we don't have a full IPv4 portspace (map, lw4over6) or where the ISP does CGN (dslite). To support these correctly we do need to separate the v4 NAT and the v6 translation to some degree, unless you come up with a clean and easy solution to handle all the special cases.

Also OpenWrt is an underfunded open-source effort and not comparable to a data center or enterprise network.

@toreanderson
Contributor

Fully aware that a CPE is not the same as a data centre router, I'm just interested in not throwing the baby out with the bathwater, i.e., avoiding breaking compliance and existing use-cases in order to support a new one.

So as far as solutions go, here are some I can think of:

  • Implement NAPT44 and NAT64 as either/or. That way, you can assign all available source ports to whatever solution is being used.
  • Divide the available source ports between NAPT44 and NAT64. So assuming the device has access to all 2^16 ports (i.e., no MAP or lw4o6), give 2^15 to NAPT44 and 2^15 to NAT64.
    • Even if MAP or lw4o6 is used and ports are restricted, you can still do something similar. E.g., if the device has ports 10000-11999 available, give 10000-10999 to NAPT44 and 11000-11999 to NAT64. (With MAP and lw4o6 you have to restrict the source ports you use anyway, so it can't be that complicated to take the assigned port space and "divide by 2", can it?)
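
The "divide by 2" idea can be written down in a few lines of Python. This is only an arithmetic sketch; `split_ports` is a hypothetical helper, not an OpenWrt or Jool function, and which half goes to which translator is an arbitrary choice.

```python
def split_ports(first, last):
    """Split the inclusive port range [first, last] into two contiguous halves.

    Returns ((napt44_first, napt44_last), (nat64_first, nat64_last)).
    When the range has an odd size, the second half gets the extra port.
    """
    size = last - first + 1
    if size < 2:
        raise ValueError("need at least two ports to split")
    middle = first + size // 2
    return (first, middle - 1), (middle, last)


if __name__ == "__main__":
    # Restricted port set, as in the MAP / lw4o6 example above.
    print(split_ports(10000, 11999))
    # Full port space: 2^15 ports for each translator.
    print(split_ports(0, 65535))
```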

I'll be happy to try and take a look at implementing any of these if you think they would be acceptable. However note that I haven't hacked on OpenWrt before so don't expect a pull request this week. :-)

In the case of DS-Lite, I'm not sure I fully understand the problem. With DS-Lite you shouldn't be doing NAPT44 in the home gateway at all, so there shouldn't be any problems with overlapping port-spaces. NAT64 could just use the default outgoing IPv4 address of the router, just as it can with a native IPv4 uplink. (The fact that this address is an RFC1918 one that would be routed into the B4 and then undergoing NAPT44 at the ISP's AFTR doesn't really seem relevant here.)

@ydahhrk
Member
ydahhrk commented Sep 23, 2015

I do not like RFC6146 anyway - e.g. SIIT defines better fragment handling semantics.
(...)
And fragmentation would actually work better, or at least, in case of NAT64, it is underspecified but SIIT defines relatively sane handling for it, including ICMP blackhole logic.

SIIT fragment handling cannot be correctly applied to NAT64, though. Doesn't NAT66 + SIIT + NAT44 inherit RFC6146 fragment semantics?

SIIT doesn't mangle ports, so fragments aren't an issue - each fragment can be easily translated separately.

A packet needs ports to be NAT64'd, otherwise the translator can't find the relevant binding and session. Since only the first fragment carries ports, 6146 needs an unspecified level of defragmentation. In practice, it's the same NAT44 uses, really.

Cleaner? maybe. Practical? definitely not. As noted before we have to deal with a variety of possible IPv4 uplink scenarios, including ones where we don't have a full IPv4 portspace (map, lw4over6) or where the ISP does CGN (dslite). To support these correctly we do need to separate the v4 NAT and the v6 translation to some degree, unless you come up with a clean and easy solution to handle all the special cases.

Pardon my ignorance, but what's the problem with NAT64 then NAT44? As in you NAT64, ISP NAT44. It'd be like [NAT66 + SIIT + NAT44] + NAT44, no?

Perhaps wrongly, Jool currently allows translation into IPv4 private space, and starting from version 3.4, it'll also be able to limit the port ranges it can use.

RFC6146 compliance will most likely go out the window.

I haven't tested Simultaneous Open of TCP Connections in NAT44, but from the fact it has no qualms with using the ephemeral port range by default, it seems it'll break too.


For the record, my boss prefers RFC compliance, though I'm fine with a compromising consensus.

@toreanderson
Contributor

Pardon my ignorance, but what's the problem with NAT64 then NAT44? As in you NAT64, ISP NAT44. It'd be like [NAT66 + SIIT + NAT44] + NAT44, no?

That's a very valid point, actually. You could daisy-chain an (RFC compliant) NAT64 and a standard NAPT44 - in the same (OpenWrt) device. Jool's NAT64 would simply use one or more private IPv4 addresses for its IPv4 transport address pool. After IPv6->IPv4 translation, the packet would be sent through iptables' MASQUERADE or SNAT targets for NAPT44 towards the public IPv4 source address.

That would prevent NAPT44 having to share the pool of public IPv4 addresses and ports with NAT64.

It would cause the same level of degradation in functionality as the NAPT66->SIIT->NAPT44 suggestion (i.e., causing double translation, redundant state, probably complicating the insertion of static or UPnP-provisioned port forwards for IPv4-initiated traffic destined for an IPv6 host, etc.), but I think it would work just as well. The overall solution wouldn't be RFC compliant, but Jool itself could continue to be.

Also, if NAPT44 isn't used, Jool could use the public addresses/ports as its IPv4 transport pool. I'm guessing that for most people, NAPT44 and NAT64 would be an either/or really. I can't think of a normal use case for running both simultaneously.

Perhaps wrongly, Jool currently allows translation into IPv4 private space

It's not wrong to use private IPv4 space as transport addresses. What RFC6052 section 3.1 forbids is the use of IPv4-converted addresses that embed RFC1918 space, and only for the WKP 64:ff9b::/96. So an IPv6 packet destined for 64:ff9b::192.168.1.1 is supposed to be dropped. A packet destined for 2001:db8:64::192.168.1.2 is, on the other hand, completely legitimate. It is also legitimate for an IPv6 packet sourced from 2001:db8::123#12345 to be assigned a BIB entry mapping it to 192.168.1.1#23456.
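
The distinction can be sketched in Python with the standard `ipaddress` module. The helper names (`embed`, `extract`, `destination_allowed`) are made up for this example, only /96 prefixes are handled, and the check covers just the three RFC1918 blocks mentioned above rather than every non-global range.

```python
import ipaddress

WKP = ipaddress.ip_network("64:ff9b::/96")
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def embed(prefix, ipv4):
    """Build the IPv4-converted IPv6 address for a /96 prefix (low 32 bits = IPv4)."""
    net = ipaddress.ip_network(prefix)
    if net.version != 6 or net.prefixlen != 96:
        raise ValueError("this sketch only handles /96 IPv6 prefixes")
    return ipaddress.IPv6Address(
        int(net.network_address) | int(ipaddress.IPv4Address(ipv4))
    )


def extract(addr6):
    """Recover the IPv4 address embedded in the low 32 bits."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(addr6)) & 0xFFFFFFFF)


def destination_allowed(addr6):
    """Reject RFC1918 space only when it is embedded in the well-known prefix."""
    addr = ipaddress.IPv6Address(addr6)
    if addr not in WKP:
        return True  # network-specific prefixes may embed private space
    embedded = extract(addr)
    return not any(embedded in net for net in RFC1918)


if __name__ == "__main__":
    for dst in ("64:ff9b::192.168.1.1", "2001:db8:64::192.168.1.2"):
        print(dst, "allowed" if destination_allowed(dst) else "dropped")
```

The restriction applies to the destination address of the IPv6 packet; it says nothing about which IPv4 transport addresses the BIB may hand out.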

@mcr
mcr commented Sep 23, 2015

My preference is that NAT64 becomes integrated into the current Linux NAT44
code, such that all of the NAT?4 code and datastructures are common, and it's
just how the pre-mangled packets are classified into conntracks is different.

I think that this is the cleanest way.
From an operational point of view, I'm happy if the IPv6 traffic appears to
disappear into a magic virtual interface, and appear from it. I'm actually
happiest if we do it that way (supporting something like: "ip6tables -t nat
-o siit0 -s abcd::/xx"), such that we can more clearly use routing
daemons/etc. to decide which traffic gets into the NAT64 wormhole, and what doesn't.

] Never tell me the odds! | ipv6 mesh networks [
] Michael Richardson, Sandelman Software Works | network architect [
] mcr@sandelman.ca http://www.sandelman.ca/ | ruby on rails [

@ydahhrk
Member
ydahhrk commented Nov 12, 2015

Now that Node-Based Translation and Filtering don't depend on this (see scratched text above), the urgency of supporting other or more paradigms/frameworks seems less overwhelming. Also, @rolivasnic has made significant progress on NAT64 database redundancy (#113) and atomic configuration (#164).

Is it reasonable to sandwich another new-features release (3.5) between 3.4 and 4.0 (this)? I would be tempted to add --minimum-ipv6-mtu (#136) and namespaces (#187) to 3.5 too.

@toreanderson
Contributor

Are you asking me? I'm going to assume you are. 😄

I'm very happy with the way Jool currently works so as far as I'm concerned it could stay like it is in the future. Especially considering that if you need a «Jool net-device» for whatever reason you can accomplish that easily using a network namespace.

Therefore I'd be much more interested in features such as #114, #164, and #187. For integration in OpenStack I'd also need the possibility of making SIIT and NAT64 Jool be able to co-exist (within the same network namespace). Is that already considered part of #187?

@ydahhrk
Member
ydahhrk commented Nov 13, 2015

Are you asking me? I'm going to assume you are. :D

Sure, but again, some other folks see more future in this fix.

I'm concerned people might think the current framework does not scale well and this might be a blocker for using Jool.

This seems to be the case for the OpenWrt people. I'm hoping for an answer on whether the [NAT64] + NAT44 idea and the #187 improvements are comfortable workarounds for Jool not being a net device. Or an iptables module.

For integration in OpenStack I'd also need the possibility of making SIIT and NAT64 Jool be able to co-exist (within the same network namespace). Is that already considered part of #187?

Yes, but are you sure this will work as advertised? SIIT Jool is particularly infamous in that it tries to swallow all received traffic (especially in IPv4 and if the RFC 6052 prefix is present), which makes me think anything else you might want to do in the same namespace will just be getting leftovers.

(Unless it's chained before... but it's still weird. Still, I don't actually know what you're doing.)

It's what upsets me the most about SIIT Jool right now (and is a direct consequence of Netfilter).

@toreanderson
Contributor

Yes, but are you sure this will work as advertised? SIIT Jool is particularly infamous in that it tries to swallow all received traffic (especially in IPv4 and if the RFC 6052 prefix is present), which makes me think anything else you might want to do in the same namespace will just be getting leftovers.

Maybe. My initial idea would simply avoid overlapping addresses for --pool4, --pool6, --eamt, and «anything else». Perhaps I'd need a dash of --blacklist too...
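A quick way to sanity-check the "no overlapping addresses" idea is to compare every pair of configured prefixes. The sketch below uses Python's standard `ipaddress` module; `find_overlaps` and the dictionary labels are hypothetical and are not part of Jool's tooling.

```python
import ipaddress
from itertools import combinations


def find_overlaps(named_prefixes):
    """Return the pairs of labels whose prefixes overlap (same IP version only)."""
    nets = {name: ipaddress.ip_network(p) for name, p in named_prefixes.items()}
    clashes = []
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        # IPv4 and IPv6 prefixes can never overlap, so skip mixed pairs.
        if net_a.version == net_b.version and net_a.overlaps(net_b):
            clashes.append((name_a, name_b))
    return clashes
```

For example, feeding it `{"nat64-pool4": "192.0.2.0/28", "siit-eamt4": "192.0.2.8/29"}` reports the pair as clashing, because 192.0.2.8/29 lies inside 192.0.2.0/28. An empty result means each translator would, at least by address, only claim its own traffic.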

@ydahhrk
Member
ydahhrk commented Nov 18, 2015

Maybe. My initial idea would simply avoid overlapping addresses for --pool4, --pool6, --eamt, and «anything else». Perhaps I'd need a dash of --blacklist too...

Ok. In any case, we should probably queue SIIT Jool after NAT64 Jool so SIIT gets the NAT64 leftovers and not the other way around.
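Why the order matters can be shown with a toy dispatcher in which the first handler that claims a packet keeps it. This is a deliberately simplified model, not Netfilter's hook mechanics; `make_translator`, `dispatch` and the example prefixes are hypothetical.

```python
import ipaddress


def make_translator(name, claimed_prefixes):
    """Build a handler that claims any destination inside claimed_prefixes."""
    nets = [ipaddress.ip_network(p) for p in claimed_prefixes]

    def handler(dst):
        addr = ipaddress.ip_address(dst)
        if any(addr.version == n.version and addr in n for n in nets):
            return name  # packet stolen by this translator
        return None      # not ours, let the next handler look at it

    return handler


def dispatch(chain, dst):
    """Offer the packet to each handler in order; unclaimed packets reach the kernel."""
    for handler in chain:
        owner = handler(dst)
        if owner is not None:
            return owner
    return "kernel"
```

Suppose the NAT64 handler claims only its pool4 address, `make_translator("nat64", ["192.0.2.1/32"])`, and a greedy SIIT handler claims all IPv4, `make_translator("siit", ["0.0.0.0/0"])`. With the chain `[nat64, siit]`, traffic to 192.0.2.1 reaches NAT64 and everything else falls through to SIIT. With `[siit, nat64]`, SIIT takes 192.0.2.1 as well and NAT64 never sees it.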

@toreanderson
Contributor

Just thought I'd mention that user-space packet processing frameworks such as VPP and Snabb Switch seem to be getting more and more fashionable. First and foremost, they tend to be really, really fast. I'm also guessing that developing features and applications is going to be easier to do in user space than in kernel space.

Cisco published a very interesting blog post about VPP a few days ago. It's well worth the read, and it got me thinking that it's probably worth considering if Jool fits as an application/feature living within one of these user-space packet processing frameworks.

@ydahhrk
Member
ydahhrk commented Apr 14, 2016

Ok

Sorry about the silence lately; I'm still roaming around in South America (on vacation now) and my service provider doesn't reach this area, so I've been having trouble getting online.

I'll try to craft a less rushed response on Monday.

@danrl
Contributor
danrl commented Nov 27, 2016

@JAORMX

Anyway, seems to me like those performance issues would now be solved using DPDK, though, that would tie the project to x86

I'd like to mention the home user and small office use case here. I'm currently experimenting with Jool on LEDE/OpenWrt, and x86 is probably one of the smaller target architectures there. I liked the fact that Jool offered a solution for big boxes in data centers as well as the small plastic routers we unfortunately often have to run at home. I would like everyone to keep this (currently small but growing) target group in mind: IPv6-only SOHO networks.

My two cents on @ydahhrk's initial statement:

Become a network (pseudo-)device driver (ie. look like an interface).

WireGuard, an in-kernel VPN (see link below), uses an interface that initially comes without addressing. This works great and is easy to integrate into all kinds of use cases. IP addresses can be assigned with ip, and all sorts of custom setups would be possible. Also, instead of being part of netfilter, one can use netfilter to create rules for the interface. An interface is IMHO the abstraction that fits Jool perfectly and allows for the greatest flexibility. I do not know about the performance impact, though.
Using an interface as the framework may also reduce the amount of required code, depending on how many of the kernel's functions can be used. However, this goes beyond my area of expertise and is speculative.

Moreover, this looks like the most promising path to upstreaming into the kernel to me.

Move over to userspace (follow Tayga's steps).

Please don't. We have tayga already. However, it is easy to distribute userspace applications compared to kernel modules.

Become an iptables module.

Hmm... Sounds quite interesting to me. What if iptables gets deprecated some day? All gone?

Remain a Netfilter module and find workarounds for our compliance issues.

There must be a better way.

In WireGuard (http://www.wireguard.io) we have had tremendous performance improvements by leveraging the kernel's PADATA functions: http://lxr.free-electrons.com/source/kernel/padata.c

@ydahhrk
Member
ydahhrk commented Nov 28, 2016 edited

@toreanderson:

I'll try to craft a less rushed response on Monday.

OOOOOPS; I left this hanging. My bad.

OK, so most of the concerns that inspired this thread have found workarounds, so I'm not sure switching frameworks is worthwhile anymore. Not that I don't want to do it; out of the three paths of least resistance (device driver, iptables and Netfilter), Netfilter is my least favorite because of the greedy packet stealing.

You have so far proposed...

  • DPDK
  • VPP
  • Snabb Switch

I shouldn't pretend like I did my homework getting a proper mouthful of these products, but I took a look at VPP's packet representation and it looks like it's not the same as the kernel's. This is perfectly reasonable, but kind of bad. Now that I've had to manhandle the RFC 7915 code I no longer think that Jool's relationship with struct sk_buff is very platonic. And it is somewhat complicated as it is. This doesn't mean that a Jool+VPP combo is infeasible, just that I don't think that the cost-benefit ratio is right. The other userspace frameworks will likely pose the same obstacle.

BTW it looks like VPP is going to support NAT64 natively eventually.

It would also be useful to be able to run a Jool instance in Stateful NAT64 mode and another Jool instance in stateless mode inside a single network namespace at the same time.

I'm the biggest idiot, I apologize.

@danrl:

I would like everyone to keep this (currently small but growing) target group in mind: IPv6 only SOHO networks.

Moreover, this looks like the most promising path to upstreaming into the kernel to me.

Please don't. We have tayga already. However, it is easy to distribute userspace applications compared to kernel modules.

Agree, agree, agree. Agree.

Hmm... Sounds quite interesting to me. What if iptables gets deprecated some day? All gone?

My current trend is to implement the three in-kernel options (device driver/iptables/netfilter). This is because it looks like they are all the same; we would just wrap them differently. I don't think that we need to drop one in favor of the others.

So it doesn't hurt if one of them is deprecated.

There must be a better way.

Well, we already found most workarounds and I think they're elegant, so... :)

In WireGuard (http://www.wireguard.io) we have had tremendous performance improvements by leveraging the kernel's PADATA functions: http://lxr.free-electrons.com/source/kernel/padata.c

Thank you :)
