
concurrency in nat64 #69

Closed
ipclouds opened this Issue Sep 4, 2013 · 10 comments

ipclouds commented Sep 4, 2013

Does the nat64 module support concurrency or parallelism?

I was looking through the code, and it doesn't seem to be using threads or any other such mechanism. Maybe I'm missing something, but how are you handling that?

I see that you are using locks, but why use spin locks if the code is not concurrent?

ydahhrk (Member) commented Sep 5, 2013

The kernel is the one that handles the actual reception of the packets. Whenever a packet arrives, it is supposed to fork a thread, which will walk it through the network stack. Jool hooks itself to the Internet layer.

If two packets arrive at the same time, and the node has two CPUs, then the kernel will assign one packet to each CPU and they will both process their packets at the same time.

So while Jool doesn't fork any threads (other than the session deletion timer, which is irrelevant here), many Jools can be running at the same time on different CPUs. All these Jools are actually the same piece of software, so they share memory, and hence the spinlocks.

At least that's what I believe after reading http://lwn.net/images/pdf/LDD3/ch02.pdf (scroll down to the "Concurrency in the Kernel" section).

But that was indeed too theoretical to be conclusive, so we ran a few tests. We simply logged the CPU ID (current_thread_info()->cpu) both at the beginning and at the end of Jool's pipeline, and flooded the network with massive files using scp. After a while we found the following output:

[ 3385.446176] Got packet on CPU 0.
[ 3385.446186] Got packet on CPU 3.
[ 3385.524297] Quitting Jool on CPU 0.
[ 3385.536577] Quitting Jool on CPU 3.

If there are no fallacies in our reasoning, I'd say there is indeed concurrency here, and Jool is prepared for it.
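For what it's worth, the shape of that test can be approximated in userspace. The following is an illustrative sketch, not Jool code: `simulate_packets()` is invented for this example, and glibc's `sched_getcpu()` stands in for the kernel's `current_thread_info()->cpu`. Which CPU numbers get printed is up to the scheduler.

```c
#define _GNU_SOURCE   /* for sched_getcpu() */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Each "packet" is handled by its own thread, which reports the CPU it
 * starts and finishes on, mirroring the kernel-side log shown above. */
static void *worker(void *done)
{
	printf("Got packet on CPU %d.\n", sched_getcpu());
	/* the actual translation work would happen here */
	printf("Quitting Jool on CPU %d.\n", sched_getcpu());
	*(int *)done = 1;
	return NULL;
}

/* Spawns n concurrent "packets" (n <= 8); returns how many completed. */
int simulate_packets(int n)
{
	pthread_t threads[8];
	int done[8] = { 0 };
	int finished = 0;

	if (n > 8)
		n = 8;
	for (int i = 0; i < n; i++)
		pthread_create(&threads[i], NULL, worker, &done[i]);
	for (int i = 0; i < n; i++) {
		pthread_join(threads[i], NULL);
		finished += done[i];
	}
	return finished;
}
```

On a multi-core machine the two threads will often (though not always) report different CPUs, just like the log above.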

ipclouds commented Sep 5, 2013

Fair enough!

So basically, if I have 8 cores on a server, there are going to be 8 instances of the translator, and each instance has a thread that deletes the sessions? If they share memory, wouldn't it be better to have only one thread that deletes sessions?

Am I getting this right?

Can you tell me why you are using spin locks specifically? There are several locking mechanisms in the kernel, like RCU, reader/writer spin locks, and reader/writer semaphores, each better suited to some tasks than others. Sometimes double locking, using different types of locks, is also a good choice. Considering, for example, how many readers and writers you have on the session table is a good way to choose the type of lock that suits you best.

As I recall, spin locks are somewhat slow but safe. Maybe this can affect your performance.


ydahhrk (Member) commented Sep 7, 2013

"So basically, if I have 8 cores on a server, there are going to be 8 instances of the translator and each instance has a thread that deletes the sessions? If they share the memory it would be better if there is only one thread that deletes sessions, wouldn't you say that?"

I messed up when I said "many Jools". What I meant is that there is only one Jool in memory, but the different processors might be running it at the same time, each using it to mangle a different packet.

There is ever only one session deletion timer. It wakes up every ten seconds, claims one of the processors, kills old data, and goes back to sleep (returning the processor).

The whole thing is based on interrupts: at some point a packet arrives, which interrupts the kernel, and the kernel "sends" any available processor to Jool's main callback. At other times, enough time has passed, so the kernel "sends" an available processor to Jool's session deletion timer. It's like they're both programs that assist the kernel on demand.

If you compile the module using the debug option (uncomment it from the mod/Kbuild file) you'll see the session thread echoing messages to the kernel log every ten seconds (see the kernel log by running the "dmesg" command). By the way, you can find the timer code in the mod/session.c file (cleaner_timer() and clean_expired_sessions() functions).

"Can you tell me why are you using spin locks specifically?"

Literature claims that RCU and r/w locks work best when accesses are mostly reads, and Jool's main databases are constantly changing.
But now that you mention it, the steps' configuration structures are pieces of data that use spinlocks and are not expected to change much (because they depend on the user). I hadn't realized this, and it does deserve to be looked into.
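That read-mostly configuration case lends itself to an RCU-style scheme: readers dereference a pointer with no lock at all, and the rare writer publishes a whole new version. Here is a userspace sketch with C11 atomics; the struct and its field names are invented for illustration, and a real kernel version would use rcu_dereference()/rcu_assign_pointer() and wait for a grace period (synchronize_rcu()) before freeing the old version.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical configuration record; field names are made up. */
struct config {
	int drop_icmp;
	int filtering_enabled;
};

static _Atomic(struct config *) live_config;

void init_config(void)
{
	live_config = calloc(1, sizeof(struct config));
}

/* Reader: completely lock-free; just loads the current version. */
int get_drop_icmp(void)
{
	struct config *c = atomic_load_explicit(&live_config,
						memory_order_acquire);
	return c->drop_icmp;
}

/* Writer (rare): copy, modify, publish.  Freeing the old version
 * immediately is only safe because this sketch has no concurrent
 * readers; the kernel would call synchronize_rcu() first. */
void set_drop_icmp(int value)
{
	struct config *old = atomic_load_explicit(&live_config,
						  memory_order_relaxed);
	struct config *fresh = malloc(sizeof(*fresh));

	*fresh = *old;
	fresh->drop_icmp = value;
	atomic_store_explicit(&live_config, fresh, memory_order_release);
	free(old);
}
```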

Semaphores are a huge no-no because they sleep, and most of Jool runs in software interrupt context. We do have a little mutex, though, in the Netlink handling code. It's fine to use it instead of a spinlock there, because the configuration module runs in process context (on behalf of the userspace application).

Also, we always use the "_bh" variant of the spin_lock functions, because we're running in a software interrupt:

"If you have a spinlock that can be taken by code that runs in (...) interrupt context, you must use one of the forms of spin_lock that disables interrupts. (...) If you do not access your lock in a hardware interrupt handler, but you do via software interrupts (...), you can use spin_lock_bh to safely avoid deadlocks while still allowing hardware interrupts to be serviced." (www.makelinux.net/ldd3/chp-5-sect-5)

I always have trouble finding Netfilter documentation about the context it runs in, so I've taken a bit of a leap of faith in assuming we ALWAYS run in software interrupt context. I admit that all I know for sure is that, during packet processing, in_irq() always returns 0 and in_softirq() always returns a positive integer (http://people.netfilter.org/rusty/unreliable-guides/kernel-hacking/basics-hardirqs.html).
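Put together, the pattern described above looks roughly like the following kernel-side fragment. This is illustrative only, not Jool's actual code: the hook name is invented and the Netfilter hook signature is simplified.

```c
/* Runs in softirq context for every translated packet. */
static DEFINE_SPINLOCK(session_lock);

static unsigned int jool_hook(struct sk_buff *skb)
{
	spin_lock_bh(&session_lock);    /* disables bottom halves locally; */
	/* ... look up / update this packet's session entry ... */
	spin_unlock_bh(&session_lock);  /* hardware interrupts stay enabled */
	return NF_ACCEPT;
}
```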


ydahhrk (Member) commented Sep 26, 2013

Finally updated the previous comment.


ipclouds commented Oct 28, 2013

Spinlocks are basically busy waiting, an approach that is not always desirable in parallel environments, but I guess you are making a valid case against the other types of locks.

"Spinning can be a valid strategy in certain circumstances, most notably in the implementation of spinlocks within operating systems designed to run on SMP systems. In general, however, spinning is considered an anti-pattern and should be avoided, as processor time that could be used to execute a different task is instead wasted on useless activity." from Wikipedia
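To make "busy waiting" concrete, here is a minimal userspace spinlock built on C11's atomic_flag (an illustrative sketch; the kernel's spin_lock adds preemption and interrupt management on top of the same idea). A contending thread loops on test-and-set, burning CPU cycles instead of sleeping:

```c
#include <stdatomic.h>
#include <pthread.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static long counter;

static void demo_spin_lock(void)
{
	/* Busy wait: the CPU spins here doing no useful work until the
	 * previous holder clears the flag. */
	while (atomic_flag_test_and_set_explicit(&lock_flag,
						 memory_order_acquire))
		;
}

static void demo_spin_unlock(void)
{
	atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

static void *bump(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++) {
		demo_spin_lock();
		counter++;             /* protected critical section */
		demo_spin_unlock();
	}
	return NULL;
}

/* Two threads contend for the lock; returns the final count. */
long run_spin_demo(void)
{
	pthread_t a, b;

	counter = 0;
	pthread_create(&a, NULL, bump, NULL);
	pthread_create(&b, NULL, bump, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return counter;  /* 200000 only if the lock actually excludes */
}
```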

Anyway... any update on this? Did you change the code or something?

"But now that you mention it, the steps' configuration structures are pieces of data that use spinlocks and are not expected to change much (because they depend on the user). I hadn't realized this, and it does deserve to be looked into."


ydahhrk (Member) commented Oct 28, 2013

Sorry; I suppose I've been assuming our current status is clear, even though that's probably not the case.

We are currently focused primarily on developing our missing features (#59, #58 and #41, mainly fragmentation - https://github.com/NICMx/NAT64/tree/fragmentation), and also on a proper official website, since we believe these have higher priority than performance issues.

Since this thread shows outside concern about this particular problem, I can offer to delay the current milestone (https://github.com/NICMx/NAT64/issues/milestones) in order to include the replacement of these particular spinlocks.

Now, I can understand if you just want us to do this in the name of elegance and perfection, as we're really aiming for those, but fixing this particular performance concern this early feels a little random. Are you positive these spinlocks are slowing you down? We have run some performance tests (though shamefully we haven't published most of the results), and Jool doesn't look like a bottleneck, even though we're testing on a six-year-old server.

Just to clear things up: I really think only the configuration spinlocks can be switched to RCU or r/w locks. To remove all spinlocks, I think the module would have to be moved to userspace, or at least to some other environment where sleeping is allowed. Such a solution would definitely require more time, so we'd need to postpone it.

What do you say?


ipclouds commented Nov 3, 2013

Like I said in my previous comment, you are making a valid case for using spinlocks, and if your performance results don't show anything wrong with them, maybe you shouldn't change them.

I'm asking whether you are going to change the configuration locks any time soon. I thought you were going to change them in the next release, but I misunderstood.

If you need a website, you should take a look at http://pages.github.com/


ydahhrk (Member) commented Nov 6, 2013

Thank you.
We're not entirely sure of every implication issue #41 has in store for us, but we roughly expect to be handling performance problems in about six months.
I'm updating this issue's labels to remind us of the config spinlocks problem.

@ydahhrk ydahhrk modified the milestones: 3.1.2, 3.1.1 Feb 20, 2014


ydahhrk (Member) commented Mar 6, 2014

We fixed #76 instead of this, so moving to milestone 3.1.3.

@ydahhrk ydahhrk modified the milestones: 3.1.3, 3.1.2 Mar 6, 2014

ydahhrk added a commit that referenced this issue Mar 26, 2014

Revision of the solution to issue #69.
Fixed some concurrence issues and updated some comments.

ydahhrk (Member) commented Mar 26, 2014

Fixed and merged to the master branch; closing.

@ydahhrk ydahhrk closed this Mar 26, 2014
