Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TCP RST handling for FIN RCV #212

Closed
pierky opened this issue Mar 31, 2016 · 12 comments
Closed

TCP RST handling for FIN RCV #212

pierky opened this issue Mar 31, 2016 · 12 comments
Milestone

Comments

@pierky
Copy link
Contributor

pierky commented Mar 31, 2016

Hello,

during some tests I've noticed that many TCP connections move to the V4_FIN_RCV / V6_FIN_RCV states and stay there up to 2 hours (TCP_EST), even if they are actually terminated on the hosts. Stuck connections adversely impact resource usage on a NAT64 box.

As far as I can see from some tcpdump captures TCP RSTs are involved in all these connections, for example RST sent in response to FIN with no more packets to follow.

I know RFC6146 section 3.5.2.2 states that RSTs should be handled only for connections in ESTABLISHED state (moving them to TRANS) but I wonder whether an optional configuration argument can be added to Jool to allow it to process RSTs in FIN RCV states too.

What I mean has already been discussed on BEHAVE:

Why cant the same thing apply to one of the FIN RCV state? Meaning if there is a RST, again all the caveats below apply, but bounce of TRANS see if the RST is accepted and further data flows and either move to CLOSED or come back to the FIN RCV states?

link to message

Of course RST handling needs a special attention about sequencing to mitigate attacks, and this introduces another topic: firewalling/filtering on a NAT64 box. Many considerations have already been written about this too:

A NAT is not a security device and does not implement "security policies."

link to message

firewalling should be kept out of the NAT64 document

link to message

NATs don't "filter". Stateful NATs translate packets for which they have binding state, and drop packets for which they have none.

link to message

I don't know if internally Jool already implements sequencing checks, so if you think my proposal could be taken into account I would also ask

  • if Jools does not implement seq checking, do you think it's worth and "politically correct" to spend some efforts on it to strengthen the FIN RCV -> TRANS transition on RST receipt?
  • do you think that blindly trusting the RST and moving the state to TRANS, leaving to the hosts the validation effort, would suffice?

Thank you for considering my request.

@ydahhrk
Copy link
Member

ydahhrk commented Mar 31, 2016

Well, we have challenged the law before. Mostly thanks to Tore, it seems :)

Jool does not currently track TCP seqs at all (not even the upcoming 3.5 release), but it will have to be implemented eventually if we want to support FTP.

I don't remember the details from when I started trying to fix FTP, but I do recall thinking it was going to be a lot of work. In fact, it might be effort better spent merging Jool with the kernel, because that should enable us access to iptables's seq tracking code. (And also the FTP NAT44 ALG.)

I know RFC6146 section 3.5.2.2 states that RSTs should be handled only for connections in ESTABLISHED state (moving them to TRANS) but I wonder whether an optional configuration argument can be added to Jool to allow it to process RSTs in FIN RCV states too.

What I mean has already been discussed on BEHAVE:

Before I say anything else, I'd like to point out that Dmitry Anipko did not suggest the state should be moved to TRANS; he merely proposed to shorten the timer: From "the maximum session lifetime" (usually TCP_EST) to "the established connection idle timeout" (usually TCP_TRANS).

The distinction is important because the FIN RCV states are meant to never return to ESTABLISHED, while TRANS does.

Edit: Oops. I failed you realize you were linking to Murari Sridharan's message, not to Dmitry's. I do not understand why Murari changed the idea, but see below.

do you think that blindly trusting the RST and moving the state to TRANS, leaving to the hosts the validation effort, would suffice?

It won't work as advertised 100% of the time:

A -------- N ----+--- B
                 |
                 +--- M

A = IPv4 node
N = NAT64
B = IPv6 node
M = Malicious node

Let's say A and B are exchanging packets. N is in the ESTABLISHED state.

  • A has nothing left to send, so it sends v4 FIN. B is still sending packets, so it doesn't. N therefore hangs in the V4 FIN RCV state, waiting for B's FIN. (This is normal.)
  • M sends a RST to A in B's name.
  • N moves to TRANS. Let's say the sequence number is not right so N could have technically detected the hijack.
  • A (or a firewall between A and N) detects the problem and drops the packet.
  • B keeps sending packets so N moves from TRANS to ESTABLISHED.

We have a problem now; N is in ESTABLISHED even though it should still be in V4 FIN RCV.

  • B eventually shuts up and sends FIN. N moves to V6 FIN RCV.
  • N hangs at V6 FIN RCV for two hours because A's FIN happened a long time ago...

Dmitry's solution doesn't have this problem as far as I can tell. Its only drawback is that the RST is going to break the connection if the endpoints do not exchange data during the TCP_TRANS period.

(But if my home's ISP is of any indication, this is normal even without NAT64s in the way :P. Then again, it's pretty brittle in general.)

I really don't have any objections to coding this, but it would not be enabled by default.

if Jools does not implement seq checking, do you think it's worth and "politically correct" to spend some efforts on it to strengthen the FIN RCV -> TRANS transition on RST receipt?

Noooooope :). Seqnum tracking is a lot of code for just this purpose, and since the filtering topic raised a lot of controversy I'd say it's fairly safe to assume the IETF rejected the idea during one of the meetings. (I cannot say the same for Dmitry's proposal however, as it kind of looks like they forgot about it.)

But if we end up implementing seq checking anyway due to FTP, then improving RST analysis immediately after would be rather inevitable.


tl;dr

I'm in for implementing Dmitry's proposal, which is not about moving to TRANS from FIN RCV.

Filtering bad RSTs may eventually happen, but I'd say it's unlikely in the near future. (unless someone can propose some clever code)

Objections?

@pierky
Copy link
Contributor Author

pierky commented Apr 1, 2016

Thanks @ydahhrk for your detailed reply.

Before I say anything else, I'd like to point out that Dmitry Anipko did not suggest the state should be moved to TRANS; he merely proposed to shorten the timer: From "the maximum session lifetime" (usually TCP_EST) to "the established connection idle timeout" (usually TCP_TRANS).

Well it would be perfect, actually I don't know how I ended up focusing on TRANS transition, what I was really looking for was the Dmitry Anipko's solution.

I totally agree with your conclusions, I'm ready to test the new code as soon as you release it.

Do you plan to introduce a new (non-RFC6146) state to also handle the case data keep flowing after the RST? I mean something that allows to revert the timer to TCP_EST.

@ydahhrk
Copy link
Member

ydahhrk commented Apr 1, 2016

Do you plan to introduce a new (non-RFC6146) state to also handle the case data keep flowing after the RST? I mean something that allows to revert the timer to TCP_EST.

See, that's the beauty of Dmitry's solution: I don't have to. Aside from the RST special handling, The FIN RCVs already do what we want them to, so we never need to leave them.

These are the two possible outcomes (that I know of) that involve a RST during one of the FIN RCV states:

1: B is legitimately trying to end the connection using an RST

Background: N is in V4 FIN RCV. Accordingly, B is still sending data, and A isn't.

  1. B is done, so it sends a v6 RST.
  2. N remains in V4 FIN RCV but shortens the timer.
  3. Nobody sends any more data, so the state dies naturally after a short while. (From V4 FIN RCV: "If the lifetime expires, the Session Table Entry is deleted, and the state is moved to CLOSED.")

Well, M can keep the session alive by sending more forged packets, but the state machine was already vulnerable to this anyway.

2: M is trying to tamper the connection by sneaking an RST in

Same background as 1.

  1. M sends a forged v6 RST.
  2. N remains in V4 FIN RCV but shortens the timer.
  3. B is still sending packets so N enlarges the timeout again. (From V4 FIN RCV: "If any packet other than the V6 FIN [or v6 RST in our case] is received, (...) The lifetime of the TCP Session Table Entry is set to at least the maximum session lifetime. (...) The state remains unchanged as V4 FIN RCV.")

We're now back to where we were before the RST; M's attack did nothing.

Assuming, that is, that B has something to send within 4 minutes. But if you're seeing lots of clients thinking they're being clever by ending with RST instead of FIN, breaking the ones that fulfill all of the following requirements to achieve better resource utilization is probably a reasonable tradeoff:

  • Are idle. (more than TCP_TRANS time between packets)
  • One side has already sent a FIN.
  • Are being RST-attacked.

(It can't be the default, though, because it encourages sloppy TCP stacks.)

I'll get to coding.

@ydahhrk ydahhrk added this to the 3.4.3 milestone Apr 1, 2016
@ydahhrk
Copy link
Member

ydahhrk commented Apr 1, 2016

I forgot to mention this issue in the commit message :C

This is the code: https://github.com/NICMx/Jool/tree/issue212

sudo jool --handle-rst-during-fin-rcv true

(It's still missing the man page entry, but not --help)

@pierky
Copy link
Contributor Author

pierky commented Apr 8, 2016

Today I've tested it for just a few minutes and it seemed to be working as expected. If you want you can close this issue. I'll open a new one if I'll find any problem. Many thanks

@ydahhrk
Copy link
Member

ydahhrk commented Apr 14, 2016

Thank you

I'll get the 3.4.3 release started on monday :)

@ydahhrk ydahhrk added the Status: Tested Needs release label Apr 20, 2016
@ydahhrk
Copy link
Member

ydahhrk commented Apr 20, 2016

Want credits in the README? (If so, please state what you'd like included)

(Edit: Rats. I should have asked via mail. My bad.)

@pierky
Copy link
Contributor Author

pierky commented Apr 20, 2016

Oh thanks, why not? Can't help with code but I try to do it in other ways.

* [Pier Carlo Chiodi](https://pierky.com/)

@ydahhrk ydahhrk removed the Status: Tested Needs release label Apr 25, 2016
@ydahhrk
Copy link
Member

ydahhrk commented Apr 25, 2016

3.4.3 released; closing.

@ydahhrk ydahhrk closed this as completed Apr 25, 2016
@laura-zelenku
Copy link
Contributor

Hello @ydahhrk, @pierky. It's quite a while already but ...
In our project I've found quite strange situation and thinking why is BIB table reaching its maximum for TCP traffic. As a fix we've used the handle-rst-during-fin-rcv flag. But it didn't solve the issue.

In the tcpdump we've found following:

  • connection is initiated by client (SYN, SYN + ACK, ACK)
  • data is sent by client and server ACK
  • server sends FIN
  • server sends RST right after FIN (but RST should be sent by client)
  • no more traffic

Jool session will end up with EST timeout. I would expect after FIN and RST (from any IPv4 or IPv6) session will be with TRANS timeout. Also the documentation says: If you activate handle-rst-during-fin-rcv, Jool will assign the transitory lifetime to connections ended with a FIN followed by a RST.
I'm thinking about to create new issue here but would like to ask before. I've already looked into code that the fix is possible.

@ydahhrk
Copy link
Member

ydahhrk commented Aug 26, 2021

Ok. I'll test it tomorrow, since I don't hace my equipment today.

The code does look like it should be working as advertised, though I won't know until I've tested it:

https://github.com/NICMx/Jool/blob/master/src/mod/common/steps/filtering_and_updating.c#L359-L362
https://github.com/NICMx/Jool/blob/master/src/mod/common/steps/filtering_and_updating.c#L384-L387

I've already looked into code that the fix is possible.

Did you find a solution? A pull request would be welcomed.

@laura-zelenku
Copy link
Contributor

I've created PR #364 but I'm still not sure about whether this is valid case. I mean if the traffic flow I've spotted is or isn't against RFC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants