Server-side Stateless Challenges #832

Merged
merged 7 commits into from Jun 23, 2016

3 participants
@dionrhys
Member

dionrhys commented May 5, 2016

Server-side Stateless Challenges

This pull request replaces the stateful server-side client connection challenge system with a stateless one in order to reduce the effectiveness of distributed denial-of-service (DDoS) attacks employing spoofed IP addresses.

The original stateful challenge system is problematic because it holds a finite list of clients that are currently challenging with the server. These clients have not yet proven that they can establish two-way communication because they have only sent one packet, and a packet's source address can be easily spoofed over the internet. This means a distributed attack from several hosts sending "getchallenge" packets with spoofed source IP addresses can easily fill up the list of clients that are currently challenging. This, in effect, prevents legitimate game clients from connecting to the server during an attack.
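As a rough illustration of the problem described above, here is a toy model of a fixed-size pending-challenge list. The class name, the capacity of 4, and the evict-oldest policy are all assumptions made for this sketch; the description above only says the list is finite and can be filled.

```python
from collections import OrderedDict
import random


class StatefulChallengeTable:
    """Toy model of a finite list of pending challenges (hypothetical)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pending = OrderedDict()  # source address -> challenge

    def getchallenge(self, address: str) -> int:
        # Assumed policy for this sketch: when the list is full, drop the oldest entry.
        if address not in self.pending and len(self.pending) >= self.capacity:
            self.pending.popitem(last=False)
        challenge = random.getrandbits(31)
        self.pending[address] = challenge
        return challenge

    def connect(self, address: str, challenge: int) -> bool:
        # The client is only accepted if its challenge is still in the list.
        return self.pending.get(address) == challenge


if __name__ == "__main__":
    table = StatefulChallengeTable(capacity=4)
    legit = table.getchallenge("203.0.113.7")

    # Spoofed "getchallenge" packets arrive before the legitimate "connect".
    for i in range(4):
        table.getchallenge(f"198.51.100.{i}")

    print(table.connect("203.0.113.7", legit))
```

In this model the legitimate client's entry has been pushed out by the time it replies, so its "connect" is rejected even though it did nothing wrong.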

With a stateless challenge system, the server stores no state after the client's initial packet, so flooding it with "getchallenge" requests cannot exhaust a list that no longer exists. In effect, this makes the application layer resistant to DDoS attacks employing "getchallenge" packets with spoofed source addresses. Only traffic that has proven two-way communication with the server can get past the challenge stage and be given state on the server.

This patch doesn't provide any additional protection against a distributed attack from genuine (non-spoofed) IP addresses. However, because attacks with spoofed source IP addresses are rendered ineffective at the application layer, any address that gets past the challenge stage is guaranteed to be genuine, so the server administrator can safely filter out malicious IP addresses without the risk of blocking a forged one. This is best done with a firewall solution somewhere along the network stack.

This patch doesn't prevent an attacker from overwhelming the processing power or network bandwidth of the server - it merely ensures legitimate game clients still have the opportunity to connect during an attack.

Technical Details

Instead of generating random challenges and storing those in a list of currently-challenging clients, this stateless challenge system generates unforgeable, temporal challenges using cryptography.

In essence:

Challenge = HMAC(SecretKey, ClientAddress + Timestamp)

The code uses this algorithm to create a challenge when the server receives a "getchallenge" packet and to verify a challenge when the server receives a "connect" packet.
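The create/verify symmetry can be sketched as follows. This is a minimal illustration, not the code in this pull request: the function names, the address string format, and the byte encoding of the message are assumptions, and the full 16-byte HMAC-MD5 digest is returned here without the truncation described further down.

```python
import hashlib
import hmac
import secrets

# Hypothetical: one secret key, generated once when the server starts.
SECRET_KEY = secrets.token_bytes(32)


def challenge_tag(secret_key: bytes, client_address: str, timestamp: int) -> bytes:
    """HMAC(SecretKey, ClientAddress + Timestamp), using HMAC-MD5."""
    # The exact message encoding is an assumption made for this sketch.
    message = client_address.encode("ascii") + timestamp.to_bytes(4, "little")
    return hmac.new(secret_key, message, hashlib.md5).digest()


def on_getchallenge(secret_key: bytes, client_address: str, timestamp: int) -> bytes:
    """Create a challenge. Nothing is stored on the server."""
    return challenge_tag(secret_key, client_address, timestamp)


def on_connect(secret_key: bytes, client_address: str, timestamp: int,
               presented: bytes) -> bool:
    """Verify a challenge by recomputing it from the packet's source address."""
    expected = challenge_tag(secret_key, client_address, timestamp)
    return hmac.compare_digest(expected, presented)


if __name__ == "__main__":
    tag = on_getchallenge(SECRET_KEY, "203.0.113.7:29070", 5)
    # Same address and timestamp as when the challenge was created.
    print(on_connect(SECRET_KEY, "203.0.113.7:29070", 5, tag))
    # Same challenge presented from a different source address.
    print(on_connect(SECRET_KEY, "198.51.100.9:29070", 5, tag))
```

Because verification recomputes the value from the secret key and the packet's own source address, the server needs no per-client record between the two packets.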

When a server is started for the first time, a random secret key is generated to be used in the HMAC calculations for creating challenges. This ensures that the challenges given to clients can't be predicted ahead-of-time as they're based on this ephemeral secret key that only the server knows. This secret key stays the same until the server is restarted manually (after a quit or killserver), or until the engine forcefully restarts the server after 21 days of uptime (which isn't new behaviour here).

The client's source address is fed into the HMAC message so that each challenge is unique to each client.

Challenges have a limited time of validity to prevent replay attacks from an attacker gathering a large number of challenges from various valid addresses. This is done by adding a timestamp into the HMAC message that gets periodically incremented. In this implementation, the timestamp increments every 2^14 (16384) milliseconds. This gives a client at least ~16 seconds to respond to a challenge. If the timestamp happens to increment just between the client being sent their challenge and replying to it, the server can deal with this because the challenge contains a flag marking whether the timestamp at the time was odd or even. If the client's flag does not match the current timestamp's flag, the server will assume the client had the previous timestamp and will verify against that one instead.
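The timestamp arithmetic can be sketched like this. The function names are hypothetical; only the 14-bit shift and the odd/even rule come from the description above. Under this rule a challenge stays acceptable until the end of the period after the one it was issued in, so its lifetime is somewhere between roughly 16.4 and 32.8 seconds depending on when in the period it was issued.

```python
TIMESTAMP_SHIFT = 14  # 2**14 = 16384 ms per timestamp period


def timestamp_period(server_time_ms: int) -> int:
    """Coarse timestamp that increments every 16384 ms."""
    return server_time_ms >> TIMESTAMP_SHIFT


def period_to_verify(server_time_ms: int, challenge_flag: int) -> int:
    """Pick the timestamp to recompute the challenge with.

    challenge_flag is the odd/even bit carried in the client's challenge.
    """
    current = timestamp_period(server_time_ms)
    if (current & 1) == challenge_flag:
        return current
    # Flags differ: assume the challenge was issued in the previous period.
    return current - 1


if __name__ == "__main__":
    issued_at_ms = 20000
    flag = timestamp_period(issued_at_ms) & 1
    for connect_at_ms in (30000, 33000, 50000):
        print(connect_at_ms, period_to_verify(connect_at_ms, flag))
```

A challenge that is two or more periods old is recomputed with a timestamp other than the one it was issued with, so the recomputed HMAC will not match and the connection attempt is rejected as stale.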

In order to keep backwards-compatibility with base game clients, the challenge must be a 32-bit integer as they can only handle 32-bit numeric challenges. Unfortunately, this means discarding most of the output of the HMAC and only retaining 31 bits. The other 1 bit is used to flag whether the challenge timestamp is odd or even, so that the server knows when verifying the challenge whether to use its current timestamp or the previous timestamp (as discussed in the previous paragraph). In this case, the highest bit of the challenge holds the timestamp flag and the remaining 31 bits come from bits 1-31 of the HMAC output.
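A minimal sketch of the 32-bit layout follows. Which 31 bits of the digest are kept and the byte order used to read them are assumptions made here for illustration, as are the function names.

```python
FLAG_BIT = 0x80000000   # highest bit: odd/even timestamp flag
HMAC_MASK = 0x7FFFFFFF  # remaining 31 bits: taken from the HMAC output


def pack_challenge(digest: bytes, timestamp_period: int) -> int:
    """Build a 32-bit challenge from an HMAC digest and the timestamp parity."""
    # Assumption: use the first four digest bytes, read as little-endian.
    hmac_bits = int.from_bytes(digest[:4], "little") & HMAC_MASK
    flag = (timestamp_period & 1) << 31
    return hmac_bits | flag


def challenge_flag(challenge: int) -> int:
    """Recover the odd/even timestamp flag from a challenge."""
    return (challenge >> 31) & 1


def as_signed32(challenge: int) -> int:
    """View the challenge as a signed 32-bit integer, as it appears in packets."""
    return challenge - (1 << 32) if challenge & FLAG_BIT else challenge


if __name__ == "__main__":
    digest = bytes.fromhex("0123456789abcdef0123456789abcdef")
    for period in (2, 3):
        challenge = pack_challenge(digest, period)
        print(period, hex(challenge), as_signed32(challenge), challenge_flag(challenge))
```

When the challenge is printed as a signed integer, a negative value simply means the flag bit is set, that is, the challenge was issued during an odd timestamp period.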

This HMAC-based stateless challenge system is modelled on the stateless cookie exchange in RFC 6347 (DTLS 1.2).

Anticipated Questions

Q: Why are you using MD5? It's not secure!

A: OpenJK already contains an implementation of the MD5 hashing algorithm from an upstream contribution in ioq3 and I wasn't convinced of a need to add a dependency for a newer algorithm here. It's true that MD5 is not collision-resistant and thus isn't suitable for applications that rely on collision-resistance, such as TLS certificates or digital signatures. However, this implementation of stateless challenge relies on HMAC-MD5, and MD5's vulnerability to collisions does not affect the security of the HMAC construct.

Informational RFC 6151 concludes:

The attacks on HMAC-MD5 do not seem to indicate a practical vulnerability when used as a message authentication code.

Additionally, since the stateless challenge implementation has to keep backwards-compatibility with base game clients, the HMAC output has to be truncated to a 32-bit integer since base game clients can only handle numeric 32-bit challenges. Adding a dependency for a newer hashing algorithm wouldn't improve security since the code has to discard the large majority of the HMAC's output bits anyway.

Q: Why did you remove sv_minPing and sv_maxPing?

A: sv_minPing and sv_maxPing relied on measuring the time it took between the server receiving the client's initial "getchallenge" packet and the server receiving the client's subsequent "connect" packet. Since the premise of stateless challenges is that no state is stored on the server before the client has successfully challenged, the server can't record the time when it received the first packet.

Regardless, the process of measuring RTT (ping) solely between the client's first two packets is flawed (as seen in #776). There isn't enough of a sample to get an accurate figure for the client's true latency. Game mods may wish to reimplement sv_minPing and sv_maxPing by sampling the client's mean ping while they're active in the game and then taking appropriate action if their ping is outside the range for a certain period of time.
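One way a mod might approach this is sketched below. It is purely illustrative: the class name, window size, and grace period are hypothetical, and none of this is engine or mod API. The idea is to average recent ping samples and only flag a client once the mean has stayed outside the allowed range for a sustained period.

```python
from collections import deque


class PingMonitor:
    """Tracks a client's mean ping over a rolling window (hypothetical helper)."""

    def __init__(self, min_ping: int, max_ping: int,
                 window: int = 20, grace_ms: int = 10000):
        self.min_ping = min_ping
        self.max_ping = max_ping
        self.grace_ms = grace_ms
        self.samples = deque(maxlen=window)
        self.out_of_range_since = None

    def add_sample(self, ping_ms: int, now_ms: int) -> bool:
        """Record a sample; return True once the client should be acted on."""
        self.samples.append(ping_ms)
        mean = sum(self.samples) / len(self.samples)
        if self.min_ping <= mean <= self.max_ping:
            self.out_of_range_since = None
            return False
        if self.out_of_range_since is None:
            self.out_of_range_since = now_ms
        return now_ms - self.out_of_range_since >= self.grace_ms


if __name__ == "__main__":
    monitor = PingMonitor(min_ping=0, max_ping=200, window=4, grace_ms=3000)
    for now_ms, ping in [(0, 100), (1000, 500), (2000, 500), (4000, 500)]:
        print(now_ms, monitor.add_sample(ping, now_ms))
```

Averaging over a window avoids the single-sample problem described above, and the grace period keeps a brief spike from triggering any action.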

Q: Will this cause lag when clients connect?

A: Nope! The expensive part of the implementation is generating a secret key for the HMAC construct and this is only done once, when a server is started. The same secret key will be used until the server is restarted manually (after a quit or killserver), or until the engine forcefully restarts the server after 21 days of uptime (which isn't new behaviour here). Generating the secret key takes less than a millisecond typically, but it was found to cause noticeable blips on a client lagometer if it was being done during gameplay.

The HMAC calculation for generating and verifying challenges when clients connect is very fast and has a negligible performance impact.

dionrhys added some commits Sep 4, 2015

Switch to varying HMAC message by time instead of revolving secret keys every interval

The performance cost of generating new secret keys during gameplay was found to be non-negligible and caused observable blips in a client-side lagometer when connected to a stateless challenge server. This new method instead generates a secret key for the challenge system at each SV_Startup() (each map load). The message used as input into the HMAC construct now incorporates svs.time right-shifted by 14 bits in order to give a timestamp value that increments every 16384 milliseconds. The highest bit of the challenge integer stores whether the timestamp value was calculated at an odd or even timestamp period to allow recreation of a client's expected challenge when verifying, even if the server's timestamp has incremented in the meantime.

Fix strict-aliasing issue spotted by gcc

The byteAlias_t union type-punning method from q_shared.h isn't defined in the C++ standard (but it is in C), so I've opted for the safer memcpy option. The compiled, optimised assembler code is equivalent.

Remove outbound bandwidth rate limit on getchallenge

There's not much point in stateless challenges if we allow denial-of-service by throttling getchallenge based on outbound bandwidth! The worst-case amplification that can be achieved by getchallenge is ~2.6x (12 bytes "getchallenge" inbound = 31 bytes "challengeResponse -2111222333 0" outbound).

The per-IP rate limit is kept to prevent reflection attacks.
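The amplification figure quoted in the commit message can be checked with a few lines of arithmetic. This sketch counts only the command text shown in the message, as the commit message does; any packet header or transport overhead is left out, and the function name is hypothetical.

```python
def amplification_factor(request: bytes, response: bytes) -> float:
    """Ratio of outbound to inbound payload size."""
    return len(response) / len(request)


request = b"getchallenge"
response = b"challengeResponse -2111222333 0"
factor = amplification_factor(request, response)

if __name__ == "__main__":
    print(len(request), len(response), round(factor, 1))
```

The example response is the worst case because it uses the longest decimal form a 32-bit challenge can take: a minus sign followed by ten digits.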

@dionrhys dionrhys changed the title from Stateless challenge to Server-side Stateless Challenges May 5, 2016

Contributor

Yberion commented Jun 8, 2016

Why no push?

Member

ensiform commented Jun 9, 2016

Untested etc

@dionrhys dionrhys merged commit db45a09 into JACoders:master Jun 23, 2016

1 check passed

continuous-integration/appveyor/pr AppVeyor build succeeded