How are (erroneous) non-ASCII ALPN strings handled? #16

johnthacker · 2023-10-17T13:52:00Z

"The first and last characters of the ALPN (Application-Layer Protocol Negotiation) first value... If there are no ALPN values or no ALPN extension then we print “00” as the value in the fingerprint."

The ALPN first value is technically not a string. It is an "IANA-registered, opaque, non-empty byte string". As described further, it is the "precise set of octet values that identifies the protocol. This could be the UTF-8 encoding of the protocol name."

Currently all registered values, aside from some reserved values, are ASCII. However, if fuzzed or erroneous data contains non-ASCII bytes, how should that be displayed?

If we treat it as "probably UTF-8" based on the recommendation, does "first and last characters" refer to UTF-8 characters, possibly multibyte? Or does it refer to octets?

If a handshake is encountered whose first and last octets of the ProtocolName opaque byte string are not printable ASCII characters (whether because someone registers a UTF-8 Identification Sequence or because of errors in the capture), how should it be handled? Should the non-printable bytes be escaped, changing the length of the JA4 string? Should they be replaced with single characters like '?' or with a (multibyte) UTF-8 REPLACEMENT CHARACTER? Should the ALPN portion of the JA4 string be replaced with "00" as in the cases where it is missing?
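To make the "characters versus octets" question concrete, here is a minimal Python sketch. The ALPN value used is hypothetical (it is not a registered protocol ID), and decoding with `errors="replace"` is just one possible way to treat the bytes as "probably UTF-8"; nothing here is taken from an existing JA4 implementation.

```python
# Hypothetical ALPN value: the UTF-8 encoding of "\u00e9" followed by "2".
# This is an illustration only, not a registered protocol ID.
alpn = "\u00e92".encode("utf-8")  # three octets: 0xC3 0xA9 0x32

# Reading "first and last" as octets.
first_octet, last_octet = alpn[0], alpn[-1]

# Reading "first and last" as UTF-8 characters.
# errors="replace" substitutes U+FFFD for any invalid byte sequence.
text = alpn.decode("utf-8", errors="replace")
first_char, last_char = text[0], text[-1]

print(hex(first_octet), hex(last_octet))
print(repr(first_char), repr(last_char))
```

With the octet reading the first unit is the single byte 0xC3, while with the character reading it is the two-byte character U+00E9, so the two interpretations give different fingerprints for the same handshake.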

john-althouse · 2023-10-17T19:23:06Z

This is a great question, thanks for bringing this up!

For example purposes, let's say the first ALPN value is 0xAB 0xCD.

To handle this edge case we could simply place a "99" in the JA4 string to denote an unknown non-ASCII ALPN value.

Or, for non-ASCII ALPN values, we could take the first high-nibble (A) and the last low-nibble (D). So the ALPN value in the JA4 string would be "ad".

I like the latter option. What do you think?
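The two options above can be sketched in Python as follows. The function names are hypothetical, and the sketch makes assumptions the thread does not settle: it tests only the first and last bytes, it treats the range 0x20 to 0x7E as "printable ASCII", and it maps an empty value to "00" like a missing ALPN extension.

```python
def _is_printable_ascii(byte: int) -> bool:
    # Assumption: "printable ASCII" means 0x20 (space) through 0x7E (~).
    return 0x20 <= byte <= 0x7E


def alpn_option_99(value: bytes) -> str:
    """Option 1: emit the fixed marker "99" for a non-ASCII ALPN value."""
    if not value:
        return "00"  # assumption: empty is treated like a missing ALPN
    if _is_printable_ascii(value[0]) and _is_printable_ascii(value[-1]):
        return chr(value[0]) + chr(value[-1])
    return "99"


def alpn_option_nibbles(value: bytes) -> str:
    """Option 2: high nibble of the first byte, low nibble of the last byte."""
    if not value:
        return "00"  # assumption: empty is treated like a missing ALPN
    if _is_printable_ascii(value[0]) and _is_printable_ascii(value[-1]):
        return chr(value[0]) + chr(value[-1])
    # Two lowercase hex digits, so the JA4 string keeps its length.
    return f"{value[0] >> 4:x}{value[-1] & 0x0F:x}"


example = bytes([0xAB, 0xCD])
print(alpn_option_99(example))       # 99
print(alpn_option_nibbles(example))  # ad
```

Both options keep the ALPN portion at two characters. The nibble variant preserves eight bits of the original value, whereas "99" collapses every non-ASCII value into one marker.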

johnthacker · 2023-10-17T22:16:25Z

Either way is fine with me. If you like the latter option that works.

As suggested in FoxIO-LLC/ja4#16 use first high-nibble and the last low-nibble for non printable ALPN values. Fixes: 19401 (cherry picked from commit 48cd7f9)

vvv · 2023-12-10T23:42:11Z

Sample capture file:
📎 tls-non-ascii-alpn.pcapng.gz

vvv · 2023-12-12T13:46:28Z

> For example purposes, let's say the first ALPN value is 0xAB 0xCD.
>
> To handle this edge case we could simply place a "99" in the JA4 string to denote an unknown non-ASCII ALPN value.
>
> Or, for non-ASCII ALPN values, we could take the first high-nibble (A) and the last low-nibble (D). So the ALPN value in the JA4 string would be "ad".
>
> I like the latter option.

@john-althouse The nibble approach sees no difference between "h2" and, say, "h\xf2". Still, it's more sensitive than the 99 approach.

igr001-galactica · 2023-12-15T16:35:36Z

This should be fixed now with recent changes to Rust and Python.

geraldcombs pushed a commit to wireshark/wireshark that referenced this issue Oct 21, 2023

TLS: JA4 fix non printable ALPN values

48cd7f9

As suggested in FoxIO-LLC/ja4#16 use first high-nibble and the last low-nibble for non printable ALPN values. Fixes: 19401

john-althouse self-assigned this Nov 8, 2023

john-althouse added the "enhancement" (New feature or request) label Nov 8, 2023

vvv added a commit to vvv/ja4 that referenced this issue Dec 12, 2023

rust: Handle non-ASCII ALPN strings

847cfa5

Related issue: FoxIO-LLC#16

vvv mentioned this issue Dec 12, 2023

[rust] Handle non-ASCII ALPN strings #32

Merged

vvv added a commit to vvv/ja4 that referenced this issue Dec 12, 2023

rust: Handle non-ASCII ALPN strings

f6ef842

Related issue: FoxIO-LLC#16

igr001-galactica pushed a commit that referenced this issue Dec 15, 2023

[rust] Handle non-ASCII ALPN strings (#32)

421917f

* rust: Support tshark v4.2.0 * rust: Handle non-ASCII ALPN strings Related issue: #16

igr001-galactica closed this as completed Dec 15, 2023

noeltimothy added a commit to noeltimothy/ja4 that referenced this issue Dec 20, 2023

fix for issue FoxIO-LLC#16

fca3b3b

igr001-galactica pushed a commit that referenced this issue Dec 20, 2023

Fixes non-ascii ALPN values (#37)

2636182

* fix for issue #21, original order * fix for issue #30 ALPN values * fix for issue #16

igr001-galactica pushed a commit that referenced this issue Jan 4, 2024

Fix for issue #6 (#43)

96a9e7a

* fix for issue #21, original order * fix for issue #30 ALPN values * fix for issue #16 * fix for issue #6 * update to check Cookie and Referer for all cases

igr001-galactica pushed a commit that referenced this issue Jan 4, 2024

Skip delegated credentials (#46)

747d870

* fix for issue #21, original order * fix for issue #30 ALPN values * fix for issue #16 * fix for issue #6 * update to check Cookie and Referer for all cases * fixes #41

lrstewart mentioned this issue Aug 21, 2024

Clarify alpn edge case handling #147

Closed

p-l- mentioned this issue Aug 21, 2024

Question about the ALPN bytes in JA4 #148

Closed
