How are (erroneous) non-ASCII ALPN strings handled? #16

Closed
johnthacker opened this issue Oct 17, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@johnthacker

ALPN Extension Value:

"The first and last characters of the ALPN (Application-Layer Protocol Negotiation) first value... If there are no ALPN values or no ALPN extension then we print “00” as the value in the fingerprint."

The ALPN first value is technically not a string. It is an "IANA-registered, opaque, non-empty byte string". As described further, it is the "precise set of octet values that identifies the protocol. This could be the UTF-8 encoding of the protocol name."
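As a reference point for the question, here is a minimal sketch of the rule as quoted, treating each ALPN value as the raw octet string the RFC describes. The function name `alpn_chars` is hypothetical and this is not the JA4 reference implementation. The ASCII decode is exactly the step whose behavior on non-ASCII input is in question here.

```python
def alpn_chars(alpn_values: list[bytes]) -> str:
    """Hypothetical sketch: the two-character ALPN field, ASCII input only.

    alpn_values: ProtocolName entries from the ALPN extension, in the order
    the client sent them, as raw (opaque) octet strings.
    """
    # No ALPN extension, or no values in it: print "00".
    # Assumption: an empty first value (malformed, since ProtocolName must
    # be non-empty) is treated the same way.
    if not alpn_values or not alpn_values[0]:
        return "00"
    first = alpn_values[0]
    # Assumes ASCII: raises UnicodeDecodeError for any octet >= 0x80,
    # which is the case this issue asks about.
    text = first.decode("ascii")
    return text[0] + text[-1]


print(alpn_chars([b"h2", b"http/1.1"]))  # h2
print(alpn_chars([b"http/1.1"]))         # h1
print(alpn_chars([]))                    # 00
```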

Currently all registered values, aside from some reserved values, are ASCII. However, if fuzzed or erroneous data contains a non-ASCII value, how should it be displayed?

If we treat it as "probably UTF-8" based on the recommendation, does "first and last characters" refer to UTF-8 characters, possibly multibyte? Or does it refer to octets?
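To make the two readings concrete, the snippet below uses a made-up ProtocolName (not a registered value): the UTF-8 encoding of "é" followed by "1". Taking octets and taking decoded UTF-8 characters give different "first characters".

```python
# Made-up ProtocolName for illustration: "é" followed by "1", as UTF-8.
value = ("\u00e9" + "1").encode("utf-8")   # b'\xc3\xa91', 3 octets

# Reading 1: "first and last characters" means first and last octets.
first_octet, last_octet = value[0], value[-1]
print(hex(first_octet), hex(last_octet))    # 0xc3 0x31

# Reading 2: decode as UTF-8 and take code points (possibly multibyte).
text = value.decode("utf-8")                # 2 characters
print(f"U+{ord(text[0]):04X}", text[-1])    # U+00E9 1

# The readings disagree on the first character: a lone octet 0xc3, which is
# not valid UTF-8 on its own, versus 'é', which is two octets when encoded.
print(len(text[0].encode("utf-8")))         # 2
```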

If a handshake is encountered where the first and last octets of the ProtocolName opaque byte string are not printable ASCII characters (whether because someone registers a UTF-8 Identification Sequence or because of errors in the capture), how should it be handled? Should the non-printable bytes be escaped, changing the length of the JA4 string? Should they be replaced with single characters like '?' or with a (multibyte) UTF-8 REPLACEMENT CHARACTER? Or should the ALPN portion of the JA4 string be replaced with "00", as in the cases where it is missing?
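The options in the question can be compared side by side. The helper `candidates` below is hypothetical and exists only for comparison; "printable" is assumed to mean the ASCII range 0x20 to 0x7E, and none of the four outputs is claimed to be what any JA4 implementation does.

```python
def candidates(value: bytes) -> dict[str, str]:
    """Hypothetical helper: render the first/last octets under each option."""
    first, last = value[0], value[-1]

    def printable(b: int) -> bool:
        # Assumed definition: printable ASCII is 0x20 (space) through 0x7E (~).
        return 0x20 <= b <= 0x7E

    def render(b: int, fallback: str) -> str:
        return chr(b) if printable(b) else fallback

    both_printable = printable(first) and printable(last)
    return {
        # Escape each bad octet as \xNN: the field length is no longer fixed.
        "escape": render(first, f"\\x{first:02x}") + render(last, f"\\x{last:02x}"),
        # Substitute a single ASCII '?': always two characters, two bytes.
        "question_mark": render(first, "?") + render(last, "?"),
        # Substitute U+FFFD: two characters, but 3 UTF-8 bytes each.
        "replacement_char": render(first, "\ufffd") + render(last, "\ufffd"),
        # Fall back to "00", the same output as a missing ALPN extension.
        "double_zero": chr(first) + chr(last) if both_printable else "00",
    }


for name, field in candidates(b"\xab\xcd").items():
    # ascii() keeps the printed line ASCII-only, so U+FFFD shows as \ufffd.
    print(f"{name:16} {ascii(field):16} chars={len(field)} "
          f"utf8_bytes={len(field.encode('utf-8'))}")
```

For 0xAB 0xCD, escaping yields an 8-character field, '?' keeps it at two characters and two bytes, two U+FFFD characters occupy 6 bytes once the JA4 string is UTF-8 encoded, and the "00" fallback makes the value indistinguishable from a missing ALPN extension.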

@john-althouse
Collaborator

This is a great question, thanks for bringing this up!

For example purposes, let's say the first ALPN value is 0xAB 0xCD.

To handle this edge case we could simply place a "99" in the JA4 string to denote an unknown non-ASCII ALPN value.

Or, for non-ASCII ALPN values, we could take the first high-nibble (A) and the last low-nibble (D). So the ALPN value in the JA4 string would be "ad".

I like the latter option. What do you think?
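Both proposals can be sketched as follows. The function names are hypothetical, and "non-ASCII" is assumed to mean that at least one octet of the first ALPN value is 0x80 or above.

```python
def is_ascii_value(value: bytes) -> bool:
    # Assumption: "non-ASCII" means at least one octet is 0x80 or above.
    return all(b < 0x80 for b in value)


def alpn_part_99(value: bytes) -> str:
    """Option 1: a fixed "99" marker for any non-ASCII first ALPN value."""
    if is_ascii_value(value):
        return chr(value[0]) + chr(value[-1])
    return "99"


def alpn_part_nibbles(value: bytes) -> str:
    """Option 2: high nibble of the first octet, low nibble of the last."""
    if is_ascii_value(value):
        return chr(value[0]) + chr(value[-1])
    # Each nibble is 0..15, so format(..., "x") yields one lowercase hex digit.
    return format(value[0] >> 4, "x") + format(value[-1] & 0x0F, "x")


print(alpn_part_99(b"\xab\xcd"))       # 99
print(alpn_part_nibbles(b"\xab\xcd"))  # ad
print(alpn_part_nibbles(b"h2"))        # h2
```

The nibble output is the same as the first and last characters of the value's lowercase hex string: `b"\xab\xcd".hex()` is `"abcd"`, whose ends are "a" and "d".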

@johnthacker
Author

Either way is fine with me. If you like the latter option that works.

geraldcombs pushed a commit to wireshark/wireshark that referenced this issue Oct 21, 2023
As suggested in FoxIO-LLC/ja4#16 use first high-nibble
and the last low-nibble for non printable ALPN values.

Fixes: 19401
geraldcombs pushed a commit to wireshark/wireshark that referenced this issue Oct 22, 2023
As suggested in FoxIO-LLC/ja4#16 use first high-nibble
and the last low-nibble for non printable ALPN values.

Fixes: 19401


(cherry picked from commit 48cd7f9)
@john-althouse john-althouse self-assigned this Nov 8, 2023
@john-althouse john-althouse added the enhancement New feature or request label Nov 8, 2023
@vvv
Collaborator

vvv commented Dec 10, 2023

Sample capture file:
📎 tls-non-ascii-alpn.pcapng.gz

Screenshot: Wireshark window showing a non-ASCII ALPN Next Protocol value (0xba 0xad)

@vvv
Collaborator

vvv commented Dec 12, 2023

> For example purposes, let's say the first ALPN value is 0xAB 0xCD.
>
> To handle this edge case we could simply place a "99" in the JA4 string to denote an unknown non-ASCII ALPN value.
>
> Or, for non-ASCII ALPN values, we could take the first high-nibble (A) and the last low-nibble (D). So the ALPN value in the JA4 string would be "ad".
>
> I like the latter option.

@john-althouse The nibble approach sees no difference between "h2" and, say, "h\xf2". Still, it's more sensitive than the 99 approach.
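This trade-off can be tried on a few inputs. The thread does not pin down whether the nibble rule is triggered by any non-ASCII octet in the value (reading A) or is decided separately for each end octet (reading B), so both are sketched here as assumptions with hypothetical function names. Under reading B, "h\xf2" maps to "h2", exactly as "h2" does. Under reading A it maps to "62", which instead collides with the ASCII value "62". Under both readings, 0xAB 0xCD and 0xA0 0xFD share "ad". That is still finer-grained than "99", which gives every non-ASCII value the same field.

```python
def is_ascii(b: int) -> bool:
    return b < 0x80


def whole_value_nibbles(value: bytes) -> str:
    """Reading A: switch to nibbles if any octet of the value is non-ASCII."""
    if all(is_ascii(b) for b in value):
        return chr(value[0]) + chr(value[-1])
    return f"{value[0] >> 4:x}{value[-1] & 0x0F:x}"


def per_end_nibbles(value: bytes) -> str:
    """Reading B (hypothetical): decide separately for the first and last octet."""
    first = chr(value[0]) if is_ascii(value[0]) else f"{value[0] >> 4:x}"
    last = chr(value[-1]) if is_ascii(value[-1]) else f"{value[-1] & 0x0F:x}"
    return first + last


for v in (b"h2", b"h\xf2", b"62", b"\xab\xcd", b"\xa0\xfd"):
    print(v, whole_value_nibbles(v), per_end_nibbles(v))
```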

vvv added a commit to vvv/ja4 that referenced this issue Dec 12, 2023
vvv added a commit to vvv/ja4 that referenced this issue Dec 12, 2023
igr001-galactica pushed a commit that referenced this issue Dec 15, 2023
* rust: Support tshark v4.2.0

* rust: Handle non-ASCII ALPN strings

Related issue: #16
@igr001-galactica
Collaborator

This should be fixed now with the recent changes to the Rust and Python implementations.

noeltimothy added a commit to noeltimothy/ja4 that referenced this issue Dec 20, 2023
igr001-galactica pushed a commit that referenced this issue Dec 20, 2023
* fix for issue #21, original order

* fix for issue #30 ALPN values

* fix for issue #16
igr001-galactica pushed a commit that referenced this issue Jan 4, 2024
* fix for issue #21, original order

* fix for issue #30 ALPN values

* fix for issue #16

* fix for issue #6

* update to check Cookie and Referer for all cases
igr001-galactica pushed a commit that referenced this issue Jan 4, 2024
* fix for issue #21, original order

* fix for issue #30 ALPN values

* fix for issue #16

* fix for issue #6

* update to check Cookie and Referer for all cases

* fixes #41

4 participants