Skip to content

WebSocket Tests and Fixes #17136

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed

Conversation

viscruocco
Copy link
Contributor

@viscruocco viscruocco commented Apr 22, 2025

This started out as regression tests for the curl_ws_recv() and curl_ws_send() implementation and ended up with a bugfix, additional protocol validation and minor logging improvements.

Bugfixes

  • Fix reset of fragmented message decoder state when a PING/PONG is received in between message fragments.
  • Fix undefined behavior (applying zero offset to null pointer) in curl_ws_send() when the given buffer is NULL.

Additional Validations

  • Detect invalid overlong PING/PONG/CLOSE frames.
  • Detect invalid fragmented PING/PONG/CLOSE frames.
  • Detect invalid sequences of fragmented frames.
    • a) A continuation frame (0x80...) is received without any ongoing fragmented message.
    • b) A new fragmented message is started (0x81/0x01/0x82/0x02...) before the ongoing fragmented message has terminated.

Logging Improvements

  • Made logs for invalid opcodes easier to understand.
  • Moved noisy logs to the CURL_TRC_WS log level.
  • Unified the prefixes for WebSocket log messages: [WS] ...

New Debug Feature

  • Add env var CURL_WS_FORCE_ZERO_MASK in debug builds.
    • If set, it forces the bit mask applied to outgoing payloads to 0x00000000, which effectively means the payload is not masked at all. This drastically simplifies defining the expected <protocol> data in test cases.

Added Tests

  • 2700: Frame types
  • 2701: Invalid opcode 0x3
  • 2702: Invalid opcode 0xB
  • 2703: Invalid reserved bit RSV1 (replaces 2310)
  • 2704: Invalid reserved bit RSV2
  • 2705: Invalid reserved bit RSV3
  • 2706: Invalid masked server message
  • 2707: Peculiar frame sizes (part. replaces 2311)
  • 2708: Automatic PONG
  • 2709: No automatic PONG (replaces 2312)
  • 2710: Unsolicited PONG
  • 2711: Empty PING/PONG/CLOSE
  • 2712: Max sized PING/PONG/CLOSE
  • 2713: Invalid oversized PING (replaces 2307)
  • 2714: Invalid oversized PONG
  • 2715: Invalid oversized CLOSE
  • 2716: Invalid fragmented PING
  • 2717: Invalid fragmented PONG
  • 2718: Invalid fragmented CLOSE
  • 2719: Fragmented messages (part. replaces 2311)
  • 2720: Fragmented messages with empty fragments
  • 2721: Fragmented messages with interleaved pong
  • 2722: Invalid fragmented message without initial frame
  • 2723: Invalid fragmented message without final frame

Removed Tests

  • 2305: curl_ws_recv() loop reading three larger frames
    • This test involuntarily sent an invalid sequence of opcodes (0x01...,0x01...,0x81...) , but neither libcurl nor the test caught this! The correct sequence was tested in 2311 (0x01...,0x00...,0x80...). See below for 2311.
    • Validation of the opcode sequence was added to libcurl and is now tested in 2723.
    • Superseded by 2719 (fragmented message) and 2707 (large frames).
  • 2307: overlong PING payload
    • The tested PING payload length check was actually missing, but the test didn't catch this since it involuntarily sent an invalid opcode (0x19... instead of 0x89...) so that the expected error occurred, but for the wrong reason.
    • Superseded by 2713.
  • 2310: unknown reserved bit set in frame header
    • Superseded by 2703 and extended by 2704 and 2705.
  • 2311: curl_ws_recv() read fragmented message
    • Superseded by 2719 (fragmented message) and 2707 (large frames).
  • 2312: WebSockets no auto ping
    • Superseded by 2709.

Known Shortcomings

  • No tests for CURLOPT_WRITEFUNCTION.
  • No tests for sending of invalid frames/fragments.

@github-actions github-actions bot added the tests label Apr 22, 2025
@viscruocco viscruocco force-pushed the ws-tests-and-fixes branch 2 times, most recently from a455089 to 212a5a0 Compare April 22, 2025 14:00
@viscruocco viscruocco marked this pull request as draft April 22, 2025 14:32
@testclutch
Copy link

Analysis of PR #17136 at 212a5a00:

Test 2700 failed, which has NOT been flaky recently, so there could be a real issue in this PR. Note that this test has failed in 2 different CI jobs (the link just goes to one of them).

Test 2712 failed, which has NOT been flaky recently, so there could be a real issue in this PR. Note that this test has failed in 2 different CI jobs (the link just goes to one of them).

Test 2707 failed, which has NOT been flaky recently, so there could be a real issue in this PR. Note that this test has failed in 2 different CI jobs (the link just goes to one of them).

Test 2709 failed, which has NOT been flaky recently, so there could be a real issue in this PR. Note that this test has failed in 2 different CI jobs (the link just goes to one of them).

Test 2711 failed, which has NOT been flaky recently, so there could be a real issue in this PR. Note that this test has failed in 2 different CI jobs (the link just goes to one of them).

Test 2708 failed, which has NOT been flaky recently, so there could be a real issue in this PR.

Test 2710 failed, which has NOT been flaky recently, so there could be a real issue in this PR.

There are more failures, but that's enough from Gha.

Generated by Testclutch

@viscruocco viscruocco force-pushed the ws-tests-and-fixes branch from 212a5a0 to a9cf700 Compare May 7, 2025 07:05
@viscruocco viscruocco marked this pull request as ready for review May 7, 2025 07:06
@viscruocco viscruocco force-pushed the ws-tests-and-fixes branch 4 times, most recently from 32c9fdd to 5e41730 Compare May 12, 2025 06:33
@viscruocco viscruocco requested a review from vszakats May 21, 2025 07:06
@vszakats
Copy link
Member

@viscruocco LGTM from my end, can you rebase to the latest master to make the codespell issue go away?

@viscruocco viscruocco force-pushed the ws-tests-and-fixes branch from e54cfee to 04e7117 Compare May 28, 2025 10:13
@viscruocco
Copy link
Contributor Author

@viscruocco LGTM from my end, can you rebase to the latest master to make the codespell issue go away?

@vszakats Yay! Done! 😊

@viscruocco
Copy link
Contributor Author

btw: this PR is missing the WebSocket label for discoverability.

@bagder bagder closed this in d3594be Jun 2, 2025
@bagder
Copy link
Member

bagder commented Jun 2, 2025

Thanks!

vszakats added a commit that referenced this pull request Jun 11, 2025
Trying to avoid the occasional ~6-minute long delays seen in the OpenBSD
since last week. The long delay causes the CI job to timeout and fail:
https://github.com/curl/curl/actions/workflows/non-native.yml?page=2&query=branch%3Amaster

The exact reason is or test number is unknown. I base this attempt on
looking at the first occurrences and possible patches that may be
related.

The issue was first seen in CI within PR #17136:
```
[...]
Wed, 07 May 2025 07:10:30 GMT test 3014...[Check if %{num_headers} returns correct number of headers]
Wed, 07 May 2025 07:10:30 GMT s-p----e--- OK (1743 out of 1778, remaining: 00:02, took 0.195s, duration: 01:43)
Wed, 07 May 2025 07:10:30 GMT test 3016...[GET a directory using file://]
[long delay here]
Wed, 07 May 2025 07:16:17 GMT -------
Wed, 07 May 2025 07:16:17 GMT Error: The operation was canceled.
```
Ref: https://github.com/curl/curl/actions/runs/14877264415/job/41776966626#step:3:5566
Ref: https://github.com/curl/curl/actions/runs/14900320627/job/41850699301#step:3:5561 (next in PR)

Then in master, shortly after merging it via d3594be:
```
[...]
Mon, 02 Jun 2025 09:23:55 GMT test 3201...[HTTP GET when PROXY Protocol enabled and spoofed client IP]
Mon, 02 Jun 2025 09:23:55 GMT --p----e--- OK (1777 out of 1788, remaining: 00:00, took 0.222s, duration: 01:42)
Mon, 02 Jun 2025 09:23:55 GMT RUN: failed to start the HTTP/2 server
Mon, 02 Jun 2025 09:23:55 GMT test 3202...[HTTP-IPv6 GET with PROXY protocol with spoofed client IP]
[long delay here]
Mon, 02 Jun 2025 09:29:48 GMT --p----e--- OK (1778 out of 1788, remaining: 00:00, took 0.1
Mon, 02 Jun 2025 09:29:48 GMT Error: The operation was canceled.
```
Ref: https://github.com/curl/curl/actions/runs/15388587165/job/43292652793#step:3:5097
Ref: https://github.com/curl/curl/actions/runs/15390589464/job/43298911578#step:3:5097 (next in master)

Closes #17562
vszakats added a commit to vszakats/curl that referenced this pull request Jun 11, 2025
vszakats added a commit that referenced this pull request Jun 12, 2025
Narrowing down the test which may be causing the flaky 6-minute long
delays and CI failures.

Suggested-by: Calvin Ruocco
Ref: #17562 (comment)

Follow-up to 05db18e #17562
Follow-up to d3594be #17136

Closes #17588
denandz pushed a commit to denandz/curl that referenced this pull request Jun 21, 2025
Trying to avoid the occasional ~6-minute long delays seen in the OpenBSD
since last week. The long delay causes the CI job to timeout and fail:
https://github.com/curl/curl/actions/workflows/non-native.yml?page=2&query=branch%3Amaster

The exact reason is or test number is unknown. I base this attempt on
looking at the first occurrences and possible patches that may be
related.

The issue was first seen in CI within PR curl#17136:
```
[...]
Wed, 07 May 2025 07:10:30 GMT test 3014...[Check if %{num_headers} returns correct number of headers]
Wed, 07 May 2025 07:10:30 GMT s-p----e--- OK (1743 out of 1778, remaining: 00:02, took 0.195s, duration: 01:43)
Wed, 07 May 2025 07:10:30 GMT test 3016...[GET a directory using file://]
[long delay here]
Wed, 07 May 2025 07:16:17 GMT -------
Wed, 07 May 2025 07:16:17 GMT Error: The operation was canceled.
```
Ref: https://github.com/curl/curl/actions/runs/14877264415/job/41776966626#step:3:5566
Ref: https://github.com/curl/curl/actions/runs/14900320627/job/41850699301#step:3:5561 (next in PR)

Then in master, shortly after merging it via d3594be:
```
[...]
Mon, 02 Jun 2025 09:23:55 GMT test 3201...[HTTP GET when PROXY Protocol enabled and spoofed client IP]
Mon, 02 Jun 2025 09:23:55 GMT --p----e--- OK (1777 out of 1788, remaining: 00:00, took 0.222s, duration: 01:42)
Mon, 02 Jun 2025 09:23:55 GMT RUN: failed to start the HTTP/2 server
Mon, 02 Jun 2025 09:23:55 GMT test 3202...[HTTP-IPv6 GET with PROXY protocol with spoofed client IP]
[long delay here]
Mon, 02 Jun 2025 09:29:48 GMT --p----e--- OK (1778 out of 1788, remaining: 00:00, took 0.1
Mon, 02 Jun 2025 09:29:48 GMT Error: The operation was canceled.
```
Ref: https://github.com/curl/curl/actions/runs/15388587165/job/43292652793#step:3:5097
Ref: https://github.com/curl/curl/actions/runs/15390589464/job/43298911578#step:3:5097 (next in master)

Closes curl#17562
denandz pushed a commit to denandz/curl that referenced this pull request Jun 21, 2025
Narrowing down the test which may be causing the flaky 6-minute long
delays and CI failures.

Suggested-by: Calvin Ruocco
Ref: curl#17562 (comment)

Follow-up to 05db18e curl#17562
Follow-up to d3594be curl#17136

Closes curl#17588
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants