
Strange delays on data channel request/response round trip implementation #382

Closed
ozwaldorf opened this issue Oct 5, 2023 · 11 comments · Fixed by #383

Comments

@ozwaldorf

ozwaldorf commented Oct 5, 2023

In an experiment to replace webrtc-rs with str0m (as many others are looking to do), I ran into some strange delays when a JavaScript client and a str0m server perform a request/response round trip: it takes the client ~1.5 seconds to receive the first payload after sending the request. The example code that reproduces the behavior can be found here: https://github.com/ozwaldorf/str0m-experiment

The request flow looks like this:

sequenceDiagram
    Client-->>Server: SDP (HTTP)
    Client-->>Server: WebRTC Connection
    Client-->>Server: Open data channel
    Server->>Client: Send total number of blocks
    loop Session
    Client->>+Server: Block number request
    Client-->Server: Strange 1.5s delay
    Server->>Client: Chunk 0
    Server->>Client: Chunk 1, 2, ..., 7, 8
    end

A client receives the total number of blocks for some content and can request each block ID. Each block is <=256KiB, and is chunked into 32KiB payloads sent over the WebRTC data channel.
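For reference, the chunking boils down to something like this sketch (`send_chunk` is just an illustrative stand-in for the actual data channel write in the experiment):

```rust
// Each block (<= 256 KiB) is split into 32 KiB pieces, and each piece is
// written out as one data channel message.
fn send_block(block: &[u8], mut send_chunk: impl FnMut(&[u8])) {
    for chunk in block.chunks(32 * 1024) {
        send_chunk(chunk);
    }
}
```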

The time to first byte after each block request is very long, around 1.5 seconds from when the JavaScript client sends the request to when it receives the first chunk. With similar code written against webrtc-rs there are no such delays; each TTFB is ~1ms.

The str0m server takes ~1ms to receive the request payload as a data channel message, and ~220ms to write the 8 chunks to the RTC state and send the last chunk out over the socket. Note that ~200ms of that is a delay between sending the first packet and the second; after that, the remaining payload packets go out within a few milliseconds of each other. A possible clue I found while inspecting network packets with Wireshark: there always seems to be a STUN request in flight while no outgoing payload packets are leaving the server.

I'd appreciate a glance over the example code and any recommendations if I'm misusing the library; I haven't been able to find any glaring issues myself. Thanks!

@algesten
Owner

algesten commented Oct 5, 2023

Hi @ozwaldorf! Welcome to str0m!

I think the problem in your code is here: https://github.com/ozwaldorf/str0m-experiment/blob/main/server/src/driver.rs#L85

This wait will always happen, even when you are sending new data. The key here is that when you use the webrtc_send macro, you must immediately follow up with a handle_output. Or in str0m terms: a data channel write must immediately be followed by poll_output.

In the chat example we cap the timeout at 100ms, which means a pending write is delayed by at most 100ms (https://github.com/algesten/str0m/blob/main/examples/chat.rs#L130), but this is not optimal for a real server.

When you use tokio, I recommend using the tokio::select! macro here to make line 85 wait for one of three things.

  1. The timeout (which in this case doesn't need the 100ms cap)
  2. Incoming network data.
  3. Wake-up after data channel write

Whichever case happens, it must swiftly loop back to handle_output.
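A minimal sketch of that loop shape with tokio (illustrative only: the str0m calls are just indicated in comments, and write_notify is assumed to be a tokio::sync::Notify signaled after every data channel write):

```rust
use tokio::{net::UdpSocket, sync::Notify, time::{sleep, Duration}};

async fn drive(socket: &UdpSocket, write_notify: &Notify) -> std::io::Result<()> {
    let mut buf = vec![0u8; 2000];
    loop {
        // Drain poll_output() here first: send every Transmit on the socket,
        // dispatch events, and note the timeout it asks for.
        tokio::select! {
            // 1. The timeout (100ms is only a placeholder for the instant
            //    poll_output() handed back) -> feed a timeout into the Rtc.
            _ = sleep(Duration::from_millis(100)) => {}
            // 2. Incoming network data -> feed the packet into the Rtc.
            res = socket.recv_from(&mut buf) => { let (_n, _from) = res?; }
            // 3. Wake-up after a data channel write -> nothing to do but loop.
            _ = write_notify.notified() => {}
        }
        // Whichever branch fired, go straight back to poll_output().
    }
}
```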

Hope that helps!

@ozwaldorf
Author

ozwaldorf commented Oct 5, 2023

Hey @algesten, thank you for the swift response!

When you use tokio, I recommend using the tokio::select! macro here to make line 85 wait for one of three things.

  1. The timeout (which in this case doesn't need the 100ms cap)
  2. Incoming network data.
  3. Wake-up after data channel write

Interestingly, I had previously tried this exact flow with a tokio::sync::Notify as a waker, suspecting that the socket read was hanging until it received the STUN packet and allowed the RTC state to progress, but this had no effect and still produced the delay, so I ended up removing it from the experiment. It's worth mentioning, though, that Notify only stores a single permit if the loop is not actively awaiting it; perhaps there needs to be a queue to skip the read task for every call to write and the subsequent receive from the socket?
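For reference, the single-permit behavior I mean is plain tokio and easy to see in isolation:

```rust
use tokio::sync::Notify;

#[tokio::main]
async fn main() {
    let notify = Notify::new();
    // With no task currently awaiting notified(), notify_one() stores at most
    // one permit, so back-to-back data channel writes coalesce into one wake-up.
    notify.notify_one();
    notify.notify_one();
    notify.notified().await; // consumes the single stored permit
    // A second notified().await here would wait for a new notify_one().
}
```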

I've updated the example source with the notify waker put back in, if you want to take a look again (ozwaldorf/str0m-experiment@241ab16) 🙏

Another solution I tried was using a channel for outgoing payloads instead of writing to the connection directly and waiting for a notify. The driver would select over receiving from the socket, the timeout, or receiving an outgoing payload (and calling write there), but this didn't have any effect on the delay either.

@algesten
Owner

algesten commented Oct 5, 2023

This looks better.

However, there's a big difference between this and the chat example – this loop drives all clients, whereas chat.rs is structured with one such loop per client (this is also how we use it in our own production system with tokio).

I'm trying to think what the consequences of that would be.

  • It would not take advantage of a multi-core CPU, because there is only one task.
  • There are unnecessary poll_output() calls, because a data channel write on one client triggers poll_output() on all clients (even those not involved in the write).

Computers are fast, so for a test this maybe doesn't matter. I think it might scale badly if you get many clients.
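As a sketch of what one-loop-per-client looks like (reusing the drive loop shape from the earlier sketch; names are illustrative):

```rust
use std::sync::Arc;
use tokio::{net::UdpSocket, sync::Notify};

fn spawn_client_driver(socket: Arc<UdpSocket>, write_notify: Arc<Notify>) {
    // One task per client: its loop only wakes for this client's socket,
    // timeout and data channel writes, and only polls this client's Rtc.
    tokio::spawn(async move {
        if let Err(e) = drive(&socket, &write_notify).await {
            eprintln!("client driver ended: {e}");
        }
    });
}
```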

@ozwaldorf
Author

ozwaldorf commented Oct 6, 2023

Definitely agree, there is wasted iteration for idle clients while others are still progressing, but that shouldn't be too impactful for this simple experiment, since the test is a single client over a local connection. With that said, it's still very strange that it takes ~1.5s for the first chunk to be received after sending the request to the server. With webrtc-rs, the time to first chunk is ~1ms over a local connection, so something else must be off; the difference is several orders of magnitude.

@xnorpx
Collaborator

xnorpx commented Oct 7, 2023

It's not clear to me what you mean by request here.

Are you saying that once DTLS and ICE are set up and you enter the "loop", every block number request takes 1.5s to get the 8 chunks back to the client?

@ozwaldorf
Author

ozwaldorf commented Oct 7, 2023

Are you saying that once DTLS and ICE are set up and you enter the "loop", every block number request takes 1.5s to get the 8 chunks back to the client?

Following the flow chart: after setup, once the loop is entered and the request is sent, it takes around 1.5s to receive the first chunk back, and the other 7 chunks are received very quickly after with no excess delays. Every time the loop runs, the client sends the request and the same delay occurs on the first chunk, consistently. Apologies if that wasn't clear before! I've updated the flow chart to make the delay clearer as well.

@xnorpx
Collaborator

xnorpx commented Oct 7, 2023

You say that there was 200ms between the first and second chunk. Where is that time spent, in str0m?

@xnorpx
Collaborator

xnorpx commented Oct 7, 2023

I am no expert on data channels, but here are things I would try:

  • Check how the webrtc-rs data channel is configured, then configure str0m the same way.
  • Configure the data channel as unordered/unreliable.
  • Add bias in your select to ensure you prioritize incoming packets before sending new ones (see the sketch below).
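For the bias point, a minimal sketch of what that could look like in the driver loop (`biased;` makes tokio::select! check branches top to bottom instead of randomly; the str0m calls are again only indicated in comments):

```rust
use tokio::{net::UdpSocket, sync::Notify, time::{sleep, Duration}};

async fn drive_biased(socket: &UdpSocket, write_notify: &Notify) -> std::io::Result<()> {
    let mut buf = vec![0u8; 2000];
    loop {
        tokio::select! {
            biased;
            // Incoming packets are checked first when several branches are ready.
            res = socket.recv_from(&mut buf) => { let (_n, _from) = res?; }
            // Then the wake-up after a data channel write.
            _ = write_notify.notified() => {}
            // Then the timeout.
            _ = sleep(Duration::from_millis(100)) => {}
        }
        // Feed whatever happened into the Rtc and drain poll_output() again.
    }
}
```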

algesten added a commit that referenced this issue Oct 7, 2023
This fixes a bad bug where a lot of packets were thrown away
for big SCTP payload.

Close #382
@algesten
Owner

algesten commented Oct 7, 2023

send frame
2023-10-07T12:08:12.869Z ERROR [str0m_experiment::driver] payload: 32769
2023-10-07T12:08:12.869Z ERROR [str0m_experiment::driver] payload: 32769
2023-10-07T12:08:12.869Z ERROR [str0m_experiment::driver] payload: 32769
2023-10-07T12:08:12.869Z ERROR [str0m_experiment::driver] payload: 32769
2023-10-07T12:08:12.869Z ERROR [str0m_experiment::driver] payload: 32769
2023-10-07T12:08:12.869Z ERROR [str0m_experiment::driver] payload: 32769
2023-10-07T12:08:12.869Z ERROR [str0m_experiment::driver] payload: 32769
2023-10-07T12:08:12.869Z ERROR [str0m_experiment::driver] payload: 32769
2023-10-07T12:08:12.870Z ERROR [str0m_experiment::driver] send: 1185
2023-10-07T12:08:13.071Z ERROR [str0m_experiment::driver] send: 65
2023-10-07T12:08:13.072Z ERROR [str0m_experiment::driver] recv: 65
2023-10-07T12:08:13.072Z ERROR [str0m_experiment::driver] send: 1185
2023-10-07T12:08:13.278Z ERROR [str0m_experiment::driver] recv: 65
2023-10-07T12:08:13.279Z ERROR [str0m_experiment::driver] send: 1185
2023-10-07T12:08:13.279Z ERROR [str0m_experiment::driver] recv: 69
2023-10-07T12:08:13.279Z ERROR [str0m_experiment::driver] send: 1185
2023-10-07T12:08:13.279Z ERROR [str0m_experiment::driver] recv: 73
2023-10-07T12:08:13.279Z ERROR [str0m_experiment::driver] send: 1185
2023-10-07T12:08:13.280Z ERROR [str0m_experiment::driver] recv: 73
…
2023-10-07T12:08:14.344Z ERROR [str0m_experiment::driver] recv: 89

To flush the contents out we waited from 12.869 to 14.344 = 1.475 seconds.

The above usage pattern is one we haven't tested much: it loads up str0m with 200k+ of data and then flushes it all out. This triggered a bug where we throw away half the SCTP packets. I imagine the reason it worked at all is because of resends.

This test fails when I fix str0m, but it seems to be client specific:

Uncaught DOMException: Failed to execute 'appendBuffer' on 'SourceBuffer': This SourceBuffer is still processing an 'appendBuffer' or 'remove' operation.

@algesten
Owner

algesten commented Oct 7, 2023

@ozwaldorf thanks for reporting this bug!

@ozwaldorf
Author

Awesome work @algesten! Thanks for the fix

algesten added a commit that referenced this issue Oct 8, 2023
This fixes a bad bug where a lot of packets were thrown away
for big SCTP payload.

Close #382