Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

small upload chunk sizes (below UPLOADBUFFER_MIN) #4826

Closed
MrdUkk opened this issue Jan 16, 2020 · 19 comments
Closed

small upload chunk sizes (below UPLOADBUFFER_MIN) #4826

MrdUkk opened this issue Jan 16, 2020 · 19 comments

Comments

@MrdUkk
Copy link

@MrdUkk MrdUkk commented Jan 16, 2020

There is a large changes how libcurl uploading chunked encoding data (between 7.39 and 7.68).
No static 16k buffer anymore, user is allowed to set it between 16k and 2mb in current version with CURLOPT_UPLOAD_BUFFERSIZE.
BUT it is limited in url.h and setopt.c to be not smaller than UPLOADBUFFER_MIN.
Sadly, but chunked real-time uploading of small data (1-6k) is NOT possible anymore in libcurl. only large and super-duper-fast transfers allowed.
libcurl for years was like swiss army knife in networking. it can do anything.... but not anymore :(

im doing http posting uploading with callback function and multipart-formdata chunked encoding.
in 7.39
read callback is flushing 1k of data to the network without problems withing milliseconds:
[13:25:16.722 size=1028 off=0
[13:25:16.844 size=1032 off=1028
[13:25:16.968 size=1032 off=2060
[13:25:17.088 size=1204 off=3092
[13:25:17.218 size=1032 off=4296
[13:25:17.337 size=1032 off=5328
no seconds lag between libcurl callback function invocation.

in 7.68 (with CURLOPT_UPLOAD_BUFFERSIZE set to UPLOADBUFFER_MIN)
[13:29:46.607 size=8037 off=0
[13:29:46.609 size=6408 off=8037
[13:29:46.610 size=1778 off=14445
[13:29:48.607 size=8190 off=16223
[13:29:48.609 size=7884 off=24413
[13:29:48.610 size=298 off=32297
see the gap between 46 and 48 second. It's not real time anymore, and no option to set buffer sizes below 16k.

@bagder
Copy link
Member

@bagder bagder commented Jan 16, 2020

Sadly, but chunked real-time uploading of small data (1-6k) is NOT possible anymore in libcurl

The size of the buffer curl uses does not limit how small data chunks you return in the read callback. It shouldn't affect "real-time uploading" at all. If you see a performance degradation it is because of a bug somewhere, not because of the buffer size.

libcurl for years was like swiss army knife in networking. it can do anything.... but not anymore :(

That's a pretty wild statement and of course completely untrue. libcurl can do more now than it ever did before. Does it still have bugs or issues? Yes. And we do our best to fix them as soon as we become aware of them.

Can you provide us with an example source code that reproduces this? You don't give a lot of details.

@bagder
Copy link
Member

@bagder bagder commented Jan 17, 2020

Have you tried changing UPLOADBUFFER_MIN to something smaller like 1024 and checked if that makes a difference?

@MrdUkk
Copy link
Author

@MrdUkk MrdUkk commented Jan 17, 2020

i can provide test code for msvc2015 (win32) platform. okay?

is it safe to set UPLOADBUFFER_MIN = 2048 or 4096?

@bagder
Copy link
Member

@bagder bagder commented Jan 17, 2020

I'm not asking you to run this in production, I'm only curios if having a smaller buffer actually changes anything. I'll still need to know how to reproduce the issue though.

@MrdUkk
Copy link
Author

@MrdUkk MrdUkk commented Jan 17, 2020

okay, will write test code.

the clue here is method how newest libcurl versions send chunked data. (old versions send it with each callback invocation that filled buffer (1-2kbytes of data), the newest one - send data only then buffer is filled fully in callback.

@bagder
Copy link
Member

@bagder bagder commented Jan 17, 2020

Again: you have not told us details.

Are you talking about formpost uploading with that callback or what are you doing? What protocol? What platform?

@MrdUkk
Copy link
Author

@MrdUkk MrdUkk commented Jan 20, 2020

libcurl-post.log
please rename file extension to .cpp (github won't allow upload direct this file).
compiles under MSVC2015 Win32. this is minimal client-side PoC.
Monitor packets send to server with some kind of network sniffer (wireshark for example).
but if this is problem - i can write minimal server example.

@bagder
Copy link
Member

@bagder bagder commented Jan 21, 2020

I don't easily build on Windows so a Windows-specific example isn't very convenient for me.

But your code does use multipart formpost so that at least answered that question. I think the delay you've reported here is due to changes in those internals rather than the size of the upload buffer.

@monnerat, with your #4833 fix, does the code stop the looping to fill up the buffer before it sends off data?

@monnerat
Copy link
Contributor

@monnerat monnerat commented Jan 21, 2020

with your #4833 fix, does the code stop the looping to fill up the buffer before it sends off data?

No. This would probably affect performance, as building the "hidden" parts of the form may sometimes return as few as 2 bytes (mainly CRLFs). In a chunked transfer, this adds an important overhead. In all cases, multiplying the tcp packets would do so too.

I agee with you that if this problem is reproducible, we should investigate. But looking at the numbers above:

[13:29:46.607 size=8037 off=0
[13:29:46.609 size=6408 off=8037    2.136Mb/s
[13:29:46.610 size=1778 off=14445   1.778Mb/s
[13:29:48.607 size=8190 off=16223   0.411Mb/s
[13:29:48.609 size=7884 off=24413   3.942Mb/s
[13:29:48.610 size=298 off=32297

We see that the form building is normally capable of processing 2-4 Mb/s and the "black sheep" 0.411 Mb/s case is not (yet) explained.

Note also that the libcurl-post.log program above articially limits the callback execution rate to 10 per sec by waiting in the read callback using WaitForMultipleObjects(). But the program that generated the above numbers might do it otherwise...

@MrdUkk
Copy link
Author

@MrdUkk MrdUkk commented Jan 22, 2020

Dear sirs! i mention what i'm doing in my first post. quotation: "im doing http posting uploading with callback function and multipart-formdata chunked encoding."

I will back later (~few days) with example compiling on linux, gcc. if that's a clue
the key point is not sending fast using all available bandwidth. the key here is to send each chunk (1-2kbytes) of data not waiting for 100% libcurl buffer filling. as in version 7.39 . It is some kind of realtime communication over http, so latency will be unacceptable if using up to date libcurl versions (above currently in use 7.39) . it is clearly seen in network sniffer.

Maybe some new option to set libcurl logic like CHUNKED_UPLOAD_BUFFER_SEND_ASIS_MODE = 1

@bagder
Copy link
Member

@bagder bagder commented Jan 22, 2020

i mention what i'm doing in my first post.

You said that in a different issue (#4813). You didn't specify that this issue was the same use case or setup - which is why I asked. But now we know.

@bagder
Copy link
Member

@bagder bagder commented Jan 22, 2020

No. This would probably affect performance, as building the "hidden" parts of the form may sometimes return as few as 2 bytes (mainly CRLFs). In a chunked transfer, this adds an important overhead. In all cases, multiplying the tcp packets would do so too.

It would multiply send() calls, which aren't necessary mapped 1:1 to TCP packets (Nagle's algorithm).

And even if it did, I would consider that a smaller problem than what we have now. And a problem we could work on optimizing.

I agee with you that if this problem is reproducible, we should investigate.

Imagine a (very) slow disk reading function as a callback. We call the callback, it gets 12 bytes back because it reads really slow, the callback returns that so it can get sent over the wire. It gets called again, and now it gets another 12 bytes etc. Every call takes a bunch of milliseconds. Possibly even many.

If we keep doing that and not send the data early, the code will eventually fill up the buffer and send it off, but with a significant delay. And a delay that we don't want and one that we state in documentation that we don't impose. It is a bug.

@monnerat
Copy link
Contributor

@monnerat monnerat commented Jan 27, 2020

@MrdUkk

if that's a clue the key point is not sending fast using all available bandwidth

Did you try to use option CURLOPT_MAX_SEND_SPEED_LARGE rather than pausing or blocking your reads ? From what I understand from your trials and comments, this is the option you might use to limit bandwidth.

@bagder

It would multiply send() calls, which aren't necessary mapped 1:1 to TCP packets (Nagle's algorithm).

You're right. However this does not apply when send() calls are sparse (and this is what is wanted).

And a delay that we don't want and one that we state in documentation that we don't impose. It is a bug.

Not really. I would say it's a data size optimization strategy that goes too far regarding libcurl's expectations.

I'll push a commit in my currently active PR for that. My idea is to limit to a single "read" callback execution per output buffer for curl_mime_filedata() and curl_mime_data_cb() when possible (encoded data may require more). Please be aware that we'll have a 500% data size overhead to transmit chunked curl_mime_data_cb() reads of size 1.

@bagder
Copy link
Member

@bagder bagder commented Jan 27, 2020

Please be aware that we'll have a 500% data size overhead to transmit chunked curl_mime_data_cb() reads of size 1.

... and that's still exactly what libcurl does if you do chunked uploading over HTTP.

@MrdUkk
Copy link
Author

@MrdUkk MrdUkk commented Jan 27, 2020

Okay, there is linux (gcc) version PoC.
curlpost-linux.log

@MrdUkk
Copy link
Author

@MrdUkk MrdUkk commented Jan 27, 2020

@monnerat
this option is not for me. in samples above i set static 100ms interpacket delay for example only. In real-world application NetworkWorkerThread() is driven by signals from other thread. I don't want pauses or delays in some third party code (libcurl). All proper delays already calculated in my program workflow. What libcurl should do is send data over the network when asked to do so by events. I need very low latency, not bandwidth (speed).

@monnerat
Copy link
Contributor

@monnerat monnerat commented Jan 27, 2020

this option is not for me.

Never mind!

I just tested your curlpost-linux with branch https://github.com/monnerat/curl/tree/mime-abort-pause and looking at packet times in wireshark, it seems to do what you want.

@MrdUkk
Copy link
Author

@MrdUkk MrdUkk commented Jan 27, 2020

@monnerat
thank you! i confirm -> working fully again!
and aborting while transfer works too!

@MrdUkk MrdUkk closed this Jan 27, 2020
@monnerat
Copy link
Contributor

@monnerat monnerat commented Jan 27, 2020

Warning: this has not yet landed in master.

bagder added a commit that referenced this issue Mar 1, 2020
Input buffer filling may delay the data sending if data reads are slow.
To overcome this problem, file and callback data reads do not accumulate
in buffer anymore. All other data (memory data and mime framing) are
considered as fast and still concatenated in buffer.
As this may highly impact performance in terms of data overhead, an early
end of part data check is added to spare a read call.
When encoding a part's data, an encoder may require more bytes than made
available by a single read. In this case, the above rule does not apply
and reads are performed until the encoder is able to deliver some data.

Tests 643, 644, 645, 650 and 654 have been adapted to the output data
changes, with test data size reduced to avoid the boredom of long lists of
1-byte chunks in verification data.
New test 664 checks mimepost using single-byte read callback with encoder.
New test 665 checks the end of part data early detection.

Fixes #4826
Reported-by: MrdUkk on github
bagder added a commit that referenced this issue Mar 7, 2020
Input buffer filling may delay the data sending if data reads are slow.
To overcome this problem, file and callback data reads do not accumulate
in buffer anymore. All other data (memory data and mime framing) are
considered as fast and still concatenated in buffer.
As this may highly impact performance in terms of data overhead, an early
end of part data check is added to spare a read call.
When encoding a part's data, an encoder may require more bytes than made
available by a single read. In this case, the above rule does not apply
and reads are performed until the encoder is able to deliver some data.

Tests 643, 644, 645, 650 and 654 have been adapted to the output data
changes, with test data size reduced to avoid the boredom of long lists of
1-byte chunks in verification data.
New test 667 checks mimepost using single-byte read callback with encoder.
New test 668 checks the end of part data early detection.

Fixes #4826
Reported-by: MrdUkk on github
@lock lock bot locked as resolved and limited conversation to collaborators May 5, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.