small upload chunk sizes (below UPLOADBUFFER_MIN) #4826
The size of the buffer curl uses does not limit how small data chunks you return in the read callback. It shouldn't affect "real-time uploading" at all. If you see a performance degradation it is because of a bug somewhere, not because of the buffer size.
That's a pretty wild statement and of course completely untrue. libcurl can do more now than it ever did before. Does it still have bugs or issues? Yes, and we do our best to fix them as soon as we become aware of them. Can you provide us with example source code that reproduces this? You don't give a lot of details.
Have you tried changing
I can provide test code for the MSVC 2015 (Win32) platform, okay? Is it safe to set UPLOADBUFFER_MIN = 2048 or 4096?
I'm not asking you to run this in production, I'm only curious if having a smaller buffer actually changes anything. I'll still need to know how to reproduce the issue though.
Okay, I will write test code. The clue here is the method by which the newest libcurl versions send chunked data: old versions sent the data with each callback invocation that filled the buffer (1-2 KB of data), the newest ones send the data only when the buffer has been filled completely by the callback.
Again: you have not told us the details. Are you talking about formpost uploading with that callback, or what are you doing? What protocol? What platform?
libcurl-post.log |
I don't easily build on Windows, so a Windows-specific example isn't very convenient for me. But your code does use multipart formpost, so that at least answered that question. I think the delay you've reported here is due to changes in those internals rather than the size of the upload buffer. @monnerat, with your #4833 fix, does the code stop looping to fill up the buffer before it sends off data?
No. This would probably affect performance, as building the "hidden" parts of the form may sometimes return as few as 2 bytes (mainly CRLFs). In a chunked transfer, this adds an important overhead. In all cases, multiplying the TCP packets would do so too. I agree with you that if this problem is reproducible, we should investigate. But looking at the numbers above:
We see that the form building is normally capable of processing 2-4 MB/s, and the "black sheep" 0.411 MB/s case is not (yet) explained. Note also that the libcurl-post.log program above artificially limits the callback execution rate to 10 per second by waiting in the read callback using
Dear sirs! I mentioned what I'm doing in my first post; quoting: "I'm doing HTTP POST uploading with a callback function and multipart/form-data chunked encoding." I will be back later (in a few days) with an example that compiles on Linux with gcc. If that's a clue, maybe some new option to set libcurl logic like CHUNKED_UPLOAD_BUFFER_SEND_ASIS_MODE = 1
You said that in a different issue (#4813). You didn't specify that this issue was the same use case or setup - which is why I asked. But now we know.
It would multiply send() calls, which aren't necessarily mapped 1:1 to TCP packets (Nagle's algorithm). And even if they were, I would consider that a smaller problem than what we have now, and a problem we could work on optimizing.
Imagine a (very) slow disk reading function as a callback. We call the callback, it gets 12 bytes back because it reads really slowly, and the callback returns that so it can get sent over the wire. It gets called again, and now it gets another 12 bytes, etc. Every call takes a bunch of milliseconds, possibly even many. If we keep doing that and don't send the data early, the code will eventually fill up the buffer and send it off, but with a significant delay: a delay that we don't want, and one that our documentation states we don't impose. It is a bug.
Did you try using the option CURLOPT_MAX_SEND_SPEED_LARGE rather than pausing or blocking your reads? From what I understand from your trials and comments, this is the option you might use to limit bandwidth.
You're right. However this does not apply when send() calls are sparse (and this is what is wanted).
Not really. I would say it's a data size optimization strategy that goes too far regarding libcurl's expectations. I'll push a commit in my currently active PR for that. My idea is to limit it to a single "read" callback execution per output buffer for
... and that's still exactly what libcurl does if you do chunked uploading over HTTP.
Okay, here is the Linux (gcc) version of the PoC.
@monnerat |
Never mind! I just tested your curlpost-linux with branch https://github.com/monnerat/curl/tree/mime-abort-pause and looking at packet times in wireshark, it seems to do what you want. |
@monnerat |
Warning: this has not yet landed in master. |
Input buffer filling may delay the data sending if data reads are slow. To overcome this problem, file and callback data reads do not accumulate in buffer anymore. All other data (memory data and mime framing) are considered as fast and still concatenated in buffer. As this may highly impact performance in terms of data overhead, an early end of part data check is added to spare a read call. When encoding a part's data, an encoder may require more bytes than made available by a single read. In this case, the above rule does not apply and reads are performed until the encoder is able to deliver some data. Tests 643, 644, 645, 650 and 654 have been adapted to the output data changes, with test data size reduced to avoid the boredom of long lists of 1-byte chunks in verification data. New test 664 checks mimepost using single-byte read callback with encoder. New test 665 checks the end of part data early detection. Fixes #4826 Reported-by: MrdUkk on github
Input buffer filling may delay the data sending if data reads are slow. To overcome this problem, file and callback data reads do not accumulate in buffer anymore. All other data (memory data and mime framing) are considered as fast and still concatenated in buffer. As this may highly impact performance in terms of data overhead, an early end of part data check is added to spare a read call. When encoding a part's data, an encoder may require more bytes than made available by a single read. In this case, the above rule does not apply and reads are performed until the encoder is able to deliver some data. Tests 643, 644, 645, 650 and 654 have been adapted to the output data changes, with test data size reduced to avoid the boredom of long lists of 1-byte chunks in verification data. New test 667 checks mimepost using single-byte read callback with encoder. New test 668 checks the end of part data early detection. Fixes #4826 Reported-by: MrdUkk on github
There are large changes in how libcurl uploads chunked-encoded data (between 7.39 and 7.68).
There is no static 16k buffer anymore; in the current version the user is allowed to set it between 16k and 2MB with CURLOPT_UPLOAD_BUFFERSIZE.
BUT it is limited in url.h and setopt.c to be no smaller than UPLOADBUFFER_MIN.
Sadly, chunked real-time uploading of small data (1-6k) is NOT possible anymore in libcurl; only large and super-duper-fast transfers are allowed.
For years libcurl was like a Swiss army knife in networking: it could do anything... but not anymore :(
I'm doing HTTP POST uploading with a callback function and multipart/form-data chunked encoding.
In 7.39 the read callback flushes 1k of data to the network without problems within milliseconds:
[13:25:16.722 size=1028 off=0
[13:25:16.844 size=1032 off=1028
[13:25:16.968 size=1032 off=2060
[13:25:17.088 size=1204 off=3092
[13:25:17.218 size=1032 off=4296
[13:25:17.337 size=1032 off=5328
There is no seconds-long lag between libcurl callback function invocations.
In 7.68 (with CURLOPT_UPLOAD_BUFFERSIZE set to UPLOADBUFFER_MIN):
[13:29:46.607 size=8037 off=0
[13:29:46.609 size=6408 off=8037
[13:29:46.610 size=1778 off=14445
[13:29:48.607 size=8190 off=16223
[13:29:48.609 size=7884 off=24413
[13:29:48.610 size=298 off=32297
See the gap between seconds 46 and 48. It's not real-time anymore, and there is no option to set buffer sizes below 16k.