Deflate error after all content was received #2719
If this is a broken stream, then curl should report an error and not silently ignore it. (I'm not saying it is, as I haven't researched it deeper.) When I try that command line I get a test file in my directory that is 146222 bytes big, which looks like it could be the whole file, so the error could be caused by something in the final bits of the stream... |
The error is annoying in a browser, since going to that page then throws up an error page instead of the page contents. Ignoring that error would be bad in contexts where it's more serious. I added some printfs, and it looks like it's because that commit removed the restarting support when the state was ZLIB_UNINIT. Is it legal to have multiple deflate streams? |
@monnerat, can you figure out if there's anything we can improve here? |
This particular case uses the "deflate" algorithm in its deviant form (raw data), with 2 extra bytes appended (0x03, 0x00). The semantics of these bytes are unknown to me, but I can see they are constant even if the page content changes. This was not detected before the commit mentioned above, since all trailing data were silently ignored, even if they consisted of thousands of bytes. Curl deals with 3 kinds of zlib-supported data: raw deflate (RFC 1951), zlib-wrapped deflate (RFC 1950) and gzip (RFC 1952).
What is strange here is that this seems to come from an Apache server: I suspect it is a reverse proxy in front of M$ software, or the encoding is performed by the application itself. Is it another deviant form of data? Do we have to support it? Since I can't understand what these additional bytes are, we need details about what to support, and we should write these forms down in our own documentation since they are not "official". In any case, unexpected trailing data should not be ignored without error, since this may result in considering the received data as complete while it is not. If you have details about expected data in deviant forms we should support, I can investigate an implementation. Otherwise I have no serious solution. There is no support for multiple streams in curl, since it is hard to figure out how to process them. In our current case, the trailing data is obviously not a second stream. |
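For reference, a minimal sketch in plain zlib (not curl's own code) of how those three forms are selected: the windowBits argument of inflateInit2() chooses between zlib-wrapped, raw ("deviant") deflate and gzip input, and after Z_STREAM_END the avail_in counter exposes any unconsumed trailing bytes such as the two seen here. The function name is made up for illustration.

  #include <stdio.h>
  #include <string.h>
  #include <zlib.h>

  /* Sketch in plain zlib: the windowBits argument of inflateInit2()
   * selects which of the three supported input forms is expected:
   *    15       zlib-wrapped deflate (RFC 1950)
   *   -15       raw deflate (RFC 1951), the "deviant" form seen here
   *   15 + 16   gzip (RFC 1952)
   *   15 + 32   automatic zlib/gzip header detection
   */
  int inflate_all(const unsigned char *in, size_t inlen, int windowbits)
  {
    unsigned char out[16384];
    z_stream strm;
    int rc;

    memset(&strm, 0, sizeof(strm));
    if(inflateInit2(&strm, windowbits) != Z_OK)
      return Z_STREAM_ERROR;

    strm.next_in = (Bytef *)in;
    strm.avail_in = (uInt)inlen;
    do {
      strm.next_out = out;
      strm.avail_out = sizeof(out);
      rc = inflate(&strm, Z_NO_FLUSH);  /* decompress as much as possible;
                                           out[] is discarded in this sketch */
    } while(rc == Z_OK);

    /* Z_STREAM_END means the compressed stream ended cleanly; strm.avail_in
     * then holds the number of unconsumed trailing bytes (2 in this case). */
    printf("inflate result %d, %u trailing byte(s)\n", rc, strm.avail_in);
    inflateEnd(&strm);
    return rc;
  }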
Is it another deviant form of data? Do we have to support it?
I have no details on the data itself, but I can confirm that all browsers I have available open the page without issue. I read that curl tries to match browser behavior, and since I use curl in a browser myself, getting the site working would be good. I agree that silently ignoring them is not a good solution and that more information is needed. |
It's sometimes hard to mimic browsers since they are often very lenient with errors and never very strict about enforcing what's right or wrong (if that would make users annoyed). In this case, where browsers show the full content without errors and even older curls do that, I agree it feels highly annoying that it now instead shows an error. This specific content is delivered with chunked encoding. Can we take the fact that it has delivered the final terminating chunk as an additional clue that we have indeed seen the end of the "document" and then be lenient with gzip errors? |
I don't like it at all. First, because chunked encoding processing is done at the input level, while other encodings (gzip, deflate, brotli) are handled at the output level (they really should not be...). Next, because there can be a lot of unprocessed data (up to the buffer size) at write time. In addition, nothing tells us we cannot have this without chunked encoding... and maybe without a data size indication too. I would prefer a solution that is tolerant about the trailer length (i.e. <= 4 bytes) and only in the case of deviant deflate data. The best solution remains to identify the meaning of these 2 additional bytes, in order to keep control of what curl does: without that, we are navigating in the fog, without GPS :-) Maybe browser code can tell us details of their implementation, but that goes beyond my skills. |
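A rough sketch of the kind of tolerance proposed above, with a hypothetical helper name and the 4-byte limit taken straight from the comment; this is not existing curl code, only an illustration of the rule.

  #include <zlib.h>

  /* Hypothetical helper sketching the tolerance described above: once a raw
   * ("deviant") deflate stream has reached Z_STREAM_END, accept at most a few
   * unconsumed trailing bytes instead of raising an error. The name and the
   * 4-byte limit are illustrative only. */
  #define DEVIANT_TRAILER_MAX 4

  int trailing_data_acceptable(const z_stream *strm, int is_raw_deflate,
                               int zlib_result)
  {
    if(zlib_result != Z_STREAM_END)
      return 0;                     /* the stream did not even finish */
    if(strm->avail_in == 0)
      return 1;                     /* no leftover bytes at all */
    /* only be lenient for the raw-deflate form and a short trailer */
    return is_raw_deflate && strm->avail_in <= DEVIANT_TRAILER_MAX;
  }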
As I said, browsers are very lenient when it comes to detecting the end of a document; they refrain from triggering an error or alert and rather just hope all data got in - and if in doubt they just mark the connection as not suitable for reuse. A while ago I worked on trying to make Firefox's treatment of content-length and "framing" stricter, but I ended up having to surrender because the web is already broken in so many regards in that area, so in the end we basically had to let it keep ignoring chunked-encoded streams that are cut off before the last chunk, weirdo gzip stream problems, gzip streams without the trailer, etc. If we were to act like the browsers we would probably: ignore all gzip problems, stop reading at the point of the error and mark the connection as not for reuse. |
stooq.com works, but I hit another site that fails with master: ebay.
curl -o test -k 'https://www.ebay.com/sch/i.html?LH_CAds=&_ex_kw=&_fpos=&_fspt=1&_mPrRngCbx=1&_nkw=150+in+1+gba&_sacat=&_sadis=&_sop=12&_udhi=&_udlo=&rmvSB=true&_fosrp=1' --compressed |
Oh, even the front page reproduces, no need for the long url: |
I cannot reproduce with the ebay home page: it works as expected here. This is clearly a server bug. |
Agreed, that's a server bug. I can't seem to find any way to contact the responsible people though. Would it be possible to add a curl option to ignore any such buggy trailing data? Not being able to access ebay is bad. |
I'm not convinced this is the right thing to do, so I'd like other developers' opinions. You're still able to access the site uncompressed, and I think that is the best thing to do when facing a server compression bug. |
I cannot tell it apart from legit (23) write errors. |
Would it be satisfactory for you to get CURLE_BAD_CONTENT_ENCODING (61: Unrecognized or bad HTTP Content or Transfer-Encoding) in our cases? |
Not sure if that would also include legit errors. A new "retry without compression" error would work. However, forcing a page reload without compression would give a worse experience compared to other browsers, so ideally I'd want these trailers to be ignored. It might cause a visible delay of up to one second, plus the time taken by an uncompressed download, and then I'd have to keep track of which sites need compression disabled. |
(well, a CURLE_RETRY_WITHOUT_COMPRESSION would also achieve the ideal goal, since I could then ignore that error without fear of false positives) |
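On the application side, a minimal libcurl sketch of the fallback being discussed: request the page compressed and, if decoding fails, retry with compression disabled. Keying off CURLE_BAD_CONTENT_ENCODING is only the proposal from this thread (today the failure surfaces as CURLE_WRITE_ERROR, which cannot be told apart from legit write errors), and the fetch() helper name is made up.

  #include <curl/curl.h>

  /* Application-side sketch of the discussed fallback: try the transfer
   * compressed and retry uncompressed on a fresh connection if the
   * content decoding fails. */
  static CURLcode fetch(const char *url)
  {
    CURL *curl = curl_easy_init();
    CURLcode res;

    if(!curl)
      return CURLE_OUT_OF_MEMORY;

    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, "");  /* all built-in */
    res = curl_easy_perform(curl);

    if(res == CURLE_BAD_CONTENT_ENCODING) {
      /* broken compressed data from the server: retry uncompressed and do
       * not reuse the dubious connection */
      curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, NULL);
      curl_easy_setopt(curl, CURLOPT_FRESH_CONNECT, 1L);
      res = curl_easy_perform(curl);
    }

    curl_easy_cleanup(curl);
    return res;
  }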
The following situations may currently return CURLE_BAD_CONTENT_ENCODING:
|
Okay, all those are errors I want to show to the user, so they would be false positives from the ignoring POV. |
That would not be an error code name I would approve of. We identify the problem with the error codes; the correct next action is then up to the application. I don't think the error code itself should suggest that action, as we can't possibly know what the application actually wants to do. I think we should rather increase our efforts to figure out when we get these errors after all "relevant" data has already been returned, so that we can ignore the error (but possibly mark the connection for close since it's in a dubious state). |
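A sketch of the decision rule suggested there, with purely illustrative names (none of these structures or functions exist in curl): only swallow the decoding error once the body is otherwise complete, and flag the connection so it is closed instead of reused.

  /* Illustrative helper: downgrade a decoding error to success only when
   * the body is otherwise complete (final chunk or announced length seen),
   * and mark the connection so it is never reused. */
  struct xfer_state {
    int body_complete;     /* final chunk / announced length was reached */
    int decode_error;      /* the decompressor reported a problem */
    int close_connection;  /* do not put this connection back in the pool */
  };

  int downgrade_decode_error(struct xfer_state *x)
  {
    if(x->decode_error && x->body_complete) {
      x->close_connection = 1;  /* dubious state: never reuse */
      return 1;                 /* report success to the application */
    }
    return 0;                   /* keep reporting the error */
  }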
Should I open a new issue so this doesn't get forgotten, or can this be reopened? |
I was trying to find out what that broken server software is, and while ebay hid their server, |
I did this
curl -o test -k https://stooq.com/q/?s=cbl_e.us -v --compressed
Ever since dbcced8, this download fails with "curl: (23) Failed writing data".
I expected the following
Successful download. It may be a curl bug or an Apache bug, but in either case curl should handle it gracefully.
curl/libcurl version
curl 7.61.0-DEV (x86_64-unknown-linux-gnu) libcurl/7.61.0-DEV OpenSSL/1.0.2o zlib/1.2.3 librtmp/2.3
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp
Features: Largefile NTLM NTLM_WB SSL libz TLS-SRP UnixSockets HTTPS-proxy
operating system
Linux x86_64