
HTTP --max-filesize counts ignored body data #14899

Closed
jay opened this issue Sep 13, 2024 · 9 comments
jay commented Sep 13, 2024

From #14440 (comment) user @MasterInQuestion reports that --max-filesize is applied to the ignored content of a redirect response. For example, with curl -v -L --max-filesize 1 http://localhost:8080, where the server's redirect reply carries 5 body bytes:

while true; \
do perl -e 'print ("HTTP/1.1 301\r\nContent-Length: 5\r\nLocation: http://localhost:8080/abc\r\n\r\napple")' | nc -4l localhost 8080; \
done

curl then fails because the 5 bytes of content (even though they are ignored) exceed the 1-byte limit:

* STATE: DO => DID handle 0x76bd88; line 2164
* STATE: DID => PERFORMING handle 0x76bd88; line 2282
< HTTP/1.1 301
< Content-Length: 5
< Location: http://localhost:8080/abc
* Maximum file size exceeded

The failure is caused by a check in Curl_http_size which, AFAICT, compares against the limit even for body bytes that are never delivered to the user:

curl/lib/http.c

Lines 3273 to 3294 in a3bd1dd

/* Content-Length must be ignored if any Transfer-Encoding is present in the
   response. Refer to RFC 7230 section 3.3.3 and RFC2616 section 4.4. This is
   figured out here after all headers have been received but before the final
   call to the user's header callback, so that a valid content length can be
   retrieved by the user in the final call. */
CURLcode Curl_http_size(struct Curl_easy *data)
{
  struct SingleRequest *k = &data->req;
  if(data->req.ignore_cl || k->chunk) {
    k->size = k->maxdownload = -1;
  }
  else if(k->size != -1) {
    if(data->set.max_filesize &&
       k->size > data->set.max_filesize) {
      failf(data, "Maximum file size exceeded");
      return CURLE_FILESIZE_EXCEEDED;
    }
    Curl_pgrsSetDownloadSize(data, k->size);
    k->maxdownload = k->size;
  }
  return CURLE_OK;
}

There is a separate check that is done in cw_download_write which deals with real body bytes, however by that point the download is already taking place.

curl/lib/sendf.c

Lines 326 to 344 in a3bd1dd

  if(excess_len) {
    if(!data->req.ignorebody) {
      infof(data,
            "Excess found writing body:"
            " excess = %zu"
            ", size = %" FMT_OFF_T
            ", maxdownload = %" FMT_OFF_T
            ", bytecount = %" FMT_OFF_T,
            excess_len, data->req.size, data->req.maxdownload,
            data->req.bytecount);
      connclose(data->conn, "excess found in a read");
    }
  }
  else if(nwrite < nbytes) {
    failf(data, "Exceeded the maximum allowed file size "
          "(%" FMT_OFF_T ") with %" FMT_OFF_T " bytes",
          data->set.max_filesize, data->req.bytecount);
    return CURLE_FILESIZE_EXCEEDED;
  }

In other words, the former check causes curl to error out immediately after the headers are received and the content length is known. The user may actually want this behavior: if they limit the size to 1MB, then maybe they don't want to download even the first 1MB of a 100MB file. OTOH it seems that cw_download_write works in exactly that way; it will allow downloading up to the maximum file size before failing.

So if the former check is removed, which is my initial inclination, there will be a change in behavior.

MasterInQuestion commented Sep 13, 2024
[ jay @ CE 2024-09-13 15:37:01 UTC:
https://github.com/curl/curl/issues/14899#issue-2525121254
    Up: https://github.com/curl/curl/issues/14440#issuecomment-2346734573

    @MasterInQuestion reports that:
    "--max-filesize" is applied to the ignored content, of redirect response.

    E.g.
    curl -v --max-filesize 1 -L "http://localhost:8080/"
    ; and the server's redirect reply has 5 content bytes:
[[

	while true;
	do {
	perl -e '
	$, = "\r\n";
	print (
	"HTTP/1.1 301",
	"Content-Length: 5",
	"Location: http://localhost:8080/abc",
	"",
	"abc", "" );
	' | nc -4l localhost 8080;
	};
	done;

]]
    `curl` will fail because 5 bytes of content (though ignored) is more than 1:
[[
* STATE: DO => DID handle 0x76bd88; line 2164
* STATE: DID => PERFORMING handle 0x76bd88; line 2282
< HTTP/1.1 301
< Content-Length: 5
< Location: http://localhost:8080/abc
* Maximum file size exceeded
]]

    The failure is caused by a check in "Curl_http_size", which AFAICT:
    Compares even for body bytes that are not "real" so to speak:
    https://github.com/curl/curl/blob/4ff04615a095ed1e7c489e3bbd4600b52e68d9a7/lib/http.c#L3273-L3294
    .
    There is a separate check that is done in "cw_download_write": which deals with real body bytes.
    However by that point the download is already taking place:
    https://github.com/curl/curl/blob/4ff04615a095ed1e7c489e3bbd4600b52e68d9a7/lib/sendf.c#L326-L344
    .
    In other words, the former check causes `curl` to error immediately:
    After the headers are received and the content length is known.

    The user may want this behavior because, if they say:
    Limit to 1 MiB, then maybe they don't want to download the first 1 MiB of a 100 MiB file.

    On the other hand, it seems that "cw_download_write" works in exactly that way:
    It will allow downloading to the maximum file size.

    So if the former check is removed, which is my initial inclination:
    There will be a change in behavior. ]
.
    See also: https://github.com/curl/curl/issues/14440#issuecomment-2346989774

bagder commented Sep 13, 2024

@MasterInQuestion I did not understand one word of this

MasterInQuestion commented Sep 13, 2024

    Refactored quality content.
    [ https://github.com/MasterInQuestion/attach/raw/main/curl/curl/14899/refactor.webp ]
    For potential future reference.

    Probably also what @jay intended: testing the format's potential.

bagder commented Sep 13, 2024

It does not help me. I have no idea what you talk about. #14900 is my proposed fix.

MasterInQuestion commented Sep 13, 2024
    I've noted you have a tendency to comment before reading... per past events.
    Very strange, as the project's source should be harder to interpret.

    No matter. My proposed design is:
    “Perhaps a separation: "--max-dsize"? (parallel of "fsize")”
    (in "See also")

bagder commented Sep 13, 2024

“Perhaps a separation: "--max-dsize"? (parallel of "fsize")”

A separation between what for what purpose? Is that another way of saying my fix is not to your liking?

MasterInQuestion commented Sep 13, 2024

    Please read "See also":
    “Users of "--max-filesize" may want to limit the overall bytes downloaded:
    Even if it's specifically documented as file size downloaded.
    So I'm not sure that's a bug.”
    (original by @jay)

bagder commented Sep 13, 2024

I give up.

bagder added a commit that referenced this issue Sep 14, 2024
Add test 477 to verify

Reported-by: Jay Satiro
Fixes #14899
@bagder bagder closed this as completed in aef384a Sep 14, 2024
MasterInQuestion commented
    Summary:
    |1| New option "--max-dsize": keeping the current behavior (counting all transfer bytes).
    |2| Alias "--max-fsize" to "--max-filesize": changed to count only the final file size.
