
"--max-filesize 0" doesn't seem to work as described #14440

Closed
MasterInQuestion opened this issue Aug 7, 2024 · 20 comments

Comments

@MasterInQuestion

MasterInQuestion commented Aug 7, 2024

    Seemingly a no-op, or it nullifies the limit.
    Somewhat contrary to: https://curl.se/docs/manpage.html#--max-filesize

    Use case:
    curl -v --max-filesize 1 -L "https://github.com/mozilla-mobile/firefox-android/assets/38040960/fd50937d-5442-494e-b4aa-0baf75569a57"
    .
    Effectively doing HEAD but with GET:
    [ ^ Alike what browsers do: https://bugzilla.mozilla.org/show_bug.cgi?id=1872503#c3 ]
    Certain servers may refuse to serve HEAD (example reported HTTP 403 Forbidden), meanwhile the file may be large.

    Related:
    https://github.com/curl/curl/issues/11810

bagder added a commit that referenced this issue Aug 7, 2024
Reported-by: MasterInQuestion on github
Fixes #14440
@bagder
Member

bagder commented Aug 7, 2024

  1. The documentation does indeed not mention the zero exception; I fix that in max-filesize.md: mention zero disables the limit #14443
  2. That Firefox bug does not say anything about this.
  3. You can do a GET without reading the body with curl -I -X GET https://example.com
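The zero exception from item 1 can be checked locally without any remote server. A minimal sketch, assuming a POSIX shell and a curl build with the FILE protocol enabled (curl enforces `--max-filesize` for `file://` up front, since the size is known; the `/tmp` paths are illustrative):

```shell
# Create a 10-byte test file.
printf '0123456789' > /tmp/maxfs_demo.bin

# A 1-byte cap aborts the transfer with exit code 63 (CURLE_FILESIZE_EXCEEDED).
curl -s --max-filesize 1 -o /dev/null "file:///tmp/maxfs_demo.bin"
echo "limit 1 -> exit $?"

# A cap of 0 is the documented exception: it disables the limit entirely.
curl -s --max-filesize 0 -o /dev/null "file:///tmp/maxfs_demo.bin"
echo "limit 0 -> exit $?"
```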

@bagder bagder closed this as completed in d41916c Aug 7, 2024
@MasterInQuestion
Author

MasterInQuestion commented Aug 7, 2024

    Didn't manage to locate the "-X"...
    Note "GET" is case-sensitive: "-X" just passes the string verbatim.
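The verbatim pass-through is easy to observe against any server that dispatches on the exact method token. A local sketch, assuming python3 is available; the port and the `http.server` helper are illustrative, not part of curl (python3's handler looks up `do_GET`, so a lowercase `get` is rejected as an unsupported method):

```shell
# Hypothetical local server on port 8138, serving the current directory.
python3 -m http.server 8138 >/dev/null 2>&1 &
srv=$!
sleep 1

# Uppercase GET is a normal request; lowercase "get" is sent verbatim
# and the server answers 501 Unsupported method.
curl -s -o /dev/null -w '%{http_code}\n' -X GET "http://127.0.0.1:8138/"  # 200
curl -s -o /dev/null -w '%{http_code}\n' -X get "http://127.0.0.1:8138/"  # 501

kill "$srv"
```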

@MasterInQuestion
Author

MasterInQuestion commented Aug 25, 2024

    Why does it, for this example, still seem to download the whole file?
    https://drive.usercontent.google.com/download?confirm=t&export=download&id=1WxOrSi-GNB45nLUUiR4PT7c4H2VurtKk (~ 18.34 MiB)

    The "--max-filesize 1" variant worked as intended.

    See also:
    https://trac.ffmpeg.org/ticket/11056#comment:16
    https://trac.ffmpeg.org/ticket/11159#comment:3
    ("confirm=t" needed to bypass some "virus" confirmation)

    ----

    More suitable to test:
    -A "Mozilla/5.0 (Linux; rv:999) Gecko/20100101 Firefox/999" "https://premium.britannica.com/wp-content/uploads/2023/05/memorialday-2620x1080-1.png"
    (~ 1.3 MiB)

@MasterInQuestion
Author

    @bagder, probably worth your attention.

@bagder
Member

bagder commented Sep 9, 2024

What is? I don't understand what you're talking about.

@MasterInQuestion
Author

    Pardon.
    Straightforward but less accurate:
    curl -I -X GET -A "Mozilla/5.0 (Linux; rv:999) Gecko/20100101 Firefox/999" "https://premium.britannica.com/wp-content/uploads/2023/05/memorialday-2620x1080-1.png"

@bagder
Member

bagder commented Sep 9, 2024

That's a curl command line. What about it?

@bagder
Member

bagder commented Sep 9, 2024

You ask for -I (HEAD) and yet you insist on -X GET, which is highly confusing. What do you want it to do?

@MasterInQuestion
Author

MasterInQuestion commented Sep 9, 2024

    The question is:
    Compared to the "--max-filesize 1" variant, this one causes an unwanted full download.
    (instead of merely getting the headers)

    ----

    Rationale explained in 1st post:
    "Certain servers may refuse to serve HEAD (example reported HTTP 403 Forbidden), meanwhile the file may be large."

@bagder
Member

bagder commented Sep 9, 2024

OK, so what is the exact question?

@MasterInQuestion
Author

    How to:
    Effectively do HEAD but with GET, without a full download?

@bagder
Member

bagder commented Sep 9, 2024

That is exactly what you get with:

curl -I -X GET $URL

@MasterInQuestion
Author

    That is exactly what you get me with...
    [ Quote bagder @ CE 2024-08-07 15:05:06 UTC:
https://github.com/curl/curl/issues/14440#issuecomment-2273695109
    3. You can do a GET without reading the body with `curl -I -X GET "https://example.com"`. ]

    The problem is:
    It seems to cause the unwanted full-download.

    Did it work (without full-download) for you?

@jay
Member

jay commented Sep 10, 2024

I also don't understand what you are asking. You want curl to behave as if it's receiving a HEAD response and close? What do you mean it causes an unwanted download? For example, this download of 200 MB should terminate immediately (after receiving the headers) if you tell curl it's a HEAD request but then change the custom method to GET:

curl -v -I -X GET http://cachefly.cachefly.net/200mb.test -o NUL

The server sees GET and replies with the content, but curl will terminate the connection after the headers.

It sounds to me like you want to simulate a HEAD reply for a server that does not support those requests, but if you send a GET request then the server may send data before curl can close the connection. That's what you are asking the server to do: with GET, you want to get the resource. Correct me if I'm wrong @bagder, but I'm pretty sure it's discarded as excess in such a case (i.e. not written to the -o outfile), though I don't know if that's guaranteed.
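This can be reproduced against a local server, where the body is demonstrably never written to the output. A hedged sketch; the port, file names, and the python3 helper server are illustrative (with -I, curl writes the response headers as the retrieved document, so only headers reach the -o target):

```shell
# Serve a small file locally, then request it with -I -X GET:
# the request line says GET (so the server prepares a body), but -I makes
# curl stop after the response headers.
printf 'some body bytes' > /tmp/body_demo.txt
python3 -m http.server 8139 --directory /tmp >/dev/null 2>&1 &
srv=$!
sleep 1

curl -s -I -X GET -o /tmp/hdrs.txt "http://127.0.0.1:8139/body_demo.txt"
kill "$srv"

grep -i 'content-length' /tmp/hdrs.txt              # the size is still reported...
! grep -q 'some body bytes' /tmp/hdrs.txt && echo "body not saved"
```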

@MasterInQuestion
Author

    Compare:
    curl -I -X GET "https://cachefly.cachefly.net/200mb.test"
    curl -I -X GET "https://drive.usercontent.google.com/download?confirm=t&export=download&id=1WxOrSi-GNB45nLUUiR4PT7c4H2VurtKk"

    #1 also worked for me. (no notable download)

@jay
Member

jay commented Sep 11, 2024

As I have explained, the server may send data before curl can close the connection. I took a look at your latter example in Wireshark, and google takes approximately 3 seconds to reply with HTTP/2 HEADERS; I don't know why so long, but it has nothing to do with curl. Then the server follows with DATA frames, and during that entire time (which is less than 1 second, more like 100-200 ms) curl replies with RST_STREAM on the stream and then GOAWAY on the connection. You cannot expect that no data will be sent, because you are requesting that the data be sent, and curl needs to hang up after receiving the headers.

@MasterInQuestion
Author

MasterInQuestion commented Sep 12, 2024

[[
    As I have explained, the server may send data before `curl` can close the connection.
    I took a look at your latter example in Wireshark: and Google takes approximately 3 seconds to reply with HTTP/2 HEADERS.
    I don't know why so long but it has nothing to do with `curl`.

    Then the server follows with DATA frames, and during that entire time which is less than 1 second like 100, 200 ms:
    `curl` replies with RST_STREAM on the stream and then GOAWAY on the connection.

    You cannot expect no data will be sent: because you are requesting the data be sent.
    And `curl` needs to hang up after receiving the headers.
]]
    So for this case, the validity of "--max-filesize 0" seems to hold.

    Meanwhile I noted that using "--max-filesize 1" with "-L" had caused a problem when the redirect response carries a message body of some length:
    curl croaks amid the redirection with "(63) Maximum file size exceeded".

    A workaround would be raising the limit to a somewhat higher, more tolerable value, e.g. "4K" (4,096 B?).
    Note the cut-off, incomplete reply will not be output.
    However, when dealing with some extraordinarily small files, "/dev/null" or the like may have to be involved.

@jay
Member

jay commented Sep 12, 2024

I see, you are saying that --max-filesize applies to servers that redirect. Users of --max-filesize may want to limit the overall bytes downloaded, even if it's specifically documented as the file size downloaded, so I'm not sure that's a bug. What happens on redirect is that curl discards the bytes: if the redirect is from localhost/foo to localhost/bar, then it ignores the foo download ("* Ignoring the response-body") and downloads bar, but it has to read the bytes of foo (which location redirects may carry).

Anyone else have an opinion on whether this is appropriate behavior?

@MasterInQuestion
Author

[[
    I see, you are saying that "--max-filesize" applies to servers that redirect.
    .
    Users of "--max-filesize" may want to limit the overall bytes downloaded:
    Even if it's specifically documented as file size downloaded.
    So I'm not sure that's a bug.

    What happens on redirect is:
    `curl` is discarding the bytes like, if the redirect is from "localhost/A" to "localhost/B":
    Then it ignores "A" download ("* Ignoring the response-body"), and downloads "B".
    But it has to read the bytes of "A" (which location redirects may have).

    Anyone else have an opinion on whether this is appropriate behavior?
]]
    Perhaps a separation: "--max-dsize"? (parallel of "fsize")

    The "foobar" nonsense is extraordinarily befuddling...
    Even normalized, I still couldn't quite understand it.

@bagder
Member

bagder commented Sep 13, 2024

Anyone else have an opinion on whether this is appropriate behavior?

The ignored response-body should not be counted as "file download" data; it should be considered a bug if it is. The max filesize should be for the data actually delivered/saved, not just transferred, I think.
