Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

curl -O and wrong parsing of URL #7628

Closed
7-904-001-67-28 opened this issue Aug 25, 2021 · 15 comments
Closed

curl -O and wrong parsing of URL #7628

7-904-001-67-28 opened this issue Aug 25, 2021 · 15 comments

Comments

@7-904-001-67-28
Copy link

@7-904-001-67-28 7-904-001-67-28 commented Aug 25, 2021

I did this

In Command Prompt:

  1. curl.exe -I --url "https://github.com/PowerShell/vscode-powershell/releases/download/v2021.8.2/powershell-2021.8.2.vsix"
    HTTP/2 302

    location: …

    Redirection needed.
  2. Find string started with "location: ", get the new URL and restart cURL:
    curl.exe -I --url "https://github-releases.githubusercontent.com/42131201/f648ac7e-7de8-4a00-851e-f7a1631391d2?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20210825%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210825T124435Z&X-Amz-Expires=300&X-Amz-Signature=2cd843328ab928a1df7a9075151c8ec42c92331985beb6108d81871729143af2&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=42131201&response-content-disposition=attachment%3B%20filename%3Dpowershell-2021.8.2.vsix&response-content-type=application%2Foctet-stream"
    HTTP/2 200

    The URL is good.
  3. Change cURL option «-I» to «-O» only, NOT change the URL:
    curl.exe -O --url "https://github-releases.githubusercontent.com/42131201/f648ac7e-7de8-4a00-851e-f7a1631391d2?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20210825%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210825T124435Z&X-Amz-Expires=300&X-Amz-Signature=2cd843328ab928a1df7a9075151c8ec42c92331985beb6108d81871729143af2&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=42131201&response-content-disposition=attachment%3B%20filename%3Dpowershell-2021.8.2.vsix&response-content-type=application%2Foctet-stream"
    curl: (3) URL using bad/illegal format or missing URL

I expected the following

The downloading, not error.

curl/libcurl version

Build: 7.78.0_1
[curl -V output]
curl 7.78.0 (x86_64-pc-win32) libcurl/7.78.0 OpenSSL/1.1.1l (Schannel) zlib/1.2.11 brotli/1.0.9 zstd/1.5.0 libidn2/2.3.2 libssh2/1.9.0 nghttp2/1.44.0 libgsasl/1.10.0
Release-Date: 2021-07-21
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli gsasl HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz MultiSSL NTLM SPNEGO SSL SSPI TLS-SRP UnixSockets zstd

operating system

MS Windows 10 21H1

@bagder
Copy link
Member

@bagder bagder commented Aug 25, 2021

I can't reproduce on Linux. Here I get a failure because "File name too long" ...

@bagder bagder changed the title Wrong parsing of URL curl -O and wrong parsing of URL Aug 25, 2021
@bagder bagder added the Windows label Aug 25, 2021
@7-904-001-67-28
Copy link
Author

@7-904-001-67-28 7-904-001-67-28 commented Aug 25, 2021

NOT ONLY this URL. Sometimes doubling the ampersands («&») helped. But not always.
Not only in Command Prompt, but also in «MS PowerShell».

@vszakats
Copy link
Member

@vszakats vszakats commented Aug 25, 2021

It breaks in the (Windows-specific) call sanitize_file_name() when trying to determine the output filename from the URL in get_url_file_name(). The reason is that the filename part of the URL is longer than 255 characters.

#!/bin/sh

url="$(curl --disable --silent --write-out '%{redirect_url}' --output /dev/null \
  'https://github.com/curl/curl/releases/download/curl-7_78_0/curl-7.78.0.tar.xz.asc')"

curl='curl.exe'

wine64 "${curl}" --disable "${url}" --head  # OK
wine64 "${curl}" --disable "${url}" --remote-name  # fail: 3
wine64 "${curl}" --disable "${url}" --output out.bin  # OK

output:

HTTP/2 200
x-amz-id-2: ********
x-amz-request-id: ********
last-modified: Wed, 21 Jul 2021 07:06:56 GMT
etag: "1c76c2080e3ea486d3756585448249e3"
content-disposition: attachment; filename=curl-7.78.0.tar.xz.asc
content-type: application/octet-stream
server: AmazonS3
via: 1.1 varnish, 1.1 varnish
accept-ranges: bytes
date: Wed, 25 Aug 2021 **:**:** GMT
age: 0
x-served-by: cache-********, cache-********
x-cache: MISS, MISS
x-cache-hits: 0, 0
strict-transport-security: max-age=31536000
x-fastly-request-id: ********
content-length: 488

curl: (3) URL using bad/illegal format or missing URL

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   488  100   488    0     0   5809      0 --:--:-- --:--:-- --:--:--  5809

@bagder
Copy link
Member

@bagder bagder commented Aug 25, 2021

It breaks in the (Windows-specific) call sanitize_file_name()

Aha! That explains why this only happens on Windows, since that function is only used there.

So perhaps we can help out and clarify this sitation a little bit better? Maybe like this:

diff --git a/src/tool_operate.c b/src/tool_operate.c
index 74221599d..960f3020a 100644
--- a/src/tool_operate.c
+++ b/src/tool_operate.c
@@ -948,12 +948,15 @@ static CURLcode single_transfer(struct GlobalConfig *global,
            */
 
           if(!per->outfile) {
             /* extract the file name from the URL */
             result = get_url_file_name(&per->outfile, per->this_url);
-            if(result)
+            if(result) {
+              errorf(global, "Failed to extract a sensible file name"
+                     " from the URL to use for storage!\n");
               break;
+            }
             if(!*per->outfile && !config->content_disposition) {
               errorf(global, "Remote file name has no length!\n");
               result = CURLE_WRITE_ERROR;
               break;
             }

@7-904-001-67-28
Copy link
Author

@7-904-001-67-28 7-904-001-67-28 commented Aug 25, 2021

Quote: «when trying to determine the output filename from the URL».
Sorry, but with option «-J» (i.e. curl.exe -J -O …) – the same error. Syntax error number 3, not error of writing to file number 23.
I was trying to simplify the situation, remove EXTRA options.

@bagder
Copy link
Member

@bagder bagder commented Aug 25, 2021

-J won't help here since there's no Content-Disposition header in there.

This is simply not a URL that -O works well with, and -J doesn't make it better.

@vszakats
Copy link
Member

@vszakats vszakats commented Aug 25, 2021

I can confirm the similar error with -J. It appeared to be a separate issue, though I did not investigate further.

On a second look it seems get_url_file_name() is called even when -J is specified, and on Windows (and MS-DOS) this will bail out early if sanitization fails. On other platforms the only similar error condition is out of memory, so it's much less likely to unexpectedly trigger. Ideally, it'd be best not to look at the URL at all with -J.

@vszakats
Copy link
Member

@vszakats vszakats commented Aug 25, 2021

@bagder: In my self-contained example there is a content-disposition: attachment; filename=curl-7.78.0.tar.xz.asc header. But even if there wouldn't be one, shouldn't -OJ rather result in curl: Remote file name has no length!?

@7-904-001-67-28
Copy link
Author

@7-904-001-67-28 7-904-001-67-28 commented Aug 25, 2021

Here is the information for the option «-J»:
&response-content-disposition=attachment%3B%20filename%3Dpowershell-2021.8.2.vsix&response-content-type=application%2Foctet-stream

Sorry, what header are you writing about? Syntax error. There was no attempts to communicate with the server.

Quote: «not to look at the URL at all with «-J»».
I agree. Or change description of «-J».

@vszakats
Copy link
Member

@vszakats vszakats commented Aug 26, 2021

@7-904-001-67-28: After the GET request is actually made (e.g. on non-Windows, or by omitting -O on Windows), and before serving the payload, the server will return a content-disposition response header, with a usable filename in it. That filename is normally used by -J.

@7-904-001-67-28
Copy link
Author

@7-904-001-67-28 7-904-001-67-28 commented Aug 26, 2021

Quote: «the server will return a content-disposition response header, with a usable filename in it. That filename is normally used by -J
Sorry, error number 3 is a syntax error. SYNTAX. «bad/illegal format». There can be no request to the server. Stop execution immediately.
Or, if there was a query, where is the server response? WHERE?! Nonsense.

@bagder
Copy link
Member

@bagder bagder commented Aug 26, 2021

I will ask that you stay polite or I will just close this issue and hide.

When -O is used, curl needs to be able to use a file name to store the data in. That name is extracted from the URL as documented. If no such name can be extracted or if the name is too long, it can't use that name. Then curl will fail with an error. Which is what happens here.

At that point it doesn't know if there will be any content-disposition or not. So, it needs a working file name first to use in case there won't show up a decent header.

You're welcome to improve this in the code if you have a better way.

bagder added a commit that referenced this issue Aug 26, 2021
Fixes #7628
@7-904-001-67-28
Copy link
Author

@7-904-001-67-28 7-904-001-67-28 commented Aug 26, 2021

Sorry, I do not know English. I am using a computer translation. What rule did I break? Why are you threatening? Are you a talib or a civilized man?
And the error is really present. Your program has bug. Many years. I have been seeing this mistake for two years. Correct the mistake and I will not write to you. This error can interfere with many people. I have endured her for two years.

I can not improve this in the code because I do not see the source code on this site. I do not know how to use the site «github.com» as repository. Give me the appropriate piece of source code in a familiar to me programming language, if you wish.

@danielgustafsson
Copy link
Member

@danielgustafsson danielgustafsson commented Aug 26, 2021

There is a proposed patch which fixes this, so we can close this issue. Thanks for the report.

@7-904-001-67-28
Copy link
Author

@7-904-001-67-28 7-904-001-67-28 commented Aug 26, 2021

Quote: «When -O is used, curl needs to be able to use a file name to store the data in.»
You are wrong. Not small letter «-o», but great letter «-O».
And I have written, that error have place with option «-J», when program ask name from server. Why?

@curl curl locked as resolved and limited conversation to collaborators Aug 26, 2021
bagder added a commit that referenced this issue Aug 26, 2021
Due to how this currently works internally, it needs a working initial
file name to store contents in, so it may still fail even with -J is
used (and thus accepting a name from content-disposition:) if the file
name part of the URL isn't "good enough".

Fixes #7628
Closes #7635
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

4 participants