ASCII space characters in --data-urlencode encoded as %20 rather than + #3229

Closed
Witiko opened this issue Nov 3, 2018 · 2 comments

Comments

Witiko commented Nov 3, 2018

I did this

$ curl --trace log --data-urlencode 'q=hello world' https://www.google.com/search
$ grep -A 1 -F '=> Send data, 15 bytes' <log 
=> Send data, 15 bytes (0xf)
0000: 71 3d 68 65 6c 6c 6f 25 32 30 77 6f 72 6c 64    q=hello%20world

I expected the following

Section 8.2.1. The form-urlencoded Media Type of RFC 1866 specifies that “form field names and values are escaped: space characters are replaced by `+', and then reserved characters are escaped as per [URL]”. As a result, I would expect the following data to be sent over the wire:

$ grep -A 1 -F '=> Send data, 13 bytes' <log 
=> Send data, 13 bytes (0xd)
0000: 71 3d 68 65 6c 6c 6f 2b 77 6f 72 6c 64    q=hello+world
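For comparison, the two encodings can be reproduced outside curl. This is only a sketch using Python's standard `urllib.parse` module, not curl's own encoder; the helper names `form_urlencode_value` and `percent_encode_value` are made up for illustration.

```python
from urllib.parse import quote, quote_plus


def form_urlencode_value(value: str) -> str:
    """Hypothetical helper: spaces become '+', other reserved characters
    become %XX, as RFC 1866 section 8.2.1 describes."""
    return quote_plus(value)


def percent_encode_value(value: str) -> str:
    """Hypothetical helper: every reserved character, including the space,
    becomes %XX (the behavior reported above)."""
    return quote(value, safe="")


field = "q"
value = "hello world"

plus_body = f"{field}={form_urlencode_value(value)}"
percent_body = f"{field}={percent_encode_value(value)}"

print(plus_body, len(plus_body))        # q=hello+world 13
print(percent_body, len(percent_body))  # q=hello%20world 15
```

The 13-byte and 15-byte lengths match the two "Send data" sizes in the traces above. A literal `+` in the input still has to be escaped as `%2B` so that it is not mistaken for an encoded space.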

curl/libcurl version

curl 7.52.1 (x86_64-pc-linux-gnu) libcurl/7.52.1 OpenSSL/1.0.2l zlib/1.2.8 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.16) libssh2/1.7.0 nghttp2/1.18.1 librtmp/2.3

Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp

Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL

operating system

Linux 4.9.110 (Debian 10)

@bagder bagder added the HTTP label Nov 5, 2018
Member

bagder commented Nov 5, 2018

+ certainly makes the output slightly more readable for humans but %20 is still not wrong for ASCII spaces since it still follows the syntax rules - and is easier for the encoder to use for consistency.

I would accept a patch that improves this, but I don't consider it a high-prio to work on myself.

Author

Witiko commented Nov 5, 2018

Although RFC 1866 does not explicitly use the MUST keyword, this is likely because the requirement-level nomenclature was not established until RFC 2119. Other than that, there seems to be no doubt that replacing spaces with + before escaping the reserved characters is the only standard behavior. Applications expect this format and may produce different output when the standard is not met, which is how I stumbled on this in the first place.

@bagder bagder closed this as completed in 411d0c7 Jan 10, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Apr 10, 2019
bagder pushed a commit that referenced this issue Feb 27, 2020
According to RFC1866, in form-urlencoded content "space characters are
replaced by `+', and then reserved characters are escaped as per URL."

Fixes #3229
bagder pushed a commit that referenced this issue Feb 28, 2020
According to RFC1866, in form-urlencoded content "space characters are
replaced by `+', and then reserved characters are escaped as per URL."

Fixes #3229
Closes #4924
Closes #4987