ASCII space characters in --data-urlencode encoded as %20 rather than + #3229

Closed
Witiko opened this Issue Nov 3, 2018 · 2 comments

Witiko commented Nov 3, 2018

I did this

$ curl --trace log --data-urlencode 'q=hello world' https://www.google.com/search
$ grep -A 1 -F '=> Send data, 15 bytes' <log 
=> Send data, 15 bytes (0xf)
0000: 71 3d 68 65 6c 6c 6f 25 32 30 77 6f 72 6c 64    q=hello%20world

I expected the following

Section 8.2.1 ("The form-urlencoded Media Type") of RFC 1866 specifies that “form field names and values are escaped: space characters are replaced by `+', and then reserved characters are escaped as per [URL]”. As a result, I would expect the following data to be sent over the wire:

$ grep -A 1 -F '=> Send data, 13 bytes' <log 
=> Send data, 13 bytes (0xd)
0000: 71 3d 68 65 6c 6c 6f 2b 77 6f 72 6c 64    q=hello+world
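
For comparison, the two encodings can be reproduced outside curl. The following is a minimal sketch using Python's standard `urllib.parse` module, which is an illustrative choice and not something the report itself uses; the variable names are hypothetical. It shows the percent-encoded form seen in the trace next to the form-encoded result described by RFC 1866.

```python
from urllib.parse import quote, quote_plus, urlencode

value = "hello world"

# Percent-encode every reserved character, spaces included.
# This matches the 15-byte body seen in the trace.
percent_form = "q=" + quote(value, safe="")

# Form encoding: spaces become '+', other reserved characters are
# percent-encoded. This matches the expected 13-byte body.
plus_form = "q=" + quote_plus(value)

# A literal '+' in the data has to be escaped itself, otherwise a form
# decoder would read it back as a space.
literal_plus = quote_plus("1+1 = 2")

# urlencode() applies the same form encoding to a whole field set.
form_body = urlencode({"q": value})

print(percent_form)  # q=hello%20world
print(plus_form)     # q=hello+world
print(literal_plus)  # 1%2B1+%3D+2
print(form_body)     # q=hello+world
```

The byte counts line up with the trace: `len(percent_form)` is 15 and `len(plus_form)` is 13.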

curl/libcurl version

curl 7.52.1 (x86_64-pc-linux-gnu) libcurl/7.52.1 OpenSSL/1.0.2l zlib/1.2.8 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.16) libssh2/1.7.0 nghttp2/1.18.1 librtmp/2.3

Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp

Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL

operating system

Linux 4.9.110 (Debian 10)

bagder added the HTTP label Nov 5, 2018

bagder (Member) commented Nov 5, 2018

+ certainly makes the output slightly more readable for humans, but %20 is still not wrong for ASCII spaces, since it follows the syntax rules, and it is easier for the encoder to handle consistently.

I would accept a patch that improves this, but I don't consider it a high priority to work on myself.

bagder added the enhancement label Nov 5, 2018

Witiko commented Nov 5, 2018

Although RFC 1866 does not explicitly use the MUST keyword, this is likely because the requirement-level nomenclature was not established until RFC 2119. Other than that, there seems to be no doubt that replacing spaces with + before escaping the reserved characters is the only standard behavior. Applications expect this format and may produce different output when the standard is not met, which is how I stumbled on this in the first place.
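
How a receiver treats the two spellings depends on which decoder it applies. As a rough illustration only, here is how Python's `urllib.parse` behaves; the affected application is not named in this thread, so this says nothing about that application and the variable names are hypothetical.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

# A form decoder maps both '+' and '%20' back to a space.
from_plus = parse_qs("q=hello+world")
from_percent = parse_qs("q=hello%20world")

# A plain percent-decoder leaves '+' untouched, so the two spellings
# are only equivalent when the receiver applies form decoding.
plain_decoded = unquote("hello+world")
form_decoded = unquote_plus("hello+world")

print(from_plus)      # {'q': ['hello world']}
print(from_percent)   # {'q': ['hello world']}
print(plain_decoded)  # hello+world
print(form_decoded)   # hello world
```

A receiver that matches the raw body as text, or decodes it with its own routine, could still distinguish the two forms even though this particular decoder does not.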
