curl: prevent binary output spewed to terminal #1512
@bagder, thanks for your PR! By analyzing the history of the files in this pull request, we identified @yangtse, @captain-caveman2k and @tatsuhiro-t to be potential reviewers. |
fd2b6cf to 1fbe241
Would it make sense to check for
I'm also concerned about false positives with utf-8 contents, which I suppose can have that value legitimately. I figured the zero check is fairly safe for that. |
UTF-8 is constructed so that there are never any ASCII bytes (i.e. 0x00 to 0x7F) in the (valid) encodings of non-ASCII characters, so there might not be any real false positive concerns. It might be interesting to look at how e.g. grep or less implement their similar is-binary tests. |
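The UTF-8 property mentioned above can be checked with a small sketch. The helpers `utf8_encode` and `all_high_bytes` are hypothetical names made up for this illustration and are not part of curl; the encoder follows the standard UTF-8 bit layout. For any non-ASCII code point, every emitted byte is 0x80 or higher, so a zero-byte test cannot fire on such a character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not curl code): encode one Unicode scalar value as
   UTF-8. Returns the number of bytes written to out (1..4), or 0 if cp is
   a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
  assert(out != NULL);
  if(cp < 0x80) {
    out[0] = (unsigned char)cp;
    return 1;
  }
  if(cp < 0x800) {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if(cp < 0x10000) {
    if(cp >= 0xD800 && cp <= 0xDFFF)
      return 0; /* surrogates are not valid scalar values */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if(cp <= 0x10FFFF) {
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;
}

/* Returns 1 if every one of the n bytes is >= 0x80 (outside ASCII). */
int all_high_bytes(const unsigned char *p, size_t n)
{
  size_t i;
  for(i = 0; i < n; i++)
    if(p[i] < 0x80)
      return 0;
  return 1;
}
```

Note that U+0000 itself still encodes as a single zero byte, so the property only covers non-ASCII characters.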
grep: looks for binary zeroes. See the less (found no way to link to a source code online so I quote a comment in the |
Decimal 27 is used for escape sequences, which is exactly why it is so annoying to get in a binary file since it can send the most crazy, illegal or weirdo sequences. But when used in a properly crafted stream, it can be used to set colors etc and then definitively (well, I think so at least) is not "binary". The http://wttr.in service is an example of a site that uses escape sequences without the response being considered "binary" by most people. |
Good point. I did not realize there are sites that intentionally print escape sequences to our terminals. It would be a really bad idea to handle them as binary data.
The zero check would trip up on UTF-16 and UTF-32. While I haven't ever seen UTF-32 text on a web server, UTF-16 does exist "in the wild" and can usually be displayed correctly by terminals, since they only drop the I don't think We could donate about 90 lines of code we use to handle a similar problem in our product. It first identifies text with a BOM, then explicitly checks for GIF, JPEG, gzip and PDF files, since those may contain lots of text-like characters at the start, and then runs a heuristic by counting some characters < 32 and sequences of 2 or more When we discussed this issue at the curl meet, I thought the easiest solution would be to replace all non-whitespace characters < 32, all characters 128-159, and possibly everything >=128 with a substitute, e.g. '.'.
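For illustration only, the first step described above (identifying text by its BOM) might look roughly like the sketch below. This is not the 90 lines offered in the comment, which are not shown in this thread; `enum bom_kind` and `detect_bom` are hypothetical names, and the byte patterns are the standard Unicode byte order marks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; not part of curl. */
enum bom_kind {
  BOM_NONE,
  BOM_UTF8,
  BOM_UTF16LE,
  BOM_UTF16BE,
  BOM_UTF32LE,
  BOM_UTF32BE
};

/* Hypothetical helper: classify the start of a buffer by its byte order
   mark. Returns BOM_NONE when no known BOM is present. */
enum bom_kind detect_bom(const unsigned char *buf, size_t len)
{
  assert(buf != NULL);
  /* UTF-32LE must be tested before UTF-16LE: both start with FF FE. */
  if(len >= 4 && !memcmp(buf, "\xFF\xFE\x00\x00", 4))
    return BOM_UTF32LE;
  if(len >= 4 && !memcmp(buf, "\x00\x00\xFE\xFF", 4))
    return BOM_UTF32BE;
  if(len >= 3 && !memcmp(buf, "\xEF\xBB\xBF", 3))
    return BOM_UTF8;
  if(len >= 2 && !memcmp(buf, "\xFF\xFE", 2))
    return BOM_UTF16LE;
  if(len >= 2 && !memcmp(buf, "\xFE\xFF", 2))
    return BOM_UTF16BE;
  return BOM_NONE;
}
```

A UTF-16LE buffer such as `FF FE 41 00` (a BOM followed by "A") is classified as text here even though it contains a zero byte, which is the case the plain zero check would flag as binary.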
@bagder The http://wttr.in service is an example of a site ... Nice, but a bit too wide for the Windows console. This is a cool example too:
src/tool_getparam.c
Outdated
/* TLS version 1 for proxy */
config->proxy_ssl_version = CURL_SSLVERSION_TLSv1;
break;

case 'A': /* --binary-ok */
config->binary_ok = toggle;
I think you mean global
Disregard, I was looking at isatty, which is global.
How about:
diff --git a/src/tool_cb_wrt.c b/src/tool_cb_wrt.c
index 6c08943..08e538d 100644
--- a/src/tool_cb_wrt.c
+++ b/src/tool_cb_wrt.c
@@ -137,7 +137,17 @@ size_t tool_write_cb(char *buffer, size_t sz, size_t nmemb, void *userdata)
if(!outs->stream && !tool_create_output_file(outs))
return failure;
- rc = fwrite(buffer, sz, nmemb, outs->stream);
+ if(config->global->printable_only) {
+ char *p, *end = buffer + (sz * nmemb);
+ for(rc = 0, p = buffer; p != end; ++p, ++rc) {
+ bool printable = isprint(*p) || *p == '\r' || *p == '\n' || *p == '\t';
+ if(fputc((printable ? *p : '.'), outs->stream) == EOF)
+ break;
+ }
+ }
+ else {
+ rc = fwrite(buffer, sz, nmemb, outs->stream);
+ }
if((sz * nmemb) == rc)
/* we added this amount of data to the output */
diff --git a/src/tool_cfgable.h b/src/tool_cfgable.h
index 38777f6..25c303a 100644
--- a/src/tool_cfgable.h
+++ b/src/tool_cfgable.h
@@ -259,6 +259,7 @@ struct GlobalConfig {
int progressmode; /* CURL_PROGRESS_BAR / CURL_PROGRESS_STATS */
char *libcurl; /* Output libcurl code to this file name */
bool fail_early; /* exit on first transfer error */
+ bool printable_only; /* terminal output '.' for non-printables */
struct OperationConfig *first;
struct OperationConfig *current;
struct OperationConfig *last; /* Always last in the struct */
diff --git a/src/tool_getparam.c b/src/tool_getparam.c
index 56bbbf1..04a171c 100644
--- a/src/tool_getparam.c
+++ b/src/tool_getparam.c
@@ -251,6 +251,7 @@ static const struct LongShort aliases[]= {
{"E7", "proxy-capath", ARG_STRING},
{"E8", "proxy-insecure", ARG_BOOL},
{"E9", "proxy-tlsv1", ARG_NONE},
+ {"EA", "printable-only", ARG_BOOL},
{"f", "fail", ARG_BOOL},
{"fa", "fail-early", ARG_BOOL},
{"F", "form", ARG_STRING},
@@ -1559,6 +1560,11 @@ ParameterError getparameter(const char *flag, /* f or -long-flag */
config->proxy_ssl_version = CURL_SSLVERSION_TLSv1;
break;
+ case 'A':
+ /* terminal output '.' for non-printables */
+ global->printable_only = toggle;
+ break;
+
default: /* unknown flag */
return PARAM_OPTION_UNKNOWN;
}
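A standalone sketch of the replacement loop from the proposal above, written as a buffer-to-buffer helper so it can be tried outside curl. `sanitize_for_terminal` is a hypothetical name, not a curl function. One detail differs from the diff: the byte is cast to `unsigned char` before it reaches `isprint()`, since the `<ctype.h>` functions are only defined for values representable as `unsigned char` (or `EOF`), and plain `char` may be signed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not curl code: copy len bytes from src to dst,
   replacing every byte that is neither printable nor CR, LF or TAB
   with '.'. dst must have room for len bytes. */
void sanitize_for_terminal(const char *src, char *dst, size_t len)
{
  size_t i;
  assert(src != NULL && dst != NULL);
  for(i = 0; i < len; i++) {
    /* Cast first: passing a negative char to isprint() is undefined. */
    unsigned char c = (unsigned char)src[i];
    int keep = isprint(c) || c == '\r' || c == '\n' || c == '\t';
    dst[i] = keep ? (char)c : '.';
  }
}
```

In the default "C" locale `isprint()` rejects every byte at or above 0x80, so this sketch would also turn UTF-8 text into dots; a real option would need to decide how to treat those bytes.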
Made like that, it doesn't save anyone from accidentally sending binary to stdout, right? That's sort of the main point with my take on this. And enabling |
Right. My suggestion was something lighter, so rather than error on binary by default we don't do that. Instead of making a change to the defaults we could offer an option that the user can add to their curlrc if they don't want non-printable characters messing with their terminal, and those characters replaced with Note the condition I used didn't check for isatty simply because I forgot, it should be |
If you consider this a security problem (cf. http://seclists.org/oss-sec/2017/q2/183 ), then So, what's the scope of this - just prevent accidental messes or protect the user from potentially malicious content? |
This pull-request is an attempt to help users avoid accidentally spewing binary data to the terminal. Requiring a special option to accomplish this doesn't help (since these users would avoid the problem altogether by using a command line option already), and it is not an attempt to block "malicious content". (The malicious examples in the link above are about how terminal software is vulnerable to certain data and I really cannot see how it can be curl's job to filter its output for that purpose.)
8f7f6b1 to af124e0
Renamed the option to |
The name --binary-stdout implies that without it stdout isn't binary. --binary-stdout sounds like something that should always open stdout in binary mode. This is just stdout terminal, what about something that has term or terminal in the name and will toggle well like --terminal-binary --no-terminal-binary --terminal-allow-binary --no-terminal-allow-binary --term-allow-binary etc. If I saw that in a curlrc I'd have a better idea what it meant. But really these are also kind of misleading along with the doc because it doesn't strictly stop binary data since it only checks the first 2000 bytes. In other words any phrasing or option I think we don't want to create a false assurance. |
No option name can completely stand for itself without being ridiculously long, they do require that the user reads up on it to really get to know what it does and how it works. I personally always prefer shorter rather than longer option names. The fact that it checks the first 2000 bytes is mostly for a functional reason: getting the warning after already having shown a lot of data feels weird. I don't think that will trick any user. Maybe |
Ok, an even better suggestion IMHO:
That is, make it explicit that the output should be sent to stdout using the existing option for this purpose! |
Love it! |
We already use the single letter |
Was just a thought. |
... unless "--output -" is used. Binary detection is done by simply checking for a binary zero in early data. Added test 1425 1426 to verify. Closes #1512
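The check described in this commit message can be sketched as follows. This is an illustration under stated assumptions rather than the code that was merged: `chunk_looks_binary` and `BINARY_CHECK_LIMIT` are hypothetical names, the 2000-byte cap is taken from the discussion above, and the way the cap is applied across chunks is a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed cap on how much early data is inspected; the thread mentions
   the first 2000 bytes. */
#define BINARY_CHECK_LIMIT 2000

/* Hypothetical helper, not curl code. Returns 1 if a zero byte appears
   in the part of this chunk that lies within the first
   BINARY_CHECK_LIMIT bytes of the stream, otherwise 0. already_seen is
   the number of bytes delivered by earlier chunks. */
int chunk_looks_binary(const char *buf, size_t len, size_t already_seen)
{
  size_t window;
  assert(buf != NULL);
  if(already_seen >= BINARY_CHECK_LIMIT)
    return 0; /* past the early-data window, stop checking */
  window = BINARY_CHECK_LIMIT - already_seen;
  if(window > len)
    window = len;
  return memchr(buf, 0, window) != NULL;
}
```

A write callback would presumably call this only when the output goes to a terminal and the user has not asked for `--output -`, and would warn and abort the transfer when it returns 1.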
af124e0 to 2789e1f
Now pushed the updated version here. The warning text now says:
--output -
I think that's a lot better.
"Use \"--output -\" to tell curl to output it to your terminal "
"anyway, or consider \"--output <FILE>\" to save to a file.\n");
config->synthetic_error = ERR_BINARY_TERMINAL;
return bytes-1;
Couldn't you return failure here? That would be more in line with the rest of the function. In that case you don't even need the bytes variable; you could just pass that multiplication directly to memchr.
@@ -141,6 +147,7 @@ struct OperationConfig {
bool insecure_ok; /* set TRUE to allow insecure SSL connects */
bool proxy_insecure_ok; /* set TRUE to allow insecure SSL connects
for proxy */
bool binary_ok;
I suggest disambiguating this as terminal_binary_ok
... unless --binary-ok is used. This is done by simply checking for a
binary zero in early data.