New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

curl: prevent binary output spewed to terminal #1512

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
9 participants
@bagder
Member

bagder commented May 25, 2017

... unless --binary-ok is used. This is done by simply checking for a
binary zero in early data.

@mention-bot

This comment has been minimized.

mention-bot commented May 25, 2017

@bagder, thanks for your PR! By analyzing the history of the files in this pull request, we identified @yangtse, @captain-caveman2k and @tatsuhiro-t to be potential reviewers.

@bagder bagder force-pushed the bagder/avoid-binary-to-terminal branch from fd2b6cf to 1fbe241 May 25, 2017

@kdudka

This comment has been minimized.

Collaborator

kdudka commented May 26, 2017

Would it make sense to checker for '\033', too? That is the character that breaks our terminals most of the time. On the other hand, I am not sure if it commonly occurs without hitting NUL first...

@bagder

This comment has been minimized.

Member

bagder commented May 26, 2017

I'm also concerned about false positives with utf-8 contents, which I suppose can have that value legitimately. I figured the zero check is fairly safe for that.

@jmarshall

This comment has been minimized.

jmarshall commented May 26, 2017

UTF-8 is constructed so that there are never any ASCII bytes (i.e. 0x00 to 0x7F) in the (valid) encodings of non-ASCII characters, so there might not be any real false positive concerns.

It might be interesting to look at how e.g. grep or less implement their similar is-binary tests.

@bagder

This comment has been minimized.

Member

bagder commented May 26, 2017

grep: looks for binary zeroes. See the buf_has_nulls function here: https://git.savannah.gnu.org/cgit/grep.git/tree/src/grep.c

less (found no way to link to a source code online so I quote a comment in the filename.c source file): "Call it a binary file if there are more than 5 binary characters in the first 256 bytes of the file"

@bagder

This comment has been minimized.

Member

bagder commented May 26, 2017

Decimal 27 is used for escape sequences, which is exactly why it is so annoying to get in a binary file since it can send the most crazy, illegal or weirdo sequences. But when used in a properly crafted stream, it can be used to set colors etc and then definitively (well, I think so at least) is not "binary". The http://wttr.in service is an example of a site that uses escape sequences without the response being considered "binary" by most people.

@kdudka

This comment has been minimized.

Collaborator

kdudka commented May 26, 2017

Good point. I did not realize there are sites that intentionally print escape sequences to our terminals. It would be really bad idea to handle them as binary data.

@rcanavan

This comment has been minimized.

rcanavan commented May 26, 2017

The zero check would trip up on UTF-16 and UTF-32. While I haven't ever seen UTF-32 text on a web server, UTF-16 does exist "in the wild" and can usually be displayed correctly by terminals, since they only drop the \000.

I don't think \033 is used to encode any UTF-8 characters aside from a literal \033, since all continuation bytes are >=128. A "correct" solution for all this may be to use libmagic.

We could donate about 90 lines of code we use to handle a similar problem in out product. It first identifies text with a BOM, then explicitly checks for GIF, JPEG, gzip and PDF files, since those may contain lots of text-like characters at the start, and then runs a heuristic by counting some characters < 32 and sequences of 2 or more \000. If there's interest in this code, I could mail or post that, or make a merge request (the latter may take a few days).

When we discussed this issue at the curl meet, I thought the easiest solution would be to replace all non-whitespace characters < 32, all characters 128-159, and possibly everything >=128 with a substitute, e.g. '.'.

@gvanem

This comment has been minimized.

Member

gvanem commented May 26, 2017

@bagder The http://wttr.in service is an example of a site ...

Nice, but a bit too wide for the Windows console. This is a cool example too:

curl -s http://adoxa.altervista.org/ansicon/ANSI%%20Prompt%%20Colours.txt | cat
/* TLS version 1 for proxy */
config->proxy_ssl_version = CURL_SSLVERSION_TLSv1;
break;

case 'A': /* --binary-ok */
config->binary_ok = toggle;

This comment has been minimized.

@jay

jay May 26, 2017

Member

I think you mean global

This comment has been minimized.

@jay

jay May 26, 2017

Member

disregard i was looking at isatty which is global

@jay

This comment has been minimized.

Member

jay commented May 26, 2017

how about --printable-only :

diff --git a/src/tool_cb_wrt.c b/src/tool_cb_wrt.c
index 6c08943..08e538d 100644
--- a/src/tool_cb_wrt.c
+++ b/src/tool_cb_wrt.c
@@ -137,7 +137,17 @@ size_t tool_write_cb(char *buffer, size_t sz, size_t nmemb, void *userdata)
   if(!outs->stream && !tool_create_output_file(outs))
     return failure;
 
-  rc = fwrite(buffer, sz, nmemb, outs->stream);
+  if(config->global->printable_only) {
+    char *p, *end = buffer + (sz * nmemb);
+    for(rc = 0, p = buffer; p != end; ++p, ++rc) {
+      bool printable = isprint(*p) || *p == '\r' || *p == '\n' || *p == '\t';
+      if(fputc((printable ? *p : '.'), outs->stream) == EOF)
+        break;
+    }
+  }
+  else {
+    rc = fwrite(buffer, sz, nmemb, outs->stream);
+  }
 
   if((sz * nmemb) == rc)
     /* we added this amount of data to the output */
diff --git a/src/tool_cfgable.h b/src/tool_cfgable.h
index 38777f6..25c303a 100644
--- a/src/tool_cfgable.h
+++ b/src/tool_cfgable.h
@@ -259,6 +259,7 @@ struct GlobalConfig {
   int progressmode;               /* CURL_PROGRESS_BAR / CURL_PROGRESS_STATS */
   char *libcurl;                  /* Output libcurl code to this file name */
   bool fail_early;                /* exit on first transfer error */
+  bool printable_only;            /* terminal output '.' for non-printables */
   struct OperationConfig *first;
   struct OperationConfig *current;
   struct OperationConfig *last;   /* Always last in the struct */
diff --git a/src/tool_getparam.c b/src/tool_getparam.c
index 56bbbf1..04a171c 100644
--- a/src/tool_getparam.c
+++ b/src/tool_getparam.c
@@ -251,6 +251,7 @@ static const struct LongShort aliases[]= {
   {"E7", "proxy-capath",             ARG_STRING},
   {"E8", "proxy-insecure",           ARG_BOOL},
   {"E9", "proxy-tlsv1",              ARG_NONE},
+  {"EA", "printable-only",           ARG_BOOL},
   {"f",  "fail",                     ARG_BOOL},
   {"fa", "fail-early",               ARG_BOOL},
   {"F",  "form",                     ARG_STRING},
@@ -1559,6 +1560,11 @@ ParameterError getparameter(const char *flag, /* f or -long-flag */
         config->proxy_ssl_version = CURL_SSLVERSION_TLSv1;
         break;
 
+      case 'A':
+        /* terminal output '.' for non-printables */
+        global->printable_only = toggle;
+        break;
+
       default: /* unknown flag */
         return PARAM_OPTION_UNKNOWN;
       }
@bagder

This comment has been minimized.

Member

bagder commented May 26, 2017

how about --printable-only

Made like that, it doesn't save anyone from accidentally sending binary to stdout, right? That's sort of the main point with my take on this. And enabling --printable-only by default if the output is detected to be a tty also seems a bit... strange, since it's a very subtle change.

@jay

This comment has been minimized.

Member

jay commented May 27, 2017

Made like that, it doesn't save anyone from accidentally sending binary to stdout, right?

Right. My suggestion was something lighter, so rather than error on binary by default we don't do that. Instead of making a change to the defaults we could offer an option that the user can add to their curlrc if they don't want non-printable characters messing with their terminal, and those characters replaced with .. In the case of either on/off there wouldn't be an error, some character would be output. Binary would remain ok by default.

Note the condition I used didn't check for isatty simply because I forgot, it should be if(config->global->isatty && config->global->printable_only). Also reading through this I'm wondering about UTF-16 and UTF-32, for example if we go char by char regardless of which of our options, we have to consider whether it's going to disrupt those encodings from being properly displayed (if that's possible with curl, I haven't tried it).

@rcanavan

This comment has been minimized.

rcanavan commented May 27, 2017

If you consider this a security problem (cf. http://seclists.org/oss-sec/2017/q2/183 ), then isprint() would obviously solve it, although it is very heavy-handed. I'm not sure if filtering \033 would be sufficient, or if the corresponding 8bit control characters would have to be filtered as well (2.5.1 in http://vt100.net/docs/vt220-rm/chapter2.html), since e.g. xterm supports 8bit controls, although those appear to be off by default.

So, what's the scope of this - just prevent accidental messes or protect the user from potentially malicious content?

@bagder

This comment has been minimized.

Member

bagder commented May 27, 2017

This pull-request is an attempt to help users avoid accidental spewing binary data to the terminal.

Requiring a special option to accomplish this doesn't help (since these users would avoid the problem all together by using a command line option already), and it is not an attempt to block "malicious content". (The malicious examples in the link above are about how terminal software is vulnerable to certain data and I really cannot see how it can be curl's job to filter its output for that purpose.)

@bagder bagder force-pushed the bagder/avoid-binary-to-terminal branch from 8f7f6b1 to af124e0 Jun 15, 2017

@bagder

This comment has been minimized.

Member

bagder commented Jun 15, 2017

Renamed the option to --binary-stdout and added two test cases that verify that the feature and option work as supposed.

@coveralls

This comment has been minimized.

coveralls commented Jun 15, 2017

Coverage Status

Coverage increased (+0.007%) to 73.169% when pulling af124e0 on bagder/avoid-binary-to-terminal into efc83d6 on master.

@jay

This comment has been minimized.

Member

jay commented Jun 15, 2017

The name --binary-stdout implies that without it stdout isn't binary. --binary-stdout sounds like something that should always open stdout in binary mode. This is just stdout terminal, what about something that has term or terminal in the name and will toggle well like --terminal-binary --no-terminal-binary --terminal-allow-binary --no-terminal-allow-binary --term-allow-binary etc. If I saw that in a curlrc I'd have a better idea what it meant. But really these are also kind of misleading along with the doc because it doesn't strictly stop binary data since it only checks the first 2000 bytes. In other words any phrasing or option I think we don't want to create a false assurance.

@bagder

This comment has been minimized.

Member

bagder commented Jun 15, 2017

No option name can completely stand for itself without being ridiculously long, they do require that the user reads up on it to really get to know what it does and how it works. I personally always prefer shorter rather than longer option names. The fact that it checks the first 2000 bytes is mostly for a functional reason: getting the warning after already having shown a lot of data feels weird. I don't think that will trick any user. Maybe --binary-ok is still the best name...

@bagder

This comment has been minimized.

Member

bagder commented Jun 16, 2017

--allow-binary (via) has a nice ring to it I think.

@bagder

This comment has been minimized.

Member

bagder commented Jun 16, 2017

Ok, an even better suggestion IMHO:

--output - or -o -

That is, make it explicit that the output should be sent to stdout using the existing option for this purpose!

@jzakrzewski

This comment has been minimized.

Contributor

jzakrzewski commented Jun 16, 2017

Love it!
I've seen this pattern in some other tools. Sometimes the "file" is called -, sometimes stdout.
Maybe, while on it, we could allow stdout and stderr there?

@bagder

This comment has been minimized.

Member

bagder commented Jun 16, 2017

while on it, we could allow stdout and stderr there?

We already use the single letter - to mean stdout (or stdin) for a whole bunch of options. Reserving the words "stdout" and "stderr" for the same thing now I think has a too high risk of damaging people's existing scripts... (and is slightly off-topic for this particular PR)

@jzakrzewski

This comment has been minimized.

Contributor

jzakrzewski commented Jun 16, 2017

Was just a thought.
Anyway, the -o - is then just perfect IMHO.

curl: prevent binary output spewed to terminal
... unless "--output -" is used. Binary detection is done by simply
checking for a binary zero in early data.

Added test 1425 1426 to verify.

Closes #1512

@bagder bagder force-pushed the bagder/avoid-binary-to-terminal branch from af124e0 to 2789e1f Jun 16, 2017

@bagder

This comment has been minimized.

Member

bagder commented Jun 16, 2017

Now pushed the updated version here. The warning text now says:

Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.

@coveralls

This comment has been minimized.

coveralls commented Jun 16, 2017

Coverage Status

Coverage increased (+0.04%) to 73.42% when pulling 2789e1f on bagder/avoid-binary-to-terminal into de4c747 on master.

@jay

--output -

I think that's a lot better.

"Use \"--output -\" to tell curl to output it to your terminal "
"anyway, or consider \"--output <FILE>\" to save to a file.\n");
config->synthetic_error = ERR_BINARY_TERMINAL;
return bytes-1;

This comment has been minimized.

@jay

jay Jun 16, 2017

Member

Couldn't you return failure here, that would be more in line with the rest of the function. In that case you don't even need the bytes you could just pass that multiplication directly to memchr.

@@ -141,6 +147,7 @@ struct OperationConfig {
bool insecure_ok; /* set TRUE to allow insecure SSL connects */
bool proxy_insecure_ok; /* set TRUE to allow insecure SSL connects
for proxy */
bool binary_ok;

This comment has been minimized.

@jay

jay Jun 16, 2017

Member

I suggest disambiguate this as terminal_binary_ok

@bagder bagder closed this in 5385450 Jun 16, 2017

@bagder bagder deleted the bagder/avoid-binary-to-terminal branch Jun 16, 2017

@lock lock bot locked as resolved and limited conversation to collaborators Dec 4, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.