-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
failure to parse invalid Unicode on Windows in v2.33.0 (but not master) #1905
Comments
I don't think we will be backporting this fix because 2.x branch is frozen and only security fixes are going to be allowed. We are working hard on clap v3 and I am confident to say that the full release will be no more than 3 months away if we keep up the current pace of development. |
If I was able to figure out a fix on the 2.x branch, and it didn't look too hairy, would you be willing to release it? I'm interested in fixing this for my own work (it's a blocker for properly testing |
@oconnor663 I say we wouldn't because the code in question is a few years old code. The last time someone touched it was a year ago or so, nobody here (maybe short of @kbknapp) is familiar with that codebase. We simply can't review your fix and be certain that it won't break something else, even if it's just a few lines. |
We appreciate the very detailed bug report! @pksunkara is correct, 2.x is currently frozen except security fixes. However, I'm willing to take a look at a PR to fix the issue, since I'm familiar with the codebase. I can't promise a merge if it looks to be too high a maintenance burden. Lets at least see how far reaching the fix is, since causing a crash on valid input could be considered a security issue if we wend the optics a bit to include remote binaries with DDoS potential. |
Previously, starts_with() would panic on Windows if the OsStr contained any invalid Unicode. Because this method is in the critical path for parsing command line arguments, this meant that Clap up to v2.33.0 on Windows would panic on non-Unicode filenames, even when the application tried to use OsString specifically to account for this case. As a workaround, this commit adds a custom implementation of OsStr::starts_with() for Windows that doesn't rely on as_bytes(). Fixes clap-rs#1905.
Thanks for all the prompt replies. It looks like the fix can be quite targeted, essentially a one-line change to the existing |
There shouldn't be many places where |
Previously, starts_with() would panic on Windows if the OsStr contained any invalid Unicode. Because this method is in the critical path for parsing command line arguments, this meant that Clap up to v2.33.0 on Windows would panic on non-Unicode filenames, even when the application tried to use OsString specifically to account for this case. As a workaround, this commit adds a custom implementation of OsStr::starts_with() for Windows that doesn't rely on as_bytes(). Fixes clap-rs#1905.
Previously, starts_with() would panic on Windows if the OsStr contained any invalid Unicode. Because this method is in the critical path for parsing command line arguments, this meant that Clap up to v2.33.0 on Windows would panic on non-Unicode filenames, even when the application tried to use OsString specifically to account for this case. As a workaround, this commit adds a custom implementation of OsStr::starts_with() for Windows that doesn't rely on as_bytes(). Fixes clap-rs#1905.
Previously, starts_with() would panic on Windows if the OsStr contained any invalid Unicode. Because this method is in the critical path for parsing command line arguments, this meant that Clap up to v2.33.0 on Windows would panic on non-Unicode filenames, even when the application tried to use OsString specifically to account for this case. As a workaround, this commit adds a custom implementation of OsStr::starts_with() for Windows that doesn't rely on as_bytes(). Fixes clap-rs#1905.
I think this was fixed in master because of the recent refactor done on OsStr by @dylni. It was a pretty big change (IMO). |
Previously, starts_with() would panic on Windows if the OsStr contained any invalid Unicode. Because this method is in the critical path for parsing command line arguments, this meant that Clap up to v2.33.0 on Windows would panic on non-Unicode filenames, even when the application tried to use OsString specifically to account for this case. As a workaround, this commit adds a custom implementation of OsStr::starts_with() for Windows that doesn't rely on as_bytes(). Fixes clap-rs#1905.
Previously, OsStr::starts_with() would panic on Windows if the OsStr contained any invalid Unicode. Because this method is in the critical path for parsing command line arguments, Clap up to v2.33.0 on Windows panics on non-Unicode filenames, even when the application specifically requests OsString to account for this case. As a workaround, this commit adds a custom implementation of OsStr::starts_with() for Windows that doesn't rely on OsStr::as_bytes(). With this change, examples like the following can parse successfully. Here "X" represents invalid Unicode: clap.exe X clap.exe --some-arg X However, examples like these will still panic: clap.exe --some-arg=X clap.exe -sX These still panic, because they require string splitting operations like OsStr::split_at_byte() and OsStr::trim_left_matches(). Fixing these would require either breaking open the in-memory representation of OsStr, which is not stable, or changing all these APIs to allow for an allocated OsString/Cow<OsStr>. This fix is aiming to be minimal and also not wildly unsafe, so we leave these bugs in place. Hopefully the majority of invalid Unicode in the wild occurs in filepaths given as standalone arguments, and many of the rest can be converted from flags with `=` to flags with a space. Fixes clap-rs#1905.
Previously, OsStr::starts_with() would panic on Windows if the OsStr contained any invalid Unicode. Because this method is in the critical path for parsing command line arguments, Clap up to v2.33.0 on Windows panics on non-Unicode filenames, even when the application specifically requests OsString to account for this case. As a workaround, this commit adds a custom implementation of OsStr::starts_with() for Windows that doesn't rely on OsStr::as_bytes(). With this change, examples like the following can parse successfully. Here "X" represents invalid Unicode: clap.exe X clap.exe --some-arg X However, examples like these will still panic: clap.exe --some-arg=X clap.exe -sX These still panic, because they require string splitting operations like OsStr::split_at_byte() and OsStr::trim_left_matches(). Fixing these would require either breaking open the in-memory representation of OsStr, which is not stable, or changing all these APIs to allow for an allocated OsString/Cow<OsStr>. This fix is aiming to be minimal and also not wildly unsafe, so we leave these bugs in place. Hopefully the majority of invalid Unicode in the wild occurs in filepaths given as standalone arguments, and many of the rest can be converted from flags with `=` to flags with a space. Fixes clap-rs#1905.
I've submitted #1907. Despite appearances, I think this is a targeted fix :) Most of the code in there is new tests. The meat of the change is that I've defined the |
Thanks for the ping @pksunkara. Yes, this was fixed in this pull request. |
Yeah that's why I referenced the other PR (part of the refactor). I haven't had time the last day or so, but I'll look at this new PR by this weekend and see if it's something that could go on the 2.x branch with the understanding that if v2-master is currently broken, so long as we don't break more we're in a slightly better spot even if its not a total fix. Once v3 is fully released I fully intend to EOL v2 except for hard security patches |
1907: prevent some panics when parsing invalid Unicode on Windows r=kbknapp a=oconnor663 Previously, OsStr::starts_with() would panic on Windows if the OsStr contained any invalid Unicode. Because this method is in the critical path for parsing command line arguments, Clap up to v2.33.0 on Windows panics on non-Unicode filenames, even when the application specifically requests OsString to account for this case. As a workaround, this commit adds a custom implementation of OsStr::starts_with() for Windows that doesn't rely on OsStr::as_bytes(). With this change, examples like the following can parse successfully. Here "X" represents invalid Unicode: clap.exe X clap.exe --some-arg X However, examples like these will still panic: clap.exe --some-arg=X clap.exe -sX These still panic, because they require string splitting operations like OsStr::split_at_byte() and OsStr::trim_left_matches(). Fixing these would require either breaking open the in-memory representation of OsStr, which is not stable, or changing all these APIs to allow for an allocated OsString/Cow<OsStr>. This fix is aiming to be minimal and also not wildly unsafe, so we leave these bugs in place. Hopefully the majority of invalid Unicode in the wild occurs in filepaths given as standalone arguments, and many of the rest can be converted from flags with `=` to flags with a space. Fixes #1905. <!-- If your PR closes some issues, please write `Closes #XXXX` where `XXXX` is the number of the issue you want to fix. Each issue goes on it's own line. --> Co-authored-by: Jack O'Connor <oconnor663@gmail.com>
The following is for the current release version of Clap, v2.33.0. (The bug does not appear to repro in Clap
master
. More on that below.) I've set up a repository containing a minimal repro of this problem: https://github.com/oconnor663/clap_unicode_test. You can see a CI run demonstrating the failure here: https://github.com/oconnor663/clap_unicode_test/runs/645077320. To repro the failure yourself, runcargo test
in that repository.The issue is that passing invalid Unicode on the command line to a Clap binary causes an "unexpected invalid UTF-8 code point" error, but only on Windows. Here's the
main.rs
:Here's the test that runs on Linux and macOS. This one passes:
In that case, the actual output of the (successful) binary is:
And here's the test that fails on Windows:
In this one, the binary crashes in the
clap::App::get_matches
call.The tests are different between Unix and Windows because the way to construct an invalid Unicode
OsString
is different. On Unix, we just insert a non-UTF-8 byte into a byte slice. On Windows, we need to insert a unpaired surrogate character into a slice ofu16
. Other than that, the tests are effectively the same.When I try the master branch of Clap (see the
clap_master
branch of my repository and this CI run), the problem appears fixed. That's awesome! However, it looks like Clap might be in the middle of a substantial refactoring, and I don't know how long it will take for these fixes to see a release. Should something be backported in the meantime?The text was updated successfully, but these errors were encountered: