Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clap panics on argument in invalid unicode #262

Closed
remram44 opened this issue Sep 21, 2015 · 14 comments
Closed

Clap panics on argument in invalid unicode #262

remram44 opened this issue Sep 21, 2015 · 14 comments
Labels
C-bug C-enhancement

Comments

@remram44
Copy link

@remram44 remram44 commented Sep 21, 2015

I can understand your library not being designed to handle arguments that are not valid unicode (which are possible on UNIX, and can be valid filenames, but heh), however having your library panic the calling thread is not cool. Please consider using Result instead of crashing.

Reproduce:

echo -e 'r\xE9mi' | xargs target/debug/myprogramusingclap

(note that you can't use cargo run, it can't handle unicode either (but at least it doesn't panic))

Similar to rust-lang/getopts#30; I'm still looking for a safe argument-parsing library 😢

@sru sru added C-bug C: args labels Sep 21, 2015
@sru
Copy link
Contributor

@sru sru commented Sep 21, 2015

Caused by std::env::Args.

Using std::env::ArgsOs does not solve this issue because the get_matches_* functions take AsRef<str>, not AsRef<OsStr>. I tried to change all of them to AsRef<OsStr>, but it did not work.

One solution could be: If the argument passed is not valid unicode, we could first check if it is a valid directory or file. If it isn't, then produce error.

@WildCryptoFox
Copy link

@WildCryptoFox WildCryptoFox commented Sep 21, 2015

@remram44 Would you be okay with OsStr::to_string_lossy?

Any non-Unicode sequences are replaced with U+FFFD REPLACEMENT CHARACTER.

@remram44
Copy link
Author

@remram44 remram44 commented Sep 21, 2015

I think it's kind of weird -- in the Rust stdlib, lossy functions are clearly marked as such, and that is not the case of get_matches(). If a filename is mangled this way, the application will not access the correct pass, and thus fail with a surprising "file not found" or worse, go on creating the wrong file (with a replacement character in it). On the other hand you can't easily change your API at this point, and this behavior is better than panicking(?)

Another option is to make get_matches_safe() return an error (not panic) on invalid unicode, and get_matches() print out a readable error message instead of the default panic text.

@WildCryptoFox
Copy link

@WildCryptoFox WildCryptoFox commented Sep 21, 2015

On the other hand you can't easily change your API at this point

@remram44 cough I'm rewriting the entire clap-rs in #259, I'm open for changes like this. I am however, so far having lifetime issues when trying to use OsStr. ;-)

@WildCryptoFox
Copy link

@WildCryptoFox WildCryptoFox commented Sep 21, 2015

@remram44 Partly tested (compiles; no tests for invalid Unicode chars yet) I have to_string_lossy working in my code. Now we can use OsArgs. Note: This is for my 2.0 rewrite NOT 1.4.

I might take an attempt to porting it over to clap1.4 tomorrow.

Snapshot of current code for 2.0: https://gist.github.com/7897c09aab4186186576

  • Lossy matching prevents flags and positional arguments from containing the invalid Unicode (in raw form). While in --foo <arg>, arg may contain invalid Unicode.

I'm out for the night. zzZZZ

@kbknapp
Copy link
Member

@kbknapp kbknapp commented Sep 21, 2015

@remram44 Returning a Result on invalid unicode should be an easy addition. If this works for you we can probably have this added by tomorrow with a new version out to crates.io

It would also be easy to add a *_lossy() version to the get_matches() family. We're also open for some discussion on what clap2x's unicode handling will look like as stated by @james-darkfox

@kbknapp kbknapp added C-enhancement D: easy labels Sep 21, 2015
@sru
Copy link
Contributor

@sru sru commented Sep 22, 2015

@kbknapp If we were to add this, we have to change the function signature of get_matches families to take iterator of AsRef<OsStr>. Is that okay?

@kbknapp
Copy link
Member

@kbknapp kbknapp commented Sep 22, 2015

@sru Yep that's exactly what I did in #265 which doesn't affect things too heavily. I compared benches before and after, and no tests had to be changed to accommodate.

I've heard it said that, "All strs are OsStrs, but not all OsStrs are strs"...if you like logic questions 😉

@kbknapp
Copy link
Member

@kbknapp kbknapp commented Sep 22, 2015

@remram44 #265 allows returning a Result Err on invalid unicode chars, or an optional lossy version. Granted this doesn't help with things like path/file names which have invalid unicode, but it's better than the current state of clap because it won't panic! unless you don't care about it or want it to panic! on invalid unicode.

@WildCryptoFox
Copy link

@WildCryptoFox WildCryptoFox commented Sep 22, 2015

@kbknapp Just a tip: str and OsStr both implement AsRef<OsStr>

@remram44
Copy link
Author

@remram44 remram44 commented Sep 22, 2015

OsStr doesn't implement AsRef<str>, no.

homu added a commit that referenced this issue Sep 22, 2015
Issue 262 - Result and Lossy options for invalid Unicode

This allows not `panic!`ing on invalid unicode characters, and the option of returning lossy results. This does **not** turn all results into an `OsStr` preserving invalid unicode.

If this helps with #262 we can merge (after review), otherwise we'll continue to discuss if there's a better way to handle this issue.
@WildCryptoFox
Copy link

@WildCryptoFox WildCryptoFox commented Sep 22, 2015

Oops. My haste. Sorry. edited previous comment Thanks.

@kbknapp
Copy link
Member

@kbknapp kbknapp commented Sep 22, 2015

@remram44 try the latest master and let us know if it works for your use case. If so, we'll publish a new version to crates.io, close this issue, and open a more general unicode handling tracking issue.

@kbknapp
Copy link
Member

@kbknapp kbknapp commented Sep 22, 2015

@remram44 v1.4.1 is out on crates.io now which adds the safer *_safe() methods and *_lossy() version of the get_matches family.

See #269 for general invalid unicode discussion for the future

@kbknapp kbknapp closed this as completed Sep 25, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug C-enhancement
Projects
None yet
Development

No branches or pull requests

4 participants