Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the last unsafe block #1852

Merged
merged 4 commits into from
Apr 23, 2020
Merged

Remove the last unsafe block #1852

merged 4 commits into from
Apr 23, 2020

Conversation

dylni
Copy link
Contributor

@dylni dylni commented Apr 22, 2020

Closes #1594

This was relatively clean. All uses were compatible except one. Another is now improved to not truncate. But, most changes only replace OsStr with ArgStr to avoid repeated conversions.

Notes:

  • The MSRV for os_str_bytes is listed here. For common platforms, it's 1.32.0. Others didn't provide OsStrExt or weren't supported by Rust until more recently.
  • There's a minor difference in how lossy conversions are done. OsStr has its own implementation that was used previously and adds less REPLACEMENT_CHARACTERs. To give the same result now would require converting to OsStr before the lossy conversion, which would be slower.
  • Should osstringext.rs be renamed to argstr.rs?

@CreepySkeleton
Copy link
Contributor

I'll give it a proper review tomorrow. Past-midnight reviews may be fun but they aren't particularly correct ;)

Should osstringext.rs be renamed to argstr.rs?

Yes, please.

The MSRV for os_str_bytes is listed here. For common platforms, it's 1.32.0. Others didn't provide OsStrExt or weren't supported by Rust until more recently.

If I get it right, the only platform > 1.40 (current clap MSRV) is the one rust doesn't support yet, not officially. If that's so, we can proceed as is.

There's a minor difference in how lossy conversions are done. OsStr has its own implementation that was used previously and adds less REPLACEMENT_CHARACTERs. To give the same result now would require converting to OsStr before the lossy conversion, which would be slower.

Nah. The most common usage for *lossy* - I may be wrong here - is "got some data that's mostly UTF-8 (or almost guaranteed to be), need to print it to TTY/console in a somewhat readable way, a bit of mangling is fine". The worst offender here is error messages, filepaths in particular. In this case, IO will take 99.99% of time almost definitely. Anyway, we can optimize it further if somebody comes complaining about performance; I believe copying is fine here.

I wonder what you mean by "adds less REPLACEMENT_CHARACTERs"? I always thought that the replacement symbol is used like: "every byte from input that is not a part of a valid codepoint will be replaced with �", so every implementation must insert the same number of them given the same input is provided. Am I wrong?

@dylni
Copy link
Contributor Author

dylni commented Apr 22, 2020

Past-midnight reviews may be fun but they aren't particularly correct ;)

Agreed ;)

If I get it right, the only platform > 1.40 (current clap MSRV) is the one rust doesn't support yet, not officially.

I think you're right that HermitCore isn't officially supported yet. Either way, the limitation is only due to the availability of std::os::unix::ffi::OsStrExt. Since that trait was already used, I don't think the platform has been supported by Clap up to this point.

Anyway, we can optimize it further if somebody comes complaining about performance; I believe copying is fine here.

I'm not sure if this is suggesting that I should convert it to OsStr first. The current implementation does not copy unnecessarily, but it changes the output as I show below.

I wonder what you mean by "adds less REPLACEMENT_CHARACTERs"?

There's a difference in when OsString adds replacement characters and when String does, but I don't think their behavior is guaranteed.

Take b"\xED\xA0\xBD" as an example. This is the encoded form of the wide character 0xD83D. Yet, this code will panic on Windows, because different numbers of REPLACEMENT_CHARACTERs are added:

use std::ffi::OsString;
use std::os::windows::ffi::OsStringExt;

fn main() {
    assert_eq!(
        String::from_utf8_lossy(b"\xED\xA0\xBD"),
        OsString::from_wide(&[0xD83D]).to_string_lossy(),
    );
}
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `"���"`,
 right: `"�"`', src/main.rs:5:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Copy link
Member

@pksunkara pksunkara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. But I will wait for @CreepySkeleton to review it. I have one question though, it looks like the ArgStr is a pretty common abstraction that can be used in multiple other packages. Could we extract it out?

@CreepySkeleton
Copy link
Contributor

I'm not sure if this is suggesting that I should convert it to OsStr first. The current implementation does not copy unnecessarily, but it changes the output as I show below.

Never mind. For some reason I thought it always copies the contents, but I would be OK with that too. Probably has something to do with post-midnight.

There's a difference in when OsString adds replacement characters and when String does, but I don't think their behavior is guaranteed.

Well, if it looks like implementations are allowed to behave frivolous here and, since I doubt someone out there is going to rely on precise number of �, we can do as we please. Let's follow the path of least resistance and leave it as is.

Copy link
Contributor

@CreepySkeleton CreepySkeleton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is marvelous! Very clean and fleshed out. Your crate is amazing.

src/parse/parser.rs Show resolved Hide resolved
@CreepySkeleton
Copy link
Contributor

ArgStr is a pretty common abstraction that can be used in multiple other packages. Could we extract it out?

Not much point. It's just a tiny wrapper around Cow.

@dylni could you please rebase onto mater?

@dylni
Copy link
Contributor Author

dylni commented Apr 23, 2020

@pksunkara

I have one question though, it looks like the ArgStr is a pretty common abstraction that can be used in multiple other packages. Could we extract it out?

I'm considering creating a wrapper crate. I've typically used bstr to reduce repetition, but I can see that a few edge cases aren't intuitive to handle correctly. This would take time, which I don't have much of currently, but I can let you know if I publish something that makes this easier.

@dylni
Copy link
Contributor Author

dylni commented Apr 23, 2020

@CreepySkeleton

Well, if it looks like implementations are allowed to behave frivolous here and, since I doubt someone out there is going to rely on precise number of �, we can do as we please.

Agreed.

This is marvelous! Very clean and fleshed out. Your crate is amazing.

Thanks! This is always great feedback to receive.

could you please rebase onto mater?

Done.

@pksunkara
Copy link
Member

@dylni Please do keep me updated about it.

bors r+

@bors
Copy link
Contributor

bors bot commented Apr 23, 2020

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OsStrExt3::from_bytes should be unsafe
3 participants