Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove most unsafe code from identifier.rs. #271

Closed
wants to merge 1 commit into from
Closed

Remove most unsafe code from identifier.rs. #271

wants to merge 1 commit into from

Conversation

Nilstrieb
Copy link

@Nilstrieb Nilstrieb commented Apr 2, 2022

The old implementation, while being really efficient and fast, was very unsafe and had problems with miri's raw pointer tagging.

This rewrite does come at a performance cost (0.25ns -> 0.34ns in the prerelease parsing test), but I don't think that this will be noticeable anywhere.

The only remaining unsafe code is a str::from_utf8_unchecked to convert the inline array to a str efficiently.

I would understand it if you don't want to merge this because of the performance hit, but I'd still like to hear your opinion on it. Another alternative solution would be to put the whole inline pointer sized string into a new crate and use it.

@Nilstrieb
Copy link
Author

Ah, it doesn't build on 1.31, I'm gonna fix that really quick.

The old implementation, while being really efficient and fast, was very unsafe and had problems with miri's raw pointer tagging.

This rewrite does come at a performance cost (0.25ns -> 0.34ns in the prerelease parsing test), but I don't think that this will be noticeable anywhere.

The only remaining unsafe code is a `str::from_utf8_unchecked` to convert the inline array to a `str` efficiently.
Copy link
Owner

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be willing to consider a PR that makes this code friendlier to miri's raw pointer tagging, but removing unsafe by itself is not a goal, so I am going to close this PR.

Regarding performance, this change increases the size of Prerelease and BuildMetadata by 3x and the size of Version by 1.8x. You'd mostly see the impact not on repeatedly parsing a single version string into a Version on the stack (which is what you measured), but in programs that operate on enormous numbers of Version objects, like some use cases of https://github.com/dtolnay/db-dump, and maybe https://github.com/dtolnay/cargo-tally. They'd reach higher peak memory usage, spend more time copying, and experience less spatial locality.

In any case, if the original data structure can't be made compatible with raw pointer tagging, that is the opposite of a reason to get rid of it. That is what makes it interesting and useful for informing miri design and standard library design; see rust-lang/miri#1936.

@dtolnay dtolnay closed this Apr 2, 2022
@Nilstrieb
Copy link
Author

Thanks for the quick response! I wasn't aware of those projects that use so many versions, this change would definitely impact them negatively. Making the original structure compatible with raw pointer tagging should be trivial on 64 bit, but 32 bit or even 16 bit are more tricky.

Repository owner locked and limited conversation to collaborators Jun 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants