New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds intern and an intern_static! macro #255

Merged
merged 2 commits into from Jul 23, 2017

Conversation

Projects
None yet
4 participants
@atheriel
Contributor

atheriel commented Jul 20, 2017

It's a common pattern in the C codebase to call intern("some static c string"). I thought that it would be nice to have a similar API available in Rust. Unfortunately, there's no way of doing this with the std::cstring interface that avoids heap-allocating and unnecessary copies. I've written a macro that avoids this by concatenating a null byte to the end of a string at compile time, and casting to a *const c_char. This allows us to write

let symbol = unsafe { intern_static!("my-symbol-1") };

This macro is, and always will be, unsafe. If you pass it a string with internal NULL bytes, e.g. intern("my-\0dangerous-\0string"), then C will not correctly deduce the length of the string, and we'll leak memory. However, for symbols defined at compile time and clearly audited by the project, I don't think this is a problem.

One could also add a safe, non-static version of intern using the std::cstring machinery.

@atheriel atheriel referenced this pull request Jul 20, 2017

Merged

Port fontp to Rust #248

/// Intern (e.g. create a symbol for) a `&'static str`, avoiding
/// allocation. This macro is inherently unsafe: if the string contains internal
/// NULL bytes (`\0`), we will violate memory safety.

This comment has been minimized.

@birkenfeld

birkenfeld Jul 20, 2017

Collaborator

Why does it violate memory safety? It would just use the shorter string since you're using strlen to determine the length. (And all you're passing is a pointer to static memory, which can't leak.)

This comment has been minimized.

@atheriel

atheriel Jul 20, 2017

Contributor

My comment here comes from my understanding is that handling of strings with internal NULL bytes is undefined behaviour in C, and from the "unsafe" denotation on the equivalent std method . You might be right, though.

This comment has been minimized.

@birkenfeld

birkenfeld Jul 20, 2017

Collaborator

Let's assume intern_1 didn't take the length; then as far as the C API is concerned the string would end at the first null byte. That's not undefined behavior, but of course not what you intended either.

The method on CString is unsafe because it maintains the invariant that its string has the same length for Rust and C consumers, so not checking must be an unsafe function.

/// NULL bytes (`\0`), we will violate memory safety.
macro_rules! intern_static {
($s:expr) => ({
let __intern_s = concat!($s, "\0") as *const str as *const [::libc::c_char] as *const ::libc::c_char;

This comment has been minimized.

@birkenfeld

birkenfeld Jul 20, 2017

Collaborator

This should be concat!(...).as_ptr() as *const libc::c_char

This comment has been minimized.

@atheriel

atheriel Jul 20, 2017

Contributor

Right.

@s-kostyaev

This comment has been minimized.

s-kostyaev commented Jul 20, 2017

This macro is, and always will be, unsafe. If you pass it a string with internal NULL bytes, e.g. intern("my-\0dangerous-\0string"), then C will not correctly deduce the length of the string, and we'll leak memory. However, for symbols defined at compile time and clearly audited by the project, I don't think this is a problem.

I'm newbe in rust, but can it be checked at compile time inside macro?

@birkenfeld

This comment has been minimized.

Collaborator

birkenfeld commented Jul 20, 2017

Actually, since intern_1 gets the length as well, I don't think we even need the trailing NULL byte.

/// NULL bytes (`\0`), we will violate memory safety.
macro_rules! intern_static {
($s:expr) => ({
let __intern_s = concat!($s, "\0") as *const str as *const [::libc::c_char] as *const ::libc::c_char;

This comment has been minimized.

@DavidDeSimone

DavidDeSimone Jul 20, 2017

Collaborator

By the way, Rust has hygienic macros. No longer do we have to name our variables in macros horrible underscore salads (unless you like the convention and did it for a reason besides to avoid shadowing at macro call sites).

This comment has been minimized.

@atheriel

atheriel Jul 20, 2017

Contributor

I did it for convention, although it has been a while since I've looked at the source code for the standard library macros. Happy to change it, though.

@atheriel

This comment has been minimized.

Contributor

atheriel commented Jul 20, 2017

Actually, since intern_1 gets the length as well, I don't think we even need the trailing NULL byte.

@birkenfeld Well, that seems obvious in retrospect. Don't know why I was overthinking it. I'll change it.

@atheriel

This comment has been minimized.

Contributor

atheriel commented Jul 20, 2017

@s-kostyaev Using a procedural macro, yes. Using the basic macro_rules!, no.

@s-kostyaev

This comment has been minimized.

s-kostyaev commented Jul 20, 2017

@atheriel Thanks for reply. Maybe would be better to use procedural macro in this case and add this compile time check? It can prevent memory leak.

@birkenfeld

This comment has been minimized.

Collaborator

birkenfeld commented Jul 20, 2017

Maybe would be better to use procedural macro in this case and add this compile time check? It can prevent memory leak.

No leaks possible in this case.

@s-kostyaev

This comment has been minimized.

s-kostyaev commented Jul 20, 2017

If you pass it a string with internal NULL bytes, e.g. intern("my-\0dangerous-\0string"), then C will not correctly deduce the length of the string, and we'll leak memory.

@birkenfeld

This comment has been minimized.

Collaborator

birkenfeld commented Jul 20, 2017

There's no "passing a string". There's only passing a pointer to a string constant in static memory.

@s-kostyaev

This comment has been minimized.

s-kostyaev commented Jul 20, 2017

Static memory can't leak. Thanks for explanation.

@atheriel

This comment has been minimized.

Contributor

atheriel commented Jul 21, 2017

I should be able to rewrite this and rebase tonight. Thanks for all the comments!

@atheriel atheriel force-pushed the atheriel:intern-static branch from 6065f1d to 466321c Jul 21, 2017

@atheriel

This comment has been minimized.

Contributor

atheriel commented Jul 21, 2017

@birkenfeld How does this look?

@birkenfeld

This comment has been minimized.

Collaborator

birkenfeld commented Jul 22, 2017

Can't intern directly take a &str, and the macro can be removed?

Adds a wrapper for intern_1.
Signed-off-by: Aaron Jacobs <atheriel@gmail.com>

@atheriel atheriel force-pushed the atheriel:intern-static branch from 466321c to a84daad Jul 22, 2017

@atheriel

This comment has been minimized.

Contributor

atheriel commented Jul 22, 2017

@birkenfeld

Ok, now it's just Travis complaining that the function is never used, which generates an error (@Wilfred how to proceed?). Otherwise looks fine!

Make intern() generic over string slice types.
Signed-off-by: Aaron Jacobs <atheriel@gmail.com>
@atheriel

This comment has been minimized.

Contributor

atheriel commented Jul 22, 2017

@birkenfeld I've fixed the dead code issue. I also realized that this function can clearly be generic, to accomodate other string-like types.

@birkenfeld

This comment has been minimized.

Collaborator

birkenfeld commented Jul 22, 2017

Nice!

@birkenfeld birkenfeld merged commit c688163 into Wilfred:master Jul 23, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@atheriel atheriel deleted the atheriel:intern-static branch Jul 23, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment