Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

c_str! NUL checks and CBounedStr #258

Draft
wants to merge 1 commit into
base: rust
Choose a base branch
from

Conversation

nbdd0121
Copy link
Member

@nbdd0121 nbdd0121 commented May 9, 2021

Implement CBoundedStr<N>.

Depends on #257
Depends on #273

@nbdd0121
Copy link
Member Author

nbdd0121 commented May 9, 2021

@ojeda I've seen a weird rustdoc triple error in https://github.com/Rust-for-Linux/linux/runs/2537235143?check_suite_focus=true. I then tried to add --target flags to rustdoc, but I got a mysterious

error[E0461]: couldn't find crate `core` with expected target triple target-9716764123553335401
  |
  = note: the following crate versions were found:
          crate `core`, target triple target-10667598549189129223: /home/gary/rust-for-linux/rust/libcore.rmeta

Which doesn't make any sense to me since the same target json is specified but somehow rustc and rustdoc uses different hash. Do you have any clue?

@wedsonaf
Copy link
Member

@nbdd0121, I'm having a hard time convincing myself the extra complexity is worth the benefits they're bringing.

(I also think we should only resort in proc macros when no other reasonable alternatives exist.)

Would you mind sharing more of your thoughts on this?

@nbdd0121
Copy link
Member Author

@nbdd0121, I'm having a hard time convincing myself the extra complexity is worth the benefits they're bringing.

(I also think we should only resort in proc macros when no other reasonable alternatives exist.)

Would you mind sharing more of your thoughts on this?

A few reasons:

  • CStr must not have any interior NULs. Current c_str macro cannot check it, as both string and byte strings can have interior NULs.
  • CStr should be any bytes terminated by NUL. This means CStr is more like byte string than a string. concat! macros do not work with byte strings.
  • The new proc macro introduced in this PR accepts both string and byte string, it performs validity checks and produce a byte string. Byte string has an a inherent advantage that its type is &[u8; N] instead of str, so the length can be checked at compile time.

As for complexity, I am doing a refactoring to rename libmodule to libmacros and move c_str macros there, Hopefully after refactoring the structure will be less complex.

@nbdd0121
Copy link
Member Author

Also note that the module! macro will eventually have to parse strings and byte strings, e.g. to handle non-ASCII author names. So the infrastructure introduced in this PR really is going to be shared.

@wedsonaf
Copy link
Member

@nbdd0121, I'm having a hard time convincing myself the extra complexity is worth the benefits they're bringing.
(I also think we should only resort in proc macros when no other reasonable alternatives exist.)
Would you mind sharing more of your thoughts on this?

A few reasons:

  • CStr must not have any interior NULs. Current c_str macro cannot check it, as both string and byte strings can have interior NULs.

This is indeed nice, but I don't think it is essential, especially because it does not affect safety (even though it of course may affect correctness).

  • CStr should be any bytes terminated by NUL. This means CStr is more like byte string than a string. concat! macros do not work with byte strings.

Also nice to have, but do we have a case where we'd like to use concat! but can't?

  • The new proc macro introduced in this PR accepts both string and byte string, it performs validity checks and produce a byte string. Byte string has an a inherent advantage that its type is &[u8; N] instead of str, so the length can be checked at compile time.

Where do we need this? The descriptor table in devices?

I'm still not convinced these niceties are worth the extra complexity. I'm especially concerned about when we're trying to push this upstream: I'd rather avoid turning reviewers off if they look at this and think "Woah, we need this much extra code to handle C strings?". (I would certainly feel this way, and do a little bit about modules too...)

(Also, apologies for encouraging you to do the work before a discussion [that you rightfully sought in the meeting]. Next time we should discuss beforehand, perhaps on the mailing list or a Github issue before spending time on the implementation.)

@nbdd0121
Copy link
Member Author

nbdd0121 commented May 11, 2021

Also nice to have, but do we have a case where we'd like to use concat! but can't?

The restriction of concat! causes CStr to wrap around str instead of [u8]. What if C side gives Rust a string? Do we want to pay the cost of checking it is UTF-8 even though we don't really need to? What if C gives Rust side a non-UTF8 string?

Where do we need this? The descriptor table in devices?

of_device_id for example. Currently there is no way to initialize it during compilation. With append_nul and CBoundStr, you can do:

static DEVICE_ID: of_device_id  = of_device_id {
   compatible: c_bounded_str!("some string").expand_to_char_array()
   /* ... */
}

(or even define another macro that expand to this, say padded_char_array!("some string"))

I'm still not convinced these niceties are worth the extra complexity. I'm especially concerned about when we're trying to push this upstream: I'd rather avoid turning reviewers off if they look at this and think "Woah, we need this much extra code to handle C strings?". (I would certainly feel this way, and do a little bit about modules too...)

It's actually just a few lines. I'ved pushed the refactored code (WIP), you can see it is just a few lines. Much of the code is the infrastructure. With these infrastructure we can actually simplify the module! macro and make it less complex.

@nbdd0121
Copy link
Member Author

Some additional justification for vendoring buffer and lit from syn:

@nbdd0121
Copy link
Member Author

Will split into multiple PRs.

@nbdd0121 nbdd0121 marked this pull request as draft May 17, 2021 18:26
@nbdd0121 nbdd0121 mentioned this pull request May 17, 2021
@nbdd0121 nbdd0121 changed the title CStr overhaul c_str! NUL checks and CBounedStr May 17, 2021
@ksquirrel

This comment has been minimized.

@ksquirrel

This comment has been minimized.

@TheSven73
Copy link
Collaborator

I tested your PR by merging it into #276. Consider me as this PR's very first customer. Customer is king :) It allows me to eliminate the runtime check that verifies whether the string fits in 128 characters. Also, the unsafe call to transmute disappears. Sweet.

A few comments:

  1. Something odd. After merging your PR, it looks like kernel.o is built twice! I noticed this because the Raspberry Pi defconfig generates a warning in the spinlock Rust abstraction. I now see that warning twice.
  2. As a user of CBoundedStr, I find the use of relax_bound() somewhat counter-intuitive. In my use case, I create a small string, then expand it into a bounded string of length 128. The API requires an explicit call to relax_bound():
let of_match_tbl = OfMatchTable::new(c_bounded_str!("brcm,bcm2835-rng").relax_bound())?;

It would be more intuitive if there were no need for relax_bound() at all. From the viewpoint of a user, it should just squeeze into the required final bound at the end of the chain, and complain at build time if there's an issue with that.

  1. When I specify a string larger than the bound (128 characters in my case), I see a cryptic link time error. Is this the build time check kicking in?
ld.lld: error: undefined symbol: build_error
>>> referenced by str.rs:322 (/home/sva/repos/kernel-rust/rust/kernel/str.rs:322)
>>>               char/hw_random/bcm2835_rng_rust.o:(_RINvMs5_NtCsbDqzXfLQacH_6kernel3strINtB6_11CBoundedStrKj9a_E11relax_boundKj80_ECsbDqzXfLQacH_16bcm2835_rng_rust) in archive drivers/built-in.a
make: *** [Makefile:1285: vmlinux] Error 1

@bjorn3
Copy link
Member

bjorn3 commented May 20, 2021

When I specify a string larger than the bound (128 characters in my case), I see a cryptic link time error. Is this the build time check kicking in?

I think so. It might make sense to allow using a custom name for the non-existent symbol that is referenced to cause the link error. If using a linker that accepts arbitrary symbol names, maybe even include the assertion itself.

As a user of CBoundedStr, I find the use of relax_bound() somewhat counter-intuitive. In my use case, I create a small string, then expand it into a bounded string of length 128. The API requires an explicit call to relax_bound():

Would the following work?

let of_match_tbl = OfMatchTable::new(c_bounded_str!(_, "brcm,bcm2835-rng"))?;

The _ should infer to the right size. If not you can try explicitly specifying the right size.

@TheSven73
Copy link
Collaborator

TheSven73 commented May 20, 2021

The _ doesn't work, unfortunately. The macro doesn't expect it. I don't particularly want to specify the bound explicitly - that should be up to the receiving function... At least, as an entitled customer, I feel it should be :)

@ksquirrel

This comment has been minimized.

@nbdd0121
Copy link
Member Author

  1. As a user of CBoundedStr, I find the use of relax_bound() somewhat counter-intuitive. In my use case, I create a small string, then expand it into a bounded string of length 128. The API requires an explicit call to relax_bound():
let of_match_tbl = OfMatchTable::new(c_bounded_str!("brcm,bcm2835-rng").relax_bound())?;

It would be more intuitive if there were no need for relax_bound() at all. From the viewpoint of a user, it should just squeeze into the required final bound at the end of the chain, and complain at build time if there's an issue with that.

The _ doesn't work, unfortunately. The macro doesn't expect it. I don't particularly want to specify the bound explicitly - that should be up to the receiving function... At least, as an entitled customer, I feel it should be :)

Rust can only use _ for inferring types but constant generic parameters 😢. But I have added a dedicated macro rule so you can now write c_bounded_str!(_, "blah") 😃.

  1. When I specify a string larger than the bound (128 characters in my case), I see a cryptic link time error. Is this the build time check kicking in?
ld.lld: error: undefined symbol: build_error
>>> referenced by str.rs:322 (/home/sva/repos/kernel-rust/rust/kernel/str.rs:322)
>>>               char/hw_random/bcm2835_rng_rust.o:(_RINvMs5_NtCsbDqzXfLQacH_6kernel3strINtB6_11CBoundedStrKj9a_E11relax_boundKj80_ECsbDqzXfLQacH_16bcm2835_rng_rust) in archive drivers/built-in.a
make: *** [Makefile:1285: vmlinux] Error 1

This is indeed a bit cryptic, but at least it provides some info about the crate and function that fails the assertion. If you specify manually the bound in c_bounded_str! or otherwise use the value in constant context (e.g. when initializing a static variable), you should see error from rustc instead, which is way clearer.

Perhaps in the future we can have a tool that extracts the callgraph from disassembly or debuginfo and gives you a static stack trace, but that's a large amount of work and probably wouldn't worth the effort until we're upstreamed :)

@TheSven73
Copy link
Collaborator

Rust can only use _ for inferring types but constant generic parameters . But I have added a dedicated macro rule so you can now write c_bounded_str!(_, "blah") .

Speaking purely from a user's POV, I would prefer to see relax_bound() happen by default. It's great to simply write a string, and let the receiving function determine how much padding should be added. Not sure how much trouble that would cause for you...

This is indeed a bit cryptic, but at least it provides some info about the crate and function that fails the assertion.

True! TBH I'm very happy that this works. Good build time checks are awesome!

@ksquirrel

This comment has been minimized.

rust/kernel/str.rs Outdated Show resolved Hide resolved
rust/kernel/str.rs Outdated Show resolved Hide resolved
rust/kernel/str.rs Outdated Show resolved Hide resolved
rust/kernel/str.rs Outdated Show resolved Hide resolved
rust/kernel/str.rs Outdated Show resolved Hide resolved
rust/kernel/str.rs Outdated Show resolved Hide resolved
rust/kernel/str.rs Outdated Show resolved Hide resolved
rust/kernel/str.rs Outdated Show resolved Hide resolved
rust/kernel/str.rs Outdated Show resolved Hide resolved
rust/kernel/str.rs Outdated Show resolved Hide resolved
@ksquirrel

This comment has been minimized.

`CBoundedStr<N>` is a `CStr` with length known to be less than `N`.
It can be used in cases where a known length limit exists.

Signed-off-by: Gary Guo <gary@garyguo.net>
@ksquirrel
Copy link
Member

Review of c77d581b13e3:

  • ✔️ Commit c77d581: Looks fine!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants