Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module!: take strings instead of byte strings #252

Open
ojeda opened this issue May 6, 2021 · 6 comments · May be fixed by #278
Open

module!: take strings instead of byte strings #252

ojeda opened this issue May 6, 2021 · 6 comments · May be fixed by #278
Labels
• lib Related to the `rust/` library.

Comments

@ojeda
Copy link
Member

ojeda commented May 6, 2021

Given it is a proc macro, we could take normal strings for the fields and ensure they are ASCII if/where needed.

This would make the interface a bit leaner.

For author, it would be particularly fitting anyway, because we would like to allow names requiring UTF-8, such as non-romanized names. There are kernel modules with MODULE_AUTHORs with non-ASCII characters already (and encoded as UTF-8) e.g.

MODULE_AUTHOR("周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>");
MODULE_AUTHOR("漆鹏振 (Qi Pengzhen) <aric.pzqi@ingenic.com>");

@ojeda ojeda added • lib Related to the `rust/` library. prio: normal good first issue Good for newcomers labels May 6, 2021
@nbdd0121
Copy link
Member

nbdd0121 commented May 9, 2021

This is actually non-trivial, because binary string constants cannot contain non-ASCII characters, so module proc_macro needs to properly parse the string literal and convert it to a binary string.

@bjorn3
Copy link
Member

bjorn3 commented May 9, 2021

str::as_bytes gives the raw UTF-8 bytes of a string. You can directly pass this to Literal::byte_string I believe. Rustc automatically escapes the individual bytes: https://github.com/rust-lang/rust/blob/bba8710616e5e4722215c0d6b27abaedca03ebad/compiler/rustc_expand/src/proc_macro_server.rs#L566-L572

@nbdd0121
Copy link
Member

nbdd0121 commented May 9, 2021

str::as_bytes gives the raw UTF-8 bytes of a string. You can directly pass this to Literal::byte_string I believe. Rustc automatically escapes the individual bytes: https://github.com/rust-lang/rust/blob/bba8710616e5e4722215c0d6b27abaedca03ebad/compiler/rustc_expand/src/proc_macro_server.rs#L566-L572

However there is no way to get a String or Vec<u8> from a Literal other than parsing it yourself. It's easy once once it's parsed. If you look at #258 you'll see that's exactly what I do.

@bjorn3
Copy link
Member

bjorn3 commented May 9, 2021

Ah, I see what you mean. If licensing would allow it copying syn or rustc_ast would be the easiest solutions.

@ojeda ojeda removed the good first issue Good for newcomers label May 11, 2021
@nbdd0121
Copy link
Member

Are there scenarios that we need to take non-UTF-8 strings?

@ojeda
Copy link
Member Author

ojeda commented May 12, 2021

I do not know, but even if we happen to need it, I do not think we should worry about that for the time being, in particular if it makes things more complex.

@nbdd0121 nbdd0121 linked a pull request May 19, 2021 that will close this issue
nbdd0121 added a commit to nbdd0121/linux that referenced this issue Aug 5, 2022
For simplicity (avoid parsing), escape sequences are not yet handled.

Link: Rust-for-Linux#252
Link: https://lore.kernel.org/lkml/YukvvPOOu8uZl7+n@yadro.com/
Signed-off-by: Gary Guo <gary@garyguo.net>
nbdd0121 added a commit to nbdd0121/linux that referenced this issue Aug 5, 2022
For simplicity (avoid parsing), escape sequences and raw string literals
are not yet handled.

Module names, aliases and license strings are restricted to ASCII only.

Link: Rust-for-Linux#252
Link: https://lore.kernel.org/lkml/YukvvPOOu8uZl7+n@yadro.com/
Signed-off-by: Gary Guo <gary@garyguo.net>
ojeda added a commit to Rust-for-Linux/rust-out-of-tree-module that referenced this issue Aug 6, 2022
See commit 593e65924b2e ("rust: take str literal instead bstr literal
in `module!` macro") in the main repository:

    For simplicity (avoid parsing), escape sequences and raw string literals
    are not yet handled.

    Module names, aliases and license strings are restricted to ASCII only.

    Signed-off-by: Gary Guo <gary@garyguo.net>

Link: Rust-for-Linux/linux#252
Link: https://lore.kernel.org/lkml/YukvvPOOu8uZl7+n@yadro.com/
Link: Rust-for-Linux/linux@593e659
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
ojeda pushed a commit to ojeda/linux that referenced this issue Nov 10, 2022
Instead of taking binary string literals, take string ones instead,
making it easier for users to define a module, i.e. instead of
calling `module!` like:

    module! {
        ...
        name: b"rust_minimal",
        ...
    }

now it is called as:

    module! {
        ...
        name: "rust_minimal",
        ...
    }

Module names, aliases and license strings are restricted to
ASCII only. However, the author and the description allows UTF-8.

For simplicity (avoid parsing), escape sequences and raw string
literals are not yet handled.

Link: Rust-for-Linux#252
Link: https://lore.kernel.org/lkml/YukvvPOOu8uZl7+n@yadro.com/
Signed-off-by: Gary Guo <gary@garyguo.net>
[Reworded, adapted for upstream and applied latest changes]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
ojeda pushed a commit to ojeda/linux that referenced this issue Nov 10, 2022
Instead of taking binary string literals, take string ones instead,
making it easier for users to define a module, i.e. instead of
calling `module!` like:

    module! {
        ...
        name: b"rust_minimal",
        ...
    }

now it is called as:

    module! {
        ...
        name: "rust_minimal",
        ...
    }

Module names, aliases and license strings are restricted to
ASCII only. However, the author and the description allows UTF-8.

For simplicity (avoid parsing), escape sequences and raw string
literals are not yet handled.

Link: Rust-for-Linux#252
Link: https://lore.kernel.org/lkml/YukvvPOOu8uZl7+n@yadro.com/
Signed-off-by: Gary Guo <gary@garyguo.net>
[Reworded, adapted for upstream and applied latest changes]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Nov 10, 2022
Instead of taking binary string literals, take string ones instead,
making it easier for users to define a module, i.e. instead of
calling `module!` like:

    module! {
        ...
        name: b"rust_minimal",
        ...
    }

now it is called as:

    module! {
        ...
        name: "rust_minimal",
        ...
    }

Module names, aliases and license strings are restricted to
ASCII only. However, the author and the description allows UTF-8.

For simplicity (avoid parsing), escape sequences and raw string
literals are not yet handled.

Link: Rust-for-Linux#252
Link: https://lore.kernel.org/lkml/YukvvPOOu8uZl7+n@yadro.com/
Signed-off-by: Gary Guo <gary@garyguo.net>
[Reworded, adapted for upstream and applied latest changes]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
ojeda pushed a commit to ojeda/linux that referenced this issue Dec 2, 2022
Instead of taking binary string literals, take string ones instead,
making it easier for users to define a module, i.e. instead of
calling `module!` like:

    module! {
        ...
        name: b"rust_minimal",
        ...
    }

now it is called as:

    module! {
        ...
        name: "rust_minimal",
        ...
    }

Module names, aliases and license strings are restricted to
ASCII only. However, the author and the description allows UTF-8.

For simplicity (avoid parsing), escape sequences and raw string
literals are not yet handled.

Link: Rust-for-Linux#252
Link: https://lore.kernel.org/lkml/YukvvPOOu8uZl7+n@yadro.com/
Signed-off-by: Gary Guo <gary@garyguo.net>
[Reworded, adapted for upstream and applied latest changes]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Dec 2, 2022
Instead of taking binary string literals, take string ones instead,
making it easier for users to define a module, i.e. instead of
calling `module!` like:

    module! {
        ...
        name: b"rust_minimal",
        ...
    }

now it is called as:

    module! {
        ...
        name: "rust_minimal",
        ...
    }

Module names, aliases and license strings are restricted to
ASCII only. However, the author and the description allows UTF-8.

For simplicity (avoid parsing), escape sequences and raw string
literals are not yet handled.

Link: Rust-for-Linux#252
Link: https://lore.kernel.org/lkml/YukvvPOOu8uZl7+n@yadro.com/
Signed-off-by: Gary Guo <gary@garyguo.net>
[Reworded, adapted for upstream and applied latest changes]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
ojeda pushed a commit to ojeda/linux that referenced this issue Dec 4, 2022
Instead of taking binary string literals, take string ones instead,
making it easier for users to define a module, i.e. instead of
calling `module!` like:

    module! {
        ...
        name: b"rust_minimal",
        ...
    }

now it is called as:

    module! {
        ...
        name: "rust_minimal",
        ...
    }

Module names, aliases and license strings are restricted to
ASCII only. However, the author and the description allows UTF-8.

For simplicity (avoid parsing), escape sequences and raw string
literals are not yet handled.

Link: Rust-for-Linux#252
Link: https://lore.kernel.org/lkml/YukvvPOOu8uZl7+n@yadro.com/
Signed-off-by: Gary Guo <gary@garyguo.net>
[Reworded, adapted for upstream and applied latest changes]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
• lib Related to the `rust/` library.
Development

Successfully merging a pull request may close this issue.

3 participants