
[Prototype] Python stubs generation #2379

Closed
wants to merge 6 commits into from


CLOVIS-AI
Contributor

Hi, this is a prototype to see if it's possible to generate a Python stub (.pyi) automatically from PyO3.

This is a very early prototype and definitely should not be merged as-is, but I'd be grateful for feedback on how to improve it.

The idea would be to generate parts of the stub as PyO3 parses the different classes, then to use an external program (e.g. one included in Maturin) to combine the parts into a single file. Currently the parts are printed to stdout for ease of debugging.
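As a rough illustration of what one printed part could look like, the sketch below renders a stub fragment for a class from method signatures that the macro has already extracted. The `MethodSig` struct and `render_class_fragment` function are hypothetical, not PyO3 API; the `# <RustClassName>` first line follows the fragment format mentioned later in this thread.

```rust
/// Hypothetical description of one method extracted by the macro.
struct MethodSig {
    name: String,
    /// (argument name, Python type) pairs, excluding `self`.
    args: Vec<(String, String)>,
    /// Python return type.
    ret: String,
}

/// Renders one unordered stub fragment for a class.
/// The first line is a `# <RustClassName>` marker so an external tool can sort fragments.
fn render_class_fragment(rust_name: &str, py_name: &str, methods: &[MethodSig]) -> String {
    let mut out = format!("# {rust_name}\nclass {py_name}:\n");
    if methods.is_empty() {
        // A class body cannot be empty, so emit an ellipsis placeholder.
        out.push_str("    ...\n");
    }
    for m in methods {
        let mut params = vec!["self".to_string()];
        params.extend(m.args.iter().map(|(n, t)| format!("{n}: {t}")));
        out.push_str(&format!("    def {}({}) -> {}: ...\n", m.name, params.join(", "), m.ret));
    }
    out
}
```

In the prototype, the result would simply be passed to `println!` so the fragments show up on stdout.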

@davidhewitt
Member

Thanks for working on this!

Have you got an example of a tool that would process these outputs into .pyi files?

As this branch progresses I'd prefer it to be part of #[pyo3(signature)] rather than a new annotation, if that makes sense. We can perhaps discuss that once #2302 is merged (hopefully only a couple of weeks off).

@@ -0,0 +1,67 @@
use syn::{GenericArgument, PathArguments, Type};

pub fn map_rust_type_to_python(rust: &Type) -> String {
Member

@davidhewitt May 18, 2022


Rather than embedding this information in the macro, it would be cool if we could somehow use a trait to calculate it. I think that would mean generating code for a function (maybe as part of #[pymodule]) which could be run to build stubs?
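One way to read this suggestion: the macro stops deciding Python type names itself and instead emits a function that asks a trait for each type when it runs. The sketch below hand-writes what such generated code might expand to for `fn add(a: i64, b: Option<String>) -> bool`. The `PythonType` trait and `stub_for_add` function are hypothetical names, not existing PyO3 API, and a real stub would also need `from typing import Optional`.

```rust
/// Hypothetical trait: each Rust type reports the Python annotation it maps to.
trait PythonType {
    fn python_type() -> String;
}

impl PythonType for i64 {
    fn python_type() -> String { "int".to_string() }
}
impl PythonType for bool {
    fn python_type() -> String { "bool".to_string() }
}
impl PythonType for String {
    fn python_type() -> String { "str".to_string() }
}
impl<T: PythonType> PythonType for Option<T> {
    fn python_type() -> String { format!("Optional[{}]", T::python_type()) }
}

/// What a macro might generate for
/// `#[pyfunction] fn add(a: i64, b: Option<String>) -> bool`.
/// The macro only copies the Rust types; the trait resolves them when this runs.
fn stub_for_add() -> String {
    format!(
        "def add(a: {}, b: {}) -> {}: ...",
        <i64 as PythonType>::python_type(),
        <Option<String> as PythonType>::python_type(),
        <bool as PythonType>::python_type(),
    )
}
```

The appeal of this split is that the macro never needs a hard-coded Rust-to-Python table: any type that implements the trait, including user types, works automatically.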

Contributor Author


Indeed, but I have no idea where to extract that information from.
I liked the compile-time approach because it would be easier to include transparently in Maturin (it could automatically enable the cargo feature, collect the generated files, merge them together and carry on without any code modifications), but I'm not yet sure that all the information is available at that point.

Is your idea something like this?

trait PythonType {
    fn python_type() -> &'static str;
}

It would probably be easy to implement for all built-in PyO3 types, and I guess #[pyclass] and #[derive(FromPyObject)] can generate it as well.
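The trait as written above returns `&'static str`, which is fine for fixed names but awkward for generic containers: `Vec<T>` has to build `list[...]` from `T`'s name at runtime. The sketch below therefore assumes a `String` return type. The `RustPoint` struct and its impl are hand-written stand-ins for what `#[pyclass(name = "Point")]` might generate; none of this is existing PyO3 API.

```rust
use std::collections::HashMap;

/// Hypothetical trait from the discussion, returning `String` so that
/// generic types can compose their parameters' names.
trait PythonType {
    fn python_type() -> String;
}

impl PythonType for f64 {
    fn python_type() -> String { "float".to_string() }
}
impl PythonType for String {
    fn python_type() -> String { "str".to_string() }
}
impl<T: PythonType> PythonType for Vec<T> {
    fn python_type() -> String { format!("list[{}]", T::python_type()) }
}
impl<K: PythonType, V: PythonType> PythonType for HashMap<K, V> {
    fn python_type() -> String {
        format!("dict[{}, {}]", K::python_type(), V::python_type())
    }
}
impl<A: PythonType, B: PythonType> PythonType for (A, B) {
    fn python_type() -> String {
        format!("tuple[{}, {}]", A::python_type(), B::python_type())
    }
}

/// Stand-in for a `#[pyclass(name = "Point")]` struct.
struct RustPoint;

/// What `#[pyclass]` might generate: the class's Python-facing name.
impl PythonType for RustPoint {
    fn python_type() -> String { "Point".to_string() }
}
```

With these impls, `<Vec<RustPoint> as PythonType>::python_type()` yields `list[Point]`, so user classes nest inside built-in containers without any extra work in the macro.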

I'm not sure how the other information we need (full method signatures, etc.) can be passed to the runtime, though.

@CLOVIS-AI
Contributor Author

Have you got an example of a tool that would process these outputs into .pyi files?

After including the .pyi in the Maturin package (the docs are really good 😄), you can import your extension from Python.
If you run MyPy on that, it will use the stub internally: for example, if you call a function that is not present in the stub, or call a function with different arguments, names or types than the ones declared in the stub, it will report it.

As this branch progresses I'd prefer it to be part of #[pyo3(signature)] rather than a new annotation, if that makes sense.

Originally I thought it would be good to have something like text_signature, but on second thought parsing it looks quite difficult. If there is a way to extract which Rust type corresponds to which Python type, type_signature becomes unnecessary. I haven't looked at the signature PR yet so I don't know how it would fit, but the syntax of the type declarations (assuming one is needed) is not really important to me as long as it's easy to find and clear to read.

@CLOVIS-AI
Contributor Author

Looking at the signature PR, something like signature(a: "int") could probably be used to override the stub.
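A minimal sketch of how such overrides could take precedence, assuming the macro has already parsed something like `signature(a: "int")` into a name-to-annotation map. The syntax is only a proposal in this thread, and `render_params` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Builds a stub parameter list, preferring a user-supplied annotation
/// (hypothetically parsed from `signature(a: "int")`) over the inferred one.
fn render_params(inferred: &[(&str, &str)], overrides: &HashMap<&str, &str>) -> String {
    inferred
        .iter()
        .map(|(name, ty)| {
            // Fall back to the inferred type when the user gave no override.
            let ty = overrides.get(name).copied().unwrap_or(*ty);
            format!("{name}: {ty}")
        })
        .collect::<Vec<_>>()
        .join(", ")
}
```

Keeping overrides as plain strings means users can write any annotation the inference cannot produce, at the cost of those strings not being checked at compile time.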

@davidhewitt
Member

Sorry for the delay. Do you mean that this generates the .pyi directly from the proc macros? How does it handle exported pymodule structure?

@CLOVIS-AI
Contributor Author

Do you mean that this generates the .pyi directly from the proc macros?

Yes. It generates pyi fragments that are unordered (they all start with # <RustClassName>). An external program is necessary to order them though (e.g. a Maturin flag).
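The ordering step such an external program would perform can be sketched as follows, assuming each fragment starts with a `# <RustClassName>` line. Sorting by that name keeps the output deterministic across builds; `merge_fragments` is a hypothetical helper, not part of Maturin.

```rust
/// Combines unordered stub fragments into one file body.
/// Each fragment is assumed to start with a `# <RustClassName>` line;
/// fragments are sorted by that name so the output is deterministic.
fn merge_fragments(fragments: &[&str]) -> String {
    let mut sorted: Vec<&str> = fragments.to_vec();
    // Sort key: the first line without its "# " prefix, or "" if missing.
    sorted.sort_by_key(|f| {
        f.lines()
            .next()
            .and_then(|l| l.strip_prefix("# "))
            .unwrap_or("")
            .to_string()
    });
    // Separate fragments with a blank line and end the file with a newline.
    sorted
        .iter()
        .map(|f| f.trim_end())
        .collect::<Vec<_>>()
        .join("\n\n")
        + "\n"
}
```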

Currently, it is able to export most of the class and method/function/attribute signatures, however the conversion from Rust types to Python types is hard-coded (and really ugly).

How does it handle exported pymodule structure?

At the moment, it doesn't. I hope that it's possible to get the structure at compile-time, but I doubt it is.

Ideally, I'd like:
• to be able to associate any Rust type with a Python type,
• to discover the module structure somehow.

@davidhewitt
Member

Yes. It generates pyi fragments that are unordered (they all start with # <RustClassName>). An external program is necessary to order them though (e.g. a Maturin flag).

Ah, cool. I've thought about doing something similar in the past, though I got a bit spooked. How does it handle developers making edits? Do the fragments have to be removed manually before each build?

to be able to associate any Rust type with a Python type

I think the only way that this can be done reliably is to make a trait, and instead of generating fragments directly from the macros, generate code which then uses that trait to generate the fragments.

Maybe we can also solve the module discovery at the same time, by writing a configuration file which says what #[pymodule] to start with. I'm wondering about something like the following:

  • We morph this a bit to instead have a binary tool in the PyO3 crate.
  • The tool does the following:
    • Reads a configuration file with module structure and target Rust source code.
    • Generates code using the parsing that we've already got here (maybe with a bit of extra work to handle module discovery etc.), and then compiles that and runs it.
    • The output of running that code is a complete type annotation file.
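The first step of that tool needs some in-memory form of the module structure. The sketch below assumes the configuration file has already been parsed into a small tree (the `ModuleConfig` type and its fields are hypothetical; no file format was settled here) and walks it to list every Python module that needs a stub, together with the `#[pymodule]` function that builds it.

```rust
/// Hypothetical parsed configuration: a Python module name, the Rust
/// `#[pymodule]` function that builds it, and its submodules.
struct ModuleConfig {
    name: String,
    rust_fn: String,
    submodules: Vec<ModuleConfig>,
}

/// Collects (dotted Python module path, Rust function) pairs, depth first.
fn collect_modules(m: &ModuleConfig, prefix: &str, out: &mut Vec<(String, String)>) {
    let path = if prefix.is_empty() {
        m.name.clone()
    } else {
        format!("{}.{}", prefix, m.name)
    };
    out.push((path.clone(), m.rust_fn.clone()));
    for sub in &m.submodules {
        collect_modules(sub, &path, out);
    }
}
```

The generated program would then visit each pair in turn and emit that module's annotations.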

What do you think of that?

@CLOVIS-AI
Contributor Author

If it has to work at runtime, would it be easier to just have a function PyModule::generateStubs(&self, File) and let the user call it themselves in their unit tests or whatever?

I'm not really a fan of duplicating the module structure in code and in configuration, but I don't really know if we have a choice either.

@davidhewitt
Member

let the user call it themselves in their unit tests or whatever?

TBH that may well be the right approach, that way the user can check the generated .pyi file into VCS. 👍

Reminds me of https://matklad.github.io/2022/03/26/self-modifying-code.html (at least the part of running generation at test time).

Having the file permanently present will probably work very nicely for development (whereas if it were generated at install time, local development might not work so well, depending on how things are set up).
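A minimal sketch of the test-time approach, loosely modelled on the pattern in the linked post: a unit test regenerates the stub, compares it with the checked-in file, and rewrites the file when they differ so the test can fail and prompt a commit. `ensure_file_contents` is a hypothetical helper, and the string passed to it stands in for whatever a generator such as the proposed `PyModule::generateStubs` would produce.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes `contents` to `path` if the file is missing or out of date.
/// Returns `Ok(true)` when the file had to be updated, so a test can fail
/// and remind the developer to commit the regenerated stub.
fn ensure_file_contents(path: &Path, contents: &str) -> io::Result<bool> {
    match fs::read_to_string(path) {
        // Already up to date: nothing to do.
        Ok(existing) if existing == contents => Ok(false),
        // Stale file: overwrite it.
        Ok(_) => {
            fs::write(path, contents)?;
            Ok(true)
        }
        // No file yet: create it.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            fs::write(path, contents)?;
            Ok(true)
        }
        Err(e) => Err(e),
    }
}
```

Inside a `#[test]`, asserting that the returned value is `false` makes CI fail whenever someone changes the Rust API without committing the updated `.pyi`, while a local run fixes the file in place.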

In cases where the generated .pyi isn't perfect, we could potentially even have a mechanism to let the user specify overrides or additional content to include in the output as part of the test.

@CLOVIS-AI
Contributor Author

That sounds like a good solution for user interaction. Now for the two remaining problems...

I do not know how stubs handle modules. My understanding is that the recommended way is to do something like this:

project/
  first.py
  first.pyi
  second.py
  second.pyi
  third/
    __init__.py
    __init__.pyi (???)
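In the usual stub layout, stubs mirror the source tree: a plain module `a.b` is described by `a/b.pyi`, and a package `a.b` by `a/b/__init__.pyi`. Whether Maturin expects exactly this layout is the open question below; the sketch only shows that path mapping, and `stub_path` is a hypothetical helper.

```rust
use std::path::PathBuf;

/// Maps a dotted Python module name to the stub file describing it:
/// packages get `<dir>/__init__.pyi`, plain modules get `<name>.pyi`.
fn stub_path(dotted: &str, is_package: bool) -> PathBuf {
    // "third.sub" -> "third/sub" (using the platform's separator).
    let mut path: PathBuf = dotted.split('.').collect();
    if is_package {
        path.push("__init__.pyi");
    } else {
        path.set_extension("pyi");
    }
    path
}
```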

What's the structure expected by Maturin?

And the other question would be how to extract the information we want from the Rust code. Creating a trait seems fine, but I haven't really used proc macros before; any recommendations on how to get started?

@CLOVIS-AI
Contributor Author

After looking at it some more, the main problem is linking the function metadata with their class.

I can statically generate the required data for each function, but I don't see how to give it to the class.
It seems like it's a similar problem to the multiple-pymethods trick, and it has the same problems you had.

@CLOVIS-AI
Contributor Author

Closing in favor of #2447 which is much cleaner.


2 participants