Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancements to autodoc/embedsignature #5415

Merged
merged 2 commits into from
May 18, 2023
Merged

Conversation

dalcinl
Copy link
Member

@dalcinl dalcinl commented May 1, 2023

Add embedsignature.pure compiler directive to generate pure-Python type annotations compatible with type hinting syntax. This mostly amounts to:

  • Not emitting C type declarations in signatures.
  • Preferring explicit argument type annotations over C types.
  • Mapping C type names to the closest Python type name.

See #3150

@dalcinl dalcinl requested review from scoder and da-woods May 1, 2023 09:13
@dalcinl dalcinl self-assigned this May 1, 2023
@dalcinl
Copy link
Member Author

dalcinl commented May 1, 2023

Note: The changes from this PR are opt-in: code that sets embedsignature.pure = False (or do not set it at all, False is the default value) should not see any differences in the embedded signatures.

Copy link
Contributor

@scoder scoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems cool overall. It's definitely an opt-in change, given that a lot of type information is lost here.

We probably need a test for passing PyLong values into a fused function in Py2, seems to be missing.

Cython/Compiler/PyrexTypes.py Show resolved Hide resolved
Cython/Compiler/AutoDocTransforms.py Outdated Show resolved Hide resolved
@dalcinl
Copy link
Member Author

dalcinl commented May 2, 2023

It's definitely an opt-in change, given that a lot of type information is lost here.

My main motivation is to emit something that would be valid Python syntax and compatible with type hints, I'm using these pure signatures to generate *.pyi stubs, see here . That means we cannot emit C type names as is. And I think you agree that honoring explicit type annotations is OK. Then what remains is how to map a C type to some syntactically valid Python identifier/expression.

I decided to map C integral/real-floating/complex-floating to Python int/float/complex type names. But we could do things differently. For example, map long double complex to a underscorified identifier like long_double_complex , or whatever rule you like, I'm totally open to suggestions. Would you prefer I do it that way?

@scoder
Copy link
Contributor

scoder commented May 2, 2023

I'm using these pure signatures to generate *.pyi stubs, see here .

Cool. It's usually a good idea to begin generated files with a comment like "# This file was generated by …", to avoid accidental manual changes after the fact, and to leave a note how to regenerate the file on changes. (There's an additional debate whether generated files should be under version control, but I doubt that there's a general answer to that question.)

what remains is how to map a C type to some syntactically valid Python identifier/expression.

Not just syntactically valid, also semantically meaningful. Mapping C integer types to Python int is quite restrictive, for example, because we call PyNumber_Index() on conversion internally, so valid input arguments are not just int but any integer-like type. That's probably worth expressing here. The closest is probably SupportsIndex.

EDIT: Actually, I think we even accept Python float values for C integer types and vice versa. So the input type range for numeric C types is basically any number.

@scoder
Copy link
Contributor

scoder commented May 2, 2023

Also note that return types are very different from argument types. If the C return type is known, then we can map it to an exact Python type and write that as return type annotation. If a C argument type is given, then the C-API conversion rules apply and the range of Python types that we accept for it may be broader. It would be great to express that in the generated type annotations.

There's also a possible mapping from func(list arg=None) to func(arg: Optional[list] = None). The list of beneficial mappings is probably not that short.

@scoder
Copy link
Contributor

scoder commented May 2, 2023

You're essentially working on #3150

@dalcinl
Copy link
Member Author

dalcinl commented May 2, 2023

It's usually a good idea to begin generated files with a comment like "# This file was generated by …",

I almost never get contributions for low-level stuff, so this has not been an issue, but you are definitely right.

There's an additional debate whether generated files should be under version control

The generation step involved compiled code (importing the ext module), and that complicates matters with cross-compilation. I have CI tests that make sure that the checked-in code is always in sync with the generated output.

Not just syntactically valid, also semantically meaningful.

Of course, but the "semantically meaningful" part is quite hard in the duck-typing scenario.

That's probably worth expressing here. The closest is probably SupportsIndex.

You are right, of course, but maybe practicality beats purity? I don't thing the stdlib stubs from typeshed are 100 % accurate either.

EDIT: Actually, I think we even accept Python float values for C integer types and vice versa. So the input type range for numeric C types is basically any number.

Yes, we accept float for C integer types, because at some point we fall back to PyNumber_Int(x) (as long as x is not str). I do not like and never liked this behavior, the conversion is two broad and lossy (2.5 -> 2), explicit is better than implicit, ect., but complaints without patches are pointless (not to mention major backward compatibility concerns).

There's also a possible mapping from func(list arg=None) to func(arg: Optional[list] = None)

I didn't want to go there just yet, we also have the new generic alias syntax func(arg: list | None = None). For the time being, users are expected to add the full annotation, e.g. func(list arg: list[int] | None = None). Then embedsignature.pure=True will output func(list arg: list[int] | None = None).

You're essentially working on #3150

Well, my changes here are much humble. This PR does not pretend to solve all the issues with typing stub generation, but it is nonetheless a small step forward while keeping the focus on this being about docstrings and signatures.

This was a rather long reply. At this point I'm a bit confused whether you are OK with this PR as it is now, of there are strictly necessary changes that you want me incorporating. Your concerns about specializing py_type_name() were addressed. Tests for Python 2 long where added. Anything I should look at or anything concrete you want me to change? Otherwise, could you please confirm whether your approve this PR?

@scoder
Copy link
Contributor

scoder commented May 3, 2023

the "semantically meaningful" part is quite hard in the duck-typing scenario.

Absolutely. And that's the only real issue that I see here. If we settle on a specific kind of output, then users will start relying on it, and it will be difficult to change in the future.

The main problem here are the different audiences. Type annotations are for humans, whereas static types are for compilers. Mostly. Just because Cython can generate code for a type conversion does not mean that it's intended to be used that way.

For correctness, the type annotations that we generate would have to be as broad as possible, meaning: any index instead of int, any number instead of float, etc. Whether that is what users actually want, I don't know. It's what their API accepts, in any case.

docs/src/userguide/source_files_and_compilation.rst Outdated Show resolved Hide resolved
tests/run/fused_types.pyx Outdated Show resolved Hide resolved
@scoder
Copy link
Contributor

scoder commented May 3, 2023

My gut feeling is that users who write C int or long probably intend it to accept Python int, really. But on the other side, users who have a Python float value and want to pass it into the API would then have to call int() on it, or their static type checker would complain. Although the API would handle the float input just fine and efficiently. That's what annoys me a bit.

The fact that we allow two kinds of declarations and prefer the Python one is pragmatic, but also bares a risk of confusion and mix of syntaxes. I'm really not sure that I want to encourage that. But it would be good to read a few more opinions.

@scoder
Copy link
Contributor

scoder commented May 3, 2023

Remember that "Python type declarations" are actually just annotations.

… unless you use them as the way you declare types for Cython. In that case, they are type declarations, not just annotations. The fact that you can mix both in .pyx files is, well, limited to .pyx files. It doesn't help users who write their Cython code in .py modules to tell them that their life could be so much easier and cooler if they switched to non-Python syntax extensions and renamed their files. We are currently trying to discourage writing new code in Cython syntax and use Python syntax instead, simply because there is much more tooling available for writing Python code than for Cython code.

If you want to reject arguments with both C types and annotations

I didn't suggest to make them invalid syntax, but I also wouldn't want to present mixing both forms of type declarations as the best of both worlds. I don't want Python type annotations to be considered a second class citizen that is only looked at if there is nothing more convincing and can otherwise be happily ignored by the compiler.

def foo(int arg: Annotated[int, MaxValueAllowed(42)]) -> Any:
    ...

Why is this better than the following?

def foo(arg: Annotated[cython.int, MaxValueAllowed(42)]) -> Any:
    ...

@scoder
Copy link
Contributor

scoder commented May 3, 2023

I actually think this discussion belongs into #3150. Let's continue it there.

@dalcinl
Copy link
Member Author

dalcinl commented May 3, 2023

My gut feeling is that users who write C int or long probably intend it to accept Python int, really.

Indeed

But on the other side, users who have a Python float value and want to pass it into the API would then have to call int() on it, or their static type checker would complain.

Which maybe is a good thing, totally line with Python's explicit is better than implicit?
I believe almost no builtin (i.e. implemented in C) function/method in CPython works the Cython way.
I understand that float -> int demotion is how things work in C and Cython started with a strong C semantics background, but Cython should perhaps consider adding a compiler option to disable this behavior. After all, isn't Cython supposed to favor Python semantics over C?

any index instead of int,

As Cython accepts Py float for a C int argument, SupportsIndex would not be accurate.

any number instead of float, etc.

Python complex is a number, but Cython does not accept it as a C float argument. So numbers.Number would be inaccurate for float, maybe SupportsFloat would do.

The fact that we allow two kinds of declarations and prefer the Python one is pragmatic, but also bares a risk of confusion and mix of syntaxes.

Consenting adults? Again, remember that my proposal is opt-in. If you do not opt in, you will get whatever Cython is doing right now.

But it would be good to read a few more opinions.

So, we are stuck on how to map a C type to an annotation, and only when an explicit annotation is not provided (I assume I convinced you we cannot disable C types with annotations in this #5415 (comment)).

I have the feeling that sometimes you forget this PR is about docstrings and not about generating accurate typing stubs.

To summarize alternatives, if we have:

#cython: embedsignature=True,
#cython: embedsignature.pure=True
cdef foo(unsigned long log x): pass

then the generated embedded signature could be

  • foo(x), that is, no type annotation at all, the C type information is lost
  • foo(x: int), what I've implemented in this PR so far.
  • foo(x: unsigned_long_long), which is still valid Python syntax and somehow preserve C type information.
  • foo(x: SomethingElse), and then we may go back and forth about what SomethingElse should be, but maybe it may not be quite accurate for C integral types.

I would not object strongly to the following relatively trivial mapping:

  • C integral -> SupportsInt (actually, this may be the most accurate Py type matching current Cython behavior)
  • C (real) floating -> SupportsFloat
  • C complex floating -> SupportsComplex

but I still think that int, float, complex are easier human-consume in a documentation context (We are dealing with docstrings here!).

@scoder
Copy link
Contributor

scoder commented May 3, 2023

We are dealing with docstrings here!

There's also __text_signature__, which would probably pick this up, wouldn't it?

@dalcinl
Copy link
Member Author

dalcinl commented May 3, 2023

There's also text_signature, which would probably pick this up, wouldn't it?

That would require implementing the text_signature format. You already made me remove from this PR the bits that would simplify that work. Nonetheless, __text_signature__ does not support annotations. Bether said, inspect.signature() barfs if your __text_signature__ has annotations [link].
So, if we ever implement __text_signature__, we would have to strip all annotations.

PS: As I said before, I think __text_signature__ would only be useful if using #cython: binding=False, which is no longer even the default.

@scoder
Copy link
Contributor

scoder commented May 4, 2023 via email

@dalcinl dalcinl marked this pull request as draft May 4, 2023 05:37
@dalcinl
Copy link
Member Author

dalcinl commented May 4, 2023

Your reluctance to accept this PR made me think harder. We can do slightly better: have a compiler directive embedsignature.format of type str with values "full" (the current Cython format), "clinic" (the argument clinic format), and "pure" (what this PR implements so far). We document what the pure format does, and add a warning about it being subject to future change.

@scoder
Copy link
Contributor

scoder commented May 4, 2023

That sounds very reasonable. I'd call the "pure" version "python", though, and the "full" one "c" (and require lower case and reject everything else). I think that's clearer.

@dalcinl dalcinl marked this pull request as ready for review May 4, 2023 13:56
@dalcinl dalcinl requested a review from scoder May 4, 2023 13:56
@dalcinl
Copy link
Member Author

dalcinl commented May 4, 2023

@scoder I took the liberty of using "cython" instead of "c" for the default format (that is, the one only one we had before this PR). I also added support for

Please double check the updated documentation entry for the new compiler directive.

Regarding implementation, I used the one_of helper, so I think it will accept lowercase strings and reject otherwise.

@dalcinl dalcinl force-pushed the dalcinl/autodoc branch 2 times, most recently from d724ba4 to a887986 Compare May 4, 2023 14:30
@scoder
Copy link
Contributor

scoder commented May 4, 2023

using "cython" instead of "c" for the default format

I had also first used cython in my comment, but then changed it to c because it really uses C argument notation. People who learn Cython in Python syntax might be confused about what the difference between cython and python is here. c is a clearer distinction.

@dalcinl
Copy link
Member Author

dalcinl commented May 4, 2023

c is a clearer distinction.

OK, fair enough. I'll make the change.

@dalcinl
Copy link
Member Author

dalcinl commented May 4, 2023

Done. I'll wait CI and your approval to merge. Otherwise, click yourself the merge button.

Copy link
Contributor

@scoder scoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, just a few comments.

Comment on lines 866 to 869
change over time. If set to ``clinic`` and the ``binding``
directive is set to False, Cython will generate signatures
compatible with CPython's Argument Clinic. Default is
``c``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, what happens if emberdignature.format=clinic and binding=True?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No signature is generated, because CythonFunction does not support the __text_signature__ descriptor like CPython's builtin_function_or_method.

How would you like to rephrase the documentation?

As I told you before, this clinic format is of little use in Cython 3.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I told you before, this clinic format is of little use in Cython 3.

Perhaps a stupid question, but why include it at all then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd say "The clinic format generates signatures that can be understood by CPython's argument clinic tool and used to generate the __text_signature__ attribute. This mainly useful when binding=False, since the functions generated with binding=True do not have a __text_signature__ attribute."

I'm not sure I'd disable it when binding=True (even if it's a bit pointless there).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"little use" does not mean "no use". Those that for whatever reason want to set binding=False may find the new option useful to get __text_signature__ and then the inspect module to work closer to CPython builtin functions and methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'd disable it when binding=True (even if it's a bit pointless there).

Is not only that is pointless. it looks awful as well, e,g

($self, *args, **kwargs)
--
<BLANKLINE>
<BLANKLINE>
Function docstring goes here.

Cython/Compiler/AutoDocTransforms.py Outdated Show resolved Hide resolved
@scoder scoder added this to the 3.0 milestone May 4, 2023
@dalcinl dalcinl force-pushed the dalcinl/autodoc branch 2 times, most recently from 7bb9ae3 to d0cebcd Compare May 4, 2023 23:23
docs/src/userguide/source_files_and_compilation.rst Outdated Show resolved Hide resolved
Comment on lines 866 to 869
change over time. If set to ``clinic`` and the ``binding``
directive is set to False, Cython will generate signatures
compatible with CPython's Argument Clinic. Default is
``c``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd say "The clinic format generates signatures that can be understood by CPython's argument clinic tool and used to generate the __text_signature__ attribute. This mainly useful when binding=False, since the functions generated with binding=True do not have a __text_signature__ attribute."

I'm not sure I'd disable it when binding=True (even if it's a bit pointless there).

@@ -854,6 +854,25 @@ Cython code. Here is the list of currently supported directives:
signature, which cannot otherwise be retrieved after
compilation. Default is False.

``embedsignature.format`` (``c`` / ``python`` / ``clinic``)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@da-woods I reworded the documentation as per your comments. Note however that the argument clinic tool does not generate signatures, instead, you have to write them by hand following the proper format.

@dalcinl
Copy link
Member Author

dalcinl commented May 13, 2023

@scoder @da-woods I'm waiting for further comments or your explicit approval.

@dalcinl
Copy link
Member Author

dalcinl commented May 17, 2023

Can we get this PR merged before the next beta release?

@scoder scoder merged commit a174dcd into cython:master May 18, 2023
@scoder
Copy link
Contributor

scoder commented May 18, 2023

Thanks. Very nice improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants