Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process functions: Infer argument valid_type from type hints #5900

Merged

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Feb 22, 2023

Fixes #3569

An advantage of process functions over calculation jobs and work chains
are that they are light weight and very easy to start using. But there
are also disadvantages; due to the process specification being inferred
from the function signature, the user cannot benefit from all the
features of the process specification. For example, it is not possible
to explicitly define a help string or valid type for the arguments of
the process function.

With PEP 484 type hints were introduced in Python 3.5 that made it
possible to indicate the expected types for function arguments. Here we
make use of this functionality to infer the valid type of function
arguments based on the provided type hint, if any.

As an example, a user can now define the following calcfunction:

@calcfunction
def add(a: int, b: int):
    return a + b

and when it is called with anything other than an int or Int for
either of the two arguments, a validation error is raised reporting that
an argument with an invalid type was provided.

The new functionality is fully optional and process functions without
typing will continue to work as before. If incorrect typing is provided,
a warning is logged and the typing is ignored.

The annotation of the wrapped process function is parsed using the
inspect.get_annotation method. This was added in Python 3.10 and so to
provide support in older versions we install the get-annotations
backport package. Since we have to use eval_str=True in the call to
get unstringized versions of the types. This will fail for the
backport implementation if the type uses union syntax of PEP 604, e.g
str | int instead of typing.Union[str, int], even if this
functionality is enabled using from __future__ import annotations.

@sphuber sphuber changed the title Process functions: Infer argument valid_type from type hints Process functions: Infer argument valid_type from type hints Feb 23, 2023
@sphuber
Copy link
Contributor Author

sphuber commented Feb 23, 2023

Note that this only parses type hinting of function arguments, not the return type. We could look into adding the parsing of return type as well and convert them into output ports. The difficulty I see here is that from the type annotation we only have the valid type, but we don't get the labels of the outputs, which we need when declaring the ports. So barring starting to parse the actual Python code of the function body to see the return statements and parsing the keys out of this, I am not sure how to do this. But the latter seems extremely frought with all sorts of complications and difficulties, which is why I haven't done so far.

@sphuber sphuber force-pushed the feature/5896/expose-process-function-inputs branch from 13dca0c to b9b67c3 Compare February 23, 2023 12:36
@danielhollas
Copy link
Collaborator

@sphuber this looks very cool! 👏 I have one question, from a developer perspective, I would be unsure whether to use Python built-in types or AiiDA Node types for the annotations. Would the following calcfunctions have equivalent behaviour?

@calcfunction
def add_aiida_types(a: Int, b: Int):
    return a + b

@calcfunction
def add_python_types(a: int, b: int):
    return a + b

@sphuber sphuber force-pushed the feature/5896/expose-process-function-inputs branch from b9b67c3 to c657085 Compare February 24, 2023 13:46
@sphuber
Copy link
Contributor Author

sphuber commented Feb 24, 2023

Indeed, both work and behavior is equivalent as in the valid_type that will be inferred will be orm.Int. I added support for inferring from the Python base types, since we recently added the behavior of values of those types automatically being converted to the AiiDA node. So I thought it would be weird if the type hinting of Python base types would not work. But happy to be convinced of this otherwise if there is a strong argument.

Note that it is also possible to mix the notations (although I don't really see the point):

@calcfunction
def add_python_types(a: int | orm.Float, b: int):
    return a + b

I was also working on adding parsing of docstrings, such that the dynamically generated input ports would set the help attribute based on the docstring of the parameter. I already have it working, but it requires a third-party library to parse the docstring since the standard lib doesn't provide anything.

@chrisjsewell
Copy link
Member

chrisjsewell commented Feb 24, 2023

I was also working on adding parsing of docstrings, such that the dynamically generated input ports would set the help attribute based on the docstring of the parameter

Note, the other possibility here, and potentially for adding other input port information is: https://docs.python.org/3/library/typing.html#typing.Annotated

(p.s. given I have just created https://sphinx-autodoc2.readthedocs.io, I can tell you all about doctrings lol)

@sphuber
Copy link
Contributor Author

sphuber commented Mar 3, 2023

@chrisjsewell any comments on this PR?

One open question is if we want to keep the support for Python versions older than 3.10. In that case we need the get-annotations dependency, which is currently not yet available on Conda. Not sure if this is worth the effort and we should simply have the functionality only available for Python 3.10 and above.

@chrisjsewell
Copy link
Member

chrisjsewell commented Mar 6, 2023

One open question is if we want to keep the support for Python versions older than 3.10. In that case we need the get-annotations dependency, which is currently not yet available on Conda. Not sure if this is worth the effort and we should simply have the functionality only available for Python 3.10 and above.

If you want me to put get-annotations on conda, its pretty easy?

I was wondering what the difference between https://docs.python.org/3/library/inspect.html#inspect.get_annotations and https://docs.python.org/3/library/typing.html#typing.get_type_hints is 🤷
Apparently "its subtle" python/cpython#102405 (this was the original issue https://bugs.python.org/issue43817)

@chrisjsewell
Copy link
Member

@chrisjsewell any comments on this PR?

In general though, yep it looks good cheers 👌

@sphuber
Copy link
Contributor Author

sphuber commented Mar 6, 2023

Thanks for the review @chrisjsewell .

If you want me to put get-annotations on conda, its pretty easy?

If you could that'd be great. I have never done it before. This would enable this functionality for Python 3.9 and older, but as noted in the commit, it doesn't work with from __future__ import annotations as described in PEP 563, so one can only use "old-style" type hints. Anyway, maybe this is better than nothing.

I will start adding the docs, if you could make the conda release for get-annotations we can get this on the road. Thanks!

@sphuber sphuber force-pushed the feature/5896/expose-process-function-inputs branch 2 times, most recently from e186731 to 077b998 Compare March 11, 2023 16:12
if TYPE_CHECKING:
from .exit_code import ExitCode

__all__ = ('calcfunction', 'workfunction', 'FunctionProcess')

LOGGER = logging.getLogger(__name__)

FunctionType = TypeVar('FunctionType', bound=Callable[..., Any])
FunctionType = t.TypeVar('FunctionType', bound=t.Callable[..., t.Any])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note in #5901 I have started naming these with a Tv suffix, like FunctionTv, because pylint no longer allows Type pylint-dev/pylint#6003

Also can you make sure this file is no longer in the exclude list in the pre-commit-config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask the reason for choosing Tv as the suffix? When going over #5901 it already struck me a bit weird, since I read it automatically as "television" 😅 I wasn't going to mention it, since in the end it is an arbitrary choice, but if it is pylint that complains, why not use what they suggest as naming namely T as suffix. This also makes more sense to me. I would propose we just maintain that convention. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisjsewell I have removed the file from the black list. Since the naming of the type var is still an open discussion and the latest version of pylint is anyway not being used, I will leave that for another PR. Since you had no other comments, I take it this is good to go and will merge tomorrow.

@sphuber sphuber force-pushed the feature/5896/expose-process-function-inputs branch 2 times, most recently from c84a117 to 7d3c958 Compare March 15, 2023 17:37
The `inspect.getfullargspec` method is used to analyze the signature of
the function wrapped by the process function decorator. This method is
outdated and really just kept around for backwards-compatibility.
Internally it is using the `inspect.signature` method which has a more
modern interface, so here we change the code to use that method instead.
This change also prepares the next move that will allow to parse any
type annotations for signature parameters to dynamically determine the
`valid_type` attribute of the dynamically determined input ports.
An advantage of process functions over calculation jobs and work chains
are that they are light weight and very easy to start using. But there
are also disadvantages; due to the process specification being inferred
from the function signature, the user cannot benefit from all the
features of the process specification. For example, it is not possible
to explicitly define a help string or valid type for the arguments of
the process function.

With PEP 484 type hints were introduced in Python 3.5 that made it
possible to indicate the expected types for function arguments. Here we
make use of this functionality to infer the valid type of function
arguments based on the provided type hint, if any.

As an example, a user can now define the following calcfunction:

    @calcfunction
    def add(a: int, b: int):
        return a + b

and when it is called with anything other than an `int` or `Int` for
either of the two arguments, a validation error is raised reporting that
an argument with an invalid type was provided.

The new functionality is fully optional and process functions without
typing will continue to work as before. If incorrect typing is provided,
a warning is logged and the typing is ignored.

The annotation of the wrapped process function is parsed using the
`inspect.get_annotation` method. This was added in Python 3.10 and so to
provide support in older versions we install the `get-annotations`
backport package. Since we have to use `eval_str=True` in the call to
get unstringized versions of the types. This will fail for the
backport implementation if the type uses union syntax of PEP 604, e.g
`str | int` instead of `typing.Union[str, int]`, even if this
functionality is enabled using `from __future__ import annotations`.
@sphuber sphuber force-pushed the feature/5896/expose-process-function-inputs branch from 7d3c958 to b39fafb Compare March 16, 2023 13:29
@sphuber sphuber merged commit fc6d734 into aiidateam:main Mar 16, 2023
@sphuber sphuber deleted the feature/5896/expose-process-function-inputs branch March 16, 2023 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Input / output validation for calcfunctions and workfunctions
3 participants