Description
I'm trying to think of a way for pydantic to communicate extra field information to hypothesis which:
- is reusable by other libraries - e.g. doesn't use hypothesis types
- doesn't require any understanding of pydantic internals - e.g. is not based on pydantic core schema
- can be extended without further integration discussion - e.g. is a proper protocol, not a list of types
For reference here is the hypothesis plugin from pydantic V1. The types we'd like to support with this system are as follows:
- EmailStr - a string that is a valid email address, could crudely be just a regex
- NameEmail - email and name in the name <email> format, can be a regex + email
- PyObject (now ImportString) - string that can represent anything, the hypothesis plugin used a random attribute of math
- Color - could be a regex
- luhn valid card number - currently generated by trial and error
- IPvAnyAddress - either IPv4 or IPv6
- JsonWrapper - a JSON string, possibly with a type hint
Everything else I think is covered by using Annotated and things already defined by annotated-types.
Note in V2 some of the above are implemented as arguments to Annotated (albeit with an alias), some are legitimate custom types.
One idea I had was to use JSON Schema: you provide a method or property, on either a type or an argument to Annotated, which returns some JSON Schema. But looking at the above list, I don't think JSON Schema would help with many of them.
Therefore here's my proposal:
annotated-types defines a new property or method on types and arguments to Annotated which returns the following
pieces of information (could be a tuple, a dict with specific keys, or a dataclass defined herein):
- documentation_example - a canonical example of the datatype as might be shown in documentation, e.g. john@example.com
- random_example - a random varying example of the datatype, e.g. ad90cj-i3ljlkd@dk33w4poedd.co.uk
- type_code - a string that libraries can use to identify the type (e.g. email), and thereby decide to do more powerful things - e.g. hypothesis has a better strategy for generating random email addresses than pydantic will. Would be None if no type_code exists for a field.
The idea is that annotated-types defines:
- the above data structure
- a list of agreed type_codes, starting with the ones from above
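As a rough illustration of the two pieces above, here is a minimal sketch using the dataclass option. The field order follows the ExtraTypeInfo(...) examples in this issue. Everything else is an assumption made for illustration: the hook name `__extra_type_info__`, the `SupportsExtraTypeInfo` protocol, the `EmailMetadata` class, and the `AGREED_TYPE_CODES` set (which lists only the two codes spelled out in this issue) are hypothetical and not part of annotated-types today.

```python
from __future__ import annotations

import random
import string
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@dataclass(frozen=True)
class ExtraTypeInfo:
    """The proposed data structure (dataclass variant)."""

    type_code: Optional[str]     # e.g. "email"; None if no agreed code exists
    documentation_example: str   # canonical example, suitable for docs
    random_example: str          # crude, varying example


@runtime_checkable
class SupportsExtraTypeInfo(Protocol):
    """Hypothetical protocol: the hook name is an assumption."""

    def __extra_type_info__(self) -> ExtraTypeInfo: ...


# Hypothetical registry of agreed codes. Only the two codes that are
# written out in this issue are listed; the rest would need agreeing.
AGREED_TYPE_CODES = frozenset({"email", "color"})


class EmailMetadata:
    """Illustrative metadata object that could sit inside Annotated[str, ...]."""

    def __extra_type_info__(self) -> ExtraTypeInfo:
        # Build a fresh crude random example on every call.
        local = "".join(random.choices(string.ascii_lowercase + string.digits, k=8))
        domain = "".join(random.choices(string.ascii_lowercase, k=10))
        return ExtraTypeInfo("email", "john@example.com", f"{local}@{domain}.co.uk")
```

Exposing the information through a method rather than a static attribute is one way to let random_example vary between calls; a property would work equally well.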
Example usage
EmailStr: would emit ExtraTypeInfo('email', 'john@example.com', 'ad90cj-i3ljlkd@dk33w4poedd.co.uk').
hypothesis would ignore the crude random example and use its own strategy for email addresses, since it recognises the email type code.
Color: would emit ExtraTypeInfo('color', '#ff0000', '#00ff00'). If hypothesis doesn't recognize color, it could fall back to using the random example generated by pydantic.
If a user wanted their own UKPostCode type (alias of Annotated[str, UKPostCodeMetadata]), then UKPostCodeMetadata could emit ExtraTypeInfo(None, 'W1A 1AA', 'sp119dg'). The type code is None since no type code exists for UK post codes, so hypothesis would just use the random example 'sp119dg'.
In theory another tool could use this data (e.g. for generating documentation) with no knowledge of hypothesis or pydantic.
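To make the consumer side of these examples concrete, here is a sketch of how a tool with no knowledge of pydantic might read the information and decide between its own generator and the supplied random example. It is only an illustration: the `__extra_type_info__` hook, the helper functions, and the `OWN_GENERATORS` table are hypothetical names, the random examples are fixed strings for simplicity, and metadata is attached as instances rather than classes.

```python
from dataclasses import dataclass
from typing import Annotated, Callable, Dict, Optional, get_args, get_origin


@dataclass(frozen=True)
class ExtraTypeInfo:
    type_code: Optional[str]
    documentation_example: str
    random_example: str


class EmailMetadata:
    def __extra_type_info__(self) -> ExtraTypeInfo:
        return ExtraTypeInfo("email", "john@example.com", "ad90cj-i3ljlkd@dk33w4poedd.co.uk")


class UKPostCodeMetadata:
    def __extra_type_info__(self) -> ExtraTypeInfo:
        # No agreed type code exists for UK post codes, so type_code is None.
        return ExtraTypeInfo(None, "W1A 1AA", "sp119dg")


EmailStr = Annotated[str, EmailMetadata()]
UKPostCode = Annotated[str, UKPostCodeMetadata()]


def find_extra_type_info(tp) -> Optional[ExtraTypeInfo]:
    """Return the first ExtraTypeInfo found in an Annotated type's metadata."""
    if get_origin(tp) is not Annotated:
        return None
    for meta in get_args(tp)[1:]:  # get_args(...)[0] is the underlying type
        hook = getattr(meta, "__extra_type_info__", None)
        if callable(hook):
            return hook()
    return None


# Stand-in for a consumer's better strategies, keyed by type code.
OWN_GENERATORS: Dict[str, Callable[[], str]] = {
    "email": lambda: "consumer-generated@example.org",
}


def generate_value(tp) -> Optional[str]:
    """Prefer the consumer's own generator, else fall back to random_example."""
    info = find_extra_type_info(tp)
    if info is None:
        return None
    own = OWN_GENERATORS.get(info.type_code) if info.type_code else None
    return own() if own else info.random_example
```

Here generate_value(EmailStr) uses the consumer's own email generator because the type code is recognised, while generate_value(UKPostCode) falls back to the supplied random example. A documentation tool could call find_extra_type_info in the same way and read documentation_example instead.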