Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add docs for GroupedMetadata and BaseMetadata #15

Merged
merged 7 commits into from
Jul 29, 2022

Conversation

adriangb
Copy link
Contributor

No description provided.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@Zac-HD
Copy link
Member

Zac-HD commented Jul 27, 2022

OK, here's my current proposal:

  • We remove all inheritance from the library
  • We turn Interval into a function which returns typing.Unpack[whatever, constraints] (or a single constraint if there's only one)
    • actually... is Annotated[T, *Foo] or Annotated[T, Unpack[Foo]] valid? If we need this changed we'd better be quick!
  • We document that consumers of the annotated-types API are expected to iterate over Unpack members

I guess if Unpack doesn't work we can't do this. I also don't have a good sense of how Pydantic 2 intends to represent Fields using annotated types - do you need something that can't be represented with Annotated[T, <various things here?]?

@adriangb
Copy link
Contributor Author

adriangb commented Jul 27, 2022

  • We remove all inheritance from the library
  • We turn Interval into a function which returns typing.Unpack[whatever, constraints] (or a single constraint if there's only one)

Interesting, I was not aware of typing.Unpack, it's cool they added that! I think that would be a cool way to implement things in some cases (e.g. our Interval could indeed become a function returning a typing.Unpack), but I think for something like Pydantic where Field is already a class (well it's a function returning a FieldInfo class) that also gets used as default values, I don't think that would be viable.

I also don't have a good sense of how Pydantic 2 intends to represent Fields using annotated types

I would like @samuelcolvin to chime in, but my thought was that in Pydantic v2 Field would inherit from annotated_types.GroupedMetadata and unpack itself into standard metadata (aside from it's own Regex implementation and such).

  • actually... is Annotated[T, *Foo] or Annotated[T, Unpack[Foo]] valid? If we need this changed we'd better be quick!

I think both are valid

  • We document that consumers of the annotated-types API are expected to iterate over Unpack members

Agreed we should document this, if nothing else because we already use Annotated[..., *Interval(...)] and according to the 3.11 docs typing.Unpack[...] should be interpreted the same as if * had been used.

All in all, here was my idea of how a consumer would generally use the inheritance we provide:

from dataclasses import dataclass

from typing import Annotated, Any, Iterable, Iterator, get_args, get_origin, Unpack
from annotated_types import BaseMetadata, GroupedMetadata, Len


# a generic flattener, maybe we provide this in the future?
def flatten_metadata(args: Iterable[Any]) -> Iterable[BaseMetadata]:
    for arg in args:
        if isinstance(arg, BaseMetadata):
            yield arg
        elif isinstance(arg, GroupedMetadata):
            yield from arg
        elif get_origin(arg) is Unpack:
            packed_args = iter(get_args(arg))
            next(packed_args)
            yield from packed_args


# Pydantic classes
@dataclass
class Regex(BaseMetadata):
    pattern: str

@dataclass
class Alias(BaseMetadata):
    alias: str

@dataclass
class FieldInfo(GroupedMetadata):
    min_length: int | None = None
    regex: str | None = None
    alias: str | None = None

    def __iter__(self) -> Iterator[BaseMetadata]:
        if self.min_length is not None:
            yield Len(self.min_length)
        if self.regex is not None:
            yield Regex(self.regex)
        if self.alias is not None:
            yield Alias(self.alias)


def Field(
    min_length: int | None = None,
    regex: str | None = None,
    alias: str | None = None,
) -> Any:
    return FieldInfo(
        min_length=min_length,
        regex=regex,
        alias=alias,
    )


# user's code
# notice this is completely backwards compatible with Pydantic v1
PhoneNumber = Annotated[str, Field(min_length=10, alias="phoneNumber")]


# Pydantic's consumer
known_metadata = {
    Len,
    Regex,
    Alias,
}
def build_pydantic_core_schema_from_pep593_metadata(
    tp: type, allow_unknown: bool,
) -> Any:
    args = get_args(tp)
    assert args and args[0] is Annotated
    for metadata in flatten_metadata(args[1:]):
        if metadata in known_metadata:
            # build pydantic-core schema
            pass
        elif not allow_unknown:
            raise TypeError("unknown annotated-types metadata")
        else:
            pass


# Hypothesis' consumer
known_metadata = {
    Len,
    Regex,
    Alias,
}
supported_packages = {
    "annotated_types",
    "pydantic",
}
def _(
    tp: type
) -> Any:
    args = get_args(tp)
    assert len(args) > 2 and args[0] is Annotated
    for metadata in flatten_metadata(args[2:]):
        if metadata in known_metadata:
            # do thing
            continue
        if next(iter(metadata.__module__.split("."))) in supported_packages:
            raise RuntimeError(
                "Please update hypothesis"
            )

So BaseMetadata exists just to namespace metadata "in the annotated-types ecosystem" from arbitrary stuff, making it easier to write utility functions like flatten_metadata (that could exist here, in pydantic, in hypothesis or independently in each one). Yes, we could get rid of this namespace, but that makes it a bit harder to write reusable code because you can't layer your filtering/case handling and instead always have to check each object against an exhaustive list of all of the metadata/constraints you support. I will say there are other mechanisms that could achieve the same thing without inheritance (an __is_annotated_types_metadata__ attribute?) but they tend to not hook in as nicely to static type checking and such.

While GroupedMetadata also servers the same use case to some extent (I am using it in an isinstance check above), it also gives Pydantic a backwards compatible way to hook it's Field class into annotated-types in a way which requires no special handling from consumers. Even a consumer that errors on unknown metadata would still be able to handle a subset of Field(...) (as long as no Pydantic specific constraints are passed in). It also allows implementers the flexibility of having something that can be used both in PEP593 annotations and as default values (which is what Pydantic does and from discussions like this one it may something others want to support as well). I don't think you can/would want to inherit from typing.Unpack, so that doesn't provide the same functionality.

@samuelcolvin
Copy link
Contributor

samuelcolvin commented Jul 27, 2022

Sorry I've been a bit AWOL on this. From a quick scan, looks good but let me know there's anything specific you need my help on.

Ignore that, I've just seen your question.

@adriangb
Copy link
Contributor Author

Sorry I've been a bit AWOL on this. From a quick scan, looks good but let me know there's anything specific you need my help on.

I'm assuming you didn't read my wall of text in the 15 seconds elapsed between my comment and yours 😉

@samuelcolvin
Copy link
Contributor

Okay, I've now read your comment.

I'm happy for pydantic V2 Field to unpack itself to make it useful for other libraries.

The only edge case I can think of where this might not be desired is when users want to get the Field instance for secondary checks - if Annotated[int, Field(..., gt=42, description='foobar')] "unpacks itself", would it still be possible to retrieve the original Field instance?

@Zac-HD
Copy link
Member

Zac-HD commented Jul 27, 2022

# a generic flattener, maybe we provide this in the future?
def flatten_metadata(args: Iterable[Any]) -> Iterable[BaseMetadata]:

Remember that we also specify interpretations for partial(operator.___, x) and slice objects!

I'm kinda torn on the flattener; on one hand it's complicated enough that a flatten-and-canonicalize function would be pretty useful, on the other I want to keep the library as small as possible. I think the argument that it's complicated wins here and we should ship it.

Note that this should actually return Iterable[object]: making the user responsible for filtering is a bit more work, but makes extending it to handle other types much easier.

Yes, we could get rid of this [BaseMetadata] namespace, but that makes it a bit harder to write reusable code because you can't layer your filtering/case handling and instead always have to check each object against an exhaustive list of all of the metadata/constraints you support.

My argument is that you always need a mechanism to do the exhaustive check anyway, and should avoid having additional ways to check metadata (which could then get out of sync, etc). Caching is a sufficient answer to perf concerns here IMO.

While GroupedMetadata also servers the same use case to some extent (I am using it in an isinstance check above), it also gives Pydantic a backwards compatible way to hook it's Field class into annotated-types in a way which requires no special handling from consumers.

If we specify that:

  • Consumers of annotated-types metadata must iterate over Unpack items
  • Pydantic fields must be written as Annotated[T, *Field(...)] or Annotated[T, Unpack[Field(...)]] (not Annotated[T, Field(...)])
  • Iterating over a Field instance yields annotated-types constraints (among whatever else seems useful)

then it will "just work" without requiring consumers to have any knowledge whatsoever of Pydantic. However, the Unpack[Field(...)] syntax is pretty darn awkward, so I'd probably only support this if the Annotated type is generated by the library rather than hand-written by users 😕

The best alternative is probably "just have and tell people to inherit from a GroupedMetadata class"; it's inelegant but it would work.

Even a consumer that errors on unknown metadata would still be able to handle a subset of Field(...) (as long as no Pydantic specific constraints are passed in).

I don't really care about errors from such consumers, they're already living in a world where anyone can put anything in an Annotated.

I'm happy for pydantic V2 Field to unpack itself to make it useful for other libraries.

The only edge case I can think of where this might not be desired is when users want to get the Field instance for secondary checks - if Annotated[int, Field(..., gt=42, description='foobar')] "unpacks itself", would it still be possible to retrieve the original Field instance?

You could have Field.__iter__ yield from (...constraints..., FieldWrapper[that_field])? Little awkward to avoid an infinite loop of unpacking 🤷‍♂️

I was imagining that the unpacked constraints would fully describe the Field though, including if necessary a "FieldResidue" object to hold all the other attributes.

@adriangb
Copy link
Contributor Author

Remember that we also specify interpretations for partial(operator.___, x) and slice objects!

Those are always going to be special cased, I was omitting for brevity.

I'm kinda torn on the flattener; on one hand it's complicated enough that a flatten-and-canonicalize function would be pretty useful, on the other I want to keep the library as small as possible. I think the argument that it's complicated wins here and we should ship it.

I'm not arguing that we should ship it or not, I'm just sketching how it might look if we ship it or if Pydantic/Hypothesis implement it.

My argument is that you always need a mechanism to do the exhaustive check anyway, and should avoid having additional ways to check metadata (which could then get out of sync, etc). Caching is a sufficient answer to perf concerns here IMO.

My concern isn't about performance but rather about usability. Yes, you always have to do some sort of exhaustive check at the end, but it's much easier to handle filtering, raising errors for unknown metadata, etc. when you can layer these things (e.g. filter out all metadata that is unrelated to annotated-types).

Pydantic fields must be written as Annotated[T, *Field(...)] or Annotated[T, Unpack[Field(...)]] (not Annotated[T, Field(...)])

@samuelcolvin can comment in more detail but this would be breaking change for Pydantic

You could have Field.iter yield from (...constraints..., FieldWrapper[that_field])? Little awkward to avoid an infinite loop of unpacking 🤷‍♂️

I was imagining that the unpacked constraints would fully describe the Field though, including if necessary a "FieldResidue" object to hold all the other attributes.

I'm lost here, sorry

@Zac-HD
Copy link
Member

Zac-HD commented Jul 27, 2022

OK, yeah, never mind my speculative stuff, GroupedMetadata looks good to me.

@adriangb
Copy link
Contributor Author

The only edge case I can think of where this might not be desired is when users want to get the Field instance for secondary checks - if Annotated[int, Field(..., gt=42, description='foobar')] "unpacks itself", would it still be possible to retrieve the original Field instance?

I did the following test:

from typing import Annotated, Any, get_args, get_type_hints
import typing

class ArbitraryIterable:
    def __iter__(self):
        return iter(["foo"])

def foo(
    unpack: Annotated[Any, typing.Unpack[ArbitraryIterable()]],
    star: Annotated[Any, *ArbitraryIterable()]
) -> Any:
    ...

print(get_args(get_type_hints(foo, include_extras=True)["unpack"])[1])  # *<__main__.ArbitraryIterable object at 0x102a94ad0>
print(get_args(get_type_hints(foo, include_extras=True)["star"])[1])  # foo

Maybe Unpack is just too new to have good support from get_type_hints and co, or maybe I'm doing something wrong.

In any case, I think it's a lot easier for us to say "consumers should iterate over any subclass of GroupedMetadata (which will work for any version of Python and requires nothing fancy at all) vs. saying "in Python 3.11+ use *Field(...) and in older versions use Unpack[Field(...)], which is pretty funky, requires an import for users, is backwards incompatible for Pydantic and might break at any time (there are no guarantees as to the runtime behavior of Unpack).

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I'll personally check for BaseMetadata, but I've been convinced that this design makes sense. Let's ship it!

And 💖 thanks again Adrian (and Samuel!) for the conversations, they can be wearing at times but I do think we get better designs out the other end. Or in some cases keep better designs 😅

@adriangb
Copy link
Contributor Author

Discussions are good @Zac-HD, thank you for the pushback!

@samuelcolvin I'll let you review this before we merge since this will be the last PR before shipping this (after which it would be harder to turn back!)

Copy link
Contributor

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've slept and consumed coffee and actually reviewed this. Sorry for hurried answer yesterday 🙈.

Over all I'm in favour of this except I don't think we should encourage libraries to inherit from BaseMetadata as argued by @Zac-HD. It's worth noting that we could add that encouragement in a future release, but removing it/moving to discouraging inheritance from BaseMetadata is harder.

Still, @adriangb if you're convinced about this, I'm easy either way.

Encouraging inheritance from GroupedMetadata makes lots of sense - the other route would be to have some dunder attribute which identified a metadata group which wouldn't require annotated-types to be installed and imported in order to mark a metadata group. But since pydantic will have annotated-types as a required dependency, I'm happy with an abc as marker, it's definitely more canonical than a marker attribute.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
adriangb and others added 2 commits July 28, 2022 08:18
Co-authored-by: Samuel Colvin <samcolvin@gmail.com>
Co-authored-by: Samuel Colvin <samcolvin@gmail.com>
@codecov-commenter
Copy link

codecov-commenter commented Jul 28, 2022

Codecov Report

Merging #15 (fadff22) into main (f59cf6d) will increase coverage by 0.18%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #15      +/-   ##
==========================================
+ Coverage   95.89%   96.07%   +0.18%     
==========================================
  Files           2        2              
  Lines         146      153       +7     
  Branches       36       39       +3     
==========================================
+ Hits          140      147       +7     
  Misses          6        6              
Impacted Files Coverage Δ
annotated_types/__init__.py 93.47% <ø> (+0.21%) ⬆️
annotated_types/test_cases.py 100.00% <100.00%> (ø)

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@adriangb
Copy link
Contributor Author

Over all I'm in favour of this except I don't think we should encourage libraries to inherit from BaseMetadata as argued by @Zac-HD. It's worth noting that we could add that encouragement in a future release, but removing it/moving to discouraging inheritance from BaseMetadata is harder.

Still, @adriangb if you're convinced about this, I'm easy either way.

My opinion is that yes, we should encourage it, but since I'm in the minority here I'm happy to punt this down the road. My next question then is: should we explicitly tell others to not inherit from the class and say that it should only be used to recognize metadata coming from annotated-types? Or should we just not document it at all (i.e. completely punt on it)?

@Zac-HD
Copy link
Member

Zac-HD commented Jul 28, 2022

Let's punt on it, we can always work out what we're doing later 😆

@adriangb
Copy link
Contributor Author

Ok let's just remove that section. I'm taking a flight / vacation so not sure how soon I'll get around to it, if either of you want to push the changes please feel free!

We're just punting on this, and might revisit later - but for now making no API promises is the most flexible way to avoid compat issues in future.
@Zac-HD Zac-HD merged commit 7351d1b into annotated-types:main Jul 29, 2022
@adriangb adriangb deleted the docs branch July 29, 2022 05:04
adriangb added a commit to adriangb/annotated-types that referenced this pull request May 31, 2023
Co-authored-by: Samuel Colvin <samcolvin@gmail.com>
Co-authored-by: Zac Hatfield-Dodds <zac.hatfield.dodds@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants