Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intersection PEP #49

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Intersection PEP #49

wants to merge 3 commits into from

Conversation

mark-todd
Copy link
Collaborator

PR relating to #48. Still some work to be done on this, but feel free anyone to have a review if you'd like @CarliJoy @erictraut @NeilGirdhar @gvanrossum @DiscordLiz @vergenzt . It builds a lot on the work of @mikeshardmind, but I've tried to thin it down a bit and add some more examples. It also approaches the problem from a slightly different angle conceptually, but the result is somewhat similar.

Broadly the main update I'm planning is to finish the intersect simulator and link it as the reference implementation (Proof of Concept I suppose), then we can move to opening a discussion related to the PEP on python discuss.

I'll also add something discussing the equivalence of Callable to certain Protocols and maybe add an example.

Finally, this issue is still outstanding: https://discuss.python.org/t/take-2-rules-for-subclassing-any/47981, but my hope is that this may not be critical to finalising the PEP. The behaviour as far as Intersection is concerned is now well defined under the PEP, and I'm thinking that if this proves to be a major issue perhaps it will arise during later discussion.

@mark-todd
Copy link
Collaborator Author

I've now updated to resolve the issues above - any comments please take a look. If there are no further comments, next week I'll read it over again, then open an issue on the discuss forum for a draft PEP.

The main thing I think this might be missing is potentially some of the rejected ideas (as there were many!), but I've tried to summarise any major ones.

As far as I know, the implementation and specification are complete in that it seems to resolve all outstanding issues I'm aware of - if there's anything I've missed here please let me know.

Quite excited to think we may actually have a fully fledged specification - getting there!

@DiscordLiz
Copy link
Collaborator

This approach has the same flaw with __getattr__ that the prior did, it's just hidden differently. You'll want to explore the test case that led to the prior approach falling apart: python/typing#1672 (comment)

@ mikeshardmind had a simple formulation that still works with Any, but based on his most recent messages here, it seems like he's not going to advocate for the simpler option remaining. As I understand it, he'd rather intersections not be added until the underlying issues are resolved.

A :> Intersection[T1, T2, ... Tn] if and only if A :> T1 and A :> T2 .... and A :>Tn
A <: Intersection[T1, T2, ... Tn] if and only if A <: T1 and A <: T2 .... and A <:Tn

That's the entire definition of what can and can't be inferred from an intersection expressed in 2 short lines describing subtyping and supertyping relationships. He didn't like the implications of trying to combine this with a bad definition of Any and tried to work around that. It being independent of the definition of Any and having no special casing might mean we can ignore Any having a bad definition and just get something that works for most people. If it doesn't work for someone, they can pick up improving the definition of Any from the case where it doesn't work.

@mark-todd
Copy link
Collaborator Author

mark-todd commented Apr 17, 2024

@DiscordLiz I added this section: https://github.com/CarliJoy/intersection_examples/blob/intersection-pep-new/docs/specification.rst#any-subclassing-rules

to try and resolve this. Broadly my thought it that by relaxing the inheritance restriction on Any by making it a structural type, it doesn't matter if Intersection diverges from the way subclassing with Any currently works. There is a potential ambiguity in the way that methods with __getattr__ and things that inherit from Any work, but by considering C and D below as different types we go around this issue:

class A:
    foo: str = "foo"

class C(Any, A):
    pass

D: Intersection[Any, A]

There nothing to say that objects of type C and D above have to behave in exactly the same way. D has all attributes as Any and must inherit from A, while C inherits from A and has all attributes as Any except foo which is str. In this case C is a subtype of D, but D is not a subtype of C.

@DiscordLiz
Copy link
Collaborator

DiscordLiz commented Apr 17, 2024

There is a potential ambiguity in the way that methods with getattr and things that inherit from Any work, but by considering C and D below as different types we go around this issue:

We were always considering an intersection as different to a specific class declaration. Simply saying we do this doesn't solve the problems that were raised around this.

I don't know how to explain this any better, you're still conflating parts of the type system that should not be conflated after multiple explanations throughout the discussions, and it seems to be having an impact on what you are writing ending up missing the problem. Maybe someone else can explain better.

What type checkers need are the subtyping rules. I have seen no indication that the extra complexity you have here solves any problems. It does creates new ones, treating Any as a structural type rather than Any means you need to define it's behavior as a structural type without breaking any more fundamental rule and having a decidable interface and rules for how type checkers address that in an intersection. You have not provided something that accomplishes that.

The focus on splitting inheritance and structural types has you missing the more fundamental rule that both must abide by. Both of these imply rules for subtyping relationships. It would be better to focus on a consistent subtyping relationship that does not try to treat either of these differently. You can refer to my last post for the minimal relationships to define an intersection. I think any proposal that does not use these needs to show definitively why it behaves better. The prior attempt did not actually behave better than those, and was abandoned.

You also retained a section on ordering, but intersections don't have to be ordered at all, that was a synthetic requirement of the previous attempt that we showed definitively doesn't solve the problem, it's unclear why you've retained this as none of the new language resolves those issues either.

@mark-todd
Copy link
Collaborator Author

I don't know how to explain this any better, you're still conflating parts of the type system that should not be conflated after multiple explanations throughout the discussions, and it seems to be having an impact on what you are writing ending up missing the problem. Maybe someone else can explain better.

@DiscordLiz Apologies, I am trying to understand this - what parts of the type system am I conflating?

What type checkers need are the subtyping rules. I have seen no indication that the extra complexity you have here solves any problems.

Broadly, I'm trying to define a set of rules that would be easy to implement for type checkers. I think the sets of rules you define here:

A :> Intersection[T1, T2, ... Tn] if and only if A :> T1 and A :> T2 .... and A :>Tn
A <: Intersection[T1, T2, ... Tn] if and only if A <: T1 and A <: T2 .... and A <:Tn

...are not in conflict with the specification, but I think they don't cover all situations. We've discussed before non-LSP violating method clashes that can occur, and these rules don't really show how to resolve that.

The focus on splitting inheritance and structural types has you missing the more fundamental rule that both must abide by. Both of these imply rules for subtyping relationships.

This is true - I'm really trying to break down what it means to be a subtype into rules that will be easier to implement, and resolve unspecified cases at the same time. Also, I think the statement about Any being a structural type may already be effectively present. Suppose we had:

class A:
    def foo(self):
        pass
 
def test(x: T) -> Intersection[A, T]:
    class New(type(x), A):
        pass
    return New()

class B:
    def bar(self):
         pass

b: Any = B()

x = test(b) # Intersection[Any, A]

So here we have created an Any intersection, as let's say B was an unknown type.

Based on the ruleset:
A subtype of x must be a subtype of A and a subtype of Any

Subtype of A

A subtype of A, must inherit from A and have attribute foo.

Subtype of Any

A subtype of Any, must inherit from Any and have all attributes as Any

Now here we have an issue - B does not inherit from Any, and neither does x. My point is that subtypes of Any do not have the inheritance restriction, so it seems it's already been defined as a structural type.

If there is a PEP somewhere I've missed that lays out what it means to be a subtype I'd be interested to read it.

You also retained a section on ordering, but intersections don't have to be ordered at all, that was a synthetic requirement of the previous attempt that we showed definitively doesn't solve the problem, it's unclear why you've retained this as none of the new language resolves those issues either.

I've summarised a few of the issues the ordered nature resolves:
https://github.com/CarliJoy/intersection_examples/blob/intersection-pep-new/docs/specification.rst#ordering

we showed definitively doesn't solve the problem

Could you point me to this? I know there was discussion about the issue with __getattr__ and Any, but there are other advantages to it being ordered.

@DiscordLiz
Copy link
Collaborator

DiscordLiz commented Apr 17, 2024

Could you point me to this? I know there was discussion about the issue with getattr and Any, but there are other advantages to it being ordered.

Type information is used for more than just static type checking. It’s also used for runtime type checking and for edit-time features like completion suggestions and signature help. For every Python developer who uses pyright for static type checking, there are 40x as many developers who do not use static type checking but rely on static type evaluation for language server features. These developers are more likely to be using untyped or partially-typed code, so they will more often hit the case where a class has an unknown base class. If we were to change the spec to treat all methods and attributes of such a class as Any, it would significantly harm their user experience — enough so that I don’t think I could justify making such a change in pyright regardless of what the typing spec says.

From a message explaining why a spec that just turns everything into Any in such types would be harmful to users: https://discuss.python.org/t/take-2-rules-for-subclassing-any/47981/5

It's also not even correct to do so.

class A:
    foo: int


B = Any

C: type[Intersection[B, A]] = ...
c: Intersection[B, A] = C()

c.foo  # int

The ordering was only ever a bad compromise, and the reason Any was shuffled to the end in the attempt at that was to specifically avoid this issue. foo here can't be Any, as any type that isn't int would have been a type error.

@DiscordLiz
Copy link
Collaborator

The focus on splitting inheritance and structural types has you missing the more fundamental rule that both must abide by. Both of these imply rules for subtyping relationships.

This is true - I'm really trying to break down what it means to be a subtype into rules that will be easier to implement, and resolve unspecified cases at the same time. Also, I think the statement about Any being a structural type may already be effectively present.

Any isn't a structural type. It's a concept that says "some compatible type is present here". I cannot stress enough how important the different between a higher order concept to express an uknown is different from a structural type that expresses a structure.

Casting to and from Any is a free operation that doesn't require an explicit cast or checking, the type system assumes compatibility as that's the purpose.

You don't have to break this down into "Easier to implement", type checkers already need to have the facilities to handle subtyping relationships, and doing so is causing important things to be gotten wrong, not all simplifications preserve the truth.

@mark-todd
Copy link
Collaborator Author

@DiscordLiz Thanks for the above - I'm working on something to express the same concepts in the current PEP in terms of subtyping, just a quick query on this bit:

class A:
    foo: int


B = Any

C: type[Intersection[B, A]] = ...
c: Intersection[B, A] = C()

c.foo  # int

The ordering was only ever a bad compromise, and the reason Any was shuffled to the end in the attempt at that was to specifically avoid this issue. foo here can't be Any, as any type that isn't int would have been a type error.

Does this have to be the case? I don't see why c.foo couldn't be Any here, e.g.:

C: type[Intersection[B, A]] = ...
c: Intersection[B, A] = C()

c.foo  # Any

D: type[Intersection[A, B]] = ...
d: Intersection[A, B] = D()

d.foo  # int

What's wrong with the above?

@DiscordLiz
Copy link
Collaborator

DiscordLiz commented Apr 18, 2024

Does this have to be the case? I don't see why c.foo couldn't be Any here, e.g.:

Any allows more than int, this is an unsafe widening of the type and should not be done by the type system. If C is a subtype of A, then C.foo can't not be compatible with int. Any is special. It's compatible but the way it is compatible is a problem to do this kind of transformation with in a way that looses known type information, becase now from the type system perspective, you're allowed to assign anything to it, even things which break the rules for things that are a subtype of A

@mark-todd
Copy link
Collaborator Author

mark-todd commented Apr 18, 2024

I want to avoid retreading too much ground here, but with a simple modification:

class A:
    def __init__(self, x: int) -> None:
        pass


B = Any

C: type[Intersection[B, A]] = ...
c: Intersection[B, A] = C(1)

If C is a subtype of A, then C.foo can't not be compatible with int.

This statement applies in the case I originally gave, but in the case above that doesn't necessarily apply (As __init__ can take any form). By making the Intersection ordered we could just make the one with the defined __init__ take priority by switching A and B, so the issue goes away.

Any allows more than int, this is an unsafe widening of the type and should not be done by the type system

The ordering effectively shifts the responsibility to the user to make sure they don't create effects like this unless that's what they want to do. If the result should always have an attribute foo of type int, then that means this was a mistake, and the order should be the other way around. It shifts responsibility but it also shifts control to the user.

@DiscordLiz
Copy link
Collaborator

The ordering effectively shifts the responsibility to the user to make sure they don't create effects like this unless that's what they want to do.

We established we can't ban Any from intersections because the user might not have control over where they come in (typevars) The user does not have that capability to avoid this problem. The prior ordered form avoided this by moving Any last, always, but that was flawed. You're doing something far worse with this, leaving a loaded gun pointed all all users of gradual typing.

@DiscordLiz
Copy link
Collaborator

DiscordLiz commented Apr 18, 2024

And even if the user could, why should the tools that are meant to get typing right intentionally get it wrong? That means users need to be more of an expert on type checking than type checker maintainers.

@mark-todd
Copy link
Collaborator Author

We established we can't ban Any from intersections because the user might not have control over where they come in (typevars)

So for typevars consider the following:

There are two ways the below function could be implemented:

class X:
    foo: int
T = TypeVar("T")
def foo(x: T) -> Intersection[X, T]:
    ...

In the above if the function is used with Any, then the result will always have the attribute foo.

class X:
    foo: int
T = TypeVar("T")
def foo(x: T) -> Intersection[T, X]:
    ...

In the above if the function is used with Any, the result may not have the attribute foo, if T overwrites this.

What I'm saying is that offering these options means the function designer chooses which way they want to go with it. They may not control the type of T, but they choose the order in the intersection.

And even if the user could, why should the tools that are meant to get typing right intentionally get it wrong? That means users need to be more of an expert on type checking than type checker maintainers.

I'm not so convinced that there is a correct order for these objects. And the PEPs how to teach section puts a large emphasis on the ordering being integral to the way this is taught, so really to use it at all a user would be familiar with this concept.

@mark-todd
Copy link
Collaborator Author

You're doing something far worse with this, leaving a loaded gun pointed all all users of gradual typing.

Is there a particular use case you're imagining here that would negatively impact these users? I've thought about quite a few different ways this could be used, but so far haven't found anything drastic - I'd be interested to know if there's something I haven't considered.

@DiscordLiz
Copy link
Collaborator

Is there a particular use case you're imagining here that would negatively impact these users? I've thought about quite a few different ways this could be used, but so far haven't found anything drastic - I'd be interested to know if there's something I haven't considered.

Returning an intersection to a user, as in mixins and class decorators, the only cases we have for intersection that can't already be solved with just protocols.

@mikeshardmind
Copy link
Collaborator

I think this should be required reading for anyone working on this, as it is another language with gradual typing adding a full type system. https://elixir-lang.org/blog/2023/09/20/strong-arrows-gradual-typing/, running into the same problems we have with Any (with Elixir's dynamic), and finding solutions that preserve useful types for users.

I also think I did a bit of a disservice by stepping away when I did without better detailing the problems of the ordered intersection, but I needed to take a few days for myself.

It is possible to reduce typechecker complexity to remain linear rather than polynomial here without introducing an ordering or giving up on the subtyping-based outcome, it's just going to be more complicated to work on. That benefit was not unique to having an ordering, it was just more obviously available as an easy optimization that could be made if the ordering was sound. The ordering isn't sound.

If you want to add intersections without user denotable type bounds, cutting to the chase:

  • T & Any is irreducible.
  • x where x is T & Any must not violate the rules of T even with the abilities of Any, it is a logical and
  • Callable signatures should not be reduced either. (int) -> int & (str) -> str should be left as-is, then you only need rules for evaluating if a function signature is compatible with each (again, a logical and)

With user-denotable type bounds, there's more powerful ways to express this that allow avoiding some sharp edges around Any.

@mark-todd
Copy link
Collaborator Author

I think this should be required reading for anyone working on this, as it is another language with gradual typing adding a full type system. https://elixir-lang.org/blog/2023/09/20/strong-arrows-gradual-typing/, running into the same problems we have with Any (with Elixir's dynamic), and finding solutions that preserve useful types for users.

@mikeshardmind Thanks, I'll have a read

It is possible to reduce typechecker complexity to remain linear rather than polynomial here without introducing an ordering or giving up on the subtyping-based outcome, it's just going to be more complicated to work on. That benefit was not unique to having an ordering, it was just more obviously available as an easy optimization that could be made if the ordering was sound.

I can see that there might be a way to resolve an unordered intersection in linear time, but the simplicity of implementing an ordered intersection is still there.

As you noted in your closing points though the __init__ clashes are still an issue, and currently only solved by ordered intersection. I also have a feeling that there will be lots of code that features LSP violations, as this is quite a new feature, so it might be handy to be able to support these too.

Callable signatures should not be reduced either. (int) -> int & (str) -> str should be left as-is, then you only need rules for evaluating if a function signature is compatible with each (again, a logical and)

We're also back to this whole thing of evaluating when Callables are compatible, which may be not trivial. Secondly, I'm not sure how this intersection of callables would be implemented, we end up back at how to generate overloads when merging classes.

The ordering isn't sound.

As far as I can tell there's a whole heap of issues that get solved with ordering, and the problems it creates seem pretty minimal. At least, I haven't seen any compelling examples against it.

@mikeshardmind
Copy link
Collaborator

We're also back to evaluating when Callables are compatible, which may be not trivial. Secondly, I'm not sure how this intersection of callables would be implemented, we end up back at how to generate overloads when merging classes.

It's simple, and in trying to map this to overloads we (myself included) made it more complicated than it needs to be.

  1. Don't generate overloads, they can't be equivalent in the absence of type negation.
  2. For each callable in the intersection, individually check compatibility (Already something type checkers need to be able to do)
  3. If any fail, it's not compatible.
  4. Optionally short circuit on first failure (type checker choice, this is performance vs error message not correctness)

As far as I can tell there's a whole heap of issues that get solved with ordering, and the problems it creates seem pretty minimal. At least, I haven't seen any compelling examples against it.

I don't mean to be rude here, but the entire reason why there was such a long discussion on Any & T, as well as why the attempt to come up with a clever way to hide the problem with reordering Any before, were the problems with this. There's no way to discuss this without leaving out prior nuanced arguments or feeling like we're rehashing ground that I imagine people are tired of rehashing as I certainly am.

There are multiple reasons, and I'm not going to rehash all and the rationale of each or provide specific cases

  1. In the face of known type information, erasing it as Any erases things that people expect the type system to know about and check for them, so this harms people using this for static analysis. (detectable False negatives on typed code)
  2. The majority of users of typing are not writing types, but language server users. Having type information vanish in an intersection harms language servers providing accurate completion hints, so this harms those who do not have strong knowledge about typing and are indirect consumers of it.
  3. Widening to Any without a user explicitly doing so is theoretically unsound, so this harms all future attempts to improve the soundness and expressiveness of the type system.

The only group not negatively impacted by this would be the people who do not use typing at all, and I'm not even sure that's true, as they may end up with worse libraries they use, but don't use the typings of, especially if library authors have to do gymnastics to make this work how they expect it to.

The only benefit to this appears to be making it easier to implement. While making something easier to implement is usually a laudable goal, care must be taken to ensure that it does not consume the usefulness of what is being implemented.

I can understand wanting to have something to show for this and get it done, but getting this wrong could be disastrous with how many things this has the potential to interact with. Other than HKTs, this is probably the single most complex thing that is being seriously proposed for python's type system. It's not an easy task to make sure everything we have to consider is being given proper weight.

With all of that said, I don't want to say any of this and leave you without a way forward either, so....

I think there are only 3 paths forward that remain viable right now, but if you find another we can certainly discuss it.

  1. Go with a strict subtyping-based rule. (@DiscordLiz provided the necessary rules above, there are a couple of corollaries from it that may be useful to type checkers, some of which can be found in An issue with the accepted definition of Union #22 )
  2. Improve on the surrounding parts of the type system, then come back to this with an easier-to-work-with foundation.
  3. Have intersections only work for disjoint types for now (This implies either constraining Any or banning Any) and revisit extending what is allowed with the benefit of seeing what people still have use cases for beyond this.

My opinion is that while any of those 3 would be fine,

  1. would have the strongest immediate benefit to users without compromising on important issues
  2. would be the best longterm outcome, but could push this into something that never gets worked on because it feels like there is an insurmountable barrier to contribution.
  3. would be easy to be correct, but would be very limiting on typevariable use if users can't denote certain requirements

@mark-todd
Copy link
Collaborator Author

@mikeshardmind Thanks for breaking this down, appreciate it. So I'll park the discussion about Any for now - although I don't understand why expanding to Any where more information is available is so critical, I can accept that this may be a requirement.

The path forward I'm not sure I understand why it was rejected is the one you were proposing in the previous PEP edition (where Any is the lowest priority). I appreciate the __getattr__ case might present an issue, but I did have another thought which is based on the idea of a "fallback".

Based on certain conditions, if Any is present in a intersection Any becomes the "fallback" which is used when all other objects in the intersection do not have the required attribute. There might be another situation where the __getattr__ return type is the "fallback" (which takes priority would be determined by the desired outcome).

As you noted in your closing points though the __init__ clashes are still an issue, and currently only solved by ordered intersection.

Also, I'm still not sure how this could be resolved in an unordered intersection, which is why I'm keen to resolve issues with the ordered one. Banning an intersection with clashing __init__ would be one way to go, but I have a feeling this would lead to many more issues. This reason alone I think is quite compelling, all others aside.

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Apr 18, 2024

The path forward I'm not sure I understand why it was rejected is the one you were proposing in the previous PEP edition (where Any is the lowest priority). I appreciate the getattr case might present an issue, but I did have another thought which is based on the idea of a "fallback".

I determined that getting the special casing right on this would be harder and more fragile than it was worth. (specifically, it would be harder than getting the full theory-drive intersection correct while keeping it useful to a wide audience, and would have issues of fragility)

Every level of special casing needed here makes it more and more complex, harder to reason about, and more sensitive to other changes/additions to the type system. The moment it was shown that simply re-ordering Any to the end was insufficient, any additional layered special casing was more complex than just doing it "right", the ordering was meant as a means to approximate the same behavior without needing as much to be specified to simplify things and it stopped being simpler as we discovered more issues, which I took as a sign (for myself) to take a break and come back to it later.

As you noted in your closing points though the __init__ clashes are still an issue, and currently only solved by ordered intersection.

Also, I'm still not sure how this could be resolved in an unordered intersection, which is why I'm keen to resolve issues with the ordered one. Banning an intersection with clashing __init__ would be one way to go, but I have a feeling this would lead to many more issues. This reason alone I think is quite compelling, all others aside.

clashes of constructors (__init__, (metaclass) __call__, and (metaclass) __new__) are already special-cased (much to my disappointment), so a special case around this, in particular, is unnecessary, and would already be covered by the existing ones for subtyping rules. While I'm disappointed in the special casing, it's very too late to undo that, and recent improvements to the specification on this do manage to avoid most issues and I think it's nearing a pragmatic enough point as to be a non-issue.

@mark-todd
Copy link
Collaborator Author

I determined that getting the special casing right on this would be harder and more fragile than it was worth. (specifically, it would be harder than getting the full theory-drive intersection correct while keeping it useful to a wide audience, and would have issues of fragility)

@mikeshardmind I see, that makes sense, but I'm not sure I agree.

Every level of special casing needed here makes it more and more complex, harder to reason about, and more sensitive to other changes/additions to the type system. The moment it was shown that simply re-ordering Any to the end was insufficient, any additional layered special casing was more complex than just doing it "right", the ordering was meant as a means to approximate the same behavior without needing as much to be specified to simplify things and it stopped being simpler as we discovered more issues, which I took as a sign (for myself) to take a break and come back to it later.

The way I see it, there are really only two special cases we've identified so far for an ordered intersection: Any, and __getattr__. If there are many more of these that arise, then I think I agree, but this doesn't seem like a dealbreaker for it as things stand (__init__ doesn't need to be a special case in ordered intersection).

If we look at the special cases for unordered intersection on the other hand, there are many: Any, signatures for __init__, __call__, __new__, Callables (although there is now a defined logic for resolving this above), other non-LSP violating methods of same name (I imagine this would resolve similarly). Some of these special cases may have existing logic in python to handle them, but that still makes it a much more complex specification. What's more, I believe __getattr__ would still end up being a special case in an unordered intersection, it'd just be get handled differently.

Special cases will arise in any specification like this, but the ordered version seems to have fewer and is easier to explain the logic for, as for any conflict there is a default way to resolve it. In the unordered version, everytime two classes "compete" for the same attribute or method, a resolution method has to be determined, so by it's nature it must be more complex to specify.

@mark-todd
Copy link
Collaborator Author

I also think it's worth noting that waiting for a lot of pre-requisites to be resolved internally is quite unrealistic. Even if right now all of the pre-requisites appear on a TODO list, any one of these could be delayed, and it's unlikely all of them will come to fruition. This is a tool that I personally think could be very useful, and we're incredibly close to a specification that would be implementable. It seems a shame to me to park the discussion for a future point that will in all likelihood never happen.

@mikeshardmind
Copy link
Collaborator

If we look at the special cases for unordered intersection on the other hand, there are many: Any, signatures for init, call, new, Callables (although there is now a defined logic for resolving this above), other non-LSP violating methods of same name (I imagine this would resolve similarly). Some of these special cases may have existing logic in python to handle them, but that still makes it a much more complex specification. What's more, I believe getattr would still end up being a special case in an unordered intersection, it'd just be get handled differently.

None of these require special-casing the specification with option 1 or 2 of the 3 options I listed for going forward

I also think it's worth noting that waiting for a lot of pre-requisites to be resolved internally is quite unrealistic. Even if right now all of the pre-requisites appear on a TODO list, any one of these could be delayed, and it's unlikely all of them will come to fruition. This is a tool that I personally think could be very useful, and we're incredibly close to a specification that would be implementable. It seems a shame to me to park the discussion for a future point that will in all likelihood never happen.

Only option 2 in the list of 3 options above requires temporarily shelving the discussion. Shelving it for that is to prevent other unfixable issues down the road that are similar to what we are facing now with decisions about the type system intentionally ignoring LSP because that was deemed easier.

If intersections got accepted with ordering rather than with either purely subtyping (possible now) or with denotable lower and upper bounds and constraints (would require groundwork), I think it would create a situation where HKTs would be impossible to support properly, and that's a gigantic detriment. HKTs are significantly more important to people than intersections are, they are also just more complex and not as easy to tackle in the current landscape.

If you think this is something that needs to be finished now, go with the subtyping approach then.

@mark-todd
Copy link
Collaborator Author

Ok, let's suppose we go with option 1 with the subtyping rules. If something is a subtype of T & Any it must be a subtype of T and a subtype of Any. We're still back to the old 4 vs 5 debate from a few months ago, on if an attribute of the irreducible quantity can be said to follow type T if it's present on T. Which means we're kind of back to square 1.

If intersections got accepted with ordering rather than with either purely subtyping (possible now) or with denotable lower and upper bounds and constraints (would require groundwork), I think it would create a situation where HKTs would be impossible to support properly, and that's a gigantic detriment. HKTs are significantly more important to people than intersections are, they are also just more complex and not as easy to tackle in the current landscape.

This is interesting. HKTs would be cool - is there any possibility of this happening in reality though? If there is and an ordered intersection presents a blocker (although I don't see why) then that is something to consider. I guess my follow up is why does it present a blocker?

My main reasoning is that if the only real blocker to an ordered intersection is that __getattr__ becomes a special case, that's not much of an argument when the alternative is trying to resolve attributes of T & Any which may be unresolvable.

I also still don't really understand why __getattr__ isn't a special case for unordered too - with ordered we were saying that the ordering priority may be reversed in some cases with this attribute, but with unordered we still need to define which item in the intersection takes priority. E.g.

from typing import Any, cast

class A:
    def __getattr__(self, item: str) -> int:
        return 1

B = Any

class C:
    foo: str

x= cast(A & B & C, type("D",(B,A,C),{})())
x.foo # str I suppose
x.bar # What type goes here?

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Apr 19, 2024

It was my intent to after taking some more time for myself, to dig into this vs explicit denotable lower and upper type bounds, I still intend on that at some later point, but we painted ourselves into a corner we didn't need to with some of the original discussions, and looking elsewhere for answers and perspective on why certain attempts at simplification were misguided and where they fell short of making things simpler has helped

If something is a subtype of T & Any it must be a subtype of T and a subtype of Any. We're still back to the old 4 vs 5 debate from a few months ago, on if an attribute of the irreducible quantity can be said to follow type T if it's present on T. Which means we're kind of back to square 1.

This, just like callable is simpler than we made because we were looking at it through the wrong lens at the time. The thing I linked about how Elixir is solving this is very illuminating as to how we framed the discussion made finding a solution harder than it should have been. We were working under the assumption that we must provide a single resulting type and that that type itself could not be an intersection, but there's no actual rule that that be the case.

It's irreducible in most cases, and that includes the attributes. But that makes things Easier not harder if context is preserved properly.

A.foo & B.foo can't get resolved to a single non-intersection type unless A.foo and B.foo are trivally overlapping (ie A.foo is exactly B.foo) or only one of A.foo and B.foo exist.

When using x.foo where x is a subtype of A & B, x.foo's use must be consistent with both A.foo and B.foo
When assigning to x.foo, the assignment must be consistent with assignment to A.foo and B.foo

This preserves both what those in favor of 4 wanted (a wider upper bound due to the presence of Any) and preserving the lower bound provided by any known specific type (those in favor of option 5, as well as the use case of indirect users via language servers). It's also significantly simpler to implement as it doesn't require a polynomial algorithm in the case of resolving certain types, and instead only a linear one at each usage.

I also still don't really understand why __getattr__ isn't a special case for unordered too - with ordered we were saying that the ordering priority may be reversed in some cases with this attribute, but with unordered we still need to define which item in the intersection takes priority. E.g.

Because the rules for __getattr__ will continue to work as they already do. It is already a special case, but its specialness is currently only acknowledging how attribute access works in python to match the data model, and we'd avoid adding another special case for it that behaves slightly differently.

We keep the existing rules for when __getattr__ is used, and any updates to those in the specification at any later point just work, without having to consider intersections again to ever improve that.

This is interesting. HKTs would be cool - is there any possibility of this happening in reality though?

Continues to be brought up with varying interest, it's significantly more likely with the new type alias syntax and type variable defaults, as the applications are more accessible and natural to more developers, and it's been stated as a priority for a few people heavily involved in python's typing. I'd say that yes, there's a real possibility of this happening.

If there is and an ordered intersection presents a blocker (although I don't see why) then that is something to consider. I guess my follow up is why does it present a blocker?

I really don't want to try and work through an example of this right now, there's still a lot of unknowns. I forsee an issue with each an intersection of higher kinded types, and with intersections within higher kinded types if the behavior is inconsistent with other subtyping.

I can't definitely say that this would be unresolvable for HKTs, but I can definitively say that every special case that exists will increase complexity of HKTs and that any single one of those cases could potentially be the straw that was too much. There's too many unknowns with it right now, and it hasn't been explored enough to know. I'm taking the pragmatic stance that if we don't need new special cases, we should do it without them to reduce the number of things that don't "simply work" together.


class A: ...
class B: ...
Specification
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to specify this in terms of subtyping and consistent subtyping:

  • T is a subtype of Intersection[A, B] if T is a subtype of both A and B;
  • Intersection[A, B] is a subtype of T if either A or B is a subtype of T.

I will omit the consistent subtyping rules here, as I think there is still no agreement as to how gradual types should be treated.

This approach has a benefit of being precise and also free from unnecessary details: we don't have to talk about attributes, methods, subclassing, structural types etc etc, as these are also part of the subtyping relations and are specified elsewhere.lasses/protocols etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I would say the details are often the places where these need resolving - e.g.
The definition of Protocol may specify a subtype must have method x of some form.

If two Protocols both have method x of different forms, we arrive from this definition that a subtype of both must have method x in both forms - it's not clear how to interpret this (hence why conversation on this particular point led to the overload debate).


So one possibility to fulfill an intersection is for a class to be a child of all intersected classes.
A "structural type" is considered to be one which cannot be mixed via inheritance to a regular class.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definition does not follow from PEP 544. In fact, subclassing structural types in the PEP 544 sense (i.e. subclasses of typing.Protocol) is valid.

Can you clarify why you went with "structural type" here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came out of some of the discussions, I was looking for a way to clearly separate the classes that don't require inheritance, and this seems like quite a useful existing definition.

issubclass(C, A & B) # True
An empty intersection is considered to be invalid, as it does not satisfy the first rule.
However, it is possible for this to occur in ``TypeVarTuple`` expressions like
``Intersection[*Ts]``. In these cases an empty intersection would resolve to ``typing.Never``.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not say that an empty intersection is always typing.Never?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general rule, we would expect Union to behave in the same way - I believe under the current specification Union[] is not valid so this seems like the correct dual. There is an interesting point here - I've just realised that unpacking TypeVarTuples is not supported for Union, so perhaps it also should not be supported by Intersection.


::
``Intersection`` does not forbid any incompatibility of type parameters
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this is trying to say. Can you clarify, please?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. Intersection[int, str] would still be valid, even if in practice it would impossible to create.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI this is not how it works in Mypy currently. The intersection with isinstance of something like

x = ''
if isinstance(x, int): ...

Will lead to a Subclass of "str" and "int" cannot exist: would have incompatible method signatures. I don't care too much about this, I just wanted to let you know.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine mypy just mirrors the CPython implementation detail, where str and int cannot be subclassed at the same time?

It is important to note that once a type checker evaluated anything to ``Never`` within an
intersection it can stop further evaluations an return ``Never``.
This way a lot of edge cases by mixin types that can't be mixed are handled easily.
Inheritance
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we have to spell that out separately, as it follow from the subtyping rules.

However, it will be useful to clarify whether an intersection is allowed a second argument to isinstance() and issubclass().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's a good point re isinstance and issubclass. I think based on some initial tests with Union, I would say these are valid:

class A:
    pass

class B:
    pass

x = A()

class Y(A):
    pass

print(isinstance(x, A | B))
print(issubclass(Y, A | B))

Both of these are valid and return True, therefore it follows that these would also be valid and return True:

class A:
    pass

class B:
    pass

class Y(A, B):
    pass

x = Y()

print(isinstance(x, Intersection[A, B]))
print(issubclass(Y, Intersection[A, B]))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably shouldn't have isinstance or issubclass behavior. General rule is that parameterized generic forms aren't valid 2nd arguments. Union support was special-cased and this was later considered a mistake.

Intersection types allow to succinctly combine multiple protocols (see PEP-544) into a single
structural type.
For example, instead of
Protocols can now be combined in any way, to produce a new type
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"in any way" here is potentially misleading, since later in the PEP you talk about order being significant.

Is it an error for the two protocols to have a non-zero API overlap? What should type checkers do in this case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so I suppose I meant they can be combined in any way in the sense, any order is valid, any combination of overlapping protocols is valid - perhaps this is unclear.

Is it an error for the two protocols to have a non-zero API overlap? What should type checkers do in this case?

This is where ordering really comes into it's own - if two protocols have an API overlap, one of them takes priority and this method appears in the intersection. This is really what I meant about "in any way" - there are no invalid Protocol combinations.

[Clearly explain why the existing language specification is inadequate to address the problem that
the PEP solves.]

This allows:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOC why did you decide to leave mixing out? They are distinct from protocols in that they are supposed to be subclassed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just covered by other parts of the specification - the rules on inheritance specify this is required, but if an example would make things clearer we can definitely add one.


Protocol Reduction
------------------
Introducing ordering has many benefits, including the fact that it simplifies and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the implications of ordering for subtyping relations? Is A&B a subtype of B&A? Is it a consistent subtype of B&A? What if we have A&Any?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say A&B is a subtype of B&A if A and B differ (have no overlapping non dunder properties) or the overlapping properties are identical.
If they have different properties they aren't identical.

If I got the ordering correct A & Any is kind of Any because Any overwrites everything.
But for Any & A is all properties of A but everything that is not a property of A is Any

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If A and B are classes they are very likely to have a non-empty difference, including e.g. __init__.

I missed 99% of prior discussion, so it is very likely this has been covered already, but I find the idea of ordered intersections very confusing. I can see how unordered intersections are problematic, but I personally don't think the extra complexity of ordered intersections is justified.

Also worth noting that I'm looking at this from the IDE angle. And there you ~have to treat A & Any as an unknown subtype of Any, which never produces any static type errors.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point of clarity, it needs to be treated as an unknown subtype of each of A and Any, and some things within that can produce static type errors where being a subtype of A provides enough information. This came up in prior discussion, was proven to be so, and over on discourse, when discussing subtyping of Any, ignoring the other supertypes was loudly untenable for language servers.

Copy link
Collaborator

@superbobry superbobry Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for chiming in Michael. Like I said, I'm sure this has been discussed before. Can you link me to the relevant Discourse thread to avoid repeating the argument here?

My gut feeling, still, is that intersections are best left unordered in the PEP, because the ordering-aware design is complex, unintuitive and does not match the behavior of intersection types in other mainstream languages (e.g. TS and Java/Kotlin).

Copy link
Collaborator

@mikeshardmind mikeshardmind Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on them needing to be unordered. The ordering attempts were (both my previous ordering attempt which I went back to the drawing board on, and Mark's here) attempts to simplify the specification with an approximation of behavior that should be equivalent, it doesn't work though The discourse thread starting here https://discuss.python.org/t/take-2-rules-for-subclassing-any/47981/5 Has Eric's objection to ignoring the other bases with multiple supertypes, this with multiple inheritance involving Any, rather than intersections involving Any, but the problems are the same here.

If you don't end up reading the whole thread, this section of my post has a way to handle this that works https://discuss.python.org/t/take-2-rules-for-subclassing-any/47981/7#further-detailing-lsp-compatible-any-and-type-bounds-2 for both intersections and multiple inheritance, but I was steered away from it at first to try and get a smaller change to work, we found that the smaller change did not work.


I don't think TS will be a good model here (not rehashing that right now either) and we want to look at Kotlin and Elixir. In particular, Elixir has an in-progress handling of this that solves this well even with dynamic() by considering function domains. https://elixir-lang.org/blog/2023/09/20/strong-arrows-gradual-typing/

This gives us an answer for gradual callables/methods in an intersection:

(int) -> int & (Any) -> Any

Here,, we can compare function domains and arrive at....

((int) -> int & Any) & ((Any & ~int) -> Any)

I think the best way forward is just the subtyping rules paired with an assumption that since Any is for gradual typing and not yet-expressible concepts, Any can't contain something that would be a type error if expressed, and only represents a compatible unknown.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best way forward is just the subtyping rules paired with an assumption that since Any is for gradual typing and not yet-expressible concepts, Any can't contain something that would be a type error if expressed, and only represents a compatible unknown.

Yeah, I like this. Do we need to build consensus before refining the PEP in the way you described?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to build consensus before refining the PEP in the way you described?

Probably should try to, otherwise we have competing approaches being worked on simultaneously.

Copy link
Collaborator Author

@mark-todd mark-todd May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my main point of contention here is that the subtyping rules seem insufficient to describe all cases without a lot of further clarification. Or at least, whenever I try to apply them to some of the examples it leads to situations that don't make sense without further resolution.

My gut feeling, still, is that intersections are best left unordered in the PEP, because the ordering-aware design is complex, unintuitive and does not match the behavior of intersection types in other mainstream languages (e.g. TS and Java/Kotlin).

The "not matching other languages" part is a fair enough comment, but I actually think in the context of python ordered intersections are easier to understand than unordered ones. Perhaps I haven't explained it well, but really by introducing ordering any edge case that might be encountered immediately has an intuitive explanation - and there are analogies to explain it.

I think really the issue here is about expectation - it might be that this specification behaves in a way that the community finds unexpected, but I guess my hope is that by explaining it clearly this can be resolved. @superbobry Did the "how to teach" section make sense?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subtyping rules seem to work fine for me, check latest comment in #17 for how this works for callables

A "structural type" is considered to be one which cannot be mixed via inheritance to a regular class.
(See PEP 544 for details)
Some of the structural types are:
- ``typing.TypedDict``
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you are referring to subclasses of typing.TypedDict and typing.Protocol?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's correct, I can rephrase to make this clearer

**⚠️ NOTE: This is document is work in progress, help wanted, see https://github.com/CarliJoy/intersection_examples/blob/main/README.md**


This following PEP was written originally on the PyCON US by Kevin Millikin (@kmillikin) and Sergei
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor clarification: it wasn't really written on PyCon US :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But triggered?
For the sake of making the processes on how PEPS envolve I would like to keep the reference to the PyCON (but more accurate).

IMHO we should be transparent how community works and make it clear, especially for this PEP, that it went a looonnnngg way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. Kevin and me were working on a Python language server, and we needed intersections to faithfully model the types arising in control flow.


Type narrowing in control flow
Copy link
Collaborator

@superbobry superbobry Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section is important, since type checkers are likely to refine type information based on control flow.

Can you clarify how ordering works with control flow. For example, given

def f(x: Any):
  if isinstance(x, T):
    x  # is the type here Any&T or T&Any?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - this is an interesting example!

(removed and modifying as I realised something)

Yeah this is really spicy. I'm going to discuss it in the context of concrete classes, as it's easier to reason about and the same rules apply:

class A:
    pass

class B:
    pass

def f(x: A):
  if isinstance(x, B):
    x  # is the type here A&B or B&A?

Here's the weird part - being an instance of B doesn't necessarily mean all of B's methods are exposed. A could have overwritten some. I actually think the type here is as follows:

Intersection[A,B] | Intersection[B,A]

A nice feature of the ordered intersection is that it's expressive enough to also become an unordered intersection where needed. Alternatively this could be treated as is (in VS Code the type here is:
<subclass of A and B>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the desire for ordered intersections arises from a desire to resolve all intersection types eagerly, and not have intersection types propagate. But I think this is not correct.

The type of (A & B).foo does not need ordering to decide whether A or B "wins" -- the type of (A & B).foo is simply A.foo & B.foo, always. If A.foo and B.foo are related as subtypes (this includes "same type"), then this intersection will immediately simplify. If A.foo and B.foo do not have a subtype relationship, then we cannot simplify the intersection, and it propagates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants