Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interpretation of type Any & T #31

Closed
mark-todd opened this issue Dec 15, 2023 · 127 comments
Closed

Interpretation of type Any & T #31

mark-todd opened this issue Dec 15, 2023 · 127 comments

Comments

@mark-todd
Copy link
Collaborator

mark-todd commented Dec 15, 2023

Following on from the lengthy discussion here #1, it was concluded that Any & T is the best resultant type for intersecting Any. This is symmetric with the way union works Any | T -> Any | T. Conversation on this new thread will investigate the types of the attributes and methods of this intersection type. Just a note, inspired by @CarliJoy we can number and summarise different interpretations. We should also seek to again create a real world example that explores the new types, and compares different interpretations.

Also if a summary of each new interpretation could be succinctly provided alongside a usage example that would be grand :)

@CarliJoy
Copy link
Owner

CarliJoy commented Dec 15, 2023

Assume this simple but verbose example.
I will handle only accessing properties at the moment.

Module a_untyped

class A:
    common: str
    a: int
    a_str: str

    def fn_a(self, a: int) -> int:
        return a

    def fn_mix(self, mix: int) -> int:

    ab_mix: "A"

Module b_typed

class B:
    common: str
    b: str

    def fn_b(self, b: str) -> str:
        return b

    def fn_mix(self, mix: str) -> str:
        return mix

    ab_mix: "B"

main.py

import typing as t
from a_untyped import A
from b_typed import B

def needs_int(x: int) -> None:
    """Example function that needs to be called with integer to work"""

def needs_str(x: str) -> None:
    """Example function that needs to be called with string to work"""

def can_handle_any(x: t.Any) -> None:
    """Example function that is able to handle any type"""

def use_ab_intersected(x: A & B) -> None:
    # All this examples should work, as they work in runtime
    needs_str(x.common)

    needs_int(x.a)
    needs_int(x.fn_a(1))
    needs_int(x.fn_mix(1))  # should be detected as the correct call
    needs_int(len(x.a_str.split()[0]))
    can_handle_any(x.a)

    needs_str(x.b)
    needs_str(x.fn_b("b"))
    needs_str(x.fn_mix("b"))
    can_handle_any(x.b)

    use_ab_intersected(x.ab_mix)  # Intersect the property is valid
    needs_int(x.ab_mix.b)
    can_handle_any(x.ab_mix.a)

    # This should raise an error detected by the type checker (if both types known)
    invalid1 = x.not_existing
    invalid2 = x.fn_mix(object)

In general even so a variable is Any we can assume that runtime implementation of the object hold by this variable has a concrete implementation.

But if package is not marked as typed, everything is imported as Any, so let's assume A is typed as Any.

As an user I would want the following behaviour. This is probably not possible, but we should check rules against this example.

def use_ab_intersected(x: A & B) -> None:
    # All this examples should work, as they work in runtime
    needs_str(x.common)

    needs_int(x.a) # type: ignore - required
    needs_int(x.fn_a(1))  # type: ignore - required
    needs_int(x.fn_mix(1)) # type: ignore - required
    needs_int(len(t.cast(str, x.a_str).split()[0]))  # this needs a cast
    can_handle_any(x.a) # should be fine, because it is `Any` already

    needs_str(x.b)  # Hard requirement, this *must* work
    needs_str(x.fn_b("b"))  # Hard requirement, this *must* work
    needs_str(x.fn_mix("b"))  # Hard requirement, this *must* work
    can_handle_any(x.b) # Hard requirement, this *must* work

    use_ab_intersected(x.ab_mix)  # Hard requirement, this *must* work
    needs_int(x.ab_mix.b) # Hard requirement, this *must* work
    can_handle_any(x.ab_mix.a) # Hard requirement, this *must* work

    # This should raise an error detected by the type checker (if both types known)
    invalid1 = x.not_existing  # type checker should complain
    invalid2 = x.fn_mix(object) # type checker should complain

@CarliJoy
Copy link
Owner

Hmm looking at the example it seems that we have to decided between allowing can_handle_any(x.a) and raising type errors on invalid accesses.

As for the most A related call we have to # type: ignore or cast things anyway, for accessing properties I would tend for the T & Any -> T.

This is in contract to the example given by @ippeiukai here.

Copied and modified below for easier reference:

By the way, in terms of gradual typing, I feel Intersection[T, Any] = T is awful.

Module 1

some_fn(some_obj)

Module 2

def wrap_obj(val: T) -> T & SomeAddedTrait:
  …

a = wrap_obj(some_obj)
some_fn(a)

Here, let’s assume some_fn and some_obj are provided by two different libraries both currently untyped. What if some_fn got typed before some_obj gets typed? I expect I can do anything to a that I can do to some_obj. I don’t expect a to be reduced to mere SomeAddedTrait.

TODO: formulate example more concrete, I don't get it atm,@ippeiukai could you formulate it with proper files and imports like I did above

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Dec 15, 2023

By the way, in terms of gradual typing, I feel Intersection[T, Any] = T is awful.

I'll type up some examples when I have more time, hopefully later today, but both of the following are true here:

  • In terms of gradual typing, T & Any = T is awful
  • In terms of gradual typing T & Any = Any or otherwise unchecked (one interpretation of Any & T) is awful

I think @DiscordLiz had a good point in the prior thread about Any & T never being an intentional construct. When is this going to come up? Do people intending on having Any have a use case for intersections, or is this sufficiently advanced that people should be expected to have typed their bases better and we should err on the side of safety for users on this? I think we need to start from where we expect this to occur before we can evaluate the impact of any choice. It's clear it can only happen at the boundary between typed and either untyped or gradually typed things, but can we get more specific than that and determine if there's ever an intentional use for T & Any which isn't better served by other existing things?

@mark-todd
Copy link
Collaborator Author

@CarliJoy Thanks for this example! It's very thorough - I've reduced it down a bit to consider a particular subcase, not because the others aren't relevant, but just because I think it might be easier to discuss in pieces:

import typing
from intersection_examples import Intersection
class A:
    common: str

# A effectively gets cast to Any, so nobody knows the type common even exists

class B:
    common: int

def needs_int(x: int) -> None:
    """Example function that needs to be called with integer to work"""

def needs_str(x: str) -> None:
    """Example function that needs to be called with string to work"""

def use_ab_intersected(x: Intersection[A, B]) -> None:
    # Note: A&B is an LSP violation, but we don't know that it is, because
    # we don't know the type of A
    needs_int(x.common) # should be fine - this is a property of B
    needs_str(x.common) # This should also be fine - if A is used at runtime
    # It could be string? Or could it?
    
    # Follow up idea - do we have to assume the intersection created with Any
    # has not created any LSP violations?

My conclusion from this is that similarly to other types we intersect, perhaps we have to assume LSP has not been violated by the intersection with the Any type - although we have no way to be sure

@gvanrossum
Copy link

I am still groping my way around this subject. I am unsure of the technical meaning of "gradual typing" that seems to be used by some participants.

In the meantime, I have an example, which has actually convinced me that this proposal is fine:

def f(a: Any & int):
    x = a + 1
    reveal_type(x)

Given that a: Any & T means thata has all the attributes of T and they have the types specified by T, in this example, x has __add__(self, arg: int) -> int so the type of x should be revealed as int. That matches my intuition.

@DiscordLiz
Copy link
Collaborator

DiscordLiz commented Dec 15, 2023

My conclusion from this is that similarly to other types we intersect, perhaps we have to assume LSP has not been violated by the intersection with the Any type - although we have no way to be sure

Which makes it intersections all the way down and no different than being Any, cause we have to assume LSP wasn't violated.

If that were intentional, someone could skip the intersection and just write Any, and that works today.

I think @DiscordLiz had a good point in the prior thread about Any & T never being an intentional construct. When is this going to come up? Do people intending on having Any have a use case for intersections, or is this sufficiently advanced that people should be expected to have typed their bases better and we should err on the side of safety for users on this?

I don't think it's intentional, but I do think it can happen. The middle ground (that was option 5? and we're only considering options 4 and 5 now? Sorry, I seem to have missed the reason for splitting this to a new thread.) seems to avoid the accidental issues possible in 4, but I'd like more examples of it from @mikeshardmind

Given that a: Any & T means thata has all the attributes of T and they have the types specified by T, in this example, x has add(self, arg: int) -> int so the type of x should be revealed as int. That matches my intuition.

That would be the behavior in option 5. Option 4, the return type can only be int & Any, it's recursive with that option, and more pervasive than I think is beneficial.

@NeilGirdhar
Copy link
Collaborator

NeilGirdhar commented Dec 15, 2023

Given that a: Any & T means thata has all the attributes of T and they have the types specified by T, in this example, x has __add__(self, arg: int) -> int so the type of x should be revealed as int. That matches my intuition.

Just something for you to consider. Let's just look at an example that doesn't involve Any:

class C:
  def f(self) -> None: pass
class D:
  def g(self) -> None: pass
class E(C, D): pass

def f(x: C):
  if isinstance(x, D):
    reveal_type(x)  # C & D
    x.f()  # should be okay since x inherits from C
    x.g()  # should be okay since x inherits from D

f(E())

This is why the interface of C&D is the union of the interfaces of C and D.

Now, consider:

  • what is the interface of Any?
  • what is the interface of Any & C?
  • So if x has type Any & C, what is the type of x.f(), x.g(), and x.h()?

It's really worth reading Eric's elucidating comment.

For reference, there are two proposals being considered here. I'll call them 4 and 5 (relating to names from #1).
Proposal 4 says: x.f(), x.g(), x.h() are all Any.
Proposal 5 says: x.g(), x.h() are both Any, but x.f() is None.

@NeilGirdhar
Copy link
Collaborator

NeilGirdhar commented Dec 15, 2023

Even though I'm not yet in favor of proposal 5, I have a way to illustrate the problem it may be trying to solve:

Consider what isinstance does:

x: Any
assert isinstance(x, U)
y: T
assert isinstance(y, U)

The general case illustrated by y gives y type T & U. But there's a special case: if we start with something of type Any, then we don't want to end up with Any & U, and we instead set x's type to U. The reason we don't want Any & U is because that would cripple type checking: You would not be able to check most things related to the interface of x.

Essentially, there are two isinstance operators. One acts is an intersection that acts on ordinary types and type variables, and the other is a type replacement that acts replaces an Any type with whatever type is provided. Currently, type checkers do a lot of type replacement, and only a rudimentary intersection in some special cases.

I think the people in camp 5 are worried that this pernicious behavior that we avoid using the type replacement will creep in when we enable intersections generally.

I considered a 6th proposal: encourage to choose replacement if they want to protect against the spread of Any. This should only be relevant in some special cases involving things like unbounded generics. This can be done like this:

def f[T](x: T):
     assert isinstance(x, U)
     # Normally, this would have type T & U, but you can force replacement by casting.
     x_as_u = cast(x, U)
     # The over-broad interface of T & U is avoided.

This would allow us to leave the ordinary meaning of T & Any alone without any special cases, but potentially eliminate the problems associated with it.

I think we need to see some concrete examples to test if this could work.

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Dec 15, 2023

If I need a cast everywhere I use an intersection to avoid issues with Any, then I don't need an intersection, I can cast to a protocol that matches my use.

You're just replacing one wart with another worse one that disables other checks here, and requires runtime checkable protocols and a runtime cast use.

The reason we don't want Any & U is because that would cripple type checking: You would not be able to check most things related to the interface of x.

That's exactly the issue with option 4. It cripples type checking for the person with fully typed info on their side of the intersection and is no better for the person with Any on their yet untyped object than not using an intersection and using Any.

@NeilGirdhar
Copy link
Collaborator

NeilGirdhar commented Dec 16, 2023

That's exactly the issue with option 4. It cripples type checking for the person with fully typed info

So we're on the same page, and I think I understand your point of view.

I see how the deference proposal prevents false negatives, but it does so at the cost of false positives. Historically, the Python typing community has preferred solutions that avoid false positives even if they cause false negatives. For example, the adoption of the sequence protocol into the typeshed causes false negatives in order to avoid false positives.

I think we just need to see at least a few common examples to get a better handle of what the tradeoffs are.

If I need a cast everywhere I use an intersection

I don't think it would be "everywhere you use an intersection", but as I stated in the previous comment, it would only be in the cases of intersections with unbound type variables, and similar. Like I said, this was just something I thought of. We need to see some examples to know what it would look like.

@mark-todd
Copy link
Collaborator Author

I am still groping my way around this subject. I am unsure of the technical meaning of "gradual typing" that seems to be used by some participants.

In the meantime, I have an example, which has actually convinced me that this proposal is fine:

def f(a: Any & int):
    x = a + 1
    reveal_type(x)

Given that a: Any & T means thata has all the attributes of T and they have the types specified by T, in this example, x has __add__(self, arg: int) -> int so the type of x should be revealed as int. That matches my intuition.

@gvanrossum I think your example makes sense, but the controversial part comes from the other side of the intersection - as you describe the result must have all the properties of int so using it with __add__ makes sense here. But on the other side we have Any, and therefore x must also have all the properties of Any. What does it mean to have all the properties of Any? If it encompasses everything, can we say it says any property we like? What if Any also has a method __add__ that receives str?

My conclusion from this is that perhaps it has to be some subset of every available option, e.g. Any type that fulfils this LSP condition. Or perhaps no methods that "clash" with those from the other half of the intersection.

@DiscordLiz

Which makes it intersections all the way down and no different than being Any, cause we have to assume LSP wasn't violated.

Perhaps we should restrict our examples to those that fulfil this condition? Even in the case we're untyped, it seems we have to assume that this method clash hasn't occurred.

@NeilGirdhar @mikeshardmind Specifics aside, I think whatever solution we come up with, it shouldn't require cast or type:ignore to pass through the type checker successfully. I mentioned the below relating to an earlier debate, but I think the same philosophy applies here - given the choice between:

  • Being given a type that's too broad for the attributes available - there won't be any errors raised when we do some things that are invalid.
  • Being given a type that's too narrow for the attributes available - there will be errors raised when we do things that are valid

The second one just seems much worse to me than the first, but I guess that is personal preference somewhat.

I guess this is maybe another way of phrasing @NeilGirdhar 's previous point about false positives vs false negatives.

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Dec 16, 2023

The False positives case is preferable because we still give the user a way to handle it without a type ignore and without additional runtime costs. The False negative case cannot be remedied except by removing Any, and the whole reason we can't ban Any from intersections is that type checkers have stated this could happen in a way users don't have a way to remedy or even a clear way to inform them.

Consider this:

from untyped_library import SomeParserAugmentor  # user may not realize this is untyped at first
from typed_library import SomeParser  # generic over T, provides .parse(self, arg: str) -> T

def my_func(input_: str, parser: SomeParser[int] & SomeParserAugmenter ) -> None:
    """ Typed permissively so that other people can use the function """

   # This errors, we restrict the interface of Any when it clashes with what is typed
   # This also means the person writing the intersection is the one who gets the error
    x = parser.parse(input_, fix_mojibake=True) 


class _Parser(SomeParser, SomeParserAugmentor):
    """ Internal fallback where we create only the baseline functionality """
    pass
...


# this is fine, subclassing from Any means the attributes needed are assumed there. 
# This is the case currently with subclassing Any, not even just with this proposal
my_func(some_user_content, _Parser(int))  

In this case, the user actually has the tools to fix this without a type ignore in multiple ways.

  1. Because they are informed that something isn't meeting their expectations, they may contribute upstream types
  2. They can also add a protocol stating that the attribute exists:
from untyped_library import SomeParserAugmentor  # user may not realize this is untyped at first
from typed_library import SomeParser  # generic over T, provides .parse(self, arg: str) -> T
from typing import Protocol

class FixesMojiBake[T](Protocol[T]):
    def parse(self, input_: str, fix_mojibake: bool=False) -> T: 
        ...
    

def my_func(input_: str, parser: SomeParser[int] & SomeParserAugmenter & FixesMojiBake) -> None:
    x = parser.parse(input_, fix_mojibake=True)  # This no longer errors

class _Parser(SomeParser, SomeParserAugmentor):
    """ Internal fallback where we create only the baseline functionality """
    pass

# this is still fine
my_func(some_user_content, _Parser(int))  

With this, they've now explicitly stated they expect this compatible method to exist. This gives users the information they need to type things without compromising on safety for the portions of code that are already typed, and while being easy to accommodate situations outside of their control with untyped libraries. Note that this is safer thana type ignore, it would rely on the nature of Any for the thing Assigned to it, but it requires explicitly escaping out for things that could otherwise be accidentally hidden in terms of misuse by restricting the interface on usage to what is explicitly stated. This also automatically continues working should the untyped library become typed in the future without something lurking around for compatibility that could be silencing errors.

I think if we pair this approach with a note that type checkers should inform users when Any is present in an intersection that is causing an error and point them to a reference of how to handle this, we can make this easy to learn and ergonomic as well.

@gvanrossum
Copy link

A few observations after having read Eric's comment and trying to reason various things out in my head.

  1. It is very easy to confuse yourself by mixing up the meaning of & with that of | (or vice versa). This happened to myself a few times. I think it's due to the fact that after x: T1 | T2, we consider valid properties of x those that are valid for T1 and for T2, while after x: T1 & T2, intuitively, the set of valid properties is the union of the valid properties of T1 and T2. (It's a similar confusion as often happens around contravariance.)
  2. If is easy to reason things out for proper types, including edge cases like object and Never.(top and bottom, respectively), by thinking of types as sets of objects. The intersection and union operators work sensibly in this case, and all the rules you can derive this way (e.g. distributivity, subtyping, and attribute type evaluation, as shown in Eric's comment). However, Any is *not* just another edge case. It is the defining building block of gradual typing, and behaves like top or bottom depending on which side of the subtyping operator (<:`) it appears.
  3. Using assert isinstance(x, T) in examples is not all that helpful, because at runtime T is constrained to concrete types. We need to go back to presenting examples in terms of whether x = y is allowable or not according to the type system given specific types for x and y (and possibly what the type of x is afterwards, in situations where narrowing happens).
  4. Generics present a whole nother can of worms. I haven't spent enough time thinking about these.
  5. There are probably also traps due to the difference between structural subtyping (Protocol) and nominal subtyping.
  6. And perhaps others due to the pragmatic treatment of TypedDict.
  7. Folks, please proofread your messages.

I'l write more later.

@gvanrossum
Copy link

[me]

Given that a: Any & T means thata has all the attributes of T and they have the types specified by T, in this example, x has __add__(self, arg: int) -> int so the type of x should be revealed as int. That matches my intuition.

@NeilGirdhar

Just something for you to consider. Let's just look at an example that doesn't involve Any:

Great, it should be possible to reason this out by considering types as sets of objects.

class C:
  def f(self) -> None: pass
class D:
  def g(self) -> None: pass
class E(C, D): pass

def f(x: C):
  if isinstance(x, D):
    reveal_type(x)  # C & D
    x.f()  # should be okay since x inherits from C
    x.g()  # should be okay since x inherits from D

f(E())

This is why the interface of C&D is the union of the interfaces of C and D.

But why introduce a new term, "interface"? Python's type system never uses this word -- we talk about types. I feel it's misleading to introduce this term -- probably because of my point (2). We should only consider questions of the form "if x has type T1 and y has type T2, is x = y acceptable?" And in addition, questions of the form "given x has type T and OP(a) is some operation (such as getting an attribute of a specific name, or calling it with specific arguments), is OP(x) valid?"

Now, consider:

* what is the interface of `Any`?

I feel this is not helping clear reasoning, given that in x = y, the rules when x has type Any are totally different from the rules when y has type Any. (And different again for OP().)

* what is the interface of `Any & C`?

Should be rephrase in terms of what happens to x = y or perhaps OP(x), separately for the case where x: Any & C and y: Any & C.

* So if `x` has type `Any & C`, what is the type of `x.f()`, `x.g()`, and `x.h()`?

That's a valid question. I think it should be answered by considering separately what the type of an expression is when x: Any and when x: C, and then doing an & over the resulting types.

Interestingly, since the answer might well be "an error occurs" for one of the branches of the & operator, maybe we should consider what the result should be of Error & T. Thinking about some examples, it seems that the Error type should disappear and the answer should be T -- IOW, it is not an error unless all branches of the & operator have type Error. This is nicely the dual of the error behavior for |, where the rule is that it's an error if it is an error for any branch of the | operator.

But now we get to think about when an error should be reported to the user -- surely not as soon as it occurs, in the case of &, or else x.attr where x: C & D would report an error if C supports attr but D doesn't -- and the intersection is meant to not report an error in that case.

Also, it seems that whatever rules we come up with for Error should also apply to Any? Except they don't -- It would seem that Error | T reduces to Error, but Any | T is irreducible. (I now understand that word. :-) It seems that we perhaps might regret that choice, but even those who regret it seem to agree that it's too late to change -- and I'm with Eric in that I think we should make sure that & is the dual of | in this respect, like it is everywhere else.

It's really worth reading Eric's elucidating comment.

For reference, there are two proposals being considered here. I'll call them 4 and 5 (relating to names from #1). Proposal 4 says: x.f(), x.g(), x.h() are all Any. Proposal 5 says: x.g(), x.h() are both Any, but x.f() is None.

I don't know what the type of x is here, but it can't be from the code snippet you gave at the top of your comment (which has neither Any nor h()). Could you provide the whole context?

@NeilGirdhar
Copy link
Collaborator

NeilGirdhar commented Dec 16, 2023

Specifics aside, I think whatever solution we come up with, it shouldn't require cast or type:ignore to pass through the type checker successfully....I guess this is maybe another way of phrasing @NeilGirdhar 's previous point about false positives vs false negatives.

Exactly what I was getting at :)

I think if we pair this approach with a note that type checkers should inform users when Any is present in an intersection that is causing an error and point them to a reference of how to handle this, we can make this easy to learn and ergonomic as well.

This is a great idea that actually works with both proposals. We could leave the "correct" semantics of proposal 4 coupled, and encourage type checkers to implement an optional warning when someone accesses an attribute on something with type Any & T. This warning could then be resolved by:

  • using type-ignore,
  • adding a cast, or
  • disabling the warning.

3. Using assert isinstance(x, T) in examples is not all that helpful, because at runtime T is constrained to concrete types.
4. Generics present a whole nother can of worms.

I agree that we should explore other ways in which Any & T pops up naturally. However, I think the is-instance example works when the type of x before the instance check is a type variable bound to Any.

3. We need to go back to presenting examples in terms of whether x = y is allowable or not according to the type system...

But why introduce a new term, "interface"? Python's type system never uses this word -- we talk about types. I feel it's misleading to introduce this term -- probably because of my point (2). We should only consider questions of the form "if x has type T1 and y has type T2, is x = y acceptable?"

Right, I see where you're coming from. If that's the language you like, then I think you'd find Jelle's comment the most clarifying. I was using language closer to Eric's comment.

As per Jelle's comment, two things constitute the "type" (as returned by reveal_type) of x:

  • For which y is x = y is allowable? And,
  • What operations can you do with x?

These correspond to what Eric called subtyping and attribute type evaluation. And the latter is what you're calling OP(x).

That's a valid question. I think it should be answered by considering separately what the type of an expression is when x: Any and when x: C, and then doing an & over the resulting types.

Yes, you're right of course. I made a mistake in my evaluation.

Could you provide the whole context?

I think I just made an error when evaluating the values 😄

Regarding the ideas surrounding Error, I need to think about it more.

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Dec 16, 2023

This is a great idea that actually works with both proposals. We could leave the "correct" semantics of proposal 4 coupled, and encourage type checkers to implement an optional warning when someone accesses an attribute on something with type Any & T. This warning could then be resolved by:

Please review the important distinction with false negatives. There is no way for a user to know they need to resolve a false negative arising from types outside of their control because the user will not get information. This would be particularly insidious if the user thought they had something typed that wasn't. Additionally, the entire reason we are having to consider this is that we cannot ban Any from intersections entirely, they can arise deeper in the type checker, per Eric's comment taking that off the table. This means in those cases, there's nowhere to give the user the error for Any existing in the intersection, and there would only be a resulting false negative. This cannot be used to resolve this in the case of proposal 4 if we continue to accept previously accepted arguments.

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Dec 17, 2023

Also, it seems that whatever rules we come up with for Error should also apply to Any? Except they don't -- It would seem that Error | T reduces to Error, but Any | T is irreducible. (I now understand that word. :-) It seems that we perhaps might regret that choice, but even those who regret it seem to agree that it's too late to change -- and I'm with Eric in that I think we should make sure that & is the dual of | in this respect, like it is everywhere else.

We need to be particularly careful that anything we say is a dual and should have symmetry actually is or we may create worse situations by mis-considering something. It would be one thing to intentionally decide something isn't useful as symmetrical, it would be another to inadvertently state things that arent symetrical must be treated as if they are. This issue here elaborates on Eric's comment and shows that only some of what was said was true by nature of universal properties.

Importantly: from it:

The duality between intersection and union lies in reversing the subset relation, not in replacing an "and" with an "or".

@gvanrossum
Copy link

Duly noted that duality is tricky.

I have my thoughts about false positives vs. false negatives but they are colored by my experience spinning up mypy as it was being written in a production code base with millions of lines of existing (untyped) code. False positives would cost the tiny mypy team days of explaining to random senior engineers why they had to make changes to working code to satisfy the upstart type checker, while false negatives were what everybody got all the time (since most code was still untyped).

I am despairing at ever coming to terms with this topic. I wrote the original text in PEP 483 and I admit it was vague to the point of being unhelpful. (Since it is not a standards track PEP, I'd be happy to entertain PRs that fix specific issues with it, though not with proposed full rewrites.)

I am still hopeful that we can define intersections in a way that's the dual of unions (using the correct definition of dual).

@NeilGirdhar
Copy link
Collaborator

There is no way for a user to know they need to resolve a false negative arising from types outside of their control because the user will not get information.

Why can't the type checker produce a warning when a user accesses an attribute of something that has type Any & T?

This means in those cases, there's nowhere to give the user the error for Any existing in the intersection,

In the cases that the attribute access happens deep in some library, the user hasn't been "robbed of a potential error". If there is an error, it's the library author that needs to see it.

@mikeshardmind
Copy link
Collaborator

There is no way for a user to know they need to resolve a false negative arising from types outside of their control because the user will not get information.

Why can't the type checker produce a warning when a user accesses an attribute of something that has type Any & T?

That was Eric's statement that took banning Any in intersections off the table, and was taken at face value given the work he has put into pyright. I don't know why it can't, I see no reason why it shouldn't be possible to bubble it up to where it was sourced from, but I'm not actively maintaining a type checker right now either. Considering this was used to take an option off the table and nobody contested that, we can't simultaneously accept it to take one option off the table and reject it to say something else isn't an issue. If you'd like to contest that, I'd happily bring back the argument to ban Any in an intersection because I can't see a single case where intentionally using it in one is beneficial. It would be better to omit the intersection entirely and just type Any instead of Something & Any with some of the proposed behaviors. (If the presence of Any in the intersection effectively allows everything as compatible, and silences all errors, it's no different than Any even if we have theory that says we shouldn't remove it, the practical use is no different.)

In the cases that the attribute access happens deep in some library, the user hasn't been "robbed of a potential error". If there is an error, it's the library author that needs to see it.

This isn't necessarily true though. It's not an error in the library, it's a lack of information that would be silently hiding any issues in an intersection resulting between my code (typed) and library code (untyped). This would become yet another social pressure to add typing, to libraries that may have various reasons not to, such as still using things the type system does not yet have a way to express.

From a pragmatic standpoint, if someone says "I need to be able to use something as this type and this other type", if one of those types isn't typed, do we really want to discard information on the type that is typed?

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Dec 17, 2023

I have my thoughts about false positives vs. false negatives but they are colored by my experience spinning up mypy as it was being written in a production code base with millions of lines of existing (untyped) code. False positives would cost the tiny mypy team days of explaining to random senior engineers why they had to make changes to working code to satisfy the upstart type checker, while false negatives were what everybody got all the time (since most code was still untyped).

I can appreciate that perspective. Largely, I would agree on minimizing false positives. When I was coming up with the middle ground, the considerations I took with it were the following:

  • The false positives possible here arise for the users of an intersection, not for the author of other untyped code.
  • The false positives are resolvable in a way that remains safe if the other code becomes typed in the future (no type ignore supressing information)
  • The false positive's resolution can be used as a stepping stone to a stub or type information being improved, as it promotes the use of duck typing of things which are not yet fully typed.
  • That I do not see a case where the author of an intersection would intentionally include Any in it, rather than just use Any until the operands of their intersection were typed.
  • That (and this may end up revisited) disallowing Any in intersections entirely was considered non-viable.
  • That this would be the first way where a user expecting something to be typed because they typed it could interact with other types and not be checked anymore. The boundary between typed and untyped vanishes.
  • The alternative of false negatives would push those with typed code that is "Broken" by this to push for the full removal of any code that isn't typed. I don't think this is largely positive socially, but I'll leave that at that for now, it doesn't need a full rehashing.

@NeilGirdhar
Copy link
Collaborator

NeilGirdhar commented Dec 17, 2023

If you'd like to contest that,

I don't want to contest that because I agree with Eric that Any & T can occur deep in library code, e.g., as a result of unbound type variables.

I'm saying that in your own code when the you access an attribute of something having type Any & T, then the type checker could warn you that this access is not being type-checked. Wouldn't that resolve your worry about type checking letting errors slide for users?

If you feel strongly about this, maybe you should reopen #1 and try to swing the consensus to the proposals you were supporting (1 and 2). This thread was supposed to be about 4 versus 5.

it's a lack of information that would be silently hiding any issues in an intersection resulting between my code (typed) and library code (untyped).

Can you give an example of library code that accesses an attribute of something having type T & Any and it hiding an error in user code?

From a pragmatic standpoint, if someone says "I need to be able to use something as this type and this other type", if one of those types isn't typed, do we really want to discard information on the type that is typed?

Yes, OP(x) is necessarily broader. This is analogous to how a parameter that is annotated to have type T & U becomes Any when T or U happens to be Any.

The alternative of false negatives would push those with typed code that is "Broken" by this to push for the full removal of any code that isn't typed

I don't see how that helps since if something is annotated T & U, and T is Any, then no one is going to push for the annotation T & U to be removed. They may push for (or wait for) T to be given a narrower definition.

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Dec 17, 2023

I'm saying that in your own code when the you access an attribute of something having type Any & T, then the type checker could warn you that this access is not being type-checked. Wouldn't that resolve your worry about type checking letting errors slide for users?

No. Categorically, no, because at that point I'd rather make it an error to include Any in the intersection than an optional warning. Many of the cases we have motivating this are decorators with intersections as the return type. The behavior of (Op(Any & T) being equivalent to Op(Any) means the intersection there is pointless for those specifying their needs, and may lead to surprising outcomes for users. My view on this is not only from the point of safety and correctness, but also on making the type system less surprising to users. If Any & T is no more specific about what is allowed than Any, then specifying Any & T shows that the user clearly thinks something more is happening or they wouldn't bother with a more complex type that is the same thing.

This also relates to:

From a pragmatic standpoint, if someone says "I need to be able to use something as this type and this other type", if one of those types isn't typed, do we really want to discard information on the type that is typed?

Yes, OP(x) is necessarily broader. This is analogous to how a parameter that is annotated to have type T & U becomes Any when T or U happens to be Any.

You say it's an obvious analog, but I challenge you to find a case where you would intentionally include Any in an intersection as a parameter and it be more useful than simply stating Any. If there is no case where it is, the behavior is not useful, and the statement of that intersection would indicate that a user has the expectation that something else would happen here.

If you feel strongly about this, maybe you should reopen #1 and try to swing the consensus to the proposals you were supporting (1 and 2). This thread was supposed to be about 4 versus 5.

Since we clearly aren't all on the same page about the reasons why other proposals were not an option, I think that is necessary. I only agreed that we could stop that discussion because I thought people were going to be consistent in applying the reasoning for other options being taken off the table.

Can you give an example of library code that accesses an attribute of something having type T & Any and it hiding an error in user code?

I'm not rehashing examples that were already given and interacted with. That you have ignored prior examples I have given on this matter does not make me inclined to take the time on this, those interested can read the prior discussion.

The alternative of false negatives would push those with typed code that is "Broken" by this to push for the full removal of any code that isn't typed

I don't see how that helps since if something is annotated T & U, and T is Any, then no one is going to push for the annotation T & U to be removed. They may push for (or wait for) T to be given a narrower definition.

Removal of untyped code does not need to be by removal of the code. I thought you had been aware of the other related discussions on this, and the part of that statement you left out was the social effect. Socially pressuring other libraries to add types is not a positive effect and is more divisive than a lot of people deep in typed python realize. I value a pragmatic boundary between what is typed and what is not that makes it clear when typed code is interacting with untyped code without needing to compromise on the portions that are typed, as it reduces that pressure to a minimum.

@DiscordLiz
Copy link
Collaborator

I don't want to contest that because I agree with Eric that Any & T can occur deep in library code, e.g., as a result of unbound type variables.

Since neither of you have said it, We can also forbid unbound type vars in intersections or say that the behavior of an unbound typevar in an in intersection is to default to a bound of object rather than of Any, which remains as permissive to what can be passed in, but doesn't create that issue. This isn't a strong argument against forbidding Any in an intersection, but It also doesn't fix everything. The problem here isn't just typevars.

def real_class_decorator_example[T: object](user_type: type[T]) -> type[T & MyProtocol]:
    ...

If a user calls a function like this today, inside the body of foo, it is only safe to access what is on type[object]. However, what they get back is safe to use based on whatever the input to this function was. It is clear that this decorator adds support for a protocol, but if T going in is Any, then that protocol is never going to be checked properly under option 4.

I'm concerned that if we just allow Any here to remain in and have the behavior from option 4, that a lot of situations involving intersections become useless in the presence of any untyped code at all, because it's just too risky that somewhere deep in the type checker, something became as useless as Any unexpectedly.

what @mikeshardmind came up with for option 5 avoids this. If we're reopening up the other options, I think T & Any being treated as T for safe access also avoids this, even if we know it isn't theoretically fully accurate. Both of those options retain what he expressed about a pragmatic boundary between typed and untyped as well, and I agree with him that that it's necessary to have that boundary. He spoke from social pressure, but there are other issues as well. Not everything in python can be expressed by the type system. That's okay with me. But I'd like to be able to use those things when needed and still keep what assurances can be provided by typing in the process.

@DiscordLiz
Copy link
Collaborator

DiscordLiz commented Dec 17, 2023

For what it's worth, keeping in mind that the difference between intersections and unions is more about the subtyping relationship, and that Any behaves differently on opposite sides of <:

T | U Allows anything that is a subtype of T or a subtype of U to be assigned to it. In the case of U being Any, this allows anything.
T | U Allows only what is common between T and U to be accessed without discriminating first. In the case of U being any, this allows exactly what is available on T.

I think we all agree on that part right?

T & U allows only that which is a subtype of both T and of U to be assigned to it. If U is Any, the requirement of it being a subtype of T has not gone away.

A concrete object known as Any, but then known as a subtype of T is no longer Any It's a subtype of T. The fact that in python we do not distinguish between an unknown that should be treated as anything until it is known and literally allowing anything is making this dual harder for people to reason about.

From all of that, a consistent behavior here is that Any & T no longer has the compatibility behavior of Any as it's no longer Any.

@DiscordLiz
Copy link
Collaborator

And if you extend that to "It also needs to be a subtype of Any though", well what does that look like?

class Intersected(T, Any):
...

I know he called it a compormise, but it's sure looking less like a compromise and more just like the correct consistent behavior with other typing decisions to me

@mikeshardmind
Copy link
Collaborator

sigh it's consistent with other typing decisions, and consistent with the subtyping relations involved, but I don't know that the prior decision involved was well considered.

I happen to like that outcome, but the ability to subclass Any at runtime to indicate that behavior was accepted with a brief bit on the mailing list that received minimal attention and was done without a pep discussing the impact on how we treat Any in the type system, and I won't hinge it on "We have this behavior, so this must be the right interpretation". It being a consistent interpretation that continues following the subtyping rules is nice, but that was by design. I won't go as far as to say it's the only possible interpretation given the only minimal prior consideration.

I think it has precluded something important that could have been done that you actually got at, splitting "Unknown, could be anything, but use what you know about it and permissively allow use until you know something more specific about it" and the theoretical "Any"

@mikeshardmind
Copy link
Collaborator

I did some more thinking about this, I think option 4 actually is a non-starter unless we also make changing the behavior of runtime subclassing of Any part of the PEP. I'm not in favor of this as I think option 5 is better for users, but I do think it needs to be said so that if option 4 is decided on, that isn't lost.

Not doing so would make Any special in that it is the only case where an instance of a valid subclass:

class UnimportantName(T, U):
    pass

Did not have the same safe operation on it as

T & U

valid emphasized, because some intersections would require additional reconciliation in the body. The first form is valid today with Any as a base, and matches the behavior of option 5

@gvanrossum
Copy link

Did not have the same safe operation on it as

Maybe it's clear to you, but it's not clear to me what safe operation you are referring to here?

@mark-todd
Copy link
Collaborator Author

@mikeshardmind

I don't think we have actually. To me, option 4 is possible to rule out right now, as it provides negative value for all of my use cases for intersections. If it were accepted, I'd probably give up on trying to improve typing in python. I view it as that fundamentally problematic, and I already have many things pushing me in that direction with people picking purity and ease for tools over pragmatism for users. We don't have agreement here about which options can be ruled out. In contrast, I'm still trying to find solutions that we all might find agreeable rather than have to present a divide, and you went from

I would say there have been examples that have shown negative value for option 4 too (e.g. the table example @randolf-scholz put forward earlier in the discussion), it's just that for me in option 5 the pros outweigh the cons.

To not considering it an option here with no further comment, and nobody else engaged with that or the motivations behind actually working on a reconciliation that works for users.

I should have made this clearer, I guess ultimately I was thinking that options 5 and 6, are most similar so we could figure out which is best as part of developing the "option 5" PEP, I didn't mean to suggest I'd excluded it. I think option 4 presents a case that is much more different, and reconciliation between this and the other two is harder.

To put it bluntly, it feels like people are giving up on even attempting to come up with a definition that reconciles various concerns and just want to be done with it. If that's the case, please don't rush this and shelve it instead. Leave it for people who will do it better for users.

It seems to me that there are some cases that work better for option 4 and some that work better for option 5/6. I'm not sure how this could be resolved by somebody hypothetical in the future, unless there exists an option we haven't yet thought of.

@mark-todd
Copy link
Collaborator Author

mark-todd commented Jan 9, 2024

@mikeshardmind
This is the comparison table I was referring to: #31 (comment)

I think Option 6 would look the same as Option 5 in this table (hence what I meant about similarity) - if this isn't the case though I'd be interested to see what it's column would look like.

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Jan 9, 2024

I think option 4 presents a case that is much more different, and reconciliation between this and the other two is harder.

I don't think the arguments for option 4 over option 5 preclude option 6.
The arguments for option 4 which precluded option 5 were based around consistency and special casing.
6 does not have that issue.

I think Option 6 would look the same as Option 5 in this table (hence what I meant about similarity) - if this isn't the case though I'd be interested to see what it's column would look like.

The table may differ* between 5 and 6. Specifically, if 2 operands have conflicting definitions for get, use of get would be an error. It's this difference that allows 6 to have consistent rules that apply to the definition of Any.

* Edit: actually, it doesn't differ, the table doesn't present the case where the difference exists! but described anyhow

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Jan 9, 2024

It seems to me that there are some cases that work better for option 4 and some that work better for option 5/6. I'm not sure how this could be resolved by somebody hypothetical in the future, unless there exists an option we haven't yet thought of.

I don't think there are any cases which work better for 4 which are actually good canidates for intersection. None of the motivating cases we collected benefit from it, but toy examples did. Those toy examples also benefited from the mused about type[*bases] where this would not work for all of the intersection motivating cases.

The toy examples which benefited from 4 over 5 assumed complex diamond patterns were common, the actual class which resolves them wouldn't be provided, and that one side of the intersection would be untyped. I view this as a stretch, but I think that case can still be supported separately with the type[*bases] form if there's still a need for that.

@mark-todd
Copy link
Collaborator Author

@mikeshardmind

Edit: actually, it doesn't differ, the table doesn't present the case where the difference exists! but described anyhow

Yeah I suppose what I meant was it seems to be "new and improved" option 5, which is great, but then if anything that makes it the successor to option 5. Option 6 takes care of a different (but still very useful!) case.

Then the discussion becomes option 6 vs option 4, and we're back to the same old debate. To truly resolve we'd need something to behave like this:

function hypothetical + external untyped external typed
both_have_get Any & (X -> Y) (U -> V) & (X -> Y)
theirs_has_get Any (U -> V)
ours_has_get (X -> Y) (X -> Y)
neither_has_get Any ERROR

...but I think this is impossible. So that means a trade off has to be made, but which trade off is the better of the two seems quite subjective, it might be based on which use case feels like the most common or most useful.

@mark-todd
Copy link
Collaborator Author

mark-todd commented Jan 9, 2024

The toy examples which benefited from 4 over 5 assumed complex diamond patterns were common, the actual class which resolves them wouldn't be provided, and that one side of the intersection would be untyped. I view this as a stretch, but I think that case can still be supported separately with the type[*bases] form if there's still a need for that.

This is interesting. I know we floated the idea at some point of type[*bases], but I remember there was a syntax issue with this, as there's no way to express an instance of the described class. That said, it could be done with new term like e.g. Inherit, as then you can have type[Inherit[*bases]] for the class and Inherit[*bases] for the instance.

If all the motivating cases can be covered by option 6 and this term perhaps this is worth pursuing as the two PEPs.

@mikeshardmind
Copy link
Collaborator

I think I'd prefer object[*bases] and type[*bases] if we go that route, but that's just me recording a thought on names now before I lose an inspired thought.

If all the motivating cases can be covered by option 6 and this term perhaps this is worth pursuing as the two PEPs.

Kindof why I think it's too early to go to "lets just put a couple peps forward". I don't think we've explored enough options. It didn't take me long to look at where I thought the current options were lacking and the objections raised so far to iterate on option 5 to arrive at option 6. We seem to, myself included, have come into this discussion with preconceived notions of what should happen, but really as long as our rules make sense, serve the purpose we set out to accomplish, and are consistent, we have no obligation to stick with the preconceived notions we have around this. I think it's largely our job here to explore the potentials where there are downsides to see what we can do, if there are things we can't solve with a single construct, and if the path we pick still allows those things to be solved by something else or if it cuts those off.

I'd want to see if other people can iterate on what they feel is missing (if anything) as well. Maybe we reach something that can be presented as a single pep with a unified voice, as well as have a path to supporting the things we can't with just this.

@mark-todd
Copy link
Collaborator Author

I think I'd prefer object[*bases] and type[*bases] if we go that route, but that's just me recording a thought on names now before I lose an inspired thought.

Yeah we should perhaps move this to a different thread, but at least we're in agreement about what behaviour this would have.

Kindof why I think it's too early to go to "lets just put a couple peps forward". I don't think we've explored enough options. It didn't take me long to look at where I thought the current options were lacking and the objections raised so far to iterate on option 5 to arrive at option 6.

I think migrating to a more example based look at things is perhaps a good way to go. I'm not sure we can discount examples for being too "toy" - often reduced examples like this are just difficult to express in practical cases, but that doesn't mean the practical cases won't occur.

I've been working some more on the intersect simulator, my hope is if we can get it to a point that we can run all the examples for each option we can see the pros and cons of each.

We seem to, myself included, have come into this discussion with preconceived notions of what should happen, but really as long as our rules make sense, serve the purpose we set out to accomplish, and are consistent, we have no obligation to stick with the preconceived notions we have around this.

Yeah this is true - I suppose it was my impression that the notions of expected behaviour had reached an impasse, that would be impossible to reconcile because each person prefers a different use case, or perhaps because one solution is more practically satisfying while the other is more theoretically satisfying.

@gvanrossum
Copy link

I don't know if the two PEPs idea is still in the running, and I am just speaking for myself here, but I would advise against this. Making the SC debate this and expecting them to come up with the right answer is unreasonable, given that the group here hasn't been able to get to an agreement -- you might as well flip a coin. Worse, the SC might reject both PEPs. Substitute Typing Council for Steering Council (the SC would likely ask the TC for its opinion anyways) and the same possibilities exist. I strongly advise everyone here to consider what they would rather have -- no intersection or the option that they dislike. And take it from there.

@mark-todd
Copy link
Collaborator Author

I don't know if the two PEPs idea is still in the running, and I am just speaking for myself here, but I would advise against this. Making the SC debate this and expecting them to come up with the right answer is unreasonable, given that the group here hasn't been able to get to an agreement -- you might as well flip a coin. Worse, the SC might reject both PEPs. Substitute Typing Council for Steering Council (the SC would likely ask the TC for its opinion anyways) and the same possibilities exist. I strongly advise everyone here to consider what they would rather have -- no intersection or the option that they dislike. And take it from there.

@gvanrossum For me the option I dislike is definitely my preference over no intersection, but I'm not sure everyone feels the same way. Maybe you're right that the two or three PEPs idea just leads to the Typing Council or Steering Council endlessly debating, I suppose my hope was that in forming the PEP both ideas would gain clarity.

I've been thinking a bit about an alternate approach. As @mikeshardmind has pointed out, it's possible that neither options 4 or 5 present that best interpretation of Any & T. As I was working more on the intersect simulator to add option 6 as well, I discovered that option affects perhaps even a different aspect of the problem to the 4 vs 5 debate. I propose the following:

Stage 1: Identify intersection variations

Perhaps it's worth taking a step back, and instead of thinking about how we can narrow down our available options, go for the opposite. We try and identify as many ways that Any & T could be interpreted as possible. In this stage it doesn't matter if there are pro cases or counter cases to these variations, the point is simply to say "this is another way to interpret the intersection".

Stage 2: Produce pro and counter cases for our favourite combinations of intersection properties.

At this point we try and develop distinct for and against examples in the simulator for each set of properties. These examples can be toy or real - the point of this stage is simply to show that this combination of properties, in this example produces either a useful or not useful result. We can agree on what the result would be because the simulator should produce this result. If the simulator doesn't produce the expected result for the combination of properties, that's a bug in the simulator.

Stage 3: Elimination round

My hope is that by this point there may emerge an option that has few or no counter cases, but if not, I guess we could go tournament style. The point of this stage is to end up with only one option remaining that everyone is satisfied with.

I'd want to see if other people can iterate on what they feel is missing (if anything) as well. Maybe we reach something that can be presented as a single pep with a unified voice, as well as have a path to supporting the things we can't with just this.

Going back to what @mikeshardmind said, I agree with this, but I think it would be useful to try and structure our process a bit more (hence the above). I'm trying to avoid falling into circular debate, and concentrate more on developing clear options with clear examples.

@NeilGirdhar
Copy link
Collaborator

NeilGirdhar commented Jan 10, 2024

Making the SC debate this and expecting them to come up with the right answer is unreasonable, given that the group here hasn't been able to get to an agreement -- you might as well flip a coin. Worse, the SC might reject both PEPs. Substitute Typing Council for Steering Council (the SC would likely ask the TC for its opinion anyways) and the same possibilities exist.

You may be absolutely right, however at least two people from the typing council have been commenting on all of these issues. If we were to write a PEP, I'm pretty sure we would copy and paste their own arguments. Also, from what I've seen, both the steering council and the typing council have made excellent and well-reasoned decisions. I have a tremendous amount of respect and faith in their decision-making process.

I strongly advise everyone here to consider what they would rather have -- no intersection or the option that they dislike. And take it from there.

While I would love to find consensus, it just seems unlikely at this point.

First of all, in the last few comments Mike is saying that " If [proposal 4] were accepted, I'd probably give up on trying to improve typing in python". I think it's fair to say he's not willing to compromise. Compromises I'd like to consider include:

  • adding an optional warning when T & Any has properties accessed in a way that violates proposal 5—this would essentially mean that users would optionally have proposal 5 at the discretion of type checker authors,
  • adding an additional operator SafeIntersection[T, U], and
  • adding additional constructs to declare that certain interfaces can't be broadened.

Second, if you go back to the very start of #1, there remain fundamental misapprehensions that motivate options 1, 3, 5, and 6. Namely,

  • the idea that T & U is narrower than T, and
  • the rejection of the "any-substitution principle".

The first idea seems correct at first, but the issue is that a "type" as far as a type checker is concerned has to encode both the valid subtypes and the valid interface (OP(x)). And therefore unions T | U produce subtypes that are broader than T and U and interfaces that are narrower than T and U; and intersections T & U produce subtypes that are narrower than T and U and interfaces that are broader than T and U. This is the central motivation for all of the alternative options. It is an attempt to have the narrowing behavior for intersection interfaces. But this is literally the opposite of what intersections normally do!

The "any-substitution principle" is that a type expression involving a specified type T should not create any errors when T is replaced with Any. All of the special-casing proposals presented so far break this. And it's a huge problem if you have ever written any annotations for a project. I have many, many type annotations pull requests. I can say from experience that it's a fairly regular occurrence that things get swapped to Any fairly often. And the idea that has been repeatedly suggested (just find concrete types) is not realistic. Swapping to Any should not create any type errors. That will just create a lot of work, and delay code reviews, for what should remain a simple task.

While I recognize that there is room for making things practical rather than pure, I think breaking these kind of basic rules is going to lead to more problems as typing evolves. For example, if higher-kinded types are added, there may be many more synthetic types. And with synthetic types, we may see synthetic intersections deep in synthesized core where there may not be an opportunity to even stifle type errors (or they may produce very convoluted errors).

Finally, there have been some absolutely disrespectful comments, and I don't think that it makes sense to work with people that insist on that kind of behavior.

So, my preference is to simply write a PEP, and if the others want to write one or two more PEPs, I think they should do so and I'm happy to wait a few months so that we can submit them together. If anything, maybe this exercise will produce the clarity that's been missing.

@mark-todd
Copy link
Collaborator Author

mark-todd commented Jan 10, 2024

@NeilGirdhar

the idea that T & U is narrower than T, and

the rejection of the "any-substitution principle".

I think I see what you're getting at with these two points, but I'm not entirely convinced the first one is applicable to Any (I'll invert them for discussion)

  • T & U is wider than T

I see how if T is Any, this becomes Any & U is wider than Any, which doesn't make sense. It's been discussed that Any is the top and bottom type, which I think would mean Any & U is both wider and narrower than Any. I'm not sure I'm convinced this tells us much in the Any case.

  • any-substitution principle

This one I buy much more. If you replace a more specified type with Any, it shouldn't create errors - that would be very strange indeed! But I'm not convinced that option 5 is in violation of this. If T & U -> Any & U, all the properties of T just become Any, while the properties of U remain - I don't see any issues here.

@NeilGirdhar
Copy link
Collaborator

NeilGirdhar commented Jan 10, 2024

I see how if T is Any, this becomes Any & U is wider than Any, which doesn't make sense. It's been discussed that Any is the top and bottom type, which I think would mean Any & U is both wider and narrower than Any. I'm not sure I'm convinced this tells us much in the Any case.

Sorry, I was just speaking loosely to give a feeling for the general principle. For the exact rules, please see Eric's comment (you'll have to search for it). If I remember correctly, (T & Any).x is T.x & Any if T has x, and Any otherwise. The key point being that it's wider than T.x.

But I'm not convinced that option 5 is in violation of this. If T & U -> Any & U, all the properties of T just become Any, while the properties of U remain - I don't see any issues here.

The problem is that if T and U contain x, then in the first case you have (T & U).x = T.x & U.x whereas in the second case you have—according to option 5—(T & Any).x = T.x. If U.x is:

  • Any
  • a function with more overloads than T.x, etc.

then, the substitution causes type errors.

If I end up writing up a PEP, I'll add this principle as a general principle so that future typing proposals can make use of it.

@mark-todd
Copy link
Collaborator Author

Sorry, I was just speaking loosely to give a feeling for the general principle. For the exact rules, please see Eric's comment (you'll have to search for it). If I remember correctly, (T & Any).x is T.x & Any if T has x, and Any otherwise. The key point being that it's wider than T.x.

I think I found it again: #1 (comment)
So these rules come out of the symmetry with union? I follow your reasoning here, but I think there's still an open question about what T.x & Any might be interpreted as. You're right that this isn't what option 5 subscribes to but it might be there exists another interpretation of how this type gets used.

The problem is that if T and U contain x, then in the first case you have (T & U).x = T.x & U.x whereas in the second case you have—according to option 5—(T & Any).x = T.x.

Point taken on the substitution part as well. If we want to adhere to these principles option 5 would not work.

Let's assume for a moment that we go with (T & Any).x = T.x & Any (currently as presented by option 4 I believe). What does the type checker do with T.x & Any? I'm not sure I have an intuition about how this type would behave, which suggests that maybe there's some wiggle room here. Is it possible there exists an interpretation of this that has all the practicality of option 5?

@mark-todd
Copy link
Collaborator Author

mark-todd commented Jan 10, 2024

To delve into the more practical, I'll put this thought into an example:

class T:
    x : int

inst: T & Any = T()

z = inst.x # So this is now type int & Any
z += 1 # Definately valid
z = z + 'test' # Now here we get the controversial bit. 

My hunch is that (by some means) the final line should be invalid, but here's the problem. If we say it's valid, how is this any different from z just being Any? Similarly if we say it's invalid, how is this any different from just saying it's type int? For int & Any to be a type in it's own right, surely it must behave differently to both Any and int?

@NeilGirdhar
Copy link
Collaborator

Yes, that's the comment I had in mind.

z = inst.x # So this is now type int & Any

Yes, I think so.

z = z + 'test' # Now here we get the controversial bit.
Right, that's technically okay since you could do

class T:
    x : int
class C(int):
    def __add__(self, other: Any) -> C: ...
class U(T):
    x: C
inst: T & Any = U()
z = inst.x # So this is now type int & Any
z += 1 # Definately valid
z = z + 'test' # No error. 

how is this any different from z just being Any?

If I understand correctly, you can't do anything with int & Any. For example, z + 'test' must return an int or subtype thereof.

For int & Any to be a type in it's own right, surely it must behave differently to both Any and int?

I think it follows the rules that Eric gave recursively.

@mark-todd
Copy link
Collaborator Author

mark-todd commented Jan 10, 2024

If I understand correctly, you can't do anything with int & Any. For example, z + 'test' must return an int or subtype thereof.

If it must return an int, how is it any different from an int?

I think it follows the rules that Eric gave recursively.

I've had a follow up thought - Any | T currently I believe behaves exactly the same as Any, so maybe if Any & T behaves exactly the same as Any in practice that's not the end of the world either.

EDIT: Tested and Any | T does not behave the same - attributes must be present on all objects in the union.

@mark-todd
Copy link
Collaborator Author

(A & B).x:
Invalid if x is missing from both A and B
A.x if x is present on A but not B
B.x if x is present on B but not A
(A.x & B.x) if x is present on A and B

Eric's rules here are very compelling. For cases outside of Any, I think these aren't even really controversial. Replacing A with Any, is not much of a jump, and that leads us to Any & B.x. On the flipside though I do see how this is frustrating, because it's kind of catering for a case that rarely comes up.

I'm also drawn back to my previous post: #31 (comment)

In the case where T has another type for foo, this becomes a problem. But it has been noted this would create an LSP violation. If we assume that A and B must have the same type for property x these rules can become a little bit different.

(A & B).x:
Invalid if x is missing from both A and B
A.x/B.x if x is present on A or on B (inclusive or)
Invalid if x is present on A and on B and the types are different.

Are all the cases where option 5 fails the any-substitution principle LSP violations?

@mark-todd
Copy link
Collaborator Author

mark-todd commented Jan 10, 2024

Ok slightly radical idea: Intersections between two classes where the types are different on two of their attributes & methods are banned on access (as in option 6, but expanded to methods). I appreciate this might go further than LSP but it has some advantages:

  • mixing two classes together always produces the intersection (may not be a requirement, but it's handy!)
  • (Any & T).x must condense to T.x because for whatever Any is for it to be valid in an intersection it must subscribe to this rule. If x is not present on T, the type must be Any.
  • Callable intersections go away (no longer valid)
  • By banning on access we allow classes to be merged that have attributes that clash, so long as they aren't used
  • It works with the any-substitution principle
  • It has a symmetry with union, based on the following set of rules:
(A & B).x:
Invalid if x is missing from both A and B
A.x/B.x if x is present on A or on B (inclusive or)
Invalid if x is present on A and on B and the types are different.

The only real disadvantage I see is we lose the utility of combining methods of the same name into being overloads, but in comparison to all of the advantages this feels like a small loss. Thoughts?

PS: I may have mentioned this before, but if so I wasn't aware of all it's advantages.

EDIT: I also vaguely remember when we discussed something similar before, it was concluded that would take too much runtime cost to evaluate both classes and determine if they're compatible. Now it's on access however, that means we just have to compare the relevant attribute.

@erictraut
Copy link
Collaborator

Intersections between two classes where the types are different on two of their attributes & methods are banned on access

That's an interesting idea. That restriction makes many of the problems fall away.

Question to others on the thread: Does that restriction eliminate any important use cases that you've identified for intersections? It would preclude my BookMovie example above, but that was a toy example intended to demonstrate an inconsistency, not a real-world example that I think is important to support.

@DiscordLiz said "I would not want to use an intersection with non-disjoint types. It generally doesn't make sense to.". This proposal effectively takes advantage of this observation and codifies it into the rules.

The only real disadvantage I see is we lose the utility of combining methods of the same name into being overloads

I see this as more of an advantage than a disadvantage. Combining methods of the same name into overloads is problematic because overloads are ordered and intersections are not. Overloads also have a bunch of other complicated rules associated with them that don't apply to intersections. This proposal eliminates this problem entirely.

One issue that would need to be tightened up is the specification of what it means for two types to be "different". We've talked in the past about "bidirectionally compatible types" (type A is compatible with B and vice versa), but I'm not sure what it means for two types to be "different". Any is bidirectionally compatible with int; are these two types "different"? Is Literal[True, False] "different from" bool? What about two structural types with different names but the same definition? Would it make sense to replace "types are different" with "types are not bidirectionally compatible"?

Another issue to consider is related to method access and binding. If classes A and B both define def method(self) -> Self, how would (A & B).method work? Instance and class methods are bound and partially specialized at access time, so the type of A().method is () -> A and the type of B().method is () -> B. Does that mean the two are "different types" according to the proposed rule above? Or perhaps binding & partial specialization should be done with the intersection type, which would produce () -> A & B in both cases. (Note: Method binding and partial specialization is currently under-specified in the typing spec. It's one of the areas I hope to firm up in the upcoming months.)

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Jan 10, 2024

@NeilGirdhar

While I would love to find consensus, it just seems unlikely at this point.

First of all, in the last few comments Mike is saying that " If [proposal 4] were accepted, I'd probably give up on trying to improve typing in python". I think it's fair to say he's not willing to compromise. Compromises I'd like to consider include:

adding an optional warning when T & Any has properties accessed in a way that violates proposal 5—this would essentially mean that users would optionally have proposal 5 at the discretion of type checker authors,
adding an additional operator SafeIntersection[T, U], and
adding additional constructs to declare that certain interfaces can't be broadened.

Some of those compromises are things I've already said I'm fine with (Well, not entirely as stated, but close), but re-raised a question about something where someone else said it would be a problem or other issues with those compromises. See here

I'm very much still wanting to work towards something that can work for everyone, but in unequivocal terms, if option 4 is accepted as-is, I'm personally done with python typing. There are enough issues with it for my real world use that this would kill typing usability for me long term as this started to be used in that fashion, and I've already explained how those would manifest.

The fact that I am opposed to one option, but still working towards finding others that meet multiple groups needs is not an unwillingness to find compromise at all, but an unwillingness to accept a solution which has major flaws for my own use when there is more to explore.

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Jan 10, 2024

@mark-todd

Ok slightly radical idea: Intersections between two classes where the types are different on two of their attributes & methods are banned on access (as in option 6, but expanded to methods). I appreciate this might go further than LSP but it has some advantages:

I was viewing option 6 as also banning methods, properties, and functions with diferring types on access as well when I wrote it. The primary reason for allowing overlap at all is the unavoidable overlaps of things inherited from object, but the rule "just working" without special casing those things was good.

@erictraut

One issue that would need to be tightened up is the specification of what it means for two types to be "different". We've talked in the past about "bidirectionally compatible types" (type A is compatible with B and vice versa), but I'm not sure what it means for two types to be "different". Any is bidirectionally compatible with int; are these two types "different"? Is Literal[True, False] "different from" bool? What about two structural types with different names but the same definition? Would it make sense to replace "types are different" with "types are not bidirectionally compatible"?

bidierectionally compatible is almost fine for what I was considering when proposing that (See below, because I think we do need to be careful in wording this), I can come up with more formal definitions as needed if this approach is viable for others here.

Another issue to consider is related to method access and binding. If classes A and B both define def method(self) -> Self, how would (A & B).method work? Instance and class methods are bound and partially specialized at access time, so the type of A().method is () -> A and the type of B().method is () -> B. Does that mean the two are "different types" according to the proposed rule above? Or perhaps binding & partial specialization should be done with the intersection type, which would produce () -> A & B in both cases. (Note: Method binding and partial specialization is currently under-specified in the typing spec. It's one of the areas I hope to firm up in the upcoming months.)

methods should be fine here with how I was considering the option when I proposed it, the type of (A & B).method would bind to (A & B), We can use type[T] / Self here to improve reasoning about this when considering unbound methods prior to binding. This may sneak a small bit of special casing into implementations that don't understand method binding or descriptors though, and is definitely worth making sure we spell out to avoid diverging implementations.

@DiscordLiz
Copy link
Collaborator

DiscordLiz commented Jan 10, 2024

@mark-todd

Callable intersections go away (no longer valid)

They only go away partially. Please do not ban callables from intersections entirely, we have a stated use case in this repo that still works with this option.

Callable[[int], T] & type[T] where type[T] is callable with a single int plugs a type hole that exists to make subclassing easier, but LSP unsafe with __init__

@DiscordLiz
Copy link
Collaborator

@NeilGirdhar

Finally, there have been some absolutely disrespectful comments, and I don't think that it makes sense to work with people that insist on that kind of behavior.

You just accused another collaborator of not being willing to compromise in the same post when they are here still working on a compromise. You are continuing to not read or otherwise ignore other people's contributions to this thread, and even so, I'd rather you keep providing your input here if you have something valuable to say. I don't think competing PEPs are the answer. We can make the PEP stronger by finding what issues people have with it, and that includes any issues you have with it.

@DiscordLiz
Copy link
Collaborator

@erictraut

Question to others on the thread: Does that restriction eliminate any important use cases that you've identified for intersections? It would preclude my BookMovie example #31 (comment), but that was a toy example intended to demonstrate an inconsistency, not a real-world example that I think is important to support.

Sort of? I said an intersection would only make sense to me when I care about disjoint types, and that's true, but there are cases for extending base functionality beyond what this would support. However, the thing that @mark-todd and @mikeshardmind went back and forth on in discord and here with type and object or Inherit allowing multiple type parameters in an ordered fashion can address this case without intersections needing to be the tool for this, so I think it's okay to directly state this in a "why this behavior" wasn't included section for the pep since we have a potential path for that behavior in a different mechanism that can be worked on in the future without intersections blocking it.

@mark-todd
Copy link
Collaborator Author

@erictraut

I see this as more of an advantage than a disadvantage. Combining methods of the same name into overloads is problematic because overloads are ordered and intersections are not. Overloads also have a bunch of other complicated rules associated with them that don't apply to intersections. This proposal eliminates this problem entirely.

That's a good point - I hadn't considered it, but you're right it saves us from that issue.

One issue that would need to be tightened up is the specification of what it means for two types to be "different". We've talked in the past about "bidirectionally compatible types" (type A is compatible with B and vice versa), but I'm not sure what it means for two types to be "different". Any is bidirectionally compatible with int; are these two types "different"? Is Literal[True, False] "different from" bool? What about two structural types with different names but the same definition? Would it make sense to replace "types are different" with "types are not bidirectionally compatible"?

One property of the above description is I was trying to make A.x and B.x effectively interchangeable. If we consider Literal[True, False] to be identical to bool, it might matter which from A and B gets chosen.

Another issue to consider is related to method access and binding. If classes A and B both define def method(self) -> Self, how would (A & B).method work? Instance and class methods are bound and partially specialized at access time, so the type of A().method is () -> A and the type of B().method is () -> B. Does that mean the two are "different types" according to the proposed rule above? Or perhaps binding & partial specialization should be done with the intersection type, which would produce () -> A & B in both cases. (Note: Method binding and partial specialization is currently under-specified in the typing spec. It's one of the areas I hope to firm up in the upcoming months.)

Interesting edge case! I think these would be compatible, returning the new type (A & B) as Self, like you describe. If we found down the road though that this has to be banned I don't think that's the end of the world.

@mikeshardmind

I was viewing option 6 as also banning methods, properties, and functions with deferring types on access as well when I wrote it. The primary reason for allowing overlap at all is the unavoidable overlaps of things inherited from object, but the rule "just working" without special casing those things was good.

Oh ok, great! Apologies, I didn't realise this

@DiscordLiz

They only go away partially. Please do not ban callables from intersections entirely, we have a stated use case in this repo that still works with this option.

Yeah I was thinking about the case of combining two Callables - I see that so long as the other item in the intersection doesn't have a call method that should be find.

Sort of? I said an intersection would only make sense to me when I care about disjoint types, and that's true, but there are cases for extending base functionality beyond what this would support. However, the thing that @mark-todd and @mikeshardmind went back and forth on in discord and here with type and object or Inherit allowing multiple type parameters in an ordered fashion can address this case without intersections needing to be the tool for this, so I think it's okay to directly state this in a "why this behavior" wasn't included section for the pep since we have a potential path for that behavior in a different mechanism that can be worked on in the future without intersections blocking it.

Yeah I think separating that behaviour is a good plan - there's certainly a use case for it but I think it can go into a different PEP.

@CarliJoy
Copy link
Owner

The Issue got too long again. Please continue discussing in #37.

Please open new Issues if you start discussing on new sub aspects.

Repository owner locked as too heated and limited conversation to collaborators Jan 11, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants