Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not add the same interface multiple times to the MRO, round 2 #1

Closed
wants to merge 1 commit into from

Conversation

DMRobertson
Copy link
Owner

@DMRobertson DMRobertson commented Sep 27, 2022

For your consideration: a patch which I believe fixes Shoobx#76. This tackles the same kind of problem that Shoobx#41 addresses. Seems to be a regression introduced in Shoobx#75 as part of supporting mypy 0.970.

This turned out to be a bit of a rabbit hole. I've tried to thoroughly reproduce my working here because I really am not confident in my understanding of mypy and mypy-zope. Apologies for the verbosity! (I was very keen to nail this one down.)

Reproduction

I reproduced the problem as reported in Shoobx#76:

# Setup: in a blank venv
pip install -e path/to/mypy/zope mypy==0.981 twisted[tls]==22.4.0

cat >good.py <<EOF
from twisted.web.client import HTTPClientFactory
from twisted.web.client import HTTPPageGetter

class Y(HTTPPageGetter):
    pass
EOF

cat >bad.py <<EOF
from twisted.web.client import HTTPClientFactory
from twisted.web.client import HTTPPageGetter

class Y(HTTPPageGetter, object):
    pass
EOF

# Test:
rm -r .mypy_cache/3.10/twisted/

mypy good.py
# Success: no issues found in 1 source file

mypy bad.py
# bad.py:4: error: Cannot determine consistent method resolution order (MRO) for "Y"
# Found 1 error in 1 file (checked 1 source file)

Investigation

To attempt to understand what was going on, I added a bunch of debug print statements to mypy and mypy-zope until I reached enlightenment.
I don't fully grok mypy's machinery, but from what I can tell:

  • During semantic analysis, when mypy encounters a class with multiple base classes, it computes the child class's MRO. 1, 2
  • In order to do so, it calls a function named linearize_hierarchy.

When I run mypy bad.py I see that linear_hierarchy is called for HTTPPageGetter with the following TypeInfo:

TypeInfo(
  Name(twisted.web.client.HTTPPageGetter)
  Bases(twisted.web.http.HTTPClient)
  Mro(twisted.web.client.HTTPPageGetter, twisted.web.http.HTTPClient, twisted.protocols.basic.LineReceiver, twisted.internet.protocol.Protocol, twisted.internet.protocol.BaseProtocol, twisted.protocols.basic._PauseableMixin, builtins.object, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext)
  Names(
    _completelyDone (builtins.bool)
    _specialHeaders (builtins.set[builtins.bytes])
    connectionLost
    connectionMade
    failed (builtins.int)
    followRedirect (builtins.bool)
    handleEndHeaders
    handleHeader
    handleResponse
    handleStatus
    handleStatusDefault
    handleStatus_200
    handleStatus_201 (def (self: Any) -> Any)
    handleStatus_202 (def (self: Any) -> Any)
    handleStatus_301
    handleStatus_302
    handleStatus_303
    headers (Any)
    message (Any)
    quietLoss (builtins.int)
    status (Any)
    timeout
    version (Any)))

In particular the MRO contains two copies of IProtocol and ILoggingContext, which doesn't seem right (c.f. Shoobx#41). As I understand it, mypy bad.py is loading the TypeInfo for HttpPageGetter that we persisted at the end of mypy good.py. So what was going on when we ran mypy the first time?

I will try to summarise a shedload of debug information. At some point mypy computes the MRO for twisted.internet.protocol.Protocol. The argument to linearize_hierarchy is

TypeInfo(
    Name(twisted.internet.protocol.Protocol)
    Bases(twisted.internet.protocol.BaseProtocol)
    Mro()
    Names())

Later, mypy computes the MRO for a subclass twisted.protocols.policies.ProtocolWrapper. Linearize hierarchy receives

TypeInfo(
    Name(twisted.protocols.policies.ProtocolWrapper)
    Bases(twisted.internet.protocol.Protocol)
    Mro()
    Names())

and recursively calls itself with the twisted.internet.protocol.Protocol TypeInfo. This time the MRO and Names have already been computed!


TypeInfo(
    Name(twisted.internet.protocol.Protocol)
    Bases(twisted.internet.protocol.BaseProtocol)
    Mro(twisted.internet.protocol.Protocol, twisted.internet.protocol.BaseProtocol, builtins.object, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext)
    Names(
      connectionLost
      dataReceived
      factory (Union[twisted.internet.protocol.Factory, None])
      logPrefix))

The MRO looks sensible there.

By the time mypy comes to compute the MRO for twisted.spread.banana.Banana (yeah, no idea there) we again lookup the MRO for twisted.internet.protocol.Protocol. But the MRO now has duplicate interfaces.

TypeInfo(
    Name(twisted.spread.banana.Banana)
    Bases(twisted.internet.protocol.Protocol, twisted.persisted.styles.Ephemeral)
    Mro()
    Names())
TypeInfo(
   Name(twisted.internet.protocol.Protocol)
    Bases(twisted.internet.protocol.BaseProtocol)
    Mro(twisted.internet.protocol.Protocol, twisted.internet.protocol.BaseProtocol, builtins.object, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext)

Explanation

The following dirty patch adds useful logging.

Dirty patch
diff --git a/src/mypy_zope/plugin.py b/src/mypy_zope/plugin.py
index 312c0b5..17fa127 100644
--- a/src/mypy_zope/plugin.py
+++ b/src/mypy_zope/plugin.py
@@ -364,7 +364,7 @@ def analyze_subinterface(classdef_ctx: ClassDefContext) -> None:
     def get_customize_class_mro_hook(
         self, fullname: str
     ) -> Optional[Callable[[ClassDefContext], None]]:
-        # print(f"get_customize_class_mro_hook: {fullname}")
+        print(f"get_customize_class_mro_hook: {fullname}")
 
         def analyze_interface_base(classdef_ctx: ClassDefContext) -> None:
             # Create fake constructor to mimic adaptation signature
@@ -406,6 +406,9 @@ def analyze(classdef_ctx: ClassDefContext) -> None:
             if directiface or subinterface:
                 self._analyze_zope_interface(classdef_ctx.api, classdef_ctx.cls)
 
+            self.log(f"get_customize_class_mro_hook ending for {fullname}")
+            self.log(info.mro)
+
         if fullname == "zope.interface.interface.Interface":
             return analyze_interface_base
 
@@ -695,10 +698,19 @@ def _apply_interface(self, impl: TypeInfo, iface: TypeInfo) -> None:
         # there is a decorator for the class that will create a "type promotion",
         # but ensure this only gets applied a single time per interface.
         promote = Instance(iface, [])
+        if impl.fullname == "twisted.internet.protocol.Protocol":
+            for ti in impl.mro:
+                self.log(f"DMR HMM: {ti._promote}, {promote}, {ti.fullname}")
+                self.log(f"DMR HMM: {type(ti._promote)}, {type(promote)}, {ti.fullname}")
         if not any(ti._promote == promote for ti in impl.mro):
             faketi = TypeInfo(SymbolTable(), iface.defn, iface.module_name)
             faketi._promote = [promote]
+            if impl.fullname=="twisted.internet.protocol.Protocol":
+                self.log(f"============== APPENDING {iface.defn.fullname} =====")
             impl.mro.append(faketi)
+        self.log(f"mro of {impl.fullname}@{hex(id(impl))} ({hex(id(impl.mro))}) is now:")
+        self.log(impl.mro)
+
 
 
 def plugin(version: str) -> PyType[Plugin]:

We can then see where mypy-zope is adding the duplicate interfaces.

  ZOPE: get_customize_class_mro_hook ending for twisted.internet.protocol.BaseProtocol
  ZOPE: [<TypeInfo twisted.internet.protocol.BaseProtocol>, <TypeInfo builtins.object>]
  ZOPE: get_customize_class_mro_hook ending for twisted.internet.protocol.Protocol
  ZOPE: [<TypeInfo twisted.internet.protocol.Protocol>, <TypeInfo twisted.internet.protocol.BaseProtocol>, <TypeInfo builtins.object>, <TypeInfo twisted.internet.interfaces.IProtocol>, <TypeInfo twisted.internet.interfaces.ILoggingContext>]
  ZOPE: Found implementation of twisted.internet.interfaces.IProtocol: twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.IProtocol, twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.IProtocol, twisted.internet.protocol.BaseProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.protocol.BaseProtocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.IProtocol, builtins.object
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, builtins.object
* ZOPE: DMR HMM: [twisted.internet.interfaces.IProtocol], twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: [twisted.internet.interfaces.ILoggingContext], twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.ILoggingContext
  ZOPE: ============== APPENDING twisted.internet.interfaces.IProtocol =====
  ZOPE: mro of twisted.internet.protocol.Protocol@0x7f7a62f0a6c0 (0x7f7a66731640) is now:
  ZOPE: [<TypeInfo twisted.internet.protocol.Protocol>, <TypeInfo twisted.internet.protocol.BaseProtocol>, <TypeInfo builtins.object>, <TypeInfo twisted.internet.interfaces.IProtocol>, <TypeInfo twisted.internet.interfaces.ILoggingContext>, <TypeInfo twisted.internet.interfaces.IProt  ocol>]
  ZOPE: Found implementation of twisted.internet.interfaces.ILoggingContext: twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.ILoggingContext, twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.ILoggingContext, twisted.internet.protocol.BaseProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.protocol.BaseProtocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.ILoggingContext, builtins.object
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, builtins.object
* ZOPE: DMR HMM: [twisted.internet.interfaces.IProtocol], twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: [twisted.internet.interfaces.ILoggingContext], twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.ILoggingContext
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.ILoggingContext
* ZOPE: DMR HMM: [twisted.internet.interfaces.IProtocol], twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.IProtocol
  ZOPE: ============== APPENDING twisted.internet.interfaces.ILoggingContext =====

In particular, notice the line:

 ZOPE: DMR HMM: [twisted.internet.interfaces.IProtocol], twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.IProtocol

(which comes from self.log(f"DMR HMM: {ti._promote}, {promote}, {ti.fullname}") in the patch). The log lines shows we're comparing ti._promote = [Instance(IProtocol)] with promote Instance(IProtocol). These two things are not equal1 therefore the any(...) expression here

https://github.com/Shoobx/mypy-zope/blob/5139181a07febfde688af3035d00319942df4dcf/src/mypy_zope/plugin.py#L698

is always False.2 So we add a duplicate IProtocol to the end of the MRO.

A similar pattern proceeds for ILoggingContext.

Fix

Check if promote in ti._promote for any ti in impl.mro, now that ti._promote is a list.

I didn't notice a huge performance cost (rm -r .mypy_cache/3.10/twisted/; mypy good.py; mypy bad.py takes 2 seconds on my machine with the patch). I haven' tried this on a bigger project like matrix-org/synapse yet. (I'm happy to, but I doubt that anyone is implementing that many interfaces for this to be a problem.)

Regression

I think this was introduced in #75. Presumably mypy made _promote a list of types rather than just a single type?

Tests

Works For Me ™️ after applying the change:

 $ rm -r .mypy_cache/3.10/twisted/; mypy good.py; mypy bad.py
Success: no issues found in 1 source file
Success: no issues found in 1 source file

I wasn't sure how to add a test for this (apologies). I was uncertain about the following:

  • We need to run mypy on two files in a certain order. The current tests look to run each sample file in isolation(?)
  • I struggled to reproduce an example without pulling in Twisted as a dependency
  • We can't directly get at the MRO from mypy's output AFAIK. (No reveal_mro like there is reveal_type?)

Footnotes

  1. I even checked that Instance doesn't define a weird __req__.

  2. I'm slightly surprised that mypy --strict-equality doesn't catch this!

@DMRobertson DMRobertson changed the title Fix false-positives appearing when cache is in use Do not add the same interface multiple times to the MRO, round 2 Sep 27, 2022
@DMRobertson
Copy link
Owner Author

DMRobertson commented Sep 28, 2022

(I was using this as a draft to type out my Markdown epic.) See Shoobx#80

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant