Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not add the same interface multiple times to the MRO, round 2 #80

Merged
merged 2 commits into from Sep 30, 2022

Conversation

DMRobertson
Copy link
Contributor

@DMRobertson DMRobertson commented Sep 28, 2022

For your consideration: a patch which I believe fixes #76. This tackles the same kind of problem that #41 addresses. Seems to be a regression introduced in #75 as part of supporting mypy 0.970.

This turned out to be a bit of a rabbit hole. I've tried to thoroughly reproduce my working here because I really am not confident in my understanding of mypy and mypy-zope. Apologies for the verbosity! (I was very keen to nail this one down.)

Reproduction

I reproduced the problem as reported in #76:

# Setup: in a blank venv
pip install -e path/to/mypy/zope mypy==0.981 twisted[tls]==22.4.0

cat >good.py <<EOF
from twisted.web.client import HTTPClientFactory
from twisted.web.client import HTTPPageGetter

class Y(HTTPPageGetter):
    pass
EOF

cat >bad.py <<EOF
from twisted.web.client import HTTPClientFactory
from twisted.web.client import HTTPPageGetter

class Y(HTTPPageGetter, object):
    pass
EOF

# Test:
rm -r .mypy_cache/3.10/twisted/

mypy good.py
# Success: no issues found in 1 source file

mypy bad.py
# bad.py:4: error: Cannot determine consistent method resolution order (MRO) for "Y"
# Found 1 error in 1 file (checked 1 source file)

Investigation

To attempt to understand what was going on, I added a bunch of debug print statements to mypy and mypy-zope until I reached enlightenment.
I don't fully grok mypy's machinery, but from what I can tell:

  • During semantic analysis, when mypy encounters a class with multiple base classes, it computes the child class's MRO. 1, 2
  • In order to do so, it calls a function named linearize_hierarchy.

When I run mypy bad.py I see that linear_hierarchy is called for HTTPPageGetter with the following TypeInfo:

TypeInfo(
  Name(twisted.web.client.HTTPPageGetter)
  Bases(twisted.web.http.HTTPClient)
  Mro(twisted.web.client.HTTPPageGetter, twisted.web.http.HTTPClient, twisted.protocols.basic.LineReceiver, twisted.internet.protocol.Protocol, twisted.internet.protocol.BaseProtocol, twisted.protocols.basic._PauseableMixin, builtins.object, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext)
  Names(
    _completelyDone (builtins.bool)
    _specialHeaders (builtins.set[builtins.bytes])
    connectionLost
    connectionMade
    failed (builtins.int)
    followRedirect (builtins.bool)
    handleEndHeaders
    handleHeader
    handleResponse
    handleStatus
    handleStatusDefault
    handleStatus_200
    handleStatus_201 (def (self: Any) -> Any)
    handleStatus_202 (def (self: Any) -> Any)
    handleStatus_301
    handleStatus_302
    handleStatus_303
    headers (Any)
    message (Any)
    quietLoss (builtins.int)
    status (Any)
    timeout
    version (Any)))

In particular the MRO contains two copies of IProtocol and ILoggingContext, which doesn't seem right (c.f. #41). As I understand it, mypy bad.py is loading the TypeInfo for HttpPageGetter that we persisted at the end of mypy good.py. So what was going on when we ran mypy the first time?

I will try to summarise a shedload of debug information. At some point mypy computes the MRO for twisted.internet.protocol.Protocol. The argument to linearize_hierarchy is

TypeInfo(
    Name(twisted.internet.protocol.Protocol)
    Bases(twisted.internet.protocol.BaseProtocol)
    Mro()
    Names())

Later, mypy computes the MRO for a subclass twisted.protocols.policies.ProtocolWrapper. Linearize hierarchy receives

TypeInfo(
    Name(twisted.protocols.policies.ProtocolWrapper)
    Bases(twisted.internet.protocol.Protocol)
    Mro()
    Names())

and recursively calls itself with the twisted.internet.protocol.Protocol TypeInfo. This time the MRO and Names have already been computed!


TypeInfo(
    Name(twisted.internet.protocol.Protocol)
    Bases(twisted.internet.protocol.BaseProtocol)
    Mro(twisted.internet.protocol.Protocol, twisted.internet.protocol.BaseProtocol, builtins.object, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext)
    Names(
      connectionLost
      dataReceived
      factory (Union[twisted.internet.protocol.Factory, None])
      logPrefix))

The MRO looks sensible there.

By the time mypy comes to compute the MRO for twisted.spread.banana.Banana (yeah, no idea there) we again lookup the MRO for twisted.internet.protocol.Protocol. But the MRO now has duplicate interfaces.

TypeInfo(
    Name(twisted.spread.banana.Banana)
    Bases(twisted.internet.protocol.Protocol, twisted.persisted.styles.Ephemeral)
    Mro()
    Names())
TypeInfo(
   Name(twisted.internet.protocol.Protocol)
    Bases(twisted.internet.protocol.BaseProtocol)
    Mro(twisted.internet.protocol.Protocol, twisted.internet.protocol.BaseProtocol, builtins.object, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext)

Explanation

The following dirty patch adds useful logging.

Dirty patch
diff --git a/src/mypy_zope/plugin.py b/src/mypy_zope/plugin.py
index 312c0b5..17fa127 100644
--- a/src/mypy_zope/plugin.py
+++ b/src/mypy_zope/plugin.py
@@ -364,7 +364,7 @@ def analyze_subinterface(classdef_ctx: ClassDefContext) -> None:
     def get_customize_class_mro_hook(
         self, fullname: str
     ) -> Optional[Callable[[ClassDefContext], None]]:
-        # print(f"get_customize_class_mro_hook: {fullname}")
+        print(f"get_customize_class_mro_hook: {fullname}")
 
         def analyze_interface_base(classdef_ctx: ClassDefContext) -> None:
             # Create fake constructor to mimic adaptation signature
@@ -406,6 +406,9 @@ def analyze(classdef_ctx: ClassDefContext) -> None:
             if directiface or subinterface:
                 self._analyze_zope_interface(classdef_ctx.api, classdef_ctx.cls)
 
+            self.log(f"get_customize_class_mro_hook ending for {fullname}")
+            self.log(info.mro)
+
         if fullname == "zope.interface.interface.Interface":
             return analyze_interface_base
 
@@ -695,10 +698,19 @@ def _apply_interface(self, impl: TypeInfo, iface: TypeInfo) -> None:
         # there is a decorator for the class that will create a "type promotion",
         # but ensure this only gets applied a single time per interface.
         promote = Instance(iface, [])
+        if impl.fullname == "twisted.internet.protocol.Protocol":
+            for ti in impl.mro:
+                self.log(f"DMR HMM: {ti._promote}, {promote}, {ti.fullname}")
+                self.log(f"DMR HMM: {type(ti._promote)}, {type(promote)}, {ti.fullname}")
         if not any(ti._promote == promote for ti in impl.mro):
             faketi = TypeInfo(SymbolTable(), iface.defn, iface.module_name)
             faketi._promote = [promote]
+            if impl.fullname=="twisted.internet.protocol.Protocol":
+                self.log(f"============== APPENDING {iface.defn.fullname} =====")
             impl.mro.append(faketi)
+        self.log(f"mro of {impl.fullname}@{hex(id(impl))} ({hex(id(impl.mro))}) is now:")
+        self.log(impl.mro)
+
 
 
 def plugin(version: str) -> PyType[Plugin]:

We can then see where mypy-zope is adding the duplicate interfaces.

  ZOPE: get_customize_class_mro_hook ending for twisted.internet.protocol.BaseProtocol
  ZOPE: [<TypeInfo twisted.internet.protocol.BaseProtocol>, <TypeInfo builtins.object>]
  ZOPE: get_customize_class_mro_hook ending for twisted.internet.protocol.Protocol
  ZOPE: [<TypeInfo twisted.internet.protocol.Protocol>, <TypeInfo twisted.internet.protocol.BaseProtocol>, <TypeInfo builtins.object>, <TypeInfo twisted.internet.interfaces.IProtocol>, <TypeInfo twisted.internet.interfaces.ILoggingContext>]
  ZOPE: Found implementation of twisted.internet.interfaces.IProtocol: twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.IProtocol, twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.IProtocol, twisted.internet.protocol.BaseProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.protocol.BaseProtocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.IProtocol, builtins.object
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, builtins.object
* ZOPE: DMR HMM: [twisted.internet.interfaces.IProtocol], twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: [twisted.internet.interfaces.ILoggingContext], twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.ILoggingContext
  ZOPE: ============== APPENDING twisted.internet.interfaces.IProtocol =====
  ZOPE: mro of twisted.internet.protocol.Protocol@0x7f7a62f0a6c0 (0x7f7a66731640) is now:
  ZOPE: [<TypeInfo twisted.internet.protocol.Protocol>, <TypeInfo twisted.internet.protocol.BaseProtocol>, <TypeInfo builtins.object>, <TypeInfo twisted.internet.interfaces.IProtocol>, <TypeInfo twisted.internet.interfaces.ILoggingContext>, <TypeInfo twisted.internet.interfaces.IProt  ocol>]
  ZOPE: Found implementation of twisted.internet.interfaces.ILoggingContext: twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.ILoggingContext, twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.ILoggingContext, twisted.internet.protocol.BaseProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.protocol.BaseProtocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.ILoggingContext, builtins.object
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, builtins.object
* ZOPE: DMR HMM: [twisted.internet.interfaces.IProtocol], twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: [twisted.internet.interfaces.ILoggingContext], twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.ILoggingContext
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.ILoggingContext
* ZOPE: DMR HMM: [twisted.internet.interfaces.IProtocol], twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.IProtocol
  ZOPE: ============== APPENDING twisted.internet.interfaces.ILoggingContext =====

In particular, notice the line:

 ZOPE: DMR HMM: [twisted.internet.interfaces.IProtocol], twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.IProtocol

(which comes from self.log(f"DMR HMM: {ti._promote}, {promote}, {ti.fullname}") in the patch). The log lines shows we're comparing ti._promote = [Instance(IProtocol)] with promote Instance(IProtocol). These two things are not equal1 therefore the any(...) expression here

if not any(ti._promote == promote for ti in impl.mro):

is always False.2 So we add a duplicate IProtocol to the end of the MRO.

A similar pattern proceeds for ILoggingContext.

Fix

Check if promote in ti._promote for any ti in impl.mro, now that ti._promote is a list.

I didn't notice a huge performance cost (rm -r .mypy_cache/3.10/twisted/; mypy good.py; mypy bad.py takes 2 seconds on my machine with the patch). I haven' tried this on a bigger project like matrix-org/synapse yet. (I'm happy to, but I doubt that anyone is implementing that many interfaces for this to be a problem.)

Regression

I think this was introduced in #75. Presumably mypy made _promote a list of types rather than just a single type?

Tests

Works For Me ™️ after applying the change:

 $ rm -r .mypy_cache/3.10/twisted/; mypy good.py; mypy bad.py
Success: no issues found in 1 source file
Success: no issues found in 1 source file

make tests also passes on my machine.

I wasn't sure how to add a test for this (apologies). I was uncertain about the following:

  • We need to run mypy on two files in a certain order. The current tests look to run each sample file in isolation(?)
  • I struggled to reproduce an example without pulling in Twisted as a dependency
  • We can't directly get at the MRO from mypy's output AFAIK. (No reveal_mro like there is reveal_type?)

Footnotes

  1. I even checked that Instance doesn't define a weird __req__.

  2. I'm slightly surprised that mypy --strict-equality doesn't catch this!

@DMRobertson
Copy link
Contributor Author

cc @clokep for interest!

@kedder
Copy link
Member

kedder commented Sep 28, 2022

@DMRobertson thanks for digging into it. That's a research well done! Given the complexity of the issue, I'd really like to have a regression test for it. I realize it is not easy though. Obviously we don't want to add twisted as a dependency for this.

Something I don't fully understand is why do we need to run mypy on two files to repro this. Is it somehow related to mypy cache?

@DMRobertson
Copy link
Contributor Author

Something I don't fully understand is why do we need to run mypy on two files to repro this. Is it somehow related to mypy cache?

Good question. I'm not completely sure what's going on: my current hypothesis is that we:

  • in good.py, compute the "right" MRO
  • use it for some parts mypy's semantic analysis of good.py
  • do not find any errors
  • during other parts of analysis, extend the MRO so that it is now "wrong"
  • persist the wrong MRO to cache
  • compute the wrong MRO is then used when analysing bad.py

That is: I think the cache exposes the bad MRO but doesn't cause it.

I'll see if I can confirm or deny that hypothesis.

(If it's possible to reproduce this with just one file then that'd make it much easier to test!)

@kedder
Copy link
Member

kedder commented Sep 28, 2022

Of course it wouldn't be that difficult to write a test that runs mypy two times on two files (just outside of that "check all files in that directory" mini-framework. The tricky thing here (I think) is to find that minimal test case.

@clokep
Copy link
Contributor

clokep commented Sep 28, 2022

Great work! 👍 I'm not sure why running it on two files is the issue, when I was previously testing #41 I could run it on the same file twice in a row to reproduce...

@DMRobertson
Copy link
Contributor Author

DMRobertson commented Sep 28, 2022

I'm not sure why running it on two files is the issue, when I was previously testing #41 I could run it on the same file twice in a row to reproduce...

I was building off the report in #76 here. But it seems like just bad.py is enough to illustrate the problem from a clean cache:

Edit: whoops, I didn't correctly clear my cache or was somehow otherwise confused. I'll investigate further.

@DMRobertson
Copy link
Contributor Author

Good news: I have a minimal example:

from zope.interface import implementer, Interface

class IProtocol(Interface):
    pass

class Factory:
    # It seems important for "Protocol" show show up as an attribute annotation to
    # trigger the bug(!?)
    protocol: "Protocol"

@implementer(IProtocol)
class Protocol:
    pass

To detect the bug, one can call result = mypy.build.build(...) with a single BuildSource representing this file. By inspecting result we can get our hands on a mypy TypeInfo object, which will let us inspect the MRO that mypy has created. This means we don't have to worry about caches, re-running mypy or investigating multiple files.

I hope to get a test with this together soon.

@DMRobertson DMRobertson marked this pull request as draft September 29, 2022 22:44
@DMRobertson
Copy link
Contributor Author

I hope to get a test with this together soon.

See f341c60. It's a bit rough-and-ready, but hopefully it's enough to serve as a proof of concept.

I'm not sure if this ought to be part of test_samples.py rather than in its own test file? There's a fair bit of duplication here. I'm happy to clean things up as you see fit. Let me know what you think!

@DMRobertson DMRobertson marked this pull request as ready for review September 29, 2022 23:39
@kedder kedder merged commit b9f6462 into Shoobx:master Sep 30, 2022
@kedder
Copy link
Member

kedder commented Sep 30, 2022

@DMRobertson looks great, thanks! A little duplication in a test is not a big deal, much more important that we have test and won't lose the fix. Will release a new version soon.

@DMRobertson
Copy link
Contributor Author

Fantastic---many thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Caching errors with latest version of mypy
3 participants