Break Cluster-Topic refcycle #262
This breaks a reference cycle between Cluster and Topic. As per the discussion in #257, we didn't want to make Topic._cluster a weakref, as this would break user code that retains a reference to the Topic only. Instead, we make TopicDict lazily instantiate Topics (which should have the added benefit of saving a good deal of memory if a cluster has many topics and partitions), while storing only a weakref to each Topic. This also moves Cluster._update_topics() into TopicDict, because it needs to know about the weakref implementation details: if a Topic hasn't been instantiated, it shouldn't try to update it.

Signed-off-by: Yung-Chin Oei <yungchin@yungchin.nl>
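A minimal sketch of the scheme described above, assuming simplified, hypothetical stand-ins for `Topic`, `TopicDict` and `Cluster` (this is not pykafka's actual code), and assuming metadata arrives as a plain `{name: metadata}` dict. The references run Cluster → TopicDict → weakref(Topic), while each Topic keeps a strong reference back to its cluster, so there is no cycle. The weak back-reference from `TopicDict` to its cluster is an extra assumption of the sketch.

```python
import weakref


class Topic:
    # Hypothetical stand-in: keeps a *strong* reference to its cluster, so
    # user code that retains only a Topic still has a working cluster.
    def __init__(self, cluster, name, metadata):
        self._cluster = cluster
        self.name = name
        self.metadata = metadata

    def update(self, metadata):
        self.metadata = metadata


class TopicDict(dict):
    """Maps topic name -> weakref to a Topic, or None if never instantiated."""

    def __init__(self, cluster):
        super().__init__()
        # Assumption of this sketch: a weak back-reference, so that
        # Cluster -> TopicDict -> Cluster is not a cycle either.
        self._cluster = weakref.ref(cluster)
        self._metadata = {}  # latest metadata per topic name

    def __getitem__(self, name):
        ref = super().__getitem__(name)  # KeyError for unknown topics
        topic = ref() if ref is not None else None
        if topic is None:
            # Lazily (re)build the Topic and remember it only weakly.
            topic = Topic(self._cluster(), name, self._metadata[name])
            super().__setitem__(name, weakref.ref(topic))
        return topic

    def _update_topics(self, metadata):
        """Apply a {name: metadata} mapping from a fresh metadata response."""
        self._metadata = dict(metadata)
        for name, meta in metadata.items():
            ref = dict.get(self, name)  # raw entry, no lazy instantiation
            topic = ref() if ref is not None else None
            if topic is not None:
                topic.update(meta)  # only live Topics need updating
            elif name not in self:
                super().__setitem__(name, None)
        for name in list(self.keys()):
            if name not in metadata:
                super().__delitem__(name)  # topic no longer exists


class Cluster:
    def __init__(self):
        self.topics = TopicDict(self)
```

Because nothing strong points from the cluster to a Topic, a Topic that user code drops is freed by reference counting alone, and the next lookup simply rebuilds it from the stored metadata.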
The grandparent commit (which made TopicDict carry weakrefs) introduced a bug: topic auto-creation via __missing__() raised an AttributeError, because I forgot to update __missing__ for the new nature of the dict entries. There was no test coverage to catch this, so tests are added here too. To make those tests pass reliably, we needed to ensure that topic auto-creation works even on a fresh cluster. It turns out that with the updated version of Cluster._get_metadata (per the grandparent commit), that was just a one-line change. As a side effect, we've now also addressed #175: auto-creation works on a freshly bootstrapped Kafka 0.8.2 cluster.

Signed-off-by: Yung-Chin Oei <yungchin@yungchin.nl>
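To illustrate why `__missing__` had to change once entries became weakrefs, here is a sketch with hypothetical stand-in classes; `FakeCluster._create_topic` just registers the name instead of sending metadata requests to a broker. In this sketch `__missing__` returns the raw entry (a weakref or `None`), so `__getitem__` unwraps an auto-created topic exactly like a pre-existing one; returning any other kind of value from `__missing__` would hand `__getitem__` something it cannot unwrap.

```python
import weakref


class Topic:
    # Hypothetical stand-in for a Topic.
    def __init__(self, cluster, name):
        self._cluster = cluster
        self.name = name


class TopicDict(dict):
    """Entries are weakrefs to Topics, or None before first access."""

    def __init__(self, cluster):
        super().__init__()
        self._cluster = weakref.ref(cluster)

    def __getitem__(self, name):
        # dict.__getitem__ calls __missing__ for absent keys; either way we
        # get a raw entry back and unwrap it here.
        ref = super().__getitem__(name)
        topic = ref() if ref is not None else None
        if topic is None:
            topic = Topic(self._cluster(), name)
            super().__setitem__(name, weakref.ref(topic))
        return topic

    def __missing__(self, name):
        # Ask the cluster to auto-create the topic, which should add an entry.
        self._cluster()._create_topic(name)
        if name not in self:
            raise KeyError(name)  # auto-creation failed or is disabled
        # Return the raw entry (weakref or None), not a Topic, so that
        # __getitem__ can unwrap it like any other entry.
        return dict.get(self, name)


class FakeCluster:
    """Hypothetical cluster; real code would send metadata requests here."""

    def __init__(self):
        self.topics = TopicDict(self)

    def _create_topic(self, name):
        # Pretend the broker auto-created the topic and metadata now lists it.
        dict.__setitem__(self.topics, name, None)
```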
As noted in the last commit message, this now also happens to deal with #175: topic auto-creation works even on a fresh, empty cluster (there's still no broker metadata, but it can fall back on the bootstrap brokers).
I've spent a while trying to grok this, and I understand it satisfactorily. Even outside of the specific object-leak bugfix, lazily creating
@emmett9001 thanks for checking this out! I was a bit spooked by the two test failures on Travis just now: I haven't seen these in local testing, but to save some RAM I often run a two-node cluster (whereas we run a three-node cluster on Travis), and I guess the smaller cluster is quicker to finish setting up a new topic. In the failing tests, I think what goes wrong is this: I'm not really sure what the best fix would be; I suppose
The tests added with d6255c5 would sometimes fail when, for one reason or another, the test cluster took longer to set up the new topic than Cluster._create_topic was willing to wait. We had to give up after some retries, because we didn't try to distinguish between cluster settings where auto-create was enabled or disabled. This commit adds that ability by letting _create_topic inspect the error value in the metadata response. If the error is LeaderNotAvailable, the cluster is actually busy setting up our new topic, so it may be worth waiting longer than the fixed 5 retries we had before.

Signed-off-by: Yung-Chin Oei <yungchin@yungchin.nl>
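A rough sketch of the retry policy this commit describes. It assumes a hypothetical `fetch_topic_error()` callable that sends a metadata request for the topic and returns that topic's error code. The numeric codes are the Kafka protocol's (3 is UNKNOWN_TOPIC_OR_PARTITION, 5 is LEADER_NOT_AVAILABLE). Other errors still count against the fixed budget of 5 retries, while LEADER_NOT_AVAILABLE lets us keep waiting up to a larger cap, which is an arbitrary choice for the sketch.

```python
import time

# Kafka protocol error codes used below.
NO_ERROR = 0
UNKNOWN_TOPIC_OR_PARTITION = 3
LEADER_NOT_AVAILABLE = 5


def create_topic(fetch_topic_error, max_retries=5, max_attempts=50,
                 backoff_s=0.1, sleep=time.sleep):
    """Poll topic metadata until the topic exists; return True on success.

    fetch_topic_error: hypothetical callable that sends a metadata request
    for the topic and returns that topic's error code from the response.
    """
    other_errors = 0
    for _ in range(max_attempts):
        err = fetch_topic_error()
        if err == NO_ERROR:
            return True
        if err != LEADER_NOT_AVAILABLE:
            # Any other error counts against the fixed retry budget, as before.
            other_errors += 1
            if other_errors >= max_retries:
                return False
        # LEADER_NOT_AVAILABLE means the cluster is busy creating the
        # topic, so it does not use up the fixed retry budget.
        sleep(backoff_s)
    return False
```

With these defaults, a topic that keeps reporting LEADER_NOT_AVAILABLE is polled up to 50 times before we give up, whereas a cluster that keeps answering with some other error (for example because auto-create is disabled) is given up on after 5 attempts.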
Out of curiosity, I restarted the test on Travis earlier (the previous run's log output is here for posterity and such), and indeed it passed just fine the second time around. Of course that doesn't mean it's all OK: the test would continue to fail occasionally. The commit I've just pushed should fix that.
This partially addresses #257, as discussed there. I figured it was best to make a separate pull request, as it doesn't have much to do with the changes in #258.