This repository was archived by the owner on Mar 24, 2021. It is now read-only.

Conversation

yungchin
Contributor

This partially addresses #257, as discussed there. I figured it was best to make a separate pullreq, as it doesn't have much to do with the changes in #258.

This breaks a reference cycle between Cluster and Topic.  As per the
discussion in #257, we didn't want to make Topic._cluster a weakref, as
this would break user code that retains a reference to the Topic only.
Instead, we make TopicDict lazily instantiate Topics (which should have
the added benefit of saving a bunch of space if a cluster has many
topics and partitions), while storing only a weakref to it.

This also moves Cluster._update_topics() into TopicDict, because it
needs to know about the weakref implementation details: if a Topic
hasn't been instantiated, it shouldn't try to update it.

Signed-off-by: Yung-Chin Oei <yungchin@yungchin.nl>
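The lazy-weakref scheme described above can be sketched roughly as follows. This is not pykafka's actual implementation; the `Topic`, `TopicDict`, and `register` names are stand-ins to show how a dict of weakrefs breaks the Cluster/Topic cycle while still handing callers strong references:

```python
import weakref

class Topic:
    """Hypothetical stand-in for pykafka's Topic; it keeps a strong
    reference back to its cluster, which is why the dict side must
    hold the Topic weakly to avoid a reference cycle."""
    def __init__(self, cluster, name):
        self._cluster = cluster
        self.name = name

class TopicDict(dict):
    """Sketch: entries are weakrefs (or None until first use), and Topics
    are instantiated lazily on access, so the Cluster -> Topic edge never
    keeps a cycle alive."""
    def __init__(self, cluster):
        super().__init__()
        self._cluster = cluster  # strong ref downward is fine

    def register(self, name):
        # Record that a topic exists without instantiating it
        # (hypothetical helper standing in for the metadata update path).
        dict.__setitem__(self, name, None)

    def __getitem__(self, name):
        ref = dict.__getitem__(self, name)
        topic = ref() if ref is not None else None
        if topic is None:  # never built, or already garbage-collected
            topic = Topic(self._cluster, name)
            dict.__setitem__(self, name, weakref.ref(topic))
        return topic
```

While user code holds the returned Topic, repeated lookups yield the same object; once all strong references are dropped, the weakref goes dead and a fresh Topic is built on the next access.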
The grandparent commit (which made TopicDict carry weakrefs) introduced
a bug, namely that topic auto-creation via __missing__() caused an
AttributeError to be raised, because I forgot to update __missing__ for
the new nature of the dict entries.

There was no test coverage to catch this, so tests are added here too.
To make those tests pass reliably, we needed to ensure that topic
auto-creation works even on a fresh cluster.  It turns out that with the
updated version of Cluster._get_metadata (per the grandparent commit),
that was just a one-line change - that is, as a side-effect, we've now
also addressed #175: auto-creation works on a freshly bootstrapped kafka
0.8.2 cluster.

Signed-off-by: Yung-Chin Oei <yungchin@yungchin.nl>
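The shape of the `__missing__` fix might look like the sketch below (hypothetical names, not pykafka's real code): the pitfall is that once entries are weakrefs, `__missing__` must store the weakref but return the strong object, or the caller gets a `weakref.ref` and blows up with an AttributeError on first use:

```python
import weakref

class Topic:
    """Hypothetical stand-in for pykafka's Topic."""
    def __init__(self, name):
        self.name = name

class TopicDict(dict):
    """Sketch of __missing__ under weakref-valued entries."""
    def __init__(self, create_topic):
        super().__init__()
        self._create_topic = create_topic  # hypothetical auto-create hook

    def __missing__(self, name):
        topic = self._create_topic(name)  # triggers broker-side creation
        # Store the weakref (keeping the dict's no-cycle invariant)...
        dict.__setitem__(self, name, weakref.ref(topic))
        # ...but return the strong object; in the real TopicDict,
        # subsequent lookups dereference the entry in __getitem__.
        return topic
```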
@yungchin
Contributor Author

As noted in the last commit msg, this now also happens to deal with #175 - topic auto-creation is enabled even on a fresh, empty cluster (there's still no broker metadata, but it can fall back on the bootstrap brokers).
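The one-line fallback being described could be sketched like this (function and parameter names are hypothetical, not pykafka's actual API):

```python
def brokers_for_metadata(known_brokers, bootstrap_brokers):
    """Sketch of the fallback above: a freshly bootstrapped cluster has
    no broker metadata yet, so metadata requests (including the one that
    triggers topic auto-creation) go to the bootstrap brokers instead."""
    return list(known_brokers) if known_brokers else list(bootstrap_brokers)
```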

@emmettbutler
Contributor

@yungchin thanks for making this totally separate from #258 - it makes them both a lot easier to think about. This will be next on my list of pull requests to review.

@emmettbutler
Contributor

I've spent a while trying to grok this, and I understand it satisfactorily. Even outside of the specific object-leak bugfix, lazily creating Topics is a good idea to mitigate memory usage. I'd like to run a few more tests on my own, but I'm close to totally convinced.

@emmettbutler emmettbutler self-assigned this Sep 20, 2015
@yungchin
Contributor Author

@emmett9001 thanks for checking this out! I was a bit spooked by the two test failures on travis just now:
https://travis-ci.org/Parsely/pykafka/jobs/81294673#L516

I haven't seen these in local testing, but to save myself some RAM I often run a two-node cluster (whereas we run a three-node cluster on travis), and I guess the smaller cluster is quicker to finish setting up a new topic.

In the failing tests, I think what goes wrong is this: test_topic_autocreate tries to auto-create a topic but gives up on it too soon (the cluster is still electing a leader and stuff), and so the test fails. Then we enter test_topic_updates, and in the middle of that our new topic from test_topic_autocreate finally appears, so now this test errors too because it sees more new topics than it expected. (And both tests are new, which is why this has never come up before, but this seems to be the kind of failure that will keep coming up randomly....)

I'm not really sure what the best fix would be - I suppose TopicDict._create_topic needs to wait longer or should really be smarter about the wait, but I've no good suggestions on that :(

The tests added with d6255c5 would sometimes fail, when for one
reason or another the test cluster would take longer to set up the new
topic than Cluster._create_topic was willing to wait.

Previously we had to give up after a fixed number of retries, because we
didn't distinguish between clusters where auto-create was enabled and
those where it was disabled.  This commit adds that ability, by letting
_create_topic inspect the error value in the metadata response.

If the error is a LeaderNotAvailable, that means the cluster
is actually busy setting up our new topic, so we know it may be worth
waiting a bit longer than the fixed 5 retries we had before.

Signed-off-by: Yung-Chin Oei <yungchin@yungchin.nl>
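The retry policy described in this commit message could be sketched as below. The names (`create_topic`, `fetch_metadata_error`, the retry counts) are hypothetical stand-ins, not pykafka's real signatures; the point is that a LeaderNotAvailable error extends the retry budget, while any other error still gives up after the fixed number of attempts:

```python
import time

class LeaderNotAvailable(Exception):
    """Stand-in for the broker error meaning 'topic is being set up'."""

def create_topic(fetch_metadata_error, name, retries=5, extra_retries=20,
                 delay=0.0):
    """Sketch: retry metadata fetches, extending the fixed retry budget
    only when the broker reports LeaderNotAvailable, i.e. the cluster is
    actively creating the topic and is worth waiting for."""
    budget = retries
    extended = False
    attempt = 0
    while attempt < budget:
        attempt += 1
        err = fetch_metadata_error(name)  # None on success, else an error
        if err is None:
            return True
        if isinstance(err, LeaderNotAvailable) and not extended:
            extended = True
            budget = retries + extra_retries  # topic is coming; wait longer
        time.sleep(delay)
    return False  # auto-create presumably disabled, or cluster too slow
```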
@yungchin
Contributor Author

Out of curiosity, I restarted the test on travis earlier (the previous run's log output is here for posterity and such), and indeed it passed just fine the second time around. Of course that doesn't mean it's all ok - the test would continue to occasionally fail. The commit I've just pushed should fix that.

emmettbutler added a commit that referenced this pull request Sep 24, 2015
@emmettbutler emmettbutler merged commit 6768598 into master Sep 24, 2015
@emmettbutler emmettbutler deleted the fix/break-cluster-topic-refcycle branch September 24, 2015 19:18