This repository has been archived by the owner on Mar 24, 2021. It is now read-only.

Client connects, but topics, listed, have None as their value #670

Closed
JohnOmernik opened this issue Mar 24, 2017 · 11 comments

@JohnOmernik

JohnOmernik commented Mar 24, 2017

PyKafka version: pykafka-2.5.0
Kafka version: kafka 0.10.2

I can connect via ZK or direct connection and I get a KafkaClient object back. I can list topics, but they show as "None" value so when I try to "assign" a topic, I get the error below.

<class 'pykafka.client.KafkaClient'>
{b'weblogs': None, b'dnslogs': None}
Traceback (most recent call last):
  File "./pyweblogs.py", line 13, in <module>
    weblogs = client.topics['weblogs']
  File "/usr/local/lib/python3.5/dist-packages/pykafka/cluster.py", line 56, in __getitem__
    "got '%s'", type(key))
TypeError: ("TopicDict.__getitem__ accepts a bytes object, but it got '%s'", <class 'str'>)

@emmettbutler
Contributor

Hi @JohnOmernik, and thanks for reaching out. I notice that you're attempting to use a unicode object in client.topics['weblogs'], but __getitem__ actually expects a bytes object. Just prefix the string with b, like this: client.topics[b'weblogs'], and that problem should go away.

As for the None values in the topics dict, the cause is less clear. What do you see when you run the following in a shell?

$ kafka-topics.sh --list --zookeeper localhost:2181
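
A minimal sketch of the bytes-key fix described above, assuming topic names are UTF-8 text. `as_topic_key` is a hypothetical helper written for illustration; it is not part of PyKafka. The commented-out lines show where it might be used with a real client and broker.

```python
def as_topic_key(name, encoding="utf-8"):
    """Return `name` as bytes, the key type TopicDict.__getitem__ accepts."""
    if isinstance(name, bytes):
        return name
    return name.encode(encoding)

key = as_topic_key("weblogs")
print(key)

# With a reachable cluster (not run here; the host is an assumption):
# from pykafka import KafkaClient
# client = KafkaClient(hosts="127.0.0.1:9092")
# weblogs = client.topics[as_topic_key("weblogs")]
```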

@cpnielsen

cpnielsen commented Apr 4, 2017

I see a similar thing when I "list" topics, by just outputting / printing topics:

>>> from pykafka import KafkaClient
>>> client = KafkaClient()
>>> client.topics
{b'test-topic': None}

Once I make use of the topic (creating a consumer/producer), I get the following:

>>> client.topics
{b'test-topic': <weakref at 0x107239368; to 'Topic' at 0x106833828>}

Functionality-wise, it does not seem like anything is wrong. Running the shell script listed gives me:

$ kafka-topics --list --zookeeper localhost:2181
__consumer_offsets
test-topic

@emmettbutler
Contributor

This is happening because of a difference in dict behavior between python 2 and 3. The dict you're examining here is actually a specialized subclass of dict that only initializes its values as weakly-referenced objects when __getitem__ is called. As it turns out, using the implicit print client.topics in python2 results in calls to __getitem__; in python3 it does not. This means that in python3, the dict's values are not initialized and appear as None. This doesn't indicate that anything is broken here, but it may be confusing for python3 users.
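
The lazy initialization described above can be sketched with a small stand-in. This is not PyKafka's code: `Topic` and `LazyTopicDict` are hypothetical names, and the `_strong` list exists only to keep the demo objects alive. It shows that, in Python 3, printing the dict does not go through an overridden `__getitem__`, so the placeholder None values are what gets displayed.

```python
import weakref

class Topic:
    def __init__(self, name):
        self.name = name

class LazyTopicDict(dict):
    """Illustrative stand-in for a lazily populated topics mapping."""

    def __init__(self, names):
        # Every known topic name starts out with a None placeholder.
        super().__init__((name, None) for name in names)
        self._strong = []  # keeps created topics alive for this demo only

    def __getitem__(self, key):
        ref = super().__getitem__(key)
        topic = ref() if ref is not None else None
        if topic is None:
            # First access: build the object and store a weak reference to it.
            topic = Topic(key)
            self._strong.append(topic)
            super().__setitem__(key, weakref.ref(topic))
        return topic

topics = LazyTopicDict([b"test-topic"])
before = repr(topics)            # the built-in repr reads stored values directly
topic = topics[b"test-topic"]    # indexing triggers initialization
after = repr(topics)
print(before)
print(after)
```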

@aodiwei

aodiwei commented Jul 28, 2017

Hi @emmett9001, I use both py2 and py3, and I get the same problem with both:
{b'test-topic': None}
How can I fix it?

@emmettbutler
Contributor

@aodiwei Can you be more specific? What code are you running to get the output {b'test-topic': None}?

@aodiwei

aodiwei commented Aug 17, 2017

@emmett9001 Thanks for your answer, and sorry for forgetting to reply.
My code:

from pykafka import KafkaClient
client = KafkaClient(hosts='192.168.199.40:9091,192.168.199.41:9091,192.168.199.42:9091,192.168.199.43:9091,192.168.199.44:9091')
topics = client.topics
print(topics)
topic = topics.get('dm_session_protonum_topic_min_07')
print(topic)

Part of the console output:
{b'dm_session_dst_ip_topic_day_2016': None, b'dm_server_relation_topic': None, b'dim_device_type_topic_month_2016': None, b'dim_browser_type_topic_hour_07': None, b'dm_server_relation_topic_month_2017': None, b'dm_session_dst_ip_topic_min_08': None, b'dim_user_statistics_topic_hour_07': None, b'dim_device_count_topic_hour_08': None, b'dm_session_dst_ip_day_topic_2017': None, b'dim_dst_statistics_topic_min_07': None, b'dim_system_type_topic_hour_08': None, b'dim_system_count_topic_hour_07': None, b'dm_server_relation_topic_hour_08': None, b'dim_user_statistics_topic_day_2016': None, b'dim_user_share_topic_week_2017': None, b'dm_session_src_ip_topic_min_08': None, b'dm_session_dst_ip_topic_month_2017': None, b'test2':

@emmettbutler
Contributor

@aodiwei As far as I can tell this is the expected behavior. It's happening due to the use of weak references in the topics dictionary that are not initialized until a call to __getitem__ on that dictionary. In python 2.7 on my laptop, calling print(client.topics) shows the same output you're getting, presumably (and I'm guessing here) because python 2.7's print() does not call __getitem__ on dictionary arguments.

I can see that this is confusing for more than just a few users, so it's worth considering how to make this interface more readable. Maybe we override TopicDict.__repr__ to include a note about weak references? We might just make the documentation a bit clearer as well.
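
One way the `__repr__` idea might look, as a rough sketch only. `AnnotatedTopicDict` is a hypothetical class, not PyKafka's `TopicDict`, and the placeholder wording is an assumption; the point is simply that a custom repr can label uninitialized entries instead of showing a bare None.

```python
class AnnotatedTopicDict(dict):
    """Hypothetical dict subclass whose repr labels uninitialized entries."""

    def __repr__(self):
        parts = []
        for key, value in self.items():
            # None means "no Topic object has been created for this key yet".
            shown = "<not yet loaded>" if value is None else repr(value)
            parts.append("{!r}: {}".format(key, shown))
        return "{" + ", ".join(parts) + "}"

topics = AnnotatedTopicDict({b"test-topic": None})
text = repr(topics)
print(text)
```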

@aodiwei

aodiwei commented Aug 18, 2017

@emmett9001 Thanks, I fixed it in py3:
topic = client.topics[b'test']
As you said, client.topics doesn't show usable values until a topic is accessed, because of the weak references.
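
A small sketch of why indexing with a bytes key helps where `.get()` with a str key did not. It assumes, as the discussion above suggests, that values are only created inside `__getitem__`; `LazyDict` is a hypothetical stand-in, and whether PyKafka's `TopicDict` also customizes `.get()` is not something this thread confirms.

```python
class LazyDict(dict):
    """Illustrative stand-in: values are created only inside __getitem__."""

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if value is None:
            value = "topic:" + key.decode()
            super().__setitem__(key, value)
        return value

topics = LazyDict({b"test": None})
via_get_str = topics.get("test")       # str key does not match the bytes key
via_get_bytes = topics.get(b"test")    # dict.get does not call the overridden __getitem__
via_index = topics[b"test"]            # indexing goes through __getitem__
print(via_get_str, via_get_bytes, via_index)
```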

@fighting-dreamer

fighting-dreamer commented Sep 10, 2017

from pykafka import KafkaClient

brokers = "127.0.0.1:9092"

def get_topics():
        client = KafkaClient(brokers)
        topics = client.topics
        return topics

topics = get_topics()
for topic_key in topics.keys():
        print(topic_key)
        print(topics[topic_key])

The topic exists, and both the producer and the consumer work for it, but the topic object shows as "None". I have tried adding the b prefix (b<topic_name_string>) and switching from python2 to python3; neither solution worked for me. I get:

b'MyfirstTopic'
Traceback (most recent call last):
  File "understanding_kafka.py", line 13, in <module>
    print(topics[topic_key])
  File "/usr/local/lib/python3.6/site-packages/pykafka/cluster.py", line 65, in __getitem__
    for i in range(self._cluster()._max_connection_retries):
AttributeError: 'NoneType' object has no attribute '_max_connection_retries'

@emmettbutler
Contributor

@fighting-dreamer This error is happening because by the time you examine topics[topic_key], the KafkaClient instance you created in get_topics no longer exists. The KafkaClient instance needs to still be in memory when you access topics.
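
The lifetime problem can be reproduced with standard-library weak references alone. In this sketch `Cluster`, `Topics`, and `Client` are hypothetical stand-ins, not PyKafka classes; the only assumption carried over from the traceback is that the topics mapping reaches its owner through a weak reference (`self._cluster()`).

```python
import gc
import weakref

class Cluster:
    max_connection_retries = 3

class Topics(dict):
    """Illustrative stand-in holding only a weak reference to its owner."""

    def __init__(self, cluster):
        super().__init__()
        self._cluster = weakref.ref(cluster)

    def retries(self):
        cluster = self._cluster()  # returns None once the owner is collected
        if cluster is None:
            raise RuntimeError("owning cluster no longer exists")
        return cluster.max_connection_retries

class Client:
    def __init__(self):
        self.cluster = Cluster()
        self.topics = Topics(self.cluster)

def get_topics_only():
    client = Client()
    return client.topics  # the client and its cluster are dropped on return

def get_client_and_topics():
    client = Client()
    return client, client.topics  # the caller keeps the client alive

orphan = get_topics_only()
gc.collect()
owner_alive = orphan._cluster() is not None

client, topics = get_client_and_topics()
retries = topics.retries()
print(owner_alive, retries)
```

Applied to the script above, the equivalent change would be to return the client from get_topics (or create it at module level) so that it outlives every access to topics.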

@fighting-dreamer

@emmett9001 Thanks! It is now running.
