
Kafka Python client


Python client for the Apache Kafka distributed stream processing system. kafka-python is designed to function much like the official Java client, with a sprinkling of pythonic interfaces (e.g., consumer iterators).

kafka-python is best used with newer brokers (0.9+), but is backwards-compatible with older versions (to 0.8.0). Some features are only enabled on newer brokers. For example, fully coordinated consumer groups (i.e., dynamic partition assignment to multiple consumers in the same group) require 0.9+ Kafka brokers. Supporting this feature for earlier broker releases would require writing and maintaining custom leadership election and membership / health check code (perhaps using ZooKeeper or Consul). For older brokers, you can achieve something similar by manually assigning different partitions to each consumer instance with config management tools like Chef, Ansible, etc. This approach works, but it does not rebalance partitions when a consumer fails. See <https://kafka-python.readthedocs.io/en/master/compatibility.html> for more details.
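For older brokers, the manual-assignment approach above comes down to giving each consumer instance a disjoint slice of the topic's partitions. Here is a minimal sketch, assuming the partition list and the number of instances are known up front. `split_partitions` is a hypothetical helper, not part of kafka-python:

```python
def split_partitions(partitions, num_instances):
    """Deal partitions round-robin into `num_instances` disjoint lists.

    Hypothetical helper: each consumer instance (e.g. one per host managed
    by Chef or Ansible) takes the list at its own index.
    """
    if num_instances < 1:
        raise ValueError("num_instances must be at least 1")
    assignments = [[] for _ in range(num_instances)]
    # Sort first so every instance computes the same split independently.
    for i, partition in enumerate(sorted(partitions)):
        assignments[i % num_instances].append(partition)
    return assignments
```

Instance `n` would then call something like `consumer.assign([TopicPartition('foobar', p) for p in split_partitions(range(6), 3)[n]])`. If an instance dies, its partitions go unconsumed until it restarts or the split is recomputed. That is the missing rebalancing noted above.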

Please note that the master branch may contain unreleased features. For release documentation, please see readthedocs and/or Python's inline help.

pip install kafka-python

KafkaConsumer

KafkaConsumer is a high-level message consumer, intended to operate as similarly as possible to the official Java client. Full support for coordinated consumer groups requires Kafka brokers that support the Group APIs (Kafka v0.9+).

See <https://kafka-python.readthedocs.io/en/master/apidoc/KafkaConsumer.html> for API and configuration details.

The consumer iterator returns ConsumerRecords, which are simple namedtuples that expose basic message attributes: topic, partition, offset, key, and value:

>>> from kafka import KafkaConsumer
>>> consumer = KafkaConsumer('my_favorite_topic')
>>> for msg in consumer:
...     print(msg)
>>> # join a consumer group for dynamic partition assignment and offset commits
>>> from kafka import KafkaConsumer
>>> consumer = KafkaConsumer('my_favorite_topic', group_id='my_favorite_group')
>>> for msg in consumer:
...     print(msg)
>>> # manually assign the partition list for the consumer
>>> from kafka import TopicPartition
>>> consumer = KafkaConsumer(bootstrap_servers='localhost:1234')
>>> consumer.assign([TopicPartition('foobar', 2)])
>>> msg = next(consumer)
>>> # Deserialize msgpack-encoded values (requires the msgpack package)
>>> import msgpack
>>> consumer = KafkaConsumer(value_deserializer=msgpack.loads)
>>> consumer.subscribe(['msgpackfoo'])
>>> for msg in consumer:
...     assert isinstance(msg.value, dict)
>>> # Access record headers. The returned value is a list of tuples,
>>> # each with a str key and a bytes value
>>> for msg in consumer:
...     print(msg.headers)
>>> # Get consumer metrics
>>> metrics = consumer.metrics()
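Each ConsumerRecord carries its topic, partition, and offset, which is enough to track consumption progress per partition. Below is a minimal sketch of that bookkeeping. It uses a stand-in namedtuple with only the five fields listed above (the real ConsumerRecord has more), and `next_offsets` is a hypothetical helper. Kafka's convention is that a committed offset names the next message to read, hence the `+ 1`.

```python
from collections import namedtuple

# Stand-in for kafka-python's ConsumerRecord, limited to the fields
# described above; the real namedtuple has additional fields.
Record = namedtuple('Record', ['topic', 'partition', 'offset', 'key', 'value'])


def next_offsets(records):
    """Map (topic, partition) -> offset of the next record to consume.

    Accepts any iterable of records, e.g. a consumer iterator bounded
    with itertools.islice so that the loop terminates.
    """
    positions = {}
    for rec in records:
        tp = (rec.topic, rec.partition)
        # Point one past the highest offset seen for this partition.
        positions[tp] = max(positions.get(tp, 0), rec.offset + 1)
    return positions
```

With a live consumer, passing `itertools.islice(consumer, 100)` would process a bounded batch of at most 100 records before returning.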

KafkaProducer

KafkaProducer is a high-level, asynchronous message producer. The class is intended to operate as similarly as possible to the official Java client. See <https://kafka-python.readthedocs.io/en/master/apidoc/KafkaProducer.html> for more details.

>>> from kafka import KafkaProducer
>>> producer = KafkaProducer(bootstrap_servers='localhost:1234')
>>> for _ in range(100):
...     producer.send('foobar', b'some_message_bytes')
>>> # Block until a single message is sent (or timeout)
>>> future = producer.send('foobar', b'another_message')
>>> result = future.get(timeout=60)
>>> # Block until all pending messages are at least put on the network
>>> # NOTE: This does not guarantee delivery or success! It is really
>>> # only useful if you configure internal batching using linger_ms
>>> producer.flush()
>>> # Use a key for hashed-partitioning
>>> producer.send('foobar', key=b'foo', value=b'bar')
>>> # Serialize json messages
>>> import json
>>> producer = KafkaProducer(value_serializer=lambda v: json.dumps(v).encode('utf-8'))
>>> producer.send('fizzbuzz', {'foo': 'bar'})
>>> # Serialize string keys
>>> producer = KafkaProducer(key_serializer=str.encode)
>>> producer.send('flipflap', key='ping', value=b'1234')
>>> # Compress messages
>>> producer = KafkaProducer(compression_type='gzip')
>>> for i in range(1000):
...     producer.send('foobar', b'msg %d' % i)
>>> # Include record headers. The format is a list of tuples, each with
>>> # a str key and a bytes value.
>>> producer.send('foobar', value=b'c29tZSB2YWx1ZQ==', headers=[('content-encoding', b'base64')])
>>> # Get producer performance metrics
>>> metrics = producer.metrics()
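The json `value_serializer` above has a natural counterpart on the consumer side, passed as `value_deserializer`. Here is a minimal sketch of a matching pair, written as named functions (the names are illustrative, not part of kafka-python) so they can be tested without a broker:

```python
import json


def json_serializer(value):
    """Encode a Python object as UTF-8 JSON bytes (for KafkaProducer)."""
    return json.dumps(value).encode('utf-8')


def json_deserializer(raw):
    """Decode UTF-8 JSON bytes into a Python object (for KafkaConsumer)."""
    # Defensive assumption: a record may arrive with no value at all.
    if raw is None:
        return None
    return json.loads(raw.decode('utf-8'))
```

These would be wired in as `KafkaProducer(value_serializer=json_serializer)` and `KafkaConsumer('fizzbuzz', value_deserializer=json_deserializer)`. Keeping both halves in one module makes it harder for producer and consumer encodings to drift apart.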
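Sending with a key routes every message that has the same key to the same partition, as long as the topic's partition count does not change. kafka-python's default partitioner hashes the key bytes with a Java-compatible murmur2 hash. The sketch below swaps in `zlib.crc32` purely for brevity, so its partition numbers will not match what kafka-python or the Java client would choose. `choose_partition` is a hypothetical helper:

```python
import random
import zlib


def choose_partition(key, num_partitions):
    """Pick a partition index for `key` (bytes or None).

    Sketch only: uses crc32 where kafka-python's default partitioner
    uses murmur2, so results differ from the real client.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if key is None:
        # Unkeyed messages have no partition affinity; pick any partition.
        return random.randrange(num_partitions)
    # Mask to a non-negative 31-bit value before taking the modulus.
    return (zlib.crc32(key) & 0x7fffffff) % num_partitions
```

The important property is determinism: `choose_partition(b'foo', 8)` returns the same index on every call, which is what gives per-key locality.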

Thread safety

A single KafkaProducer can be safely shared across threads. A KafkaConsumer cannot.

While it is possible to use the KafkaConsumer in a thread-local manner (one consumer per thread), multiprocessing is recommended.
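When using the KafkaConsumer from threads, no instance should ever be shared: each thread builds and keeps its own. Here is a minimal sketch using `threading.local`. `get_thread_consumer` and its `factory` argument are hypothetical. In practice the factory would be something like `lambda: KafkaConsumer('my_favorite_topic', group_id='my_favorite_group')`.

```python
import threading

# Per-thread storage: each thread sees its own attributes on this object.
_local = threading.local()


def get_thread_consumer(factory):
    """Return this thread's private consumer, creating it on first use.

    `factory` is any zero-argument callable that builds a consumer.
    """
    consumer = getattr(_local, 'consumer', None)
    if consumer is None:
        consumer = factory()
        _local.consumer = consumer
    return consumer
```

The same rule carries over to the recommended multiprocessing approach: create the consumer inside the child process (e.g. in the function passed to `multiprocessing.Process`), not in the parent before the child starts.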

Compression

kafka-python supports gzip compression/decompression natively. To produce or consume lz4-compressed messages, install python-lz4 (pip install lz4). To enable snappy compression/decompression, install python-snappy (which also requires the native snappy library). See <https://kafka-python.readthedocs.io/en/master/install.html#optional-snappy-install> for more information.
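Because lz4 and snappy support depends on optional packages, a producer's `compression_type` can be chosen at runtime based on what is installed. Here is a minimal sketch, assuming the packages import as `lz4` (python-lz4) and `snappy` (python-snappy). `pick_compression_type` is a hypothetical helper:

```python
import importlib.util


def pick_compression_type(preferred=('lz4', 'snappy')):
    """Return the first codec in `preferred` whose package is importable.

    Falls back to 'gzip', which kafka-python supports natively.
    Assumes each codec's import name matches its codec name.
    """
    for codec in preferred:
        # find_spec returns None for top-level modules that are not installed.
        if importlib.util.find_spec(codec) is not None:
            return codec
    return 'gzip'
```

A producer could then be built with `KafkaProducer(compression_type=pick_compression_type())`. Consumers decompress automatically, but they still need the matching package installed to read lz4- or snappy-compressed messages.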

Protocol

A secondary goal of kafka-python is to provide an easy-to-use protocol layer for interacting with Kafka brokers via the Python REPL. This is useful for testing, probing, and general experimentation. The protocol support is leveraged to enable a KafkaClient.check_version() method that probes a Kafka broker and attempts to identify which version it is running (0.8.0 to 1.1+).
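check_version() reports the broker version as a tuple such as (0, 10, 1), so gating a feature on the broker version reduces to tuple comparison. Here is a minimal sketch. The feature table and `broker_supports` are hypothetical, but both thresholds come from Kafka itself: the Group APIs arrived in 0.9 (as noted above), and record headers arrived in 0.11.

```python
# Minimum broker versions for a few features (hypothetical table; the
# thresholds reflect Kafka: Group APIs in 0.9, record headers in 0.11).
MIN_BROKER_VERSION = {
    'consumer_groups': (0, 9),
    'record_headers': (0, 11),
}


def broker_supports(broker_version, feature):
    """Return True if a version tuple meets the feature's minimum.

    Python compares tuples element by element, so (0, 10, 1) >= (0, 9).
    """
    return tuple(broker_version) >= MIN_BROKER_VERSION[feature]
```

With a live broker, the `broker_version` argument would be the tuple returned by KafkaClient.check_version().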

Low-level

Legacy support is maintained for low-level consumer and producer classes, SimpleConsumer and SimpleProducer. See <https://kafka-python.readthedocs.io/en/master/simple.html?highlight=SimpleProducer> for API details.
