First, let’s create a topic that all the data will be streamed to. We’ll give the topic 4 partitions to parallelize reads and writes.
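For example, with the kafka-topics.sh tool that ships with Kafka (the topic name price_data and the replication factor of 2 are assumptions for this walkthrough; older Kafka versions take --zookeeper localhost:2181 in place of --bootstrap-server):

```
localhost:~$ kafka-topics.sh --create --bootstrap-server localhost:9092 --topic price_data --partitions 4 --replication-factor 2
```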
## Create a Kafka Producer on your local machine
Let’s first install the kafka-python package:

```
localhost:~$ sudo pip install kafka-python
```
Next, let’s create a file named kafka_producer.py and paste the following into it. The script provides a ticker that performs a random walk and tags each record with a key.
The streaming data is simulated price data from several data sources. Each record contains a source, date, the last price at that time, and the number of contracts (or volume) traded at that price. The date is in the format of:
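A minimal sketch of such a producer is below; the topic name price_data, the broker address, and the date format string are illustrative assumptions rather than the tutorial’s originals:

```python
# kafka_producer.py -- a minimal sketch of a keyed random-walk price producer.
import json
import random
import sys
import time
from datetime import datetime

from kafka import KafkaProducer

def main(source_id):
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",      # assumed broker address
        key_serializer=str.encode,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    price = 100.0
    while True:
        # Random walk: nudge the last traded price up or down one tick.
        price += random.choice([-0.25, 0.25])
        record = {
            "source": source_id,
            "date": datetime.utcnow().strftime("%Y%m%d %H%M%S"),  # assumed format
            "price": price,
            "volume": random.randint(1, 100),
        }
        # Keying by source id: all records from one source hash to one partition.
        producer.send("price_data", key=source_id, value=record)
        time.sleep(0.1)

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "source-1")
```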
Let’s now spawn 8 producers from your machine, all in parallel, to simulate 8 different data sources. This can be done with tmux. The script spawn_kafka_streams.sh below will help you perform the task; feel free to tailor it to your use case. It creates a new tmux session containing N windows, where N is the argument you pass to the script.
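A sketch of what spawn_kafka_streams.sh could look like is below; the session name and the assumption that kafka_producer.py takes a source id as its first argument are illustrative:

```bash
#!/bin/bash
# Usage: ./spawn_kafka_streams.sh <num_producers>
# Spawns a detached tmux session with one window per producer.
N=${1:-8}
SESSION=kafka_producers  # hypothetical session name

tmux new-session -d -s "$SESSION" -n producer-1 "python kafka_producer.py source-1"
for i in $(seq 2 "$N"); do
  tmux new-window -t "$SESSION" -n "producer-$i" "python kafka_producer.py source-$i"
done
```

Running ./spawn_kafka_streams.sh 8 then gives you the 8 parallel producers.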
You will notice that when we produce messages to the Kafka brokers in a keyed fashion, each message is hashed by its key to one of the topic’s partitions, so every key consistently maps to the same partition. This also spreads the write workload across the cluster, since each broker handles only the partitions it leads rather than all of the incoming data.
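Conceptually, the default partitioner assigns a key to a partition along these lines (Kafka actually uses a murmur2 hash; this snippet is only an illustration, not Kafka’s implementation):

```python
def partition_for(key: bytes, num_partitions: int) -> int:
    # Equal keys always hash to the same value, so records sharing a key
    # land on the same partition; distinct keys spread across partitions.
    return hash(key) % num_partitions  # illustrative stand-in for murmur2
```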