# Apache Kafka with Python

## Installing kafka-python

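Python talks to Kafka through a client library. This tutorial uses `kafka-python` to connect to the cluster and produce messages, and `requests` to call a weather API. Install both with:

`!pip install kafka-python requests`

The leading `!` runs the command from a Jupyter notebook cell; drop it when running `pip` in a terminal.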

## Setting Up Kafka Cluster

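To set up a Kafka cluster locally, download Kafka; the ZooKeeper server used in the steps below ships in the same download. These steps assume a Kafka 3.x release: Kafka 4.0 and later run only in KRaft mode and no longer include ZooKeeper.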

1. Download Kafka: [Kafka Downloads](https://kafka.apache.org/downloads)
2. Extract the downloaded files and navigate to the Kafka directory.
3. Start ZooKeeper: open a terminal in the Kafka directory and run:
   ```
   bin/zookeeper-server-start.sh config/zookeeper.properties
   ```
4. Start the Kafka server: open another terminal and run:
   ```
   bin/kafka-server-start.sh config/server.properties
   ```

Make sure both ZooKeeper and the Kafka server are running before proceeding.

*See the accompanying slides for more details on this setup.*
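Before moving on, you can check from Python that something is listening on the broker port. This is a minimal sketch using only the standard library; `broker_reachable` is a hypothetical helper, and it assumes the broker uses the default port 9092 on `localhost`. A successful TCP connection only shows that the port is open, not that the broker is healthy.

```python
import socket


def broker_reachable(host: str = "localhost", port: int = 9092, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it only proves that something is listening on the
    port, not that the listener is a working Kafka broker.
    """
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or the host name could not be resolved.
        return False
```

Calling `broker_reachable()` in the notebook should return `True` once the Kafka server from step 4 is up.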

## Use Case

We want to fetch real-time weather data for a city and publish it to a Kafka topic called `weather` for downstream processing.


### Dataset

We will use data from the wttr.in API at https://wttr.in/<city>?format=j1, replacing `<city>` with the city whose weather you want. For example: https://wttr.in/jakarta?format=j1.

To explore the API in a browser, open the URL without `?format=j1`, for example https://wttr.in/jakarta. The `format=j1` parameter returns the data as JSON, and in this tutorial we only use part of that JSON.
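If you want to switch cities programmatically, the URL can be built from the city name. This is a small sketch; `weather_url` is a hypothetical helper, and percent-encoding the name (so `New York` becomes `New%20York`) is an assumption about how wttr.in handles multi-word city names, so check the result in a browser.

```python
from urllib.parse import quote

WTTR_BASE = "https://wttr.in"


def weather_url(city: str, fmt: str = "j1") -> str:
    """Build a wttr.in URL such as https://wttr.in/jakarta?format=j1 (hypothetical helper)."""
    # Percent-encode the city so spaces or non-ASCII characters form a valid
    # URL path segment; safe="" also encodes "/" so it cannot split the path.
    return f"{WTTR_BASE}/{quote(city.strip(), safe='')}?format={fmt}"
```

For example, `requests.get(weather_url("jakarta"))` requests the same URL used throughout this tutorial.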

First, let's test the API and fetch only the current condition:

```python
import requests

api = requests.get('https://wttr.in/jakarta?format=j1')
api.json()['current_condition'][0]
```

Output (the values change with every request):

```
{'FeelsLikeC': '33',
 'FeelsLikeF': '91',
 'cloudcover': '25',
 'humidity': '75',
 'localObsDateTime': '2024-12-07 04:32 PM',
 'observation_time': '09:32 AM',
 'precipInches': '0.0',
 'precipMM': '0.1',
 'pressure': '1005',
 'pressureInches': '30',
 'temp_C': '30',
 'temp_F': '86',
 'uvIndex': '2',
 'visibility': '10',
 'visibilityMiles': '6',
 'weatherCode': '116',
 'weatherDesc': [{'value': 'Partly cloudy'}],
 'weatherIconUrl': [{'value': ''}],
 'winddir16Point': 'N',
 'winddirDegree': '7',
 'windspeedKmph': '15',
 'windspeedMiles': '10'}
```
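Since only part of the JSON is needed, a helper can pick out a few fields. The field selection below is an assumption for illustration, and `extract_weather` is a hypothetical name. wttr.in returns numeric values as strings (as in the output above), so they are converted with `int()`.

```python
def extract_weather(current: dict) -> dict:
    """Pick a few fields from a wttr.in current_condition entry (hypothetical selection).

    Numeric fields arrive as strings such as '30', so they are cast to int.
    """
    return {
        "observed_at": current["localObsDateTime"],
        "temp_c": int(current["temp_C"]),
        "feels_like_c": int(current["FeelsLikeC"]),
        "humidity_pct": int(current["humidity"]),
        "wind_kmph": int(current["windspeedKmph"]),
        # weatherDesc is a list holding one {'value': ...} dict.
        "description": current["weatherDesc"][0]["value"],
    }
```

In the notebook, `extract_weather(api.json()['current_condition'][0])` would return a compact dict that could be published instead of the full record.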

### Producer Settings

```python
from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)
```

- `bootstrap_servers='localhost:9092'`: the Kafka broker(s) the producer contacts first. Here it is a single broker running locally on port 9092. The producer uses this address to fetch cluster metadata and then discovers the remaining brokers on its own. To list several brokers, pass a list such as `['host1:9092', 'host2:9092']` (a comma-separated string also works).
- `value_serializer`: a function the producer applies to every message value before sending it. Kafka stores and transmits messages as raw bytes, so this lambda converts each Python object to a JSON string with `json.dumps` and encodes it as UTF-8 bytes.
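To see what the `value_serializer` actually produces, the same conversion can be run without Kafka. This is a minimal sketch; `serialize` and `deserialize` are hypothetical names, and the deserializer is the inverse step a consumer would need to turn the bytes back into a dict.

```python
import json


def serialize(value) -> bytes:
    """Same conversion as the value_serializer lambda: object -> JSON text -> UTF-8 bytes."""
    return json.dumps(value).encode("utf-8")


def deserialize(raw: bytes):
    """Inverse step for a consumer: UTF-8 bytes -> JSON text -> Python object."""
    return json.loads(raw.decode("utf-8"))


record = {"temp_C": "30", "weatherDesc": [{"value": "Partly cloudy"}]}
payload = serialize(record)
print(payload)  # b'{"temp_C": "30", "weatherDesc": [{"value": "Partly cloudy"}]}'
assert deserialize(payload) == record  # the round trip restores the original dict
```

The bytes in `payload` are exactly what lands in the `weather` topic, which is why a consumer has to decode and parse them again.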



### Create Topic

Before publishing, create the `weather` topic from a terminal/command prompt in the Kafka directory:

`bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic weather --create` for Mac/Linux

or

`bin\windows\kafka-topics.bat --bootstrap-server localhost:9092 --topic weather --create` for Windows
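If you prefer to stay in Python, kafka-python also provides an admin client. This sketch assumes kafka-python's `KafkaAdminClient` and `NewTopic` from `kafka.admin`; the helper names `ensure_topic` and `create_weather_topic` are hypothetical. Checking `list_topics()` first avoids an error when the code is rerun, although a topic created concurrently by another process could still make `create_topics` raise `kafka.errors.TopicAlreadyExistsError`.

```python
def ensure_topic(admin, new_topic) -> bool:
    """Create `new_topic` through `admin` unless a topic with that name already exists.

    `admin` is assumed to behave like kafka-python's KafkaAdminClient
    (list_topics() and create_topics()). Returns True if a topic was created.
    """
    if new_topic.name in admin.list_topics():
        return False
    admin.create_topics(new_topics=[new_topic])
    return True


def create_weather_topic(servers: str = "localhost:9092") -> bool:
    # Imported here so ensure_topic can be used without kafka-python installed.
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers=servers)
    try:
        # One partition and replication factor 1 suit a single local broker.
        topic = NewTopic(name="weather", num_partitions=1, replication_factor=1)
        return ensure_topic(admin, topic)
    finally:
        admin.close()
```

With the broker running, `create_weather_topic()` returns `True` the first time and `False` on later runs.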

### Publish Data from API to Kafka

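```python
data = requests.get("https://wttr.in/jakarta?format=j1", timeout=10).json()['current_condition'][0]
producer.send('weather', value=data)
# send() is asynchronous; flush() blocks until buffered messages are delivered
producer.flush()
```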

You can wrap these steps in a function:

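```python
def producer_data():
    producer = KafkaProducer(
        bootstrap_servers='localhost:9092',
        value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    )
    # Fetch the current weather for Jakarta and keep only the current condition
    data = requests.get("https://wttr.in/jakarta?format=j1", timeout=10).json()['current_condition'][0]
    producer.send('weather', value=data)
    # Deliver the buffered message, then release the connection
    producer.flush()
    producer.close()
```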
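`producer.send()` only queues the message and returns a future. If you need confirmation that the broker stored the record, you can block on that future. This sketch assumes kafka-python's send future and its `get(timeout=...)` method, which returns record metadata with `partition` and `offset` attributes; `publish_and_confirm` is a hypothetical name.

```python
def publish_and_confirm(producer, topic: str, value, timeout: float = 10.0):
    """Send `value` to `topic` and wait for the broker's acknowledgement.

    `producer` is assumed to be a kafka-python KafkaProducer. Its send()
    returns a future; get() blocks until the record is stored and raises if
    delivery fails or `timeout` seconds pass. Returns (partition, offset).
    """
    future = producer.send(topic, value=value)
    metadata = future.get(timeout=timeout)
    return metadata.partition, metadata.offset
```

For example, `publish_and_confirm(producer, 'weather', data)` returns the partition and offset where the reading was written, or raises an error if delivery did not succeed within 10 seconds.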

### Kafka Consumer

To see the messages the producer has already sent, run the Kafka console consumer in a terminal/command prompt. The `--from-beginning` flag makes it read the topic from the first message; without it, the consumer only shows messages produced after it starts.

`bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic weather --from-beginning` for Mac/Linux

or

`bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic weather --from-beginning` for Windows
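The same check can be done from Python with kafka-python's `KafkaConsumer`. This is a minimal sketch that assumes the producer's JSON serializer; `deserialize`, `summarize` and `consume_weather` are hypothetical helper names. Without a `group_id` the consumer commits no offsets, so `auto_offset_reset='earliest'` makes it start from the oldest record, similar to `--from-beginning`.

```python
import json


def deserialize(raw: bytes) -> dict:
    """Reverse the producer's value_serializer: UTF-8 bytes -> JSON -> dict."""
    return json.loads(raw.decode("utf-8"))


def summarize(current: dict) -> str:
    """One-line summary of a current_condition record, e.g. '30 °C, Partly cloudy'."""
    return f"{current['temp_C']} °C, {current['weatherDesc'][0]['value']}"


def consume_weather(servers: str = "localhost:9092", idle_ms: int = 10_000) -> None:
    # Imported here so the helpers above work without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "weather",
        bootstrap_servers=servers,
        auto_offset_reset="earliest",  # no group_id, so always start from the oldest record
        value_deserializer=deserialize,
        consumer_timeout_ms=idle_ms,   # stop iterating after this long with no new records
    )
    try:
        for record in consumer:
            print(record.offset, summarize(record.value))
    finally:
        consumer.close()
```

Calling `consume_weather()` prints each record's offset and summary, then returns after 10 seconds without new messages.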

### Combine Everything into One Script

```python
import json

import requests
from kafka import KafkaProducer


def producer_data():
    producer = KafkaProducer(
        bootstrap_servers='localhost:9092',
        value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    )

    # Fetch the current weather for Jakarta and keep only the current condition
    data = requests.get("https://wttr.in/jakarta?format=j1", timeout=10).json()['current_condition'][0]

    producer.send('weather', value=data)
    # send() is asynchronous: wait for delivery before closing the producer
    producer.flush()
    producer.close()
    print(f'{data} successfully transferred')


if __name__ == "__main__":
    producer_data()
```


  
Save the script as a Python file (any name works) and run it in a separate terminal/command prompt (**make sure the producer you started earlier has been shut down**).
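The script sends a single reading per run. For a continuous, real-time feed you could repeat the fetch-and-publish step on a timer. This is a sketch with a hypothetical `run_periodically` helper that takes the fetch and publish steps as functions; the 300-second default interval is an arbitrary choice, and polling wttr.in much more often may get requests rate-limited.

```python
import time
from typing import Callable, Optional


def run_periodically(
    fetch: Callable[[], dict],
    publish: Callable[[dict], object],
    interval_s: float = 300.0,
    iterations: Optional[int] = None,
) -> int:
    """Call publish(fetch()) every `interval_s` seconds; return how many rounds succeeded.

    Hypothetical helper. With iterations=None it loops until interrupted (Ctrl+C).
    """
    published = 0
    done = 0
    while iterations is None or done < iterations:
        try:
            publish(fetch())
            published += 1
        except Exception as exc:  # e.g. a network error or unexpected JSON; keep polling
            print(f"skipped: {exc!r}")
        done += 1
        # Sleep between rounds, but not after the final one.
        if iterations is None or done < iterations:
            time.sleep(interval_s)
    return published
```

In the script, `fetch` could be a function wrapping the `requests.get(...)` call and `publish` could be `lambda d: producer.send('weather', value=d)`; stop the loop with Ctrl+C.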