# Benchmarking FastKafka app

## Prerequisites

To benchmark a `FastKafka` project, you will need the following:

1. A library built with `FastKafka`.
2. A running `Kafka` instance to benchmark the FastKafka application against.

### Creating FastKafka Code

Following the [tutorial](/#tutorial), let's create a `FastKafka`-based application and write it to the `application.py` file.

```python
# content of the "application.py" file

from pydantic import BaseModel, NonNegativeFloat, Field

class IrisInputData(BaseModel):
    sepal_length: NonNegativeFloat = Field(
        ..., example=0.5, description="Sepal length in cm"
    )
    sepal_width: NonNegativeFloat = Field(
        ..., example=0.5, description="Sepal width in cm"
    )
    petal_length: NonNegativeFloat = Field(
        ..., example=0.5, description="Petal length in cm"
    )
    petal_width: NonNegativeFloat = Field(
        ..., example=0.5, description="Petal width in cm"
    )


class IrisPrediction(BaseModel):
    species: str = Field(..., example="setosa", description="Predicted species")

from fastkafka import FastKafka

kafka_brokers = {
    "localhost": {
        "url": "localhost",
        "description": "local development kafka broker",
        "port": 9092,
    },
    "production": {
        "url": "kafka.airt.ai",
        "description": "production kafka broker",
        "port": 9092,
        "protocol": "kafka-secure",
        "security": {"type": "plain"},
    },
}

kafka_app = FastKafka(
    title="Iris predictions",
    kafka_brokers=kafka_brokers,
    bootstrap_servers="localhost:9092",
)

iris_species = ["setosa", "versicolor", "virginica"]

@kafka_app.consumes(topic="input_data", auto_offset_reset="latest")
async def on_input_data(msg: IrisInputData):
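    # `model` is assumed to be a trained classifier defined at module level
    # (for example, the one built in the tutorial); it is not created in this file.
    global model
    species_class = model.predict(
        [[msg.sepal_length, msg.sepal_width, msg.petal_length, msg.petal_width]]
    )[0]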

    await to_predictions(species_class)


@kafka_app.produces(topic="predictions")
async def to_predictions(species_class: int) -> IrisPrediction:
    prediction = IrisPrediction(species=iris_species[species_class])
    return prediction

```


`FastKafka` provides a decorator for benchmarking, aptly named `benchmark`.
Let's edit our `application.py` file and add the `benchmark` decorator to the consumer function.


```python
# content of the "application.py" file with benchmark

from pydantic import BaseModel, NonNegativeFloat, Field

class IrisInputData(BaseModel):
    sepal_length: NonNegativeFloat = Field(
        ..., example=0.5, description="Sepal length in cm"
    )
    sepal_width: NonNegativeFloat = Field(
        ..., example=0.5, description="Sepal width in cm"
    )
    petal_length: NonNegativeFloat = Field(
        ..., example=0.5, description="Petal length in cm"
    )
    petal_width: NonNegativeFloat = Field(
        ..., example=0.5, description="Petal width in cm"
    )


class IrisPrediction(BaseModel):
    species: str = Field(..., example="setosa", description="Predicted species")

from fastkafka import FastKafka

kafka_brokers = {
    "localhost": {
        "url": "localhost",
        "description": "local development kafka broker",
        "port": 9092,
    },
    "production": {
        "url": "kafka.airt.ai",
        "description": "production kafka broker",
        "port": 9092,
        "protocol": "kafka-secure",
        "security": {"type": "plain"},
    },
}

kafka_app = FastKafka(
    title="Iris predictions",
    kafka_brokers=kafka_brokers,
    bootstrap_servers="localhost:9092",
)

iris_species = ["setosa", "versicolor", "virginica"]

@kafka_app.consumes(topic="input_data", auto_offset_reset="latest")
@kafka_app.benchmark(interval=1, sliding_window_size=5)
async def on_input_data(msg: IrisInputData):
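    # `model` is assumed to be a trained classifier defined at module level
    # (for example, the one built in the tutorial); it is not created in this file.
    global model
    species_class = model.predict(
        [[msg.sepal_length, msg.sepal_width, msg.petal_length, msg.petal_width]]
    )[0]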

    await to_predictions(species_class)


@kafka_app.produces(topic="predictions")
async def to_predictions(species_class: int) -> IrisPrediction:
    prediction = IrisPrediction(species=iris_species[species_class])
    return prediction

```


Here we benchmark the function that consumes data from the `input_data` topic, using an interval of 1 second and a sliding window size of 5. The `interval` parameter sets the time period over which each result is calculated, and the `sliding_window_size` parameter sets the maximum number of results used to calculate the average throughput and standard deviation. Benchmarking the consumer this way shows how its throughput behaves over time and helps identify bottlenecks.
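
To make the roles of `interval` and `sliding_window_size` more concrete, here is a minimal sketch of this kind of calculation. It is not `FastKafka`'s actual implementation: the class `ThroughputWindow` and its methods are hypothetical, and the sketch assumes that one throughput sample is recorded per interval and that a population standard deviation is used.

```python
from collections import deque
from statistics import mean, pstdev


class ThroughputWindow:
    """Hypothetical helper that keeps the last `sliding_window_size` samples."""

    def __init__(self, interval: float, sliding_window_size: int) -> None:
        self.interval = interval
        # A deque with maxlen drops the oldest sample automatically.
        self.samples: deque = deque(maxlen=sliding_window_size)

    def record(self, message_count: int) -> None:
        """Store the throughput (messages per second) of one finished interval."""
        self.samples.append(message_count / self.interval)

    def average(self) -> float:
        return mean(self.samples)

    def deviation(self) -> float:
        return pstdev(self.samples)


window = ThroughputWindow(interval=1, sliding_window_size=5)
for count in [100, 120, 110, 90, 130, 150]:
    window.record(count)

# Only the last five samples (120, 110, 90, 130, 150) remain in the window.
print(window.average())    # 120.0
print(window.deviation())  # 20.0
```

A larger window smooths out short spikes at the cost of reacting more slowly to real changes in throughput.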

### Starting Kafka

If you already have a `Kafka` instance running somewhere, you can skip this step. Keep in mind that your benchmarking results may be affected by bottlenecks such as the network, the number of CPU cores on the Kafka machine, or the Kafka configuration itself.

#### Installing Java and Kafka
We need a working `Kafka` instance to benchmark our `FastKafka` app, and to run `Kafka` we need `Java`. Thankfully, `FastKafka` comes with a CLI that installs both `Java` and `Kafka` on our machine.
So, let's install `Java` and `Kafka` by executing the following command:

```cmd
fastkafka testing install_deps
```

The above command will extract the `Kafka` scripts to `$HOME/.local/kafka_2.13-3.3.2` on your machine.


#### Creating configuration for Zookeeper and Kafka
Now we need to start `Zookeeper` and `Kafka` separately, and to start them we need `zookeeper.properties` and `kafka.properties` files.

Let's create a folder inside the folder where `Kafka` scripts were extracted and change directory into it.

```cmd
mkdir $HOME/.local/kafka_2.13-3.3.2/data_dir && cd $HOME/.local/kafka_2.13-3.3.2/data_dir
```

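Let's create a file called `zookeeper.properties` and write the following content to the file. Properties files are read as plain text and shell variables are not expanded, so replace `$HOME` with the absolute path to your home directory: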

```txt
dataDir=$HOME/.local/kafka_2.13-3.3.2/data_dir/zookeeper
clientPort=2181
maxClientCnxns=0
```
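
One way to avoid typing the absolute path by hand is to generate the file content. The following is only a sketch, assuming the install location mentioned above; the function name `render_zookeeper_properties` is made up for this example and is not part of `FastKafka`.

```python
from pathlib import Path


def render_zookeeper_properties(data_dir: Path) -> str:
    """Return the file content with an absolute `dataDir` path."""
    lines = [
        f"dataDir={data_dir / 'zookeeper'}",
        "clientPort=2181",
        "maxClientCnxns=0",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed location: the folder created in the previous step.
    data_dir = Path.home() / ".local" / "kafka_2.13-3.3.2" / "data_dir"
    print(render_zookeeper_properties(data_dir), end="")
```

Redirecting the script's output into `zookeeper.properties` inside the `data_dir` folder produces a file with the path already resolved.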

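Similarly, let's create a file called `kafka.properties` and write the following content to the file. Adjust `log.dirs` to a writable directory on your machine, since this is where `Kafka` stores its data: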

```txt
broker.id=0
listeners=PLAINTEXT://:9092

num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600

num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

log.dirs=/work/fastkafka/kafka_config/kafka_logs
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.hours=168
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000


zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=18000
```


#### Starting Zookeeper and Kafka

We need two terminals: one to run `Zookeeper` and another to run `Kafka`.
Let's open a new terminal and run the following command to start `Zookeeper`:

```cmd
cd $HOME/.local/kafka_2.13-3.3.2/bin && ./zookeeper-server-start.sh ../data_dir/zookeeper.properties
```


Once `Zookeeper` is up and running, open a new terminal and execute the following command to start `Kafka`:

```cmd
cd $HOME/.local/kafka_2.13-3.3.2/bin && ./kafka-server-start.sh ../data_dir/kafka.properties
```

Now we have both `Zookeeper` and `Kafka` up and running.
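
If you want to confirm this before moving on, a quick check is to see whether both ports accept TCP connections. The sketch below uses only the Python standard library; the helper `port_is_open` is a name chosen for this example, and the ports are the ones from the configuration files above. An open port only shows that something is listening, not that the broker is fully healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports taken from zookeeper.properties and kafka.properties.
    for name, port in [("Zookeeper", 2181), ("Kafka", 9092)]:
        status = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {status}")
```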

### Benchmarking FastKafka


Once `Zookeeper` and `Kafka` are ready, benchmarking the `FastKafka` app is as simple as running the `fastkafka run` command:

```cmd
fastkafka run --num-workers 1 --kafka-broker localhost application:kafka_app
```

This will start the `FastKafka` app, which begins consuming messages from the `Kafka` instance we spun up earlier.
The same command also prints the benchmark throughput, calculated according to `interval` and `sliding_window_size`.
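
The benchmark only has something to measure while messages arrive on the `input_data` topic. As a rough sketch, the script below sends random test messages using `aiokafka`'s `AIOKafkaProducer`. It assumes that `aiokafka` is installed, that the broker listens on `localhost:9092`, and that the app decodes messages as JSON matching `IrisInputData`; the helper names `make_payload` and `send_messages` are made up for this example.

```python
import asyncio
import json
import random


def make_payload() -> bytes:
    """Build one JSON message with the fields of `IrisInputData`."""
    record = {
        "sepal_length": round(random.uniform(4.0, 8.0), 1),
        "sepal_width": round(random.uniform(2.0, 4.5), 1),
        "petal_length": round(random.uniform(1.0, 7.0), 1),
        "petal_width": round(random.uniform(0.1, 2.5), 1),
    }
    return json.dumps(record).encode("utf-8")


async def send_messages(count: int, bootstrap_servers: str = "localhost:9092") -> None:
    """Send `count` random messages to the `input_data` topic."""
    # Imported here so that make_payload works without aiokafka installed.
    from aiokafka import AIOKafkaProducer

    producer = AIOKafkaProducer(bootstrap_servers=bootstrap_servers)
    await producer.start()
    try:
        for _ in range(count):
            await producer.send("input_data", make_payload())
        await producer.flush()
    finally:
        await producer.stop()


# With Kafka and the FastKafka app running, uncomment to send messages:
# asyncio.run(send_messages(10_000))
```

Because the consumer is declared with `auto_offset_reset="latest"`, start the app first and run the producer afterwards, otherwise messages sent earlier may not be consumed.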