In [None]:
# | hide

import inspect
import os
import shutil
from pathlib import Path
from pprint import pprint
from tempfile import TemporaryDirectory
from typing import *

import pytest
from aiokafka import AIOKafkaProducer
from fastcore.basics import patch
from IPython.display import Markdown

from fastkafka.helpers import get_collapsible_admonition, source2markdown
from fastkafka.testing import mock_AIOKafkaProducer_send, run_script_and_cancel

In [None]:
# | notest
# | hide

import nest_asyncio

In [None]:
# | notest
# | hide

nest_asyncio.apply()

# FastKafka

<b>Effortless Kafka integration for your web services</b>

---

![PyPI](https://img.shields.io/pypi/v/fastkafka)
![PyPI - Downloads](https://img.shields.io/pypi/dm/fastkafka)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/fastkafka)

![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/airtai/fastkafka/test.yaml)
![CodeQL](https://github.com/airtai/fastkafka/actions/workflows/codeql.yml/badge.svg)
![Dependency Review](https://github.com/airtai/fastkafka/actions/workflows/dependency-review.yml/badge.svg)

![GitHub](https://img.shields.io/github/license/airtai/fastkafka)

---


[FastKafka](https://fastkafka.airt.ai) is a powerful and easy-to-use Python library for building asynchronous services that interact with Kafka topics. Built on top of [Pydantic](https://docs.pydantic.dev/), [AIOKafka](https://github.com/aio-libs/aiokafka) and [AsyncAPI](https://www.asyncapi.com/), FastKafka simplifies the process of writing producers and consumers for Kafka topics, handling all the parsing, networking, task scheduling and data generation automatically. With FastKafka, you can quickly prototype and develop high-performance Kafka-based services with minimal code, making it an ideal choice for developers looking to streamline their workflow and accelerate their projects.

## Install

FastKafka works on macOS, Linux, and most Unix-style operating systems. You can install it with `pip` as usual:

```sh
pip install fastkafka
```

## Tutorial

You can start an interactive tutorial in Google Colab by clicking the button below or by clicking on the following link:

https://colab.research.google.com/github/airtai/fastkafka/blob/main/nbs/guides/Guide_00_FastKafka_Demo.ipynb


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/airtai/fastkafka/blob/main/nbs/guides/Guide_00_FastKafka_Demo.ipynb)

## Writing server code

Here is an example Python script using FastKafka that takes data from a Kafka topic, makes a prediction using a predictive model, and outputs the prediction to another Kafka topic.

### Preparing the demo model

First, we will prepare our model using the Iris dataset so that we can demonstrate predictions with FastKafka. The following cell loads the dataset and trains the model.

In [None]:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(random_state=0, max_iter=500).fit(X, y)
x = X[[0, 55, -1]]
print(x)
print(model.predict(x))

[[5.1 3.5 1.4 0.2]
 [5.7 2.8 4.5 1.3]
 [5.9 3.  5.1 1.8]]
[0 1 2]


### Messages

FastKafka uses [Pydantic](https://docs.pydantic.dev/) to parse input JSON-encoded data into Python objects, making it easy to work with structured data in your Kafka-based applications. Pydantic's [`BaseModel`](https://docs.pydantic.dev/usage/models/) class allows you to define messages using a declarative syntax, making it easy to specify the fields and types of your messages.

This example defines two message classes for use in a FastKafka application:

- The `IrisInputData` class is used to represent input data for a predictive model. It has four fields of type [`NonNegativeFloat`](https://docs.pydantic.dev/usage/types/#constrained-types), which is a subclass of float that only allows non-negative floating point values.

- The `IrisPrediction` class is used to represent the output of the predictive model. It has a single field `species` of type string representing the predicted species.

These message classes will be used to parse and validate incoming data in Kafka consumers and producers.

In [None]:
from pydantic import BaseModel, NonNegativeFloat, Field


class IrisInputData(BaseModel):
    sepal_length: NonNegativeFloat = Field(
        ..., example=0.5, description="Sepal length in cm"
    )
    sepal_width: NonNegativeFloat = Field(
        ..., example=0.5, description="Sepal width in cm"
    )
    petal_length: NonNegativeFloat = Field(
        ..., example=0.5, description="Petal length in cm"
    )
    petal_width: NonNegativeFloat = Field(
        ..., example=0.5, description="Petal width in cm"
    )


class IrisPrediction(BaseModel):
    species: str = Field(..., example="setosa", description="Predicted species")
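
To make the parse-and-validate step more concrete, here is a minimal sketch that uses only the standard library. It is a stand-in for illustration and not how Pydantic is implemented: `IrisInputSketch` and `parse_iris_json` are hypothetical names, and the check in `__post_init__` only mimics the non-negative constraint described above.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IrisInputSketch:
    """Hypothetical stand-in for IrisInputData, built only on the standard library."""

    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

    def __post_init__(self) -> None:
        # Mimic the non-negative constraint: every field must be a number >= 0.
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{f.name} must be a number, got {value!r}")
            if value < 0:
                raise ValueError(f"{f.name} must be non-negative, got {value}")


def parse_iris_json(raw: bytes) -> IrisInputSketch:
    """Decode a JSON-encoded message and validate its fields."""
    return IrisInputSketch(**json.loads(raw))


good = parse_iris_json(
    b'{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'
)
print(good)

try:
    parse_iris_json(
        b'{"sepal_length": -1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'
    )
except ValueError as err:
    print(f"rejected: {err}")
```

With Pydantic, the same checks come from the field types, so the message classes above need no hand-written validation code.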

### Application

This example shows how to initialize a FastKafka application.

It starts by defining a dictionary called `kafka_brokers`, which contains two entries: `"localhost"` and `"production"`, specifying local development and production Kafka brokers. Each entry specifies the URL, port, and other details of a Kafka broker. This dictionary is used only for generating the documentation; the running server does not check it.

Next, an object of the `FastKafka` class is initialized with the minimum set of arguments:

- `kafka_brokers`: a dictionary used for generation of documentation

- `bootstrap_servers`: a ``host[:port]`` string or list of ``host[:port]`` strings that a consumer or a producer should contact to bootstrap initial cluster metadata


In [None]:
from fastkafka.application import FastKafka

kafka_brokers = {
    "localhost": {
        "url": "localhost",
        "description": "local development kafka broker",
        "port": 9092,
    },
    "production": {
        "url": "kafka.airt.ai",
        "description": "production kafka broker",
        "port": 9092,
        "protocol": "kafka-secure",
        "security": {"type": "plain"},
    },
}

kafka_app = FastKafka(
    kafka_brokers=kafka_brokers,
    bootstrap_servers="localhost:9092",
)
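
As a small illustration of the `host[:port]` notation accepted by `bootstrap_servers`, the sketch below normalises such values into `(host, port)` pairs. `parse_bootstrap_servers` is a hypothetical helper written for this guide, not part of FastKafka or AIOKafka. Falling back to port 9092 when no port is given is an assumption of the sketch, and IPv6 addresses are not handled.

```python
from typing import List, Tuple, Union

DEFAULT_KAFKA_PORT = 9092  # assumption of this sketch: used when no port is given


def parse_bootstrap_servers(servers: Union[str, List[str]]) -> List[Tuple[str, int]]:
    """Hypothetical helper: turn host[:port] entries into (host, port) pairs."""
    if isinstance(servers, str):
        # A single string is treated as a one-element list
        servers = [servers]
    result = []
    for entry in servers:
        host, sep, port = entry.strip().rpartition(":")
        if not sep:
            # No colon found: the whole entry is the host name
            host, port = port, str(DEFAULT_KAFKA_PORT)
        result.append((host, int(port)))
    return result


print(parse_bootstrap_servers("localhost:9092"))
print(parse_bootstrap_servers(["kafka.airt.ai", "localhost:9093"]))
```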

### Function decorators

FastKafka provides convenient function decorators `@kafka_app.consumes` and `@kafka_app.produces` to allow you to delegate the actual process of

- consuming and producing data to Kafka, and

- decoding and encoding JSON-encoded messages

from user defined functions to the framework. The FastKafka framework delegates these jobs to AIOKafka and Pydantic libraries.

These decorators make it easy to specify the processing logic for your Kafka consumers and producers, allowing you to focus on the core business logic of your application without worrying about the underlying Kafka integration.

The following example shows how to use the `@kafka_app.consumes` and `@kafka_app.produces` decorators in a FastKafka application:

- The `@kafka_app.consumes` decorator is applied to the `on_input_data` function, which specifies that this function should be called whenever a message is received on the "input_data" Kafka topic. The `on_input_data` function takes a single argument, which is expected to be an instance of the `IrisInputData` message class. The type annotation of this argument instructs the framework to call `IrisInputData.parse_raw()` on the consumed message before passing it to the user-defined function `on_input_data`.

- The `@kafka_app.produces` decorator is applied to the `to_predictions` function, which specifies that this function should produce a message to the "predictions" Kafka topic whenever it is called. The `to_predictions` function takes a single integer argument `species_class` representing one of three possible string values predicted by the model. It creates a new `IrisPrediction` message using this value and then returns it. The framework will call `.json().encode("utf-8")` on the returned `IrisPrediction` object and produce the result to the specified topic.

In [None]:
@kafka_app.consumes(topic="input_data", auto_offset_reset="latest")
async def on_input_data(msg: IrisInputData):
    global model
    species_class = model.predict(
        [[msg.sepal_length, msg.sepal_width, msg.petal_length, msg.petal_width]]
    )[0]

    to_predictions(species_class)

@kafka_app.produces(topic="predictions")
def to_predictions(species_class: int) -> IrisPrediction:
    iris_species = ["setosa", "versicolor", "virginica"]

    prediction = IrisPrediction(species=iris_species[species_class])
    return prediction
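
To illustrate the general idea behind such decorators, here is a toy, in-memory sketch using only the standard library. It is not FastKafka's implementation: the `consumers` registry, the `sent` list, and the `deliver` function are hypothetical stand-ins for Kafka. Plain dictionaries replace the Pydantic messages, and a fixed rule replaces the model.

```python
import asyncio
import functools
import json
from typing import Any, Awaitable, Callable, Dict, List, Tuple

# In-memory stand-ins for Kafka (assumptions of this sketch)
consumers: Dict[str, Callable[[Dict[str, Any]], Awaitable[None]]] = {}
sent: List[Tuple[str, bytes]] = []


def consumes(topic: str):
    """Register an async handler for messages arriving on a topic."""
    def decorator(func):
        consumers[topic] = func
        return func
    return decorator


def produces(topic: str):
    """Wrap a function so that its return value is JSON-encoded and 'sent'."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            sent.append((topic, json.dumps(result).encode("utf-8")))
            return result
        return wrapper
    return decorator


@produces(topic="predictions")
def to_predictions(species_class: int) -> Dict[str, str]:
    iris_species = ["setosa", "versicolor", "virginica"]
    return {"species": iris_species[species_class]}


@consumes(topic="input_data")
async def on_input_data(msg: Dict[str, Any]) -> None:
    # A real handler would call model.predict here; the sketch uses a fixed rule.
    to_predictions(0 if msg["petal_length"] < 2.5 else 1)


async def deliver(topic: str, raw: bytes) -> None:
    """Decode a raw message and pass it to the handler registered for the topic."""
    await consumers[topic](json.loads(raw))


asyncio.run(deliver("input_data", b'{"petal_length": 1.4}'))
print(sent)
```

The user-defined functions contain only business logic. Decoding, dispatching, and encoding live in the decorators, which is the separation of concerns the decorators above aim for.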

## Testing the service

The service can be tested using a `Tester` instance, which internally starts a Kafka broker and ZooKeeper.

In [None]:
from fastkafka.application import Tester

msg = IrisInputData(
    sepal_length=0.1,
    sepal_width=0.2,
    petal_length=0.3,
    petal_width=0.4,
)

# Start Tester app and create local Kafka broker for testing
async with Tester(kafka_app) as tester:
    
    # Send IrisInputData message to input_data topic
    await tester.to_input_data(msg)
    
    # Assert that the kafka_app responded with IrisPrediction in predictions topic
    await tester.awaited_mocks.on_predictions.assert_awaited_with(
        IrisPrediction(species="setosa"), timeout=2
    )

[INFO] fastkafka._testing.local_broker: Java is already installed.
[INFO] fastkafka._testing.local_broker: But not exported to PATH, exporting...
[INFO] fastkafka._testing.local_broker: Kafka is already installed.
[INFO] fastkafka._testing.local_broker: But not exported to PATH, exporting...
[INFO] fastkafka._testing.local_broker: Starting zookeeper...
[INFO] fastkafka._testing.local_broker: zookeeper started, sleeping for 5 seconds...
[INFO] fastkafka._testing.local_broker: Starting kafka...
[INFO] fastkafka._testing.local_broker: kafka started, sleeping for 5 seconds...
[INFO] fastkafka._testing.local_broker: Local Kafka broker up and running on 127.0.0.1:9092
[INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': '127.0.0.1:9092'}'
[INFO] fastkafka._components.aiokafka_producer_manager: AIOKafkaProducerManager.start(): Entering...
[INFO] fastkafka._components.aiokafka_producer_manager: _aiokafka_producer_manager(): Starting.

### Recap

We have created an Iris classification model and encapsulated it in our FastKafka application.
The app will consume `IrisInputData` messages from the `input_data` topic and produce the predictions to the `predictions` topic.

To test the app we have:
1. Created the app
2. Started our Tester class, which mirrors the developed app's topics for testing purposes
3. Sent IrisInputData message to `input_data` topic
4. Asserted and checked that the developed iris classification service has reacted to IrisInputData message 
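
The assertion in the last step follows the pattern of `unittest.mock.AsyncMock` from the standard library. The sketch below shows that pattern in isolation and is not the `Tester` implementation: `on_predictions` and `fake_service` are hypothetical stand-ins, and plain dictionaries replace the Pydantic messages. In plain `unittest.mock`, `assert_awaited_with` checks immediately and does not wait, so the sketch runs the fake service to completion before asserting.

```python
import asyncio
from unittest.mock import AsyncMock

# Hypothetical stand-in for a mirrored consumer on the "predictions" topic
on_predictions = AsyncMock()


async def fake_service(msg: dict) -> None:
    """Pretend service: classify the input and pass the result downstream."""
    species = "setosa" if msg["petal_length"] < 2.5 else "versicolor"
    await on_predictions({"species": species})


asyncio.run(fake_service({"petal_length": 0.3}))

# Raises AssertionError if the last await did not use exactly this argument
on_predictions.assert_awaited_with({"species": "setosa"})
print(on_predictions.await_count)
```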

## Running the service

The service can be started using the built-in `fastkafka run` CLI command. Before we can do that, we will concatenate the code snippets from above and save them in a file `"application.py"`.

In [None]:
kafka_app_source = """
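from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from pydantic import BaseModel, NonNegativeFloat, Field

# Train the demo model when the module is imported, so that on_input_data can use it
X, y = load_iris(return_X_y=True)
model = LogisticRegression(random_state=0, max_iter=500).fit(X, y)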

class IrisInputData(BaseModel):
    sepal_length: NonNegativeFloat = Field(
        ..., example=0.5, description="Sepal length in cm"
    )
    sepal_width: NonNegativeFloat = Field(
        ..., example=0.5, description="Sepal width in cm"
    )
    petal_length: NonNegativeFloat = Field(
        ..., example=0.5, description="Petal length in cm"
    )
    petal_width: NonNegativeFloat = Field(
        ..., example=0.5, description="Petal width in cm"
    )


class IrisPredictionData(BaseModel):
    species: str = Field(..., example="setosa", description="Predicted species")
    
from fastkafka.application import FastKafka

kafka_brokers = {
    "localhost": {
        "url": "localhost",
        "description": "local development kafka broker",
        "port": 9092,
    },
    "production": {
        "url": "kafka.airt.ai",
        "description": "production kafka broker",
        "port": 9092,
        "protocol": "kafka-secure",
        "security": {"type": "plain"},
    },
}

kafka_app = FastKafka(
    kafka_brokers=kafka_brokers,
    bootstrap_servers="localhost:9092",
)

iris_species = ["setosa", "versicolor", "virginica"]

@kafka_app.consumes(topic="input_data", auto_offset_reset="latest")
async def on_input_data(msg: IrisInputData):
    global model
    species_class = model.predict([
          [msg.sepal_length, msg.sepal_width, msg.petal_length, msg.petal_width]
        ])[0]

    to_predictions(species_class)


@kafka_app.produces(topic="predictions")
def to_predictions(species_class: int) -> IrisPredictionData:
    prediction = IrisPredictionData(species=iris_species[species_class])
    return prediction
"""

with open("application.py", "w") as source:
    source.write(kafka_app_source)

To run the service, you will need a running Kafka broker on localhost as specified by the `bootstrap_servers="localhost:9092"` parameter above. We can start the Kafka broker locally using the `LocalKafkaBroker`. Notice that the same happens automatically in the `Tester` as shown above.

In [None]:
from fastkafka.testing import LocalKafkaBroker

broker = LocalKafkaBroker(apply_nest_asyncio=True)

broker.start()

[INFO] fastkafka._testing.local_broker: LocalKafkaBroker.start(): entering...
[INFO] fastkafka._testing.local_broker: Java is already installed.
[INFO] fastkafka._testing.local_broker: Kafka is already installed.
[INFO] fastkafka._testing.local_broker: Starting zookeeper...
[INFO] fastkafka._testing.local_broker: zookeeper started, sleeping for 5 seconds...
[INFO] fastkafka._testing.local_broker: Starting kafka...
[INFO] fastkafka._testing.local_broker: kafka started, sleeping for 5 seconds...
[INFO] fastkafka._testing.local_broker: Local Kafka broker up and running on 127.0.0.1:9092
[INFO] fastkafka._testing.local_broker: <class 'fastkafka._testing.local_broker.LocalKafkaBroker'>.start(): returning 127.0.0.1:9092
[INFO] fastkafka._testing.local_broker: LocalKafkaBroker.start(): exited.


'127.0.0.1:9092'
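
If you want to check that something is listening on the broker address before starting the service, a plain TCP connection attempt is enough for a quick test. The helper below is a hypothetical, standard-library-only sketch and not part of FastKafka. A successful connection only shows that the port is open, not that the listener is a healthy Kafka broker.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Hypothetical helper: return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


print(is_port_open("127.0.0.1", 9092))
```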

Then, we start the FastKafka service by running the following command in the folder where the `application.py` file is located:

In [None]:
!fastkafka run --num-workers=2 application:kafka_app

[107042]: [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'localhost:9092'}'
[107042]: [INFO] fastkafka._components.aiokafka_producer_manager: AIOKafkaProducerManager.start(): Entering...
[107040]: [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'localhost:9092'}'
[107040]: [INFO] fastkafka._components.aiokafka_producer_manager: AIOKafkaProducerManager.start(): Entering...
[107040]: [INFO] fastkafka._components.aiokafka_producer_manager: _aiokafka_producer_manager(): Starting...
[107040]: [INFO] fastkafka._components.aiokafka_producer_manager: _aiokafka_producer_manager(): Starting send_stream
[107040]: [INFO] fastkafka._components.aiokafka_producer_manager: AIOKafkaProducerManager.start(): Finished.
[107042]: [INFO] fastkafka._components.aiokafka_producer_manager: _aiokafka_producer_manager(): Starting...
[107042]: [INFO] fastkafka._components.aiokafka_pr

You need to interrupt the execution of the cell above by selecting `Runtime->Interrupt execution` on the toolbar above.
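
The `application:kafka_app` argument passed to `fastkafka run` looks like the `module:attribute` notation used by several Python application runners: the part before the colon names an importable module and the part after it names an object inside that module. The sketch below shows how such a string can be resolved with the standard library. `import_from_string` is a hypothetical helper for illustration, not FastKafka's actual loader.

```python
import importlib
from typing import Any


def import_from_string(target: str) -> Any:
    """Hypothetical helper: resolve 'module:attribute' to the named object."""
    module_name, sep, attr_path = target.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {target!r}")
    obj = importlib.import_module(module_name)
    # Support dotted attribute paths such as 'os:path.join'
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj


dumps = import_from_string("json:dumps")
print(dumps({"species": "setosa"}))
```

Under this convention, the module must be importable from the current working directory, which is why the command is run from the folder containing the application file.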

Finally, we can stop the local Kafka Broker:

In [None]:
broker.stop()

[INFO] fastkafka._testing.local_broker: LocalKafkaBroker.stop(): entering...
[INFO] fastkafka._components._subprocess: terminate_asyncio_process(): Terminating the process 106434...
[INFO] fastkafka._components._subprocess: terminate_asyncio_process(): Process 106434 was already terminated.
[INFO] fastkafka._components._subprocess: terminate_asyncio_process(): Terminating the process 106051...
[INFO] fastkafka._components._subprocess: terminate_asyncio_process(): Process 106051 was already terminated.
[INFO] fastkafka._testing.local_broker: LocalKafkaBroker.stop(): exited.


## Documentation

The Kafka app comes with built-in documentation generation using the [AsyncAPI HTML generator](https://www.asyncapi.com/tools/generator).



Before we can install the requirements for using the generator, we will update `Node.js` and `npm`, which are needed to run this tool:

In [None]:
! npm install -g n
! n lts
! npm install -g npm@latest

[K[?25h[37;40mnpm[0m [0m[31;40mERR![0m [0m[35mcode[0m EACCESdealTree[0m buildDeps[0m[K[0m[K
[0m[37;40mnpm[0m [0m[31;40mERR![0m [0m[35msyscall[0m mkdir
[0m[37;40mnpm[0m [0m[31;40mERR![0m [0m[35mpath[0m /usr/lib/node_modules/n
[0m[37;40mnpm[0m [0m[31;40mERR![0m [0m[35merrno[0m -13
[0m[37;40mnpm[0m [0m[31;40mERR![0m[35m[0m Error: EACCES: permission denied, mkdir '/usr/lib/node_modules/n'
[0m[37;40mnpm[0m [0m[31;40mERR![0m[35m[0m  [Error: EACCES: permission denied, mkdir '/usr/lib/node_modules/n'] {
[0m[37;40mnpm[0m [0m[31;40mERR![0m[35m[0m   errno: -13,
[0m[37;40mnpm[0m [0m[31;40mERR![0m[35m[0m   code: 'EACCES',
[0m[37;40mnpm[0m [0m[31;40mERR![0m[35m[0m   syscall: 'mkdir',
[0m[37;40mnpm[0m [0m[31;40mERR![0m[35m[0m   path: '/usr/lib/node_modules/n'
[0m[37;40mnpm[0m [0m[31;40mERR![0m[35m[0m }
[0m[37;40mnpm[0m [0m[31;40mERR![0m[35m[0m 
[0m[37;40mnpm[0m [0m[31;40mERR![0m[35m[0m The

After that, we can install all dependencies from the shell using the following command:

In [None]:
! fastkafka docs install_deps

[INFO] fastkafka._components.asyncapi: AsyncAPI generator installed


To generate the documentation, run the `fastkafka docs generate` CLI command on your app. This will create the *asyncapi* folder in the current working directory, where all your documentation will be saved.

In [None]:
! fastkafka docs generate application:kafka_app

[INFO] fastkafka._components.asyncapi: New async specifications generated at: '/work/fastkafka/nbs/guides/asyncapi/spec/asyncapi.yml'
[INFO] fastkafka._components.asyncapi: Async docs generated at 'asyncapi/docs'
[INFO] fastkafka._components.asyncapi: Output of '$ npx -y -p @asyncapi/generator ag asyncapi/spec/asyncapi.yml @asyncapi/html-template -o asyncapi/docs --force-write'

Done! ✨
Check out your shiny new generated files at /work/fastkafka/nbs/guides/asyncapi/docs.




In [None]:
! ls asyncapi/

[0m[01;34mdocs[0m  [01;34mspec[0m


In the `docs` folder you will find the servable static HTML files of your documentation. They can also be served using our `fastkafka docs serve` CLI command (more on that in our guides).

In the `spec` folder you will find an `asyncapi.yml` file containing the AsyncAPI specification of your application.
Let's view the generated docs now.

In [None]:
from fastkafka.testing import display_docs

await display_docs(docs_path="asyncapi/docs")

[INFO] fastkafka._components._subprocess: terminate_asyncio_process(): Terminating the process 107222...
[INFO] fastkafka._components._subprocess: terminate_asyncio_process(): Process 107222 was already terminated.
