# Using multiple Kafka clusters

In [None]:
# | hide

import platform
import pytest
from IPython.display import Markdown as md

from pydantic import BaseModel, Field

from fastkafka import FastKafka
from fastkafka.testing import Tester, ApacheKafkaBroker, run_script_and_cancel

Ready to take your FastKafka app to the next level? This guide shows how to connect a single app to multiple Kafka clusters, so that you can consume from, consolidate, and produce to topics across clusters. Let's dive in!

### Test message

To showcase the functionalities of FastKafka and illustrate the concepts discussed, we can use a simple test message called `TestMsg`. Here's the definition of the `TestMsg` class:

In [None]:
class TestMsg(BaseModel):
    msg: str = Field(...)

## Defining multiple broker configurations

When building a FastKafka application, you may need to consume messages from multiple Kafka clusters, each with its own set of broker configurations. FastKafka lets you select a different broker cluster with the `brokers` argument of the `consumes` decorator. Let's explore an example code snippet:

In [None]:
from pydantic import BaseModel, Field

from fastkafka import FastKafka

class TestMsg(BaseModel):
    msg: str = Field(...)

kafka_brokers_1 = dict(
    development=dict(url="dev.server_1", port=9092),
    production=dict(url="prod.server_1", port=9092),
)
kafka_brokers_2 = dict(
    development=dict(url="dev.server_2", port=9092),
    production=dict(url="prod.server_2", port=9092),
)

app = FastKafka(kafka_brokers=kafka_brokers_1)


@app.consumes(topic="preprocessed_signals")
async def on_preprocessed_signals_1(msg: TestMsg):
    print(f"Received on s1: {msg=}")
    await to_predictions_1(msg)


@app.consumes(topic="preprocessed_signals", brokers=kafka_brokers_2)
async def on_preprocessed_signals_2(msg: TestMsg):
    print(f"Received on s2: {msg=}")
    await to_predictions_2(msg)
    
@app.produces(topic="predictions")
async def to_predictions_1(msg: TestMsg) -> TestMsg:
    return msg
    
@app.produces(topic="predictions", brokers=kafka_brokers_2)
async def to_predictions_2(msg: TestMsg) -> TestMsg:
    return msg

In this example, the application has two consuming endpoints, both of which consume events from the `preprocessed_signals` topic. `on_preprocessed_signals_1` consumes from the cluster defined by `kafka_brokers_1`, and `on_preprocessed_signals_2` consumes from the cluster defined by `kafka_brokers_2`.
When producing, `to_predictions_1` writes to the `predictions` topic on the `kafka_brokers_1` cluster, and `to_predictions_2` writes to the `predictions` topic on the `kafka_brokers_2` cluster.


#### How it works

The `kafka_brokers_1` configuration represents the primary cluster, while `kafka_brokers_2` serves as an alternative cluster specified in the decorator.

Using the FastKafka class, the app object is initialized with the primary broker configuration (`kafka_brokers_1`). By default, the `@app.consumes` decorator without the brokers argument consumes messages from the `preprocessed_signals` topic on `kafka_brokers_1`.

To consume messages from a different cluster, the `@app.consumes` decorator includes the `brokers` argument. This allows explicit specification of the broker cluster in the `on_preprocessed_signals_2` function, enabling consumption from the same topic but using the `kafka_brokers_2` configuration.

The `brokers` argument can also be used in the `@app.produces` decorator to define multiple broker clusters for message production.

All broker configurations should define the same required settings as the primary cluster so that the application behaves consistently.
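One reading of the consistency requirement above is that every secondary configuration should offer the same environment names (for example `development` and `production`) and the same keys per environment as the primary one. The sketch below checks exactly that and nothing more. `check_broker_configs` is a hypothetical helper, not part of FastKafka, and the host names are placeholders.

```python
from typing import Any, Dict

BrokerConfig = Dict[str, Dict[str, Any]]


def check_broker_configs(primary: BrokerConfig, *others: BrokerConfig) -> None:
    """Raise ValueError if a secondary config lacks something the primary has.

    Hypothetical helper (not a FastKafka API): it only compares environment
    names and the keys set for each environment, not the values.
    """
    for index, other in enumerate(others, start=2):
        missing_envs = set(primary) - set(other)
        if missing_envs:
            raise ValueError(
                f"config #{index} lacks environments: {sorted(missing_envs)}"
            )
        for env, settings in primary.items():
            missing_keys = set(settings) - set(other[env])
            if missing_keys:
                raise ValueError(
                    f"config #{index}, environment {env!r} "
                    f"lacks settings: {sorted(missing_keys)}"
                )


# Placeholder host names, chosen only for this illustration.
kafka_brokers_1 = dict(
    development=dict(url="dev.server_1", port=9092),
    production=dict(url="prod.server_1", port=9092),
)
kafka_brokers_2 = dict(
    development=dict(url="dev.server_2", port=9092),
    production=dict(url="prod.server_2", port=9092),
)

# Returns None when the secondary config covers everything in the primary one.
check_broker_configs(kafka_brokers_1, kafka_brokers_2)
```

Running a check like this at import time would surface a missing environment before the app tries to connect, rather than at startup with a specific `--kafka-broker` value.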

## Testing the application

To test our FastKafka 'mirroring' application, we can use our testing framework. Let's take a look at how it's done:

In [None]:
from fastkafka.testing import Tester

async with Tester(app) as tester:
    # Send TestMsg to topic/broker pair on_preprocessed_signals_1 is consuming from
    await tester.mirrors[app.on_preprocessed_signals_1](TestMsg(msg="signal_s1"))
    # Assert on_preprocessed_signals_1 consumed sent message
    await app.awaited_mocks.on_preprocessed_signals_1.assert_called_with(
        TestMsg(msg="signal_s1"), timeout=5
    )
    # Assert app has produced a prediction
    await tester.mirrors[app.to_predictions_1].assert_called_with(
        TestMsg(msg="signal_s1"), timeout=5
    )

    # Send TestMsg to topic/broker pair on_preprocessed_signals_2 is consuming from
    await tester.mirrors[app.on_preprocessed_signals_2](TestMsg(msg="signal_s2"))
    # Assert on_preprocessed_signals_2 consumed sent message
    await app.awaited_mocks.on_preprocessed_signals_2.assert_called_with(
        TestMsg(msg="signal_s2"), timeout=5
    )
    # Assert app has produced a prediction
    await tester.mirrors[app.to_predictions_2].assert_called_with(
        TestMsg(msg="signal_s2"), timeout=5
    )

23-05-30 10:33:08.720 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker._start() called
23-05-30 10:33:08.720 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker._patch_consumers_and_producers(): Patching consumers and producers!
23-05-30 10:33:08.721 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker starting
23-05-30 10:33:08.721 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'dev.server_1:9092'}'
23-05-30 10:33:08.722 [INFO] fastkafka._testing.in_memory_broker: AIOKafkaProducer patched start() called()
23-05-30 10:33:08.722 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'dev.server_2:9092'}'
23-05-30 10:33:08.723 [INFO] fastkafka._testing.in_memory_broker: AIOKafkaProducer patched start() called()
23-05-30 10:33:08.741 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_server

Indexing the `tester.mirrors` dictionary with one of the app's functions selects the topic/broker combination that function uses, which matters when working with multiple Kafka clusters.
Test messages are sent to the topic/broker that the given consuming function reads from, and produced messages are checked on the topic/broker that the given producing function writes to.
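The test cell above relies on the notebook's support for top-level `await`. In a plain Python script, a top-level `async with` is a syntax error, so the same block would have to live inside a coroutine that is driven by `asyncio.run` (or by an async test runner). The sketch below shows only that wrapping pattern. `fake_tester` is a hypothetical stand-in for `Tester(app)` so that the snippet runs without Kafka or FastKafka installed.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def fake_tester(app_name: str):
    """Hypothetical stand-in for Tester(app); yields a plain dict."""
    state = {"app": app_name, "running": True}
    try:
        yield state
    finally:
        # Runs when the `async with` block exits, like a tester shutting down.
        state["running"] = False


async def run_test() -> dict:
    # The body of the notebook test cell would move into a coroutine like this.
    async with fake_tester("demo") as tester:
        assert tester["running"]
        return tester


result = asyncio.run(run_test())
```

With the real `Tester`, the sends and assertions from the cell above would go inside the `async with` block in place of the single `assert`.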

## Running the application

You can run your application with the `fastkafka run` CLI command, the same way you would run a single-cluster app.

In [None]:
# | echo: false

script_file = "multi_cluster_example.py"
filename = script_file.split(".py")[0]
cmd = f"fastkafka run --num-workers=1 --kafka-broker=development {filename}:app"
md(
    f"Now we can run the app. Copy the code above in {script_file}, adjust your server configurations, and run it by running\n```shell\n{cmd}\n```"
)

Now we can run the app. Copy the code above in multi_cluster_example.py, adjust your server configurations, and run it by running
```shell
fastkafka run --num-workers=1 --kafka-broker=development multi_cluster_example:app
```
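The `--kafka-broker=development` option names an entry in the broker dictionaries. Judging by the logs in this guide (for example `'bootstrap_servers': 'dev.server_1:9092'`), the entry with that name is looked up in each configuration and turned into a `url:port` string. The sketch below models that lookup under this assumption. `bootstrap_servers_for` is a hypothetical helper, not a FastKafka function, and the host names are placeholders.

```python
from typing import Any, Dict


def bootstrap_servers_for(brokers: Dict[str, Dict[str, Any]], env: str) -> str:
    """Return 'url:port' for one named environment (hypothetical helper)."""
    settings = brokers[env]  # raises KeyError if the environment is not defined
    return f"{settings['url']}:{settings['port']}"


# Placeholder host names, chosen only for this illustration.
kafka_brokers_1 = dict(
    development=dict(url="dev.server_1", port=9092),
    production=dict(url="prod.server_1", port=9092),
)
kafka_brokers_2 = dict(
    development=dict(url="dev.server_2", port=9092),
    production=dict(url="prod.server_2", port=9092),
)

# The same environment name is resolved against every configuration.
selected = [
    bootstrap_servers_for(brokers, "development")
    for brokers in (kafka_brokers_1, kafka_brokers_2)
]
print(selected)
```

This is also why a name passed on the command line needs to exist in every broker configuration the app uses.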

In [None]:
# | hide

multi_cluster_example = """
from pydantic import BaseModel, Field

from fastkafka import FastKafka

class TestMsg(BaseModel):
    msg: str = Field(...)

kafka_brokers_1 = dict(
    development=dict(url="<url_of_your_kafka_bootstrap_server_1>", port=<port_of_your_kafka_bootstrap_server_1>),
    production=dict(url="prod.server_1", port=9092),
)
kafka_brokers_2 = dict(
    development=dict(url="<url_of_your_kafka_bootstrap_server_2>", port=<port_of_your_kafka_bootstrap_server_2>),
    production=dict(url="prod.server_2", port=9092),
)

app = FastKafka(kafka_brokers=kafka_brokers_1)


@app.consumes(topic="preprocessed_signals")
async def on_preprocessed_signals_1(msg: TestMsg):
    print(f"Received on s1: {msg=}")


@app.consumes(topic="preprocessed_signals", brokers=kafka_brokers_2)
async def on_preprocessed_signals_2(msg: TestMsg):
    print(f"Received on s2: {msg=}")
"""

In [None]:
# | hide

with ApacheKafkaBroker(
    topics=["preprocessed_signals"], apply_nest_asyncio=True
) as bootstrap_server_1, ApacheKafkaBroker(
    topics=["preprocessed_signals"], apply_nest_asyncio=True
) as bootstrap_server_2:
    server_url_1 = bootstrap_server_1.split(":")[0]
    server_port_1 = bootstrap_server_1.split(":")[1]
    server_url_2 = bootstrap_server_2.split(":")[0]
    server_port_2 = bootstrap_server_2.split(":")[1]
    exit_code, output = await run_script_and_cancel(
        script=multi_cluster_example.replace(
            "<url_of_your_kafka_bootstrap_server_1>", server_url_1
        )
        .replace("<port_of_your_kafka_bootstrap_server_1>", server_port_1)
        .replace("<url_of_your_kafka_bootstrap_server_2>", server_url_2)
        .replace("<port_of_your_kafka_bootstrap_server_2>", server_port_2),
        script_file=script_file,
        cmd=cmd,
        cancel_after=5,
    )

    expected_returncode = [0, 1]
    assert exit_code in expected_returncode, output.decode("UTF-8")

23-05-30 10:33:14.199 [INFO] fastkafka._testing.apache_kafka_broker: ApacheKafkaBroker.start(): entering...
23-05-30 10:33:14.202 [INFO] fastkafka._components.test_dependencies: Java is already installed.
23-05-30 10:33:14.202 [INFO] fastkafka._components.test_dependencies: But not exported to PATH, exporting...
23-05-30 10:33:14.203 [INFO] fastkafka._components.test_dependencies: Kafka is installed.
23-05-30 10:33:14.203 [INFO] fastkafka._components.test_dependencies: But not exported to PATH, exporting...
23-05-30 10:33:14.204 [INFO] fastkafka._testing.apache_kafka_broker: Starting zookeeper...
23-05-30 10:33:15.045 [INFO] fastkafka._testing.apache_kafka_broker: Starting kafka...
23-05-30 10:33:17.449 [INFO] fastkafka._testing.apache_kafka_broker: Local Kafka broker up and running on 127.0.0.1:9092
23-05-30 10:33:19.347 [INFO] fastkafka._testing.apache_kafka_broker: <class 'fastkafka.testing.ApacheKafkaBroker'>.start(): returning 127.0.0.1:9092
23-05-30 10:33:19.348 [INFO] fastkafka.

In your app logs, you should see your app starting up and your two consumer functions connecting to different Kafka clusters.

In [None]:
# | echo: false

print(output.decode("UTF-8"))

[90735]: 23-05-30 10:33:29.699 [INFO] fastkafka._components.aiokafka_consumer_loop: aiokafka_consumer_loop() starting...
[90735]: 23-05-30 10:33:29.700 [INFO] fastkafka._components.aiokafka_consumer_loop: aiokafka_consumer_loop(): Consumer created using the following parameters: {'auto_offset_reset': 'earliest', 'max_poll_records': 100, 'bootstrap_servers': '127.0.0.1:9092'}
[90735]: 23-05-30 10:33:29.700 [INFO] fastkafka._components.aiokafka_consumer_loop: aiokafka_consumer_loop() starting...
[90735]: 23-05-30 10:33:29.700 [INFO] fastkafka._components.aiokafka_consumer_loop: aiokafka_consumer_loop(): Consumer created using the following parameters: {'auto_offset_reset': 'earliest', 'max_poll_records': 100, 'bootstrap_servers': '127.0.0.1:57647'}
[90735]: 23-05-30 10:33:29.714 [INFO] fastkafka._components.aiokafka_consumer_loop: aiokafka_consumer_loop(): Consumer started.
[90735]: 23-05-30 10:33:29.714 [INFO] aiokafka.consumer.subscription_state: Updating subscribed topics to: frozense

## Application documentation

Documentation generation for multi-cluster apps is not implemented yet, but it is under development and you can expect it soon!

## Examples on how to use multiple broker configurations

### Example #1

In this section, we'll explore how you can effectively forward topics between different Kafka clusters, enabling seamless data synchronization for your applications.

Imagine having two Kafka clusters, namely `kafka_brokers_1` and `kafka_brokers_2`, each hosting its own set of topics and messages. If you want to forward a specific topic (in this case, `preprocessed_signals`) from `kafka_brokers_1` to `kafka_brokers_2`, FastKafka provides an elegant solution.

Let's examine the code snippet that configures our application for topic forwarding:

In [None]:
from pydantic import BaseModel, Field

from fastkafka import FastKafka

class TestMsg(BaseModel):
    msg: str = Field(...)

kafka_brokers_1 = dict(localhost=dict(url="server_1", port=9092))
kafka_brokers_2 = dict(localhost=dict(url="server_2", port=9092))

app = FastKafka(kafka_brokers=kafka_brokers_1)


@app.consumes(topic="preprocessed_signals")
async def on_preprocessed_signals_original(msg: TestMsg):
    await to_preprocessed_signals_forward(msg)


@app.produces(topic="preprocessed_signals", brokers=kafka_brokers_2)
async def to_preprocessed_signals_forward(data: TestMsg) -> TestMsg:
    return data

Here's how it works: our FastKafka application consumes messages from `kafka_brokers_1` and handles them in the `on_preprocessed_signals_original` function. To forward these messages to `kafka_brokers_2`, we define the `to_preprocessed_signals_forward` function as a producer, which writes each message to the `preprocessed_signals` topic on the `kafka_brokers_2` cluster.
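To see the data flow without any Kafka setup, the forwarding pattern can be modelled with two in-memory queues. This is only a toy sketch: each `asyncio.Queue` stands in for one topic on one cluster, and `forward` plays the combined role of the consuming and producing functions. None of the names below come from FastKafka.

```python
import asyncio


async def forward(source: asyncio.Queue, target: asyncio.Queue, count: int) -> None:
    """Move `count` messages from one queue to the other, unchanged."""
    for _ in range(count):
        msg = await source.get()  # plays the role of the consuming function
        await target.put(msg)     # plays the role of the producing function


async def main() -> list:
    # One queue per (cluster, topic) pair; both names are illustrative only.
    cluster_1_topic: asyncio.Queue = asyncio.Queue()
    cluster_2_topic: asyncio.Queue = asyncio.Queue()

    for text in ("signal_a", "signal_b"):
        await cluster_1_topic.put({"msg": text})

    await forward(cluster_1_topic, cluster_2_topic, count=2)
    # Drain the target queue to inspect what was forwarded, in order.
    return [cluster_2_topic.get_nowait() for _ in range(2)]


forwarded = asyncio.run(main())
```

The messages arrive on the second queue in the order they were put on the first, which is the behaviour the forwarding app is meant to reproduce between clusters.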

#### Testing

To test our FastKafka forwarding application, we can use our testing framework. Let's take a look at the testing code snippet:

In [None]:
from fastkafka.testing import Tester

async with Tester(app) as tester:
    await tester.mirrors[app.on_preprocessed_signals_original](TestMsg(msg="signal"))
    await tester.mirrors[app.to_preprocessed_signals_forward].assert_called(timeout=5)

23-05-30 10:33:40.969 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker._start() called
23-05-30 10:33:40.970 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker._patch_consumers_and_producers(): Patching consumers and producers!
23-05-30 10:33:40.971 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker starting
23-05-30 10:33:40.972 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'server_2:9092'}'
23-05-30 10:33:40.972 [INFO] fastkafka._testing.in_memory_broker: AIOKafkaProducer patched start() called()
23-05-30 10:33:40.982 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'server_1:9092'}'
23-05-30 10:33:40.982 [INFO] fastkafka._testing.in_memory_broker: AIOKafkaProducer patched start() called()
23-05-30 10:33:40.983 [INFO] fastkafka._components.aiokafka_consumer_loop: aiokafka_consumer_loop() starting...
23-05-30 10:33:40.984 [INFO]

With the help of the **Tester** object, we can simulate and verify the behavior of our FastKafka application. Here's how it works:

1. We create an instance of the **Tester** by passing in our *app* object, which represents our FastKafka application.

2. Using the **tester.mirrors** dictionary, we can send a message to a specific Kafka broker and topic combination. In this case, we use `tester.mirrors[app.on_preprocessed_signals_original]` to send a TestMsg message with the content "signal" to the appropriate Kafka broker and topic.

3. After sending the message, we can perform assertions on the mirrored function using `tester.mirrors[app.to_preprocessed_signals_forward].assert_called(timeout=5)`. This assertion ensures that the mirrored function has been called within a specified timeout period (in this case, 5 seconds).

### Example #2

In this section, we'll explore how you can effortlessly consume data from multiple sources, process it, and aggregate the results into a single topic on a specific cluster.

Imagine you have two Kafka clusters: **kafka_brokers_1** and **kafka_brokers_2**, each hosting its own set of topics and messages. Now, what if you want to consume data from both clusters, perform some processing, and produce the results to a single topic on **kafka_brokers_1**? FastKafka has got you covered!

Let's take a look at the code snippet that configures our application for aggregating multiple clusters:

In [None]:
from pydantic import BaseModel, Field

from fastkafka import FastKafka

class TestMsg(BaseModel):
    msg: str = Field(...)

kafka_brokers_1 = dict(localhost=dict(url="server_1", port=9092))
kafka_brokers_2 = dict(localhost=dict(url="server_2", port=9092))

app = FastKafka(kafka_brokers=kafka_brokers_1)


@app.consumes(topic="preprocessed_signals")
async def on_preprocessed_signals_1(msg: TestMsg):
    print(f"Default: {msg=}")
    await to_predictions(msg)


@app.consumes(topic="preprocessed_signals", brokers=kafka_brokers_2)
async def on_preprocessed_signals_2(msg: TestMsg):
    print(f"Specified: {msg=}")
    await to_predictions(msg)


@app.produces(topic="predictions")
async def to_predictions(prediction: TestMsg) -> TestMsg:
    print(f"Sending prediction: {prediction}")
    return prediction

Here's the idea: our FastKafka application consumes messages from the "preprocessed_signals" topic on the **kafka_brokers_1** cluster, as well as from the same topic on the **kafka_brokers_2** cluster. Two consuming functions, `on_preprocessed_signals_1` and `on_preprocessed_signals_2`, handle the messages from their respective clusters. These functions perform any required processing, which in this case is just calling the `to_predictions` function.

The `to_predictions` function acts as a producer, sending the processed results to the "predictions" topic on the **kafka_brokers_1** cluster. By doing so, we aggregate the data from multiple sources into a single topic on a specific cluster.

This approach enables you to consume data from multiple Kafka clusters, process it, and produce the aggregated results to a designated topic. Whether you're generating predictions, performing aggregations, or any other form of data processing, FastKafka empowers you to harness the full potential of multiple clusters.

#### Testing

Let's take a look at the testing code snippet:

In [None]:
from fastkafka.testing import Tester

async with Tester(app) as tester:
    await tester.mirrors[app.on_preprocessed_signals_1](TestMsg(msg="signal"))
    await tester.mirrors[app.on_preprocessed_signals_2](TestMsg(msg="signal"))
    await tester.on_predictions.assert_called(timeout=5)

23-05-30 10:33:50.827 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker._start() called
23-05-30 10:33:50.827 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker._patch_consumers_and_producers(): Patching consumers and producers!
23-05-30 10:33:50.828 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker starting
23-05-30 10:33:50.829 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'server_1:9092'}'
23-05-30 10:33:50.829 [INFO] fastkafka._testing.in_memory_broker: AIOKafkaProducer patched start() called()
23-05-30 10:33:50.875 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'server_1:9092'}'
23-05-30 10:33:50.875 [INFO] fastkafka._testing.in_memory_broker: AIOKafkaProducer patched start() called()
23-05-30 10:33:50.876 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'ser

Here's how the code above works:

1. Within an `async with` block, create an instance of the Tester by passing in your app object, representing your FastKafka application.

2. Using the `tester.mirrors` dictionary, you can send messages to specific Kafka broker and topic combinations. In this case, we use `tester.mirrors[app.on_preprocessed_signals_1]` and `tester.mirrors[app.on_preprocessed_signals_2]` to send `TestMsg` messages with the content "signal" to the corresponding Kafka broker and topic combinations.

3. After sending the messages, you can assert on the tester's `on_predictions` mirror, which consumes from the "predictions" topic, using `tester.on_predictions.assert_called(timeout=5)`. This assertion checks that `on_predictions` has been called within the specified timeout period (in this case, 5 seconds).

### Example #3

In some scenarios, you may need to produce messages to multiple Kafka clusters simultaneously. FastKafka simplifies this process by allowing you to configure your application to produce messages to multiple clusters effortlessly. Let's explore how you can achieve this:

Consider the following code snippet that demonstrates producing messages to multiple clusters:

In [None]:
from pydantic import BaseModel, Field

from fastkafka import FastKafka

class TestMsg(BaseModel):
    msg: str = Field(...)

kafka_brokers_1 = dict(localhost=dict(url="server_1", port=9092))
kafka_brokers_2 = dict(localhost=dict(url="server_2", port=9092))

app = FastKafka(kafka_brokers=kafka_brokers_1)


@app.consumes(topic="preprocessed_signals")
async def on_preprocessed_signals(msg: TestMsg):
    print(f"{msg=}")
    await to_predictions_1(TestMsg(msg="prediction"))
    await to_predictions_2(TestMsg(msg="prediction"))


@app.produces(topic="predictions")
async def to_predictions_1(prediction: TestMsg) -> TestMsg:
    print(f"Sending prediction to s1: {prediction}")
    return prediction


@app.produces(topic="predictions", brokers=kafka_brokers_2)
async def to_predictions_2(prediction: TestMsg) -> TestMsg:
    print(f"Sending prediction to s2: {prediction}")
    return prediction

Here's what you need to know about producing to multiple clusters:

1. We define two Kafka broker configurations: **kafka_brokers_1** and **kafka_brokers_2**, representing different clusters with their respective connection details.

2. We create an instance of the FastKafka application, specifying **kafka_brokers_1** as the primary cluster.

3. The `on_preprocessed_signals` function serves as a consumer, handling incoming messages from the "preprocessed_signals" topic. Within this function, we invoke two producer functions: `to_predictions_1` and `to_predictions_2`.

4. The `to_predictions_1` function sends predictions to the "predictions" topic on the **kafka_brokers_1** cluster.

5. Additionally, the `to_predictions_2` function sends the same predictions to the "predictions" topic on the **kafka_brokers_2** cluster. This way, the same data is produced to both clusters.

By utilizing this approach, you can seamlessly produce messages to multiple Kafka clusters, enabling you to distribute data across different environments or leverage the strengths of various clusters.

Feel free to customize the producer functions as per your requirements, performing any necessary data transformations or enrichment before sending the predictions.

With FastKafka, producing to multiple clusters becomes a breeze, empowering you to harness the capabilities of multiple environments effortlessly.
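The consumer in the example awaits its two producers one after the other. If the sends should overlap, one option is to start both with `asyncio.gather`. The sketch below shows the pattern with plain coroutines. `to_cluster_1` and `to_cluster_2` are hypothetical stand-ins for the decorated producer functions, and whether overlapping the calls pays off with real producers is an assumption that has not been tested here.

```python
import asyncio
from typing import List

sent: List[str] = []


async def to_cluster_1(msg: str) -> None:
    await asyncio.sleep(0.01)  # stands in for network latency
    sent.append(f"cluster_1:{msg}")


async def to_cluster_2(msg: str) -> None:
    await asyncio.sleep(0.01)  # stands in for network latency
    sent.append(f"cluster_2:{msg}")


async def on_signal(msg: str) -> None:
    # Both sends are in flight at the same time; gather waits for both to finish.
    await asyncio.gather(to_cluster_1(msg), to_cluster_2(msg))


asyncio.run(on_signal("prediction"))
```

Awaiting the producers sequentially, as the example does, is simpler and keeps a fixed send order. The `gather` variant trades that ordering for overlap.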

#### Testing

Let's take a look at the testing code snippet:

In [None]:
from fastkafka.testing import Tester

async with Tester(app) as tester:
    await tester.to_preprocessed_signals(TestMsg(msg="signal"))
    await tester.mirrors[to_predictions_1].assert_called(timeout=5)
    await tester.mirrors[to_predictions_2].assert_called(timeout=5)

23-05-30 10:34:00.033 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker._start() called
23-05-30 10:34:00.034 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker._patch_consumers_and_producers(): Patching consumers and producers!
23-05-30 10:34:00.035 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker starting
23-05-30 10:34:00.036 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'server_1:9092'}'
23-05-30 10:34:00.037 [INFO] fastkafka._testing.in_memory_broker: AIOKafkaProducer patched start() called()
23-05-30 10:34:00.038 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'server_2:9092'}'
23-05-30 10:34:00.038 [INFO] fastkafka._testing.in_memory_broker: AIOKafkaProducer patched start() called()
23-05-30 10:34:00.052 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'ser

Here's how you can perform the necessary tests:

1. Within an `async with` block, create an instance of the **Tester** by passing in your app object, representing your FastKafka application.

2. Using the `tester.to_preprocessed_signals` method, you can send a `TestMsg` message with the content "signal".

3. After sending the message, you can perform assertions on the `to_predictions_1` and `to_predictions_2` functions using `tester.mirrors[to_predictions_1].assert_called(timeout=5)` and `tester.mirrors[to_predictions_2].assert_called(timeout=5)`. These assertions ensure that the respective producer functions have produced data to their respective topic/broker combinations.

By employing this testing approach, you can verify that the producing functions correctly send messages to their respective clusters. The testing framework provided by FastKafka enables you to ensure the accuracy and reliability of your application's producing logic.