# Using multiple Kafka clusters

In [None]:
# | hide

import pytest
from IPython.display import Markdown as md

from pydantic import BaseModel, Field

from fastkafka import FastKafka
from fastkafka.testing import Tester

Hey there! Are you ready to take your FastKafka application to the next level? In this guide, we'll show you how to connect your application to multiple Kafka clusters like a pro! :rocket:

Imagine this: you have different Kafka clusters running, each with its own set of topics and messages. But what if you want to bring all those topics together into one, or mirror them across clusters? Or perhaps you want to produce messages to multiple clusters simultaneously? That's where this guide comes in handy!

We'll walk you through the process of seamlessly integrating your FastKafka application with multiple Kafka clusters. You'll learn how to aggregate topics from different clusters into a single one, mirror topics across clusters, and even produce messages to multiple clusters all at once!

So buckle up and get ready for an exciting journey into the world of multi-cluster connectivity with FastKafka. Let's dive right in and unlock the full potential of your Kafka-powered application! :muscle:

### Test message

To showcase the functionalities of FastKafka and illustrate the concepts discussed, we can use a simple test message called `TestMsg`. This message serves as a basic representation of the data that can be processed and exchanged within your FastKafka application. Here's the definition of the `TestMsg` class:

In [None]:
class TestMsg(BaseModel):
    msg: str = Field(...)

## Defining multiple broker configurations

When building a FastKafka application, you may need to consume messages from multiple Kafka clusters, each with its own set of broker configurations. FastKafka provides the flexibility to define different broker clusters using the `brokers` argument in the `consumes` decorator. Let's explore an example code snippet:

In [None]:
kafka_brokers_1 = dict(localhost=dict(url="server_1", port=9092))
kafka_brokers_2 = dict(localhost=dict(url="server_2", port=9092))

app = FastKafka(kafka_brokers=kafka_brokers_1)


@app.consumes(topic="preprocessed_signals")
async def on_preprocessed_signals_1(msg: TestMsg):
    print(f"Default: {msg=}")


@app.consumes(topic="preprocessed_signals", brokers=kafka_brokers_2)
async def on_preprocessed_signals_2(msg: TestMsg):
    print(f"Specified: {msg=}")

In this example, we have two broker configurations: **kafka_brokers_1** and **kafka_brokers_2**. The **kafka_brokers_1** configuration represents the default or primary cluster, while **kafka_brokers_2** is an alternative cluster that can be specified in the decorator.

The app object is initialized with the primary broker configuration (kafka_brokers_1) using the FastKafka class.
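
The log output later in this guide shows producers being created with `bootstrap_servers` values such as `server_1:9092`. As a rough sketch of how a broker entry relates to that address, the snippet below joins `url` and `port` by hand. The helper name `bootstrap_address` is hypothetical and not part of FastKafka, and the `url:port` joining is inferred from those logs rather than taken from the library's source.

```python
from typing import Any, Dict

kafka_brokers_1 = dict(localhost=dict(url="server_1", port=9092))
kafka_brokers_2 = dict(localhost=dict(url="server_2", port=9092))


def bootstrap_address(brokers: Dict[str, Dict[str, Any]], name: str) -> str:
    """Hypothetical helper: join a named broker's url and port as 'url:port'."""
    broker = brokers[name]
    return f"{broker['url']}:{broker['port']}"


# The same broker name resolves to a different address in each configuration.
print(bootstrap_address(kafka_brokers_1, "localhost"))
print(bootstrap_address(kafka_brokers_2, "localhost"))
```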

The first `@app.consumes` decorator without the brokers argument represents the default behavior. The `on_preprocessed_signals_1` function consumes messages from the "preprocessed_signals" topic using the primary broker configuration. Any messages received will be processed accordingly.

The second `@app.consumes` decorator includes the `brokers=kafka_brokers_2` argument, allowing you to explicitly specify the broker cluster to consume messages from. The `on_preprocessed_signals_2` function consumes messages from the same "preprocessed_signals" topic but using the **kafka_brokers_2** configuration. This provides the flexibility to consume from different clusters based on your requirements.

Similarly, you can use the `brokers` argument in the `@app.produces` decorator to define multiple broker clusters for producing messages.

It's important to note that when defining multiple broker configurations, all configurations must have the same set of required configurations as the primary cluster. This ensures consistent behavior across clusters.
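
One way to read this requirement is that every alternative configuration should define the same broker names (the top-level keys, such as `localhost`) as the primary one. That reading is an assumption on our part, and the helper `missing_broker_names` below is hypothetical, not a FastKafka function. It is only a sketch of a check you could run before starting the app.

```python
from typing import Any, Dict, Set

BrokerConfig = Dict[str, Dict[str, Any]]

primary: BrokerConfig = dict(
    localhost=dict(url="server_1", port=9092),
    production=dict(url="prod_server_1", port=9092),  # hypothetical extra entry
)
alternative: BrokerConfig = dict(localhost=dict(url="server_2", port=9092))


def missing_broker_names(primary: BrokerConfig, alternative: BrokerConfig) -> Set[str]:
    """Return broker names defined in the primary config but absent from the alternative."""
    return set(primary) - set(alternative)


# A non-empty result means the alternative config does not cover every primary broker name.
print(missing_broker_names(primary, alternative))
```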


Now let's go through some use cases on how to use the `brokers` decorator argument to enable your FastKafka app to work with multiple Kafka broker clusters.

## Mirroring topics

In this section, we'll explore how you can effectively mirror topics between different Kafka clusters, enabling seamless data synchronization for your applications.

Imagine having two Kafka clusters, namely **kafka_brokers_1** and **kafka_brokers_2**, each hosting its own set of topics and messages. Now, if you want to mirror a specific topic (in this case: *preprocessed_signals*) from kafka_brokers_1 to kafka_brokers_2, ensuring data consistency across clusters, FastKafka provides a powerful solution.

Let's examine the code snippet that configures our application for topic mirroring:

In [None]:
kafka_brokers_1 = dict(localhost=dict(url="server_1", port=9092))
kafka_brokers_2 = dict(localhost=dict(url="server_2", port=9092))

app = FastKafka(kafka_brokers=kafka_brokers_1)


@app.consumes(topic="preprocessed_signals")
async def on_preprocessed_signals_original(msg: TestMsg):
    await to_preprocessed_signals_mirror(msg)


@app.produces(topic="preprocessed_signals", brokers=kafka_brokers_2)
async def to_preprocessed_signals_mirror(data: TestMsg) -> TestMsg:
    return data

Here's how it works: our FastKafka application is configured to consume messages from **kafka_brokers_1** and process them in the **on_preprocessed_signals_original** function. We want to mirror these messages to **kafka_brokers_2**. To achieve this, we define the **to_preprocessed_signals_mirror** function as a producer, seamlessly producing the processed messages to the preprocessed_signals topic within the kafka_brokers_2 cluster.
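
If it helps to picture what mirroring does independently of FastKafka, here is a tiny in-memory model. It is purely illustrative: the `topics` dictionary and the `produce` and `mirror` functions are made-up stand-ins, not FastKafka APIs, and no real Kafka cluster is involved.

```python
import asyncio
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Stand-in for real clusters: messages are stored per (cluster, topic) pair.
topics: DefaultDict[Tuple[str, str], List[str]] = defaultdict(list)


async def produce(cluster: str, topic: str, msg: str) -> None:
    """Append a message to the in-memory log for (cluster, topic)."""
    topics[(cluster, topic)].append(msg)


async def mirror(source: str, target: str, topic: str) -> None:
    """Copy every message of `topic` on `source` to the same topic on `target`."""
    for msg in list(topics[(source, topic)]):
        await produce(target, topic, msg)


async def main() -> None:
    await produce("server_1", "preprocessed_signals", "signal")
    await mirror("server_1", "server_2", "preprocessed_signals")


asyncio.run(main())
print(topics[("server_2", "preprocessed_signals")])
```

In a notebook, where an event loop is already running, you would use `await main()` instead of `asyncio.run(main())`.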

### Testing the app

To test our FastKafka 'mirroring' application, we can use our testing framework. Let's take a look at the testing code snippet:

In [None]:
async with Tester(app) as tester:
    await tester.mirrors[app.on_preprocessed_signals_original](TestMsg(msg="signal"))
    await tester.mirrors[app.to_preprocessed_signals_mirror].assert_called(timeout=5)

23-05-29 11:12:20.203 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker._start() called
23-05-29 11:12:20.204 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker._patch_consumers_and_producers(): Patching consumers and producers!
23-05-29 11:12:20.205 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker starting
23-05-29 11:12:20.206 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'server_2:9092'}'
23-05-29 11:12:20.206 [INFO] fastkafka._testing.in_memory_broker: AIOKafkaProducer patched start() called()
23-05-29 11:12:20.217 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'server_1:9092'}'
23-05-29 11:12:20.217 [INFO] fastkafka._testing.in_memory_broker: AIOKafkaProducer patched start() called()
23-05-29 11:12:20.218 [INFO] fastkafka._components.aiokafka_consumer_loop: aiokafka_consumer_loop() starting...
23-05-29 11:12:20.219 [INFO]

With the help of the **Tester** object, we can simulate and verify the behavior of our FastKafka application. Here's how it works:

1. We create an instance of the **Tester** by passing in our *app* object, which represents our FastKafka application.

2. Using the **tester.mirrors** dictionary, we can send a message to a specific Kafka broker and topic combination. In this case, we use `tester.mirrors[app.on_preprocessed_signals_original]` to send a TestMsg message with the content "signal" to the appropriate Kafka broker and topic.

3. After sending the message, we can perform assertions on the mirrored function using `tester.mirrors[app.to_preprocessed_signals_mirror].assert_called(timeout=5)`. This assertion ensures that the mirrored function has been called within a specified timeout period (in this case, 5 seconds).

## Aggregate multiple clusters

In this section, we'll explore how you can effortlessly consume data from multiple sources, process it, and aggregate the results into a single topic on a specific cluster.

Imagine you have two Kafka clusters: **kafka_brokers_1** and **kafka_brokers_2**, each hosting its own set of topics and messages. Now, what if you want to consume data from both clusters, perform some processing, and produce the results to a single topic on **kafka_brokers_1**? FastKafka has got you covered!

Let's take a look at the code snippet that configures our application for aggregating multiple clusters:

In [None]:
kafka_brokers_1 = dict(localhost=dict(url="server_1", port=9092))
kafka_brokers_2 = dict(localhost=dict(url="server_2", port=9092))

app = FastKafka(kafka_brokers=kafka_brokers_1)


@app.consumes(topic="preprocessed_signals")
async def on_preprocessed_signals_1(msg: TestMsg):
    print(f"Default: {msg=}")
    await to_predictions(msg)


@app.consumes(topic="preprocessed_signals", brokers=kafka_brokers_2)
async def on_preprocessed_signals_2(msg: TestMsg):
    print(f"Specified: {msg=}")
    await to_predictions(msg)


@app.produces(topic="predictions")
async def to_predictions(prediction: TestMsg) -> TestMsg:
    print(f"Sending prediction: {prediction}")
    return prediction

Here's the idea: our FastKafka application is set to consume messages from the topic "preprocessed_signals" on **kafka_brokers_1** cluster, as well as from the same topic on **kafka_brokers_2** cluster. We have two consuming functions, `on_preprocessed_signals_1` and `on_preprocessed_signals_2`, that handle the messages from their respective clusters. These functions perform any required processing, in this case, just calling the to_predictions function.

The exciting part is that the `to_predictions` function acts as a producer, sending the processed results to the "predictions" topic on the **kafka_brokers_1** cluster. By doing so, we effectively aggregate the data from multiple sources into a single topic on a specific cluster.

This approach enables you to consume data from multiple Kafka clusters, process it, and produce the aggregated results to a designated topic. Whether you're generating predictions, performing aggregations, or any other form of data processing, FastKafka empowers you to harness the full potential of multiple clusters.

### Testing the app

Let's take a look at the testing code snippet:

In [None]:
async with Tester(app) as tester:
    await tester.mirrors[app.on_preprocessed_signals_1](TestMsg(msg="signal"))
    await tester.mirrors[app.on_preprocessed_signals_2](TestMsg(msg="signal"))
    await tester.on_predictions.assert_called(timeout=5)

23-05-29 11:12:24.257 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker._start() called
23-05-29 11:12:24.257 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker._patch_consumers_and_producers(): Patching consumers and producers!
23-05-29 11:12:24.258 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker starting
23-05-29 11:12:24.259 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'server_1:9092'}'
23-05-29 11:12:24.260 [INFO] fastkafka._testing.in_memory_broker: AIOKafkaProducer patched start() called()
23-05-29 11:12:24.273 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'server_1:9092'}'
23-05-29 11:12:24.274 [INFO] fastkafka._testing.in_memory_broker: AIOKafkaProducer patched start() called()
23-05-29 11:12:24.274 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'ser

Here's how the code above works:

1. Within an `async with` block, create an instance of the Tester by passing in your app object, representing your FastKafka application.

2. Using the tester.mirrors dictionary, you can send messages to specific Kafka broker and topic combinations. In this case, we use `tester.mirrors[app.on_preprocessed_signals_1]` and `tester.mirrors[app.on_preprocessed_signals_2]` to send TestMsg messages with the content "signal" to the corresponding Kafka broker and topic combinations.

3. After sending the messages, you can perform assertions on the **on_predictions** function, the consumer that the Tester creates to mirror the `to_predictions` producer, using `tester.on_predictions.assert_called(timeout=5)`. This assertion ensures that the on_predictions function has been called within a specified timeout period (in this case, 5 seconds).


!!! info "Syntax sugar and topic ambiguity"

    Since the "predictions" topic exists only on `kafka_brokers_1`, there is no ambiguity to resolve, and you can directly use the `tester.on_predictions` syntax.
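
To see why a topic name alone can be ambiguous, consider the following sketch. It lists the functions from the aggregation example together with the cluster and topic each one is bound to, then checks which topics appear on more than one cluster. The `registrations` list and the `is_ambiguous` helper are hypothetical illustrations, not part of the Tester.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

# (function name, bootstrap server, topic) for the aggregation example above.
registrations: List[Tuple[str, str, str]] = [
    ("on_preprocessed_signals_1", "server_1:9092", "preprocessed_signals"),
    ("on_preprocessed_signals_2", "server_2:9092", "preprocessed_signals"),
    ("to_predictions", "server_1:9092", "predictions"),
]

# Group the clusters on which each topic is used.
clusters_by_topic: DefaultDict[str, Set[str]] = defaultdict(set)
for _, server, topic in registrations:
    clusters_by_topic[topic].add(server)


def is_ambiguous(topic: str) -> bool:
    """A topic name is ambiguous when it is used on more than one cluster."""
    return len(clusters_by_topic[topic]) > 1


for topic in ("preprocessed_signals", "predictions"):
    print(topic, "ambiguous" if is_ambiguous(topic) else "unambiguous")
```

Looking a mirror up by function, as `tester.mirrors[...]` does, sidesteps the problem because each function is tied to exactly one cluster and topic.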

## Producing to multiple clusters

In some scenarios, you may need to produce messages to multiple Kafka clusters simultaneously. FastKafka simplifies this process by allowing you to configure your application to produce messages to multiple clusters effortlessly. Let's explore how you can achieve this:

Consider the following code snippet that demonstrates producing messages to multiple clusters:

In [None]:
kafka_brokers_1 = dict(localhost=dict(url="server_1", port=9092))
kafka_brokers_2 = dict(localhost=dict(url="server_2", port=9092))

app = FastKafka(kafka_brokers=kafka_brokers_1)


@app.consumes(topic="preprocessed_signals")
async def on_preprocessed_signals(msg: TestMsg):
    print(f"{msg=}")
    await to_predictions_1(TestMsg(msg="prediction"))
    await to_predictions_2(TestMsg(msg="prediction"))


@app.produces(topic="predictions")
async def to_predictions_1(prediction: TestMsg) -> TestMsg:
    print(f"Sending prediction: {prediction}")
    return prediction


@app.produces(topic="predictions", brokers=kafka_brokers_2)
async def to_predictions_2(prediction: TestMsg) -> TestMsg:
    print(f"Sending prediction: {prediction}")
    return prediction

Here's what you need to know about producing to multiple clusters:

1. We define two Kafka broker configurations: **kafka_brokers_1** and **kafka_brokers_2**, representing different clusters with their respective connection details.

2. We create an instance of the FastKafka application, specifying **kafka_brokers_1** as the primary cluster. Any decorator that does not pass the `brokers` argument, whether consumer or producer, uses this cluster.

3. The `on_preprocessed_signals` function serves as a consumer, handling incoming messages from the "preprocessed_signals" topic. Within this function, we invoke two producer functions: `to_predictions_1` and `to_predictions_2`.

4. The `to_predictions_1` function sends predictions to the "predictions" topic on *kafka_brokers_1* cluster.

5. Additionally, the `to_predictions_2` function sends the same predictions to the "predictions" topic on *kafka_brokers_2* cluster. This allows for producing the same data to multiple clusters simultaneously.

By utilizing this approach, you can seamlessly produce messages to multiple Kafka clusters, enabling you to distribute data across different environments or leverage the strengths of various clusters.

Feel free to customize the producer functions as per your requirements, performing any necessary data transformations or enrichment before sending the predictions.

With FastKafka, producing to multiple clusters becomes a breeze, empowering you to harness the capabilities of multiple environments effortlessly.
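
Note that the consumer above awaits the two producers one after the other. If you would rather issue both calls concurrently, `asyncio.gather` is one option. The sketch below uses plain coroutines as stand-ins for the decorated producers (they only record what they were asked to send), so treat it as an illustration of the pattern rather than a tested FastKafka recipe.

```python
import asyncio
from typing import List

sent: List[str] = []  # records what each stand-in producer was asked to send


async def to_predictions_1(prediction: str) -> str:
    """Stand-in for the producer bound to the primary cluster."""
    await asyncio.sleep(0)  # yield control, as real network I/O would
    sent.append(f"server_1:{prediction}")
    return prediction


async def to_predictions_2(prediction: str) -> str:
    """Stand-in for the producer bound to the second cluster."""
    await asyncio.sleep(0)
    sent.append(f"server_2:{prediction}")
    return prediction


async def on_preprocessed_signals(msg: str) -> None:
    # Run both producer calls concurrently instead of awaiting them in sequence.
    await asyncio.gather(
        to_predictions_1("prediction"),
        to_predictions_2("prediction"),
    )


asyncio.run(on_preprocessed_signals("signal"))
print(sorted(sent))
```

In a notebook, replace the `asyncio.run(...)` call with `await on_preprocessed_signals("signal")`.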

### Testing the app

Let's take a look at the testing code snippet:

In [None]:
async with Tester(app) as tester:
    await tester.to_preprocessed_signals(TestMsg(msg="signal"))
    await tester.mirrors[to_predictions_1].assert_called(timeout=5)
    await tester.mirrors[to_predictions_2].assert_called(timeout=5)

23-05-29 11:12:28.309 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker._start() called
23-05-29 11:12:28.309 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker._patch_consumers_and_producers(): Patching consumers and producers!
23-05-29 11:12:28.310 [INFO] fastkafka._testing.in_memory_broker: InMemoryBroker starting
23-05-29 11:12:28.310 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'server_1:9092'}'
23-05-29 11:12:28.311 [INFO] fastkafka._testing.in_memory_broker: AIOKafkaProducer patched start() called()
23-05-29 11:12:28.311 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'server_2:9092'}'
23-05-29 11:12:28.312 [INFO] fastkafka._testing.in_memory_broker: AIOKafkaProducer patched start() called()
23-05-29 11:12:28.326 [INFO] fastkafka._application.app: _create_producer() : created producer using the config: '{'bootstrap_servers': 'ser

Here's how you can perform the necessary tests:

1. Within an `async with` block, create an instance of the **Tester** by passing in your app object, representing your FastKafka application.

2. Using the `tester.to_preprocessed_signals` method, you can send a TestMsg message with the content "signal".

3. After sending the message, you can perform assertions on the to_predictions_1 and to_predictions_2 functions using `tester.mirrors[to_predictions_1].assert_called(timeout=5)` and `tester.mirrors[to_predictions_2].assert_called(timeout=5)`. These assertions ensure that the respective producer functions have produced data to their respective topic/broker combinations.

By employing this testing approach, you can verify that the producing functions correctly send messages to their respective clusters. The testing framework provided by FastKafka enables you to ensure the accuracy and reliability of your application's producing logic.