Ibis Flink backend example

This repository contains a simple, self-contained example using the Apache Flink backend for Ibis.

Installation prerequisites

Docker Compose: This tutorial uses Docker Compose to manage an Apache Kafka environment (including sample data generation) and a Flink cluster (for remote execution). You can download and install Docker Compose from the official website.
JDK 11: Flink requires Java 11. If you don't already have JDK 11 installed, you can get the appropriate Eclipse Temurin release.
Python: To follow along, you need Python 3.9 or 3.10.
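As a quick sanity check before installing anything, the prerequisites above can be probed from Python. This is only a sketch: the helper names are made up for illustration, and it confirms that docker and java are on your PATH without verifying that the Java version is actually 11.

```python
import shutil
import sys

# Python versions named in the prerequisites above.
SUPPORTED_PYTHONS = {(3, 9), (3, 10)}


def python_is_supported(version_info=sys.version_info):
    """Return True if the major.minor version is one the tutorial lists."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHONS


def missing_tools(tools=("docker", "java")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("Missing tools:", missing_tools() or "none")
```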

Installing the Flink backend for Ibis

We strongly recommend creating a virtual environment for your project. In your virtual environment, install the dependencies from the requirements file:

python -m pip install -r requirements.txt

Caution

The Flink backend for Ibis is unreleased. Please check back in early 2024 if you feel more comfortable using a released version.

Spinning up the services using Docker Compose

From your project directory, run docker compose up to create Kafka topics, generate sample data, and launch a Flink cluster.

Tip

If you don't intend to try remote execution, you can start only the Kafka-related services with docker compose up kafka init-kafka data-generator.

After a few seconds, you should see messages indicating your Kafka environment is ready:

ibis-flink-example-init-kafka-1      | Successfully created the following topics:
ibis-flink-example-init-kafka-1      | payment_msg
ibis-flink-example-init-kafka-1      | sink
ibis-flink-example-init-kafka-1 exited with code 0
ibis-flink-example-data-generator-1  | Connected to Kafka
ibis-flink-example-data-generator-1  | Producing 20000 records to Kafka topic payment_msg

The payment_msg Kafka topic contains messages in the following format:

{
    "createTime": "2023-09-20 22:19:02.224",
    "orderId": 1695248388,
    "payAmount": 88694.71922270155,
    "payPlatform": 0,
    "provinceId": 6
}
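To work with these messages in Python, it can help to decode them into a typed record. The sketch below is not part of the example repository; the Payment class and parse_payment function are hypothetical names, and the timestamp format string is inferred from the sample message above.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Payment:
    create_time: datetime
    order_id: int
    pay_amount: float
    pay_platform: int
    province_id: int


def parse_payment(raw):
    """Decode one payment_msg value (bytes or str) into a Payment."""
    doc = json.loads(raw)
    return Payment(
        # Format inferred from the sample, e.g. "2023-09-20 22:19:02.224".
        create_time=datetime.strptime(doc["createTime"], "%Y-%m-%d %H:%M:%S.%f"),
        order_id=doc["orderId"],
        pay_amount=doc["payAmount"],
        pay_platform=doc["payPlatform"],
        province_id=doc["provinceId"],
    )
```

With a kafka-python consumer, the raw bytes are available as msg.value, so parse_payment(msg.value) should yield one Payment per record.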

In a separate terminal, we can explore what these messages look like:

>>> from kafka import KafkaConsumer
>>>
>>> consumer = KafkaConsumer("payment_msg")
>>> for _, msg in zip(range(3), consumer):
...     print(msg)
... 
ConsumerRecord(topic='payment_msg', partition=0, offset=628, timestamp=1702073942808, timestamp_type=0, key=None, value=b'{"createTime": "2023-12-08 22:19:02.808", "orderId": 1702074256, "payAmount": 79901.88673289565, "payPlatform": 1, "provinceId": 1}', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=131, serialized_header_size=-1)
ConsumerRecord(topic='payment_msg', partition=0, offset=629, timestamp=1702073943310, timestamp_type=0, key=None, value=b'{"createTime": "2023-12-08 22:19:03.309", "orderId": 1702074257, "payAmount": 34777.62234573957, "payPlatform": 0, "provinceId": 3}', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=131, serialized_header_size=-1)
ConsumerRecord(topic='payment_msg', partition=0, offset=630, timestamp=1702073943811, timestamp_type=0, key=None, value=b'{"createTime": "2023-12-08 22:19:03.810", "orderId": 1702074258, "payAmount": 17101.347666982423, "payPlatform": 0, "provinceId": 2}', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=132, serialized_header_size=-1)

Running the example

The included window aggregation example uses the Flink backend for Ibis to process the payment messages described above. For each incoming message, it computes the total pay amount for that message's province over the preceding 10 seconds.
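To make the semantics of this computation concrete, here is a plain-Python sketch of a per-province sum over the trailing 10 seconds. It is an illustration only, not how window_aggregation.py or Flink implements it. It assumes messages arrive ordered by create time and that a payment exactly 10 seconds old still counts; the function name and the tuple layout are made up for this sketch.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=10)


def rolling_totals(payments):
    """Yield (province_id, total) for each payment, in arrival order.

    `payments` is an iterable of (create_time, province_id, pay_amount)
    tuples, assumed to be sorted by create_time.
    """
    recent = defaultdict(deque)  # province_id -> deque of (time, amount)
    for create_time, province_id, pay_amount in payments:
        window = recent[province_id]
        window.append((create_time, pay_amount))
        # Drop payments made more than 10 seconds before this message.
        # The deque is never empty here: the current payment always stays.
        while window[0][0] < create_time - WINDOW:
            window.popleft()
        yield province_id, sum(amount for _, amount in window)
```

Each output pair corresponds to one input message, which mirrors the one-result-per-message shape of the records in the sink topic.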

Local execution

You can run the example using the Flink mini cluster from your project directory:

python window_aggregation.py local

Within a few seconds, you should see messages from the sink topic (containing the results of your computation) flooding the screen.

Note

The mini cluster shuts down as soon as the Python session ends, so we print the Kafka messages until the process is cancelled (e.g. with Ctrl+C).

Remote execution

You can also submit the example to the remote cluster started using Docker Compose. We will use the method described in the official Flink documentation.

Tip

You can find the ./bin/flink executable with the following command:

python -c 'from pathlib import Path; import pyflink; print(Path(pyflink.__spec__.origin).parent / "bin" / "flink")'

My full command looks like this (the path to flink will differ on your machine):

/opt/miniconda3/envs/ibis-dev/lib/python3.10/site-packages/pyflink/bin/flink run --jobmanager localhost:8081 --python window_aggregation.py
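If you would rather assemble this command from Python than copy the path by hand, something like the following sketch may work. The helper names are hypothetical; the path logic mirrors the one-liner in the tip above, and the flags are the ones shown in the command above. The submit function needs pyflink installed and the remote cluster running, so it is defined but not called here.

```python
import subprocess
from pathlib import Path


def flink_executable(pyflink_origin):
    """Return the bin/flink path that sits next to pyflink's __init__.py."""
    return Path(pyflink_origin).parent / "bin" / "flink"


def flink_run_command(flink_bin, script="window_aggregation.py",
                      jobmanager="localhost:8081"):
    """Build the argument list for submitting `script` to the cluster."""
    return [str(flink_bin), "run", "--jobmanager", jobmanager, "--python", script]


def submit():
    """Locate flink via the installed pyflink package and submit the job."""
    import pyflink  # imported lazily; requires the tutorial's requirements

    flink_bin = flink_executable(pyflink.__spec__.origin)
    subprocess.run(flink_run_command(flink_bin), check=True)
```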

The command will exit after displaying a submission message:

Job has been submitted with JobID b816faaf5ef9126ea5b9b6a37012cf56

Similar to how we viewed messages in the payment_msg topic, we can print results from the sink topic:

>>> from kafka import KafkaConsumer
>>> 
>>> consumer = KafkaConsumer("sink")
>>> for _, msg in zip(range(10), consumer):
...     print(msg)
... 
ConsumerRecord(topic='sink', partition=0, offset=8264, timestamp=1702076548075, timestamp_type=0, key=None, value=b'{"province_id":1,"pay_amount":102381.88254099473}', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=49, serialized_header_size=-1)
ConsumerRecord(topic='sink', partition=0, offset=8265, timestamp=1702076548480, timestamp_type=0, key=None, value=b'{"province_id":1,"pay_amount":114103.59313794877}', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=49, serialized_header_size=-1)
ConsumerRecord(topic='sink', partition=0, offset=8266, timestamp=1702076549085, timestamp_type=0, key=None, value=b'{"province_id":5,"pay_amount":65711.48588438489}', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=48, serialized_header_size=-1)
ConsumerRecord(topic='sink', partition=0, offset=8267, timestamp=1702076549488, timestamp_type=0, key=None, value=b'{"province_id":3,"pay_amount":388965.01567530684}', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=49, serialized_header_size=-1)
ConsumerRecord(topic='sink', partition=0, offset=8268, timestamp=1702076550098, timestamp_type=0, key=None, value=b'{"province_id":4,"pay_amount":151524.24311058817}', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=49, serialized_header_size=-1)
ConsumerRecord(topic='sink', partition=0, offset=8269, timestamp=1702076550502, timestamp_type=0, key=None, value=b'{"province_id":2,"pay_amount":290018.422116076}', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=47, serialized_header_size=-1)
ConsumerRecord(topic='sink', partition=0, offset=8270, timestamp=1702076550910, timestamp_type=0, key=None, value=b'{"province_id":5,"pay_amount":47098.24626524143}', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=48, serialized_header_size=-1)
ConsumerRecord(topic='sink', partition=0, offset=8271, timestamp=1702076551516, timestamp_type=0, key=None, value=b'{"province_id":4,"pay_amount":155309.68873659955}', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=49, serialized_header_size=-1)
ConsumerRecord(topic='sink', partition=0, offset=8272, timestamp=1702076551926, timestamp_type=0, key=None, value=b'{"province_id":2,"pay_amount":367397.8759861871}', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=48, serialized_header_size=-1)
ConsumerRecord(topic='sink', partition=0, offset=8273, timestamp=1702076552530, timestamp_type=0, key=None, value=b'{"province_id":3,"pay_amount":182191.45302431137}', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=49, serialized_header_size=-1)
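Because the sink topic emits one running total per incoming payment, you may want to keep only the most recent value for each province. A minimal sketch, assuming the sink values are JSON objects with province_id and pay_amount keys as in the output above (latest_totals is a hypothetical helper, not part of the repository):

```python
import json


def latest_totals(values):
    """Map each province_id to the most recent pay_amount seen for it."""
    latest = {}
    for raw in values:
        doc = json.loads(raw)  # accepts the bytes found in msg.value
        latest[doc["province_id"]] = doc["pay_amount"]
    return latest
```

With the consumer from the session above, this could be called as latest_totals(msg.value for _, msg in zip(range(10), consumer)).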

Shutting down the Compose environment

Press Ctrl+C to stop the Docker Compose containers. Once stopped, run docker compose down to remove the services created for this tutorial.