# Transform data streams with Apache Flink®

The goal of this additional tutorial is to showcase how data in Apache Kafka can be read and manipulated using Apache Flink.

## What is Apache Flink?

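Apache Flink is a stream processing framework that runs continuous queries over your data. For example, a query can select only the subset of messages that matches a condition.

This lets you keep your existing processes running while you analyze messages in real time and write the results to another destination, such as an Apache Kafka topic.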
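To make "analyze your messages in real time" concrete, here is a minimal plain-Python sketch (not Flink code) of a continuous query. It keeps state between messages and emits an updated result for every incoming message instead of waiting for the input to end. The function name `running_counts` and the sample pizza names are illustrative assumptions. Flink applies the same idea at scale, adding distribution across a cluster and fault tolerance.

```python
from collections import Counter
from itertools import cycle, islice
from typing import Iterable, Iterator, Tuple


def running_counts(pizzas: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit an updated (pizza, count) pair for every incoming order.

    The counter is state kept between messages, and a result is produced
    as soon as each message arrives, so this works on an endless stream
    as well as on a finite list. (Illustrative sketch, not a Flink API.)
    """
    counts: Counter = Counter()
    for pizza in pizzas:
        counts[pizza] += 1
        yield pizza, counts[pizza]


if __name__ == "__main__":
    # Hypothetical never-ending stream of orders; islice stops after 5 updates.
    orders = cycle(["Margherita", "Hawaii", "Margherita"])
    for update in islice(running_counts(orders), 5):
        print(update)
```

Because `running_counts` is a generator, each updated count is available as soon as its order arrives, which is the property that lets a stream processor report results while the stream is still flowing.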

## Requirements

To follow this tutorial you need an Aiven for Apache Flink service. Setup instructions are in the [Workshop README](README.md).

The service takes a couple of minutes to start. Once it is running, you can continue with the next section.

## Create a Filtering pipeline

In this section we'll create a new streaming data pipeline that filters `Hawaii` pizzas from the `pizzas` topic. To create the pipeline:

1. Navigate to the Aiven for Apache Kafka service page, **Topics** tab
2. Create a new topic called `pizzaFiltered`
3. Navigate to the Aiven for Apache Flink service page
4. Create an integration between the Aiven for Apache Flink service and the Aiven for Apache Kafka service
5. Navigate to the **Applications** tab
6. Create a new Application named `filtering`
7. Create a new version
8. In the source table section:
    * Select the integration with Aiven for Apache Kafka
    * Include the following SQL

      ```sql
      CREATE TABLE pizzas(
        id INT,
        name STRING,
        pizza STRING
      )
      WITH (
        'connector' = 'kafka',
        'topic' = 'pizzas',
        'value.format' = 'json',
        'properties.bootstrap.servers' = '',
        'scan.startup.mode' = 'earliest-offset'
      )
      ```
9. In the sink table section:
    * Select the integration with Aiven for Apache Kafka
    * Include the following SQL

      ```sql
      CREATE TABLE pizzas_filtered(
        id INT,
        name STRING,
        pizza STRING
      )
      WITH (
        'connector' = 'kafka',
        'topic' = 'pizzaFiltered',
        'value.format' = 'json',
        'properties.bootstrap.servers' = '',
        'scan.startup.mode' = 'earliest-offset'
      )
      ```
10. In the transformation SQL section, include the following SQL:

    ```sql
    INSERT INTO pizzas_filtered 
    SELECT * FROM pizzas WHERE pizza LIKE 'Hawaii%'
    ```
11. Click **Save and deploy later**
12. Click **Create deployment**
13. Click **Deploy without savepoint**
14. In Aiven for Apache Kafka, check that the `pizzaFiltered` topic is being filled and contains only Hawaii pizzas

![Consume messages from an Apache Kafka Topic](../img/hawaii.png)
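The pipeline above reads JSON messages from `pizzas`, keeps the rows whose `pizza` value matches `LIKE 'Hawaii%'`, and writes them as JSON to `pizzaFiltered`. The following plain-Python sketch emulates that logic on an in-memory list, which can help you predict which messages should appear in the topic in step 14. It is not how Flink executes the job. The helper names `like_to_regex` and `filter_pipeline` and the sample messages are hypothetical; the LIKE translation handles only the `%` and `_` wildcards (no `ESCAPE` clause) and compares case-sensitively.

```python
import json
import re
from typing import Iterable, List

# Columns declared in the pizzas and pizzas_filtered tables of this tutorial.
COLUMNS = ("id", "name", "pizza")


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern using % and _ (no ESCAPE) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # % matches any sequence of characters
        elif ch == "_":
            parts.append(".")  # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def filter_pipeline(source: Iterable[str], pattern: str) -> List[str]:
    """Emulate: INSERT INTO pizzas_filtered SELECT * FROM pizzas WHERE pizza LIKE pattern."""
    regex = like_to_regex(pattern)
    sink: List[str] = []
    for raw in source:
        decoded = json.loads(raw)
        # Keep only the declared columns; an absent field becomes None (SQL NULL).
        row = {col: decoded.get(col) for col in COLUMNS}
        # A NULL pizza never satisfies LIKE, so such rows are dropped.
        if row["pizza"] is not None and regex.fullmatch(row["pizza"]):
            sink.append(json.dumps(row))
    return sink


if __name__ == "__main__":
    # Hypothetical messages; the real topic content depends on your producer.
    sample = [
        '{"id": 1, "name": "Ann", "pizza": "Margherita"}',
        '{"id": 2, "name": "Bob", "pizza": "Hawaii"}',
        '{"id": 3, "name": "Cy", "pizza": "Hawaii Special"}',
        '{"id": 4, "name": "Di"}',
    ]
    for message in filter_pipeline(sample, "Hawaii%"):
        print(message)
```

Note that `'Hawaii%'` is a prefix match, so a value such as `Hawaii Special` also passes the filter. Use `pizza = 'Hawaii'` in the transformation SQL if you need an exact match.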

## Congratulations 🥳

You've completed all the exercises, but there is one last step: [Power down your services to save energy and compute resources](7-power-down-services.ipynb)