# Data Streaming Infrastructure | Producer
created by Dirk Derichsweiler November 2023


<img src="./media/data_streaming.gif" style="height:800px">

The streaming part of the demo shows retail stores publishing data to Data Fabric (DF) streams, from where it is transferred to EZUA. Notebook 1-Consumer.ipynb shows how the data is transferred to EZUA; this notebook explains how the data is published to DF, as shown in the picture below.

<img src="./media/data_streaming_producer.gif" style="height:600px">

# Create a dataset and store it in the `data` folder

First, we create the sales data that we will later publish to DF. We generate data for three countries: Czech Republic, Germany, and Switzerland.
    
<img src="./media/czech.jpg" style="height:50px">

```python
!python ../create_load_transform_data/create_csv.py -csv ./data/czech_sales_data_2019_2023.csv -c 'Czech Republic' -cu CZK -s 5 -sy 2019 -ey 2023
```

<img src="./media/germany.jpg" style="height:50px">

```python
!python ../create_load_transform_data/create_csv.py -csv ./data/Germany_sales_data_2019_2023.csv -c 'Germany' -cu EUR -s 15 -sy 2019 -ey 2023
```

<img src="./media/swiss.jpg" style="height:50px">

```python
!python ../create_load_transform_data/create_csv.py -csv ./data/Swiss_sales_data_2019_2023.csv -c 'Swiss' -cu CHF -s 5 -sy 2019 -ey 2023
```
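The three generation cells repeat the same command with different arguments. As a hedged sketch, the runs can also be driven from one Python loop that gives every country its own output file. The naming in `output_path` is a hypothetical scheme, chosen so that Germany matches the `data/Germany_sales_data_2019_2023.csv` path the producer reads later; the helper names are illustrative, and because the meaning of `-s` is defined by `create_csv.py`, its value is passed through unchanged.

```python
import shlex
import subprocess
import sys
from pathlib import Path

# Script path as used in the notebook cells (relative to this notebook).
SCRIPT = "../create_load_transform_data/create_csv.py"

# (value for -c, value for -cu, value for -s), copied from the three cells.
COUNTRIES = [
    ("Czech Republic", "CZK", 5),
    ("Germany", "EUR", 15),
    ("Swiss", "CHF", 5),
]


def output_path(country, data_dir="data", start_year=2019, end_year=2023):
    """Hypothetical naming scheme: one file per country inside data_dir."""
    safe_name = country.replace(" ", "_")
    return Path(data_dir) / f"{safe_name}_sales_data_{start_year}_{end_year}.csv"


def build_command(output_csv, country, currency, s_value, start_year=2019, end_year=2023):
    """Return the argument list for one create_csv.py run (nothing is executed)."""
    return [
        sys.executable, SCRIPT,
        "-csv", str(output_csv),
        "-c", country,
        "-cu", currency,
        "-s", str(s_value),
        "-sy", str(start_year),
        "-ey", str(end_year),
    ]


def generate_all(data_dir="data"):
    """Create data_dir if needed, then run create_csv.py once per country."""
    Path(data_dir).mkdir(parents=True, exist_ok=True)
    for country, currency, s_value in COUNTRIES:
        cmd = build_command(output_path(country, data_dir), country, currency, s_value)
        print(shlex.join(cmd))
        # check=True raises CalledProcessError if the script fails
        subprocess.run(cmd, check=True)
```

Calling `generate_all()` from the notebook's directory runs the three generations in sequence. Building the argument list separately from running it makes it easy to print or inspect the commands before anything is written to disk.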

Once the data has been created, we publish it to a topic in DF. We use the topic _demo_, which was created in advance via the DF GUI. The function `copy_csv_file` writes the file row by row into the topic using a `KafkaProducer` that authenticates with SASL (`SASL_PLAINTEXT` protocol, `PLAIN` mechanism):
```python
producer = KafkaProducer(bootstrap_servers=kafka_servers,
                         security_protocol='SASL_PLAINTEXT',
                         sasl_mechanism='PLAIN',
                         sasl_plain_username=user,
                         sasl_plain_password=pw)
```
    
This uses Streaming, one of several Data Fabric capabilities.
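Each row is turned into one message by joining its fields with commas. The sketch below, using a made-up row (the real columns come from `create_csv.py`), shows what that produces and how it compares with letting Python's `csv` module do the quoting. The function names are hypothetical. The difference only matters if a field can itself contain a comma; whichever format is chosen, the consumer must parse messages the same way.

```python
import csv
import io


def row_to_message_naive(row):
    # Same conversion as copy_csv_file: join fields with commas, no quoting.
    return ",".join(row).encode("utf-8")


def row_to_message_csv(row):
    # Alternative: let csv.writer quote fields that contain commas or quotes.
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(row)
    return buf.getvalue().encode("utf-8")


def message_to_row(message):
    # Parse one message back into fields, as a csv-based consumer might.
    return next(csv.reader([message.decode("utf-8")]))


# Made-up row for illustration only.
row = ["2021-03-01", "Store 7, Berlin", "Germany", "EUR", "129.90"]

print(row_to_message_naive(row))  # b'2021-03-01,Store 7, Berlin,Germany,EUR,129.90'
print(row_to_message_csv(row))    # b'2021-03-01,"Store 7, Berlin",Germany,EUR,129.90'
```

Parsing the naive message yields six fields instead of five, while the quoted version round-trips exactly. For rows without commas or quotes, both functions produce identical bytes.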

```python
import csv

from kafka import KafkaProducer


def copy_csv_file(input_file, output_topic, kafka_servers, completion_keyword, user, pw):
    """Publish each CSV row as one message, followed by the completion keyword."""
    producer = KafkaProducer(bootstrap_servers=kafka_servers,
                             security_protocol='SASL_PLAINTEXT',
                             sasl_mechanism='PLAIN',
                             sasl_plain_username=user,
                             sasl_plain_password=pw)
    try:
        # newline='' is what the csv module expects for files it reads
        with open(input_file, 'r', newline='') as file:
            csv_reader = csv.reader(file)

            ## Skip header if present (uncomment to keep the header row out of the topic)
            #header = next(csv_reader, None)

            for row in csv_reader:
                # Re-join the fields into one comma-separated line, encoded as bytes
                message = ','.join(row).encode('utf-8')
                # send() is asynchronous: messages are buffered and sent in the background
                producer.send(output_topic, value=message)

            # Publish the completion keyword so the consumer knows the file is done
            producer.send(output_topic, value=completion_keyword.encode('utf-8'))

        # Block until all buffered messages have been delivered
        producer.flush()
    finally:
        producer.close()


# Configuration (demo values; avoid hard-coding credentials outside a demo)
input_file = 'data/Germany_sales_data_2019_2023.csv'
user = 'mapr'
pw = 'mapr123'
output_topic = 'demo'
kafka_servers = '10.1.84.129:9092,10.1.84.130:9092'
completion_keyword = 'endofdemo'

# Copy CSV file to Kafka topic
copy_csv_file(input_file, output_topic, kafka_servers, completion_keyword, user, pw)
```
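The completion keyword `endofdemo` acts as an end-of-file marker for whoever reads the topic. The Consumer notebook may handle it differently; the following is only a sketch of the idea, with a hypothetical helper that collects rows until the keyword arrives. It accepts any iterable of message values, so it can be tested without a broker.

```python
def collect_until_keyword(messages, completion_keyword="endofdemo"):
    """Decode messages and collect rows until the completion keyword arrives.

    `messages` is any iterable of bytes values. Returns (rows, finished),
    where finished is True only if the keyword was seen.
    """
    rows = []
    for raw in messages:
        text = raw.decode("utf-8")
        if text == completion_keyword:
            return rows, True
        # Split the same way the producer joined the fields
        rows.append(text.split(","))
    return rows, False


rows, finished = collect_until_keyword([b"a,b", b"c,d", b"endofdemo", b"late"])
print(rows, finished)  # [['a', 'b'], ['c', 'd']] True
```

With kafka-python, one could pass `(record.value for record in consumer)`, where `consumer` is a `KafkaConsumer` subscribed to `demo` with the same SASL settings. Note that Kafka only guarantees ordering within a partition, and the producer above sends without a key, so the keyword reliably marks the end of the data only if the topic has a single partition (or all messages are sent with the same key).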