founder: Dirk Derichsweiler, Contributors: Vincent Charbonnier and Isabelle Steinhauser, September 2023
# Data Streaming Infrastructure | Consumer




<img src="./media/data_streaming.gif" style="height:800px">

The streaming part of the demo shows retail stores publishing data to Data Fabric (DF) streams, from where the data is transferred to Ezmeral Unified Analytics (EZUA). Refer to the notebook _1-Producer.ipynb_ to see how the data is published. This notebook explains how the data is transferred to EZUA, as shown in the picture below.

# Transfer Data to Ezmeral Unified Analytics Infrastructure

<img src="./media/data_streaming_consumer.gif" style="height:600px">

Transferring the data to EZUA takes two steps:

1. Get the data out of the DF stream.
2. Get the data into EZUA.

To achieve step __1__, we use a __Kafka consumer__ that connects to DF and consumes the messages from the stream:
```python
consumer = KafkaConsumer(input_topic, bootstrap_servers=kafka_servers,
                         security_protocol='SASL_PLAINTEXT',
                         sasl_mechanism='PLAIN',
                         sasl_plain_username=user,
                         sasl_plain_password=pw)
```
Each message is written as one row of a CSV file until the completion keyword arrives (function _consume_and_write_csv_).
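
To try the write loop without a running broker, the sketch below feeds it a list of stand-in messages. `FakeMessage`, `write_rows_until_keyword`, and the sample rows are hypothetical and exist only for illustration. The one thing the stand-ins share with real Kafka records is a `value` attribute holding bytes.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for a Kafka record: only the `value` attribute (bytes) is used
FakeMessage = namedtuple("FakeMessage", ["value"])


def write_rows_until_keyword(messages, file, completion_keyword):
    """Write each comma-separated message as a CSV row; stop at the completion keyword."""
    csv_writer = csv.writer(file)
    rows_written = 0
    for message in messages:
        row = message.value.decode("utf-8")  # Bytes to string
        if row == completion_keyword:
            break  # The producer signalled the end of the stream
        csv_writer.writerow(row.split(","))
        rows_written += 1
    return rows_written


# Made-up sample data, not taken from the demo stream
sample = [
    FakeMessage(b"store_1,2023-09-01,42"),
    FakeMessage(b"store_2,2023-09-01,17"),
    FakeMessage(b"endofdemo"),
    FakeMessage(b"store_3,2023-09-01,99"),  # Never written: arrives after the keyword
]

buffer = io.StringIO()
count = write_rows_until_keyword(sample, buffer, "endofdemo")
print(count)
print(buffer.getvalue())
```

Because the fields are split on bare commas, a value that itself contains a comma would end up in two columns. This is acceptable only if the producer never emits such values.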
  
  
To achieve step __2__, we upload the CSV file containing all the messages to S3 inside EZUA (function _upload_csv_to_s3_).
You can find the whole script in the cell below.
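
The upload step can also be followed by a check that the object actually arrived. The following is a minimal sketch, assuming the S3-compatible endpoint is reachable and `boto3` is installed. `build_endpoint_url` and `upload_and_verify` are hypothetical helper names that do not appear in the script. `head_object` is used here only to confirm the upload.

```python
def build_endpoint_url(endpoint, port):
    """Join the base endpoint and the port into the URL passed to boto3."""
    return f"{endpoint.rstrip('/')}:{port}"


def upload_and_verify(input_file, bucket_name, object_key,
                      access_key, secret_key, endpoint, port):
    """Upload a local file to an S3-compatible store and return the stored size in bytes."""
    import boto3  # Imported here so the URL helper works without boto3 installed

    s3 = boto3.client(
        "s3",
        endpoint_url=build_endpoint_url(endpoint, port),
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        verify=False,  # Assumption: the demo endpoint uses a self-signed certificate
    )
    s3.upload_file(input_file, bucket_name, object_key)
    # head_object raises a ClientError if the object does not exist
    response = s3.head_object(Bucket=bucket_name, Key=object_key)
    return response["ContentLength"]


print(build_endpoint_url("https://home.ezua-cb.ezmeral.demo.local", "31900"))
```

Passing the port explicitly keeps the helper independent of notebook-level variables.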

In [None]:
from kafka import KafkaConsumer  # Import KafkaConsumer to interact with Kafka
import csv  # Import csv module for reading and writing CSV files
import boto3  # Import boto3 library for interacting with S3-compatible object storage
import urllib3  # Import urllib3 library for disabling warnings
import os  # Import os module for file operations

urllib3.disable_warnings()  # Suppress urllib3 warnings, e.g. those caused by verify=False below

# Define a function to consume messages from Kafka and write to CSV file
def consume_and_write_csv(consumer, output_file, completion_keyword):
    with open(output_file, 'w', newline='') as file:  # Open output file in write mode
        csv_writer = csv.writer(file)  # Create a CSV writer object

        for message in consumer:  # Iterate over messages received from Kafka
            row = message.value.decode('utf-8')  # Decode the message value from bytes to string

            if row == completion_keyword:  # Check if the message indicates completion
                break  # Exit the loop if completion keyword is received

            fields = row.split(',')  # Split the row string into fields
            csv_writer.writerow(fields)  # Write the fields as a row in the CSV file

# Define a function to upload the CSV file to an S3 bucket
# Note: the endpoint URL also uses the module-level `port` from the configuration below
def upload_csv_to_s3(input_file, bucket_name, object_key, username, password, endpoint):
    session = boto3.Session(
        aws_access_key_id=username,
        aws_secret_access_key=password
    )
    # Create an S3 client; verify=False skips TLS certificate verification
    s3 = session.client('s3', endpoint_url=f"{endpoint}:{port}", verify=False)

    with open(input_file, 'rb') as file:  # Open the input file in binary read mode
        s3.upload_fileobj(file, bucket_name, object_key)  # Upload the file to S3 bucket

    # Delete the file after successful transfer
    # os.remove(input_file)

# Configuration parameters
output_file = 'output.csv'  # Path to the output CSV file
input_topic = 'demo'  # Kafka topic to consume messages from
kafka_servers = '10.1.84.129:9092,10.1.84.130:9092'  # Comma-separated Kafka bootstrap servers (host:port)
completion_keyword = 'endofdemo'  # Keyword to mark the end of data stream
bucket_name = 'ezaf-demo'  # Name of the S3 bucket
object_key = 'output.csv'  # Key of the object in S3 bucket
endpoint = 'https://home.ezua-cb.ezmeral.demo.local'  # S3-compatible endpoint
port = '31900'  # Port for the S3-compatible endpoint
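aws_access_key_id = 'minioadmin'  # Access key ID for the S3-compatible endpoint
aws_secret_key = 'minioadmin'  # Secret key for the S3-compatible endpoint
user = 'mapr'  # User name for SASL authentication against the Kafka servers
pw = 'mapr123'  # Password for SASL authentication against the Kafka servers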

# Create a Kafka consumer
consumer = KafkaConsumer(input_topic, bootstrap_servers=kafka_servers,
                         security_protocol='SASL_PLAINTEXT',
                         sasl_mechanism='PLAIN',
                         sasl_plain_username=user,
                         sasl_plain_password=pw)

# Consume messages and write to CSV file
consume_and_write_csv(consumer, output_file, completion_keyword)

# Close the consumer
consumer.close()

# Upload the file to the S3 bucket (the local copy is kept)
upload_csv_to_s3(output_file, bucket_name, object_key, aws_access_key_id, aws_secret_key, endpoint)

print("File transferred to " + endpoint + ":" + port + " successfully.")
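
As written, the `for message in consumer` loop in the script above blocks until the completion keyword arrives. With `kafka-python` defaults, a consumer that has no committed offsets starts at the latest messages. The sketch below shows two `kafka-python` options that change this behavior, `consumer_timeout_ms` and `auto_offset_reset`. Whether they suit this demo is an assumption, and `build_consumer_options`, `create_consumer`, and the 30-second timeout are hypothetical.

```python
def build_consumer_options(servers, username, password, timeout_ms=30_000):
    """Collect keyword arguments for kafka-python's KafkaConsumer."""
    return {
        "bootstrap_servers": servers.split(","),
        "security_protocol": "SASL_PLAINTEXT",
        "sasl_mechanism": "PLAIN",
        "sasl_plain_username": username,
        "sasl_plain_password": password,
        # Start from the oldest available message when no offset is committed
        "auto_offset_reset": "earliest",
        # Stop iterating if no message arrives within this many milliseconds
        "consumer_timeout_ms": timeout_ms,
    }


def create_consumer(topic, options):
    """Create the consumer; requires kafka-python and a reachable broker."""
    from kafka import KafkaConsumer  # Imported here so the options can be built offline

    return KafkaConsumer(topic, **options)


options = build_consumer_options("10.1.84.129:9092,10.1.84.130:9092", "mapr", "mapr123")
print(options["bootstrap_servers"])
```

With a timeout set, the loop ends on its own when the stream goes quiet, so the CSV file is closed and uploaded even if the completion keyword is never sent.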


# Import Data

In [None]:
!python ../create_load_transform_data/import_data.py -db mysql -H ddk3s.westcentralus.cloudapp.azure.com -u root -p nfWCEHWNDe -P 31870 -d db_g1 -t sales_data -c "./data/Czech Republic_sales_data_2019_2023.csv"

# VIDEO

In [13]:
%%HTML
<video width="1024" height="768" controls>
  <source src="../videos/1-faster-no-sound.mp4" type="video/mp4">
</video>