# 1. Producing the data
In this task, we will implement one Apache Kafka producer to simulate real-time data streaming. Spark is not allowed in this part since it’s simulating a streaming data source.

1.1 Your program should send one batch of clickstream data every 5 seconds. One batch consists of a random 500-1000 rows from the clickstream_rt dataset. To conserve memory, the CSV shouldn’t be loaded into memory all at once (i.e. read rows as needed).  
1.2 For each row, add an integer column named ‘ts’, a Unix timestamp in seconds since the epoch (UTC timezone). Spread your batch out evenly over the 5 seconds.  
For example, if you send a batch of 600 records at 2023-09-01 00:00:00 (ISO format: YYYY-MM-DD HH:MM:SS), i.e. ts = 1693526400:  
Record 1-120: ts = 1693526400  
Record 121-240: ts = 1693526401  
Record 241-360: ts = 1693526402  
….  
1.3 Send your batch to a Kafka topic with an appropriate name.  
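
The spreading rule in 1.2 can be sketched with integer arithmetic. This is a minimal illustration, not part of the assignment text: the helper name `spread_timestamps` and its parameters are assumptions made for the example. It reproduces the 600-record case above, where each second of the window receives 120 records.

```python
def spread_timestamps(start_ts, n_records, window_seconds=5):
    """Return one integer Unix timestamp per record, spread evenly over the window."""
    # Record i (0-based) falls in second floor(i * window_seconds / n_records)
    # of the window. Integer division keeps every 'ts' value an int.
    return [start_ts + (i * window_seconds) // n_records for i in range(n_records)]


# Batch of 600 records sent at 2023-09-01 00:00:00 UTC (ts = 1693526400)
ts_values = spread_timestamps(1693526400, 600)
print(ts_values[0], ts_values[119], ts_values[120], ts_values[599])
# 1693526400 1693526400 1693526401 1693526404
```

When the batch size is not a multiple of 5, the per-second groups differ in size by at most one record.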

All the data except for the ‘ts’ column should be sent in the original String type, without changing to any other types.  
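
As a rough sketch of this type requirement: `csv.DictReader` yields every field as a string, so serialising the row with `json.dumps` after adding an integer ‘ts’ keeps the original values untouched. The column names and the helper `to_message` below are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical row as csv.DictReader would yield it: every value is a str.
row = {'session_id': 'abc123', 'event_name': 'CLICK', 'quantity': '3'}


def to_message(row, ts):
    """Serialise one CSV row to JSON, adding 'ts' as an int and leaving the rest as strings."""
    record = dict(row)      # copy so the caller's row is not mutated
    record['ts'] = int(ts)  # the only non-string field
    return json.dumps(record)


message = to_message(row, 1693526400)
decoded = json.loads(message)
print(type(decoded['quantity']).__name__, type(decoded['ts']).__name__)
# str int
```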


#### We are reading data from a CSV file and publishing it as messages to a Kafka topic using the Confluent Kafka Producer. 

In [None]:
import time
import random
import csv
import json
from confluent_kafka import Producer



kafka_config = {
    'bootstrap.servers': '172.20.10.5:9092',  
}
kafka_topic = 'ass2a'  
batch_size = random.randint(500, 1000)  # drawn once at start-up, so every batch in a run has this size
sleep_interval = 5  # seconds covered by one batch
csv_file_path = 'click_stream_rt.csv'



# Send a batch of records to a Kafka topic.
# Each record gets a unique key built from the batch number and its position in the batch.
# flush() blocks until all queued messages have been delivered.

def send_kafka_batch(producer, kafka_topic, batch, batch_number):
    for i, row in enumerate(batch, 1):
        key = f'batch_{batch_number}_record_{i}'
        producer.produce(kafka_topic, key=key, value=row)
    producer.flush()

    
    
# Return the 'ts' value for one record so that a batch is spread evenly over
# sleep_interval seconds, starting at start_time (Unix epoch seconds, UTC).
# record_index is the 0-based position of the record within its batch.
def convert_to_unix_timestamp(start_time, record_index):
    return start_time + (record_index * sleep_interval) // batch_size


# a. Rows are read from the CSV one at a time, so the file is never fully loaded into memory.
# b. Each row gets a new integer field 'ts'; all other fields stay as strings.
# c. The row is converted to a JSON string using json.dumps and added to the batch list.
# d. A full batch is sent to Kafka; a partial batch left at the end of the file is discarded.

if __name__ == '__main__':
    producer = Producer(kafka_config)
    batch_number = 1

    while True:
        try:
            batch = []
            with open(csv_file_path, 'r', newline='') as csv_file:
                reader = csv.DictReader(csv_file)
                for row in reader:
                    if not batch:
                        # First record of a new batch: fix the batch start time
                        batch_start = int(time.time())

                    # Add 'ts' column with Unix timestamp
                    row['ts'] = convert_to_unix_timestamp(batch_start, len(batch))
                    batch.append(json.dumps(row))

                    if len(batch) == batch_size:
                        # Send the batch to Kafka with a unique key per record
                        send_kafka_batch(producer, kafka_topic, batch, batch_number)
                        print(f'Sent batch {batch_number} - {len(batch)} records to Kafka.')
                        batch_number += 1
                        batch = []

                    # Sleep to spread records over 5 seconds
                    time.sleep(sleep_interval / batch_size)

        except Exception as e:
            print(f'Error: {e}')

        # Wait before re-reading the file from the start
        time.sleep(sleep_interval)


Sent batch 1 - 682 records to Kafka.
Sent batch 2 - 682 records to Kafka.
Sent batch 3 - 682 records to Kafka.
Sent batch 4 - 682 records to Kafka.
Sent batch 5 - 682 records to Kafka.
Sent batch 6 - 682 records to Kafka.
Sent batch 7 - 682 records to Kafka.
Sent batch 8 - 682 records to Kafka.
Sent batch 9 - 682 records to Kafka.
Sent batch 10 - 682 records to Kafka.
Sent batch 11 - 682 records to Kafka.
Sent batch 12 - 682 records to Kafka.
Sent batch 13 - 682 records to Kafka.
Sent batch 14 - 682 records to Kafka.
Sent batch 15 - 682 records to Kafka.
Sent batch 16 - 682 records to Kafka.
Sent batch 17 - 682 records to Kafka.
Sent batch 18 - 682 records to Kafka.
Sent batch 19 - 682 records to Kafka.
Sent batch 20 - 682 records to Kafka.
Sent batch 21 - 682 records to Kafka.
Sent batch 22 - 682 records to Kafka.
Sent batch 23 - 682 records to Kafka.
Sent batch 24 - 682 records to Kafka.
Sent batch 25 - 682 records to Kafka.
Sent batch 26 - 682 records to Kafka.
Sent batch 27 - 682 records to Kafka.

Sent batch 214 - 682 records to Kafka.
Sent batch 215 - 682 records to Kafka.
Sent batch 216 - 682 records to Kafka.
Sent batch 217 - 682 records to Kafka.
Sent batch 218 - 682 records to Kafka.
Sent batch 219 - 682 records to Kafka.
Sent batch 220 - 682 records to Kafka.
Sent batch 221 - 682 records to Kafka.
Sent batch 222 - 682 records to Kafka.
Sent batch 223 - 682 records to Kafka.
Sent batch 224 - 682 records to Kafka.
Sent batch 225 - 682 records to Kafka.
Sent batch 226 - 682 records to Kafka.
Sent batch 227 - 682 records to Kafka.
Sent batch 228 - 682 records to Kafka.
Sent batch 229 - 682 records to Kafka.
Sent batch 230 - 682 records to Kafka.
Sent batch 231 - 682 records to Kafka.
Sent batch 232 - 682 records to Kafka.
Sent batch 233 - 682 records to Kafka.
Sent batch 234 - 682 records to Kafka.
Sent batch 235 - 682 records to Kafka.
Sent batch 236 - 682 records to Kafka.
Sent batch 237 - 682 records to Kafka.
Sent batch 238 - 682 records to Kafka.
Sent batch 239 - 682 records to Kafka.

Sent batch 425 - 682 records to Kafka.
Sent batch 426 - 682 records to Kafka.
Sent batch 427 - 682 records to Kafka.
Sent batch 428 - 682 records to Kafka.
Sent batch 429 - 682 records to Kafka.
Sent batch 430 - 682 records to Kafka.
Sent batch 431 - 682 records to Kafka.
Sent batch 432 - 682 records to Kafka.
Sent batch 433 - 682 records to Kafka.
Sent batch 434 - 682 records to Kafka.
Sent batch 435 - 682 records to Kafka.
Sent batch 436 - 682 records to Kafka.
Sent batch 437 - 682 records to Kafka.
Sent batch 438 - 682 records to Kafka.
Sent batch 439 - 682 records to Kafka.
Sent batch 440 - 682 records to Kafka.
Sent batch 441 - 682 records to Kafka.
Sent batch 442 - 682 records to Kafka.
Sent batch 443 - 682 records to Kafka.
Sent batch 444 - 682 records to Kafka.
Sent batch 445 - 682 records to Kafka.
Sent batch 446 - 682 records to Kafka.
Sent batch 447 - 682 records to Kafka.
Sent batch 448 - 682 records to Kafka.
Sent batch 449 - 682 records to Kafka.
Sent batch 450 - 682 records to Kafka.
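
Every batch in the output above has 682 rows because `batch_size` is drawn once when the script starts. If a different size per batch is wanted, one possible approach is to draw the size inside the loop and pull that many rows from the reader with `itertools.islice`, which still reads the file row by row. The sketch below is an illustration only: the generator `iter_batches`, its parameters, and the in-memory demo CSV are assumptions, not part of the original script.

```python
import csv
import io
import random
from itertools import islice


def iter_batches(csv_file, min_rows=500, max_rows=1000, rng=random):
    """Yield lists of rows, drawing a new batch size for every batch."""
    reader = csv.DictReader(csv_file)  # reads one row at a time
    while True:
        size = rng.randint(min_rows, max_rows)
        batch = list(islice(reader, size))  # pulls at most `size` rows
        if not batch:
            return  # end of file
        yield batch


# Demo on a small in-memory CSV (hypothetical columns) with small limits.
demo = io.StringIO('id,event\n' + ''.join(f'{i},click\n' for i in range(25)))
batch_sizes = [len(b) for b in iter_batches(demo, min_rows=5, max_rows=10)]
print(batch_sizes)  # sizes vary from run to run and sum to 25
```

Note that the last batch can be shorter than `min_rows` when the file runs out; a producer could either send it or drop it, as the script above does with its final partial batch.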