## EVENT PRODUCER 3 - Streaming of Hotspot TERRA Data<hr />

<b>Event Producer 3</b> - Event Producer 3 streams hotspot (fire) data from the hotspot_TERRA_streaming.csv file and publishes it to the topic (channel) <b>'test'</b> in random order to simulate a real application. Each chunk of streamed data is tagged with two additional fields, sender_id and created_time. The data is streamed every 3 seconds to model a real-world application with plausible latency.

This producer sends one record every 3 seconds. We chose this interval so that the streaming client application receives enough fire data to make a match between climate and hotspot records more likely. Although this departs from the specification, the aim is to make it more probable that any two data chunks from different sources are closely related. This matters particularly because the data is streamed in random order.
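
Because each send draws a record index independently, the sampling is with replacement: some records can be sent more than once and others not at all. The sketch below illustrates this, assuming a hypothetical dataset of 1,000 rows represented only by their indices. The helper `draw_with_replacement` and the seed are illustrative and are not part of the producer.

```python
import random


def draw_with_replacement(n_records, seed=None):
    """Draw n_records indices from range(n_records), each draw independent."""
    rng = random.Random(seed)  # local generator so the sketch is repeatable
    return [rng.randrange(0, n_records) for _ in range(n_records)]


# Hypothetical dataset size; the real size is the number of rows in the CSV.
indices = draw_with_replacement(1000, seed=42)

# Fraction of distinct records that were selected at least once.
unique_fraction = len(set(indices)) / len(indices)
print(f"{unique_fraction:.2%} of the records were selected at least once")
```

For a large dataset the expected share of distinct records approaches 1 - 1/e, roughly 63%, so about a third of the rows are typically never sent in a single pass.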

In [None]:
# importing libraries
from time import sleep
import pandas as pd
from json import dumps
from kafka import KafkaProducer
import random
import datetime as dt


# Function to publish data to the topic over an established connection. The data is sent in key-value
# format: the key is an indicator and the value is the stringified form of a dictionary (JSON-like), which
# is easy to consume, process, and insert into MongoDB.
def publish_message(producer_instance, topic_name, key, value):
    try:
        # encoding the key and value in utf-8 format.
        key_bytes = bytes(key, encoding='utf-8')
        value_bytes = bytes(value, encoding='utf-8')
        
        # sending the data to the specified topic with key and value as encoded strings from the passed producer instance.
        producer_instance.send(topic_name, key=key_bytes, value=value_bytes)
        
        # wait for all the messages in the queue to be delivered to the topic, until the queue is empty or the
        # producer times out
        producer_instance.flush()
        print('Message published successfully. Data: ' + str(value))
    except Exception as ex:
        print('Exception in publishing message.')
        print(str(ex))

# Function to create a connection to the Kafka instance that accepts connection on port 9092. The function returns an instance of the 
# connected object. 
def connect_kafka_producer():
    _producer = None
    try:
        # establish a connection to the Kafka instance and assign it to a variable.
        _producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                                  api_version=(0, 10))
    except Exception as ex:
        print('Exception while connecting Kafka.')
        print(str(ex))
    # return the producer, or None if the connection failed
    return _producer


# function to read the data as a dataframe and convert it to a list of dictionaries (one per row)
def read_data():
    hotspot_TERRA_stream = pd.read_csv("hotspot_TERRA_streaming.csv")
    TERRA_dict = hotspot_TERRA_stream.to_dict(orient = "records")
    return TERRA_dict
    
        
    
    
if __name__ == '__main__':
   
    # the topic 'test' that the producer publishes to.
    topic = 'test'
    
    
    print('Publishing records..')
    
    # read the data as a list of dictionaries
    x = read_data()
    
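    # get a connection and a producer instance; stop early if the connection failed
    producer = connect_kafka_producer()
    if producer is None:
        raise SystemExit('Could not connect to Kafka; no records were published.')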
    
    # send one dictionary per iteration as a data chunk; len(x) chunks are sent in total.
    for i in range(0,len(x)):
        
        # generate a random index from 0 to len(x) - 1 and use it to select the dictionary to send.
        # Because indices are drawn with replacement, some data chunks can be sent more than once
        # and others may never be sent.
        index = random.randrange(0,len(x))
        
        # tag the selected record with a sender id.
        x[index]["sender_id"] = "fire_TERRA_producer_3"
        
        # add the creation time (the locale's time string) to the selected record.
        x[index]["created_time"] = dt.datetime.now().strftime("%X")
        
        # send the data to the specified topic as a stringified dictionary.
        publish_message(producer, topic, 'parsed', str(x[index]))
        
        # the producer waits 3 seconds before sending the next chunk of data.
        sleep(3)
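
The producer sends `str(x[index])`, which is Python's dictionary representation (single-quoted keys) rather than strict JSON. If the consumer parses the payload with a JSON parser, one option is to serialize with `dumps`, which the cell already imports. The sketch below is a minimal illustration under assumptions: `build_payload` is a hypothetical helper, and the sample record uses made-up field names and values rather than the real CSV columns.

```python
import datetime as dt
from json import dumps, loads


def build_payload(record, sender_id):
    """Return a JSON string for a copy of `record` tagged with sender and time."""
    enriched = dict(record)  # copy, so the source record is not mutated
    enriched["sender_id"] = sender_id
    enriched["created_time"] = dt.datetime.now().strftime("%X")
    return dumps(enriched)


# Hypothetical record; the field names are placeholders, not the real columns.
sample = {"latitude": -37.6, "longitude": 149.3, "confidence": 78}

payload = build_payload(sample, "fire_TERRA_producer_3")

# A consumer can recover the dictionary with a standard JSON parser.
decoded = loads(payload)
print(decoded["sender_id"], decoded["created_time"])
```

Depending on the pandas version, values read from the CSV may be NumPy integer types, which `dumps` rejects, or NaN, which it emits as a non-standard token. Real records may therefore need conversion before serializing this way.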
