# Consumer File

#### Author: Akash Goyal


**Libraries used:**
* `sleep` To introduce a delay between map refreshes
* `KafkaConsumer` To consume the data from Kafka
* `pandas` To use the pandas dataframe module
* `folium` To be able to visualize the data on a map
* `clear_output` To be able to clear the output screen
* `dumps` Used for serializing the values

## Table of Contents

* [Importing Libraries](#lib)
* [Introduction](#intro)
* [Creating kafka consumer](#kfcon)
* [Creating dictionary for sensors](#sendict)
* [Creating the Folium Map for visualizing the data](#map)
* [Conclusion](#con)

## Importing Libraries <a class="anchor" name="lib"></a>

In [1]:
# import the sleep() function, which suspends execution of the current thread for a specified number of seconds.
from time import sleep
# import KafkaConsumer
from kafka import KafkaConsumer

import pandas as pd

# Used for serializing the values
from json import dumps

# Install the folium package if importing it fails,
# then import it again so that the name is available
try:
    import folium
except ImportError:
    !pip install folium
    import folium

from IPython.display import clear_output

## Introduction <a class="anchor" name="intro"></a>

In this task we consume the data that was published in the second task to the `final_joined_df` kafka topic, and then visualize that data on a map.

## Creating kafka consumer <a class="anchor" name="kfcon"></a>
In this step we create a function that builds the kafka consumer with the configuration used in the further steps. The name of this function is **connect_kafka_consumer()**, and it takes only the **kafka topic** as a parameter. Inside the function we set the **server and port address**, the **api version** to use, the value deserializer, the consumer timeout and the **auto_offset_reset** property. Below are short descriptions of these parameters.
* **auto_offset_reset:** Set to `latest`, which is also the default. When the consumer has no committed offset, it starts reading from the newest messages in the topic.
* **api_version:** This specifies which Kafka API version to use.
* **consumer_timeout_ms:** This is the timeout in milliseconds after which the consumer stops iterating if it receives no messages.
* **value_deserializer:** Deserializes each message value by decoding the raw bytes from UTF-8 into a string.
* **bootstrap_servers:** The server address (host and port) that has to be used.
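
To make the **value_deserializer** bullet concrete, here is a minimal sketch of what that callable does to a single raw message value. The sample payload and its `lat,lng,sensor_id` layout are assumptions for illustration, inferred from how the message value is split later in this notebook; they are not real topic data.

```python
# Kafka delivers message values as raw bytes; the deserializer turns them into text.
decode_value = lambda x: x.decode('utf-8')

# Hypothetical payload: the field layout is assumed for illustration only.
raw_value = b"-37.8136,144.9631,4"

text_value = decode_value(raw_value)
print(type(text_value).__name__)
print(text_value.split(','))
```

Because the decoded value is a plain string, it can be split on commas directly, which is what the consuming loop relies on.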

In [2]:
def connect_kafka_consumer(kafka_topic):
    """
    This function creates an object that can be used to consume the data from
    the required kafka topic.
    Params:
    1. kafka_topic: The kafka topic from  which the data needs to be consumed.
    """
    
    try:
        # Trying to create an object from the KafkaConsumer with the appropriate config.
        # Each message value is decoded from utf-8 bytes to a string.
        consumer = KafkaConsumer(kafka_topic,
                                 auto_offset_reset = 'latest', 
                                 api_version = (0, 10),
                                 consumer_timeout_ms = 90000,
                                 value_deserializer = lambda x: x.decode('utf-8'),
                                 bootstrap_servers = ['127.0.0.1:9092'])
    
    except Exception:
        # Creation failed, so fall back to None
        consumer = None

    
    return consumer
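
Since **connect_kafka_consumer()** returns `None` when the consumer cannot be created, any loop over the result would fail with a `TypeError` in that case. The sketch below shows one possible guard. `iter_messages` is a hypothetical helper introduced here for illustration, not part of kafka-python, and a plain list stands in for a real consumer so the example runs without a broker.

```python
def iter_messages(consumer):
    """Yield messages from `consumer`; yield nothing if it is None.

    Hypothetical helper for illustration, not part of kafka-python.
    """
    if consumer is None:
        print("Kafka consumer could not be created; nothing to consume.")
        return
    for message in consumer:
        yield message

# A plain list stands in for a real consumer (assumption for the sketch).
fake_consumer = ["message-1", "message-2"]
received = list(iter_messages(fake_consumer))

# With None, the generator simply ends without raising an error.
skipped = list(iter_messages(None))
```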

## Creating dictionary for sensors <a class="anchor" name="sendict"></a>
In this step we create a dictionary that maps the sensor ids to the sensor descriptions. This dictionary comes in handy when we create the tooltip on the map visualization. Here we simply read the sensor data and then use the **zip()** function to create the dictionary.
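
The **zip()** idea can be sketched with two short lists standing in for the DataFrame columns. The ids and descriptions below are made up for illustration and do not come from the sensor CSV file.

```python
# Stand-ins for the 'sensor_id' and 'sensor_description' columns (made-up values).
sensor_ids = [1, 2, 3]
sensor_descriptions = ["Sensor A", "Sensor B", "Sensor C"]

# zip() pairs the two sequences element by element; dict() turns the pairs into a mapping.
example_dict = dict(zip(sensor_ids, sensor_descriptions))

print(example_dict[2])
```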

In [3]:
# Reading the data using the read_csv function of the pandas module
sensordf = pd.read_csv('Pedestrian_Counting_System_-_Sensor_Locations.csv') # read the CSV file using pandas

# Creating the dictionary
sensor_dict = dict(zip(sensordf['sensor_id'],sensordf['sensor_description']))

## Creating the Folium Map for visualizing the data <a class="anchor" name="map"></a>
In this section we follow the steps below for data visualization:
1. We create the map object using the imported **folium** library with the coordinates of Melbourne, as that is the area of interest, and set the zoom to 15 so that the data can be seen properly.
2. The kafka topic from which the data needs to be consumed is set.
3. The consumer object is created by calling the **connect_kafka_consumer()** function defined above.
4. A dictionary `count_dict` is initialized, and an overall counter variable `i` is created.
5. We loop through the messages consumed by the kafka consumer. In the loop we do the following:
    * The key for the `count_dict` dictionary is the entire coordinate string, i.e. the latitude and the longitude.
    * If the key is present in the dictionary, then we add 1 to the counter stored for it.
    * If the key is not present, then a list is created with the counter initialized to 1 and the sensor name set.
    * Using the overall counter variable `i` created earlier, the map is refreshed only periodically, each time 20 new messages have been received.
6. Next, inside the loop in point (5), we loop over the `count_dict` created. From the keys of the dictionary we extract the latitude and longitude values by splitting the string on **','**. The values of the dictionary contain the count (of records having pedestrian count > 2000) and the sensor name, which are used to create a tooltip.
7. Next we create a circle marker to show each location, with a radius that changes as per the count. After the markers are added, the map is displayed using the **display()** function, followed by a delay of 5 seconds (using the sleep function).
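
The counting logic in step (5) can be tried in isolation, without Kafka or folium. The sketch below assumes that each message value is a string of the form `lat,lng,sensor_id`; the helper name `update_counts`, the sample values and the sensor names are hypothetical.

```python
def update_counts(count_dict, sensor_names, value):
    """Update count_dict in place for one 'lat,lng,sensor_id' message value."""
    fields = value.split(',')
    # The key is the coordinate string "lat,lng"
    coordinate = ",".join(fields[0:2])
    if coordinate in count_dict:
        # Known location: add 1 to its counter
        count_dict[coordinate][0] += 1
    else:
        # New location: start the counter at 1 and store the sensor name
        count_dict[coordinate] = [1, sensor_names[int(fields[2])]]
    return count_dict

# Made-up sensor names and message values, for illustration only.
sample_names = {4: "Sensor A", 7: "Sensor B"}
sample_values = ["-37.81,144.96,4", "-37.82,144.97,7", "-37.81,144.96,4"]

counts = {}
for value in sample_values:
    update_counts(counts, sample_names, value)

print(counts)
```

Keeping the counter and the sensor name together in one list means the tooltip can be built from a single dictionary lookup.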

In [4]:
# Create a map object with Melbourne's coordinates with zoom = 15 so that
# one can see it properly
melb_map = folium.Map(location = [-37.8136, 144.9631], zoom_start = 15)

# Kafka topic from which the data needs to be consumed
kafka_topic = 'final_joined_df'

# Creating the consumer object
consumer = connect_kafka_consumer(kafka_topic)

# Initializing the dictionary
count_dict = {}

# Initializing the overall counter variable
i = 0

# Looping through the data
for message in consumer:
    
    # Getting the coordinates as a string
    coordinate = ",".join(message.value.split(',')[0:2])
    
    # Populating the dictionary
    if coordinate in count_dict:
        count_dict[coordinate][0] += 1
    else:
        sensor_name = sensor_dict[int(message.value.split(',')[2])]
        count_dict[coordinate] = [1, sensor_name]
    
    # Incrementing the overall counter variable
    i += 1
    
    # Checking whether 20 new messages have arrived
    if i % 20 == 0:
        
        # Clearing the earlier map display to refresh
        clear_output()
      
        # Looping through the dictionary count_dict
        for coordinates_key, coordinates_val in count_dict.items():
            
            # Getting the lats and longs from the key
            lat = coordinates_key.split(',')[0]
            lng = coordinates_key.split(',')[1]
            
            # Creating the tooltip
            tooltip_text = "Sensor Name: {}<br> Latitude: {}<br> Longitude: {}<br> Count(>2000): {}"
            tooltip_text = tooltip_text.format(
                      coordinates_val[1],
                      lat,
                      lng,
                      coordinates_val[0]
                      )
            
            # Creating the folium circle marker and adding it to the map
            folium.Circle(location=[lat,lng], radius = 2 * coordinates_val[0],
                          fill = True, color = 'red', fill_color = 'green',
                          tooltip = tooltip_text).add_to(melb_map)
        
        # Displaying the map
        display(melb_map)
        
        # Creating the delay of 5 seconds
        sleep(5)
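
One detail of the loop above is that every refresh adds circles to the same `melb_map` object, so markers from earlier refreshes remain on the map underneath the new ones. A possible alternative, sketched here under that observation, is to derive the marker data from `count_dict` on each refresh and draw it on a freshly created map. `marker_specs` and `radius_per_record` are hypothetical names, and the sample dictionary is made up.

```python
def marker_specs(count_dict, radius_per_record=2):
    """Return one (lat, lng, radius, sensor_name) tuple per coordinate key."""
    specs = []
    for key, (count, sensor_name) in count_dict.items():
        lat_text, lng_text = key.split(',')
        # Convert the coordinate strings to numbers and scale the radius by the count
        specs.append((float(lat_text), float(lng_text),
                      radius_per_record * count, sensor_name))
    return specs

# Made-up example in the same shape as count_dict.
example_counts = {"-37.81,144.96": [3, "Sensor A"]}
specs = marker_specs(example_counts)

print(specs)
```

Each tuple could then be passed to `folium.Circle(location=[lat, lng], radius=radius, ...)` on a map created inside the refresh branch, so that only the current counts are drawn.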

## Conclusion <a class="anchor" name="con"></a>

From the above visualization we can conclude that the sensors on the Flinders railway line have the maximum count of records where the pedestrian count has been predicted to be greater than 2000. Therefore, it is recommended that the performers perform there.