Now that our data is in a Kafka topic (remember, we ditched Faust, so there is only one topic), we will consume it in the simplest Pythonic way: with `kafka-python`.

The rest of this notebook just shows how a live map can be made with Python and the GeoPandas library. The final version will use Superset, which offers very similar charts with much less effort, but I wanted to build one myself. Feel free to skip this notebook.

GeoPandas needs a .shp file to know the geometry of states, countries, counties, etc. It also reads the files that sit alongside it, so don't copy just the .shp file and leave the rest behind.
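As a quick sanity check before loading, the sketch below lists which companion files are missing next to a .shp file. The helper name `missing_sidecars` and the suffix list are assumptions made for illustration (the usual shapefile companions are the .shx index, the .dbf attribute table, and the .prj projection file); it is not part of GeoPandas.

```python
from pathlib import Path

# Companion files that normally sit next to a .shp file:
# .shx (shape index), .dbf (attribute table), .prj (projection).
SIDECAR_SUFFIXES = (".shx", ".dbf", ".prj")


def missing_sidecars(shp_path):
    """Return the names of companion files that are absent next to shp_path."""
    shp = Path(shp_path)
    return [
        shp.with_suffix(suffix).name
        for suffix in SIDECAR_SUFFIXES
        if not shp.with_suffix(suffix).exists()
    ]
```

Calling `missing_sidecars(...)` on the path passed to `gpd.read_file` should return an empty list if the whole download was copied over.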

As per the GeoPandas [documentation](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.html), `plot` generates a static matplotlib plot, which is not what we want. We want an interactive map, so we use the `explore` method instead, which draws on folium maps.

What is the join operation for? It just adds a column to the dataframe to hold the count. Couldn't we add the column directly? Yes, we could; I had recently become more familiar with SQL, so a join was the first thing that came to mind.
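To make the comparison concrete, here is a minimal sketch of both routes using plain pandas. The three-row `states` frame is a made-up stand-in for the shapefile data, and the variable names are only illustrative.

```python
import pandas as pd

# Stand-in for the state table; the notebook loads the real one from the shapefile.
states = pd.DataFrame({"abrv": ["CA", "NY", "TX"]})

# Route 1: left join against an empty frame that has the key and a Count column,
# then replace the resulting NaN values with 0.
empty = states.iloc[:0].assign(Count=0.0)
joined = states.merge(empty, on="abrv", how="left").fillna(0)

# Route 2: add the column directly.
direct = states.assign(Count=0)

# Either way there is one row per state with a zero-filled Count column.
print(joined)
print(direct)
```

The direct assignment is shorter; the join only pays off when the right-hand frame already carries real values to attach.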

We also set `record = msg[6]` because Kafka returns metadata alongside the message, such as the timestamp, the key if available, the partition, and other fields that you can print to inspect. Index 6 holds the message value itself.
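The message object behaves like a named tuple, so the value can also be read by name instead of by position. The sketch below uses a hypothetical `Record` stand-in rather than the real `kafka-python` class, with the field order assumed from the version used in this notebook (where index 6 is the value); the sample payload is made up.

```python
from collections import namedtuple

# Hypothetical stand-in for the leading fields of kafka-python's ConsumerRecord.
# The field order is an assumption; check it against your installed version.
Record = namedtuple(
    "Record",
    ["topic", "partition", "offset", "timestamp", "timestamp_type", "key", "value"],
)

msg = Record(
    topic="quickstart-events",
    partition=0,
    offset=42,
    timestamp=1700000000000,
    timestamp_type=0,
    key=None,
    value={"Event": "Start", "State": "CA"},
)

by_index = msg[6]    # positional access, as in this notebook
by_name = msg.value  # named access, independent of the field's position
```

Named access keeps working even if the tuple gains or reorders fields, which positional access does not.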

The .shp files for GeoPandas can be downloaded from [here](https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html).

In [1]:
from kafka import KafkaConsumer
import json

import pandas as pd
import geopandas as gpd
from config import SOURCE_TOPIC, SERVER_PORT, RUNTIME, DATADIR
import time
import folium
from IPython.display import display, clear_output

REFRESH_RATE = RUNTIME / 360
%matplotlib inline

# Load the state boundaries and join in an empty Count column keyed on the state abbreviation
geo_df = gpd.read_file(DATADIR + 'cb_2021_us_all_500k/cb_2021_us_state_500k/cb_2021_us_state_500k.shp')
geo_df.rename(columns = {'STUSPS': 'abrv'}, inplace = True)
stream_df = pd.DataFrame()
stream_df['abrv'] = ''
stream_df['Count'] = ''
stream_df = geo_df.merge(stream_df, on = 'abrv', how = 'left')

consumer = KafkaConsumer('quickstart-events', bootstrap_servers=[SERVER_PORT],
                         value_deserializer = lambda x: json.loads(x.decode('utf-8')),
                         auto_offset_reset="earliest")

stream_df = stream_df.fillna(0)
start_time = time.time()

try:
    for msg in consumer:
        record = msg[6]
        if time.time() - start_time >= REFRESH_RATE:
            start_time = time.time()
            plot = folium.Map(location=[37, -95], zoom_start = 4, zoom_control = False,
                              scrollWheelZoom = False, dragging = False)
            stream_df.explore(column = 'Count', cmap = 'OrRd', m = plot,
                              tooltip = ['Count','abrv', 'NAME'], legend = False)

            display(plot) # Jupyter waits for the code to finish, force to display now
            clear_output(wait = True) # Wait for the following input before clearing

        if record['Event'] == 'Start':
            stream_df.loc[stream_df.abrv == record['State'], 'Count'] += 1
except KeyboardInterrupt:
    pass