# 🚦Visualization Challenges of the Metr-LA Traffic Dataset with Pydeck🎊


### 📃Introduction to the Metr-LA dataset

Metr-LA, taken from the [DL-Traff-Graph repository](https://github.com/deepkashiwa20/DL-Traff-Graph), is a dataset for the <span style='color: brown;'><b>Traffic Flow Forecasting Task</b></span> in Los Angeles. It contains <span style='color: brown;'><b>207 nodes</b></span> (i.e. sensors placed on different sections of the freeway), which recorded traffic speeds <span style='color: brown;'><b>at five-minute intervals</b></span> over about four consecutive months (March to June 2012). The dataset also records the graph structure between the nodes.

### 🚀Goal
This notebook uses <span style='color: brown;'><b>Pydeck</b></span> for more advanced map visualization of the dataset, combining Pydeck's <span style='color: brown;'><b>TextLayer</b></span>, <span style='color: brown;'><b>HeatmapLayer</b></span>, <span style='color: brown;'><b>ScatterplotLayer</b></span> and <span style='color: brown;'><b>GridLayer</b></span> to show the <span style='color: brown;'><b>daily changes</b></span> of the 207 nodes, one HTML frame per day, so that the frames can later be assembled into an animation.

<blockquote>
📌Pydeck itself <u style="color: red; font-style: italic;">does not support animation</u>; it can only export <u style="color: red; font-style: italic;">HTML files</u>, so this notebook does not include a step that creates the animation!
</blockquote>

In [None]:
!pip install -q pydeck

In [None]:
import numpy as np
import pandas as pd
import os
from IPython.display import display
from sklearn.preprocessing import MinMaxScaler
from tqdm.notebook import tqdm

# Use Pydeck for Map Visualization
import pydeck as pdk

# Get MapBox Token
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
TOKEN = user_secrets.get_secret("MAPBOX_TOKEN")

# 1. Read Raw Data from Dataset

### ✏️<span style='color: Orange;'><b>Input files:</b></span>

<blockquote>
    
#### **metr-la.h5**
<blockquote>
It records the <u style="color: red; font-style: italic;">traffic speed</u> (called "flow" throughout this notebook) of 207 nodes every 5 minutes over a period of about four months.
</blockquote>
    
#### **adj_mx.pkl**
<blockquote>
It records the <u style="color: red; font-style: italic;">graph structure</u> information. Here we only need the <u style="color: red; font-style: italic;">index and name</u> information of the 207 sensors.
</blockquote>

#### **graph_sensor_locations.csv**
<blockquote>
It records the <u style="color: red; font-style: italic;">latitude and longitude</u> information of 207 sensors.
</blockquote>
    </blockquote>

In [None]:
# Define data path
FLOWPATH = '/kaggle/input/metr-la-complete/metr-la.h5'
adj_mat_path = "/kaggle/input/metr-la-complete/adj_mx.pkl"
graph_sensor_loc_path = "/kaggle/input/metr-la-complete/graph_sensor_locations.csv"

########### Read RAW Data ########### 
traff_flow_data = pd.read_hdf(FLOWPATH)
sensor_loc_latlong = pd.read_csv(graph_sensor_loc_path)
# Load the pickle once; its first two entries are the sensor names and the name-to-index dict
sensor_ref, sensor_ref_id_dict = pd.read_pickle(adj_mat_path)[:2]

display(traff_flow_data.head(2))
r, c = traff_flow_data.shape
print(f"Pivot DataFrame Shape: {traff_flow_data.shape}\n")

## 1.1 Convert pivot dataframe to melted dataframe

The current <span style='color: brown;'><b>traff_flow_data</b></span> (dataframe) contains only the sensor name, time and traffic flow data. Because it is in <span style='color: brown;'><b>pivot</b></span> (wide) form, it is <span style='color: brown;'><b>not suitable</b></span> for data visualisation. We need to convert it with the <span style='color: brown;'><b>melt function</b></span> and add other useful information: latitude, longitude and sensor index.
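
As a minimal sketch of the wide-to-long conversion described above, the toy example below melts a two-sensor table. The sensor names and speed values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy wide ("pivot") table: one row per timestamp, one column per sensor.
# The sensor names and values are illustrative only.
wide = pd.DataFrame(
    {"773869": [64.4, 62.7], "767541": [67.6, 68.6]},
    index=pd.to_datetime(["2012-03-01 00:00", "2012-03-01 00:05"]),
)

# Move the timestamp index into a regular column, then melt so that
# every (timestamp, sensor) pair becomes its own row.
long = (
    wide.rename_axis("date")
    .reset_index()
    .melt(id_vars="date", var_name="sensor_id", value_name="flow")
)
long["sensor_id"] = long["sensor_id"].astype(int)

print(long)
```

A 2 x 2 wide table becomes a 4-row long table, which is why the melted frame in this notebook is expected to have rows × columns entries.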

In [None]:
# Pivot df to Melted df
traff_flow_data['date'] = pd.to_datetime(traff_flow_data.index)
df_melted = pd.melt(traff_flow_data.reset_index(drop=True), id_vars="date", var_name='sensor_id', value_name='flow')

# Convert Object Type to int Type
df_melted['sensor_id'] = df_melted['sensor_id'].astype(int)

# Outer join to attach the latitude, longitude and sensor index
new_data = pd.merge(df_melted, sensor_loc_latlong, on='sensor_id', how='outer')
new_data.rename(columns={'index': 'sensor_index'}, inplace=True)
new_data = new_data.sort_values(['sensor_index','date'], ignore_index=True)

# Drop the time of day: normalize() sets every timestamp to midnight of its date
new_data['day'] = new_data['date'].dt.normalize()

display(new_data.head(2))
print(f"Melted-Joined DataFrame Shape: {new_data.shape}, should be equal to {r}*{c}({r*c})")

## 1.2 Further Process Dataframe

In the previous step we have successfully obtained a dataframe containing date, sensor_id, flow, sensor_index, latitude, longitude and day.

<span style='color: brown;'><b>We need to process the dataset further.</b></span>

I want to visualise the dataset day by day, so I need to take the <span style='color: brown;'><b>average flow</b></span> for each sensor on each day to make sure that there is only one flow value for each sensor during the day.
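
A minimal sketch of this per-sensor, per-day averaging on toy data follows. The readings are invented for illustration; only the column names follow the notebook.

```python
import pandas as pd

# Toy readings: two sensors, two days, values invented for illustration.
readings = pd.DataFrame({
    "sensor_id": [1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime([
        "2012-03-01 00:00", "2012-03-01 00:05", "2012-03-02 00:00",
        "2012-03-01 00:00", "2012-03-01 00:05", "2012-03-02 00:00",
    ]),
    "flow": [60.0, 50.0, 30.0, 40.0, 20.0, 10.0],
})

# Truncate each timestamp to its calendar day, then average per sensor and day.
readings["day"] = readings["date"].dt.normalize()
daily = readings.groupby(["sensor_id", "day"], as_index=False)["flow"].mean()

print(daily)
```

Each (sensor, day) pair ends up with exactly one row, which is the property the later layers rely on.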

Once we have the average flow, we need to <span style='color: brown;'><b>classify this flow value</b></span>. The values are speeds, so the higher the value, the smoother the road. I defined <span style='color: brown;'><b>3 different road LEVELS</b></span>, matching the thresholds used in the code:

<blockquote>
<span style='color: ORANGE;'><b>LIGHT</b></span>: 55 mph or more, plus the = 0 cases (most likely days with no valid readings);

<span style='color: ORANGE;'><b>MEDIUM</b></span>: from 35 mph up to, but not including, 55 mph;

<span style='color: ORANGE;'><b>HEAVY</b></span>: above 0 and below 35 mph;
</blockquote>
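
As a hedged, vectorized alternative to a row-by-row function, the thresholds that the notebook's `choose_level` applies can be sketched with `np.select`. The sample speeds are made up and chosen to sit on the boundaries.

```python
import numpy as np
import pandas as pd

# Illustrative daily mean speeds in mph, including the boundary values.
mean_flow = pd.Series([0.0, 12.5, 34.9, 35.0, 54.9, 55.0, 68.2])

# Conditions are checked in order; anything that matches none of them
# (here: a mean of exactly 0) falls back to the default, "light".
conditions = [
    mean_flow >= 55,
    (mean_flow >= 35) & (mean_flow < 55),
    (mean_flow > 0) & (mean_flow < 35),
]
level = np.select(conditions, ["light", "medium", "heavy"], default="light")

print(list(zip(mean_flow, level)))
```

Here "light" means light traffic (high speed) and "heavy" means heavy traffic (low speed).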

In [None]:
# Use the groupby method to group by sensor_id and day, calculate the average of the flow, 
# keeping the index, latitude and longitude values.
mean_flow_df = new_data.groupby(['sensor_id', 'day']).agg({
    'flow': 'mean',  
    'sensor_index': 'first',  # Take the first value
    'latitude': 'first',  
    'longitude': 'first',
}).reset_index()

# Rename columns to reflect average flow, then sort by day and sensor_index.
mean_flow_df.rename(columns={'flow': 'mean_flow'}, inplace=True)
mean_flow_df = mean_flow_df.sort_values(['day','sensor_index'], ignore_index=True)

# Calculate the weight by mean flow, the higher the speed of the vehicle and the smoother the road is, 
# the smaller the weight is.
scaler = MinMaxScaler()
mean_flow_df['normalized'] = scaler.fit_transform(mean_flow_df[['mean_flow']])
mean_flow_df['inverse_normalized'] = 1 - mean_flow_df['normalized']

levels = ["light", "medium", "heavy"]
def choose_level(x):
    if x >= 55: return levels[0]
    elif x>=35 and x <55: return levels[1]
    elif x>0 and x <35: return levels[2]
    else: return levels[0]
mean_flow_df['level'] = mean_flow_df['mean_flow'].apply(lambda x: choose_level(x))

# Define the radius of the scatter according to the level.
def get_radius(x):
    if x == "light": return 20
    elif x == 'medium': return 40
    else: return 80
mean_flow_df['radius'] = mean_flow_df['level'].apply(lambda x: get_radius(x))

display(mean_flow_df.head(2))
print(f"Mean Flow DataFrame Shape: {mean_flow_df.shape}\n")

The dataframe we have processed so far can be used in the <span style='color: brown;'><b>HeatmapLayer and ScatterplotLayer</b></span>!
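
The weight used by these layers is an inverted min-max scaling of the mean flow. The sketch below reproduces the arithmetic with plain NumPy on three invented values, assuming the scaler's default output range of 0 to 1.

```python
import numpy as np

# Three illustrative daily mean speeds in mph.
mean_flow = np.array([20.0, 40.0, 60.0])

# Min-max scaling maps the slowest value to 0 and the fastest to 1.
normalized = (mean_flow - mean_flow.min()) / (mean_flow.max() - mean_flow.min())

# Inverting it gives the slowest (most congested) sensor the largest weight.
inverse_normalized = 1.0 - normalized

print(normalized, inverse_normalized)
```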

Next we need to process the dataframe so that it can be used in the <span style='color: brown;'><b>GridLayer</b></span>.  

🥸<u style="color: red; font-style: italic;">In fact, GridLayer as used here does not take the average flow as the height of a grid cell!</u>

In simple terms, GridLayer uses <span style='color: brown;'><b>counting</b></span>: the number of points that fall into a cell determines how high that cell is. So, to make the height of a cell reflect a sensor's value, we need to <span style='color: brown;'><b>insert repeated rows, each with a value of 1</b></span>. The code below repeats each row in proportion to its inverse-normalized flow, so slower (more congested) sensors produce taller cells.

<blockquote>
    <u style="color: green;">For example, if Sensor1 has an inverse_normalized value of 0.355, I would insert 35 rows (int(100 × 0.355)), each with a value of 1, so that the GridLayer count reflects that weight.</u>

</blockquote>
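
To see why repeating rows raises a cell, here is a simplified sketch of count-based binning. It is a conceptual illustration only, not deck.gl's actual implementation: the coordinates are hypothetical, and the cell size is given in degrees purely for convenience.

```python
import numpy as np

# Hypothetical points as (longitude, latitude). The first location appears
# three times (as if its row had been repeated), the second only once.
points = np.array([
    [-118.318, 34.155],
    [-118.318, 34.155],
    [-118.318, 34.155],
    [-118.250, 34.060],
])
cell_size_deg = 0.01  # assumed cell size, for illustration only

# Assign every point to a grid cell, then count the points per cell.
cells = np.floor(points / cell_size_deg).astype(int)
unique_cells, counts = np.unique(cells, axis=0, return_counts=True)

for cell, count in zip(unique_cells, counts):
    print(cell, count)
```

The cell that contains the repeated location gets a count of 3 and the other a count of 1, so the repeated sensor would be drawn three times as tall.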

In [None]:
# Process dataframe for Pydeck Grid Layer:
# repeat each row int(100 * inverse_normalized) times so that GridLayer's
# point count per cell reflects the congestion weight.
repeat_counts = (100 * mean_flow_df['inverse_normalized']).astype(int)
grid_df = (
    mean_flow_df.loc[mean_flow_df.index.repeat(repeat_counts)]
    .assign(value=1)
    .reset_index(drop=True)
)

# test
grid_df[(grid_df.latitude == 34.15497) & (grid_df.longitude == -118.31829)]

# 2. Pydeck Viz

So far we haven't defined a dataframe for <span style='color: brown;'><b>TextLayer</b></span>, which will be defined in the <span style='color: brown;'><b>plot_func</b></span> here to display time information at the specified latitude and longitude.

In [None]:
def plot_func(df_day, grid_day, angle, index, output_directory, mode="SUM", mode_plot=False):
    
    # create the day label (date + weekday) for the text layer
    day = df_day.day.iloc[0]
    str_day = f"{day:%Y-%m-%d} {day.day_name()}"
    text_data = [{"position": [-118.47605, 34.09478], "text": str_day}]
    
    # 3 levels for heatmap layer and scatter layer
    light_df = df_day[df_day.level == "light"]
    medium_df = df_day[df_day.level == "medium"]
    heavy_df = df_day[df_day.level == "heavy"]
    
    # 3 levels for grid layer (one subset per level)
    light_grid = grid_day[grid_day.level == "light"]
    medium_grid = grid_day[grid_day.level == "medium"]
    heavy_grid = grid_day[grid_day.level == "heavy"]
    
    # Define lighting effect
    ambient_light = {"@@type": "AmbientLight", "color": [255, 255, 255], "intensity": 1.0}
    pointLight1 = {"@@type": "PointLight", "color": [82, 194, 230],"intensity": 0.8, "position": [-0.144528, 49.739968, 80000]} #blue
    pointLight2 = {"@@type": "PointLight", "color": [255, 255, 255], "intensity": 0.8, "position": [-3.807751, 54.104682, 8000]} #white
    lighting_effect = {
        "@@type": "LightingEffect",
        "shadowColor": [0, 0, 0, 0.5],
        "ambientLight": ambient_light,
        "pointLight1": pointLight1,
        "pointLight2": pointLight2,
    }
    
    # Define material
    material = {
      "ambient": 0.64,
      "diffuse": 0.6,
      "shininess": 32,
      "specularColor": [51, 51, 51]
    }
    
    # Define color scale
    COLOR_BREWER_LIGHT_GREEN_SCALE = [
        [247, 252, 245],  # light green ++
        [199, 233, 192],  # light green +
        [161, 217, 155],  # medium green ++
        [116, 196, 118],  # medium green +
    ]

    COLOR_BREWER_LIGHT_YELLOW_SCALE = [
        [255, 255, 229],  # light yellow +++
        [255, 247, 188],  # light yellow ++
        [254, 227, 145],  # light yellow +
        [254, 217, 118],  # medium yellow +
        [254, 178, 76],   # medium yellow ++
        [253, 141, 60],   # medium yellow +++
    ]

    COLOR_BREWER_GRAY_SCALE = [
        [247, 247, 247,150],  # light gray ++
        [204, 204, 204,150],  # light gray +
    ]
    
    MODE = mode  # "SUM" or "MEAN"
    
    ###########################################################
    ### Text Layer ########################################
    ###########################################################
    text_layer = pdk.Layer(
        "TextLayer",
        text_data,
        get_position='position',
        get_text='text',
        get_size=20,
        get_color=[161, 217, 155, 255], #green
        get_angle=0,
        get_text_anchor="'middle'",
        get_alignment_baseline="'top'"
    )


    ###########################################################
    ### Heat Map Layer ########################################
    ###########################################################
    light = pdk.Layer(
        "HeatmapLayer",
        data=light_df,
        opacity=1,
        get_position=["longitude", "latitude"],
        aggregation=pdk.types.String(MODE),
        color_range=COLOR_BREWER_LIGHT_GREEN_SCALE,
        threshold=1,
        get_weight="inverse_normalized",
        pickable=True,
    )

    medium = pdk.Layer(
        "HeatmapLayer",
        data=medium_df,
        opacity=1,
        get_position=["longitude", "latitude"],
        aggregation=pdk.types.String(MODE),
        color_range=COLOR_BREWER_LIGHT_YELLOW_SCALE,
        threshold=1,
        get_weight="inverse_normalized",
        pickable=True,
    )

    heavy = pdk.Layer(
        "HeatmapLayer",
        data=heavy_df,
        opacity=1,
        get_position=["longitude", "latitude"],
        threshold=1,
        aggregation=pdk.types.String(MODE),
        color_range=[[255, 0, 0]],
        get_weight="radius",
        pickable=True,
    )

    ###########################################################
    ### Scatter Plot Layer ####################################
    ###########################################################
    light_scatter = pdk.Layer(
                    "ScatterplotLayer",
                    light_df,
                    pickable=True,
                    opacity=0.5,
                    stroked=False,
                    filled=True,
                    radius_scale=6,
                    radius_min_pixels=1,
                    radius_max_pixels=100,
                    line_width_min_pixels=1,
                    get_position=["longitude", "latitude"],
                    get_radius="radius",
                    billboard = False, # True: circles face the camera; False: circles lie flat on the map
                    get_fill_color=[199, 233, 192], # light green
                    get_line_color=[199, 233, 192],
                )

    medium_scatter = pdk.Layer(
                    "ScatterplotLayer",
                    medium_df,
                    pickable=True,
                    opacity=0.5,
                    stroked=False,
                    filled=True,
                    radius_scale=6,
                    radius_min_pixels=1,
                    radius_max_pixels=100,
                    line_width_min_pixels=1,
                    get_position=["longitude", "latitude"],
                    get_radius="radius",
                    billboard = False,
                    get_fill_color=[254, 217, 118], # medium yellow
                    get_line_color=[254, 217, 118],
                )

    heavy_scatter = pdk.Layer(
                    "ScatterplotLayer",
                    heavy_df,
                    pickable=True,
                    opacity=0.5,
                    stroked=False,
                    filled=True,
                    radius_scale=6,
                    radius_min_pixels=1,
                    radius_max_pixels=100,
                    line_width_min_pixels=1,
                    billboard = False,
                    get_position=["longitude", "latitude"],
                    get_radius="radius",
                    get_fill_color=[255, 0, 0], # red
                    get_line_color=[255, 0, 0], # red
                )
    
    ###########################################################
    ### Grid Layer ####################################
    ###########################################################
#     light_grid_layer = pdk.Layer(
#                 "GridLayer",
#                 light_grid,
#                 pickable=True,
#                 extruded=True,
#                 cell_size=100,
#                 elevation_scale=2,
#                 elevationAggregation="MAX",
#                 get_position=["longitude", "latitude"],
#                 color_range=[[199, 233, 192]],
#     )
    medium_grid_layer = pdk.Layer(
                "GridLayer",
                medium_grid,
                pickable=True,
                extruded=True,
                cell_size=150,
                elevation_scale=4,
                elevationAggregation="MAX",
                get_position=["longitude", "latitude"],
                color_range=[[254, 217, 118]],
    )
    heavy_grid_layer = pdk.Layer(
                "GridLayer",
                heavy_grid,
                pickable=True,
                extruded=True,
                cell_size=150,
                elevation_scale=4,
                elevationAggregation="MAX",
                get_position=["longitude", "latitude"],
    #             color_range=[[255, 0, 0]],
    )

    # Set the viewport location
    view_state = pdk.ViewState(
        longitude=np.mean(mean_flow_df.longitude) - 0.035,
        latitude=np.mean(mean_flow_df.latitude) - 0.015,
        zoom=10.5,
        pitch=55,       # camera tilt in degrees (0 looks straight down)
        bearing=angle,  # map rotation in degrees
    )

    r = pdk.Deck(
        height=600, width='60%',
        layers=[text_layer,light,medium,heavy,medium_grid_layer,heavy_grid_layer,light_scatter,medium_scatter,heavy_scatter],
        initial_view_state=view_state,
    #     effects=[lighting_effect],
        map_style="dark",  # Mapbox style, need token
        api_keys ={"mapbox":TOKEN},
        map_provider="mapbox",
    )
    
    save_path = os.path.join(output_directory, f'dataset_viz_day{index}.html')
    r.to_html(save_path)
    
    if mode_plot:
        return r

## Iterate on different days 

In this step we will visualise each day in the dataset and take a different viewing angle, then get the HTML file.

⚡🚀 <u style="color: red; font-style: italic;">You can skip this step and go down to the next cell to see the pydeck effect directly!</u>
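
The changing viewing angle can be sketched as a lookup from the day index to a bearing. The helper name `bearing_for_frame` is hypothetical (it does not appear in the notebook), and the wrap-around via modulo is an assumption added so that the lookup stays valid for any number of days.

```python
def bearing_for_frame(frame_index, start=60, stop=-60, step=-1):
    """Return the map bearing (degrees) for a given frame index.

    The sweep runs from `start` towards `stop` (exclusive) and wraps
    around when there are more frames than angles.
    """
    angles = list(range(start, stop, step))
    return angles[frame_index % len(angles)]


# The first frame starts at 60 degrees and each later frame rotates by one degree.
print([bearing_for_frame(i) for i in range(5)])
```

With the default arguments the sweep holds 120 angles, so frame 120 starts again at 60 degrees.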

In [None]:
output_directory = "/kaggle/working/html_outputs"

if not os.path.exists(output_directory):
    os.makedirs(output_directory)
    print(f"Directory created: {output_directory}")
else:
    print(f"Directory already exists: {output_directory}")
    
for i, d in enumerate(tqdm(np.unique(mean_flow_df.day.values))):
#     print(f"Processing day {d}. \n")
    
    df_day = mean_flow_df[mean_flow_df.day == d]
    grid_day = grid_df[grid_df.day == d]
    
    mode = "SUM"
    
    #     angles = [i for i in range(0,361,15)]
    #     if i//25 ==0:
    #         angle_bear = angles[i]
    #     else:
    #         newi = i-25*(i//25)
    #         angle_bear = angles[newi]

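    # Sweep the bearing from 60 down to -59 degrees, one degree per day;
    # the modulo keeps the index valid if there are more days than angles.
    angles = list(range(60, -60, -1))
    plot_func(df_day=df_day, grid_day=grid_day, angle=angles[i % len(angles)], 
              index=i, output_directory=output_directory, mode=mode, mode_plot=False)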
    

In [None]:
!zip -r output_htmls.zip /kaggle/working/html_outputs/

## Show an example

In [None]:
d = mean_flow_df.day[0]
df_day = mean_flow_df[mean_flow_df.day == d]
grid_day = grid_df[grid_df.day == d]
mode = "SUM"
angle_bear = 0
output_directory = "/kaggle/working/"
r = plot_func(df_day, grid_day, angle_bear, 0, output_directory, mode, mode_plot=True)
r