# Summary
This notebook demonstrates very basic usage of the Python data-loading modules. The datasets used include the Ookla tile data, 
Statistics Canada hexagons, and the pseudo-household population distribution. Together they provide geographic information on 
internet speed test results, population data, and nominal internet access based on the National Broadband Map. 

In [None]:
import sys
sys.path.append("..")
# Useful for testing/debugging in Jupyter
# %load_ext autoreload
# %autoreload 1
# %aimport src.datasets.loading.statcan 

import pandas as pd
import matplotlib.pyplot as plt 
from src.datasets.loading import statcan
from src.datasets.loading import ookla

## Load Datasets
Load the data and display samples of their content.

In [None]:
# These calculations take about 5 minutes
hex_data = statcan.hexagon_geometry().merge(statcan.hexagons_phh(), how='right', on="HEXuid_HEXidu")
tiles = ookla.canada_speed_tiles()
hex_data = hex_data.to_crs(tiles.crs)

In [None]:
tiles.head(2)

In [None]:
hex_data.head(2)

## Compute Spatial Joins 
Merge data based on its location. For this data, match the smaller Ookla tiles to the 
federal government hexagon areas they intersect.
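
To make the join semantics concrete, here is a minimal sketch on toy data. It assumes geopandas 0.10 or later (where `GeoDataFrame.sjoin` is available) and uses squares in place of hexagons; the names `areas`, `small_tiles`, `area_id`, and `tile_id` are hypothetical and not part of the loading modules.

```python
import geopandas as gpd
from shapely.geometry import box

# Two large areas standing in for hexagons (squares here, for simplicity).
areas = gpd.GeoDataFrame(
    {"area_id": ["A", "B"]},
    geometry=[box(0, 0, 2, 2), box(10, 10, 12, 12)],
    crs="EPSG:4326",
)

# Three small tiles: two overlap area A, one overlaps neither area.
small_tiles = gpd.GeoDataFrame(
    {"tile_id": [1, 2, 3], "avg_d_kbps": [40_000, 60_000, 90_000]},
    geometry=[
        box(0.1, 0.1, 0.5, 0.5),
        box(1.0, 1.0, 1.5, 1.5),
        box(5.0, 5.0, 5.5, 5.5),
    ],
    crs="EPSG:4326",
)

# With default arguments, sjoin performs an inner join on the "intersects"
# predicate: one output row per (area, tile) pair whose geometries intersect.
# Areas and tiles without a match are dropped.
joined = areas.sjoin(small_tiles)
print(joined[["area_id", "tile_id", "avg_d_kbps"]])
```

Because the default predicate is `intersects`, a tile that straddles a boundary is matched to every area it touches, so it can appear in more than one row.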

In [None]:
spatial_join = hex_data.sjoin(tiles) # calc takes about 3 minutes

In [None]:
spatial_join.head(2)

## Aggregate Join Info
The spatial join above produces one row for each matching hexagon and tile pair. Next, 
aggregate those rows to calculate statistics for each hexagon area.
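
The aggregation pattern can be sketched on a small made-up table first. The column names mirror the ones used in this notebook, but the values and the `toy` and `summary` names are illustrative assumptions only.

```python
import pandas as pd

# Toy stand-in for the spatial join output: one row per (hexagon, tile) pair.
toy = pd.DataFrame(
    {
        "HEXuid_HEXidu": ["h1", "h1", "h2"],
        "avg_d_kbps": [40_000, 60_000, 90_000],
        "tests": [10, 30, 5],
    }
)

grouped = toy.groupby("HEXuid_HEXidu")

# Each aggregation returns a Series indexed by hexagon ID; concatenating
# along axis=1 lines them up as columns of a single table.
summary = pd.concat(
    [
        grouped["avg_d_kbps"].mean(),  # unweighted mean of the tile averages
        grouped["tests"].sum(),        # total number of tests across tiles
    ],
    axis=1,
)
print(summary)
```

Note that `.mean()` treats every tile equally regardless of how many tests it contains. Weighting by `tests` would be a different design choice and is not what the cell below does.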

In [None]:
# Keep only 2022 fixed-connection rows, then group them by hexagon ID
grps = spatial_join.loc[lambda s: (s.year == 2022) & (s.conn_type == 'fixed')].groupby('HEXuid_HEXidu')
hex_aggs = pd.concat([
    grps['avg_d_kbps'].mean(),
    grps['avg_u_kbps'].mean(),
    grps['avg_lat_ms'].mean(),
    grps['tests'].sum(),
    grps['devices'].sum(),
],axis=1)

In [None]:
hex_aggs.head(5)

In [None]:
hex_data_w_speeds = hex_data.merge(hex_aggs, left_on='HEXuid_HEXidu',right_index=True)

## Visualizations
Plot the hexagons across Canada, and compare each hexagon's population with its average download speed against a 50 Mbps reference line.
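
The speed columns are in kbps, so a 50 Mbps threshold corresponds to `50e3`. As a rough sketch on made-up values (the `THRESHOLD_KBPS` constant and `below_50mbps` column are hypothetical names, not part of the notebook), the same threshold can also be used to flag rows:

```python
import pandas as pd

THRESHOLD_KBPS = 50 * 1000  # 50 Mbps expressed in kbps

# Made-up rows using the notebook's column names.
toy = pd.DataFrame(
    {
        "Pop2016": [100, 2000, 50],
        "avg_d_kbps": [12_000, 75_000, 50_000],
    }
)

# Strictly below the threshold; a row at exactly 50 Mbps is not flagged.
toy["below_50mbps"] = toy["avg_d_kbps"] < THRESHOLD_KBPS
print(toy)
```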

In [None]:
ax = hex_data_w_speeds.plot(column='avg_d_kbps', legend=True, vmin=0, vmax=100e3, figsize=(14,10))
ax.set(xlabel="Degrees Longitude", ylabel="Degrees Latitude")
ax.set_title("Average Download Speed (kbps)")
statcan.boundary('provinces').to_crs(hex_data_w_speeds.crs).boundary.plot(ax=ax);

In [None]:
hex_data_w_speeds.plot.scatter(x='Pop2016',y='avg_d_kbps')
ax = plt.gca()
ax.axhline(50e3,color='k', zorder=100);