# About this practical session
## Importation risk

If a disease is detected in county A, which counties are at highest risk of importing a case from county A? 
The risk of importation from county A to county B can be defined as the probability of traveling from A to B, conditional on traveling.  
In other words, assume that an infected person is about to travel out of county A. Where will they go? The importation risk to county B is the probability that they go to B.  

We can turn this definition of risk into a formula:   
$$C_{ab} = \frac{W_{ab}}{W_a}$$

where $W_{ab}$ is the population flow from county $a$ to county $b$, and $W_a = \sum_{b \neq a} W_{ab}$ is the sum of $W_{ab}$ computed over all counties except the origin county $a$. Dividing by $W_a$ gives the probability of traveling from $a$ to $b$ conditional on traveling outside of $a$. Here, $C$ is the risk matrix: $C_{ab}$ is the probability that a case traveling out of county $a$ ends up in county $b$. As you can see, this formula is extremely simple and relies only on mobility. At no point did we need epidemiological data!
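
To make the formula concrete, here is a minimal sketch on a made-up flow matrix for three counties. The numbers and the names `W`, `W_out` and `W_a` are purely illustrative assumptions and are not taken from the dataset used below.

```python
import numpy as np

# Hypothetical flows between three counties (rows: origin a, columns: destination b).
# The diagonal holds within-county movement, which the definition excludes.
W = np.array([
    [50.0, 30.0, 10.0],
    [20.0, 80.0, 20.0],
    [ 5.0, 15.0, 40.0],
])

# Set the diagonal to zero so that only trips leaving the origin are counted
W_out = W.copy()
np.fill_diagonal(W_out, 0.0)

# W_a: total outgoing flow of each origin (sum over b != a)
W_a = W_out.sum(axis=1)

# C_ab = W_ab / W_a: each row of C is a probability distribution over destinations
C = W_out / W_a[:, None]

print(C[0])  # risks from county 0 to counties 0, 1 and 2
```

In this toy case the first row of `C` is 0, 0.75 and 0.25: of the 40 travelers leaving county 0, 30 go to county 1 and 10 go to county 2.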

# Import libraries

In [None]:
import pandas as pd
import numpy as np

from matplotlib import pyplot as plt
import matplotlib as mpl
import matplotlib.dates as mdates

import os
import geopandas as gpd
import statsmodels.api as sm

In [None]:
# If you encounter an ImportError, try installing the missing package with the following command:
# !pip install geopandas

In [None]:
# A function for formatting dates in plots
def dateFormat(ax):
    locator = mdates.AutoDateLocator(minticks=5, maxticks=10)
    formatter = mdates.ConciseDateFormatter(locator, show_offset=False)
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(formatter)

# Load geoPandas map

In [None]:
# Load the json file with county coordinates
geoData = gpd.read_file('https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/US-counties.geojson')
geoData = geoData.to_crs("ESRI:102003")
geoData = geoData.set_index('id')

hideStates = ['02', '69','44' ,'66' ,'15' ,'60' ,'78' ,'72']
geoData = geoData.query("STATE not in @hideStates")

# Read data
Due to time constraints and the large size of the data, we directly read the results of the previous code.

In [None]:
# Read county to county csv file
# (if executed on Google Colab, change the path to 'https://github.com/EPIcx-lab/ESPIDAM2024_Networks-and-Contact-Patterns-in-Infectious-Disease-Models/raw/main/mobilityflows/mobilityFlowsCounty.csv.xz')
 
c2c = pd.read_csv('./mobilityflows/mobilityFlowsCounty.csv.xz')
c2c['date'] = pd.to_datetime(c2c['date']) # transform column in datetime

In [None]:
# Ensure 'county_o' and 'county_d' are strings of 5 characters (adding leading zeros if necessary)
# Hint: similar to what was done before

c2c['county_o'] = c2c['county_o'].astype(str)
c2c['county_o'] = c2c['county_o'].apply(lambda a: a.zfill(5))

c2c['county_d'] = c2c['county_d'].astype(str)
c2c['county_d'] = c2c['county_d'].apply(lambda a: a.zfill(5))
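
The padding step can be checked on a few values. The sketch below assumes the county codes were parsed as integers when the CSV was read, which is why the leading zeros are lost. The `codes` series is hypothetical, and `.str.zfill` is used as a vectorized alternative to `apply` with a lambda.

```python
import pandas as pd

# Hypothetical county codes as they might look after being parsed as integers
codes = pd.Series([6037, 36061, 1001])

# Cast to string, then left-pad with zeros up to 5 characters
padded = codes.astype(str).str.zfill(5)

print(padded.tolist())  # → ['06037', '36061', '01001']
```

Codes that already have 5 characters are left unchanged, so the operation is safe to apply to the whole column.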

In [None]:
# Display 10 random rows from the dataset
c2c.sample(10)

In [None]:
# Filter for a certain date from the c2c DataFrame
...

In [None]:
# Compute the denominator W_a
# Hint: Use groupby to aggregate the total population flows
...
Wa = ...
Wa.name = 'Wa'

In [None]:
# Add the denominator W_a to the df dataframe
# Hint: each row needs the Wa of its county of origin
df = df.merge(...)

In [None]:
df.head(2)

In [None]:
# Now compute C_ab
df['C'] = ...

In [None]:
df.head(2)
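
A quick sanity check for this step: since $C_{ab}$ is a conditional probability, the risks of each origin should sum to 1. The sketch below illustrates this on a toy DataFrame that reuses the column names `county_o`, `county_d` and `pop_flows`. The values are made up, and `groupby(...).transform('sum')` is used here as one possible alternative to the groupby-then-merge route suggested in the hints.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the date-filtered DataFrame (values are invented)
toy = pd.DataFrame({
    'county_o': ['36061', '36061', '06037', '06037', '06037'],
    'county_d': ['06037', '17031', '36061', '17031', '12086'],
    'pop_flows': [30, 10, 20, 20, 40],
})

# Total outgoing flow per origin, broadcast back onto every row
toy['Wa'] = toy.groupby('county_o')['pop_flows'].transform('sum')

# Importation risk for each origin-destination pair
toy['C'] = toy['pop_flows'] / toy['Wa']

# The risks of each origin should add up to 1
riskSums = toy.groupby('county_o')['C'].sum()
print(riskSums)
```

If some origin does not sum to 1 on the real data, the denominator was probably computed over a different set of rows than the numerator.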

## Select a source county and plot the risk on a map

- **New York City, New York**: 36061 
- **Los Angeles, California**: 06037
- **Chicago, Illinois**: 17031
- **Miami, Florida**: 12086 
- **Las Vegas, Nevada**: 32003 
- **Washington, D.C.**: 11001
- **Napa Valley, California**: 06055
- **Yellowstone National Park, Wyoming/Montana/Idaho**: 56029 / 30067 / 16019
- **Jackson Hole, Wyoming**: 56039
- **Lake Placid, New York**: 36031
- **Amish Country, Pennsylvania**: 42071
- **Great Smoky Mountains National Park, Tennessee/North Carolina**: 47009 / 37087 
- **Bar Harbor, Maine**: 23009 

In [None]:
geoData.head(2)

In [None]:
# Create a new DataFrame for a selected source
# Hint: Filter the DataFrame 'df' for rows whose 'county_o' is the selected source
...

In [None]:
# Merge geoData with dfSource
# Hint: Use the merge function to combine geoData and dfSource.
geoDataC = geoData.merge(...)

In [None]:
geoDataC.head(2)
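
When merging, keep in mind that pandas performs an inner join by default, so counties that receive no flow from the source disappear from the result. The sketch below shows the difference with a left join. It uses plain DataFrames as hypothetical stand-ins (`toyGeo` for the map table indexed by county id, `toySource` for the risks of one source), not the real `geoData`.

```python
import pandas as pd

# Stand-in for the map table: indexed by county id (a plain DataFrame, not a GeoDataFrame)
toyGeo = pd.DataFrame(
    {'NAME': ['Los Angeles', 'Cook', 'Miami-Dade']},
    index=pd.Index(['06037', '17031', '12086'], name='id'),
)

# Stand-in for the risks from one source to the destinations it reaches
toySource = pd.DataFrame({'county_d': ['06037', '17031'], 'C': [0.75, 0.25]})

# Inner merge (the default): counties with no recorded flow are dropped
inner = toyGeo.merge(toySource, left_index=True, right_on='county_d')

# Left merge: every county is kept, and unreached ones get C = NaN
left = toyGeo.merge(toySource, left_index=True, right_on='county_d', how='left')

print(len(inner), len(left))  # → 2 3
```

With an inner merge, dropped counties are simply not colored on the map, which is why drawing the borders of the full map separately is useful.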

In [None]:
colorNorm = mpl.colors.LogNorm(vmin=geoDataC['C'].min(), vmax=geoDataC['C'].max())

In [None]:
fig, ax = plt.subplots(figsize=(12, 4), ncols=1, layout='constrained')

# Remove axis
ax.axis('off')

# Color each destination by its risk C (color limits come from colorNorm, so vmax is not passed)
geoDataC.plot(column='C', cmap='inferno_r', ax=ax, linewidth=0, legend=True, norm=colorNorm)

# Plot only the borders of the whole map
geoData.plot(facecolor='None', ax=ax, linewidth=0.1)

# Plot the source in green
geoData.loc[[source]].plot(facecolor='green', ax=ax, linewidth=0)

# EXTRA: Compare two dates

In [None]:
def addColumnC(df):
    # Compute the denominator W_a
    Wa = df.groupby('county_o')['pop_flows'].sum()
    Wa.name = 'Wa'
    
    # Add the denominator W_a to the df dataframe
    df = df.merge(Wa, left_on='county_o', right_index=True)

    # Now compute C_ab
    df['C'] = df['pop_flows']/df['Wa']
    return df

In [None]:
# Select a source county and two dates from the c2c DataFrame
source = '36061'

dfDate1 = c2c.query('date == "2020-03-02" and county_o == @source')
dfDate2 = c2c.query('date == "2020-03-28" and county_o == @source')

dfDate1 = addColumnC(dfDate1)
dfDate2 = addColumnC(dfDate2)

In [None]:
# Merge geoData with dfDate1 and dfDate2
geoDataC1 = geoData.merge(dfDate1, left_index=True, right_on='county_d')
geoDataC2 = geoData.merge(dfDate2, left_index=True, right_on='county_d')

In [None]:
fig, axs = plt.subplots(figsize=(12, 4), ncols=2, layout='constrained')
# Shared color scale covering the risk range of both dates
colorNorm = mpl.colors.LogNorm(vmin=min(geoDataC1['C'].min(), geoDataC2['C'].min()),
                               vmax=max(geoDataC1['C'].max(), geoDataC2['C'].max()))

# Plot the destination color based on risk C.
geoDataC1.plot(column='C', cmap='inferno_r', ax=axs[0], linewidth=0, legend=True, norm=colorNorm)
geoDataC2.plot(column='C', cmap='inferno_r', ax=axs[1], linewidth=0, legend=True, norm=colorNorm)

for ax in axs: 
    ax.axis('off')
    geoData.plot(facecolor='None', ax=ax, linewidth=0.1)
    geoData.loc[[source]].plot(facecolor='green', ax=ax, linewidth=0)

In [None]:
# why do they look similar?
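
To go beyond a visual impression, the two risk distributions can also be compared numerically. The following is a minimal sketch on invented values. The series `c1` and `c2` are hypothetical, and the distance used (half the sum of absolute differences) is one possible choice among many.

```python
import pandas as pd

# Hypothetical risks from one source on two dates, indexed by destination county
c1 = pd.Series({'06037': 0.50, '17031': 0.30, '12086': 0.20}, name='C1')
c2 = pd.Series({'06037': 0.45, '17031': 0.35, '12086': 0.20}, name='C2')

# Align on destination; a destination missing on one date gets zero risk
both = pd.concat([c1, c2], axis=1).fillna(0.0)

# Half the sum of absolute differences: 0 means identical maps, 1 means disjoint maps
distance = 0.5 * (both['C1'] - both['C2']).abs().sum()

print(round(distance, 3))
```

On the notebook data, the two series could be built with something like `dfDate1.set_index('county_d')['C']` and `dfDate2.set_index('county_d')['C']`, assuming each destination appears once per date.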