# Spatial Correlation Analytics Between Population and COVID-19 Confirmed Cases in New York State

This Jupyter notebook demonstrates spatial correlation analytics between population and COVID-19 confirmed cases in New York State.

The study area is New York State, US. The notebook uses geospatial libraries to map the spatial distribution of population and of COVID-19 confirmed cases, and then reports the correlation between the two.

# Notebook Outline
- [Data Preparation](#Data)
    - [Setup](#setup)
    - [Population Data](#Pop)
    - [COVID-19 Data](#CSV)
- [Spatial Analysis](#explore)
    - [Spatial Distribution](#spatial)
    - [Spatial Correlation Analytics](#statistical)

<a id='Data'></a>
## Data Preparation

This first part shows how to prepare the population data and the COVID-19 data for New York State.

<a id='setup'></a>
### Set up the environment by importing libraries
Import numpy, pandas, geopandas, shapely and other libraries available in CyberGIS-Jupyter to set up an environment for storing and manipulating the population and COVID-19 data.

In [None]:
import pathlib
import os
import tarfile

import requests
import shutil
import zipfile
 
import pandas as pd

# Plotting the population data
import matplotlib.pyplot as plt
import datetime
%matplotlib inline

import numpy as np
import geopandas as gpd
from shapely.geometry import Point

import plotly.figure_factory as ff
import plotly.express as px
import json
import plotly.graph_objects as go

import seaborn as sns


<a id='Pop'></a>
### Population Data
Population data for New York State, distributed as a shapefile.

The original link is https://www.arcgis.com/home/item.html?id=3b69769aa9b646a483af81d05e7702d2.

U.S. Counties represents the counties of the United States in the 50 states, the District of Columbia, and Puerto Rico.

Originally extracted from this layer package: http://www.arcgis.com/home/item.html?id=a00d6b6149b34ed3b833e10fb72ef47b


In [None]:
%%time
file = pathlib.Path("USA_Counties_as_Shape.zip")
if file.exists():
    print("Population data already exists")
else:
    print("Population data not found; downloading...")
    !wget https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/22153815/USA_Counties_as_Shape.zip
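
The `!wget` line relies on a shell command being available in the notebook environment. As a minimal sketch of the same check-then-download step in plain Python, assuming only the standard library, the hypothetical helper `download_if_missing` below (not part of the original notebook) skips the download when the file is already present:

```python
import pathlib
import urllib.request


def download_if_missing(url, dest):
    """Download url to dest unless dest already exists.

    Returns True if a download happened, False if the file was already there.
    Hypothetical helper for illustration; not part of the original notebook.
    """
    dest = pathlib.Path(dest)
    if dest.exists():
        return False
    urllib.request.urlretrieve(url, str(dest))
    return True
```

Calling `download_if_missing(url, "USA_Counties_as_Shape.zip")` with the URL from the cell above would mirror the original behaviour.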


Show the first five records for New York State

In [None]:
%%time
pop = gpd.read_file("zip://USA_Counties_as_Shape.zip")
pop = pop[pop.STATE_NAME == 'New York']
pop.head(5)

<a id='CSV'></a>
### COVID-19 Data

The data is retrieved from the [Johns Hopkins CSSE COVID-19 cases dataset repository](https://github.com/CSSEGISandData/COVID-19/) as a CSV file.

In [None]:
%%time
confirmed_cases = pd.read_csv(
    "https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv"
)
confirmed_cases = confirmed_cases[confirmed_cases['Province_State'] == 'New York']
confirmed_cases.head(5)

List the date columns of the time series

In [None]:
columns = confirmed_cases.columns
# The first 11 columns are metadata (UID ... Combined_Key); the rest are dates
dates = columns[11:]
dates
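
The cells below index date columns by a literal label such as `'3/29/2020'`, while a commented-out line uses `'3/29/20'`; which spelling works depends on how the CSV writes the year. As a hedged sketch, the hypothetical helpers `parse_date_label` and `find_date_column` below look up a column label by calendar date under either spelling. They use only the standard library and assume the labels are written month/day/year.

```python
from datetime import date, datetime


def parse_date_label(label):
    """Parse 'm/d/yy' or 'm/d/yyyy' into a date; return None if neither fits."""
    for fmt in ("%m/%d/%y", "%m/%d/%Y"):
        try:
            return datetime.strptime(label, fmt).date()
        except ValueError:
            continue
    return None


def find_date_column(columns, target):
    """Return the first column label whose parsed date equals target."""
    for label in columns:
        if parse_date_label(str(label)) == target:
            return label
    raise KeyError(f"no column for {target.isoformat()}")


# Toy column lists standing in for confirmed_cases.columns
short_year = ["Admin2", "Lat", "3/28/20", "3/29/20"]
long_year = ["Admin2", "Lat", "3/28/2020", "3/29/2020"]
print(find_date_column(short_year, date(2020, 3, 29)))
print(find_date_column(long_year, date(2020, 3, 29)))
```

In the notebook, the returned label could then be used in place of the hard-coded string, for example `confirmed_cases[find_date_column(confirmed_cases.columns, date(2020, 3, 29))]`.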

In [None]:
# Copy county names into an Admin2 column so pop can be joined to confirmed_cases
pop["Admin2"] = pop["NAME"]
pop.shape

In [None]:
pop.describe()

In [None]:
# Drop rows labelled 'Unassigned', which do not correspond to a county
confirmed_cases = confirmed_cases[confirmed_cases['Admin2'] != 'Unassigned']

In [None]:
confirmed_cases

In [None]:
confirmed_cases['3/29/2020'].hist(bins=100)

<a id='explore'></a>

## Spatial Analysis

This part demonstrates spatial correlation analytics between population and COVID-19 confirmed cases in New York State.

<a id='spatial'></a>
### Spatial distribution

In [None]:
from urllib.request import urlopen
import json
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)
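
Plotly matches each value in `locations` against the `id` of a GeoJSON feature. In this county file the ids are five-character, zero-padded FIPS strings such as `"36001"`, whereas a FIPS column read from CSV is often numeric. The sketch below, with the hypothetical helper `fips_to_feature_id`, shows one way to normalise such values; whether the conversion is needed depends on the dtype pandas infers for `FIPS`, which is an assumption here.

```python
import math


def fips_to_feature_id(value):
    """Convert a numeric or string FIPS code to a zero-padded 5-character string.

    Returns None for missing values (None or NaN).
    """
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None
        value = int(value)
    return str(value).strip().zfill(5)


print(fips_to_feature_id(36001.0))  # Albany County, NY
print(fips_to_feature_id(1001))     # a code that needs a leading zero
```

With pandas, this could be applied as `confirmed_cases['FIPS'].map(fips_to_feature_id)` before passing the result to `locations`.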

Show the spatial distribution of the COVID-19 Confirmed Cases in New York State

In [None]:
fig = go.Figure(
    go.Choroplethmapbox(
        geojson=counties, locations=confirmed_cases.FIPS, 
        z=np.log1p(confirmed_cases['3/29/2020']),
#         z=confirmed_cases['3/29/20'],
        colorscale="reds", marker_opacity=0.5, marker_line_width=0,
        ids = confirmed_cases['Admin2'],  
        name = 'Confirmed Cases',
        colorbar_thickness = 10,
        hoverinfo = 'text',
        text = confirmed_cases['Admin2'] + ', ' + confirmed_cases['Province_State'] + '<br>' + confirmed_cases['3/29/2020'].astype('str'),
#         showlegend = True,
        showscale = True,
        colorbar = dict(
            title = "# confirmed cases",
            titleside = 'top',
            tickmode = 'array',
            tickvals = np.arange(11),
            ticktext = np.round(np.exp(np.arange(0,11)) - 1),
            ticks = 'inside',
            outlinewidth = 0
        )
    ))
fig.update_layout(mapbox_style="carto-positron",
                  mapbox_zoom=5, #mapbox_center = {"lat": 37.0902, "lon": -95.7129},)
                  mapbox_center={"lat": 42.7, "lon": -76},
                 )
fig.update_layout(margin={"r":10,"t":10,"l":10,"b":10})

fig.show()
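
The map colours counties by `log1p(count)` so that a few very large counts do not wash out the rest, and the colour bar labels are turned back into case counts with the inverse transform. A small sketch of that round trip, using the standard library `math` module instead of NumPy:

```python
import math

# Tick positions on the log1p scale, as in tickvals = np.arange(11)
tick_values = list(range(11))

# Inverse transform: expm1(v) == exp(v) - 1, the count whose log1p is v
tick_labels = [round(math.expm1(v)) for v in tick_values]

for v, label in zip(tick_values, tick_labels):
    print(f"log1p scale {v:2d} -> about {label} cases")

# Round trip check on an arbitrary count
count = 1234
assert math.isclose(math.expm1(math.log1p(count)), count)
```

Because the ticks are evenly spaced on the log scale, each step up the colour bar corresponds to roughly a factor of e (about 2.7) more cases.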

The density map for the COVID-19 Confirmed Cases in New York State

In [None]:
%%time 
fig = go.Figure(
    go.Densitymapbox(
        name = 'Density of Confirmed Cases',
        opacity = 0.7,
        z = np.log1p(confirmed_cases['3/29/2020']),
        lat = confirmed_cases['Lat'],
        lon = confirmed_cases['Long_'],
        colorscale = 'reds',
        radius = 30,
        
        text = confirmed_cases['Admin2'] + ', ' + confirmed_cases['Province_State'] + '<br>' + confirmed_cases['3/29/2020'].astype('str'),
        hoverinfo = 'text',
        colorbar = dict(
            title = "# confirmed cases",
            titleside = 'top',
            tickmode = 'array',
            tickvals = np.arange(11),
            ticktext = np.round(np.exp(np.arange(0,11)) - 1),
            ticks = 'inside',
            outlinewidth = 0
        )
    )
)
fig.update_layout(mapbox_style="carto-positron",
                  mapbox_zoom=5, #mapbox_center = {"lat": 37.0902, "lon": -95.7129},)
                  mapbox_center={"lat": 42.7, "lon": -76})
fig.update_layout(margin={"r":0.1,"t":0.1,"l":0.1,"b":0.1})

fig.show()

The trend for New York City

In [None]:
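# Select the row whose Admin2 is "New York", then transpose it
# and keep only the date rows (the first 11 rows are metadata fields)
nyc_count = confirmed_cases[confirmed_cases['Admin2'] == "New York"]
nyc_count = nyc_count.T.iloc[11:]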

In [None]:
nyc_count.columns = ['count']

In [None]:
px.line(nyc_count, x = nyc_count.index, y=nyc_count['count'])

<a id='statistical'></a>
### Spatial Correlation Analytics 

In [None]:
sns.set(style='darkgrid', palette="deep", font_scale=1.1, rc={"figure.figsize": [10, 8]})
# distplot is deprecated since seaborn 0.11; histplot draws the same count histogram
sns.histplot(pop['POP2012'], kde=False).set(xlabel='POP2012', ylabel='Count');
plt.savefig('POP2012_distplot.png')

In [None]:
sns.jointplot(x=pop['POP2012'], y=pop['POP2010']);

In [None]:
sns.jointplot(x=pop['POP2012'], y=pop['POP12_SQMI']);

In [None]:
%%time
merged_population = pop.merge(confirmed_cases, on=["Admin2"], how='outer')
merged_population.head()
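
An outer join keeps rows from both tables even when a county name appears in only one of them, so it is worth checking how many rows actually matched. The sketch below uses toy data frames (the county names and numbers are illustrative, not taken from the datasets) and the `indicator=True` option of `DataFrame.merge` to label each row by where it came from.

```python
import pandas as pd

# Toy stand-ins for pop and confirmed_cases; values are made up for illustration
pop_toy = pd.DataFrame({"Admin2": ["Albany", "Kings", "Queens"],
                        "POP2012": [100, 200, 300]})
cases_toy = pd.DataFrame({"Admin2": ["Albany", "Kings", "Out of NY"],
                          "cases": [5, 50, 1]})

merged = pop_toy.merge(cases_toy, on="Admin2", how="outer", indicator=True)

# '_merge' is 'both', 'left_only' or 'right_only' for each row
unmatched = merged[merged["_merge"] != "both"]
print(unmatched[["Admin2", "_merge"]])
```

Rows marked `left_only` have population but no case record, and rows marked `right_only` have cases but no county geometry; either kind would show up as missing values in the maps and the correlation matrix.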

Exploratory data analysis for population data and COVID-19 Confirmed Cases

In [None]:
%%time
fig, ax = plt.subplots(1,2, figsize=(18,18))
merged_population.plot(column='POP2012', scheme='Quantiles', k=5, cmap='YlGnBu', legend=True, ax=ax[0]);
merged_population.plot(column='3/29/2020', scheme='Quantiles', k=5, cmap='YlGnBu', legend=True, ax=ax[1]);
plt.tight_layout()
ax[0].set_title("Population Count")
ax[1].set_title("COVID-19 Confirmed Cases on 3/29/2020")
plt.savefig('comparison.png', bbox_inches="tight")
plt.show()

Calculate the correlation matrix and plot it as a heatmap

In [None]:
%%time
columns = ['POP2012','POP12_SQMI','MALES','FEMALES','WHITE','BLACK','AMERI_ES','ASIAN','HAWN_PI','HISPANIC','OTHER','3/25/2020', '3/26/2020', 
           '3/27/2020', '3/28/2020','3/29/2020','3/30/2020']

# Pairwise Pearson correlation (the pandas default) between the selected columns
correlation = merged_population[columns].corr()

fig, ax = plt.subplots(figsize=(12,10))


sns.heatmap(correlation, xticklabels=columns,yticklabels=columns, ax=ax)
plt.show()
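
`DataFrame.corr()` computes the Pearson correlation coefficient by default. As a reminder of what each cell of the heatmap represents, here is a minimal standard-library sketch of that coefficient for two lists of numbers; the sample values are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


population = [10, 20, 30, 40]   # invented values
cases = [1, 2, 3, 4]            # exactly proportional to population
print(pearson(population, cases))
```

Values near 1 or -1 indicate a strong linear relationship across counties; values near 0 indicate little linear relationship.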