# Notebook for preparing redesign of data management

First, we need to make sure the working directory is the project root so that the custom modules can be imported. The notebooks themselves are stored in `notebooks/` to keep the repository tidy.

In [1]:
pwd

'/Users/DanOvadia/Projects/covid-hotspots/notebooks'

In [2]:
cd ..

/Users/DanOvadia/Projects/covid-hotspots
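
The `cd ..` step above depends on the notebook being started from `notebooks/`. As a hedged alternative, the same move can be sketched in plain Python by walking up the directory tree until a folder containing `modules/` is found. The helper name `move_to_project_root` and the choice of `modules` as the marker directory are assumptions made for illustration; they are not part of the project.

```python
import os
from pathlib import Path


def move_to_project_root(marker="modules", start=None):
    """Change the working directory to the nearest ancestor containing `marker`.

    `marker` is assumed to be a directory that only exists in the project root.
    Returns the directory that was selected.
    """
    current = Path(start) if start is not None else Path.cwd()
    current = current.resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in (current, *current.parents):
        if (candidate / marker).is_dir():
            os.chdir(candidate)
            return candidate
    raise FileNotFoundError(f"No parent directory of {current} contains '{marker}/'")


# Typical use at the top of a notebook (left commented out here):
# move_to_project_root()
```

Unlike `cd ..`, re-running this cell does not move further up the tree, because the search stops at the first directory that contains the marker.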


### Python Libraries

In [3]:
import pandas as pd

### Custom Modules

In [4]:
from modules import data_processing
from modules import plotting

# Extension to auto reload custom modules
%load_ext autoreload

%autoreload 1

%aimport modules.data_processing
%aimport modules.plotting

## Data Importing

### County GeoJson - polygons for choropleth plot

We are getting these data from [plotly](https://plotly.com/python/mapbox-county-choropleth/).

In [5]:
# Get county geojson
COVID_GEOJSON = data_processing.load_county_geojson()

Pulling geojson from file.
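
The message above suggests that `load_county_geojson` reads a local copy when one exists. The implementation in `modules/data_processing.py` is not shown in this notebook, so the following is only a minimal sketch of such a cache-or-download pattern. The URL is the one used in the linked plotly tutorial; `load_json_cached`, `fetch_json`, and the cache path in the usage comment are hypothetical names.

```python
import json
from pathlib import Path
from urllib.request import urlopen

# County GeoJSON file referenced by the plotly county choropleth tutorial.
COUNTY_GEOJSON_URL = (
    "https://raw.githubusercontent.com/plotly/datasets/master/"
    "geojson-counties-fips.json"
)


def fetch_json(url):
    """Download a JSON document and return the parsed object."""
    with urlopen(url) as response:
        return json.load(response)


def load_json_cached(cache_path, url, fetch=fetch_json):
    """Return JSON from `cache_path` if it exists, otherwise download and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        print("Pulling geojson from file.")
        with cache_path.open() as f:
            return json.load(f)
    print("Downloading geojson.")
    data = fetch(url)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("w") as f:
        json.dump(data, f)
    return data


# Hypothetical usage; the cache location is an assumption:
# geojson = load_json_cached("data/geojson-counties-fips.json", COUNTY_GEOJSON_URL)
```

Passing the downloader in as the `fetch` argument keeps the caching logic testable without network access.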


### County Coronavirus Data

We import county-level coronavirus data from The New York Times [GitHub repository](https://github.com/nytimes/covid-19-data).

In [6]:
# Get county data
COVID_COUNTIES_DF = data_processing.get_covid_county_data()

Retrieving Covid County data
Pulling county data from file.
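
The county frame shown later in this notebook has two derived columns, `fipsnum` and `log_deaths`, and treats `fips` as a string. How `get_covid_county_data` builds them is not shown here, so the sketch below is a guess that reproduces the displayed values: `log_deaths` matches `log(1 + deaths)` (for example, 4 deaths gives 1.609438), and `fipsnum` is assumed to be the numeric form of `fips`. The function names are hypothetical; the URL is the `us-counties.csv` file in the linked repository.

```python
import numpy as np
import pandas as pd

NYT_COUNTIES_URL = (
    "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv"
)


def read_county_csv(source):
    """Read the county CSV, keeping FIPS codes as strings.

    Reading `fips` as a string preserves leading zeros (e.g. "01001").
    Missing codes are stored as empty strings so that `len` works on every row.
    """
    df = pd.read_csv(source, dtype={"fips": str})
    df["fips"] = df["fips"].fillna("")
    return df


def add_derived_columns(df):
    """Add `fipsnum` and `log_deaths` columns (definitions are assumptions)."""
    out = df.copy()
    # Empty FIPS strings become NaN in the numeric column.
    out["fipsnum"] = pd.to_numeric(out["fips"], errors="coerce")
    # log(1 + deaths) is defined for zero deaths and matches the notebook output.
    out["log_deaths"] = np.log1p(out["deaths"])
    return out


# Hypothetical usage (requires network access):
# counties = add_derived_columns(read_county_csv(NYT_COUNTIES_URL))
```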


### State Coronavirus Data

We get state-level data from the [Covid Tracking Project](https://covidtracking.com/) at The Atlantic through its [Data API](https://covidtracking.com/data/api).

In [7]:
# Get state data
COVID_STATES_DF = data_processing.get_covid_state_data()

Pulling state data from file.


Let's check for missing FIPS codes in the New York Times county data.

In [13]:
# Restrict the check to a single reporting date
date_mask = COVID_COUNTIES_DF['date'] == '2020-08-28'

# A county FIPS code has 5 characters; anything shorter is missing or malformed
fips_error_mask = COVID_COUNTIES_DF['fips'].str.len() < 5

print(len(COVID_COUNTIES_DF[fips_error_mask & date_mask]))
COVID_COUNTIES_DF[fips_error_mask & date_mask]

30


| Unnamed: 0 | date | county | state | fips | cases | deaths | fipsnum | log_deaths |
|---|---|---|---|---|---|---|---|---|
| 476351 | 2020-08-28 | Unknown | Alaska | | 56 | 0 | | 0.0 |
| 476440 | 2020-08-28 | Unknown | Arkansas | | 815 | 0 | | 0.0 |
| 476574 | 2020-08-28 | Unknown | Connecticut | | 104 | 0 | | 0.0 |
| 476579 | 2020-08-28 | Unknown | Delaware | | 279 | 0 | | 0.0 |
| 476644 | 2020-08-28 | Unknown | Florida | | 959 | 0 | | 0.0 |
| 476793 | 2020-08-28 | Unknown | Georgia | | 2497 | 4 | | 1.609438 |
| 476809 | 2020-08-28 | Unknown | Guam | | 2256 | 11 | | 2.484907 |
| 476949 | 2020-08-28 | Unknown | Illinois | | 1645 | 207 | | 5.337538 |
| 477141 | 2020-08-28 | Unknown | Iowa | | 16 | 0 | | 0.0 |
| 477434 | 2020-08-28 | Unknown | Louisiana | | 350 | 163 | | 5.099866 |


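On 2020-08-28, 30 rows have a missing FIPS code. The rows shown above are reported under the county name `Unknown` rather than a real county. It is not yet clear how to handle these rows in the dashboard. The New York Times data also combines the five boroughs of New York City into a single entry that represents the whole city.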
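
One possible way to handle this, offered only as a sketch and not as a decision for the dashboard, is to separate rows with a usable 5-character FIPS code from the rest and report how many cases fall outside the map. The function names are hypothetical, and the column names are taken from the output above.

```python
import pandas as pd


def split_by_fips(df, fips_col="fips"):
    """Split a county frame into (mappable, unmappable) rows.

    A row is treated as mappable when its FIPS code has exactly 5 characters.
    """
    has_fips = df[fips_col].fillna("").astype(str).str.len() == 5
    return df[has_fips], df[~has_fips]


def unmapped_cases_by_state(unmapped):
    """Total cases per state that cannot be placed on the county map."""
    return unmapped.groupby("state")["cases"].sum().sort_values(ascending=False)


# Small example: two rows from the output above plus one made-up valid row.
example = pd.DataFrame(
    {
        "county": ["Unknown", "Unknown", "Autauga"],
        "state": ["Alaska", "Georgia", "Alabama"],
        "fips": ["", "", "01001"],
        "cases": [56, 2497, 1000],  # the Alabama value is illustrative only
    }
)
mappable, unmappable = split_by_fips(example)
summary = unmapped_cases_by_state(unmappable)
```

The mappable frame can feed the choropleth directly, while the per-state summary could be shown next to the map so that the unassigned cases are not silently dropped.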

------

### County Census Data

In [8]:
# Get Census Data
CENSUS_COUNTY_DF = data_processing.get_census_county_data()