# Step 3 - Prepare Data - Transform

This notebook provides the Python code for taking COVID total cases at county level, which USA Facts publishes with one column per date, and reshaping them so that each row holds one county on one date. It demonstrates how you can transform the data. The results are stored in an intermediate file for the rest of the exercises.

Q 3-2: USA Facts data has dates in columns. Transform the data so that it is indexed by date and has one row per county per date.
- State and county as index
- New column - Date
- New column - Total Cases

Students will be developing a similar notebook for total deaths.  The corresponding notebook is included in the answer section.
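The wide-to-long reshaping asked for in Q 3-2 can be sketched on a toy table before touching the real file. This is a minimal illustration, assuming made-up county names and counts; `set_index` is shown as one possible way to get state, county and date into the index, not necessarily the way later exercises expect it.

```python
import pandas as pd

# Toy wide table (made-up numbers): one row per county, one column per date.
df_wide = pd.DataFrame({
    "State": ["CA", "CA"],
    "County Name": ["County A", "County B"],
    "2020-01-22": [0, 1],
    "2020-01-23": [2, 3],
})

# melt keeps the id columns and stacks every other column into rows.
df_long = df_wide.melt(
    id_vars=["State", "County Name"],
    var_name="Date",
    value_name="Total Cases",
)

# Optional: move state, county and date into the index.
df_long_indexed = df_long.set_index(["State", "County Name", "Date"]).sort_index()
print(df_long_indexed)
```

Two counties with two date columns each become four rows, one per county and date.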

# Q 3-2 

3-2-1. Read USA Facts data on total cases

3-2-2. Filter LA County so you can check results on a sample

3-2-3. Transform LA County data and check results

3-2-4. Transform data for all counties

3-2-5. Write transformed data to the output folder


## Import Libraries

In [1]:
import os
import pandas as pd
from datetime import date


## Set Input and Output Folders

Depending on the operating system you are using, file access may differ.
Choose your environment by setting its flag to True and keeping the others False.

In [2]:
using_Google_colab = False
using_Anaconda_on_Mac_or_Linux_or_Azure = True
using_Anaconda_on_windows = False

if using_Google_colab:
    dir_input = "/content/drive/MyDrive/COVID_Project/input"
    dir_output = "/content/drive/MyDrive/COVID_Project/output"
if using_Anaconda_on_Mac_or_Linux_or_Azure:
    dir_input = "../input"
    dir_output = "../output"
if using_Anaconda_on_windows:
    dir_input = r"..\input"   
    dir_output = r"..\output" 
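With three independent flags it is possible to set none of them, or more than one, by accident. A possible alternative, sketched here under the assumption that a single environment name is acceptable, is to look the folders up in a dictionary. The helper `get_project_dirs` and the environment names are hypothetical; the paths are the ones from the cell above.

```python
# Hypothetical lookup table: one environment name -> (input, output) folders.
PROJECT_DIRS = {
    "colab": ("/content/drive/MyDrive/COVID_Project/input",
              "/content/drive/MyDrive/COVID_Project/output"),
    "mac_linux_azure": ("../input", "../output"),
    "windows": (r"..\input", r"..\output"),
}

def get_project_dirs(environment):
    """Return (dir_input, dir_output), or raise if the name is unknown."""
    if environment not in PROJECT_DIRS:
        raise ValueError(
            f"Unknown environment {environment!r}; "
            f"expected one of {sorted(PROJECT_DIRS)}"
        )
    return PROJECT_DIRS[environment]

dir_input, dir_output = get_project_dirs("mac_linux_azure")
print(dir_input, dir_output)
```

A misspelled name fails immediately with a clear message instead of leaving `dir_input` undefined until the first file read.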

## Connect to Google Drive

This step runs only if you have set the environment flag using_Google_colab to True.

In [3]:
if using_Google_colab:
    from google.colab import drive
    drive.mount('/content/drive')

### 3-2-1 Read Total Cases data

Read the data, then convert the countyFIPS and StateFIPS columns to strings.

In [4]:
df_total_cases = pd.read_csv(os.path.join(dir_input, "USA_Facts", "covid_confirmed_usafacts.csv"))
df_total_cases = df_total_cases.astype({'countyFIPS': str, 'StateFIPS': str})
df_total_cases

Unnamed: 0,countyFIPS,County Name,State,StateFIPS,2020-01-22,2020-01-23,2020-01-24,2020-01-25,2020-01-26,2020-01-27,...,2022-01-18,2022-01-19,2022-01-20,2022-01-21,2022-01-22,2022-01-23,2022-01-24,2022-01-25,2022-01-26,2022-01-27
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,12738,12833,12928,13019,13019,13019,13251,13251,13251,13251
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,47143,47662,48338,49168,49168,49168,50313,50313,50313,50313
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,4741,4800,4843,4902,4902,4902,5054,5054,5054,5054
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,5385,5486,5565,5663,5663,5663,5795,5795,5795,5795
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3188,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,9082,9184,9241,9449,9449,9449,9609,9712,9810,10007
3189,56039,Teton County,WY,56,0,0,0,0,0,0,...,8531,8638,8741,8814,8814,8814,8960,9049,9121,9195
3190,56041,Uinta County,WY,56,0,0,0,0,0,0,...,4660,4751,4827,4927,4927,4927,5034,5081,5167,5222
3191,56043,Washakie County,WY,56,0,0,0,0,0,0,...,1994,2002,2023,2025,2025,2025,2041,2066,2093,2130
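Converting the FIPS columns with `astype(str)` keeps them as text but does not restore leading zeros, so Autauga County stays `1001` rather than the standard five-digit `01001`. If later steps join against sources that use padded codes, a sketch like the following may help. It is an optional extra, not part of the notebook's own flow, and it uses a toy frame with FIPS values taken from this notebook's outputs.

```python
import pandas as pd

# Toy rows; the FIPS values appear in this notebook's outputs.
df = pd.DataFrame({"countyFIPS": [1001, 6037, 56043], "StateFIPS": [1, 6, 56]})

# Convert to text, then left-pad with zeros to the standard widths
# (5 digits for county codes, 2 digits for state codes).
df["countyFIPS"] = df["countyFIPS"].astype(str).str.zfill(5)
df["StateFIPS"] = df["StateFIPS"].astype(str).str.zfill(2)
print(df)
```

Note that "Statewide Unallocated" rows, whose countyFIPS is 0, would become `00000` under this padding.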


### 3-2-2 Filter data for LA County

In [5]:
# Strip whitespace before comparing, since names in the file may carry a trailing space
df_total_cases_LA = df_total_cases[df_total_cases['County Name'].str.strip() == 'Los Angeles County']
df_total_cases_LA

Unnamed: 0,countyFIPS,County Name,State,StateFIPS,2020-01-22,2020-01-23,2020-01-24,2020-01-25,2020-01-26,2020-01-27,...,2022-01-18,2022-01-19,2022-01-20,2022-01-21,2022-01-22,2022-01-23,2022-01-24,2022-01-25,2022-01-26,2022-01-27
209,6037,Los Angeles County,CA,6,375,379,382,384,385,388,...,2276388,2343261,2367401,2384427,2390482,2430653,2453693,2468026,2472960,2473095
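County names are not unique across states and may carry stray whitespace, so filtering on the FIPS code can be a more robust way to pick the sample. The sketch below assumes countyFIPS has already been converted to a string, as in step 3-2-1, and uses a toy frame instead of the real file.

```python
import pandas as pd

# Toy frame; the trailing spaces in the names are deliberate.
df = pd.DataFrame({
    "countyFIPS": ["1001", "6037"],
    "County Name": ["Autauga County ", "Los Angeles County "],
    "State": ["AL", "CA"],
})

# Filter on the FIPS code (a string here, matching the astype(str) conversion).
df_la = df[df["countyFIPS"] == "6037"]

# A sample used for checking results should be exactly one county.
assert len(df_la) == 1, f"expected 1 row, got {len(df_la)}"
print(df_la["County Name"].str.strip().iloc[0])
```

The assertion guards against the silent failure mode of a name filter, which returns an empty DataFrame rather than an error when nothing matches.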


### 3-2-3 Transform LA County data to total cases by date

In [6]:
df_total_cases_LA_by_date = df_total_cases_LA.melt(id_vars=['State', 
                                                            'StateFIPS', 
                                                            'County Name',
                                                            'countyFIPS'],
                                                   var_name='Date', 
                                                   value_name='Total Cases')
df_total_cases_LA_by_date

Unnamed: 0,State,StateFIPS,County Name,countyFIPS,Date,Total Cases
0,CA,6,Los Angeles County,6037,2020-01-22,375
1,CA,6,Los Angeles County,6037,2020-01-23,379
2,CA,6,Los Angeles County,6037,2020-01-24,382
3,CA,6,Los Angeles County,6037,2020-01-25,384
4,CA,6,Los Angeles County,6037,2020-01-26,385
...,...,...,...,...,...,...
732,CA,6,Los Angeles County,6037,2022-01-23,2430653
733,CA,6,Los Angeles County,6037,2022-01-24,2453693
734,CA,6,Los Angeles County,6037,2022-01-25,2468026
735,CA,6,Los Angeles County,6037,2022-01-26,2472960
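Checking the result on the sample can go beyond reading the table by eye. The following sketch, which uses three values copied from the LA output and a reduced set of id columns for brevity, asserts that the melt produced one row per date column and that the values kept their order. Parsing `Date` into real dates is an optional extra step that the notebook itself does not perform.

```python
import pandas as pd

id_cols = ["State", "County Name"]

# One-county wide frame; the three counts are copied from the LA output.
df_one = pd.DataFrame({
    "State": ["CA"],
    "County Name": ["Los Angeles County"],
    "2020-01-22": [375],
    "2020-01-23": [379],
    "2020-01-24": [382],
})

df_long = df_one.melt(id_vars=id_cols, var_name="Date", value_name="Total Cases")

date_cols = [c for c in df_one.columns if c not in id_cols]

# One output row per date column.
assert len(df_long) == len(date_cols)
# Values arrive in the same order as the date columns.
assert df_long["Total Cases"].tolist() == df_one[date_cols].iloc[0].tolist()

# Optional: parse Date so it sorts and plots chronologically.
df_long["Date"] = pd.to_datetime(df_long["Date"], format="%Y-%m-%d")
print(df_long.dtypes)
```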


### 3-2-4 Transform all County data to total cases by date

In [7]:
df_total_county_cases_by_date = df_total_cases.melt(id_vars=['State', 
                                                      'StateFIPS', 
                                                      'County Name',
                                                      'countyFIPS'],
                                             var_name='Date', 
                                             value_name='Total Cases')
df_total_county_cases_by_date

Unnamed: 0,State,StateFIPS,County Name,countyFIPS,Date,Total Cases
0,AL,1,Statewide Unallocated,0,2020-01-22,0
1,AL,1,Autauga County,1001,2020-01-22,0
2,AL,1,Baldwin County,1003,2020-01-22,0
3,AL,1,Barbour County,1005,2020-01-22,0
4,AL,1,Bibb County,1007,2020-01-22,0
...,...,...,...,...,...,...
2353236,WY,56,Sweetwater County,56037,2022-01-27,10007
2353237,WY,56,Teton County,56039,2022-01-27,9195
2353238,WY,56,Uinta County,56041,2022-01-27,5222
2353239,WY,56,Washakie County,56043,2022-01-27,2130
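For the full table, a quick sanity check is that the long frame has exactly (number of wide rows) x (number of date columns) rows, and that every county contributes one row per date. The sketch below shows the idea on two counties and two dates copied from the outputs above. Grouping by both State and countyFIPS is an assumption made so that rows sharing a countyFIPS of 0, such as "Statewide Unallocated", are still counted per state.

```python
import pandas as pd

id_cols = ["State", "StateFIPS", "County Name", "countyFIPS"]

# Two counties and two dates copied from the outputs in this notebook.
df_wide = pd.DataFrame({
    "State": ["AL", "WY"],
    "StateFIPS": ["1", "56"],
    "County Name": ["Autauga County", "Teton County"],
    "countyFIPS": ["1001", "56039"],
    "2022-01-26": [13251, 9121],
    "2022-01-27": [13251, 9195],
})

df_long = df_wide.melt(id_vars=id_cols, var_name="Date", value_name="Total Cases")

# Every non-id column is a date column.
n_dates = df_wide.shape[1] - len(id_cols)

# Total rows: one per (county, date) pair.
assert len(df_long) == len(df_wide) * n_dates

# Each county appears once per date.
rows_per_county = df_long.groupby(["State", "countyFIPS"]).size()
assert (rows_per_county == n_dates).all()
print(rows_per_county)
```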


### 3-2-5 Write transformed data to output folder

In [8]:
df_total_county_cases_by_date.to_csv(os.path.join(dir_output, "total_county_cases_by_date.csv"))
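By default `to_csv` also writes the row index, which comes back as an extra `Unnamed: 0` column when the file is read again. The sketch below shows one possible round trip that avoids this with `index=False` and restores the string FIPS columns on read. It writes to a temporary folder with a one-row toy frame. Whether the later exercises expect the index column in the file is not stated here, so treat this as an option rather than a replacement for the cell above.

```python
import os
import tempfile

import pandas as pd

# One-row toy frame in the transformed layout (values from the LA output).
df_long = pd.DataFrame({
    "State": ["CA"],
    "StateFIPS": ["6"],
    "County Name": ["Los Angeles County"],
    "countyFIPS": ["6037"],
    "Date": ["2020-01-22"],
    "Total Cases": [375],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "total_county_cases_by_date.csv")

    # index=False leaves the row index out of the file.
    df_long.to_csv(path, index=False)

    # Read back, keeping FIPS codes as text and parsing Date as dates.
    df_back = pd.read_csv(
        path,
        dtype={"StateFIPS": str, "countyFIPS": str},
        parse_dates=["Date"],
    )

print(df_back.columns.tolist())
```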