## JHU COVID-19 Daily Reports Data  

This notebook evaluates the coronavirus (COVID-19) data published by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). It uses only the daily reports dataset provided by JHU CSSE and keeps the county-level rows for Kentucky (KY).

__Data Source:__  [CSSE-JHU COVID-19 data repository](https://github.com/CSSEGISandData/COVID-19)

#### Python Libraries

In [1]:
import pandas as pd
from pathlib import Path

#### Load CSV data into DataFrames

In [8]:
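start_date = 20200306  # date of first KY confirmed case (YYYYMMDD)
end_date = 20200310  # end of old file format; range() excludes this value, so 03-09 is the last file read
filePath = Path("data")  # output directory for the CSV file
flag = 0  # not used in the cells shown here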
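Because `start_date` and `end_date` are plain integers, iterating over `range(start_date, end_date)` only produces valid dates while the span stays inside a single month. The following is a minimal sketch of a calendar-aware alternative, assuming `pd.date_range` is acceptable here; the helper name `daily_report_filenames` is hypothetical and not part of the original notebook.

```python
import pandas as pd


def daily_report_filenames(start, end):
    """Return MM-DD-YYYY.csv names for every day from start to end, inclusive.

    Hypothetical helper, not part of the original notebook.
    """
    days = pd.date_range(start=start, end=end, freq="D")
    return [day.strftime("%m-%d-%Y") + ".csv" for day in days]


# Same four days the notebook requests, expressed as real dates.
filenames = daily_report_filenames("2020-03-06", "2020-03-09")
print(filenames)
```

Unlike integer arithmetic, this approach rolls over month boundaries correctly, at the cost of an inclusive end date rather than the exclusive one `range()` uses.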

In [20]:
def retrieve_daily_data(file_string):
    """Download one JHU daily report and return its Kentucky county rows as a list of lists."""
    daily_rpts_url = 'https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_daily_reports/'+file_string
    daily_df = pd.read_csv(daily_rpts_url)

    # Keep US rows whose Province/State has the "County, ST" form
    df = daily_df[daily_df['Province/State'].notnull()].copy()
    df = df[df['Province/State'].str.contains(',') & (df['Country/Region'] == 'US')].sort_values('Province/State').copy()
    df = df.rename(columns={'Province/State': 'county_state'})

    # Split "County, ST" into separate county and state columns
    split_county_state = df.county_state.str.split(", ", expand=True)
    df['county'] = split_county_state[0]
    df['state'] = split_county_state[1]

    # Keep only the Kentucky rows and return them as plain Python lists
    df = df[df.state == 'KY'].copy()
    df_list = df.values.tolist()
    return df_list
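The filtering steps in `retrieve_daily_data` can be tried without a network connection. The sketch below applies the same logic to a small in-memory frame; the helper name `filter_state` is hypothetical, and the sample rows are illustrative placeholders in the old daily-report layout, not real report data.

```python
import pandas as pd

# Illustrative placeholder rows (not real report data).
sample = pd.DataFrame({
    "Province/State": ["Fayette County, KY", "Example County, WA", "Example Province", None],
    "Country/Region": ["US", "US", "Elsewhere", "Elsewhere"],
    "Confirmed": [1, 0, 0, 0],
})


def filter_state(daily_df, state_code="KY"):
    """Hypothetical helper: keep US 'County, ST' rows for one state."""
    df = daily_df.dropna(subset=["Province/State"])
    is_county = df["Province/State"].str.contains(",", regex=False)
    df = df[is_county & (df["Country/Region"] == "US")].copy()
    # Split "County, ST" into two columns.
    parts = df["Province/State"].str.split(", ", expand=True)
    df["county"] = parts[0]
    df["state"] = parts[1]
    return df[df["state"] == state_code].reset_index(drop=True)


ky = filter_state(sample)
print(ky)
```

Returning a DataFrame instead of a list of lists keeps the column names attached to the data, which may make later steps less error-prone.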

#### Main processing loop

In [31]:
df_list = []
for increment in range(start_date, end_date):
    # Build the MM-DD-YYYY.csv file name used by the daily reports
    date_str = str(increment)
    file_string = date_str[4:6] + '-' + date_str[6:8] + '-' + date_str[:4] + '.csv'
    print(file_string)
    # Each pass overwrites df_list, so only the last report's rows remain after the loop
    df_list = retrieve_daily_data(file_string)

03-06-2020.csv
03-07-2020.csv
03-08-2020.csv
03-09-2020.csv
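The loop above assigns the result of each call to `df_list`, so only the rows from the last file survive. If a history across all days is wanted instead, one option is to collect a frame per file and concatenate them. This is a sketch under that assumption; `collect_reports` and `fake_loader` are hypothetical names, and the stand-in loader returns placeholder values so the example runs offline.

```python
import pandas as pd


def collect_reports(filenames, loader):
    """Concatenate one DataFrame per file, tagging each row with its source file.

    `loader` is any callable that maps a file name to a DataFrame.
    """
    frames = []
    for name in filenames:
        frame = loader(name).copy()
        frame["report_file"] = name  # remember which daily report the row came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


def fake_loader(name):
    """Stand-in loader with placeholder values (not real report data)."""
    return pd.DataFrame({"county": ["Example County"], "confirmed": [0]})


combined = collect_reports(["03-06-2020.csv", "03-07-2020.csv"], fake_loader)
print(combined)
```

In the notebook, a function that downloads and filters one daily report would take the place of `fake_loader`.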


In [32]:
print(df_list)

[['Fayette County, KY', 'US', '2020-03-06T23:23:03', 1, 0, 0, 38.0606, -84.4803, 'Fayette County', 'KY'], ['Harrison County, KY', 'US', '2020-03-09T08:43:03', 2, 0, 0, 38.4333, -84.3542, 'Harrison County', 'KY'], ['Jefferson County, KY', 'US', '2020-03-09T00:23:10', 1, 0, 0, 38.1938, -85.6435, 'Jefferson County', 'KY']]


In [34]:
df = pd.DataFrame(df_list, columns=['county_state', 'country_region', 'last_update', 'confirmed', 'deaths', 'recovered', 'latitude', 'longitude', 'county', 'state'])
df

|   | county_state | country_region | last_update | confirmed | deaths | recovered | latitude | longitude | county | state |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Fayette County, KY | US | 2020-03-06T23:23:03 | 1 | 0 | 0 | 38.0606 | -84.4803 | Fayette County | KY |
| 1 | Harrison County, KY | US | 2020-03-09T08:43:03 | 2 | 0 | 0 | 38.4333 | -84.3542 | Harrison County | KY |
| 2 | Jefferson County, KY | US | 2020-03-09T00:23:10 | 1 | 0 | 0 | 38.1938 | -85.6435 | Jefferson County | KY |


In [35]:
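file_name = 'ky_base.csv'
filePath.mkdir(parents=True, exist_ok=True)  # make sure the output directory exists
file_out = filePath.joinpath(file_name)  # path and filename

df.to_csv(file_out)  # output to csv (the row index is written as the first column)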
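By default `to_csv` writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. The sketch below illustrates the difference using an in-memory buffer instead of a file on disk; the one-row frame is a cut-down placeholder, not the notebook's full table.

```python
import io

import pandas as pd

df = pd.DataFrame({"county": ["Fayette County"], "state": ["KY"]})

# Default behaviour: the index is written as an unnamed first column.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
with_index = pd.read_csv(buffer)

# index=False leaves the index out of the file entirely.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
without_index = pd.read_csv(buffer)

print(list(with_index.columns))
print(list(without_index.columns))
```

Either writing with `index=False` or reading with `index_col=0` avoids the extra column; which one fits depends on whether the row index carries meaning downstream.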