# 2020-nCoV Global Cases by Wooil Jeong

- **Wooil Jeong**  
[Dashboard by WooilJeong](https://plot.ly/dashboard/coronavirus:34/present#/)  
[Blog](https://wooiljeong.github.io/etc/corona_dash/)  
[Github Repository](https://github.com/WooilJeong/novel_coronavirus)  


- **Novel Coronavirus (2019-nCoV) Cases, provided by JHU CSSE**  
[Dashboard by JHU CSSE](https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6)  
[Old Data Sheets](https://docs.google.com/spreadsheets/d/1yZv9w9zRKwrGTaR-YzmAqMefw4wMlaXocejdxZaTs6w/htmlview?usp=sharing&sle=true#)  
[New Google Sheet Link (support comments)](https://docs.google.com/spreadsheets/d/1wQVypefm946ch4XDp37uZ-wartW4V7ILdg-qYiDXUHM/edit?usp=sharing)  
[Time Series Google Sheet](https://docs.google.com/spreadsheets/d/1UF2pSkFTURko2OvfHWWlFpDFAr1UxCBA4JLwlSP6KFo/edit?usp=sharing)  
[Github Repository](https://github.com/CSSEGISandData/COVID-19)  


- **Contact**  
Email: wooil@kakao.com  

## Dataset Pipeline

In [None]:
import pandas as pd
import numpy as np

In [None]:
# Raw CSV time series published in the JHU CSSE GitHub repository.
# Note: JHU CSSE later replaced these files with time_series_covid19_*_global.csv,
# so these URLs may no longer resolve.
url_confirmed = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"
url_deaths = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv"
url_recovered = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Recovered.csv"

# Each file is a wide table: identifier columns followed by one column per date
Confirmed = pd.read_csv(url_confirmed)
Deaths = pd.read_csv(url_deaths)
Recovered = pd.read_csv(url_recovered)
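
The three files share a wide layout: four identifier columns followed by one column per date. The following is a minimal sketch of checking that layout before reshaping, using a made-up two-row CSV in place of the download. `check_layout`, `ID_COLS` and `SAMPLE_CSV` are hypothetical names for illustration and are not part of the original notebook.

```python
import io

import pandas as pd

ID_COLS = ['Province/State', 'Country/Region', 'Lat', 'Long']

# Made-up sample in the same wide layout as the time series files (not real data)
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
Region X,Country A,10.0,20.0,5,8
,Country B,30.0,40.0,1,1
"""


def check_layout(frame):
    """Return the date columns, raising if an identifier column is missing."""
    missing = [c for c in ID_COLS if c not in frame.columns]
    if missing:
        raise ValueError(f"missing identifier columns: {missing}")
    return [c for c in frame.columns if c not in ID_COLS]


sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
date_cols = check_layout(sample)
print(date_cols)
```

An empty `Province/State` field is read as `NaN`, which is why the pre-processing step later fills that column explicitly.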

## Reshape Dataset

In [None]:
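# Identifier columns that stay fixed while the date columns are unpivoted
id_vars = ['Province/State', 'Country/Region', 'Lat', 'Long']


def to_long(frame, value_name):
    """Melt a wide table (one column per date) into one row per location and date."""
    long_frame = pd.melt(frame,
                         id_vars=id_vars,
                         var_name='Last Update',
                         value_name=value_name)
    # 'Last Update' is still text here, so this sort is lexicographic
    return long_frame.sort_values('Last Update', ascending=True).reset_index(drop=True)


df_Confirmed = to_long(Confirmed, 'Confirmed')
df_Deaths = to_long(Deaths, 'Deaths')
df_Recovered = to_long(Recovered, 'Recovered')

# Left-join on the shared columns (id_vars plus 'Last Update')
df = pd.merge(df_Confirmed, df_Deaths, how='left')
df = pd.merge(df, df_Recovered, how='left')
# df = df.dropna()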
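
To make the reshape concrete, here is a small sketch on made-up data (two locations, two dates) showing what `pd.melt` and the left merge produce. The frame names and values are illustrative assumptions and do not come from the JHU CSSE files.

```python
import pandas as pd

# Made-up wide tables: one row per location, one column per date
confirmed_wide = pd.DataFrame({
    'Country/Region': ['A', 'B'],
    '1/22/20': [1, 0],
    '1/23/20': [3, 2],
})
deaths_wide = pd.DataFrame({
    'Country/Region': ['A', 'B'],
    '1/22/20': [0, 0],
    '1/23/20': [1, 0],
})

# Unpivot the date columns into (date, value) rows
confirmed_long = pd.melt(confirmed_wide, id_vars=['Country/Region'],
                         var_name='Last Update', value_name='Confirmed')
deaths_long = pd.melt(deaths_wide, id_vars=['Country/Region'],
                      var_name='Last Update', value_name='Deaths')

# With no `on` argument, merge joins on every column the two frames share
merged = pd.merge(confirmed_long, deaths_long, how='left')
print(merged)
```

Each location now has one row per date, with `Confirmed` and `Deaths` side by side.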

## Pre-Processing

### Date type

In [None]:
df['Last Update']=pd.to_datetime(df['Last Update'])
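
The conversion matters because the date labels sort incorrectly as text. The sketch below assumes the column headers follow a month/day/two-digit-year layout such as `1/22/20`. If the source uses a different layout, the `format` string would need to change.

```python
import pandas as pd

labels = pd.Series(['1/22/20', '1/3/20', '2/1/20'])

# Sorted as strings, '1/22/20' comes before '1/3/20'
as_text = sorted(labels)

# Parsed with an explicit format (assumed layout), the order is chronological
as_dates = pd.to_datetime(labels, format='%m/%d/%y').sort_values()

print(as_text)
print(as_dates.dt.strftime('%Y-%m-%d').tolist())
```

Passing `format` explicitly also avoids ambiguity between day-first and month-first interpretations.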

In [None]:
# # Replace empty strings with placeholder values
# df.loc[df['Province/State']=='', 'Province/State'] = 'None'
# df.loc[df['Confirmed']=='', 'Confirmed'] = 0
# df.loc[df['Deaths']=='', 'Deaths'] = 0
# df.loc[df['Recovered']=='', 'Recovered'] = 0
# df.loc[df['Confirmed']=='`', 'Confirmed'] = 0

# Label rows that have no province/state with the string 'None'
df.loc[df['Province/State'].isna(), 'Province/State'] = 'None'

# Data type conversion
df['Lat'] = pd.to_numeric(df['Lat'])
df['Long'] = pd.to_numeric(df['Long'])
df['Confirmed'] = pd.to_numeric(df['Confirmed'])
df['Deaths'] = pd.to_numeric(df['Deaths'])
df['Recovered'] = pd.to_numeric(df['Recovered'])
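
By default `pd.to_numeric` raises an error on a value it cannot parse. If the count columns might contain stray text, one option is `errors='coerce'`, sketched here on made-up values. This is an alternative to consider, not what the notebook does.

```python
import pandas as pd

# Made-up values, including an empty string and a stray character
raw = pd.Series(['12', '', '`', '7'])

# errors='coerce' turns anything unparseable into NaN instead of raising
counts = pd.to_numeric(raw, errors='coerce')

# Optionally treat the resulting gaps as zero and store whole numbers
counts_filled = counts.fillna(0).astype(int)
print(counts_filled.tolist())
```

Whether a gap should become zero depends on the analysis, since a missing report is not the same as zero cases.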

In [None]:
# # Fill Na with Zeros
# df = df.fillna(0)

# Sort
df=df.sort_values('Last Update', ascending=True)

## Save Dataset

In [None]:
import os
import datetime

# Create the output directory if it does not already exist
os.makedirs('Data', exist_ok=True)

# Timestamp the file name so earlier snapshots are not overwritten
now = datetime.datetime.now().strftime("%Y%m%d_%H%M")
save_path = "Data/Dataset_" + now + ".csv"
df.to_csv(save_path, index=False, encoding='utf-8')
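
A CSV file stores dates as plain text, so the datetime type is lost on save. The following sketch shows one way to read a saved snapshot back with the dates restored. It uses a made-up frame and a temporary directory, and the file name `Dataset_example.csv` is a placeholder.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up frame standing in for the processed dataset
frame = pd.DataFrame({
    'Last Update': pd.to_datetime(['2020-01-22', '2020-01-23']),
    'Confirmed': [1, 3],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'Dataset_example.csv'
    frame.to_csv(path, index=False, encoding='utf-8')
    # parse_dates converts the named column back to datetimes on load
    restored = pd.read_csv(path, parse_dates=['Last Update'])

print(restored.dtypes)
```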