# Covid-19: Exploratory Analysis
## Introduction
All data for this project comes from the following Kaggle dataset:
* https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset?select=time_series_covid_19_recovered.csv

The dataset contains case counts (split into confirmed cases, deaths and recoveries), locations, dates, and free-text notes covering symptoms and the significance of individual cases (e.g. the first case from Wuhan), along with a few less relevant variables.

## Loading the Data
To begin with, we load all the data files into memory and convert the date fields from strings to datetimes.

In [33]:
# load libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# show plot in notebook
%matplotlib inline

# read data into dataframes from csv
ts_conf = pd.read_csv('../Data/time_series_confirmed.csv')
ts_rec = pd.read_csv('../Data/time_series_recovered.csv')
ts_dead = pd.read_csv('../Data/time_series_deaths.csv')
ts_all = pd.read_csv('../Data/covid_19_data.csv')
ts_conf_us = pd.read_csv('../Data/time_series_covid_19_confirmed_US.csv')
ts_dead_us = pd.read_csv('../Data/time_series_covid_19_deaths_US.csv')
patients_1 = pd.read_csv('../Data/line_list_data.csv')
patients_2 = pd.read_csv('../Data/COVID19_open_line_list.csv')

# convert date fields from string to datetime
ts_all['Last Update'] = pd.to_datetime(ts_all['Last Update'])

# the patient line list has several date columns; convert each in turn
patient_date_cols = ['reporting date', 'symptom_onset', 'hosp_visit_date',
                     'exposure_start', 'exposure_end']
for col in patient_date_cols:
    patients_1[col] = pd.to_datetime(patients_1[col])
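
Line-list files are often entered by hand, so a date column may contain entries that `pd.to_datetime` cannot parse, in which case the call raises an error. The sketch below shows one possible way to handle that on a toy series: `errors='coerce'` turns unparseable entries into `NaT` so they can be counted and inspected. The month/day/year format and the sample values are assumptions for illustration, not taken from the dataset.

```python
import pandas as pd

# Toy date strings (made up for illustration): one valid, one malformed, one missing
raw = pd.Series(["1/20/2020", "not a date", None])

# errors='coerce' replaces anything that does not match the format with NaT
# (the month/day/year format is an assumption about how the dates are written)
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

# Entries that were present in the raw data but failed to parse
n_bad = int(parsed.isna().sum() - raw.isna().sum())
print(f"{n_bad} value(s) could not be parsed")
```

Counting the failures before dropping or correcting them makes it clear how much data would be affected.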

## Initial Exploration
Key Starting Knowledge:
* The numbers of confirmed cases, deaths and recoveries in this dataset are cumulative totals as of each date, not individual daily counts.
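
Because the counts are cumulative, daily figures have to be derived by differencing consecutive dates within each location. A minimal sketch on toy data follows; the `Region`, `Confirmed` and `New Cases` column names and all values are hypothetical and do not come from the dataset.

```python
import pandas as pd

# Toy cumulative counts for two hypothetical regions (values are made up)
cumulative = pd.DataFrame(
    {
        "Region": ["A", "A", "A", "B", "B", "B"],
        "Date": pd.to_datetime(["2020-01-22", "2020-01-23", "2020-01-24"] * 2),
        "Confirmed": [1, 3, 6, 0, 2, 2],
    }
)

# Difference consecutive dates within each region. The first date has no
# predecessor, so its daily count is taken to be its cumulative value.
cumulative = cumulative.sort_values(["Region", "Date"])
daily = cumulative.groupby("Region")["Confirmed"].diff()
cumulative["New Cases"] = daily.fillna(cumulative["Confirmed"]).astype(int)

print(cumulative)
```

Grouping before differencing matters: without it, the first date of one region would be subtracted from the last date of the previous one.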

In [54]:
# identifier columns to keep, and the date columns to unpivot
# (pd.melt expects column labels here, not DataFrames)
categories = ['Province/State', 'Country/Region', 'Lat', 'Long']
values = ts_conf.loc[:, '1/22/20':].columns.tolist()

# flatten into 'Date' and 'Confirmed Cases (cumulative)' cols
ts_conf_melt = pd.melt(ts_conf, id_vars = categories, value_vars = values,
                       var_name = 'Date', value_name = 'Confirmed Cases (cumulative)')

# convert 'Date' column from string to datetime (labels look like '1/22/20')
ts_conf_melt['Date'] = pd.to_datetime(ts_conf_melt['Date'], format = '%m/%d/%y')

# write df to csv
ts_conf_melt.to_csv('../Outputs/Confirmed Cases.csv')
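
To make the reshaping step concrete, here is a small self-contained sketch of the same wide-to-long melt on a toy table. The table only mimics the column layout of `ts_conf`; the region names, coordinates and counts are made up, and `wide`, `long_df`, `id_cols` and `date_cols` are hypothetical names.

```python
import pandas as pd

# Toy wide-format table: one row per location, one column per date
wide = pd.DataFrame(
    {
        "Province/State": ["Region X", None],
        "Country/Region": ["Country 1", "Country 2"],
        "Lat": [10.0, 20.0],
        "Long": [30.0, 40.0],
        "1/22/20": [5, 0],
        "1/23/20": [8, 1],
    }
)

id_cols = ["Province/State", "Country/Region", "Lat", "Long"]
date_cols = [c for c in wide.columns if c not in id_cols]

# Unpivot: each date column becomes a (Date, Confirmed) pair per location
long_df = pd.melt(wide, id_vars=id_cols, value_vars=date_cols,
                  var_name="Date", value_name="Confirmed")
long_df["Date"] = pd.to_datetime(long_df["Date"], format="%m/%d/%y")

# Sanity check: one output row per location per date
assert len(long_df) == len(wide) * len(date_cols)
print(long_df)
```

The row-count check is a cheap way to confirm that no date columns were dropped or duplicated during the melt.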