# Covid-19 examples

The data comes from The New York Times and is available at https://github.com/nytimes/covid-19-data.
The file used here contains cumulative state-level US data.

In [1]:
from chart_ipynb.chart_framework import ChartSuperClass
from chart_ipynb import utils, line, time_series, bar, scatter, bubble
import numpy as np
import pandas as pd

In [2]:
path = '../data/us-states.csv'
data = pd.read_csv(path)

In [3]:
data.head()

Unnamed: 0,date,state,fips,cases,deaths
0,2020-01-21,Washington,53,1,0
1,2020-01-22,Washington,53,1,0
2,2020-01-23,Washington,53,1,0
3,2020-01-24,Illinois,17,1,0
4,2020-01-24,Washington,53,1,0


First, we define a function, `get_state_data`, that extracts the rows for one or more states, optionally restricted to a date range.

In [4]:
def get_state_data(data, state, start=None, end=None):
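    """
    Return a dict mapping each requested state to its rows in `data`.

    data: pd.DataFrame with 'date' (ISO 'YYYY-MM-DD' strings) and 'state' columns
    state: a str naming one state, or a list of state names
    start, end: inclusive date bounds; default to the dates of the first and last rows
    """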
    if start is None:
        start = data.iloc[0].date
    if end is None:
        end = data.iloc[-1].date
    data = data[(data.date >= start) & (data.date <= end)]
    states = data.reset_index().groupby('state')
    if isinstance(state, str):
        state = [state]
    state_data = dict()
    for s in state:
        idx = states.groups[s]
        state_data[s] = data.iloc[idx].reset_index()
    return state_data
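
To see what kind of result `get_state_data` produces, here is a minimal sketch on a small made-up frame. The rows and the helper name `split_by_state` are illustrative assumptions, not part of the notebook or the NYTimes data. It uses a plain boolean mask and relies on the fact that ISO-formatted date strings (`YYYY-MM-DD`) sort chronologically when compared as strings. Unlike the function above, it takes the default bounds from the minimum and maximum dates rather than the first and last rows, so it does not depend on the file being sorted.

```python
import pandas as pd

# Made-up rows for illustration only; not real NYTimes figures.
toy = pd.DataFrame({
    'date':  ['2020-03-01', '2020-03-02', '2020-03-02', '2020-03-03', '2020-03-03'],
    'state': ['New York', 'New York', 'New Jersey', 'New York', 'New Jersey'],
    'cases': [1, 3, 1, 6, 2],
})

def split_by_state(df, states, start=None, end=None):
    """Hypothetical helper: return {state: rows for that state within [start, end]}."""
    if isinstance(states, str):
        states = [states]
    # Missing bounds default to the earliest and latest dates in the frame.
    start = df['date'].min() if start is None else start
    end = df['date'].max() if end is None else end
    in_range = df[(df['date'] >= start) & (df['date'] <= end)]
    return {s: in_range[in_range['state'] == s].reset_index(drop=True) for s in states}

toy_result = split_by_state(toy, ['New York', 'New Jersey'], start='2020-03-02')
print(toy_result['New York'])
```

Each value in the returned dict is a DataFrame for a single state, which is the shape the plotting calls below expect in `input_dataset`.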

## Covid-19 Cases by States

We can compare the cumulative number of Covid-19 cases in New York and New Jersey from March 1st, 2020 onward.

In [5]:
state = ['New York', 'New Jersey']
start = '2020-03-01'
state_data = get_state_data(data, state, start = start)
input_dataset = [state_data[s] for s in state]

`time_series_Chart` supports two types of charts: line and bar. More details about the time series function can be found [here](https://github.com/AaronWatters/Chart_ipynb/blob/master/notebooks/time%20series%20example.ipynb).

### Line Chart

In [6]:
time_series.time_series_Chart('line', state, 'cases', date_col = 'date', colors = ['red', 'blue'], 
                           data_provide = True, title = 'Covid-19 Cases - line chart',
                           input_dataset = input_dataset,
                           multi_axis = True)

Line(status='deferring flush until render')

Both states reported their first cases in early March, and the case count in each rose above one hundred thousand within the next two months.

### Bar Chart

In [7]:
time_series.time_series_Chart('bar', state, 'cases', date_col = 'date', colors = ['red', 'blue'], 
                               data_provide = True, title = 'Covid-19 Cases - bar chart',
                               input_dataset = input_dataset, 
                               stacked = True)

Bar(status='deferring flush until render')

From the stacked bar chart, we can see that the number of Covid-19 cases in New York State doubled.

## Daily Increase in Cases by States

Next, we look at the daily increase in cases for New York and New Jersey.

The following function computes the daily increase in cases and deaths from the cumulative counts.

In [8]:
def get_daily_increase(data, state, start=None, end=None):
    df = get_state_data(data, state, start=start, end=end)
    daily_df = {}
    for i in df:
        temp = df[i].set_index('date')[['cases','deaths']].diff().reset_index().fillna(0)
        daily_df[i] = temp
    return daily_df
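
The key step in `get_daily_increase` is `diff()`, which turns cumulative totals into day-over-day changes. A minimal sketch on made-up cumulative counts (the numbers are illustrative only):

```python
import pandas as pd

# Made-up cumulative counts for one state; illustration only.
cumulative = pd.DataFrame({
    'date':   ['2020-03-01', '2020-03-02', '2020-03-03', '2020-03-04'],
    'cases':  [1, 3, 6, 10],
    'deaths': [0, 0, 1, 1],
})

# diff() subtracts the previous row from each row; the first row has no
# predecessor and becomes NaN, which fillna(0) replaces with 0.
daily = (cumulative.set_index('date')[['cases', 'deaths']]
         .diff()
         .fillna(0)
         .reset_index())
print(daily)
```

Note that the first day in the window is reported as 0 rather than as that day's actual new cases, because no earlier row is available to subtract.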

In [9]:
daily_data = get_daily_increase(data, state, start=start, end=None)
input_dataset = [daily_data[s] for s in state]

In [10]:
time_series.time_series_Chart('line', state, 'cases', date_col = 'date', colors = ['red', 'blue'], 
                               data_provide = True, title = 'Covid-19 Daily Increase in Cases - line chart',
                               input_dataset = input_dataset)

Line(status='deferring flush until render')

From the end of March to the beginning of May, the daily increase in cases exceeded 4,000. On April 7th, the daily increase reached 12,000.

## Cases and Deaths - Population

To put the case and death counts in context, we use an additional dataset of population by state, which can be found [here](https://worldpopulationreview.com/states/).

In [11]:
pop = pd.read_csv('../data/us-population.csv')
population = pop.set_index('State').to_dict('dict')['Pop'] #key: state; value: population

In [12]:
end_date = '2020-06-24'

covid_bubble = bubble.Bubble()

k=0
for s in population:
    df = data[(data.state==s)&(data.date==end_date)]
    x = round(float(df['cases'].values[0]/population[s]*100),2)
    y = round(float(df['deaths'].values[0]/population[s]*100),2)
    r = round(float(np.log(population[s])),2)
    dataset = [{'x':x, 'y': y}]
    covid_bubble.add_dataset(
                            dataset, s,
                            radius = r,
                            backgroundColor = utils.color_rgb(utils.color_name[k], 0.5),
                            borderColor = utils.color_rgb(utils.color_name[k]),
                            hoverBackgroundColor = 'transparent',
                            hoverBorderColor = utils.color_rgb(utils.color_name[k])
                            )
    k+=1
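
Each bubble's coordinates are cases and deaths expressed as a percentage of the state's population, and its radius is the natural log of the population, which keeps large and small states within a similar size range. A sketch of that arithmetic with made-up numbers (the function name `bubble_point` is hypothetical and only mirrors the loop above):

```python
import math

def bubble_point(cases, deaths, population):
    """Hypothetical helper mirroring the arithmetic in the loop above."""
    x = round(cases / population * 100, 2)   # cases as a % of population
    y = round(deaths / population * 100, 2)  # deaths as a % of population
    r = round(math.log(population), 2)       # natural log keeps radii comparable
    return x, y, r

# Made-up inputs for illustration only.
point = bubble_point(cases=50_000, deaths=3_000, population=10_000_000)
print(point)
```

Because both ratios are multiplied by 100, the axis values in the chart read as percentages of the population.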

In [13]:
covid_bubble.options.update({'scales':{
    'xAxes':[utils.axes(
                            display = True,
                            scaleLabel = {
                                'display': True,
                                'labelString': 'Positive test/Population'
                                }
                        )],
    'yAxes':[utils.axes(
                            display = True,
                            scaleLabel = {
                                'display': True,
                                'labelString': 'Deaths/Population'
                                }
                        )]
    }})
covid_bubble.set_title("Covid-19 Deaths/Population vs. Cases/Population")
covid_bubble.setup(width=1600)
covid_bubble

Bubble(status='deferring flush until render')