# Analysis of COVID-19 data
With help from Carmen and her associates!

In [2]:
from pathlib import Path
from process_covid import (load_covid_data,
                           cases_per_population_by_age,
                           hospital_vs_confirmed,
                           create_confirmed_plot,
                           count_high_rain_low_tests_days)

The data for each area is held in a specific file. Start by loading it in.

In [3]:
data_directory = Path("covid_data")
data_file = "ER-Mi-EV_2020-03-16_2020-04-24.json"
data_er = load_covid_data(data_directory / data_file)

NotImplementedError: 

And now I can use this variable to do my different analyses.

First, I want to see how the number of cases has changed across time, but separated into age groups. This will help me find age-dependent patterns in the spread of the virus.

In [None]:
cases_population = cases_per_population_by_age(data_er)
cases_population.get('0-24', "No data in that bin")
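To make the expected result concrete, here is a minimal sketch of how cases per population could be computed for each age bin. The input layout (`daily_cases` keyed by date, `population` keyed by age bin) and the function name `cases_ratio_by_age` are assumptions made for illustration; they are not the actual structure of the JSON file or the `process_covid` implementation.

```python
def cases_ratio_by_age(daily_cases, population):
    """Return {age_bin: [(date, percent_of_population), ...]}.

    daily_cases: hypothetical layout {date: {age_bin: confirmed or None}}
    population:  hypothetical layout {age_bin: number of people}
    """
    result = {}
    for date in sorted(daily_cases):
        for age_bin, count in daily_cases[date].items():
            people = population.get(age_bin)
            # Skip missing counts and bins without a usable population.
            if count is None or not people:
                continue
            percent = 100 * count / people
            result.setdefault(age_bin, []).append((date, percent))
    return result


# Made-up numbers, only to show the shape of the output.
example_cases = {
    "2020-03-16": {"0-24": 5, "25-49": 10},
    "2020-03-17": {"0-24": 10, "25-49": None},
}
example_population = {"0-24": 1000, "25-49": 2000}
example_result = cases_ratio_by_age(example_cases, example_population)
print(example_result.get("0-24", "No data in that bin"))
print(example_result.get("75-", "No data in that bin"))
```

Returning a plain dictionary is what makes the `.get(..., "No data in that bin")` pattern above work for bins that are absent.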

I am also interested in how many cases end up in hospital. Specifically, I want to look at the ratio
$$\frac{\textrm{people hospitalised}}{\textrm{confirmed cases}}$$
and how it changes over time.

I haven't decided what exactly I'll do with it yet, so for now I only want to get two lists: one with the dates on which the ratio is computed, and another with its corresponding values.

In [None]:
hosp_conf_dates, hosp_conf_ratio = hospital_vs_confirmed(data_er)
for date, ratio in zip(hosp_conf_dates[:5], hosp_conf_ratio[:5]):
    print(f" {date}: {ratio:.2f}")
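As a rough sketch of the calculation behind those two lists, the ratio can be built from three parallel lists. The function name `ratio_over_time` and the decision to skip days with missing values or zero confirmed cases are assumptions, not confirmed behaviour of `hospital_vs_confirmed`.

```python
def ratio_over_time(dates, hospitalised, confirmed):
    """Return (dates, ratios) for days where the ratio is defined."""
    out_dates, out_ratios = [], []
    for date, hosp, conf in zip(dates, hospitalised, confirmed):
        # Assumption: days with missing data or no confirmed cases are dropped.
        if hosp is None or conf is None or conf == 0:
            continue
        out_dates.append(date)
        out_ratios.append(hosp / conf)
    return out_dates, out_ratios


# Made-up numbers for illustration only.
demo_dates, demo_ratios = ratio_over_time(
    ["2020-03-16", "2020-03-17", "2020-03-18"],
    [1, None, 3],
    [4, 10, 0],
)
```

Dropping undefined days keeps both lists the same length, so they can be passed straight to `zip` or to a plotting call.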

Plots will be crucial for getting the information across efficiently. Carmen says that this one function is flexible enough to process the data in different ways. One thing I want to see is the evolution of confirmed cases grouped by the patient's sex. This command should plot two lines, one each for male and female:

In [None]:
create_confirmed_plot(data_er, sex=True)

I also want to break the cases down by age instead of sex. In particular, I want to see the cases involving people
- up to age 15 (or the age bin they belong to);
- up to age 37;
- and up to age 99

all in the same plot.

In [None]:
create_confirmed_plot(data_er, max_ages=[15, 37, 99])
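The phrase "or the age bin they belong to" suggests that a threshold falling inside a bin pulls in the whole bin. A minimal sketch of that selection follows, assuming bin labels of the form `"lo-hi"` (with an open-ended last bin such as `"75-"`); the helper name `bins_up_to` and the example bins are hypothetical.

```python
def bins_up_to(max_age, age_bins):
    """Select every bin whose lower bound does not exceed max_age."""
    selected = []
    for label in age_bins:
        # "25-49" -> 25, and an open-ended "75-" -> 75.
        lower = int(label.split("-")[0])
        if lower <= max_age:
            selected.append(label)
    return selected


# Hypothetical bins; the real ones depend on the data file.
example_bins = ["0-24", "25-49", "50-74", "75-"]
for max_age in [15, 37, 99]:
    print(max_age, bins_up_to(max_age, example_bins))
```

Under these assumptions, a threshold of 15 selects only the first bin, so that line would effectively show cases for everyone up to the bin's upper edge.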

Finally, I want to see if the weather affects how likely people are to get tested. To simplify, I'll consider a day to be "rainy" if it rained more than on the previous day. On what fraction of those rainy days were fewer tests carried out than on the previous day? Because the data will be noisy, I first want to smooth it by replacing each value with the average of the values in a 7-day window around it. Then I will use the smoothed values for this calculation instead of the originals.

Carmen says that this one line should do all that:

In [None]:
ratio = count_high_rain_low_tests_days(data_er)
print(f"{ratio * 100:.2f}% of rainy days had fewer tests than the previous day")
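The smoothing and counting steps can be sketched as below. This assumes a centred window that is truncated at the ends of the series and that both rainfall and test counts are smoothed before comparison; the names `smooth` and `rainy_low_test_ratio` are illustrative and not part of `process_covid`.

```python
def smooth(values, window=7):
    """Centred moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def rainy_low_test_ratio(rain, tests, window=7):
    """Fraction of rainy days on which fewer tests were done than the day before."""
    rain_s = smooth(rain, window)
    tests_s = smooth(tests, window)
    rainy_days = 0
    fewer_tests = 0
    for i in range(1, len(rain_s)):
        # A day is "rainy" if it rained more than the previous day.
        if rain_s[i] > rain_s[i - 1]:
            rainy_days += 1
            if tests_s[i] < tests_s[i - 1]:
                fewer_tests += 1
    # Assumption: report 0.0 when there are no rainy days at all.
    return fewer_tests / rainy_days if rainy_days else 0.0
```

How the edges are handled (truncating, padding, or dropping the first and last three days) changes the result slightly, so that choice is worth stating wherever the number is reported.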

Let's see what works!