# COVID-19 Analysis

### Step 1: Explore the modeled tables

I'm interested in analyzing some of the data out of China. Let's see the different ways the country is represented.

In [4]:
%%bigquery
select distinct country
from covid_19_model.Location
where country like '%China%'

Unnamed: 0,country
0,Mainland China
1,China


Using the information above, I can grab all the records related to China under both names. Let's look at the cumulative totals for confirmed cases, deaths, and recoveries for each day by joining the Location and Event tables.

In [5]:
%%bigquery
select cast(last_update as date) as date, sum(confirmed) as confirmed, sum(deaths) as deaths, sum(recovered) as recovered
from covid_19_model.Event e 
join covid_19_model.Location l on e.location_id = l.id
where country in ('Mainland China', 'China')
group by date
order by date desc

Unnamed: 0,date,confirmed,deaths,recovered
0,2020-05-01,83956,4637,78523
1,2020-04-30,83944,4637,78474
2,2020-04-29,83940,4637,78422
3,2020-04-28,5427,39,4816
4,2020-04-27,75726,4560,70084
...,...,...,...,...
91,2020-01-26,2062,56,49
92,2020-01-25,1399,42,39
93,2020-01-24,916,26,36
94,2020-01-23,639,18,30


I'm also interested in data from the US. Let's see how the country is represented.

In [6]:
%%bigquery
select distinct country
from covid_19_model.Location
where country like 'U%'

Unnamed: 0,country
0,UK
1,Uzbekistan
2,Ukraine
3,Uruguay
4,Uganda
5,US
6,United Kingdom
7,United Arab Emirates
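
The two listings above show that some countries appear under more than one name ('Mainland China' and 'China', 'UK' and 'United Kingdom'). As a rough sketch of how those spellings could be tracked in one place, the snippet below keeps a small alias map. The `COUNTRY_ALIASES` table and both helper functions are hypothetical names introduced here for illustration; they are not part of the dataset or of BigQuery, and the choice of canonical name is an assumption.

```python
# Map each spelling seen in the Location table to one canonical name.
# Assumption: 'UK' and 'United Kingdom' refer to the same country, as do
# 'Mainland China' and 'China'. The canonical names are an illustrative choice.
COUNTRY_ALIASES = {
    "Mainland China": "China",
    "China": "China",
    "UK": "United Kingdom",
    "United Kingdom": "United Kingdom",
}


def canonical_country(name):
    """Return the canonical name, or the input unchanged if it has no alias."""
    return COUNTRY_ALIASES.get(name, name)


def variants_for(canonical):
    """Return every raw spelling that maps to the given canonical name, sorted."""
    return sorted(raw for raw, canon in COUNTRY_ALIASES.items() if canon == canonical)


print(canonical_country("Mainland China"))
print(variants_for("China"))
print(canonical_country("US"))
```

The list returned by `variants_for` is what would go into the `country in (...)` filter of a query, so a newly discovered spelling only needs to be added to the map once.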


In [7]:
%%bigquery
select cast(last_update as date) as date, sum(confirmed) as confirmed, sum(deaths) as deaths, sum(recovered) as recovered
from covid_19_model.Event e 
join covid_19_model.Location l on e.location_id = l.id
where country in ('US')
group by date
order by date desc

Unnamed: 0,date,confirmed,deaths,recovered
0,2020-05-01,407238,30223.0,153947.0
1,2020-04-30,396648,29635.0,120720.0
2,2020-04-29,386468,28637.0,115936.0
3,2020-04-28,372109,27521.0,111424.0
4,2020-04-27,370300,27465.0,106988.0
...,...,...,...,...
73,2020-01-26,5,,
74,2020-01-25,2,,
75,2020-01-24,2,,
76,2020-01-23,1,,


We can check out Italy as well, since that is also a point of interest.

In [8]:
%%bigquery
select cast(last_update as date) as date, sum(confirmed) as confirmed, sum(deaths) as deaths, sum(recovered) as recovered
from covid_19_model.Event e 
join covid_19_model.Location l on e.location_id = l.id
where country in ('Italy')
group by date
order by date desc

Unnamed: 0,date,confirmed,deaths,recovered
0,2020-05-01,205463,27967,75945
1,2020-04-30,203591,27682,71252
2,2020-04-29,201505,27359,68941
3,2020-04-28,199414,26977,66624
4,2020-04-27,197675,26644,64928
...,...,...,...,...
56,2020-02-23,155,3,2
57,2020-02-22,62,2,1
58,2020-02-21,20,1,0
59,2020-02-07,3,0,0
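
Since these figures are cumulative totals, the number of new cases per day is the difference between consecutive days. The sketch below works that out for the five most recent Italy rows shown above. It assumes the per-day sums really are cumulative and that no dates are missing in between; `daily_increase` is a hypothetical helper written for this example.

```python
# Cumulative confirmed counts for Italy, copied from the query output above
# (five most recent days, newest first).
italy_confirmed = [
    ("2020-05-01", 205463),
    ("2020-04-30", 203591),
    ("2020-04-29", 201505),
    ("2020-04-28", 199414),
    ("2020-04-27", 197675),
]


def daily_increase(rows):
    """Given (date, cumulative) pairs sorted newest first, return (date, new_cases) pairs.

    The oldest row has no predecessor, so it produces no result.
    """
    increases = []
    for (day, total), (_, previous_total) in zip(rows, rows[1:]):
        increases.append((day, total - previous_total))
    return increases


for day, new_cases in daily_increase(italy_confirmed):
    print(day, new_cases)
# 2020-05-01 1872
# 2020-04-30 2086
# 2020-04-29 2091
# 2020-04-28 1739
```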


Let's circle back to the US and look at some state data.

In [40]:
%%bigquery
select state, cast(last_update as date) as date, sum(confirmed) as confirmed, sum(deaths) as deaths, sum(recovered) as recovered
from covid_19_model.Event e 
join covid_19_model.Location l on e.location_id = l.id
where country = 'US' and state in ('Texas', 'New York', 'Washington')
group by date, state
order by date desc, state
limit 12

Unnamed: 0,state,date,confirmed,deaths,recovered
0,New York,2020-05-01,167478,18069,0
1,Texas,2020-05-01,6356,114,0
2,Washington,2020-05-01,6207,447,0
3,New York,2020-04-30,164841,18076,0
4,Texas,2020-04-30,6161,109,0
5,Washington,2020-04-30,6103,438,0
6,New York,2020-04-29,162338,17682,0
7,Texas,2020-04-29,5827,98,0
8,Washington,2020-04-29,6001,429,0
9,New York,2020-04-28,160489,17515,0


### Step 2: Create views so that we can access them in Data Studio

In [46]:
%%bigquery
create or replace view covid_19_model.v_china_cases as
select cast(last_update as date) as date, sum(confirmed) as confirmed, sum(deaths) as deaths, sum(recovered) as recovered
from `arcane-footing-266618.covid_19_model.Event` e join `arcane-footing-266618.covid_19_model.Location` l
on e.location_id = l.id
where country in ('Mainland China', 'China') 
group by date
order by date desc

In [47]:
%%bigquery
create or replace view covid_19_model.v_italy_cases as
select cast(last_update as date) as date, sum(confirmed) as confirmed, sum(deaths) as deaths, sum(recovered) as recovered
from `arcane-footing-266618.covid_19_model.Event` e join `arcane-footing-266618.covid_19_model.Location` l
on e.location_id = l.id
where country in ('Italy') 
group by date
order by date desc

In [48]:
%%bigquery
create or replace view covid_19_model.v_us_cases as
select cast(last_update as date) as date, sum(confirmed) as confirmed, sum(deaths) as deaths, sum(recovered) as recovered
from `arcane-footing-266618.covid_19_model.Event` e join `arcane-footing-266618.covid_19_model.Location` l
on e.location_id = l.id
where country in ('US') 
group by date
order by date desc
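
The three view definitions above differ only in the view name and the country list. As a sketch of one way to avoid the repetition, the snippet below builds the same statement text from a small table of views. `build_view_sql` and `VIEWS` are hypothetical names for this example; the code only produces SQL strings and does not run anything against BigQuery.

```python
PROJECT = "arcane-footing-266618"  # project ID as written in the view definitions above
DATASET = "covid_19_model"


def build_view_sql(view_name, countries):
    """Return a 'create or replace view' statement for the given country spellings."""
    for name in countries:
        # Keep the sketch simple: refuse names that would need escaping in a SQL literal.
        if "'" in name or "\\" in name:
            raise ValueError(f"unsupported character in country name: {name!r}")
    country_list = ", ".join(f"'{name}'" for name in countries)
    return (
        f"create or replace view {DATASET}.{view_name} as\n"
        "select cast(last_update as date) as date, sum(confirmed) as confirmed, "
        "sum(deaths) as deaths, sum(recovered) as recovered\n"
        f"from `{PROJECT}.{DATASET}.Event` e join `{PROJECT}.{DATASET}.Location` l\n"
        "on e.location_id = l.id\n"
        f"where country in ({country_list})\n"
        "group by date\n"
        "order by date desc"
    )


# View name -> country spellings, matching the three views created above.
VIEWS = {
    "v_china_cases": ["Mainland China", "China"],
    "v_italy_cases": ["Italy"],
    "v_us_cases": ["US"],
}

for view_name, countries in VIEWS.items():
    print(build_view_sql(view_name, countries))
    print()
```

Each printed statement can be pasted into its own `%%bigquery` cell, so adding a view for another country means adding one entry to `VIEWS`.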

### Step 3: Explore the data in Data Studio

To explore this data, go into the BigQuery interface and click on one of the newly created views. Click on "Export" >> "Explore with Data Studio". The view defaults to a table, but you can use the panel on the right to switch it to a time series chart. Under Metric, drag in "confirmed", "deaths", and "recovered", and take out "Record Count".

Looking at the chart for China, I saw a large drop in the confirmed, deaths, and recovered counts in late March. To see what was going on, I queried that date range directly below. It looks like only a small number of records were logged on those days.

In [50]:
%%bigquery
select cast(last_update as date) as date, sum(confirmed) as confirmed, sum(deaths) as deaths, sum(recovered) as recovered
from covid_19_model.Event e 
join covid_19_model.Location l on e.location_id = l.id
where country in ('Mainland China', 'China') and last_update > '2020-03-18T02:32:29' and last_update < '2020-03-31T02:32:29'
group by date
order by date desc

Unnamed: 0,date,confirmed,deaths,recovered
0,2020-03-31,70960,3198,66182
1,2020-03-30,78087,3245,71873
2,2020-03-29,4590,64,4466
3,2020-03-26,1654,34,1594
4,2020-03-24,672,24,648
5,2020-03-21,73602,3179,64253
6,2020-03-20,74902,3194,64987
7,2020-03-19,73609,3183,63069
8,2020-03-18,70924,3161,59742
