# Our World in Data analysis notebook

This notebook contains analysis done largely for CBC News and the COVID Brief newsletter. It uses [one dataset from Our World in Data](https://ourworldindata.org/explorers/coronavirus-data-explorer?zoomToSelection=true&time=2020-03-01..latest&country=USA~GBR~CAN~DEU~ITA~IND&region=World&pickerMetric=location&pickerSort=asc&Interval=7-day+rolling+average&Relative+to+Population=true&Metric=Confirmed+cases&Color+by+test+positivity=false).

## IMPORT - Modules and raw data

This section imports the libraries used for analysis and loads the data directly from OWID.

First, we'll import pandas and numpy, both of which I use in the analysis below.

In [2]:
import pandas as pd
import numpy as np

Read in the data we're using straight from OWID's servers.

In [3]:
raw = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv')
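
The cells below filter on `date` as a string (for example `>= "2021-06-01"`), which works because ISO-formatted dates sort correctly as text. As a minimal sketch, assuming a real datetime column were wanted instead, the dates could be parsed at load time. The inline CSV is made-up sample data, not OWID's, and the rest of this notebook keeps `date` as strings.

```python
import io
import pandas as pd

# Made-up sample rows standing in for the OWID CSV (illustration only).
sample_csv = io.StringIO(
    "location,date,new_cases\n"
    "Canada,2021-05-31,100\n"
    "Canada,2021-06-01,120\n"
    "Canada,2021-06-02,90\n"
)

# parse_dates converts the column to datetime64 while reading.
sample = pd.read_csv(sample_csv, parse_dates=["date"])

# Comparing a datetime column against a date string still works.
recent = sample[sample["date"] >= "2021-06-01"]
print(recent)
```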

## 2021/11/15 - Germany deaths

In [None]:
germany = raw[raw["location"].isin(["Germany"])]
germany = germany[["iso_code", "date", "total_deaths"]]

## 2021/11/15 - Which countries have surpassed 100K deaths?

Find the date each country passed the 100K deaths mark, and pair it with the world's total deaths on that date.

In [None]:
thousand_deaths = (raw[raw["total_deaths"] >= 100000]
                   .sort_values("date", ascending=True)
                   .drop_duplicates('iso_code')
                   )
thousand_deaths = thousand_deaths[~thousand_deaths["iso_code"].str.contains("OWID")]
thousand_deaths = thousand_deaths[["iso_code", "location", "date", "total_deaths"]]

dates = thousand_deaths['date'].to_list()

world_deaths = raw[raw["location"] == "World"]
world_deaths = world_deaths[world_deaths["date"].isin(dates)]

world_deaths = world_deaths[["date", "total_deaths"]]

export = thousand_deaths.merge(world_deaths, on="date")
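
The filter, sort and de-duplicate pattern used above can be checked on a small frame. This is a sketch on made-up numbers; `THRESHOLD` and the `AAA`/`BBB` codes are hypothetical names for illustration.

```python
import pandas as pd

# Made-up cumulative death counts for two fictional countries.
toy = pd.DataFrame({
    "iso_code": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "date": ["2021-01-01", "2021-01-02", "2021-01-03"] * 2,
    "total_deaths": [99_000, 100_500, 101_000, 50_000, 99_999, 100_000],
})

THRESHOLD = 100_000  # hypothetical constant name

# Keep rows at or above the threshold, then keep the earliest date per country.
first_crossing = (toy[toy["total_deaths"] >= THRESHOLD]
                  .sort_values("date")
                  .drop_duplicates("iso_code")
                  .set_index("iso_code"))
print(first_crossing)
```

Because cumulative totals rarely decrease, the earliest row at or above the threshold is the crossing date.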

## 2021/11/15 - Deaths around the world

In [None]:
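world_timeline = raw[raw["location"] == "World"]
# Drop days without a death count before casting; assigning a dropna() result back to the column would re-align on the index and restore the NaNs.
world_timeline = world_timeline[["date", 'total_deaths']].dropna(subset=["total_deaths"]).copy()
world_timeline["total_deaths"] = world_timeline["total_deaths"].astype(int)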

## 2021/11/25 - PAHO countries

A look at how PAHO countries are faring.

In [None]:
paho_countries = [
    "United States",
    "Brazil",
    "Argentina",
    "Colombia",
    "Mexico",
    "Peru",
    "Canada",
    "Chile",
    "Cuba",
    "Guatemala",
    "Costa Rica",
    "Bolivia",
    "Ecuador",
    "Panama",
    "Paraguay",
    "Venezuela",
    "Dominican Republic",
    "Uruguay",
    "Honduras",
    "Puerto Rico"
]

paho = raw[raw["location"].isin(paho_countries)]

paho = (paho[['iso_code', 'location', 'date', 'total_cases', 'total_deaths']]
        .sort_values(["location", "date"], ascending=False)
        .drop_duplicates("location")
        )
paho["CFR %"] = paho["total_deaths"] / paho["total_cases"] * 100
paho = paho.sort_values('CFR %', ascending=False)
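
The `CFR %` column above is the crude case fatality rate: cumulative deaths divided by cumulative confirmed cases, times 100, taken from each country's most recent row. A minimal sketch on made-up numbers (the locations `A` and `B` are hypothetical):

```python
import pandas as pd

# Made-up cumulative counts for two fictional locations.
toy = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "date": ["2021-11-24", "2021-11-25", "2021-11-24", "2021-11-25"],
    "total_cases": [1000, 1100, 400, 500],
    "total_deaths": [20, 22, 4, 15],
})

# Keep only the latest row per location.
latest = (toy.sort_values(["location", "date"])
          .drop_duplicates("location", keep="last")
          .copy())

# Crude CFR: deaths as a percentage of confirmed cases.
latest["CFR %"] = latest["total_deaths"] / latest["total_cases"] * 100
latest = latest.sort_values("CFR %", ascending=False)
print(latest)
```

Since the ratio depends on how many cases each country confirms, differences in testing can move it as much as differences in outcomes.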

## 2021/12/14 - Belgium and the Netherlands

In [None]:
cols = ['location', 'date', 'new_cases', 'new_cases_7day', 'new_deaths', 'new_deaths_7day', 'total_cases_per_million', "total_deaths_per_million"]

# Store each cleaned frame; reassigning the loop variable alone would discard the result.
countries = {}
for name in ["Belgium", "Netherlands"]:
  country = raw[raw["location"] == name].copy()  # copy avoids SettingWithCopyWarning
  country["new_deaths_7day"] = country["new_deaths"].rolling(7).mean()
  country["new_cases_7day"] = country["new_cases"].rolling(7).mean()
  countries[name] = country.reset_index(drop=True)[cols].dropna()

belgium = countries["Belgium"]
netherlands = countries["Netherlands"]
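
An alternative to looping over one frame per country is to compute the 7-day average inside a `groupby`, so the window never spans two countries. This is a sketch on made-up numbers, not the method used for the published analysis.

```python
import pandas as pd

# Eight made-up days of case counts for each of two countries.
toy = pd.DataFrame({
    "location": ["Belgium"] * 8 + ["Netherlands"] * 8,
    "date": list(pd.date_range("2021-12-01", periods=8).strftime("%Y-%m-%d")) * 2,
    "new_cases": list(range(10, 90, 10)) + list(range(100, 900, 100)),
})

# transform() returns a result aligned to the original rows,
# with the rolling window restarted for every location.
toy["new_cases_7day"] = (toy.groupby("location")["new_cases"]
                         .transform(lambda s: s.rolling(7).mean()))
print(toy)
```

The first six rows of each country are `NaN` because a full 7-day window is not yet available.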

## 2021/12/16 - Canada compared to world

In [None]:
top_data = raw[raw["location"].isin(["Canada", "United States", "United Kingdom", "France", "Italy", "Japan", "Germany"])]
top_data = top_data[top_data["date"] >= "2021-06-01"]
pivot = pd.pivot(top_data, columns="location", index="date", values="new_cases_per_million").rolling(7).mean()
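
Pivoting to one column per country before calling `rolling` means each country is averaged on its own. A small sketch on made-up values; the window of 2 is only there to keep the toy data short, the cell above uses 7.

```python
import pandas as pd

# Made-up long-format data: one row per location per day.
toy = pd.DataFrame({
    "date": ["2021-06-01", "2021-06-02", "2021-06-03"] * 2,
    "location": ["Canada"] * 3 + ["Japan"] * 3,
    "new_cases_per_million": [10.0, 20.0, 30.0, 1.0, 2.0, 3.0],
})

# Wide format: dates as the index, one column per location.
wide = toy.pivot(index="date", columns="location", values="new_cases_per_million")

# rolling() works down each column independently.
smoothed = wide.rolling(2).mean()
print(smoothed)
```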

## 2021/12/17 - Canada new cases and deaths

In [None]:
canada = raw[raw["location"] == "Canada"]
canada = canada[["date", "new_cases", "new_deaths", "hosp_patients"]].set_index("date")
canada = canada.rolling(7).mean()
max_deaths = canada["new_deaths"].max()
max_hosps = canada["hosp_patients"].max()
max_cases = canada["new_cases"].max()

canada["new_cases"] = canada["new_cases"] / max_cases *100
canada["hosp_patients"] = canada["hosp_patients"] / max_hosps *100
canada["new_deaths"] = canada["new_deaths"] / max_deaths *100
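
Dividing each series by its own maximum puts cases, hospitalizations and deaths on a shared 0 to 100 scale, where 100 is that metric's peak. The three separate divisions above can also be written as one broadcast operation; a sketch on made-up values:

```python
import pandas as pd

# Made-up smoothed values for two metrics.
toy = pd.DataFrame(
    {"new_cases": [50.0, 200.0, 100.0], "new_deaths": [1.0, 2.0, 4.0]},
    index=["2021-12-01", "2021-12-02", "2021-12-03"],
)

# toy.max() is a Series of column maxima; the division aligns on column names.
pct_of_peak = toy / toy.max() * 100
print(pct_of_peak)
```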

## 2022/01/06 - Worldwide new case rates

In [None]:
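# Sorting by the metric and de-duplicating keeps each location's peak day, which is not necessarily the latest date.
today = raw.sort_values("new_cases_per_million", ascending=False).drop_duplicates("location")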
today = today.dropna(subset=["continent"])
today = today.sort_values("new_cases_per_million", ascending=False)
today = today[today["population"] > 1000000]
today.index = np.arange(1, len(today) + 1)

today = today[["location", "date", "new_cases_per_million"]]
canada = today[today["location"] == "Canada"]

combined = pd.concat([today.head(50), canada])  # renamed from "all" to avoid shadowing the built-in

,location,date,new_cases_per_million
1,Spain,2022-01-03,7974.421
2,Denmark,2021-12-27,7058.811
3,Ireland,2022-01-03,6834.769
4,Palestine,2021-10-06,5812.257
5,France,2022-01-05,4917.571
6,Greece,2022-01-04,4838.803
7,Switzerland,2022-01-03,4411.339
8,Sweden,2022-01-04,4229.166
9,Belgium,2021-11-29,4112.33
10,Portugal,2022-01-05,3891.65
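
The dates in the table differ by country because sorting on the metric keeps each location's highest value rather than its most recent one. The sketch below contrasts the two on made-up numbers; `peak` and `latest` are hypothetical names.

```python
import pandas as pd

# Made-up daily rates; location A has no value yet for the last day.
toy = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "date": ["2022-01-04", "2022-01-05", "2022-01-06"] * 2,
    "new_cases_per_million": [900.0, 300.0, None, 100.0, 150.0, 200.0],
})

# Peak: the highest value per location, whatever its date.
peak = (toy.sort_values("new_cases_per_million", ascending=False)
        .drop_duplicates("location")
        .set_index("location"))

# Latest: the most recent non-null value per location.
latest = (toy.dropna(subset=["new_cases_per_million"])
          .sort_values("date")
          .drop_duplicates("location", keep="last")
          .set_index("location"))

print(peak)
print(latest)
```

For cumulative columns such as vaccination coverage the two usually coincide, since the maximum tends to be the latest report.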


In [None]:
g7 = raw[raw["location"].isin(["Canada", "United States", "United Kingdom", "France", "Italy", "Japan", "Germany"])]
today = g7.sort_values("people_vaccinated_per_hundred", ascending=False).drop_duplicates("location")
today = today.reset_index()
today = today[["location", "date", "people_vaccinated_per_hundred"]]

display(today)

,location,date,people_vaccinated_per_hundred
0,Canada,2021-12-22,82.96
1,Italy,2021-12-22,79.56
2,Japan,2021-12-22,79.53
3,France,2021-12-21,77.85
4,United Kingdom,2021-12-21,75.62
5,Germany,2021-12-21,72.95
6,United States,2021-12-22,72.76


## 2022/01/04 - Positive test rate

In [None]:
positivity = raw.dropna(subset=["continent"]).dropna(subset=["positive_rate"])
positivity = positivity.sort_values("date", ascending=False).drop_duplicates("location")
positivity = positivity[["location", "date", "positive_rate"]].sort_values("positive_rate", ascending=False).set_index("location")

## 2022/01/06 - Sweden

In [None]:
today = (raw
         .sort_values("new_deaths_per_million", ascending=False)
         .drop_duplicates("location")
         .dropna(subset=["continent"])
         )
today = today[today["population"] > 1000000]
today.index = np.arange(1, len(today) + 1)

today = today[["location", "date", "new_deaths_per_million"]]
canada = today[today["location"] == "Canada"]

combined = pd.concat([today.head(50), canada])  # renamed from "all" to avoid shadowing the built-in

,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,total_cases_per_million,new_cases_per_million,new_cases_smoothed_per_million,total_deaths_per_million,new_deaths_per_million,new_deaths_smoothed_per_million,reproduction_rate,icu_patients,icu_patients_per_million,hosp_patients,hosp_patients_per_million,weekly_icu_admissions,weekly_icu_admissions_per_million,weekly_hosp_admissions,weekly_hosp_admissions_per_million,new_tests,total_tests,total_tests_per_thousand,new_tests_per_thousand,new_tests_smoothed,new_tests_smoothed_per_thousand,positive_rate,tests_per_case,tests_units,total_vaccinations,people_vaccinated,people_fully_vaccinated,total_boosters,new_vaccinations,new_vaccinations_smoothed,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,total_boosters_per_hundred,new_vaccinations_smoothed_per_million,new_people_vaccinated_smoothed,new_people_vaccinated_smoothed_per_hundred,stringency_index,population,population_density,median_age,aged_65_older,aged_70_older,gdp_per_capita,extreme_poverty,cardiovasc_death_rate,diabetes_prevalence,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
29998,CHN,Asia,China,2020-04-17,82694.0,353.0,112.429,4632.0,1290.0,185.143,57.259,0.244,0.078,3.207,0.893,0.128,1.13,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,56.94,1.444216e+09,147.674,38.7,10.641,5.929,15308.712,0.7,261.899,9.74,1.9,48.4,,4.34,76.91,0.761,,,,
66574,IND,Asia,India,2021-05-18,25496330.0,267334.0,307913.143,283248.0,4529.0,4150.143,18297.807,191.856,220.978,203.277,3.250,2.978,0.84,,,,,,,,,1869223.0,318292881.0,228.427,1.341,1813242.0,1.301,0.1698,5.9,samples tested,185191602.0,144270200.0,40921402.0,,1374398.0,1618423.0,13.29,10.35,2.94,,1161.0,1101836.0,0.079,81.94,1.393409e+09,450.419,28.2,5.989,3.414,6426.674,21.2,282.280,10.39,1.9,20.6,59.550,0.53,69.66,0.645,,,,
149449,USA,North America,United States,2021-01-20,24611999.0,188540.0,194439.571,412893.0,4442.0,3082.857,73928.761,566.331,584.052,1240.235,13.343,9.260,0.85,27003.0,81.111,117143.0,351.871,,,103154.0,309.851,2143113.0,292002519.0,877.108,6.437,1724767.0,5.181,0.1070,9.3,tests performed,22743126.0,18860482.0,3457123.0,,1596076.0,1047775.0,6.85,5.68,1.04,,3156.0,833607.0,0.251,71.76,3.329151e+08,35.608,38.3,15.413,9.732,54225.446,1.2,151.089,10.79,19.1,24.6,,2.77,78.86,0.926,,,,
67341,IDN,Asia,Indonesia,2021-07-27,3239936.0,45203.0,41411.143,86835.0,2069.0,1519.286,11723.531,163.565,149.844,314.208,7.487,5.497,0.98,,,,,,,,,180202.0,17189001.0,62.197,0.652,164697.0,0.596,0.2514,4.0,people tested,63944892.0,45278549.0,18666343.0,,1086694.0,735561.0,23.14,16.38,6.75,,2662.0,419125.0,0.152,71.76,2.763618e+08,145.725,29.3,5.319,3.053,11188.744,5.7,342.864,6.32,2.8,76.1,64.204,1.04,71.72,0.718,,,,
109639,PAK,Asia,Pakistan,2020-11-19,368665.0,2738.0,2338.429,7561.0,313.0,67.000,1637.056,12.158,10.384,33.575,1.390,0.298,1.27,,,,,,,,,36899.0,5055382.0,22.448,0.164,35029.0,0.156,0.0668,15.0,tests performed,,,,,,,,,,,,,,47.69,2.251999e+08,255.573,23.5,4.495,2.780,5034.708,4.0,423.031,8.35,2.8,36.7,59.607,0.60,67.27,0.557,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
105431,NIU,Oceania,Niue,2021-06-21,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,,,,,0.00,0.00,,,,,,,1.614000e+03,,,,,,,,,,,,,73.71,,,,,
142755,TKL,Oceania,Tokelau,2021-06-21,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,,,,,0.00,0.00,,,,,,,1.368000e+03,,,,,,,,,,,,,81.86,,,,,
152373,VAT,Europe,Vatican,2020-03-06,1.0,1.0,,,,,1231.527,1231.527,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,8.120000e+02,,,,,,,,,,,,,75.12,,,,,
114475,PCN,Oceania,Pitcairn,2021-06-15,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,47.0,47.0,,,,,100.00,100.00,,,,,,,4.700000e+01,,,,,,,,,,,,,,,,,,


,location,date,new_deaths_per_million
1,Bolivia,2020-09-07,139.948
2,Kazakhstan,2021-03-21,120.611
3,Kyrgyzstan,2020-07-18,109.68
4,Lesotho,2021-10-01,106.527
5,Argentina,2020-10-01,73.477
6,Bosnia and Herzegovina,2021-03-29,63.43
7,Botswana,2021-08-12,58.818
8,Namibia,2021-07-14,57.975
9,Chile,2020-07-17,55.017
10,Lebanon,2021-01-30,51.853


## 2022/01/06 - Continent/income comparison

In [None]:
continents = (raw
              .sort_values("new_cases_per_million", ascending=False)
              .drop_duplicates("location")
              )
continents = continents[continents["location"].isin(["Africa", "Europe", "Asia", "North America", "South America", "Oceania"])]
continents = continents.sort_values("new_cases_per_million", ascending=False)
continents = continents[continents["population"] > 1000000]
continents.index = np.arange(1, len(continents) + 1)

continents = continents[["location", "date", "new_cases_per_million"]]
# Only continent aggregates remain at this point, so there is no Canada row to append.
combined = continents.head(50)

## 2022/01/12 - Commonwealth countries and the United States

In [None]:
countries = ["Australia", "New Zealand", "United Kingdom", "United States", "Canada"]

subset = raw[raw["location"].isin(countries)]
subset = subset[subset["date"] >= "2021-01-13"]

pivot = (pd.pivot(subset, columns="location", index="date", values="new_cases_per_million")
         .rolling(7)
         .mean()
         )

## 2022/01/14 - Canada

In [None]:
canada = raw[raw["location"] == "Canada"]
canada = canada[["date", "new_cases", "new_deaths", "hosp_patients"]].set_index("date")

## 2022/01/20 - Austria

In [None]:
austria = raw[raw["location"].isin(["Austria", "World"])]
austria = austria[["location", "date", "hosp_patients_per_million"]]
austria = pd.pivot(austria, index="date", columns="location", values="hosp_patients_per_million")

## 2022/01/28 - Sweden and the world over time

Sweden's new cases over time.

In [None]:
sweden = raw[raw["location"] == "Sweden"]
sweden = sweden[["date", "new_cases"]].copy()  # copy so the column assignment below does not warn
sweden["new_cases"] = sweden["new_cases"].rolling(7).mean()
sweden = sweden.set_index("date")

The world, new cases over time.

In [None]:
world = raw[raw["location"] == "World"]
world = world[["date", "new_cases"]].copy()  # copy so the column assignment below does not warn
world["new_cases"] = world["new_cases"].rolling(7).mean()
world = world.set_index("date")

## 2022/01/27 - Ranking continents

This analysis produced [this visualization](https://www.datawrapper.de/_/UpKbt/).

In [5]:
continents = (raw[raw["location"].isin(["Africa", "Europe", "Asia", "North America", "South America", "Oceania"])]
              .pivot_table(columns="location", index="date", values="new_cases_per_million")
              .dropna()
              )
continents.index = pd.to_datetime(continents.index)

continents = continents.groupby([continents.index.year.values,continents.index.month.values]).sum().reset_index()
continents = continents.rename(columns={"level_0": "year", "level_1": "month"})
continents["month"] = continents["year"].astype(str) + "-" + continents["month"].astype(str)

continents = continents.drop(columns=["year"])
continents = continents.melt(id_vars="month")

dates = continents["month"].unique()

ranked = []

for date in dates:
  top10 = continents[continents["month"] == date].sort_values('value', ascending=False)
  top10["rank"] = range(1, len(top10)+1)
  ranked.append(top10)

all_ranked = pd.concat(ranked)

display(all_ranked)

,month,location,value,rank
52,2020-2,Europe,1.917,1
26,2020-2,Asia,1.633,2
104,2020-2,Oceania,0.254,3
78,2020-2,North America,0.057,4
130,2020-2,South America,0.009,5
...,...,...,...,...
77,2022-3,Europe,11884.995,2
155,2022-3,South America,2237.993,3
51,2022-3,Asia,2060.476,4
103,2022-3,North America,1201.804,5
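
The month-by-month loop above can also be expressed with a grouped rank. This is a sketch on made-up values rather than the code that produced the table; `method="first"` breaks ties by row order, which may differ from how the loop's sort orders tied values.

```python
import pandas as pd

# Made-up monthly sums for three locations over two months.
toy = pd.DataFrame({
    "month": ["2020-2", "2020-2", "2020-2", "2020-3", "2020-3", "2020-3"],
    "location": ["Europe", "Asia", "Oceania"] * 2,
    "value": [1.9, 1.6, 0.3, 50.0, 5.0, 20.0],
})

# Rank within each month, highest value first.
toy["rank"] = (toy.groupby("month")["value"]
               .rank(method="first", ascending=False)
               .astype(int))
print(toy.sort_values(["month", "rank"]))
```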


In [None]:
pivot = pd.pivot(all_ranked, columns="month", index="location", values="rank")

display(pivot)
pivot.to_csv('/content/drive/MyDrive/Data/exports/covid/continents_rank.csv')

location,2020-10,2020-11,2020-12,2020-2,2020-3,2020-4,2020-5,2020-6,2020-7,2020-8,2020-9,2021-1,2021-10,2021-11,2021-12,2021-2,2021-3,2021-4,2021-5,2021-6,2021-7,2021-8,2021-9,2022-1
Africa,5,5,5,6,6,6,5,5,5,5,5,4,6,6,5,5,5,5,5,5,5,6,6,6
Asia,4,4,4,2,5,4,4,4,4,4,4,5,5,5,6,4,4,4,3,4,4,4,5,5
Europe,1,1,2,1,1,2,3,3,3,3,3,2,1,1,1,1,1,2,2,2,2,2,2,2
North America,2,2,1,4,2,1,2,2,2,2,2,1,2,2,2,3,3,3,4,3,3,1,1,3
Oceania,6,6,6,3,3,5,6,6,6,6,6,6,3,4,3,6,6,6,6,6,6,5,4,1
South America,3,3,3,5,4,3,1,1,1,1,1,3,4,3,4,2,2,1,1,1,1,3,3,4
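
The columns above run `2020-10, 2020-11, 2020-12, 2020-2, ...` because month labels without a leading zero sort as text, not as dates. A small sketch of two ways around that; `month_key` is a hypothetical helper name.

```python
labels = ["2020-10", "2020-2", "2021-1", "2020-12"]

def month_key(label):
    """Split a 'YYYY-M' label into (year, month) integers for sorting."""
    year, month = label.split("-")
    return int(year), int(month)

# Plain string sort: "2020-10" lands before "2020-2".
lexicographic = sorted(labels)

# Sorting on the numeric key restores calendar order.
chronological = sorted(labels, key=month_key)

# Zero-padded labels sort correctly even as plain strings.
padded = [f"{year}-{month:02d}" for year, month in map(month_key, chronological)]

print(lexicographic)
print(chronological)
print(padded)
```

In the notebook, the padded form could be produced when the `month` column is built, for example with `.astype(str).str.zfill(2)` on the month number.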


## 2022/01/28 - Reuters arrows

[This](https://www.datawrapper.de/_/nDOb1/) is the result of this analysis.

In [None]:
arrows = raw[raw["date"].isin(["2021-01-28", "2021-01-14"])]
arrows = arrows[arrows["population"] > 1000000]
arrows = arrows.dropna(subset=["continent"])
arrows = pd.pivot_table(arrows, index=["location", "continent"], columns="date", values="new_cases_per_million")
arrows["diff"] = arrows["2021-01-28"] - arrows["2021-01-14"]
arrows["diff"] = arrows["diff"].astype(int)
arrows = arrows.reset_index()

data = pd.DataFrame({"countries": ["", "", ""]}, index=["Positives", "Negatives", "No change"])

# .copy() makes each subset an independent frame, so adding the "text" column does not raise SettingWithCopyWarning.
arrows_pos = arrows[arrows['diff'] > 0].copy()
arrows_pos["text"] = arrows_pos["location"] + " (+" + arrows_pos["diff"].astype(int).astype(str) + ")"
data.at["Positives", "countries"] = ', '.join(arrows_pos["text"])

arrows_neg = arrows[arrows['diff'] < 0].copy()
arrows_neg["text"] = arrows_neg["location"] + " (" + arrows_neg["diff"].astype(int).astype(str) + ")"
data.at["Negatives", "countries"] = ', '.join(arrows_neg["text"])

arrows_none = arrows[arrows['diff'] == 0]
data.at["No change", "countries"] = ', '.join(arrows_none["location"])

display(data)
data.to_csv('/content/drive/MyDrive/Data/exports/covid/country_incdec.csv')


,countries
Positives,"Albania (+79), Bahrain (+25), Belgium (+11), B..."
Negatives,"Afghanistan (-1), Angola (-2), Argentina (-83)..."
No change,"Algeria, Australia, Benin, Burundi, Cambodia, ..."
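
The labels in the table pair each country with a signed change. As a sketch on made-up values, Python's `+` format flag produces the sign for both directions, so one expression covers rising and falling countries. Note that the `astype(int)` step in the cell above truncates toward zero, so any change smaller than 1 per million ends up under "No change".

```python
import pandas as pd

# Made-up integer differences for three fictional locations.
toy = pd.DataFrame({
    "location": ["A", "B", "C"],
    "diff": [79, -83, 0],
})

# "{:+d}" prints an explicit sign for positive and negative integers.
toy["text"] = [f"{loc} ({d:+d})"
               for loc, d in zip(toy["location"], toy["diff"].tolist())]

rising = ", ".join(toy.loc[toy["diff"] > 0, "text"])
falling = ", ".join(toy.loc[toy["diff"] < 0, "text"])
unchanged = ", ".join(toy.loc[toy["diff"] == 0, "location"])

print(rising, falling, unchanged, sep="\n")
```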


## 2022/02/07 - COVID brief, booster rates

In [7]:
countries = ["Canada", "United States", "Italy", "France", "Germany", "Spain", "United Kingdom", "Japan", "Israel", "World", "Chile"]

target = raw[raw["location"].isin(countries)]
target = target.sort_values("total_boosters_per_hundred", ascending=False).drop_duplicates("location")
target = target[["location", "date", "total_boosters_per_hundred"]].set_index("location")

display(target)

location,date,total_boosters_per_hundred
Chile,2022-02-03,66.77
Italy,2022-02-06,58.14
Israel,2022-02-06,55.05
United Kingdom,2022-02-05,55.02
Germany,2022-02-04,53.71
France,2022-02-03,48.98
Spain,2022-02-03,47.65
Canada,2022-02-06,42.45
United States,2022-02-04,27.01
World,2022-02-06,12.93


## 2022/03/10 - COVID brief, 6 countries


In [7]:
six = raw[raw["location"].isin(["South Korea", "Hong Kong", "Singapore", "Vietnam", "Malaysia", "Japan"])].copy()
# Average within each location so the 7-day window never spans two countries.
six["new_cases_per_million"] = (six.groupby("location")["new_cases_per_million"]
                                .transform(lambda s: s.rolling(7).mean()))
six = six[six["date"] >= "2022-01-01"]

six = pd.pivot(six, index="date", columns="location", values="new_cases_per_million")
display(six)
six.to_clipboard()


date,Hong Kong,Japan,Malaysia,Singapore,South Korea,Vietnam
2022-01-01,1.645714,2.725571,100.556429,60.405714,88.044571,159.546000
2022-01-02,1.967143,3.011143,101.009714,66.168714,85.029000,162.063429
2022-01-03,2.345429,3.588000,100.717714,70.988571,82.681714,163.611857
2022-01-04,2.969714,4.535429,100.478000,83.483571,80.000286,174.217429
2022-01-05,3.423571,6.906429,98.678000,95.638000,77.477571,178.769286
...,...,...,...,...,...,...
2022-03-05,5705.623000,510.003143,871.721429,3222.275143,4069.271286,1474.636000
2022-03-06,5680.069571,498.980429,884.662000,3194.246429,4267.240714,1620.670143
2022-03-07,5503.862429,483.097857,901.032714,3193.617714,4444.669286,1656.873000
2022-03-08,5425.897000,470.143857,925.597571,2562.841714,4787.686714,1774.266857
