# Our World in Data analysis notebook

This notebook contains analysis done largely for CBC News and the COVID Brief newsletter. It uses [one dataset from Our World in Data](https://ourworldindata.org/explorers/coronavirus-data-explorer?zoomToSelection=true&time=2020-03-01..latest&country=USA~GBR~CAN~DEU~ITA~IND&region=World&pickerMetric=location&pickerSort=asc&Interval=7-day+rolling+average&Relative+to+Population=true&Metric=Confirmed+cases&Color+by+test+positivity=false).

This notebook is less narrative than the others here. It is a collection of separate analyses done to create charts for the newsletter.

First, we'll import pandas and numpy, both of which I use in the analysis below. Then we'll read in the data straight from OWID's servers.

In [2]:
import pandas as pd
import numpy as np

list_of_continents = ["Africa", "Europe", "Asia", "North America", "South America", "Oceania"]

raw = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv')

### 2021/11/15 - Germany deaths

In [4]:
germany = (raw
           .loc[raw["location"].isin(["Germany"]), ["iso_code", "date", "total_deaths"]]
           )

| | iso_code | date | total_deaths |
|---|---|---|---|
| 58457 | DEU | 2020-01-27 | NaN |
| 58458 | DEU | 2020-01-28 | NaN |
| 58459 | DEU | 2020-01-29 | NaN |
| 58460 | DEU | 2020-01-30 | NaN |
| 58461 | DEU | 2020-01-31 | NaN |
| ... | ... | ... | ... |
| 59233 | DEU | 2022-03-13 | 125596.0 |
| 59234 | DEU | 2022-03-14 | 125878.0 |
| 59235 | DEU | 2022-03-15 | 126146.0 |
| 59236 | DEU | 2022-03-16 | 126424.0 |


### 2021/11/15 - Which countries have surpassed 100K deaths?

Find the date on which each country passed the 100K deaths mark, along with the worldwide death toll on that date.

In [19]:
thousand_deaths = (raw
                   .loc[raw["total_deaths"] >= 100000]
                   .sort_values("date", ascending=True)
                   .drop_duplicates('iso_code')
                   .loc[~raw["iso_code"].str.contains("OWID")]
                   .loc[:,["iso_code", "location", "date", "total_deaths"]]
                   )

# Define the crossing dates first, because world_deaths filters on them
dates = thousand_deaths['date'].to_list()

world_deaths = (raw[(raw["location"] == "World") & (raw["date"].isin(dates))]
                .loc[:,["date", "total_deaths"]]
                )

export = thousand_deaths.merge(world_deaths, on="date")

display(export)

| | iso_code | location | date | total_deaths_x | total_deaths_y |
|---|---|---|---|---|---|
| 0 | USA | United States | 2020-05-23 | 100524.0 | 358817.0 |
| 1 | BRA | Brazil | 2020-08-08 | 100657.0 | 761943.0 |
| 2 | IND | India | 2020-10-02 | 100842.0 | 1079012.0 |
| 3 | MEX | Mexico | 2020-11-19 | 100104.0 | 1416252.0 |
| 4 | PER | Peru | 2021-01-25 | 100377.0 | 2209583.0 |
| 5 | GBR | United Kingdom | 2021-01-26 | 100241.0 | 2226994.0 |
| 6 | ITA | Italy | 2021-03-08 | 100103.0 | 2681960.0 |
| 7 | RUS | Russia | 2021-04-08 | 100158.0 | 3000548.0 |
| 8 | FRA | France | 2021-04-15 | 100087.0 | 3088430.0 |
| 9 | COL | Colombia | 2021-06-21 | 100582.0 | 3877879.0 |
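
The filter, sort, then `drop_duplicates` pattern above keeps the earliest row per country at or above the threshold. A minimal sketch on made-up numbers shows the idea; the `toy` frame and the `first_crossing` helper are hypothetical and not part of the notebook.

```python
import pandas as pd

# Made-up cumulative death counts for two hypothetical countries
toy = pd.DataFrame({
    "iso_code": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "date": ["2020-05-01", "2020-05-02", "2020-05-03",
             "2020-05-01", "2020-05-02", "2020-05-03"],
    "total_deaths": [90_000, 100_500, 101_000, 50_000, 99_000, 100_200],
})


def first_crossing(df, threshold):
    """Return the earliest row per iso_code where total_deaths >= threshold."""
    return (df
            .loc[df["total_deaths"] >= threshold]
            .sort_values("date")
            .drop_duplicates("iso_code")
            .reset_index(drop=True)
            )


crossed = first_crossing(toy, 100_000)
print(crossed)
```

Because the dates are ISO-formatted strings, sorting them as text gives chronological order.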


### 2021/11/15 - Deaths around the world

In [5]:
# Drop days without a death count so the column can be cast to int
world_timeline = (raw
                  .loc[raw["location"] == "World", ["date", "total_deaths"]]
                  .dropna(subset=["total_deaths"])
                  .astype({"total_deaths": int})
                  )

display(world_timeline)

| | date | total_deaths |
|---|---|---|
| 166140 | 2020-01-22 | 17 |
| 166141 | 2020-01-23 | 18 |
| 166142 | 2020-01-24 | 26 |
| 166143 | 2020-01-25 | 42 |
| 166144 | 2020-01-26 | 56 |
| ... | ... | ... |
| 166921 | 2022-03-13 | 6044265 |
| 166922 | 2022-03-14 | 6045860 |
| 166923 | 2022-03-15 | 6051360 |
| 166924 | 2022-03-16 | 6058272 |


### 2021/11/25 - PAHO countries

A look at how PAHO countries are faring.

In [2]:
paho_countries = [
    "United States",
    "Brazil",
    "Argentina",
    "Colombia",
    "Mexico",
    "Peru",
    "Canada",
    "Chile",
    "Cuba",
    "Guatemala",
    "Costa Rica",
    "Bolivia",
    "Ecuador",
    "Panama",
    "Paraguay",
    "Venezuela",
    "Dominican Republic",
    "Uruguay",
    "Honduras",
    "Puerto Rico"
]

paho = (raw
        .loc[raw["location"].isin(paho_countries), ['iso_code', 'location', 'date', 'total_cases', 'total_deaths']]
        .sort_values(["location", "date"], ascending=False)
        .drop_duplicates("location")
        )
paho["CFR %"] = paho["total_deaths"] / paho["total_cases"] * 100
paho = paho.sort_values('CFR %', ascending=False)
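
The case fatality rate (CFR) computed above is cumulative deaths divided by cumulative cases, taken from each country's most recent row. A small sketch with invented figures (the `toy` data and the `latest` name are hypothetical) walks through the same steps:

```python
import pandas as pd

# Invented cumulative counts for two hypothetical countries
toy = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "date": ["2021-11-24", "2021-11-25", "2021-11-24", "2021-11-25"],
    "total_cases": [1000, 2000, 400, 500],
    "total_deaths": [10, 30, 20, 25],
})

# Sort newest first, then keep the first (most recent) row per location
latest = (toy
          .sort_values(["location", "date"], ascending=False)
          .drop_duplicates("location")
          .copy()
          )

latest["CFR %"] = latest["total_deaths"] / latest["total_cases"] * 100
latest = latest.sort_values("CFR %", ascending=False).set_index("location")
print(latest["CFR %"])
```

In the real dataset the most recent row can have missing totals, in which case the CFR for that country comes out as NaN.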

### 2021/12/14 - Belgium and the Netherlands

In [None]:
netherlands = raw[raw["location"] == "Netherlands"].copy()
belgium = raw[raw["location"] == "Belgium"].copy()

# Collect the cleaned frames: rebinding the loop variable alone would discard them
countries = []

for country in [belgium, netherlands]:
  country["new_deaths_7day"] = country["new_deaths"].rolling(7).mean()
  country["new_cases_7day"] = country["new_cases"].rolling(7).mean()
  countries.append(country
                   .reset_index().loc[:, ['location', 'date', 'new_cases', 'new_cases_7day', 'new_deaths', 'new_deaths_7day', 'total_cases_per_million', "total_deaths_per_million"]]
                   .dropna()
                   )
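
The per-country smoothing above could also be wrapped in a small helper that returns a new frame instead of modifying its input. This is only a sketch: `add_7day` is a hypothetical name and the data is invented.

```python
import pandas as pd


def add_7day(df, columns):
    """Return a copy of df with a 7-day rolling mean added for each column."""
    out = df.copy()
    for col in columns:
        out[f"{col}_7day"] = out[col].rolling(7).mean()
    return out


# Eight invented days of case counts
toy = pd.DataFrame({
    "date": pd.date_range("2021-12-01", periods=8).strftime("%Y-%m-%d"),
    "new_cases": [7, 14, 21, 28, 35, 42, 49, 56],
})

smoothed = add_7day(toy, ["new_cases"])
print(smoothed)
```

The first six rows of the smoothed column are NaN because a 7-day window needs seven observations before it can produce a value.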

### 2021/12/16 - Canada compared to world

In [11]:
top_g7 = (raw
            .loc[raw["location"].isin([
                "Canada",
                "United States",
                "United Kingdom",
                "France",
                "Italy",
                "Japan",
                "Germany"
                ]) & (raw["date"] >= "2021-06-01"), :]
            .pivot(columns="location", index="date", values="new_cases_per_million")
            .rolling(7).mean()
            .dropna()
            )

### 2021/12/17 - Canada new cases and deaths

In [14]:
canada = (raw.loc[(raw["location"] == "Canada"), ["date", "new_cases", "new_deaths", "hosp_patients"]].set_index("date").rolling(7).mean())

canada["new_cases"] = canada["new_cases"] / canada["new_cases"].max() *100
canada["hosp_patients"] = canada["hosp_patients"] / canada["hosp_patients"].max() *100
canada["new_deaths"] = canada["new_deaths"] / canada["new_deaths"].max() *100
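
Dividing each smoothed series by its own maximum puts cases, hospitalizations and deaths on a shared 0 to 100 scale, where 100 marks that metric's peak. A sketch with invented numbers (the `toy` frame and `pct_of_peak` name are hypothetical) does all columns in one step:

```python
import pandas as pd

# Invented daily values for two metrics
toy = pd.DataFrame(
    {"new_cases": [100.0, 400.0, 200.0], "new_deaths": [2.0, 4.0, 8.0]},
    index=["2021-12-01", "2021-12-02", "2021-12-03"],
)

# toy.max() is a Series of column maxima; division aligns on column names
pct_of_peak = toy / toy.max() * 100
print(pct_of_peak)
```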

### 2022/01/06 - Worldwide new case rates

In [19]:
today = (raw
         .sort_values("new_cases_per_million", ascending=False)
         .drop_duplicates("location")
         .dropna(subset=["continent"])
         .sort_values("new_cases_per_million", ascending=False)
         .loc[(raw["population"] > 1000000), ["location", "date", "new_cases_per_million"]]
         )

today.index = np.arange(1, len(today) + 1)

canada = today[today["location"] == "Canada"]

all = pd.concat([today.head(50), canada])

In [20]:
g7 = (raw
      .loc[raw["location"].isin([
          "Canada",
          "United States",
          "United Kingdom",
          "France",
          "Italy",
          "Japan",
          "Germany"
          ]), ["location", "date", "people_vaccinated_per_hundred"]]
      .sort_values("people_vaccinated_per_hundred", ascending=False)
      .drop_duplicates("location")
      .reset_index()
)

### 2022/01/04 - Positive test rate

In [22]:
positivity = (raw
              .dropna(subset=["continent"])
              .dropna(subset=["positive_rate"])
              .sort_values("date", ascending=False)
              .drop_duplicates("location")
              .loc[:,["location", "date", "positive_rate"]]
              .sort_values("positive_rate", ascending=False)
              .set_index("location")
              )

### 2022/01/06 - Sweden

In [9]:
today = (raw
         .sort_values("new_deaths_per_million", ascending=False)
         .drop_duplicates("location")
         .dropna(subset=["continent"])
         )

# A row mask plus a column list needs .loc; plain [] treats the pair as one tuple key
today = today.loc[today["population"] > 1000000, ["location", "date", "new_deaths_per_million"]]
today.index = np.arange(1, len(today) + 1)

canada = today[today["location"] == "Canada"]

all = pd.concat([today.head(50), canada])
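
Selecting rows by a boolean mask and columns by name in one step needs `.loc`. Passing the pair to plain `[]` makes pandas treat it as a single tuple key, which raises an error (the exact exception class may differ between pandas versions). A sketch on invented data:

```python
import pandas as pd

# Invented populations and death rates for three hypothetical locations
toy = pd.DataFrame({
    "location": ["A", "B", "C"],
    "population": [500_000, 2_000_000, 9_000_000],
    "new_deaths_per_million": [1.0, 2.5, 0.4],
})

mask = toy["population"] > 1_000_000

# Plain [] with (mask, columns) is read as one key and fails
try:
    toy[mask, ["location", "new_deaths_per_million"]]
    plain_indexing_failed = False
except Exception:
    plain_indexing_failed = True

# .loc takes the row selector and the column selector separately
large = toy.loc[mask, ["location", "new_deaths_per_million"]]
print(plain_indexing_failed)
print(large)
```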

### 2022/01/06 - Continent/income comparison

In [6]:
continents = (raw
              .sort_values("new_cases_per_million", ascending=False)
              .drop_duplicates("location")
              .loc[(raw["location"].isin(["Africa", "Europe", "Asia", "North America", "South America", "Oceania"])) & (raw["population"] > 1000000), ["location", "date", "new_cases_per_million"]]
              .sort_values("new_cases_per_million", ascending=False)
              )

continents.index = np.arange(1, len(continents) + 1)

canada = continents[continents["location"] == "Canada"]

all = pd.concat([continents.head(50), canada])

| | location | date | new_cases_per_million |
|---|---|---|---|
| 1 | Oceania | 2022-01-12 | 4074.229 |
| 2 | Europe | 2022-01-25 | 2481.051 |
| 3 | North America | 2022-01-10 | 2455.828 |
| 4 | South America | 2022-01-27 | 1116.338 |
| 5 | Asia | 2022-03-16 | 235.329 |
| 6 | Africa | 2021-12-30 | 44.065 |


### 2022/01/12 - Commonwealth countries

In [None]:
countries = ["Australia", "New Zealand", "United Kingdom", "United States", "Canada"]

subset = (raw
        .loc[(raw["location"].isin(countries)) & (raw["date"] >= "2021-01-13"), :]
        .pivot(columns="location", index="date", values="new_cases_per_million")
        .rolling(7).mean()
        )

### 2022/01/14 - Canada

In [None]:
canada = (raw
          .loc[raw["location"] == "Canada", ["date", "new_cases", "new_deaths", "hosp_patients"]]
          .set_index("date")
          )

### 2022/01/20 - Austria

In [None]:
austria = (raw
            .loc[raw["location"].isin(["Austria", "World"]), ["location", "date", "hosp_patients_per_million"]]
           .pivot(index="date", columns="location", values="hosp_patients_per_million")
            )

### 2022/01/28 - Sweden and the world over time

Sweden's new cases over time.

In [27]:
sweden = (raw
          .loc[raw["location"] == "Sweden", ["date", "new_cases"]]
          .set_index("date")
          .rolling(7).mean()
          .dropna()
          )

The world's new cases over time.

In [28]:
world = (raw
         .loc[raw["location"] == "World", ["date", "new_cases"]]
        .set_index("date")
        .rolling(7).mean()
        )

### 2022/01/27 - Ranking continents

Resulted in [this](https://www.datawrapper.de/_/UpKbt/) visualization.

In [4]:
continents = (raw[raw["location"].isin(list_of_continents)]
              .pivot_table(columns="location", index="date", values="new_cases_per_million")
              .dropna()
              )

continents.index = pd.to_datetime(continents.index)

continents = (continents
              .groupby([continents.index.year.values,continents.index.month.values])
              .sum()
              .reset_index()
              .rename(columns={"level_0": "year", "level_1": "month"})
              )

continents["month"] = continents["year"].astype(str) + "-" + continents["month"].astype(str)

continents = (continents
              .drop(columns=["year"])
              .melt(id_vars="month")
              )

dates = continents["month"].unique()

ranked = []

for date in dates:
  top10 = continents[continents["month"] == date].sort_values('value', ascending=False)
  top10["rank"] = range(1, len(top10)+1)
  ranked.append(top10)

all_ranked = (pd
              .concat(ranked)
              .pivot(columns="month", index="location", values="rank")
              )
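
The loop above ranks continents within each month. As an alternative sketch (not the notebook's method), `groupby(...).rank()` can do the same in one step. The `toy` frame below is invented and mirrors the melted `month`/`location`/`value` layout.

```python
import pandas as pd

# Invented monthly totals in the same long layout as the melted frame
toy = pd.DataFrame({
    "month": ["2022-1", "2022-1", "2022-1", "2022-2", "2022-2", "2022-2"],
    "location": ["Africa", "Asia", "Europe", "Africa", "Asia", "Europe"],
    "value": [10.0, 30.0, 20.0, 50.0, 5.0, 40.0],
})

# Rank within each month, highest value first; ties keep their row order
toy["rank"] = (toy
               .groupby("month")["value"]
               .rank(ascending=False, method="first")
               .astype(int)
               )

ranked = toy.pivot(columns="month", index="location", values="rank")
print(ranked)
```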

### 2022/02/07 - COVID brief, booster rates

In [3]:
countries = ["Canada", "United States", "Italy", "France", "Germany", "Spain", "United Kingdom", "Japan", "Israel", "World", "Chile"]

target = (raw.loc[raw["location"].isin(countries), ["location", "date", "total_boosters_per_hundred"]]
          .sort_values("total_boosters_per_hundred", ascending=False)
          .drop_duplicates("location")
          .set_index("location")
          )

### 2022/03/10 - COVID brief, 6 countries


In [None]:
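six = raw[raw["location"].isin(["South Korea", "Hong Kong", "Singapore", "Vietnam", "Malaysia", "Japan"])].copy()

# Smooth within each location so a 7-day window never spans two countries
six["new_cases_per_million"] = (six
                                .groupby("location")["new_cases_per_million"]
                                .transform(lambda s: s.rolling(7).mean())
                                )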
six = (six[six["date"] >= "2022-01-01"]
       .pivot(index="date", columns="location", values="new_cases_per_million")
       )
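
A rolling mean applied directly to a long-format column runs across country boundaries, so the first rows of each country borrow values from the previous one. Grouping by location, or pivoting to one column per location first as other cells in this notebook do, avoids that. A sketch on invented data, using a 2-day window to keep it short:

```python
import pandas as pd

# Two hypothetical locations stacked in long format
toy = pd.DataFrame({
    "location": ["A"] * 3 + ["B"] * 3,
    "date": ["2022-01-01", "2022-01-02", "2022-01-03"] * 2,
    "new_cases_per_million": [10.0, 20.0, 30.0, 100.0, 200.0, 300.0],
})

# Naive: the window for B's first row reaches back into A's last row
naive = toy["new_cases_per_million"].rolling(2).mean()

# Grouped: each location is smoothed on its own
grouped = (toy
           .groupby("location")["new_cases_per_million"]
           .transform(lambda s: s.rolling(2).mean())
           )

print(pd.DataFrame({"naive": naive, "grouped": grouped}))
```

Both approaches assume the rows are already in date order within each location.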

### 2022/03/18 - Hong Kong cases

In [14]:
hongkong = (raw
            .loc[(raw["location"] == "Hong Kong") & (raw["date"] >= "2021-01-01"), ["date", "new_cases"]]
            .set_index("date")
            )

hongkong["cases_smoothed"] = hongkong["new_cases"].rolling(7).mean()

before_jan = (hongkong[hongkong.index >= "2022-02-01"]
              .dropna()
              .loc[:,"new_cases"]
              .mean()
              )

hongkong

| date | new_cases | cases_smoothed |
|---|---|---|
| 2021-01-01 | 42.0 | NaN |
| 2021-01-02 | 35.0 | NaN |
| 2021-01-03 | 41.0 | NaN |
| 2021-01-04 | 53.0 | NaN |
| 2021-01-05 | 32.0 | NaN |
| ... | ... | ... |
| 2022-03-19 | 16597.0 | 24957.714286 |
| 2022-03-20 | 14149.0 | 22346.142857 |
| 2022-03-21 | 14068.0 | 20511.857143 |
| 2022-03-22 | 14152.0 | 18567.142857 |
