# Practise Case 01 - Covid-19 data analysis with pandas

First we want to practise loading online csv files.
The file is provided by [ourworldindata.org](https://covid.ourworldindata.org/data/owid-covid-data.csv) and updated daily. The following analysis could be run as a daily job. 

**CASE GOAL:**
Find the 10 countries with highest case numbers per million inhabitants. Give the output as json.

## 0. Import necessary packages

In [1]:
import pandas as pd
from functools import reduce
from datetime import datetime, timedelta

## 1. Load data

As the data may become increasingly larger, I consider to load it in chunks instead of all together. I want to compare the time it takes to load it this way vs. in-built pandas functionality.


In [2]:
url = "https://covid.ourworldindata.org/data/owid-covid-data.csv"

In [3]:
# read the full file 
raw = pd.read_csv(url)

In [4]:
time pd.read_csv(url)

CPU times: user 623 ms, sys: 154 ms, total: 777 ms
Wall time: 3.37 s


Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,...,extreme_poverty,cardiovasc_death_rate,diabetes_prevalence,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality
0,AFG,Asia,Afghanistan,2020-02-24,1.0,1.0,,,,,...,,597.029,9.59,,,37.746,0.5,64.83,0.511,
1,AFG,Asia,Afghanistan,2020-02-25,1.0,0.0,,,,,...,,597.029,9.59,,,37.746,0.5,64.83,0.511,
2,AFG,Asia,Afghanistan,2020-02-26,1.0,0.0,,,,,...,,597.029,9.59,,,37.746,0.5,64.83,0.511,
3,AFG,Asia,Afghanistan,2020-02-27,1.0,0.0,,,,,...,,597.029,9.59,,,37.746,0.5,64.83,0.511,
4,AFG,Asia,Afghanistan,2020-02-28,1.0,0.0,,,,,...,,597.029,9.59,,,37.746,0.5,64.83,0.511,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
101542,ZWE,Africa,Zimbabwe,2021-07-06,57963.0,1949.0,1347.143,1939.0,28.0,25.429,...,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,
101543,ZWE,Africa,Zimbabwe,2021-07-07,60227.0,2264.0,1480.429,1973.0,34.0,26.286,...,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,
101544,ZWE,Africa,Zimbabwe,2021-07-08,62383.0,2156.0,1594.571,2029.0,56.0,31.571,...,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,
101545,ZWE,Africa,Zimbabwe,2021-07-09,65066.0,2683.0,1771.857,2084.0,55.0,34.714,...,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,


## 2. Transform & Load in chunks

In [5]:
today = datetime.now().date()
yesterday = today + timedelta(days=-1)

In [6]:
# read the file in chunks and do some basic transformation

cols = ['iso_code', 'continent', 'location', 'date', 'total_cases', 
        'new_cases', 'new_cases_per_million', 'population']

def read_chunks(url, cols):
    raw_TL = None
    for chunk in pd.read_csv(url, usecols=cols, chunksize=1000):
        chunk['date'] = pd.to_datetime(chunk['date'])
        chunk_result = chunk[chunk['date']== pd.to_datetime(yesterday)]
        chunk_result['assert_new_cases_per_million'] = chunk_result['new_cases'] / (chunk_result['population']/1e6)
        if raw_TL is None:
            raw_TL = chunk_result
        else:
            raw_TL = raw_TL.append(chunk_result)
    return raw_TL

raw_TL = read_chunks(url, cols)

  iso_code continent     location        date  total_cases  new_cases  \
0      AFG      Asia  Afghanistan  2020-02-24          1.0        1.0   
1      AFG      Asia  Afghanistan  2020-02-25          1.0        0.0   

   new_cases_per_million  population  
0                  0.026  38928341.0  
1                  0.000  38928341.0  
2021-07-10 00:00:00
True
    iso_code continent     location       date  total_cases  new_cases  \
502      AFG      Asia  Afghanistan 2021-07-10     131586.0        0.0   

     new_cases_per_million  population  assert_new_cases_per_million  
502                    0.0  38928341.0                           0.0  
      iso_code continent location        date  total_cases  new_cases  \
1000  OWID_AFR       NaN   Africa  2021-06-24    5314808.0    33699.0   
1001  OWID_AFR       NaN   Africa  2021-06-25    5350331.0    35523.0   

      new_cases_per_million    population  
1000                 25.137  1.340598e+09  
1001                 26.498  1.340598e+

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value

      iso_code      continent location       date  total_cases  new_cases  \
11460      BTN           Asia   Bhutan 2021-07-10       2266.0        8.0   
11947      BOL  South America  Bolivia 2021-07-10     453595.0     2371.0   

       new_cases_per_million  population  assert_new_cases_per_million  
11460                 10.368    771612.0                     10.367905  
11947                203.118  11673029.0                    203.117803  
      iso_code continent                location        date  total_cases  \
12000      BIH    Europe  Bosnia and Herzegovina  2020-04-25       1486.0   
12001      BIH    Europe  Bosnia and Herzegovina  2020-04-26       1516.0   

       new_cases  new_cases_per_million  population  
12000       65.0                 19.812   3280815.0  
12001       30.0                  9.144   3280815.0  
2021-07-10 00:00:00
True
      iso_code continent                location       date  total_cases  \
12441      BIH    Europe  Bosnia and Herzegovina 2021-

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value


True
      iso_code      continent    location       date  total_cases  new_cases  \
21029      COG         Africa       Congo 2021-07-10      12790.0        0.0   
21557      CRI  North America  Costa Rica 2021-07-10     380482.0        0.0   

       new_cases_per_million  population  assert_new_cases_per_million  
21029                    0.0   5518092.0                           0.0  
21557                    0.0   5094114.0                           0.0  
      iso_code continent       location        date  total_cases  new_cases  \
22000      CIV    Africa  Cote d'Ivoire  2021-05-27      47146.0       61.0   
22001      CIV    Africa  Cote d'Ivoire  2021-05-28      47195.0       49.0   

       new_cases_per_million  population  
22000                  2.313  26378275.0  
22001                  1.858  26378275.0  
2021-07-10 00:00:00
True
      iso_code continent       location       date  total_cases  new_cases  \
22044      CIV    Africa  Cote d'Ivoire 2021-07-10      48776.0 

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value

      iso_code continent location        date  total_cases  new_cases  \
33000      FRA    Europe   France  2020-05-18     182147.0      444.0   
33001      FRA    Europe   France  2020-05-19     182648.0      501.0   

       new_cases_per_million  population  
33000                  6.572  67564251.0  
33001                  7.415  67564251.0  
2021-07-10 00:00:00
True
      iso_code continent location       date  total_cases  new_cases  \
33418      FRA    Europe   France 2021-07-10    5870463.0     4696.0   

       new_cases_per_million  population  assert_new_cases_per_million  
33418                 69.504  67564251.0                     69.504212  
      iso_code continent location        date  total_cases  new_cases  \
34000      GAB    Africa    Gabon  2021-05-21      24107.0       68.0   
34001      GAB    Africa    Gabon  2021-05-22      24107.0        0.0   

       new_cases_per_million  population  
34000                 30.552   2225728.0  
34001                  0.000 

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value


      iso_code continent location        date  total_cases  new_cases  \
45000      IRL    Europe  Ireland  2021-05-30     261673.0      367.0   
45001      IRL    Europe  Ireland  2021-05-31     262043.0      370.0   

       new_cases_per_million  population  
45000                 74.325   4937796.0  
45001                 74.932   4937796.0  
2021-07-10 00:00:00
True
      iso_code continent     location       date  total_cases  new_cases  \
45041      IRL    Europe      Ireland 2021-07-10     277316.0      581.0   
45212      IMN    Europe  Isle of Man 2021-07-10          NaN        NaN   

       new_cases_per_million  population  assert_new_cases_per_million  
45041                117.664   4937796.0                    117.663832  
45212                    NaN     85032.0                           NaN  
      iso_code continent location        date  total_cases  new_cases  \
46000      ITA    Europe    Italy  2020-11-06     862681.0    37802.0   
46001      ITA    Europe    Ita

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max()

True
      iso_code continent  location       date  total_cases  new_cases  \
56387      MYS      Asia  Malaysia 2021-07-10     827191.0     9353.0   
56877      MDV      Asia  Maldives 2021-07-10      74993.0       90.0   

       new_cases_per_million  population  assert_new_cases_per_million  
56387                288.976  32365998.0                    288.976104  
56877                166.500    540542.0                    166.499550  
      iso_code continent location        date  total_cases  new_cases  \
57000      MLI    Africa     Mali  2020-07-25       2503.0        0.0   
57001      MLI    Africa     Mali  2020-07-26       2510.0        7.0   

       new_cases_per_million  population  
57000                  0.000  20250834.0  
57001                  0.346  20250834.0  
2021-07-10 00:00:00
True
      iso_code continent location       date  total_cases  new_cases  \
57350      MLI    Africa     Mali 2021-07-10      14461.0        4.0   
57871      MLT    Europe    Malta 2021

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value

       iso_code continent location        date  total_cases  new_cases  \
69000  OWID_OCE       NaN  Oceania  2020-08-24      27172.0      145.0   
69001  OWID_OCE       NaN  Oceania  2020-08-25      27346.0      174.0   

       new_cases_per_million  population  
69000                  3.398  42677809.0  
69001                  4.077  42677809.0  
2021-07-10 00:00:00
True
       iso_code continent location       date  total_cases  new_cases  \
69320  OWID_OCE       NaN  Oceania 2021-07-10      61221.0      593.0   
69823       OMN      Asia     Oman 2021-07-10     281688.0        0.0   

       new_cases_per_million  population  assert_new_cases_per_million  
69320                 13.895  42677809.0                     13.894809  
69823                  0.000   5106622.0                      0.000000  
      iso_code continent  location        date  total_cases  new_cases  \
70000      PAK      Asia  Pakistan  2020-08-19     290445.0      613.0   
70001      PAK      Asia  Pakistan  

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value

      iso_code continent    location        date  total_cases  new_cases  \
81000      SYC    Africa  Seychelles  2021-04-21       5016.0        4.0   
81001      SYC    Africa  Seychelles  2021-04-22       5170.0      154.0   

       new_cases_per_million  population  
81000                 40.675     98340.0  
81001               1565.996     98340.0  
2021-07-10 00:00:00
True
      iso_code continent      location       date  total_cases  new_cases  \
81080      SYC    Africa    Seychelles 2021-07-10      16679.0        0.0   
81547      SLE    Africa  Sierra Leone 2021-07-10       6003.0       40.0   

       new_cases_per_million  population  assert_new_cases_per_million  
81080                  0.000     98340.0                      0.000000  
81547                  5.014   7976985.0                      5.014426  
      iso_code continent   location        date  total_cases  new_cases  \
82000      SGP      Asia  Singapore  2021-04-19      60851.0       20.0   
82001      SGP  

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value

      iso_code continent location        date  total_cases  new_cases  \
93000      TUN    Africa  Tunisia  2020-03-06          1.0        0.0   
93001      TUN    Africa  Tunisia  2020-03-07          1.0        0.0   

       new_cases_per_million  population  
93000                    0.0  11818618.0  
93001                    0.0  11818618.0  
2021-07-10 00:00:00
True
      iso_code continent location       date  total_cases  new_cases  \
93491      TUN    Africa  Tunisia 2021-07-10     491021.0     9286.0   
93978      TUR      Asia   Turkey 2021-07-10    5465094.0        0.0   

       new_cases_per_million  population  assert_new_cases_per_million  
93491                785.709  11818618.0                    785.709463  
93978                  0.000  84339067.0                      0.000000  
      iso_code      continent                  location        date  \
94000      TCA  North America  Turks and Caicos Islands  2021-01-30   
94001      TCA  North America  Turks and Caicos 

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value

In [7]:
time pd.read_csv(url, chunksize=1000)

CPU times: user 153 ms, sys: 105 ms, total: 258 ms
Wall time: 2.97 s


<pandas.io.parsers.readers.TextFileReader at 0x116b4d0d0>

In [8]:
time read_chunks(url, cols)

  iso_code continent     location        date  total_cases  new_cases  \
0      AFG      Asia  Afghanistan  2020-02-24          1.0        1.0   
1      AFG      Asia  Afghanistan  2020-02-25          1.0        0.0   

   new_cases_per_million  population  
0                  0.026  38928341.0  
1                  0.000  38928341.0  
2021-07-10 00:00:00
True
    iso_code continent     location       date  total_cases  new_cases  \
502      AFG      Asia  Afghanistan 2021-07-10     131586.0        0.0   

     new_cases_per_million  population  assert_new_cases_per_million  
502                    0.0  38928341.0                           0.0  
      iso_code continent location        date  total_cases  new_cases  \
1000  OWID_AFR       NaN   Africa  2021-06-24    5314808.0    33699.0   
1001  OWID_AFR       NaN   Africa  2021-06-25    5350331.0    35523.0   

      new_cases_per_million    population  
1000                 25.137  1.340598e+09  
1001                 26.498  1.340598e+

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value


      iso_code continent                location        date  total_cases  \
12000      BIH    Europe  Bosnia and Herzegovina  2020-04-25       1486.0   
12001      BIH    Europe  Bosnia and Herzegovina  2020-04-26       1516.0   

       new_cases  new_cases_per_million  population  
12000       65.0                 19.812   3280815.0  
12001       30.0                  9.144   3280815.0  
2021-07-10 00:00:00
True
      iso_code continent                location       date  total_cases  \
12441      BIH    Europe  Bosnia and Herzegovina 2021-07-10     205145.0   
12909      BWA    Africa                Botswana 2021-07-10      75388.0   

       new_cases  new_cases_per_million  population  \
12441       98.0                 29.871   3280815.0   
12909        0.0                  0.000   2351625.0   

       assert_new_cases_per_million  
12441                     29.870627  
12909                      0.000000  
      iso_code      continent location        date  total_cases  new_ca

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value


      iso_code continent       location        date  total_cases  new_cases  \
22000      CIV    Africa  Cote d'Ivoire  2021-05-27      47146.0       61.0   
22001      CIV    Africa  Cote d'Ivoire  2021-05-28      47195.0       49.0   

       new_cases_per_million  population  
22000                  2.313  26378275.0  
22001                  1.858  26378275.0  
2021-07-10 00:00:00
True
      iso_code continent       location       date  total_cases  new_cases  \
22044      CIV    Africa  Cote d'Ivoire 2021-07-10      48776.0        0.0   
22546      HRV    Europe        Croatia 2021-07-10     360768.0       88.0   

       new_cases_per_million  population  assert_new_cases_per_million  
22044                  0.000  26378275.0                      0.000000  
22546                 21.436   4105268.0                     21.435872  
      iso_code      continent location        date  total_cases  new_cases  \
23000      CUB  North America     Cuba  2021-06-08     151259.0     1156.0 

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value

      iso_code continent location        date  total_cases  new_cases  \
32000      FJI   Oceania     Fiji  2020-07-20         27.0        0.0   
32001      FJI   Oceania     Fiji  2020-07-21         27.0        0.0   

       new_cases_per_million  population  
32000                    0.0    896444.0  
32001                    0.0    896444.0  
2021-07-10 00:00:00
True
      iso_code continent location       date  total_cases  new_cases  \
32355      FJI   Oceania     Fiji 2021-07-10      10027.0      506.0   
32884      FIN    Europe  Finland 2021-07-10      97944.0      293.0   

       new_cases_per_million  population  assert_new_cases_per_million  
32355                564.452    896444.0                    564.452437  
32884                 52.881   5540718.0                     52.881233  
      iso_code continent location        date  total_cases  new_cases  \
33000      FRA    Europe   France  2020-05-18     182147.0      444.0   
33001      FRA    Europe   France  2020-05-1

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value

      iso_code continent location        date  total_cases  new_cases  \
44000      IRN      Asia     Iran  2021-05-31    2913136.0    11042.0   
44001      IRN      Asia     Iran  2021-06-01    2923823.0    10687.0   

       new_cases_per_million  population  
44000                131.463  83992953.0  
44001                127.237  83992953.0  
2021-07-10 00:00:00
True
      iso_code continent location       date  total_cases  new_cases  \
44040      IRN      Asia     Iran 2021-07-10    3355786.0    11664.0   
44543      IRQ      Asia     Iraq 2021-07-10    1421746.0     6821.0   

       new_cases_per_million  population  assert_new_cases_per_million  
44040                138.869  83992953.0                    138.868793  
44543                169.582  40222503.0                    169.581689  
      iso_code continent location        date  total_cases  new_cases  \
45000      IRL    Europe  Ireland  2021-05-30     261673.0      367.0   
45001      IRL    Europe  Ireland  2021-05-3

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value

      iso_code continent    location        date  total_cases  new_cases  \
55000      MDG    Africa  Madagascar  2020-06-20       1503.0       60.0   
55001      MDG    Africa  Madagascar  2020-06-21       1596.0       93.0   

       new_cases_per_million  population  
55000                  2.167  27691019.0  
55001                  3.358  27691019.0  
2021-07-10 00:00:00
True
      iso_code continent    location       date  total_cases  new_cases  \
55385      MDG    Africa  Madagascar 2021-07-10      42392.0        0.0   
55854      MWI    Africa      Malawi 2021-07-10      38946.0      512.0   

       new_cases_per_million  population  assert_new_cases_per_million  
55385                  0.000  27691019.0                      0.000000  
55854                 26.764  19129955.0                     26.764308  
      iso_code continent  location        date  total_cases  new_cases  \
56000      MYS      Asia  Malaysia  2020-06-18       8529.0       14.0   
56001      MYS      Asia

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value


      iso_code      continent   location        date  total_cases  new_cases  \
66000      NIC  North America  Nicaragua  2021-03-11       6537.0        0.0   
66001      NIC  North America  Nicaragua  2021-03-12       6537.0        0.0   

       new_cases_per_million  population  
66000                    0.0   6624554.0  
66001                    0.0   6624554.0  
2021-07-10 00:00:00
True
      iso_code      continent   location       date  total_cases  new_cases  \
66121      NIC  North America  Nicaragua 2021-07-10       8461.0        0.0   
66599      NER         Africa      Niger 2021-07-10       5538.0        6.0   

       new_cases_per_million  population  assert_new_cases_per_million  
66121                  0.000   6624554.0                      0.000000  
66599                  0.248  24206636.0                      0.247866  
      iso_code continent location        date  total_cases  new_cases  \
67000      NGA    Africa  Nigeria  2021-04-03     163113.0       50.0   
6

  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value

      iso_code continent location        date  total_cases  new_cases  \
76000      RWA    Africa   Rwanda  2020-07-04       1092.0       11.0   
76001      RWA    Africa   Rwanda  2020-07-05       1105.0       13.0   

       new_cases_per_million  population  
76000                  0.849  12952209.0  
76001                  1.004  12952209.0  
2021-07-10 00:00:00
True
      iso_code      continent               location       date  total_cases  \
76371      RWA         Africa                 Rwanda 2021-07-10      47667.0   
76936      KNA  North America  Saint Kitts and Nevis 2021-07-10        525.0   

       new_cases  new_cases_per_million  population  \
76371      830.0                 64.082  12952209.0   
76936        5.0                 93.999     53192.0   

       assert_new_cases_per_million  
76371                     64.081733  
76936                     93.999098  
      iso_code      continent     location        date  total_cases  \
77000      LCA  North America  Sai

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max()

      iso_code continent     location       date  total_cases  new_cases  \
88487      SWE    Europe       Sweden 2021-07-10    1092540.0        0.0   
88989      CHE    Europe  Switzerland 2021-07-10     704943.0        0.0   

       new_cases_per_million  population  assert_new_cases_per_million  
88487                    0.0  10099270.0                           0.0  
88989                    0.0   8654618.0                           0.0  
      iso_code continent location        date  total_cases  new_cases  \
89000      SYR      Asia    Syria  2020-04-01         10.0        0.0   
89001      SYR      Asia    Syria  2020-04-02         16.0        6.0   

       new_cases_per_million  population  
89000                  0.000  17500657.0  
89001                  0.343  17500657.0  
2021-07-10 00:00:00
True
      iso_code continent location       date  total_cases  new_cases  \
89465      SYR      Asia    Syria 2021-07-10      25766.0        0.0   

       new_cases_per_million  pop

       iso_code continent location        date  total_cases  new_cases  \
101000      ZMB    Africa   Zambia  2021-05-03      91722.0       29.0   
101001      ZMB    Africa   Zambia  2021-05-04      91804.0       82.0   

        new_cases_per_million  population  
101000                  1.577  18383956.0  
101001                  4.460  18383956.0  
2021-07-10 00:00:00
True
       iso_code continent  location       date  total_cases  new_cases  \
101068      ZMB    Africa    Zambia 2021-07-10     174789.0     2384.0   
101546      ZWE    Africa  Zimbabwe 2021-07-10      66853.0     1787.0   

        new_cases_per_million  population  assert_new_cases_per_million  
101068                129.678  18383956.0                    129.678291  
101546                120.232  14862927.0                    120.232038  
CPU times: user 1.77 s, sys: 401 ms, total: 2.18 s
Wall time: 4.91 s


  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
  print(yesterday == chunk['date'].max())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value

Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_per_million,population,assert_new_cases_per_million
502,AFG,Asia,Afghanistan,2021-07-10,131586.0,0.0,0.000,3.892834e+07,0.000000
1016,OWID_AFR,,Africa,2021-07-10,5914501.0,45374.0,33.846,1.340598e+09,33.846087
1518,ALB,Europe,Albania,2021-07-10,132587.0,7.0,2.432,2.877800e+06,2.432414
2020,DZA,Africa,Algeria,2021-07-10,145296.0,813.0,18.540,4.385104e+07,18.540038
2516,AND,Europe,Andorra,2021-07-10,14075.0,0.0,0.000,7.726500e+04,0.000000
...,...,...,...,...,...,...,...,...,...
99489,VNM,Asia,Vietnam,2021-07-10,28470.0,1862.0,19.129,9.733858e+07,19.129105
100131,OWID_WRL,,World,2021-07-10,186459999.0,399937.0,51.308,7.794799e+09,51.308188
100588,YEM,Asia,Yemen,2021-07-10,6941.0,1.0,0.034,2.982597e+07,0.033528
101068,ZMB,Africa,Zambia,2021-07-10,174789.0,2384.0,129.678,1.838396e+07,129.678291


## 3. Descriptive Analysis

In [9]:
print(raw_TL.describe().T)
print(raw_TL.info())

                              count          mean           std    min  \
total_cases                   201.0  2.947312e+06  1.506004e+07    1.0   
new_cases                     201.0  6.099607e+03  3.196534e+04    0.0   
new_cases_per_million         200.0  7.921545e+01  1.609478e+02    0.0   
population                    203.0  1.171979e+08  6.567391e+08  809.0   
assert_new_cases_per_million  200.0  7.921546e+01  1.609478e+02    0.0   

                                    25%           50%           75%  \
total_cases                     13796.0  1.315860e+05  5.126850e+05   
new_cases                           0.0  9.100000e+01  1.295000e+03   
new_cases_per_million               0.0  1.928150e+01  7.534175e+01   
population                    2112816.0  9.537642e+06  3.291906e+07   
assert_new_cases_per_million        0.0  1.928178e+01  7.534184e+01   

                                       max  
total_cases                   1.864600e+08  
new_cases                     3.999370

## 4. Processing

In [10]:
#sorted(raw_TL[raw_TL['new_cases'].isna()], key=(lambda x: x['date']))
raw_TL['date'] = pd.to_datetime(raw_TL['date'])
raw_TL.info()
raw_TL[raw_TL['new_cases'].isna()].groupby('location')['date'].max()
set(raw_TL[raw_TL['continent'].isna()]['location'])

<class 'pandas.core.frame.DataFrame'>
Int64Index: 204 entries, 502 to 101546
Data columns (total 9 columns):
 #   Column                        Non-Null Count  Dtype         
---  ------                        --------------  -----         
 0   iso_code                      204 non-null    object        
 1   continent                     195 non-null    object        
 2   location                      204 non-null    object        
 3   date                          204 non-null    datetime64[ns]
 4   total_cases                   201 non-null    float64       
 5   new_cases                     201 non-null    float64       
 6   new_cases_per_million         200 non-null    float64       
 7   population                    203 non-null    float64       
 8   assert_new_cases_per_million  200 non-null    float64       
dtypes: datetime64[ns](1), float64(5), object(3)
memory usage: 15.9+ KB


{'Africa',
 'Asia',
 'Europe',
 'European Union',
 'International',
 'North America',
 'Oceania',
 'South America',
 'World'}

In [15]:
raw_TL_countries = raw_TL[raw_TL['continent'].isna() == False ]

In [33]:
result = raw_TL_countries.sort_values('new_cases_per_million', \
                                      ascending = False, \
                                     ignore_index = True)[0:9][['location', 'new_cases_per_million']]
result.set_index('location', inplace = True)

## 5. Output

In [34]:
result.to_json()

'{"new_cases_per_million":{"Cyprus":1081.075,"Mongolia":1073.12,"Tunisia":785.709,"Namibia":679.283,"Netherlands":601.055,"Cuba":595.941,"Fiji":564.452,"United Kingdom":468.948,"Kazakhstan":375.731}}'