# Fetch more data!
So far, the models have only used data from Denmark. This notebook fetches the data for other countries and stores it in the same format, ready to be used in the probabilistic models.

In [1]:
from os.path import join, pardir

import pandas as pd

In [2]:
# Project root, relative to the directory this notebook runs in.
ROOT = pardir

We are going to use a large aggregated dataset of COVID-19 incidence by country.

In [3]:
data_covid19 = pd.read_csv(
    "https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv",
    parse_dates=["Date"],
)

In [4]:
cov_sp = data_covid19.loc[data_covid19.Country == "Spain", ["Date", "Confirmed"]].reset_index(drop=True)
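
The Spain filter above generalizes to any country. Below is a minimal sketch, assuming the column names `Date`, `Country` and `Confirmed` used in the cell above; `select_country` is a hypothetical helper, not part of the project, and the toy frame holds made-up values so the example runs without network access.

```python
import pandas as pd


def select_country(df: pd.DataFrame, country: str, columns: list[str]) -> pd.DataFrame:
    """Return the rows of `df` for one country, restricted to `columns`."""
    subset = df.loc[df["Country"] == country, columns].reset_index(drop=True)
    if subset.empty:
        # An empty result usually means the country name does not match the dataset's spelling.
        raise ValueError(f"No rows found for country {country!r}")
    return subset


# Toy frame with the same column names as the aggregated dataset (values are made up).
toy = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-03-02"]),
        "Country": ["Spain", "Italy", "Spain"],
        "Confirmed": [1, 2, 3],
    }
)
cov_toy = select_country(toy, "Spain", ["Date", "Confirmed"])
```

Raising on an empty result is a design choice: a misspelled country name fails immediately instead of producing an empty file later in the pipeline.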

The mobility data (from Google) is handled in the same way, this time read from a local file.

In [5]:
df_mob = pd.read_csv(
    join(ROOT, "data", "raw", "Global_Mobility_Report.csv"), parse_dates=["date"]
)
mob_sp = df_mob.loc[
    df_mob.country_region == "Spain",
    [
        "date",
        "retail_and_recreation_percent_change_from_baseline",
        "grocery_and_pharmacy_percent_change_from_baseline",
        "parks_percent_change_from_baseline",
        "transit_stations_percent_change_from_baseline",
    ]
].rename(columns={"date": "Date"}).reset_index(drop=True)



In [6]:
cov_sp.merge(mob_sp, on="Date").head()

| | Date | Confirmed | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline | parks_percent_change_from_baseline | transit_stations_percent_change_from_baseline |
|---|---|---|---|---|---|---|
| 0 | 2020-02-15 | 2 | 2.0 | -1.0 | 26.0 | 8.0 |
| 1 | 2020-02-15 | 2 | 5.0 | -1.0 | 33.0 | 15.0 |
| 2 | 2020-02-15 | 2 | 3.0 | 7.0 | 42.0 | 10.0 |
| 3 | 2020-02-15 | 2 | 0.0 | -1.0 | 20.0 | 8.0 |
| 4 | 2020-02-15 | 2 | 5.0 | -2.0 | 11.0 | 9.0 |
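
The preview shows five rows that share the date 2020-02-15, so the mobility filter matched several rows per day and the merge repeated the single COVID-19 row for each of them. One likely cause, which is an assumption and has not been checked against the file used here, is that the mobility report contains sub-national rows next to the national one, distinguished by a `sub_region_1` column. The sketch below uses toy frames with made-up values to show how to keep only country-level rows and how to make the merge fail loudly if a date is repeated.

```python
import pandas as pd

# Toy stand-ins for cov_sp and the Spain slice of df_mob (all values are made up).
cov_toy = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-02-15", "2020-02-16"]), "Confirmed": [2, 2]}
)
mob_toy = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-02-15", "2020-02-15", "2020-02-16", "2020-02-16"]),
        "sub_region_1": [None, "Andalusia", None, "Andalusia"],
        "parks_percent_change_from_baseline": [26.0, 33.0, 10.0, 12.0],
    }
)

# Keep only country-level rows, assumed here to be the ones with no sub-region.
national = (
    mob_toy.loc[mob_toy["sub_region_1"].isna()]
    .drop(columns="sub_region_1")
    .rename(columns={"date": "Date"})
    .reset_index(drop=True)
)

# validate= makes pandas raise a MergeError if either side repeats a date.
merged = cov_toy.merge(national, on="Date", validate="one_to_one")
```

With `validate="one_to_one"`, merging the unfiltered `mob_toy` would raise instead of silently duplicating rows.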


### Command-line interface
The same pipeline is also available as a command-line script.

In [7]:
!python ../src/data/make_dataset.py --help

Usage: make_dataset.py [OPTIONS] INPUT_MOBILITY OUTPUT

  Fetch and turn raw data into cleaned data.

  From (../raw) into cleaned data ready to be analyzed (in ../processed).

Options:
  --country TEXT  Country to filter from.
  --feat TEXT     Feature to extract.
  --help          Show this message and exit.
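
To make the interface concrete, here is a minimal sketch of a parser with the same two positional arguments and two options. It uses the standard-library `argparse` as a stand-in, so its help output will be formatted differently from the one above. The script's own source is not shown in this notebook, and the default values for `--country` and `--feat` as well as the output file name `out.csv` are assumptions made only for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the help text above (defaults are assumptions)."""
    parser = argparse.ArgumentParser(
        prog="make_dataset.py",
        description="Fetch and turn raw data into cleaned data.",
    )
    parser.add_argument("input_mobility", help="Path to the raw mobility CSV.")
    parser.add_argument("output", help="Path of the processed CSV to write.")
    # Hypothetical defaults: the real script's defaults are not shown in the help text.
    parser.add_argument("--country", default="Denmark", help="Country to filter from.")
    parser.add_argument("--feat", default="Confirmed", help="Feature to extract.")
    return parser


# Parse an explicit argument list instead of sys.argv so the example runs anywhere.
args = build_parser().parse_args(
    [
        "../data/raw/Global_Mobility_Report.csv",
        "../data/processed/out.csv",  # hypothetical output name
        "--country",
        "Italy",
    ]
)
```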


In [8]:
!python ../src/data/make_dataset.py ../data/raw/Global_Mobility_Report.csv \
    ../data/processed/data_niger_sixcol.csv --country Niger

2020-05-21 20:33:04,892 - __main__ - INFO - Fetching COVID19 data from from GitHub.
2020-05-21 20:33:06,105 - __main__ - INFO - Fetching mobility data from from file ../data/raw/Global_Mobility_Report.csv.
2020-05-21 20:33:06,434 - __main__ - INFO - File generated at ../data/processed/data_niger_sixcol.csv


Shall we get more countries?

In [9]:
!python ../src/data/make_dataset.py ../data/raw/Global_Mobility_Report.csv \
    ../data/processed/data_italy_sixcol.csv --country Italy

2020-05-21 20:33:07,280 - __main__ - INFO - Fetching COVID19 data from from GitHub.
2020-05-21 20:33:08,156 - __main__ - INFO - Fetching mobility data from from file ../data/raw/Global_Mobility_Report.csv.
2020-05-21 20:33:08,517 - __main__ - INFO - File generated at ../data/processed/data_italy_sixcol.csv


In [10]:
!python ../src/data/make_dataset.py ../data/raw/Global_Mobility_Report.csv \
    ../data/processed/data_germany_sixcol.csv --country Germany

2020-05-21 20:33:09,405 - __main__ - INFO - Fetching COVID19 data from from GitHub.
2020-05-21 20:33:10,215 - __main__ - INFO - Fetching mobility data from from file ../data/raw/Global_Mobility_Report.csv.
2020-05-21 20:33:10,549 - __main__ - INFO - File generated at ../data/processed/data_germany_sixcol.csv
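
The three calls above differ only in the country name, so they can be generated in a loop. The following is a minimal sketch, assuming the same relative paths and the `data_<country>_sixcol.csv` naming used in the cells above; `build_command` and `make_datasets` are hypothetical helpers, and by default nothing is executed.

```python
import subprocess
import sys
from pathlib import Path

SCRIPT = Path("../src/data/make_dataset.py")
RAW_MOBILITY = Path("../data/raw/Global_Mobility_Report.csv")
PROCESSED_DIR = Path("../data/processed")


def build_command(country: str) -> list[str]:
    """Build the make_dataset.py invocation for one country."""
    output = PROCESSED_DIR / f"data_{country.lower()}_sixcol.csv"
    return [sys.executable, str(SCRIPT), str(RAW_MOBILITY), str(output), "--country", country]


def make_datasets(countries: list[str], dry_run: bool = True) -> list[list[str]]:
    """Run (or, with dry_run=True, only collect) one command per country."""
    commands = [build_command(country) for country in countries]
    if not dry_run:
        for command in commands:
            # check=True stops the loop at the first failing country.
            subprocess.run(command, check=True)
    return commands


commands = make_datasets(["Niger", "Italy", "Germany"])
```

Calling `make_datasets([...], dry_run=False)` would actually run the script once per country. Country names containing spaces would end up in the file name unchanged, so they may need extra handling.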
