# Predictor submission


The COVID-19 crisis is proving to be one of the world’s most critical challenges — a challenge bigger than any one government or organization can tackle on its own. Right now, countries around the world are not equipped to implement health and safety interventions and policies that effectively protect both their citizens and economies.
 
In order to fight this pandemic, we need access to localized, data-driven planning systems and the latest in artificial intelligence (AI) to help decision-makers develop and implement robust Intervention Plans (IPs) that successfully reduce infection cases and minimize economic impact.

**Intervention Plan (IP)**: A plan of action or schedule for setting and resetting various intervention policies at various strengths or stringency.

**Predictor Model**: Given a time sequence of IPs in effect, and other data like a time sequence of number of cases, a predictor model will estimate the number of cases in the future.

## Intervention Plan

An intervention plan consists of a set of [containment and closure policies](https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/codebook.md#containment-and-closure-policies), as well as [health system policies](https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/codebook.md#health-system-policies). Check out the links to understand what these policies correspond to and how they are coded.

For instance, the **C1_School closing** policy, which records closings of schools and universities, is coded as follows:

| Code      | Meaning     |
| :-------- | :---------- |
|  0        | no measures |
|  1        | recommend closing|
|  2        | require closing (only some levels or categories, e.g. just high school, or just public schools) |
|  3        | require closing all levels |
| Blank     | no data |
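
As a small illustration of this coding, the sketch below maps C1 codes to their meanings with pandas. The `C1_MEANINGS` dictionary and the toy series are hypothetical names introduced here, and the labels are shortened versions of the table entries above.

```python
import pandas as pd

# Hypothetical lookup table built from the C1 coding table above
C1_MEANINGS = {
    0: "no measures",
    1: "recommend closing",
    2: "require closing (only some levels or categories)",
    3: "require closing all levels",
}

# Toy values for illustration only; None stands for a blank (no data) entry
c1_codes = pd.Series([0, 2, None, 3], name="C1_School closing")

# Blank entries have no match in the lookup table, so they are labeled explicitly
c1_labels = c1_codes.map(C1_MEANINGS).fillna("no data")
print(c1_labels.tolist())
```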

Intervention plans are recorded daily for each country and sometimes for regions. For this competition, the following policies are considered:

In [None]:
IP_COLUMNS = ['C1_School closing',
              'C2_Workplace closing',
              'C3_Cancel public events',
              'C4_Restrictions on gatherings',
              'C5_Close public transport',
              'C6_Stay at home requirements',
              'C7_Restrictions on internal movement',
              'C8_International travel controls',
              'H1_Public information campaigns',
              'H2_Testing policy',
              'H3_Contact tracing']

## Data
The University of Oxford Blavatnik School of Government is [tracking coronavirus government responses](https://www.bsg.ox.ac.uk/research/research-projects/coronavirus-government-response-tracker). They have assembled a [data set](https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv) containing historical data since January 1st, 2020 on the number of cases and the IPs in effect for most countries in the world.

In [None]:
import pandas as pd

In [None]:
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
df = pd.read_csv(DATA_URL,
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 # on_bad_lines (pandas 1.3+) replaces error_bad_lines, which was removed in pandas 2.0
                 on_bad_lines="skip")

In [None]:
df.sample(3)

### Listing the number of cases and IPs

In [None]:
CASES_COLUMNS = ["CountryName", "RegionName", "Date", "ConfirmedCases"]

In [None]:
df[CASES_COLUMNS + IP_COLUMNS].sample(3)

### Computing the daily change in cases
The **ConfirmedCases** column reports the total number of cases since the beginning of the epidemic for each country, region and day. From this number we can compute the daily change in confirmed cases by doing:

\begin{equation*}
DailyChangeConfirmedCases_t = ConfirmedCases_t - ConfirmedCases_{t-1}
\end{equation*}

Like this:

In [None]:
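# Fill missing RegionName in the key: groupby drops NaN keys, which would zero out country-level rows
df["DailyChangeConfirmedCases"] = df.groupby(["CountryName", df["RegionName"].fillna("")]).ConfirmedCases.diff().fillna(0)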
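
To make the computation concrete, here is a minimal sketch on a toy data frame with made-up numbers. It assumes rows are already sorted by date within each country and region. `toy_df` and `region_key` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Toy cumulative counts for one country-level series and one regional series (made-up numbers)
toy_df = pd.DataFrame({
    "CountryName": ["A", "A", "A", "B", "B", "B"],
    "RegionName": [None, None, None, "R1", "R1", "R1"],
    "Date": pd.to_datetime(["2020-08-01", "2020-08-02", "2020-08-03"] * 2),
    "ConfirmedCases": [10, 15, 22, 100, 100, 130],
})

# Fill missing region names so country-level rows form their own group
region_key = toy_df["RegionName"].fillna("")

# Difference within each (country, region) group; the first day has no predecessor, so it becomes 0
toy_df["DailyChangeConfirmedCases"] = (
    toy_df.groupby(["CountryName", region_key]).ConfirmedCases.diff().fillna(0)
)
print(toy_df[["CountryName", "RegionName", "Date", "DailyChangeConfirmedCases"]])
```

Note that the first day of each group gets a change of 0 because there is no previous day to compare with, not because no cases were reported.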

### Listing the latest historical daily new cases for a given country and region
For instance, for country **United States**, region **California**, the latest available changes in confirmed cases are:

In [None]:
country = "United States"
region = "California"
country_region_df = df[(df.CountryName == country) & (df.RegionName == region)]
country_region_df[["CountryName", "RegionName", "Date", "ConfirmedCases", "DailyChangeConfirmedCases"]].tail(7)

## Predictor input
The goal of a predictor is to predict the expected number of daily cases for countries and regions for a list of days, assuming the given daily IPs are in place:

In [None]:
EXAMPLE_INPUT_FILE = "validation/data/2020-08-01_2020-08-04_ip.csv"
prediction_input_df = pd.read_csv(EXAMPLE_INPUT_FILE,
                                  parse_dates=['Date'],
                                  encoding="ISO-8859-1")
prediction_input_df.head()

## Predictor expected output
The output produced by the predictor should look like this:

In [None]:
EXAMPLE_OUTPUT_FILE = "2020-08-01_2020-08-04_predictions_example.csv"
prediction_output_df = pd.read_csv(EXAMPLE_OUTPUT_FILE,
                                   parse_dates=['Date'],
                                   encoding="ISO-8859-1")
prediction_output_df.head()

## Train a model
Train a predictor model that can produce the output file given the input file.

Use additional data sources if needed.

Predictors do not have to predict for all regions. They can ignore them and return a row in the csv file only for regions for which they want to make a prediction. Note that a predictor submission can consist of multiple models, for example those specializing in different regions, that are accessed through the same call. A predictor must return a prediction in less than 30 seconds per region.
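
As a starting point, a very simple baseline could look like the sketch below. It is only an illustration, not one of the repository's example predictors: it ignores the intervention plans entirely and repeats the mean daily change observed over the last few days of history. The function name `naive_baseline`, the `window` parameter and the toy data are assumptions made for this example. Regions without history are left out of the output, which is allowed as described above.

```python
import pandas as pd

def naive_baseline(history_df: pd.DataFrame, ips_df: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    """Predict, for every requested row, the mean daily change over the last `window` days of history."""
    key_cols = ["CountryName", "RegionName"]
    # Normalize missing region names so that country-level rows can be matched
    history = history_df.assign(RegionName=history_df["RegionName"].fillna(""))
    requested = ips_df.assign(RegionName=ips_df["RegionName"].fillna(""))
    # Mean of the most recent `window` daily changes per country and region
    recent_mean = (
        history.sort_values("Date")
        .groupby(key_cols)["DailyChangeConfirmedCases"]
        .agg(lambda s: s.tail(window).mean())
        .rename("PredictedDailyNewCases")
        .reset_index()
    )
    # Inner join: regions without history are simply left out of the output
    predictions = requested[key_cols + ["Date"]].merge(recent_mean, on=key_cols, how="inner")
    return predictions[key_cols + ["Date", "PredictedDailyNewCases"]]

# Toy history and request (made-up numbers); country "B" has no history
history_df = pd.DataFrame({
    "CountryName": ["A"] * 3,
    "RegionName": [None] * 3,
    "Date": pd.to_datetime(["2020-07-29", "2020-07-30", "2020-07-31"]),
    "DailyChangeConfirmedCases": [4.0, 6.0, 8.0],
})
ips_df = pd.DataFrame({
    "CountryName": ["A", "A", "B"],
    "RegionName": [None, None, None],
    "Date": pd.to_datetime(["2020-08-01", "2020-08-02", "2020-08-01"]),
})
predictions_df = naive_baseline(history_df, ips_df)
print(predictions_df)
```

A baseline like this says nothing about the effect of the IPs, so it is mainly useful as a reference point against which a trained model can be compared.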


In [None]:
# WRITE YOUR CODE HERE

In [None]:
# It's a good idea to save the trained model so it can be used to make predictions in the future

In [None]:
# For examples, check out the examples folder:
# - examples/zero/Example-ZeroPredictor.ipynb
# - examples/linear/Example-Train-Linear-Model.ipynb
# - examples/lstm/Example-LSTM-Predictor.ipynb

## Make predictions
Making predictions means saving a .csv file called "start_date_end_date.csv" to the root folder.
For instance, if:

```
start_date = "2020-08-01"
end_date = "2020-08-04"
```

Then the expected output file is **2020-08-01_2020-08-04.csv**
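
The naming rule and the list of days to predict can be sketched as follows. The placeholder country and the zero predictions are made up for illustration, and the column names are those listed in the `predict` docstring further down. The file is not actually written here.

```python
import pandas as pd

start_date = "2020-08-01"
end_date = "2020-08-04"

# Output file name derived from the two dates, as described above
output_file_name = start_date + "_" + end_date + ".csv"

# pd.date_range includes both end points by default
prediction_dates = pd.date_range(start=start_date, end=end_date, freq="D")

# Placeholder frame with the expected columns; the country and the zero predictions are made up
placeholder_df = pd.DataFrame({
    "CountryName": "Aruba",
    "RegionName": "",
    "Date": prediction_dates,
    "PredictedDailyNewCases": 0.0,
})

# Writing the file would then be: placeholder_df.to_csv(output_file_name, index=False)
print(output_file_name, len(placeholder_df))
```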


In [None]:
def predict(start_date: str, end_date: str, path_to_ips_file: str):
    """
    Generates a file with daily new cases predictions for the given countries, regions and intervention
    plans, between start_date and end_date, inclusive.
    :param start_date: day from which to start making predictions, as a string, format YYYY-MM-DD
    :param end_date: day on which to stop making predictions, as a string, format YYYY-MM-DD
    :param path_to_ips_file: path to a csv file containing the intervention plans between start_date and end_date
    :return: Nothing. Saves a csv file called 'start_date_end_date.csv'
    with columns "CountryName,RegionName,Date,PredictedDailyNewCases"
    """
    output_file_name = start_date + "_" + end_date + ".csv"
    # WRITE YOUR CODE HERE
    # Save output to output_file_name

In [None]:
start_date = "2020-08-01"
end_date = "2020-08-04"
predict(start_date, end_date, EXAMPLE_INPUT_FILE)

## Display predictions

In [None]:
# If prediction worked ok, it generated the following file:
output_file = start_date + "_" + end_date + ".csv"
# That we can read like this:
prediction_output_df = pd.read_csv(output_file,
                                   parse_dates=['Date'],
                                   encoding="ISO-8859-1")
prediction_output_df.head()

# Submission
Update `predict.py` to call the trained predictor and generate a prediction file.

# Validation
This is how the predictor is going to be called during the competition.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [None]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip ../../validation/data/2020-08-01_2020-08-04_ip.csv

In [None]:
# Check the prediction file is valid
from validation.validation import validate_submission

errors = validate_submission("2020-08-01", "2020-08-04", "2020-08-01_2020-08-04.csv")
if errors:
    for error in errors:
        print(error)
else:
    print("All good!")

In [None]:
!python predict.py -s 2020-08-01 -e 2020-08-31 -ip ../../validation/data/2020-08-01_2020-08-31_ip.csv

In [None]:
# Check the prediction file is valid
errors = validate_submission("2020-08-01", "2020-08-31", "2020-08-01_2020-08-31.csv")
if errors:
    for error in errors:
        print(error)
else:
    print("All good!")
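
Before running the official validation, a few quick sanity checks can also be done by hand. The sketch below is not the `validate_submission` function used above, whose checks are not shown in this notebook. It is a hypothetical helper, `basic_prediction_checks`, that only assumes the output columns listed in the `predict` docstring.

```python
import pandas as pd

EXPECTED_COLUMNS = ["CountryName", "RegionName", "Date", "PredictedDailyNewCases"]

def basic_prediction_checks(pred_df: pd.DataFrame, start_date: str, end_date: str) -> list:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in pred_df.columns]
    if missing:
        # Without the expected columns the remaining checks cannot run
        return [f"Missing columns: {missing}"]
    values = pred_df["PredictedDailyNewCases"]
    if values.isna().any():
        problems.append("Some predictions are missing (NaN)")
    if (values < 0).any():
        problems.append("Some predictions are negative")
    dates = pd.to_datetime(pred_df["Date"])
    if ((dates < pd.Timestamp(start_date)) | (dates > pd.Timestamp(end_date))).any():
        problems.append("Some dates fall outside the requested range")
    return problems

# Toy prediction frames (made-up values): one clean, one with a negative prediction
good_df = pd.DataFrame({
    "CountryName": ["A", "A"],
    "RegionName": ["", ""],
    "Date": ["2020-08-01", "2020-08-02"],
    "PredictedDailyNewCases": [3.0, 4.5],
})
bad_df = good_df.assign(PredictedDailyNewCases=[3.0, -1.0])

print(basic_prediction_checks(good_df, "2020-08-01", "2020-08-04"))
print(basic_prediction_checks(bad_df, "2020-08-01", "2020-08-04"))
```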