# Dataset [COVID-19 in Ukraine: daily data](https://www.kaggle.com/vbmokin/covid19-in-ukraine-daily-data)

<a class="anchor" id="0"></a>
# COVID-19 in Ukraine: EDA & Forecasting with holidays impact for confirmed cases. Prophet with holidays and pseudo-holidays - 11 parameters tuning:
* lower_window
* upper_window
* prior_scale
* mode
* changepoint_prior_scale
* weekly_fourier_order
* mode_seasonality_weekly
* weekly_seasonality_prior_scale
* several_days_fourier_order (for period = n days, n = 2, 3, ... 6)
* mode_seasonality_several_days
* several_days_seasonality_prior_scale

# Acknowledgements

### Datasets:
- my dataset [COVID-19 in Ukraine: daily data](https://www.kaggle.com/vbmokin/covid19-in-ukraine-daily-data) - only up to commit 11
- official data of Ukraine (https://covid19.rnbo.gov.ua/) - from commit 12 via API
- dataset [COVID-19 Open Data](https://github.com/GoogleCloudPlatform/covid-19-open-data) (including dataset [Oxford COVID-19 government response tracker](https://www.bsg.ox.ac.uk/research/research-projects/oxford-covid-19-government-response-tracker) and dataset [NOAA](https://www.ncei.noaa.gov/)) : @article{Wahltinez2020,author = "Oscar Wahltinez and Matt Lee and Anthony Erlinger and Mayank Daswani and Pranali Yawalkar and Kevin Murphy and Michael Brenner", year = 2020, title = "COVID-19 Open-Data: curating a fine-grained, global-scale data repository for SARS-CoV-2", note = "Work in progress",  url = {https://github.com/GoogleCloudPlatform/covid-19-open-data},} 
- my dataset with holidays data [COVID-19: Holidays of countries](https://www.kaggle.com/vbmokin/covid19-holidays-of-countries) - it is recommended to follow the updates

### Notebooks:
- notebook [COVID-19-in-Ukraine: Prophet & holidays tuning](https://www.kaggle.com/vbmokin/covid-19-in-ukraine-prophet-holidays-tuning)
- notebook [COVID-19 Novel Coronavirus EDA & Forecasting Cases](https://www.kaggle.com/khoongweihao/covid-19-novel-coronavirus-eda-forecasting-cases) from [@Wei Hao Khoong](https://www.kaggle.com/khoongweihao)

### Libraries from GitHub:
- https://facebook.github.io/prophet/
- https://facebook.github.io/prophet/docs/
- https://github.com/facebook/prophet
- https://github.com/dr-prodigy/python-holidays

The model uses FB Prophet with holidays taken from my dataset **[COVID-19: Holidays of countries](https://www.kaggle.com/vbmokin/covid19-holidays-of-countries)**, which **has holidays for 70 countries** and is adapted for use in forecasting coronavirus disease.

Holidays and pseudo-holidays (**anomalous dates**) are defined in three ways:
- dates of official public holidays;
- dates when the quarantine was weakened, according to open data;
- dates with very comfortable conditions for rest (average temperature at or above its 95% quantile and rainfall at or below its 5% quantile); this rule should be adapted individually for each country (NOAA open data are used).
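
The "very comfortable conditions" rule from the list above can be sketched on toy data. This is only an illustration under stated assumptions: the function name `comfortable_dates` and the toy series are hypothetical, and the quantiles are computed with the standard library rather than with pandas as in the notebook.

```python
from datetime import date, timedelta
from statistics import quantiles


def comfortable_dates(dates, temperature, rainfall):
    """Return dates with temperature >= its 95% quantile and rainfall <= its 5% quantile."""
    # n=20 yields cut points at 5%, 10%, ..., 95%; 'inclusive' interpolates linearly
    temp_q95 = quantiles(temperature, n=20, method="inclusive")[-1]
    rain_q05 = quantiles(rainfall, n=20, method="inclusive")[0]
    return [d for d, t, r in zip(dates, temperature, rainfall)
            if t >= temp_q95 and r <= rain_q05]


# Toy data (hypothetical): 21 days, steadily rising temperature, rain only in the first 10 days
toy_dates = [(date(2021, 6, 1) + timedelta(days=i)).isoformat() for i in range(21)]
toy_temperature = [float(i) for i in range(21)]
toy_rainfall = [3.0] * 10 + [0.0] * 11

comfortable = comfortable_dates(toy_dates, toy_temperature, toy_rainfall)
print(comfortable)
```

Only the two hottest, rain-free days at the end of the toy series are flagged; with real weather data the quantile levels would need tuning per country, as noted above.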

The model is **tuned in two stages**: an exhaustive search over 4 candidate values per parameter is run first for one group of parameters (holiday parameters), then for the other group (seasonality parameters). The second stage uses the optimal parameters found in the first stage. Each stage ends with an interactive graph (library "plotly") that shows the WAPE for each combination of parameters.
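
The two-stage idea can be sketched as follows. This is a minimal illustration, not the notebook's tuning code: `grid_search`, `toy_score`, and the toy grids are hypothetical, and the toy score stands in for the WAPE of a fitted Prophet model.

```python
from itertools import product


def grid_search(score, grid, fixed):
    """Exhaustively try every combination in grid, keeping the parameters in fixed constant."""
    best_params, best_score = None, float("inf")
    for values in product(*grid.values()):
        params = dict(fixed, **dict(zip(grid.keys(), values)))
        current = score(params)
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(p):
    # Hypothetical stand-in for the forecast error; its minimum is at (2, 2.5, 6)
    return (abs(p["upper_window"] - 2) + abs(p["prior_scale"] - 2.5)
            + abs(p["weekly_fourier_order"] - 6))


stage1_grid = {"upper_window": [0, 1, 2, 3], "prior_scale": [1.5, 2, 2.5, 3]}
stage2_grid = {"weekly_fourier_order": [2, 4, 6, 8]}
initial_values = {"weekly_fourier_order": 8}

# Stage 1: tune the first group with the second group fixed at its initial values
best_stage1, _ = grid_search(toy_score, stage1_grid, initial_values)
# Stage 2: tune the second group with the stage 1 optimum fixed
best_params, best_score = grid_search(toy_score, stage2_grid, best_stage1)
print(best_params, best_score)
```

Splitting the search keeps the number of model fits manageable: two stages of 4^k combinations each instead of a single search over the product of both groups.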

The Prophet model with all optimized parameters and holidays is used for **forecasting** the next days and for visualizing the forecasting results. The data is taken from [COVID-19 Open Data](https://github.com/GoogleCloudPlatform/covid-19-open-data) (this dataset is usually updated daily and is available as of yesterday), so the next days are counted from the date of the last commit of this notebook.

<a class="anchor" id="0.1"></a>
## Table of Contents

1. [Import libraries](#1)
1. [Download data](#2)
1. [Selection data with holidays](#3)
    - [Holidays with a shift](#3.1)
    - [Additional dates of anomalies as holidays](#3.2)    
        - [The weakening of quarantine](#3.2.1)
        - [Very comfortable conditions for rest](#3.2.2)
        - [Holidays as days of less efficient work of laboratories](#3.2.3)
        - [Weekend quarantine as holidays](#3.2.4)        
1. [EDA](#4)
    - [Plots - Confirmed cases over time](#4.1)
    - [Plots - Hospitalizations](#4.2)    
    - [Statistics](#4.3)
    - [Set initial values for tuning](#4.4)
1. [Tuning Prophet model and holidays parameters](#5)
    - [Stage 1 - Tuning holiday parameters](#5.1)
        - [Model training, forecasting and evaluation](#5.1.1)
        - [Results visualization](#5.1.2)
    - [Stage 2 - Tuning seasonality parameters](#5.2)
        - [Model training, forecasting and evaluation](#5.2.1)
        - [Results visualization](#5.2.2)
    - [Results of all tuning](#5.3)
1. [Prediction](#6)
1. [Visualization](#7)

In [None]:
country_main = 'Ukraine'
data_name = 'Hospitalizations'

## 1. Import libraries<a class="anchor" id="1"></a>

[Back to Table of Contents](#0.1)

Import libraries

In [None]:
!pip install openpyxl==3.0.7

In [None]:
import os
import io
import openpyxl
import pandas as pd
import numpy as np
import pywt
import requests
import seaborn as sns
import matplotlib
from matplotlib import pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from PIL import Image
from IPython.display import FileLink

from datetime import date, timedelta, datetime
from fbprophet import Prophet
from fbprophet.make_holidays import make_holidays_df
from fbprophet.diagnostics import cross_validation, performance_metrics
from fbprophet.plot import plot_cross_validation_metric
import holidays
from collections import Counter
import pycountry

from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

import warnings
warnings.simplefilter('ignore')

## 2. Download data<a class="anchor" id="2"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Thanks to https://api-covid19.rnbo.gov.ua/
# https://api-covid19.rnbo.gov.ua/charts/main-data?mode=ukraine
print('Download confirmed daily data from RNBO of Ukraine')
myfile = requests.get('https://api-covid19.rnbo.gov.ua/charts/main-data?mode=ukraine')
# Save the response to a local file and close it before reading
with open('data', 'wb') as f:
    f.write(myfile.content)
data = pd.read_json('data')
data

In [None]:
# Thanks to https://health-security.rnbo.gov.ua/api
# https://health-security.rnbo.gov.ua/api/beds/hospitalization/chart?bedType=hospitalized&regionId=&hospitalType=
print('Download hospitalization daily data from RNBO of Ukraine')
myfile = requests.get('https://health-security.rnbo.gov.ua/api/beds/hospitalization/chart?bedType=hospitalized&regionId=&hospitalType=')
# Save the response to a local file and close it before reading
with open('data', 'wb') as f:
    f.write(myfile.content)
data_hosp = pd.read_json('data', typ='series')
df = pd.DataFrame({'dates': data_hosp['dates'], 'hosp':data_hosp['count']['i1']})
df

In [None]:
# For a specific region of Ukraine, for example Vinnytsia region (regionId=105)
# Thanks to https://health-security.rnbo.gov.ua/api
# https://health-security.rnbo.gov.ua/api/beds/hospitalization/chart?bedType=hospitalized&regionId=105&hospitalType=
regionId = 105  # Vinnytsia region
print('Download hospitalization daily data from RNBO of Ukraine')
myfile = requests.get(f'https://health-security.rnbo.gov.ua/api/beds/hospitalization/chart?bedType=hospitalized&regionId={regionId}&hospitalType=')
# Save the response to a local file and close it before reading
with open('data', 'wb') as f:
    f.write(myfile.content)
data_hosp = pd.read_json('data', typ='series')
#df = pd.DataFrame({'dates': data_hosp['dates'], 'hosp':data_hosp['count']['i1']})
#df

In [None]:
data['n_confirmed'] = data['confirmed'].diff()
data = data[299:].reset_index(drop=True)
data
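
The `diff()` call above turns a running total into daily increments. A minimal sketch of the same step in plain Python, assuming (as the cell implies) that `confirmed` is a cumulative count; the function name and toy values are hypothetical:

```python
def daily_from_cumulative(cumulative):
    """Convert a cumulative series to daily increments; the first day has no predecessor."""
    daily = [None]
    for previous, current in zip(cumulative, cumulative[1:]):
        daily.append(current - previous)
    return daily


toy_cumulative = [100, 130, 130, 175]  # hypothetical cumulative confirmed cases
toy_daily = daily_from_cumulative(toy_cumulative)
print(toy_daily)
```

The first element is undefined, which is why the notebook's first differenced value is missing as well and early rows are cut before modeling.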

In [None]:
data = pd.merge(data, df[['dates','hosp']], on='dates', how='left')
data

In [None]:
# Delete the last zero value
data = data[:-1]
data['n_confirmed'] = data['n_confirmed'].astype('int')
data.tail(3)

In [None]:
data['n_confirmed'].plot()

In [None]:
df2 = data[['dates','n_confirmed', 'hosp']].dropna()
df2 = df2[df2['n_confirmed'] > 0].reset_index(drop=True)
df2['n_confirmed'].plot()

In [None]:
# As in my original notebook https://www.kaggle.com/vbmokin/covid-19-in-ukraine-prophet-holidays-tuning
df2.columns = ['Date', 'Confirmed', 'Hospitalizations']
df2['Country'] = 'Ukraine'

In [None]:
df2.tail(8)

In [None]:
latest_date = df2['Date'].max()
latest_date

In [None]:
df2.tail(5)

## 3. Selection data with holidays<a class="anchor" id="3"></a>

[Back to Table of Contents](#0.1)

In [None]:
def cut_df(date0: str, 
           df: pd.DataFrame, 
           col: str):
    # Returns a copy of df without the rows where df[col] < date0 (a helper column 'col_dt' is added)
    format0 = '%Y-%m-%d'
    df_temp = df.copy()
    df_temp['col_dt'] = pd.to_datetime(df_temp[col], format=format0, errors='coerce')
    date0_dt = datetime.strptime(date0, format0)
    df_temp = df_temp[df_temp['col_dt'] >= date0_dt]
    
    return df_temp

## 3.1. Holidays with a shift<a class="anchor" id="3.1"></a>

[Back to Table of Contents](#0.1)

### Thanks to dataset [COVID-19: Holidays of countries](https://www.kaggle.com/vbmokin/covid19-holidays-of-countries)

In [None]:
# Thanks to dataset https://www.kaggle.com/vbmokin/covid19-holidays-of-countries
holidays_df = pd.read_csv('../input/covid19-holidays-of-countries/holidays_df_of_70_countries_for_covid_19_2021.csv')
holidays_df[holidays_df['country'] == country_main]

In [None]:
holidays_df_code_countries = holidays_df['code'].unique()
holidays_df_code_countries

In [None]:
# From notebook https://www.kaggle.com/vbmokin/covid-19-prophet-forecast-next-2-weeks
def dict_code_countries_with_holidays(list_name_countries: list,
                                      holidays_df: pd.DataFrame):
        
    """
    Defines a dictionary with the names of user countries and their two-letter codes (ISO 3166) 
    in the dataset "COVID-19: Holidays of countries" 
    
    Returns: 
    - countries: dictionary with the names of user countries and their two-letter codes (ISO 3166) 
    - holidays_df_identificated: DataFrame with holidays data for countries from dictionary 'countries'
    
    Args: 
    - list_name_countries: list of the names of countries (name or common_name or official_name or alpha2 or alpha3 codes from ISO 3166)
    - holidays_df: DataFrame with holidays "COVID-19: Holidays of countries"
    """
    
    import pycountry
    
    # Identification of countries for which there are names according to ISO
    countries = {}
    dataset_all_countries = list(holidays_df['code'].unique())
    list_name_countries_identificated = []
    list_name_countries_not_identificated = []
    for country in list_name_countries:
        try: 
            country_id = pycountry.countries.get(alpha_2=country)
            if country_id.alpha_2 in dataset_all_countries:
                countries[country] = country_id.alpha_2
        except AttributeError:
            try: 
                country_id = pycountry.countries.get(name=country)
                if country_id.alpha_2 in dataset_all_countries:
                    countries[country] = country_id.alpha_2
            except AttributeError:
                try: 
                    country_id = pycountry.countries.get(official_name=country)
                    if country_id.alpha_2 in dataset_all_countries:
                        countries[country] = country_id.alpha_2
                except AttributeError:
                    try: 
                        country_id = pycountry.countries.get(common_name=country)
                        if country_id.alpha_2 in dataset_all_countries:
                            countries[country] = country_id.alpha_2
                    except AttributeError:
                        try: 
                            country_id = pycountry.countries.get(alpha_3=country)
                            if country_id.alpha_2 in dataset_all_countries:
                                countries[country] = country_id.alpha_2
                        except AttributeError:
                            list_name_countries_not_identificated.append(country)
    holidays_df_identificated = holidays_df[holidays_df['code'].isin(countries.values())]
    
    print(f'Thus, the dataset has holidays in {len(countries)} countries from your list with {len(list_name_countries)} countries')
#     if len(countries) == len(dataset_all_countries):
#         print('All available in this dataset holiday data is used')
#     else:
#         print("Holidays are available in the dataset for such countries (if there are countries from your list, then it's recommended making changes to the list)")
#         print(np.array(holidays_df[~holidays_df['code'].isin(countries.values())].country_official_name.unique()))
        
    return countries, holidays_df_identificated.reset_index(drop=True)
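
The nested `try`/`except` chain above tries one lookup field after another. The same fallback order can be sketched as a loop. This is only an illustration with a hypothetical in-memory registry (`TOY_COUNTRIES`) in place of `pycountry`, so none of the names below belong to that library:

```python
# Hypothetical stand-in for a country registry (not the pycountry API)
TOY_COUNTRIES = [
    {"alpha_2": "UA", "alpha_3": "UKR", "name": "Ukraine"},
    {"alpha_2": "MD", "alpha_3": "MDA", "name": "Moldova, Republic of",
     "common_name": "Moldova", "official_name": "Republic of Moldova"},
]

# Same order of attempts as in the function above
LOOKUP_FIELDS = ("alpha_2", "name", "official_name", "common_name", "alpha_3")


def find_alpha_2(query, registry=TOY_COUNTRIES):
    """Return the two-letter code of the first record matching query in any lookup field."""
    for field in LOOKUP_FIELDS:
        for record in registry:
            if record.get(field) == query:
                return record["alpha_2"]
    return None  # the country could not be identified


print(find_alpha_2("Ukraine"), find_alpha_2("MDA"), find_alpha_2("Atlantis"))
```

A flat loop keeps the fallback order in one place, so adding or reordering lookup fields does not require another level of nesting.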

In [None]:
countries_dict, holidays_df_base = dict_code_countries_with_holidays([country_main],holidays_df)
countries_dict

In [None]:
holidays_df_base['type'] = 'holiday'
holidays_df = holidays_df_base.copy()
holidays_df

In [None]:
# From https://www.kaggle.com/vbmokin/covid-19-prophet-forecast-next-2-weeks
def adaption_df_to_holidays_df_for_prophet(df, col, countries_dict):
    # Adapts the dataframe df (by column=col) to holidays_df using the countries in the dictionary countries_dict
    
    # Keep only the countries that are present in the dataset with holidays
    df = df[df[col].isin(list(countries_dict.keys()))].reset_index(drop=True)
    
    # Add alpha_2 (code from ISO 3166) for each country
    df['iso_alpha'] = None
    for key, value in countries_dict.items():
        df.loc[df[col] == key, 'iso_alpha'] = value    
    
    return df

In [None]:
df2 = adaption_df_to_holidays_df_for_prophet(df2, 'Country', countries_dict)
df2.columns = ['Date', 'Confirmed', 'Hospitalizations', 'Country', 'iso_alpha']
df2

In [None]:
country_iso_alpha = df2.loc[0,'iso_alpha']
country_iso_alpha

## 3.2. Additional dates of anomalies as holidays<a class="anchor" id="3.2"></a>

[Back to Table of Contents](#0.1)

In [None]:
def aux_holidays_df_generator(holidays_df, dates_list, name, source, window_size, shift7=True):
    # Add dates from dates_list with anomalies of various kinds to the holiday dataset holidays_df
    # name - the name of the anomaly
    # source - the source of the primary information used for processing
    
    last_row = len(holidays_df)
    if shift7:
        holidays_dates = holidays_df['ds_holidays'].tolist()
    else: holidays_dates = holidays_df['ds'].tolist()
    common_dates = list(set(holidays_dates).intersection(set(dates_list)))
    dates_list = list(set(dates_list).difference(set(common_dates)))
        
    for i in range(len(dates_list)):
        holidays_df = holidays_df.append([holidays_df.loc[last_row-1,:]], ignore_index=True)
        ds_dt = datetime.strptime(dates_list[i], '%Y-%m-%d')
        holidays_df.loc[last_row+i, 'ds_holidays'] = dates_list[i]
        holidays_df.loc[last_row+i, 'holiday'] = name
        holidays_df.loc[last_row+i, 'source'] = source
        holidays_df.loc[last_row+i, 'lower_window'] = -window_size
        holidays_df.loc[last_row+i, 'upper_window'] = window_size
    
        # Type of holidays or pseudo-holidays
        if name == 'the weakening of quarantine':
            holidays_df.loc[last_row+i, 'type'] = 'SI'
        elif name == 'Very comfortable conditions for rest':
            holidays_df.loc[last_row+i, 'type'] = 'meteo'
        elif name == 'Holidays as days of less efficient work of laboratories':
            holidays_df.loc[last_row+i, 'type'] = 'lab'
            # NOTE: this 2-day shift is overwritten by the shift7 branch below
            holidays_df.loc[last_row+i, 'ds'] = (ds_dt + timedelta(days=2)).strftime('%Y-%m-%d')
        elif name == 'Weekend quarantine as holidays':
            holidays_df.loc[last_row+i, 'type'] = 'weekend'
            
        if shift7:
            # Shift the date 7 days ahead
            holidays_df.loc[last_row+i, 'ds'] = (ds_dt + timedelta(days=7)).strftime('%Y-%m-%d')
        else:
            # Keep the original date (no 7-day shift)
            holidays_df.loc[last_row+i, 'ds'] = ds_dt.strftime('%Y-%m-%d')
                    
    return holidays_df.sort_values(by=['ds'])
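
The date arithmetic used above (a 7-day shift plus a symmetric window around the shifted date) can be sketched in isolation. The helper names `shift_date` and `holiday_window` are hypothetical; the window semantics follow Prophet's `lower_window` / `upper_window` columns, where `lower_window` is zero or negative.

```python
from datetime import datetime, timedelta


def shift_date(date_str, days, fmt="%Y-%m-%d"):
    """Shift a date string by a number of days and return it in the same format."""
    return (datetime.strptime(date_str, fmt) + timedelta(days=days)).strftime(fmt)


def holiday_window(date_str, lower_window, upper_window):
    """List the dates covered by a holiday whose effect spans [lower_window, upper_window] days."""
    return [shift_date(date_str, offset) for offset in range(lower_window, upper_window + 1)]


effect_date = shift_date("2021-01-01", 7)    # anomaly date shifted 7 days ahead
covered = holiday_window(effect_date, -2, 2)  # window_size = 2 on both sides
print(effect_date, covered)
```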

In [None]:
def plot_with_anomalies(df, cols_y_list, cols_y_list_name, dates_x, col_anomalies, val_anomal, log_y=False):
    # Draws a line plot of the features cols_y_list (y) against dates_x (x) from the dataframe df,
    # with vertical lines on the dates where col_anomalies == val_anomal,
    # spanning from the minimum of cols_y_list[0] (or 0) to the maximum over all plotted features,
    # with log_y = False or True
    # cols_y_list_name - dictionary of the names of cols from cols_y_list (key - name of feature, value - its name for the plot legend), 
    # the name for cols_y_list[0] is the title of the whole plot
    
    fig = px.line(df, x=dates_x, y=cols_y_list[0], title=cols_y_list_name[cols_y_list[0]], log_y=log_y, template='gridon',width=700, height=800)
    y_max = df[cols_y_list[0]].max()
    for i in range(len(cols_y_list)-1):
        fig.add_trace(go.Scatter(x=df[dates_x], y=df[cols_y_list[i+1]], mode='lines', name=cols_y_list_name[cols_y_list[i+1]]))
        max_i = df[cols_y_list[i+1]].max()
        y_max = max_i if max_i > y_max else y_max
    
    anomal_dates_list = df[df[col_anomalies] == val_anomal][dates_x].tolist()
    y_min = min(df[cols_y_list[0]].min(),0)
    for i in range(len(anomal_dates_list)):
        anomal_date = anomal_dates_list[i]
        fig.add_shape(dict(type="line", x0=anomal_date, y0=y_min, x1=anomal_date, y1=y_max, line=dict(color="red", width=1)))
    fig.show()

In [None]:
# Thanks to https://github.com/GoogleCloudPlatform/covid-19-open-data
data = pd.read_csv("https://storage.googleapis.com/covid19-open-data/v2/UA/main.csv")

### 3.2.1. The weakening of quarantine<a class="anchor" id="3.2.1"></a>

[Back to Table of Contents](#0.1)

#### Thanks to [Oxford COVID-19 government response tracker](https://www.bsg.ox.ac.uk/research/research-projects/oxford-covid-19-government-response-tracker)

In [None]:
data['stringency_index_jump'] = 0
for i in range(len(data)-1):
    if (data.loc[i+1,'stringency_index'] is not None) and (data.loc[i,'stringency_index'] is not None) and \
    (data.loc[i+1,'stringency_index'] < data.loc[i,'stringency_index']):
        data.loc[i+1, 'stringency_index_jump'] = 1
source_gov = 'https://www.bsg.ox.ac.uk/research/research-projects/oxford-covid-19-government-response-tracker'
dates_gov_list = data[data['stringency_index_jump'] == 1]['date'].tolist()
holidays_df = aux_holidays_df_generator(holidays_df, dates_gov_list, 'the weakening of quarantine', source_gov, 2)
plot_with_anomalies(data, ["stringency_index"], {"stringency_index" : "Stringency index and dates of the weakening of quarantine in " + country_main}, 'date', 'stringency_index_jump', 1)
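
The rule used above (flag a day when the stringency index is lower than on the previous day, ignoring missing values) can be sketched without pandas. The function name and the toy series are hypothetical:

```python
import math


def drop_positions(values):
    """Return positions i where values[i] < values[i-1], skipping missing values (None or NaN)."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    positions = []
    for i in range(1, len(values)):
        previous, current = values[i - 1], values[i]
        if is_missing(previous) or is_missing(current):
            continue
        if current < previous:
            positions.append(i)
    return positions


toy_index = [50, 50, 40, None, 30, 35, float("nan"), 20.0]  # hypothetical stringency values
weakening_days = drop_positions(toy_index)
print(weakening_days)
```

Note that a drop across a gap (here from 40 to 30 around the missing value) is not flagged, because only consecutive observed days are compared.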

### 3.2.2. Very comfortable conditions for rest <a class="anchor" id="3.2.2"></a>

[Back to Table of Contents](#0.1)

#### Thanks to:
* [COVID-19 Open Data](https://github.com/GoogleCloudPlatform/covid-19-open-data)
* [NOAA](https://www.ncei.noaa.gov/)

In [None]:
data.columns.tolist()

In [None]:
data['rest_comfort'] = 0
data.loc[(data['average_temperature'] >= data['average_temperature'].quantile(.95)) & (data['rainfall'] <= data['rainfall'].quantile(.05)), 'rest_comfort'] = 1
dates_weather_list = data[data['rest_comfort'] == 1]['date'].tolist()
holidays_df = aux_holidays_df_generator(holidays_df, dates_weather_list, 'Very comfortable conditions for rest', 'https://www.ncei.noaa.gov/', 2)
plot_with_anomalies(data, ["average_temperature", "rainfall"], {"average_temperature" : "Average temperature over time in " + country_main, "rainfall" : "rainfall"}, 'date', 'rest_comfort', 1)

In [None]:
df2.info()

### 3.2.3. Holidays as days of less efficient work of laboratories <a class="anchor" id="3.2.3"></a>

[Back to Table of Contents](#0.1)

In [None]:
holidays_dates = holidays_df_base['ds_holidays'].tolist()
holidays_dates

In [None]:
# Weekdays during which few tests were performed (less than 20 thousand per day) - an additional anomaly 
holidays_dates += ['2021-01-04', '2021-01-08']

In [None]:
data['holidays_date'] = 0
holidays_df = aux_holidays_df_generator(holidays_df, holidays_dates, 'Holidays as days of less efficient work of laboratories', 
                                        'https://github.com/dr-prodigy/python-holidays', 0, False)

### 3.2.4. Weekend quarantine as holidays<a class="anchor" id="3.2.4"></a>

[Back to Table of Contents](#0.1)

In [None]:
holidays_weekend_quarantine = ['2020-11-14', '2020-11-15',
                               '2020-11-21', '2020-11-22',
                               '2020-11-28', '2020-11-29']

In [None]:
holidays_lockdown = ['2021-01-08', '2021-01-09','2021-01-10','2021-01-11','2021-01-12',
                     '2021-01-13', '2021-01-14','2021-01-15','2021-01-16','2021-01-17',
                     '2021-01-18', '2021-01-19','2021-01-20','2021-01-21','2021-01-22',
                     '2021-01-23', '2021-01-24']

In [None]:
data['holidays_date'] = 0
holidays_df = aux_holidays_df_generator(holidays_df, holidays_weekend_quarantine + holidays_lockdown, 
                                        'Weekend quarantine as holidays', 
                                        'https://www.kmu.gov.ua/', 0, False)

In [None]:
holidays_df = cut_df(df2.loc[0, 'Date'], holidays_df, 'ds')
holidays_df

## 4. EDA<a class="anchor" id="4"></a>

[Back to Table of Contents](#0.1)

## 4.1. Plots - Confirmed cases over time<a class="anchor" id="4.1"></a>

[Back to Table of Contents](#0.1)

In [None]:
fig = px.line(df2, x="Date", y="Confirmed", 
              title="Confirmed cases in " + country_main, 
              log_y=False,template='gridon',width=700, height=600)
fig.show()

In [None]:
fig = px.line(df2, x="Date", y="Confirmed", 
              title="Confirmed cases (logarithmic scale) in " + country_main, 
              log_y=True,template='gridon',width=700, height=600)
fig.show()

In [None]:
df2['holiday'] = 0
holidays_df_dates = holidays_df['ds'].tolist()
df2.loc[df2['Date'].isin(holidays_df_dates), 'holiday'] = 1
plot_with_anomalies(df2, ["Confirmed"], {"Confirmed" : "Confirmed cases and holidays data in " + country_main}, 'Date', 'holiday', 1)
df2 = df2.drop(columns=['holiday'])

In [None]:
holidays_df_dates

## 4.2. Plots - Hospitalizations<a class="anchor" id="4.2"></a>

[Back to Table of Contents](#0.1)

In [None]:
fig = px.line(df2, x="Date", y="Hospitalizations", 
              title="Hospitalizations in " + country_main, 
              log_y=False,template='gridon',width=700, height=600)
fig.show()

In [None]:
fig = px.line(df2, x="Date", y="Hospitalizations", 
              title="Hospitalizations (logarithmic scale) in " + country_main, 
              log_y=True,template='gridon',width=700, height=600)
fig.show()

In [None]:
df2['holiday'] = 0
holidays_df_dates = holidays_df['ds'].tolist()
df2.loc[df2['Date'].isin(holidays_df_dates), 'holiday'] = 1
plot_with_anomalies(df2, ["Hospitalizations"], {"Hospitalizations" : "Hospitalizations and holidays data in " + country_main}, 'Date', 'holiday', 1)
df2 = df2.drop(columns=['holiday'])

In [None]:
df2['holiday'] = 0
holidays_df_dates = holidays_df['ds'].tolist()
df2.loc[df2['Date'].isin(holidays_df_dates), 'holiday'] = 1
plot_with_anomalies(df2, ["Hospitalizations", "Confirmed"], {"Hospitalizations" : "Hospitalizations and holidays data in " + country_main, "Confirmed" : "Confirmed cases"}, 'Date', 'holiday', 1)
df2 = df2.drop(columns=['holiday'])

## 4.3. Statistics<a class="anchor" id="4.3"></a>

[Back to Table of Contents](#0.1)

## Descriptive statistics

In [None]:
df2.describe()

In [None]:
fig = px.box(df2, y="Hospitalizations")
fig.show()

## Earliest Cases

In [None]:
df2.head()

## Latest Cases

In [None]:
df2.tail()

## Waves analysis

In [None]:
df2['wave'] = 0
df2.loc[df2['Date'] > "2021-02-01", "wave"] = 1
df2

In [None]:
# Wave 0
fig = px.box(df2[df2['wave']==0], y="Hospitalizations")
fig.show()

In [None]:
# Wave 1
fig = px.box(df2[df2['wave']==1], y="Hospitalizations")
fig.show()

## Wavelet denoising

Wavelet denoising is a way to remove unnecessary noise from a signal. The method computes "wavelet coefficients", which determine which pieces of information to keep (signal) and which to discard (noise).

We use the MAD value (mean absolute deviation) to estimate the randomness in the signal and set the minimum threshold for the wavelet coefficients accordingly. Coefficients below the threshold are filtered out, and the signal (here, the hospitalization time series) is reconstructed from the remaining coefficients, which removes the noise.

Thanks to [Ion Switching Competition : Signal EDA 🧪](https://www.kaggle.com/tarunpaparaju/ion-switching-competition-signal-eda)

In [None]:
def maddest(d, axis=None):
    return np.mean(np.absolute(d - np.mean(d, axis)), axis)

def denoise_signal(x, wavelet='db4', level=1):
    coeff = pywt.wavedec(x, wavelet, mode="per")
    sigma = (1/0.6745) * maddest(coeff[-level])

    uthresh = sigma * np.sqrt(2*np.log(len(x)))
    coeff[1:] = (pywt.threshold(i, value=uthresh, mode='hard') for i in coeff[1:])

    return pywt.waverec(coeff, wavelet, mode='per')
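
The thresholding step inside `denoise_signal` can be illustrated without PyWavelets. The sketch below assumes plain Python lists in place of wavelet coefficient arrays, and the helper names are hypothetical. The constant 0.6745 is conventionally paired with the median absolute deviation; this sketch follows the notebook's `maddest`, which uses the mean absolute deviation.

```python
import math


def mean_abs_dev(values):
    """Mean absolute deviation around the mean (as in maddest above)."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def universal_threshold(detail_coeffs, n_samples):
    """Noise estimate sigma scaled by sqrt(2 * ln(n)), as in denoise_signal above."""
    sigma = mean_abs_dev(detail_coeffs) / 0.6745
    return sigma * math.sqrt(2 * math.log(n_samples))


def hard_threshold(coeffs, threshold):
    """Zero out coefficients whose magnitude is below the threshold; keep the rest unchanged."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


toy_coeffs = [0.5, -3.0, 1.0, 4.0]  # hypothetical detail coefficients
kept = hard_threshold(toy_coeffs, 1.0)
print(kept, universal_threshold([1.0, 2.0, 3.0], n_samples=8))
```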

In [None]:
x0 = df2[df2['wave']==0]["Date"]
y0 = df2[df2['wave']==0]["Hospitalizations"]
y_w0 = denoise_signal(df2[df2['wave']==0]["Hospitalizations"])
y_roll0 = df2[df2['wave']==0]["Hospitalizations"].rolling(7).mean()
x1 = df2[df2['wave']==1]["Date"]
y1 = df2[df2['wave']==1]["Hospitalizations"]
y_w1 = denoise_signal(df2[df2['wave']==1]["Hospitalizations"])
y_roll1 = df2[df2['wave']==1]["Hospitalizations"].rolling(7).mean()

In [None]:
fig = make_subplots(rows=2, cols=1)

fig.add_trace(
    go.Scatter(x=x0, mode='lines+markers', y=y0, marker=dict(color="lightskyblue"), showlegend=False,
               name="Original signal"),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=x0, y=y_w0, mode='lines', marker=dict(color="navy"), showlegend=False,
               name="Denoised signal"),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=x0, y=y_roll0, mode='lines', marker=dict(color="red"), showlegend=False,
               name="Rolling mean"),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=x1, mode='lines+markers', y=y1, marker=dict(color="mediumaquamarine"), showlegend=False),
    row=2, col=1
)

fig.add_trace(
    go.Scatter(x=x1, y=y_w1, mode='lines', marker=dict(color="darkgreen"), showlegend=False),
    row=2, col=1
)
fig.add_trace(
    go.Scatter(x=x1, y=y_roll1, mode='lines', marker=dict(color="red"), showlegend=False,
               name="Rolling mean"),
    row=2, col=1
)
fig.update_layout(height=1200, width=800, title_text="Original (pale), Denoised (dark) and Rolling mean (red) data for waves 0 and 1")
fig.show()

## 4.4. Set initial values for tuning<a class="anchor" id="4.4"></a>

[Back to Table of Contents](#0.1)

In [None]:
# For stage 1 of tuning
changepoint_prior_scale_initial_level = 0.15   # try to change
weekly_season_reg_coef = 1   # try to change
lower_window_list = [0, -1, -2, -3] # must be exactly 4 values (identical allowed)
upper_window_list = [0, 1, 2, 3] # must be exactly 4 values (identical allowed)
prior_scale_list = [1.5, 2, 2.5, 3] # must be exactly 4 values (identical allowed)    # try to change
holidays_adaptive = ['holiday', 'SI', 'meteo'] # holidays with adaptive window

# For stage 2 of tuning
several_days_period = 620   # try to change
several_days_season_reg_coef = 2  # try to change
several_days_short_period = 4  # try to change
several_days_short_days_fourier_order = 10  # try to change
several_days_short_days_season_reg_coef = 0.5  # 0.4 - try to change
changepoint_prior_scale_list = [0.15, 0.2, 0.25, 0.3] # must be exactly 4 values (identical allowed) - try to change
weekly_fourier_order_list = [2, 4, 6, 8] # must be exactly 4 values (identical allowed) - try to change
several_days_fourier_order_list = [3, 4, 5, 6] # must be exactly 4 values (identical allowed) - try to change
# 0 in fourier_order lists means the absence of this component

# Check length of lists
if (len(lower_window_list) != 4) or (len(upper_window_list) != 4) or \
   (len(prior_scale_list) != 4) or (len(changepoint_prior_scale_list) != 4) or \
   (len(weekly_fourier_order_list) != 4) or (len(several_days_fourier_order_list) != 4):
    print('Each list must contain exactly 4 values!')

In [None]:
df2 = df2.drop(columns = ['Country', 'iso_alpha', 'Confirmed', 'wave'])
df2.columns = ['ds','y']
df2['y'] = df2['y'].astype('int')
df2.tail(14)

In [None]:
days_to_forecast = 7 # in future (after training data)
days_to_forecast_for_evalution = 7 # the latest observed days, held out for model evaluation
first_forecasted_date = sorted(list(set(df2['ds'].values)))[-days_to_forecast_for_evalution]
end_forecasted_date = (datetime.strptime(df2['ds'].max(), "%Y-%m-%d")+timedelta(days = days_to_forecast)).strftime("%Y-%m-%d")
first_data_date = df2['ds'].min()

print('The first date of data for modeling is: ' + first_data_date)
print('The first date to perform forecasts for evaluation is: ' + first_forecasted_date)
print('The last date of the forecast into the future is: ' + end_forecasted_date)
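
How the evaluation window and the forecast horizon follow from the data can be sketched on a toy list of dates. The function name `forecast_window` and the toy dates are hypothetical; the logic mirrors the cell above.

```python
from datetime import datetime, timedelta


def forecast_window(dates, eval_days, future_days, fmt="%Y-%m-%d"):
    """Return (first evaluation date, last forecast date) for a list of date strings."""
    unique_sorted = sorted(set(dates))
    first_eval = unique_sorted[-eval_days]  # the last eval_days observations are held out
    last_observed = datetime.strptime(unique_sorted[-1], fmt)
    end_forecast = (last_observed + timedelta(days=future_days)).strftime(fmt)
    return first_eval, end_forecast


toy_dates = ["2021-03-%02d" % day for day in range(1, 11)]  # 2021-03-01 .. 2021-03-10
first_eval, end_forecast = forecast_window(toy_dates, eval_days=7, future_days=7)
print(first_eval, end_forecast)
```

With 10 observed days, the model would be trained on the first 3 days, evaluated on the last 7, and then used to forecast 7 days past the last observation.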

## 5. Tuning Prophet model and holidays parameters<a class="anchor" id="5"></a>

[Back to Table of Contents](#0.1)

In [None]:
def convert10_base4(n):
    # convert decimal to base 4
    alphabet = "0123"
    if n < 4:
        return alphabet[n]
    else:
        return convert10_base4(n // 4) + alphabet[n % 4]
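
A base-4 code is a compact way to index combinations when every parameter has exactly 4 candidate values: each digit selects one value from one list. The sketch below assumes that this is how the combination index is used; `to_base4` and `decode_combination` are hypothetical helpers, not the notebook's functions.

```python
def to_base4(n, width):
    """Convert a non-negative integer to a base-4 string, left-padded with zeros to width."""
    digits = ""
    while n > 0:
        digits = "0123"[n % 4] + digits
        n //= 4
    return digits.zfill(width)


def decode_combination(index, candidate_lists):
    """Pick one value per list: digit k of the base-4 code selects from candidate_lists[k]."""
    code = to_base4(index, len(candidate_lists))
    return [values[int(digit)] for digit, values in zip(code, candidate_lists)]


# Candidate values as in the stage 1 lists above: lower_window, upper_window, prior_scale
toy_lists = [[0, -1, -2, -3], [0, 1, 2, 3], [1.5, 2, 2.5, 3]]
combo = decode_combination(27, toy_lists)  # 27 is "123" in base 4
print(to_base4(27, 3), combo)
```

Three lists of 4 values give 4^3 = 64 combinations, indexed 0 ("000") through 63 ("333").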

In [None]:
def export_df_to_excel(df, sheet_name):
    sheet_name = "{}.xlsx".format(sheet_name)
    with pd.ExcelWriter(sheet_name, engine='openpyxl', date_format='yyyy-mm-dd') as writer:
        df.to_excel(writer, index=False)
    return FileLink(sheet_name)

In [None]:
def export_forecast_to_excel(df, sheet_name):
    df.ds = df.ds.apply(lambda row: row.strftime("%Y-%m-%d"))
    df.rename(columns={
        "ds": "Дата",
        "yhat_lower": "Нижня межа довірчого інтервалу",
        "yhat": "Прогнозоване значення",
        "yhat_upper":  "Верхня межа довірчого інтервалу"
    }, inplace=True)
    return export_df_to_excel(df, sheet_name)

## 5.1. Stage 1 - Tuning holiday parameters<a class="anchor" id="5.1"></a>

[Back to Table of Contents](#0.1)

In [None]:
first_eval_index = len(df2)-days_to_forecast_for_evalution
second_eval_index = len(df2)
y_real = df2.tail(days_to_forecast_for_evalution)['y']
y_real_sum = df2.tail(days_to_forecast_for_evalution)['y'].sum()
country_df_val = df2.copy()
country_df_val['ds'] = pd.to_datetime(country_df_val['ds'])
country_df_val = country_df_val[(country_df_val['ds'] >= pd.to_datetime(first_forecasted_date))]
country_df_val

In [None]:
def eval_error(forecast_df, title):
    # Evaluates forecasts against the validation set country_df_val and returns the WAPE, %
    # (sum of absolute errors divided by the sum of actual values); title is not used here
    forecast_df.loc[forecast_df['yhat'] < 0, 'yhat'] = 0  # clip negative forecasts to zero
    result_df = forecast_df[(forecast_df['ds'] >= pd.to_datetime(first_forecasted_date))]
    result_val_df = result_df.merge(country_df_val, on=['ds'])
    result_val_df['rel_diff'] = (result_val_df['y'] - result_val_df['yhat'].round()).abs()
    return (result_val_df['rel_diff'].sum())*100/y_real_sum
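
The value returned by `eval_error` is a weighted absolute percentage error (WAPE): the sum of absolute errors over the evaluation window divided by the sum of the actual values, in percent. A minimal sketch on plain lists, with a hypothetical function name and toy numbers:

```python
def wape(actual, predicted):
    """WAPE in percent: sum(|actual - rounded prediction|) / sum(actual) * 100."""
    clipped = [max(p, 0) for p in predicted]  # negative forecasts are treated as zero
    abs_error = sum(abs(a - round(p)) for a, p in zip(actual, clipped))
    return 100 * abs_error / sum(actual)


toy_actual = [100, 200]
toy_predicted = [110, 190]
toy_wape = wape(toy_actual, toy_predicted)  # 20 / 300 * 100
print(round(toy_wape, 2))
```

Unlike a mean of per-day percentage errors, WAPE weights each day by its actual value, so days with few cases do not dominate the score.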

## 5.1.1. Model training, forecasting and evaluation<a class="anchor" id="5.1.1"></a>

[Back to Table of Contents](#0.1)

In [None]:
def make_forecasts(country_df, holidays_df, days_to_forecast, days_to_forecast_for_evalution, first_forecasted_date):
    
    def model_training_forecasting(df, forecast_days, holidays_df=None, mode_main='multiplicative'):
        # Prophet model training and forecasting
        
        model = Prophet(daily_seasonality=False, weekly_seasonality=False, yearly_seasonality=False,
                        holidays=holidays_df, changepoint_range=1, changepoint_prior_scale = changepoint_prior_scale_initial_level,
                        seasonality_mode = mode_main)
        model.add_seasonality(name='weekly', period=7, fourier_order=8, mode = 'multiplicative', 
                              prior_scale = changepoint_prior_scale_initial_level/weekly_season_reg_coef)
        model.add_seasonality(name='several_days', period=620, fourier_order=4, mode = 'multiplicative', prior_scale = 0.3)
        #model.add_seasonality(name='2 weeks', period=14, fourier_order=1, mode = 'multiplicative', prior_scale = 0.15)
        model.fit(df)
        future = model.make_future_dataframe(periods=forecast_days)
        forecast = model.predict(future)
        forecast.loc[forecast['yhat'] < 0, 'yhat'] = 0  # replace negative forecasts with zero
        return model, forecast

    cols_w = ['ds', 'trend', 'yhat', 'yhat_lower', 'yhat_upper', 'trend_lower', 'trend_upper', 'additive_terms', 'additive_terms_lower', 'additive_terms_upper',
              'multiplicative_terms','multiplicative_terms_lower', 'multiplicative_terms_upper', 'weekly', 'weekly_lower', 'weekly_upper']
    cols_h = ['ds', 'trend', 'yhat', 'yhat_lower', 'yhat_upper', 'trend_lower', 'trend_upper', 'additive_terms', 'additive_terms_lower', 'additive_terms_upper',
              'holidays', 'holidays_lower', 'holidays_upper', 'multiplicative_terms','multiplicative_terms_lower', 'multiplicative_terms_upper', 'weekly',
              'weekly_lower', 'weekly_upper']
    #mode_main_list = ['additive', 'multiplicative']
    mode_main_list = ['multiplicative'] # take only this mode
    relative_errors_holidays = []
    counter = 0
    results = pd.DataFrame(columns=['Conf_real', 'Conf_pred', 'Conf_pred_h', 'mode', 'n_h', 'err', 'err_h', 'prior_scale', 'how_less, %'])
    
    country_holidays_df = holidays_df[holidays_df['code'] == country_iso_alpha][['ds', 'holiday', 'lower_window', 'upper_window', 'prior_scale', 'type']].reset_index(drop=True)
    country_dfs = []            

    # Data preparation for forecast with Prophet
    country_df['ds'] = pd.to_datetime(country_df['ds'])

    # Set training and validation datasets
    country_df_future = country_df.copy()
    #country_df_val = country_df[(country_df['ds'] >= pd.to_datetime(first_forecasted_date))].copy()
    country_df = country_df[(country_df['ds'] < pd.to_datetime(first_forecasted_date))]

    n = 64 # number of combinations of the parameters lower_window / upper_window / prior_scale (4 x 4 x 4)
    for k in range(len(mode_main_list)):
        # 'additive' and 'multiplicative' mode tuning
        # Without holidays
        # Model training and forecasting without holidays
        model, forecast = model_training_forecasting(country_df, days_to_forecast_for_evalution, mode_main=mode_main_list[k])
        #fig = model.plot_components(forecast)

        # Evaluate the forecast against the validation set and calculate the relative error
        forecast_df = forecast[['ds', 'yhat']].copy()
        relative_error = eval_error(forecast_df, 'without holidays')

        # With holidays
        # Model training with tuning prior_scale and forecasting
        for i in range(n):
            parameters_iter = convert10_base4(i).zfill(3)
            lower_window_i = lower_window_list[int(parameters_iter[0])]
            upper_window_i = upper_window_list[int(parameters_iter[1])]
            prior_scale_i = prior_scale_list[int(parameters_iter[2])]
            country_holidays_df.loc[country_holidays_df['type'].isin(holidays_adaptive), 'lower_window'] = lower_window_i
            country_holidays_df.loc[country_holidays_df['type'].isin(holidays_adaptive), 'upper_window'] = upper_window_i
            country_holidays_df.loc[country_holidays_df['type'].isin(holidays_adaptive), 'prior_scale'] = prior_scale_i
            country_holidays_df.loc[country_holidays_df['type'] == 'lab', 'upper_window'] = upper_window_i
            number_holidays = len(country_holidays_df[(country_holidays_df['ds'] > first_data_date) & (country_holidays_df['ds'] < end_forecasted_date)])
            model_holidays, forecast_holidays = model_training_forecasting(country_df, days_to_forecast_for_evalution, country_holidays_df, 
                                                                           mode_main=mode_main_list[k])

            # Evaluate the forecast against the validation set and calculate the relative error
            forecast_holidays_df = forecast_holidays[['ds', 'yhat']].copy()
            relative_error_holidays = eval_error(forecast_holidays_df, 'with holidays impact')

            # Save results
            if (k == 0) and (i == 0):
                relative_error_holidays_min = relative_error_holidays
                forecast_holidays_df_best = forecast_holidays[cols_h]
                model_holidays_best = model_holidays
                lower_window_best = lower_window_i
                upper_window_best = upper_window_i
                prior_scale_best = prior_scale_i
                mode_best = mode_main_list[k]

            elif (relative_error_holidays < relative_error_holidays_min):
                relative_error_holidays_min = relative_error_holidays
                forecast_holidays_df_best = forecast_holidays[cols_h]
                model_holidays_best = model_holidays
                lower_window_best = lower_window_i
                upper_window_best = upper_window_i
                prior_scale_best = prior_scale_i
                mode_best = mode_main_list[k]

            # Save results to dataframe with result for the last date
            confirmed_real_last = country_df.tail(1)['y'].values[0].astype('int')
            results.loc[i+n*k,'Conf_real'] = confirmed_real_last if confirmed_real_last > 0 else 0
            confirmed_pred_last = int(round(forecast_df.tail(1)['yhat'].values[0]))
            results.loc[i+n*k,'Conf_pred'] = confirmed_pred_last if confirmed_pred_last > 0 else 0
            confirmed_pred_holidays_last = int(round(forecast_holidays_df_best.tail(1)['yhat'].values[0],0))
            results.loc[i+n*k,'Conf_pred_h'] = confirmed_pred_holidays_last if confirmed_pred_holidays_last > 0 else 0
            results.loc[i+n*k,'mode'] = mode_main_list[k]
            results.loc[i+n*k,'n_h'] = number_holidays
            results.loc[i+n*k,'err'] = relative_error
            results.loc[i+n*k,'err_h'] = relative_error_holidays
            results.loc[i+n*k,'lower_window'] = lower_window_i
            results.loc[i+n*k,'upper_window'] = upper_window_i
            results.loc[i+n*k,'prior_scale'] = prior_scale_i
            results.loc[i+n*k,'how_less, %'] = round((relative_error-relative_error_holidays)*100/relative_error,1)

            print('i =',i+n*k,' from',len(mode_main_list)*n-1,':  lower_window =', lower_window_i, 'upper_window =',upper_window_i, 'prior_scale =', prior_scale_i)
            print('relative_error_holidays =',relative_error_holidays, 'relative_error_holidays_min =',relative_error_holidays_min, '\n')

        # Results visualization
        print('Seasonality mode is', mode_main_list[k])
        print('The best error of the model with holidays is', relative_error_holidays_min, 'with lower_window =', str(lower_window_best),
              ' upper_window =', str(upper_window_best), ' prior_scale =', str(prior_scale_best))
        print('The error of the model without holidays is', relative_error, '\n')

    # Save results to dataframe with all dates
    forecast_holidays_df_best['country'] = country_main
    forecast_holidays_df_best.rename(columns={'yhat':'confirmed'}, inplace=True)
    forecast_holidays_dfs = forecast_holidays_df_best.tail(days_to_forecast_for_evalution)

    # Forecasting the future
    if relative_error < relative_error_holidays_min:
        # The forecast without taking into account the holidays is the best
        model_future_best, forecast_future_best = model_training_forecasting(country_df_future, days_to_forecast, mode_main=mode_best)
        forecast_plot = model_future_best.plot(forecast_future_best, ylabel='Confirmed in '+ country_main + ' (forecasting without holidays) - ' + mode_best)
        cols = cols_w
        print('The best model is model without holidays')
    else:
        # The forecast taking into account the holidays is the best
        print('The best model is model with holidays')
        model_future_best, forecast_future_best = model_training_forecasting(country_df_future, days_to_forecast, holidays_df,
                                                                             mode_main=mode_best)
        forecast_plot = model_future_best.plot(forecast_future_best, ylabel='Confirmed in '+ country_main + ' (forecasting with holidays) - ' + mode_best)
        cols = cols_h
    # Save forecasting results 
    forecast_future_df_best = forecast_future_best[cols]
    forecast_future_df_best['country'] = country_main
    forecast_future_df_best.rename(columns={'yhat':'confirmed'}, inplace=True)    
    forecast_future_dfs = forecast_future_df_best.tail(days_to_forecast)
    fig = model_future_best.plot_components(forecast_future_best)
    return forecast_holidays_dfs, relative_errors_holidays, forecast_future_dfs, results

In [None]:
%%time
forecast_holidays_dfs, relative_errors_holidays, \
            forecast_future_dfs, results = make_forecasts(df2, holidays_df, 
                                                          days_to_forecast, 
                                                          days_to_forecast_for_evalution, 
                                                          first_forecasted_date)

In [None]:
forecast_future_dfs.head(3)

In [None]:
forecast_holidays_dfs.head(3)

## 5.1.2. Results visualization<a class="anchor" id="5.1.2"></a>

[Back to Table of Contents](#0.1)
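
The `how_less, %` column of `results` reports how much the error with holidays (`err_h`) improves on the baseline error without holidays (`err`), and the best configuration is the row with the smallest `err_h`. The notebook selects it with pandas `nsmallest`. The plain-Python sketch below shows the same logic on invented records; `how_less_percent` is a hypothetical helper name.

```python
def how_less_percent(err, err_h):
    """Relative improvement of err_h over the baseline err, in percent."""
    return round((err - err_h) * 100 / err, 1)

# Invented records for illustration; the keys follow the columns of the results table.
rows = [
    {"lower_window": 0, "upper_window": 1, "prior_scale": 0.5, "err": 20.0, "err_h": 18.0},
    {"lower_window": -1, "upper_window": 2, "prior_scale": 5, "err": 20.0, "err_h": 15.0},
    {"lower_window": -2, "upper_window": 0, "prior_scale": 10, "err": 20.0, "err_h": 22.0},
]
for row in rows:
    row["how_less, %"] = how_less_percent(row["err"], row["err_h"])

# The best configuration is the one with the smallest holiday-aware error.
best = min(rows, key=lambda r: r["err_h"])
print(best)
```

A negative `how_less, %` value means that the holiday configuration made the forecast worse than the baseline.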

In [None]:
# Visualization of results
print(f'3D plot of Prophet model parameters and the COVID-19 forecasting error over {days_to_forecast_for_evalution} days')

In [None]:
# Determination of the best parameters
results['err_h'] = results['err_h'].astype('float')
results['lower_window'] = results['lower_window'].astype('int')
results['upper_window'] = results['upper_window'].astype('int')
results_m = results[results['mode'] == 'multiplicative']

In [None]:
# Interactive plot with results of parameters tuning
fig = px.scatter_3d(results_m, x='lower_window', y='upper_window', z='err_h',
                     color='prior_scale', color_discrete_sequence= px.colors.sequential.Plasma_r, opacity=1,
                    title='Interactive plot with results of parameters tuning for multiplicative mode')
fig.update(layout=dict(title=dict(x=0.5)))

In [None]:
#display(results_a.nsmallest(5, 'err_h'))
display(results_m.nsmallest(5, 'err_h'))

In [None]:
# The smallest error:
best_result = results.nsmallest(1, 'err_h').reset_index(drop=True)
lower_window_opt = best_result.lower_window[0]
upper_window_opt = best_result.upper_window[0]
prior_scale_opt = best_result['prior_scale'][0]
mode_opt = best_result['mode'][0]

In [None]:
print(f"Thus, for {country_main} the optimal parameters of Prophet model that gave an error = {best_result['err_h'][0]} are:")
print("* lower_window =", lower_window_opt)
print("* upper_window =", upper_window_opt)
print("* prior_scale =", prior_scale_opt)
print("* mode_opt =", mode_opt)

In [None]:
holidays_df.loc[holidays_df['type'].isin(holidays_adaptive), 'lower_window'] = lower_window_opt
holidays_df.loc[holidays_df['type'].isin(holidays_adaptive), 'upper_window'] = upper_window_opt
holidays_df.loc[holidays_df['type'].isin(holidays_adaptive), 'prior_scale'] = prior_scale_opt
holidays_df.loc[holidays_df['type'] == 'lab', 'upper_window'] = upper_window_opt

In [None]:
holidays_df

In [None]:
# The smallest error:
display(best_result)

## 5.2. Stage 2 - Tuning seasonality parameters<a class="anchor" id="5.2"></a>

[Back to Table of Contents](#0.1)

## 5.2.1. Model training, forecasting and evaluation<a class="anchor" id="5.2.1"></a>

[Back to Table of Contents](#0.1)
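
In this stage the seasonality settings are not tuned independently. Each custom seasonality takes its `prior_scale` from the candidate `changepoint_prior_scale` divided by a regularization coefficient, the period of the `several_days` component shrinks by 7 days for every Fourier order above 3, and a component is skipped when its Fourier order is 0. The sketch below only reproduces that arithmetic. The function name `seasonality_settings` is hypothetical, and every numeric value is a placeholder, since the real coefficients and `several_days_period` are defined earlier in the notebook.

```python
def seasonality_settings(changepoint_prior_scale, weekly_fourier_order,
                         several_days_fourier_order, weekly_season_reg_coef,
                         several_days_season_reg_coef, several_days_period):
    """Return the keyword arguments of each custom seasonality that would be added."""
    settings = []
    if weekly_fourier_order > 0:
        settings.append({
            "name": "weekly",
            "period": 7,
            "fourier_order": weekly_fourier_order,
            "prior_scale": changepoint_prior_scale / weekly_season_reg_coef,
        })
    if several_days_fourier_order > 0:
        settings.append({
            "name": "several_days",
            # The period is reduced by 7 days per Fourier order above 3.
            "period": several_days_period - (several_days_fourier_order - 3) * 7,
            "fourier_order": several_days_fourier_order,
            "prior_scale": changepoint_prior_scale / several_days_season_reg_coef,
        })
    return settings

# Placeholder values, chosen only to make the arithmetic easy to follow.
example = seasonality_settings(
    changepoint_prior_scale=0.5, weekly_fourier_order=4, several_days_fourier_order=5,
    weekly_season_reg_coef=2, several_days_season_reg_coef=4, several_days_period=60)
for item in example:
    print(item)
```

Tying the seasonality priors to `changepoint_prior_scale` keeps the grid at three tuned values per combination while still changing the regularization of both seasonal components.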

In [None]:
def make_forecasts_stage2(country_df, holidays_df, days_to_forecast, days_to_forecast_for_evalution, first_forecasted_date,
                          mode_main='multiplicative'):
    
    def model_training_forecasting(df, forecast_days, holidays_df=None, mode_main='multiplicative', 
                                  weekly_fourier_order=10, several_days_fourier_order=10,
                                  changepoint_prior_scale = changepoint_prior_scale_initial_level, mode_seasonality = 'additive'):
        # Prophet model training and forecasting
        
        model = Prophet(daily_seasonality=False, weekly_seasonality=False, yearly_seasonality=False, interval_width=0.9,
                        holidays=holidays_df, changepoint_range=1, changepoint_prior_scale = changepoint_prior_scale,
                        seasonality_mode = mode_main)
        if weekly_fourier_order > 0:
            model.add_seasonality(name='weekly', period=7, fourier_order=weekly_fourier_order, mode = mode_seasonality, 
                                  prior_scale = changepoint_prior_scale/weekly_season_reg_coef)
        if several_days_fourier_order > 0:
            model.add_seasonality(name='several_days', period=several_days_period-(several_days_fourier_order-3)*7,
                                  fourier_order=several_days_fourier_order, mode = mode_seasonality, 
                                  prior_scale = changepoint_prior_scale/several_days_season_reg_coef)
        model.add_seasonality(name='4 days', period=several_days_short_period, fourier_order=several_days_short_days_fourier_order, 
                              mode = 'multiplicative', prior_scale = several_days_short_days_season_reg_coef)
        #model.add_seasonality(name='2 weeks', period=14, fourier_order=1, mode = 'multiplicative', prior_scale = 0.15)
        model.fit(df)
        future = model.make_future_dataframe(periods=forecast_days)
        forecast = model.predict(future)
        forecast.loc[forecast['yhat'] < 0, 'yhat'] = 0  # replace negative forecasts with zero
        return model, forecast

    
    cols_w = ['ds', 'trend', 'yhat', 'yhat_lower', 'yhat_upper', 'trend_lower', 'trend_upper', 'additive_terms', 'additive_terms_lower', 'additive_terms_upper',
              'multiplicative_terms','multiplicative_terms_lower', 'multiplicative_terms_upper']
    cols_h = ['ds', 'trend', 'yhat', 'yhat_lower', 'yhat_upper', 'trend_lower', 'trend_upper', 'additive_terms', 'additive_terms_lower', 'additive_terms_upper',
              'holidays', 'holidays_lower', 'holidays_upper', 'multiplicative_terms','multiplicative_terms_lower', 'multiplicative_terms_upper']
    #mode_seasonality_list = ['additive', 'multiplicative']
    mode_seasonality_list = ['multiplicative'] # take only this mode
    relative_errors_holidays = []
    counter = 0
    results = pd.DataFrame(columns=['Conf_real', 'Conf_pred', 'Conf_pred_h', 'mode_s', 'err', 'err_h', 'weekly_fn', 'several_days_fn', 'ch_p_s_fn', 'how_less, %'])
    
    country_dfs = []
    # Data preparation for forecast with Prophet
    country_df['ds'] = pd.to_datetime(country_df['ds'])

    # Set training and validation datasets
    country_df_future = country_df.copy()
    #country_df_val = country_df[(country_df['ds'] >= pd.to_datetime(first_forecasted_date))].copy()
    country_df = country_df[(country_df['ds'] < pd.to_datetime(first_forecasted_date))]

    n = 64 # number of combinations of the parameters weekly_fourier_order / several_days_fourier_order / changepoint_prior_scale (4 x 4 x 4)
    relative_error_min = 100
    for k in range(len(mode_seasonality_list)):
        # 'additive' and 'multiplicative' mode tuning
        # Without holidays
        # Model training and forecasting without holidays
        model, forecast = model_training_forecasting(country_df, days_to_forecast_for_evalution, mode_main=mode_main,
                                                     mode_seasonality = mode_seasonality_list[k])
        #fig = model.plot_components(forecast)

        # Evaluate the forecast against the validation set and calculate the relative error
        forecast_df = forecast[['ds', 'yhat']].copy()
        relative_error = eval_error(forecast_df, 'without holidays')
        #mode_seasonality_w_best = mode_seasonality_list[1] if relative_error < relative_error_min else mode_seasonality_list[0]
        mode_seasonality_w_best = mode_seasonality_list[0]

        # With holidays
        # Model training with tuning of the Fourier orders and changepoint_prior_scale, and forecasting
        for i in range(n):
            parameters_iter = convert10_base4(i).zfill(3)
            weekly_fourier_order_i = weekly_fourier_order_list[int(parameters_iter[0])]
            several_days_fourier_order_i = several_days_fourier_order_list[int(parameters_iter[1])]
            changepoint_prior_scale_i = changepoint_prior_scale_list[int(parameters_iter[2])]
            model_holidays, forecast_holidays = model_training_forecasting(country_df, days_to_forecast_for_evalution, 
                                                                           holidays_df, mode_main=mode_main,
                                                                           weekly_fourier_order = weekly_fourier_order_i, 
                                                                           several_days_fourier_order = several_days_fourier_order_i,
                                                                           changepoint_prior_scale = changepoint_prior_scale_i,
                                                                           mode_seasonality = mode_seasonality_list[k])
            
            # Evaluate the forecast against the validation set and calculate the relative error
            forecast_holidays_df = forecast_holidays[['ds', 'yhat']].copy()
            relative_error_holidays = eval_error(forecast_holidays_df, 'with holidays impact')

            # Save results
            if (k == 0) and (i == 0):
                relative_error_holidays_min = relative_error_holidays
                forecast_holidays_df_best = forecast_holidays[cols_h]
                model_holidays_best = model_holidays
                weekly_fourier_order_best = weekly_fourier_order_i
                several_days_fourier_order_best = several_days_fourier_order_i
                changepoint_prior_scale_best = changepoint_prior_scale_i
                mode_seasonality_best = mode_seasonality_list[k]

            elif (relative_error_holidays < relative_error_holidays_min):
                relative_error_holidays_min = relative_error_holidays
                forecast_holidays_df_best = forecast_holidays[cols_h]
                model_holidays_best = model_holidays
                weekly_fourier_order_best = weekly_fourier_order_i
                several_days_fourier_order_best = several_days_fourier_order_i
                changepoint_prior_scale_best = changepoint_prior_scale_i
                mode_seasonality_best = mode_seasonality_list[k]

            # Save results to dataframe with result for the last date
            confirmed_real_last = country_df.tail(1)['y'].values[0].astype('int')
            results.loc[i+n*k,'Conf_real'] = confirmed_real_last if confirmed_real_last > 0 else 0
            confirmed_pred_last = int(round(forecast_df.tail(1)['yhat'].values[0]))
            results.loc[i+n*k,'Conf_pred'] = confirmed_pred_last if confirmed_pred_last > 0 else 0
            confirmed_pred_holidays_last = int(round(forecast_holidays_df_best.tail(1)['yhat'].values[0],0))
            results.loc[i+n*k,'Conf_pred_h'] = confirmed_pred_holidays_last if confirmed_pred_holidays_last > 0 else 0
            results.loc[i+n*k,'mode_s'] = mode_seasonality_list[k]
            results.loc[i+n*k,'err'] = relative_error
            results.loc[i+n*k,'err_h'] = relative_error_holidays
            results.loc[i+n*k,'weekly_fn'] = weekly_fourier_order_i
            results.loc[i+n*k,'several_days_fn'] = several_days_fourier_order_i
            results.loc[i+n*k,'ch_p_s_fn'] = changepoint_prior_scale_i
            results.loc[i+n*k,'how_less, %'] = round((relative_error-relative_error_holidays)*100/relative_error,1)

            print('i =',i+n*k,' from',len(mode_seasonality_list)*n-1,':  weekly_fourier_order =', weekly_fourier_order_i, 'several_days_fourier_order =', several_days_fourier_order_i,
                  'changepoint_prior_scale =', changepoint_prior_scale_i)
            print('relative_error_holidays =',relative_error_holidays, 'relative_error_holidays_min =',relative_error_holidays_min, '\n')

        # Results visualization
        print('Seasonality mode is', mode_seasonality_list[k])
        print('The best error of the model with holidays is', relative_error_holidays_min,
              'weekly_fourier_order =', weekly_fourier_order_best, 'several_days_fourier_order =', several_days_fourier_order_best,
              'changepoint_prior_scale =', changepoint_prior_scale_best)
        print('The error of the model without holidays is', relative_error, '\n')

    # Save results to dataframe with all dates
    forecast_holidays_df_best['country'] = country_main
    forecast_holidays_df_best.rename(columns={'yhat':'confirmed'}, inplace=True)
    forecast_holidays_dfs = forecast_holidays_df_best.tail(days_to_forecast_for_evalution)

    # Forecasting the future
    if relative_error < relative_error_holidays_min:
        # The forecast without taking into account the holidays is the best
        model_future_best, forecast_future_best = model_training_forecasting(country_df, days_to_forecast_for_evalution, mode_main=mode_main,
                                                                             mode_seasonality = mode_seasonality_w_best)
        forecast_plot = model_future_best.plot(forecast_future_best, ylabel='Confirmed in '+ country_main + ' (forecasting without holidays) - ' + mode_seasonality_w_best)
        cols = cols_w
        print('The best model is model without holidays')
    else:
        # The forecast taking into account the holidays is the best
        print('The best model is model with holidays')
        model_future_best, forecast_future_best = model_training_forecasting(country_df, days_to_forecast_for_evalution, 
                                                                             holidays_df, mode_main=mode_main,
                                                                             weekly_fourier_order = weekly_fourier_order_best, 
                                                                             several_days_fourier_order = several_days_fourier_order_best,
                                                                             changepoint_prior_scale = changepoint_prior_scale_best,
                                                                             mode_seasonality = mode_seasonality_best)
        forecast_plot = model_future_best.plot(forecast_future_best, ylabel='Confirmed in '+ country_main + ' (forecasting with holidays) - ' + mode_seasonality_best)
        cols = cols_h
    # Save forecasting results 
    forecast_future_df_best = forecast_future_best[cols]
    forecast_future_df_best['country'] = country_main
    forecast_future_df_best.rename(columns={'yhat':'confirmed'}, inplace=True)    
    forecast_future_dfs = forecast_future_df_best.tail(days_to_forecast)
    fig = model_future_best.plot_components(forecast_future_best)
    return forecast_future_df_best, forecast_holidays_dfs, relative_errors_holidays, forecast_future_dfs, results

In [None]:
%%time
forecast_future_df_best, forecast_holidays_dfs, relative_errors_holidays, forecast_future_dfs, results = make_forecasts_stage2(df2, holidays_df, days_to_forecast, days_to_forecast_for_evalution, first_forecasted_date, mode_main=mode_opt)

In [None]:
results.to_csv('results.csv', index=False)

## 5.2.2. Results visualization<a class="anchor" id="5.2.2"></a>

[Back to Table of Contents](#0.1)

In [None]:
results

In [None]:
# Visualization of results
print(f'3D plot of Prophet model parameters and the COVID-19 forecasting error over {days_to_forecast_for_evalution} days')

In [None]:
# Determination of the best parameters
results['err_h'] = results['err_h'].astype('float')
results['weekly_fn'] = results['weekly_fn'].astype('int')
results['several_days_fn'] = results['several_days_fn'].astype('int')
results_m = results[results['mode_s'] == 'multiplicative']

In [None]:
# Interactive plot with results of parameters tuning - multiplicative
fig = px.scatter_3d(results_m, x='weekly_fn', y='several_days_fn', z='err_h',
                    color='ch_p_s_fn', color_discrete_sequence= px.colors.sequential.Plasma_r, opacity=1,
                    title='Interactive plot with results of parameters tuning for multiplicative mode')
fig.update(layout=dict(title=dict(x=0.5)))

In [None]:
#display(results_a.nsmallest(5, 'err_h'))
display(results_m.nsmallest(5, 'err_h'))

In [None]:
# The smallest error:
best_result2 = results.nsmallest(1, 'err_h').reset_index(drop=True)
weekly_fourier_order_opt = best_result2.weekly_fn[0]
several_days_fourier_order_opt = best_result2.several_days_fn[0]
mode_seasonality_opt = mode_seasonality_weekly_opt = mode_seasonality_several_days_opt = best_result2['mode_s'][0]
changepoint_prior_scale_opt = best_result2['ch_p_s_fn'][0]
weekly_seasonality_prior_scale_opt = changepoint_prior_scale_opt/weekly_season_reg_coef
several_days_seasonality_prior_scale_opt = changepoint_prior_scale_opt/several_days_season_reg_coef

## 5.3. Results of all tuning<a class="anchor" id="5.3"></a>

[Back to Table of Contents](#0.1)

In [None]:
# The smallest error:
display(best_result2)

In [None]:
best_result_all = round(best_result2.err_h[0], 2)
print(f"Thus, for {country_main} the optimal 11 parameters of Prophet model that gave an error = {best_result_all}% are:")
print("* lower_window =", lower_window_opt)
print("* upper_window =", upper_window_opt)
print("* prior_scale =", prior_scale_opt)
print("* changepoint_prior_scale =", changepoint_prior_scale_opt)
print("* mode_opt =", mode_opt)
print("* weekly_fourier_order =", weekly_fourier_order_opt)
print("* mode_seasonality_weekly =", mode_seasonality_weekly_opt)
print("* weekly_seasonality_prior_scale =", weekly_seasonality_prior_scale_opt)
print("* several_days_fourier_order =", several_days_fourier_order_opt)
print("* mode_seasonality_several_days =", mode_seasonality_several_days_opt)
print("* several_days_seasonality_prior_scale =", several_days_seasonality_prior_scale_opt)

## 6. Prediction <a class="anchor" id="6"></a>

[Back to Table of Contents](#0.1)

In [None]:
def model_training_forecasting(df, forecast_days, holidays_df=None, mode_main='multiplicative', 
                               weekly_fourier_order=10, several_days_fourier_order=10, 
                               changepoint_prior_scale = changepoint_prior_scale_initial_level, mode_seasonality = 'additive'):
    # Optimal Prophet model training and forecasting

    model = Prophet(daily_seasonality=False, weekly_seasonality=False, yearly_seasonality=False, interval_width=0.9,
                    holidays=holidays_df, changepoint_range=1, changepoint_prior_scale = changepoint_prior_scale,
                    seasonality_mode = mode_main)
    if weekly_fourier_order > 0:
        model.add_seasonality(name='weekly', period=7, fourier_order=weekly_fourier_order, mode = mode_seasonality, 
                              prior_scale = changepoint_prior_scale/weekly_season_reg_coef)
    if several_days_fourier_order > 0:
        model.add_seasonality(name='several_days', period=several_days_period-(several_days_fourier_order-3)*7,
                              fourier_order=several_days_fourier_order, mode = mode_seasonality, 
                              prior_scale = changepoint_prior_scale/several_days_season_reg_coef)
    model.add_seasonality(name='4 days', period=several_days_short_period, fourier_order=several_days_short_days_fourier_order, 
                          mode = 'multiplicative', prior_scale = several_days_short_days_season_reg_coef)
    #model.add_seasonality(name='2 weeks', period=14, fourier_order=1, mode = 'multiplicative', prior_scale = 0.15)
    model.fit(df)
    future = model.make_future_dataframe(periods=forecast_days)
    forecast = model.predict(future)
    
    # Make values integer, and replace negative values with zero
    feature_all = ['yhat_lower', 'yhat', 'yhat_upper']
    forecast[feature_all] = forecast[feature_all].round().astype('int')
    for feature in feature_all:
        forecast.loc[forecast[feature] < 0, feature] = 0
        
    return model, forecast

In [None]:
model_future_opt, forecast_future_opt = model_training_forecasting(df2, days_to_forecast, holidays_df, mode_main=mode_opt,
                                                                   weekly_fourier_order = weekly_fourier_order_opt, 
                                                                   several_days_fourier_order = several_days_fourier_order_opt,
                                                                   changepoint_prior_scale = changepoint_prior_scale_opt,
                                                                   mode_seasonality = mode_seasonality_opt)

In [None]:
fig_opt = model_future_opt.plot(forecast_future_opt)

In [None]:
fig_opt_components = model_future_opt.plot_components(forecast_future_opt)

In [None]:
forecast_future_opt_future = forecast_future_opt[['ds', 'yhat_lower', 'yhat', 'yhat_upper']]
forecast_future_opt_future_days = forecast_future_opt_future.tail(days_to_forecast)
forecast_future_opt_future_days

### Calculation of forecasting errors
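
The cells in this subsection report R², MAE and RMSE through scikit-learn. For reference, the same three metrics can be written out directly. This is a dependency-free sketch with invented numbers, and the helper name `regression_metrics` is hypothetical.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for two equally long sequences."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = sqrt(ss_res / n)
    r2 = 1 - ss_res / ss_tot
    return r2, mae, rmse

# Illustrative values only, not taken from the dataset.
y_true = [10, 20, 30, 40]
y_pred = [12, 18, 33, 37]
r2, mae, rmse = regression_metrics(y_true, y_pred)
print(r2, mae, rmse)
```

MAE and RMSE are expressed in cases per day, while R² is unitless and equals 1.0 for a perfect forecast.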

In [None]:
forecast_future_opt_future_len=len(forecast_future_opt_future)
forecast_future_opt_future[len(df2)-days_to_forecast_for_evalution:len(df2)][['ds', 'yhat']]

In [None]:
df2.tail(days_to_forecast_for_evalution)

In [None]:
y_val = forecast_future_opt_future[len(df2)-days_to_forecast_for_evalution:len(df2)]['yhat']
y = df2.tail(days_to_forecast_for_evalution)['y']
print(f"r2_score - {r2_score(y, y_val)}, mean_absolute_error - {mean_absolute_error(y, y_val)}, root_mean_squared_error - {(mean_squared_error(y, y_val))**(.5)}")

In [None]:
export_forecast_to_excel(forecast_future_opt_future_days, 'forecast_future_opt_future_7_days')

In [None]:
forecast_future_opt_future.to_csv('forecast_future_opt_future.csv', index=False)
best_result2.to_csv('best_result2.csv', index=False)
holidays_df.to_csv('holidays_df_all.csv', index=False)

## 7. Visualization <a class="anchor" id="7"></a>


[Back to Table of Contents](#0.1)

In [None]:
df2.head(3)

In [None]:
prev_df_files = []
dates_prev = []
colors = ['r', 'gold', 'magenta', 'darkgreen', 'brown', 'navy', 'yellow']    # except for c='#0072B2', 'gray', 'k'

In [None]:
def prev_df_reading(df_file_list, dates_prev):
    # Take data with previous forecasts
    
    prev_df = []
    for i in range(len(df_file_list)):
        prev_df_i = pd.read_csv(df_file_list[i])
        print(f"Forecast at {dates_prev[i]}:")
        display(prev_df_i.tail(3))
        prev_df.append(prev_df_i)
    
    return prev_df

In [None]:
prev_df = prev_df_reading(prev_df_files, dates_prev)

In [None]:
forecast_future_opt_future.tail(days_to_forecast)

In [None]:
def comparing_plot(df_new, prev_df, dates_prev, df, num, name_plot_start):
    # Draw plots comparing the new forecast with previous forecasts, in English and Ukrainian
        
    if num > 0:
        df_new = df_new[num:]
        for i in range(len(prev_df)):
            prev_df[i] = prev_df[i][num:]
        df = df[num:]
    else:
        for i in range(len(prev_df)):
            prev_df[i] = prev_df[i][123:]
            
    def plot_lang(prev_df, labels_list, name_plot):
        # Drawing plot for comparison of previous forecasts in given language
        
        fig = plt.figure(facecolor='w', figsize=(16,8))

        # New forecast
        t_new = pd.to_datetime(df_new['ds'].tolist())
        plt.plot(t_new, df_new['yhat'], ls='-', c='#0072B2', label = labels_list[0])
        plt.fill_between(t_new, df_new['yhat_lower'], df_new['yhat_upper'], color='#0072B2', alpha=0.2)
        
        for i in range(len(prev_df)):
            # Old forecast
            df_old = prev_df[i]
            t_old = pd.to_datetime(df_old['ds'].tolist())
            plt.plot(t_old, df_old['yhat'], ls='-', c=colors[i], label = labels_list[i+1])
            plt.fill_between(t_old, df_old['yhat_lower'], df_old['yhat_upper'], color=colors[i], alpha=0.2)

        # Observation data
        t = pd.to_datetime(df['ds'].tolist())
        plt.scatter(t, df['y'], c='k', label = labels_list[-1])

        plt.legend(loc='best')
        plt.grid(True, which='major', c='gray', ls='-', lw=1, alpha=0.2)
        fig.tight_layout()
        
    
    # Prepare dates for plots
    date_today = date.today().strftime("%d.%m.%Y")
    forecast_today_en = f"Forecast at {date_today}"
    forecast_today_ua = f"Прогноз від {date_today}"
    labels_list_en = [forecast_today_en]
    labels_list_ua = [forecast_today_ua]
    for dates in dates_prev:
        labels_list_en.append(f"Forecast at {dates}")
        labels_list_ua.append(f"Прогноз від {dates}")
    labels_list_en.append("Official data")
    labels_list_ua.append("Дані РНБО України")
    
    # English version
    plot_lang(prev_df, labels_list_en, name_plot_start + 'forecast_today_all_en')
    
    # Ukrainian version
    plot_lang(prev_df, labels_list_ua, name_plot_start + 'forecast_today_all_ua')

In [None]:
# Comparing for all data
comparing_plot(forecast_future_opt_future, prev_df.copy(), dates_prev, df2, 0, 'All_')

In [None]:
print(f"Thus, for {country_main} the optimal Prophet model has the forecasting errors:")
print(f"- r2_score - {round(r2_score(y, y_val),2)} (the best value - 1.0)")
print(f"- MAE (mean_absolute_error) - {int(round(mean_absolute_error(y, y_val),0))} cases")
print(f"- RMSE (root_mean_squared_error) - {int(round((mean_squared_error(y, y_val))**(.5), 0))} cases")
print(f"- the main error (WAPE - relative error) = {best_result_all} % (the best value - 0.0 %).")

I hope you find this notebook useful and enjoyable.

Your comments and feedback are most welcome.

[Go to Top](#0)