# Final Project - AI Algorithms - Deliverable 2
Team Members
- Somya Sachan (100901887)
- Hisham Afzal (100905717)
- Gurnish Kaur (100901544)
- Shahabuddin Syed (100895169)

## Forecasting COVID-19 Deaths

This Jupyter notebook presents two models for forecasting the number of daily COVID-19 deaths: a machine learning model (Random Forest regression) and a time series model fitted with FB Prophet.

## Overview

The COVID-19 pandemic has had a devastating impact on the world, and understanding its trajectory is critical in managing the outbreak. Accurate forecasts of the number of deaths can help governments and healthcare systems prepare for the future and make informed decisions.

## Methodology

We have developed two models to forecast the number of COVID-19 deaths:

- A machine learning model: a Random Forest regressor trained on calendar and lagged-death features
- A time series model fitted with FB Prophet

## Conclusion

Our forecasting models provide insights into the potential trajectory of the COVID-19 pandemic and can help inform decision-making around managing the outbreak. We hope that our analysis will be useful in predicting the impact of the pandemic and mitigating its effects.


## Machine Learning Model Description

The machine learning model used in this notebook is a Random Forest regressor, an ensemble learning method that combines many decision trees into a single predictive model.

The dataset contains COVID deaths data for various countries. The model is trained on one country at a time, with the following features (column names as used in the code):

- day: the day of the year (1-366)
- week: the ISO week number (1-53)
- season: the season, derived from the month (1=Winter, 2=Spring, 3=Summer, 4=Fall)
- month: the month (1-12)
- lag_1: the number of deaths on the previous day
- lag_7: the number of deaths 7 days ago
- lag_14: the number of deaths 14 days ago
- lag_30: the number of deaths 30 days ago
- weekly_avg: the mean number of deaths over the preceding 7 days
- monthly_avg: the mean number of deaths over the preceding 30 days

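Given these features for one day in a particular country, the model predicts that day's death count. Every lag and average uses only earlier rows, so this amounts to a one-step-ahead forecast. The notebook drops rows with zero reported deaths before building the lags, so each lag counts reported rows rather than strict calendar days.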
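To make the lag and trailing-average features concrete, here is a minimal sketch in plain Python. The notebook itself builds them with pandas `shift` and `rolling`; the helper names `lag` and `trailing_mean` and the death counts below are hypothetical and chosen only for illustration.

```python
def lag(values, k):
    """Shift values down by k rows; the first k rows have no history (None)."""
    return ([None] * k + values)[:len(values)]


def trailing_mean(values, window):
    """Mean of the `window` values that precede each row (the current row is excluded)."""
    out = []
    for i in range(len(values)):
        if i < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[i - window:i]) / window)
    return out


# Hypothetical daily death counts, for illustration only
deaths = [4, 6, 5, 9, 7, 8, 10, 12, 11, 13]

lag_1 = lag(deaths, 1)
lag_7 = lag(deaths, 7)
weekly_avg = trailing_mean(deaths, 7)

# Print each row as (deaths, lag_1, lag_7, weekly_avg)
for row in zip(deaths, lag_1, lag_7, weekly_avg):
    print(row)
```

Rows where any feature is `None` correspond to the rows the notebook removes with `dropna()` before training.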

Random Forest was chosen because it handles a mix of calendar and lag features without feature scaling and can capture non-linear relationships between them. It works by training many decision trees, each on a different bootstrap sample of the training rows, and averaging their predictions, which reduces the overfitting of any single tree and improves generalization performance.
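The resample-and-average idea can be sketched without any library. The example below bags one-split regression trees ("stumps") in plain Python. It is a simplified illustration of bagging, not scikit-learn's `RandomForestRegressor`, and all function names and data values are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree by choosing the threshold with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values in the sample are identical: predict the mean everywhere
        mean = sum(ys) / len(ys)
        return (float('inf'), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    """Return the mean of the side of the split that x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Average the predictions of all stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


# Hypothetical data: low values for small x, high values for large x
xs = list(range(10))
ys = [1, 1, 2, 1, 2, 8, 9, 8, 9, 9]
stumps = fit_bagged_stumps(xs, ys)
print(predict_bagged(stumps, 1), predict_bagged(stumps, 8))
```

Because every stump predicts an average of training targets, the bagged prediction always stays within the range of the training data. The same holds for a Random Forest, which is worth keeping in mind when death counts rise beyond anything seen during training.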


In [17]:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

from prophet import Prophet

import plotly.graph_objs as go

In [18]:
# data load and initial preprocessing
# Load the data
covid_data = pd.read_csv('COVID-19-global-data.csv')
covid_data.columns = covid_data.columns.str.strip().str.lower()

# Convert date column to datetime
covid_data['date_reported'] = pd.to_datetime(covid_data['date_reported'])

# Create new columns for day of year, week, month, and season
covid_data['day'] = covid_data['date_reported'].dt.dayofyear
# Series.dt.week is deprecated and removed in newer pandas releases; use isocalendar() instead
covid_data['week'] = covid_data['date_reported'].dt.isocalendar().week.astype(int)
covid_data['month'] = covid_data['date_reported'].dt.month
# Map months to seasons: 1=Winter (Dec-Feb), 2=Spring, 3=Summer, 4=Fall
covid_data['season'] = (covid_data['month'] % 12 + 3) // 3
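The `season` column in the preprocessing cell is computed as `(month % 12 + 3) // 3`. The short sketch below applies the same arithmetic to every month to show the resulting mapping. The function name `season_from_month` and the `SEASON_NAMES` lookup are hypothetical helpers added for illustration.

```python
def season_from_month(month):
    """Map a month number (1-12) to 1=Winter, 2=Spring, 3=Summer, 4=Fall."""
    # month % 12 sends December to 0, so December, January and February share one bucket
    return (month % 12 + 3) // 3


SEASON_NAMES = {1: 'Winter', 2: 'Spring', 3: 'Summer', 4: 'Fall'}

for month in range(1, 13):
    print(month, SEASON_NAMES[season_from_month(month)])
```

This grouping (December to February as winter) follows the Northern Hemisphere convention, so for Southern Hemisphere countries the labels are offset from the local seasons, although the feature still encodes the time of year.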



In [19]:

def get_forecasts_ml(country):
    """Train a Random Forest on one country's deaths and plot actual vs. predicted values."""
    # Select the rows and columns for the requested country
    columns = ['date_reported', 'day', 'week', 'month', 'season', 'deaths']
    country_data = covid_data.loc[covid_data['country'] == country, columns].copy()
    country_data = country_data.reset_index(drop=True)

    # Keep the dates aside for plotting; they are not used as a feature
    dates = country_data['date_reported']
    country_data = country_data.drop('date_reported', axis=1)

    # Drop days with zero reported deaths. The lags and rolling means below
    # therefore count the remaining rows, not strict calendar days.
    country_data = country_data.loc[country_data['deaths'] != 0].copy()

    # Add lag variables
    country_data['lag_1'] = country_data['deaths'].shift(1)
    country_data['lag_7'] = country_data['deaths'].shift(7)
    country_data['lag_14'] = country_data['deaths'].shift(14)
    country_data['lag_30'] = country_data['deaths'].shift(30)

    # Add weekly and monthly averages, shifted by one row so that they
    # only use values that precede the row being predicted
    country_data['weekly_avg'] = country_data['deaths'].rolling(window=7).mean().shift(1)
    country_data['monthly_avg'] = country_data['deaths'].rolling(window=30).mean().shift(1)

    # Drop the leading rows whose lags or averages are undefined
    country_data = country_data.dropna()

    # Define the independent and dependent variables
    X = country_data.drop('deaths', axis=1)
    y = country_data['deaths']

    # Split chronologically (shuffle=False): the last 20% of rows form the test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

    # Create the regression model
    model = RandomForestRegressor()

    # Fit the model to the training data
    model.fit(X_train, y_train)

    # Make predictions on the test data
    y_pred = model.predict(X_test)

    # Evaluate the model on the test set
    r2 = r2_score(y_test, y_pred)
    # print('R-squared:', r2)

    # Look up the dates by row index instead of adding a date column to the feature frames
    trace_actual = go.Scatter(
        x=dates.loc[X.index],
        y=y,
        name='Actual Deaths',
        mode='lines+markers',
        line=dict(color='red', width=2),
        marker=dict(color='red', size=8)
    )

    trace_predicted = go.Scatter(
        x=dates.loc[X_test.index],
        y=y_pred,
        name='Predicted Deaths',
        mode='lines',
        line=dict(color='blue', width=2)
    )

    # Create layout
    layout = go.Layout(
        title='COVID-19 Deaths Forecast - {}'.format(country),
        xaxis=dict(title='Date'),
        yaxis=dict(title='Number of Deaths')
    )

    # Add both traces to a data list
    fig_data = [trace_actual, trace_predicted]

    # Create the figure object
    fig = go.Figure(data=fig_data, layout=layout)
    fig.update_layout(plot_bgcolor='#F3F3F3', paper_bgcolor='#F3F3F3')
    return fig


In [23]:
# Let's get forecasts for Kuwait

country = 'Kuwait'
get_forecasts_ml(country).show()
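The Random Forest cell holds out the last 20% of rows without shuffling and scores the predictions with `r2_score`. The sketch below shows both steps in plain Python on made-up numbers. The helper names, the sample data, and the use of `math.ceil` to size the test set are assumptions made for illustration, not a description of scikit-learn's internals.

```python
import math


def chronological_split(rows, test_size=0.2):
    """Hold out the last `test_size` fraction of time-ordered rows, without shuffling."""
    n_test = math.ceil(len(rows) * test_size)
    return rows[:len(rows) - n_test], rows[len(rows) - n_test:]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (undefined if `actual` is constant)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical daily deaths in date order
deaths = [3, 5, 4, 6, 8, 7, 9, 10, 12, 11]
train, test = chronological_split(deaths)

# A naive baseline, assumed here for comparison: predict the training mean for every test day
baseline = [sum(train) / len(train)] * len(test)
print(len(train), len(test), r_squared(test, baseline))
```

On a rising series the training-mean baseline scores a negative R-squared, because it does worse than predicting the mean of the test period itself. A negative score on the held-out period is therefore possible for any model and signals that it has not tracked the recent level of the series.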

## FB Prophet Model Description

The FB Prophet model is a time series forecasting model developed by Facebook.

This model uses a decomposable time series model with three main components: trend, seasonality, and holidays.

- The trend component models non-periodic changes in the time series
- The seasonality component models periodic changes
- The holidays component models the impact of holidays and special events on the time series
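Written as an equation, this additive decomposition is commonly given in the following form. The symbol names are the conventional ones and are introduced here only for illustration:

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$

Here $y(t)$ is the observed value at time $t$, $g(t)$ is the trend, $s(t)$ is the seasonality, $h(t)$ is the holiday effect, and $\varepsilon_t$ is the error term that the three components do not explain.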

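We fit this model with a linear trend on the first 80% of each country's reported COVID deaths, compare its predictions against the remaining 20%, and also project 100 days past the end of the training data. No holiday calendar is supplied in this notebook, so the holidays component is not used here.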
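As a rough illustration of how additive components combine into a forecast, the sketch below adds a linear trend to a weekly effect and clips the result at zero, as the Prophet cell in this notebook does with `yhat`. The slope, intercept, and weekly effects are made-up values, and this is not how Prophet estimates its components.

```python
# Hypothetical weekly effects for the seven days of the week (illustrative values that sum to zero)
WEEKLY_EFFECT = [2.0, 1.0, 0.0, -1.0, -2.0, -1.0, 1.0]


def linear_trend(t, slope, intercept):
    """Trend component: a straight line in the day index t."""
    return intercept + slope * t


def additive_forecast(t, slope, intercept):
    """Sum of trend and weekly effect, clipped at zero because deaths cannot be negative."""
    # Day index 0 is treated as the first day of the week (an assumption of this sketch)
    value = linear_trend(t, slope, intercept) + WEEKLY_EFFECT[t % 7]
    return max(value, 0.0)


rising = [additive_forecast(t, slope=0.5, intercept=10.0) for t in range(14)]
falling = [additive_forecast(t, slope=-1.0, intercept=10.0) for t in range(30)]
print(rising)
print(falling[-5:])
```

With a falling trend the unclipped sum eventually becomes negative, which is why the forecasts are floored at zero before plotting.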


In [37]:

def get_forecasts_fbprophet(country):
    """Fit Prophet on one country's deaths and plot test-set and future forecasts."""
    # Select the date and deaths columns for the requested country
    country_data = covid_data.loc[covid_data['country'] == country, ['date_reported', 'deaths']].copy()
    country_data = country_data.reset_index(drop=True)

    # Prophet expects the columns to be named 'ds' (date) and 'y' (value)
    prophet_data = country_data.rename(columns={'date_reported': 'ds', 'deaths': 'y'})

    # Split chronologically (shuffle=False): the last 20% of rows form the test set
    train_data, test_data = train_test_split(prophet_data, test_size=0.2, shuffle=False)

    # Run the FB Prophet model with a linear trend
    m = Prophet(growth='linear')
    m.fit(train_data)

    # make_future_dataframe keeps the training dates by default, so this frame
    # covers the fitted history plus 100 days after the end of the training data
    x_future = m.make_future_dataframe(periods=100)

    # Deaths cannot be negative, so clip negative forecasts to zero
    forecast_future = m.predict(x_future)
    forecast_future['yhat'] = forecast_future['yhat'].clip(lower=0)

    forecast = m.predict(test_data[['ds']])
    forecast['yhat'] = forecast['yhat'].clip(lower=0)

    # Align the test rows with the forecast rows by position
    test_data = test_data.reset_index(drop=True)
    test_data['model_forecasts'] = forecast['yhat'].to_numpy()

    trace_actual = go.Scatter(
        x=prophet_data['ds'],
        y=prophet_data['y'],
        name='Actual Deaths',
        mode='lines+markers',
        line=dict(color='red', width=2),
        marker=dict(color='red', size=8)
    )

    trace_predicted = go.Scatter(
        x=test_data['ds'],
        y=test_data['model_forecasts'],
        name='Predicted Deaths',
        mode='lines',
        line=dict(color='blue', width=2)
    )

    # Use a different color and dash style so this trace can be told apart from the test-set predictions
    trace_future = go.Scatter(
        x=x_future['ds'],
        y=forecast_future['yhat'],
        name='Future Forecasts',
        mode='lines',
        line=dict(color='green', width=2, dash='dash')
    )

    # Create layout
    layout = go.Layout(
        title='COVID-19 Deaths Forecast - {}'.format(country),
        xaxis=dict(title='Date'),
        yaxis=dict(title='Number of Deaths')
    )

    # Add all three traces to a data list
    fig_data = [trace_actual, trace_predicted, trace_future]

    # Create the figure object
    fig = go.Figure(data=fig_data, layout=layout)
    fig.update_layout(plot_bgcolor='#F3F3F3', paper_bgcolor='#F3F3F3')
    return fig

In [38]:
# Let's get forecasts for India

country = 'India'
get_forecasts_fbprophet(country).show()

INFO:prophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
