
<div class="alert alert-block alert-success">
    <h1 align="center">Covid-19 Cases</h1>
    
</div>

### Introduction
**Coronavirus disease 2019 (COVID-19)** is a contagious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first known case was identified in *Wuhan, China*, in December 2019. The disease has since spread worldwide, leading to an ongoing pandemic.

### Symptoms
Symptoms of COVID-19 are variable, but often include *fever, cough, headache, fatigue, breathing difficulties, and loss of smell and taste*. Symptoms may begin *one to fourteen days* after exposure to the virus. At least a third of people who are infected do not develop noticeable symptoms. Of those people who develop symptoms noticeable enough to be classed as patients, most (81%) develop mild to moderate symptoms (up to mild pneumonia), while 14% develop severe symptoms (dyspnea, hypoxia, or more than 50% lung involvement on imaging), and 5% suffer critical symptoms (respiratory failure, shock, or multiorgan dysfunction). Older people are at a higher risk of developing severe symptoms. Some people continue to experience a range of effects (long COVID) for months after recovery, and damage to organs has been observed. Multi-year studies are underway to further investigate the long-term effects of the disease.

### Transmission
COVID-19 transmits when people breathe in air contaminated by droplets and small airborne particles containing the virus. The risk of breathing these in is highest when people are in close proximity, but they can be inhaled over longer distances, particularly indoors. Transmission can also occur if splashed or sprayed with contaminated fluids in the eyes, nose or mouth, and, rarely, via contaminated surfaces. People remain contagious for up to 20 days, and can spread the virus even if they do not develop symptoms.

Several testing methods have been developed to diagnose the disease. The standard diagnostic method is by detection of the virus' nucleic acid by real-time reverse transcription polymerase chain reaction (rRT-PCR), transcription-mediated amplification (TMA), or by reverse transcription loop-mediated isothermal amplification (RT-LAMP) from a nasopharyngeal swab.

Several COVID-19 vaccines have been approved and distributed in various countries, which have initiated mass vaccination campaigns. Other preventive measures include physical or social distancing, quarantining, ventilation of indoor spaces, covering coughs and sneezes, hand washing, and keeping unwashed hands away from the face. The use of face masks or coverings has been recommended in public settings to minimize the risk of transmissions. While work is underway to develop drugs that inhibit the virus, the primary treatment is symptomatic. Management involves the treatment of symptoms, supportive care, isolation, and experimental measures.

## Import Libraries

In [2]:
!pip install comet_ml

Collecting comet_ml
  Using cached comet_ml-3.19.0-py2.py3-none-any.whl (299 kB)
Collecting requests-toolbelt>=0.8.0
  Using cached requests_toolbelt-0.9.1-py2.py3-none-any.whl (54 kB)
Collecting websocket-client>=0.55.0
  Using cached websocket_client-1.2.1-py2.py3-none-any.whl (52 kB)
Collecting dulwich>=0.20.6
  Using cached dulwich-0.20.25-cp38-cp38-win_amd64.whl (488 kB)
Collecting everett[ini]>=1.0.1
  Using cached everett-2.0.1-py2.py3-none-any.whl (33 kB)
Collecting wurlitzer>=1.0.2
  Using cached wurlitzer-3.0.2-py3-none-any.whl (7.3 kB)
Collecting semantic-version>=2.8.0
  Using cached semantic_version-2.8.5-py2.py3-none-any.whl (15 kB)
Collecting nvidia-ml-py3>=7.352.0
  Using cached nvidia_ml_py3-7.352.0-py3-none-any.whl
Collecting configobj
  Using cached configobj-5.0.6-py3-none-any.whl
Installing collected packages: everett, configobj, wurlitzer, websocket-client, semantic-version, requests-toolbelt, nvidia-ml-py3, dulwich, comet-ml
Successfully installed comet-ml-3.19.0

In [3]:
# Import comet_ml at the top of the notebook
from comet_ml import Experiment

# Create an experiment with api key
experiment = Experiment(
    api_key="Z0oOb8S6C70IJ7b2FUcs31MnP",
    project_name='covid_19_cases',
    workspace='parvezsohail'
)

COMET INFO: Experiment is live on comet.ml https://www.comet.ml/parvezsohail/covid-19-cases/439c5da3c78b4caebf01c8cde54b795f



In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.express as px
import seaborn as sns
from sklearn.metrics import r2_score





ModuleNotFoundError: No module named 'plotly.express'

## Import Data

In [None]:
df0 = pd.read_csv("Dataset/data/CONVENIENT_global_confirmed_cases.csv")
df1 = pd.read_csv("Dataset/data/CONVENIENT_global_deaths.csv")

## Data Preparation

In [None]:
countries = df0.iloc[:,1:].columns
countries


In [None]:
world = pd.DataFrame({"Country":[],"Cases":[]})
world['Country'] = df0.iloc[:,1:].columns
cases = []
for i in world['Country']:
    cases.append(pd.to_numeric(df0[i][1:]).sum())
world['Cases'] = cases

country_list = list(world['Country'].values)
idx = 0
for i in country_list:
    sayac = 0
    for j in i:
        if j==".":
            i = i[:sayac]
            country_list[idx]=i
        elif j=="(":
            i = i[:sayac-1]
            country_list[idx]=i
        else:
            sayac+=1
    idx += 1
world['Country'] = country_list
world = world.groupby('Country')['Cases'].sum().reset_index()
world.head()

In [None]:
continent = pd.read_csv("Dataset/continents/continents2.csv")
continent["name"] = continent["name"].str.upper()
continent.head()

## Data Visualization

In [None]:
world.head()

In [None]:
world['Cases Range'] = pd.cut(world['Cases'],[10000,50000,200000,800000,1500000,15000000],labels=["U50K","50kto200k","200kto800k","800kto1.5M","1.5M+"])


In [None]:
alpha = []
for i in world['Country'].str.upper().values:
    if i == "BRUNEI":
        i = "BRUNEI DARUSSALAM"
    elif i == "US":
        i = "UNITED STATES"
    if len(continent[continent["name"] == i]["alpha-3"].values)==0:
        alpha.append(np.nan)
    else:
        alpha.append(continent[continent["name"]==i]["alpha-3"].values[0])
world["Alpha3"]=alpha

In [None]:
world.head()

In [None]:
world['Country'] = world['Country'].str.upper()
world.head()

In [None]:
world.isna().sum()

In [None]:
fig = px.choropleth(world.dropna(),
                   locations='Alpha3',
                   color='Cases Range',
                   projection='mercator',
                   color_discrete_sequence=['khaki','yellow','lightblue','red','orange'])
fig.update_geos(fitbounds='locations',visible=False)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
plt.show()

In [None]:
# Daily cases all around the world
count = []
for i in range(1,len(df0)):
    count.append(sum(pd.to_numeric(df0.iloc[i,1:].values)))

df = pd.DataFrame()
df['Date'] = df0['Country/Region'][1:]
df['Cases'] = count
df = df.set_index('Date')

# Daily death cases all around the world
count = []
for i in range(1,len(df1)):
    count.append(sum(pd.to_numeric(df1.iloc[i,1:].values)))

df['Deaths'] = count

df.head()

In [None]:
# Display values without decimals (the full option name avoids ambiguous-key errors in newer pandas)
pd.set_option('display.precision',0)
df.head()

In [None]:
# Daily covid19 cases

plt.ticklabel_format(style='plain')
df.Cases.plot(title='Daily Covid19 Cases in World',marker=".",figsize=(10,8),label="Daily cases")
df.Cases.rolling(window=5).mean().plot(figsize=(25,5),label='MovingAverage(5)')
plt.ylabel("Cases",fontsize=15)
plt.xlabel("Date",fontsize=15)
plt.legend()
plt.show();

In [None]:
fig = px.line(df, y='Cases',title='Daily Covid 19 Cases in World')
fig.show();

In [None]:
# Daily covid19 Death Cases
df.Deaths.plot(title='Daily Covid19 Deaths in World', marker=".",label="Daily Deaths")
df.Deaths.rolling(window=5).mean().plot(figsize=(25,5),label='MovingAverage(5)')
plt.ylabel("Deaths",fontsize=15)
plt.xlabel("Date",fontsize=15)
plt.xticks(fontstyle='oblique',fontsize=10)
plt.legend()
plt.show();

In [None]:
fig = px.line(df, y='Deaths',title='Daily Covid 19 Death Cases in World')
fig.show();

In [None]:
# parse dates from 'df' dataframe
set_date = pd.to_datetime(df.index)
df.index = set_date

In [None]:
df.head()

In [None]:
# Get  data array
timesteps = df.index.to_numpy()
cases = df['Cases'].to_numpy()
deaths = df['Deaths'].to_numpy()

timesteps[:10],cases[:10],deaths[:10]

## Split dataset into Train and Test

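Time series data should be split chronologically rather than randomly: the model trains on the earlier part of the series and is tested on the later part, so it never sees the future during training.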


In [None]:
# Create train and test splits the right way for time series
split_size = int(0.8 * len(df))

# Create train data splits (everything before the split)
X_train, y_train = timesteps[:split_size], cases[:split_size]

# Create test data splits (everything after the split)
X_test, y_test = timesteps[split_size:], cases[split_size:]

len(X_train),len(X_test), len(y_train), len(y_test)

In [None]:
# Plot correctly made splits
plt.figure(figsize=(10, 7))
plt.ticklabel_format(style='plain')
plt.scatter(X_train, y_train, s=5, label="Train data")
plt.scatter(X_test, y_test, s=5, label="Test data")
plt.xlabel("Date")
plt.ylabel("Cases")
plt.legend(fontsize=14)
plt.show();

## Baseline Model : Naive Forecast

As usual, let's start with a baseline

One of the most common baseline models for time series forecasting, the naive model (also called the `naive forecast`), requires no training at all.

That's because all the naive model does is use the previous timestep value to predict the next timestep value

The formula looks like this:

$$\hat{y}_{t} = y_{t-1}$$
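
As a small illustration of the naive rule (each prediction is simply the previous observed value), here is a minimal NumPy sketch. The `series` values are hypothetical and are not taken from the dataset.

```python
import numpy as np

# Hypothetical daily counts, used only for illustration
series = np.array([100, 120, 90, 150, 130])

# Naive forecast: the prediction for step t is the observed value at step t-1
naive_forecast = series[:-1]   # predictions for steps 1..4
actuals = series[1:]           # values the predictions are compared against

# Mean absolute error of the naive forecast
mae = np.mean(np.abs(actuals - naive_forecast))
print(naive_forecast, actuals, mae)
```

The forecast is one element shorter than the series because the first value has no predecessor to copy.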


In [None]:
# Create a naive forecast
naive_forecast = y_test[:-1] # Every value except the last: naive_forecast[i] is the prediction for y_test[i+1]
naive_forecast[:10], naive_forecast[-10:]

In [None]:
# Create a function to plot time series data
def plot_time_series(timesteps, values, format='.', start=0, end=None, label=None):
    """
    Plots a timesteps (a series of points in time) against values (a series of values across timesteps).

    Parameters
    ---------
    timesteps : array of timesteps
    values : array of values across time
    format : style of plot, default "."
    start : where to start the plot (setting a value will index from start of timesteps & values)
    end : where to end the plot (setting a value will index from end of timesteps & values)
    label : label to show on plot of values
    """
    # Plot the series

    plt.plot(timesteps[start:end], values[start:end], format, label=label)
    plt.xlabel("Time")
    plt.ylabel("Cases")
    if label:
        plt.legend(fontsize=14) # make label bigger
    plt.grid(True)

In [None]:
# Plot naive forecast
plt.figure(figsize=(10, 7))
plt.ticklabel_format(style='plain')
plot_time_series(timesteps=X_train, values=y_train, label="Train data")
plot_time_series(timesteps=X_test, values=y_test, label="Test data")
plot_time_series(timesteps=X_test[1:], values=naive_forecast, format="-", label="Naive forecast");

In [None]:
import tensorflow as tf


In [None]:
# MASE implemented
def mean_absolute_scaled_error(y_true, y_pred):
    mae = tf.reduce_mean(tf.abs(y_true - y_pred))

    mae_naive_no_season = tf.reduce_mean(tf.abs(y_true[1:] - y_true[:-1]))

    return mae/mae_naive_no_season
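
To make the scaling in MASE concrete, the sketch below repeats the same calculation in plain NumPy on a tiny hypothetical series. The function name `mase_numpy` and the sample values are illustrative assumptions. A score below 1 means the forecast has a lower MAE than a one-step naive forecast on the same data.

```python
import numpy as np

def mase_numpy(y_true, y_pred):
    """MASE without seasonality: forecast MAE divided by naive one-step MAE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    # MAE of predicting each value with the value just before it
    mae_naive = np.mean(np.abs(y_true[1:] - y_true[:-1]))
    return mae / mae_naive

# Hypothetical values, for illustration only
y_true = [10, 12, 15, 14]
y_pred = [11, 12, 14, 16]
score = mase_numpy(y_true, y_pred)
print(score)
```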

In [None]:
# evaluate metrics function
def evaluate_preds(y_true, y_pred):
    # Make sure float64 (for metric calculations)
    y_pred = tf.cast(y_pred, dtype=tf.float64)

    # Calculate various metrics
    mae = tf.keras.metrics.mean_absolute_error(y_true, y_pred)
    mse = tf.keras.metrics.mean_squared_error(y_true, y_pred) # puts an emphasis on outliers (all errors get squared)
    rmse = tf.sqrt(mse)
    mape = tf.keras.metrics.mean_absolute_percentage_error(y_true, y_pred)
    mase = mean_absolute_scaled_error(y_true, y_pred)

    return {"mae": mae.numpy(),
            "mse": mse.numpy(),
            "rmse": rmse.numpy(),
            "mape": mape.numpy(),
            "mase": mase.numpy()}

In [None]:
naive_results = evaluate_preds(y_true=y_test[1:],
                               y_pred=naive_forecast)
naive_results

### Format Data : Windowing Dataset

Windowing is a method to turn a time series dataset into a **supervised learning problem**.

In other words, we want to use windows of the past to predict the future.

```
Window of 7 timesteps predicting the next timestep (univariate time series)

[0, 1, 2, 3, 4, 5, 6] -> [7]
[1, 2, 3, 4, 5, 6, 7] -> [8]
[2, 3, 4, 5, 6, 7, 8] -> [9]

```
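
The windows in this illustration can be reproduced with a short NumPy sketch. It uses `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later), which is a different technique from the index-matrix approach used in this notebook. The helper name `window_series` is an assumption made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_series(x, window_size, horizon):
    """Split a 1D array into (windows, labels) pairs of sliding windows."""
    # Each row holds window_size inputs followed by horizon targets
    rows = sliding_window_view(x, window_size + horizon)
    return rows[:, :window_size], rows[:, window_size:]

series = np.arange(10)
windows, labels = window_series(series, window_size=7, horizon=1)
for w, l in zip(windows, labels):
    print(f"{w} -> {l}")
```

With `window_size=30` and `horizon=7`, the same helper would return 30 inputs and 7 targets per row.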

In [None]:
HORIZON = 7
WINDOW_SIZE = 30


In [None]:
# Create function to label windowed data
def get_labelled_windows(x, horizon=7):
    
    return x[:,:-horizon], x[:,-horizon:]

In [None]:
# Test out the window labelling function
test_window, test_label = get_labelled_windows(tf.expand_dims(tf.range(30)+1,axis=0),horizon=HORIZON)
print(f"Window: {tf.squeeze(test_window).numpy()} -> Label: {tf.squeeze(test_label).numpy()}")

In [None]:
# Create function to view NumPy arrays as windows
def make_windows(x, window_size=30,horizon=7):
    """
    Turns a 1D array into a 2D array of sequential windows of window size
    """
    # Create a window of specific window_size(add the horizon on the end for later labelling)
    window_step = np.expand_dims(np.arange(window_size+horizon),axis=0)
    
    # Create 2D array of multiple window steps (minus 1 to account for 0 indexing)
    window_indexes = window_step + np.expand_dims(np.arange(len(x)-(window_size+horizon-1)),axis=0).T
    
    # Index on the target array (time series) with the 2D array of multiple window steps
    windowed_array = x[window_indexes]
    
    # Get the labelled windows
    windows, labels = get_labelled_windows(windowed_array, horizon=horizon)
    
    return windows, labels

In [None]:
full_windows, full_labels = make_windows(cases, window_size=WINDOW_SIZE, horizon=HORIZON)
len(full_windows), len(full_labels)

In [None]:
# View the first 3 windows/labels
pd.set_option('display.precision',0)
for i in range(3):
    print(f"Window: {full_windows[i]} -> Label: {full_labels[i]}")

In [None]:
# make the train/test splits
def make_train_test_splits(windows, labels, test_split=0.2):
    
    """
    Splits matching pairs of windows and labels into train and test splits
    """
    split_size=int(len(windows)*(1-test_split))
    train_windows = windows[:split_size]
    train_labels = labels[:split_size]
    test_windows = windows[split_size:]
    test_labels = labels[split_size:]
    return train_windows, test_windows, train_labels,test_labels

In [None]:
train_windows, test_windows, train_labels, test_labels = make_train_test_splits(full_windows, full_labels)
len(train_windows), len(test_windows), len(train_labels), len(test_labels)

## Make a modelling checkpoint

For a fair comparison, we want to compare each model's best performance against every other model's best performance.

For example, if `model_1` performed incredibly well on epoch 55 but its performance fell off toward epoch 100, we want to compare the version of the model from epoch 55 to the other models, rather than the version from epoch 100.

And the same goes for each of our other models: compare the best against the best.

To take care of this, we'll implement a `ModelCheckpoint` callback.

The `ModelCheckpoint` callback will monitor our model's performance during training and save the best model to file when `save_best_only=True` is set.

That way when evaluating our model we could restore its best performing configuration from file.

🔑 **Note:** Because of the size of the dataset (smaller than usual), you'll notice our modelling experiment results fluctuate quite a bit during training (hence the implementation of the **ModelCheckpoint** callback to save the best model).
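
The selection rule behind `save_best_only=True` can be sketched in plain Python. This is only an illustration of the idea, not Keras's implementation. The function name `epochs_that_save` and the loss values are hypothetical. It assumes the monitored quantity is a validation loss where lower is better.

```python
def epochs_that_save(val_losses):
    """Return the 0-based epochs at which a best-only checkpoint would be written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:  # strictly better than anything seen so far
            best = loss
            saved.append(epoch)
    return saved

# Hypothetical validation losses for six epochs
val_losses = [0.90, 0.70, 0.75, 0.60, 0.65, 0.80]
saved = epochs_that_save(val_losses)
print(saved)      # epochs where the checkpoint is overwritten
print(saved[-1])  # epoch whose weights remain on disk at the end
```

Even though training continues past the best epoch, the file on disk keeps the weights from the last epoch that improved on the monitored value.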

In [None]:
import os

# Create a function to implement a ModelCheckpoint callback with a specific filename
def create_model_checkpoint(model_name, save_path='model_checkpoint'):
    return tf.keras.callbacks.ModelCheckpoint(filepath=os.path.join(save_path, model_name),
                                             verbose=0,
                                             save_best_only=True)

## Model 1: Dense Model (WINDOW=30, HORIZON=7)

In [None]:
from tensorflow.keras import layers, Sequential

# set random seed for as reproducible results as possible
tf.random.set_seed(42)

# Model
model_1 = Sequential(name='model_1_dense')
model_1.add(layers.Dense(128, activation='relu'))
model_1.add(layers.Dense(HORIZON,activation='linear'))


# compile model
model_1.compile(loss='mae',
               optimizer=tf.keras.optimizers.Adam(),
               metrics=['mae'])

# Fit the model
model_1.fit(x=train_windows,
           y=train_labels,
           epochs=100,
           verbose=1,
           batch_size=128,
           validation_data=(test_windows,test_labels),
           callbacks=[create_model_checkpoint(model_name=model_1.name)])

In [None]:
# Evaluate model on test data
model_1.evaluate(test_windows, test_labels)

In [None]:
# make preds
def make_preds(model,input_data):
    
    forecast = model.predict(input_data)
    return tf.squeeze(forecast)

In [None]:
# make prediction using model_1 on the test dataset
model_1_preds = make_preds(model_1,test_windows)
len(model_1_preds), model_1_preds[:10]

In [None]:
# Evaluate preds
model_1_results = evaluate_preds(y_true=tf.squeeze(test_labels), # reduce to right shape
                                 y_pred=model_1_preds)
model_1_results

## Make our evaluation function work for larger horizons

In [None]:
def evaluate_preds(y_true, y_pred):
    y_true = tf.cast(y_true,dtype=tf.float32)
    y_pred = tf.cast(y_pred,dtype=tf.float32)
    
    # Calculate various metrics
    mae = tf.keras.metrics.mean_absolute_error(y_true,y_pred)
    mse = tf.keras.metrics.mean_squared_error(y_true,y_pred)
    rmse = tf.sqrt(mse)
    mape = tf.keras.metrics.mean_absolute_percentage_error(y_true, y_pred)
    mase = mean_absolute_scaled_error(y_true,y_pred)
    
    # Account for different sized metrics (for longer horizons, reduce to single number)
    if mae.ndim > 0:
        mae = tf.reduce_mean(mae)
        mse = tf.reduce_mean(mse)
        rmse = tf.reduce_mean(rmse)
        mape = tf.reduce_mean(mape)
        mase = tf.reduce_mean(mase)
    return {'mae':mae.numpy(),
           'mse':mse.numpy(),
           'rmse':rmse.numpy(),
           'mape':mape.numpy(),
           'mase':mase.numpy()}
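
The reduction step for longer horizons can be checked by hand on a small case. The NumPy sketch below uses hypothetical labels and predictions (2 windows, horizon of 3) to show why the per-window metric has one value per window and how a second mean collapses it to a single number.

```python
import numpy as np

# Hypothetical labels and predictions: 2 windows, horizon of 3
y_true = np.array([[10.0, 12.0, 14.0],
                   [20.0, 22.0, 24.0]])
y_pred = np.array([[11.0, 12.0, 13.0],
                   [23.0, 22.0, 21.0]])

# Averaging over the last axis gives one MAE per window
mae_per_window = np.mean(np.abs(y_true - y_pred), axis=-1)

# Averaging again reduces this to a single number
mae_overall = mae_per_window.mean()
print(mae_per_window, mae_overall)
```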

In [None]:
# Get model_1 results aggregated to single values
model_1_results = evaluate_preds(y_true=tf.squeeze(test_labels),
                                 y_pred=model_1_preds)
model_1_results

## Model 2 : Conv1D (WINDOW=30, HORIZON=7)

In [None]:
from tensorflow.keras.layers import Conv1D

# set random seed
tf.random.set_seed(42)

# model_2
model_2 = Sequential(name='model_2_conv1d')
model_2.add(layers.Lambda(lambda x : tf.expand_dims(x,axis=1)) )
model_2.add(Conv1D(128,kernel_size=3,padding='same',activation='relu'))
model_2.add(layers.Dense(HORIZON))

# compile
model_2.compile(loss='mae',
               optimizer='adam',
               metrics=['mae'])

# fit
model_2.fit(train_windows,
           train_labels,
           epochs=100,
           validation_data=(test_windows,test_labels),
           callbacks=[create_model_checkpoint(model_name=model_2.name)])

In [None]:
# evaluate on test data
model_2.evaluate(test_windows,test_labels)

In [None]:
# make prediction
model_2_preds = make_preds(model_2,test_windows)
model_2_preds[:10]

In [None]:
# Evaluate metrics
model_2_results = evaluate_preds(y_true=test_labels,
                                y_pred=model_2_preds)
model_2_results

## Model 3 : RNN(WINDOW=30,HORIZON=7)

In [None]:
# set random seed
tf.random.set_seed(42)

# Let's build an LSTM model with the Functional API
inputs = layers.Input(shape=(WINDOW_SIZE,))
x = layers.Lambda(lambda x: tf.expand_dims(x, axis=1))(inputs)
x = layers.LSTM(128, activation="relu")(x) 
output = layers.Dense(HORIZON)(x)
model_3 = tf.keras.Model(inputs=inputs, outputs=output, name="model_3_lstm")

# Compile model
model_3.compile(loss="mae",
                optimizer=tf.keras.optimizers.Adam())

# Seems when saving the model several warnings are appearing: https://github.com/tensorflow/tensorflow/issues/47554 
model_3.fit(train_windows,
            train_labels,
            epochs=100,
            verbose=0,
            batch_size=128,
            validation_data=(test_windows, test_labels),
            callbacks=[create_model_checkpoint(model_name=model_3.name)])


In [None]:
import tensorflow_decision_forests