# The Model 

#### Author: Michael Okeyode

The model is based on a logistic growth function. The function is first fitted to each territory's cumulative deaths to estimate the maximum number of deaths over a specified period. That maximum is then passed to Prophet as the `cap` of its logistic growth trend.

## **$y(t) = \frac{c}{1 + ae^{-bt}}$**  
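
The two quantities that `death_cap` reads off the fitted curve, the plateau and the day of fastest growth, follow directly from this form. The short derivation below assumes $a > 0$ and $b > 0$; the symbol $t^{*}$ is introduced here for illustration and corresponds to `t_fastest` in the code.

$$
\lim_{t \to \infty} y(t) = c, \qquad
y'(t) = b\,y(t)\left(1 - \frac{y(t)}{c}\right), \qquad
y''(t) = b\,y'(t)\left(1 - \frac{2\,y(t)}{c}\right)
$$

$$
y''(t^{*}) = 0 \;\Longrightarrow\; y(t^{*}) = \frac{c}{2}
\;\Longrightarrow\; a e^{-b t^{*}} = 1
\;\Longrightarrow\; t^{*} = \frac{\ln a}{b}
$$

Growth is therefore fastest when the curve reaches half of its plateau, which is consistent with the printed output further down, where each cap is roughly twice the value at the fastest-growth day.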




In [1]:
#Load Libraries


import pandas as pd
import numpy as np
from fbprophet import Prophet
import pickle
import math
import scipy.optimize as optim
import matplotlib.pyplot as plt
%matplotlib inline
from datetime import datetime, timedelta
import logging
logging.getLogger('fbprophet').setLevel(logging.WARNING)

In [2]:
#Load Train File

train = pd.read_csv('Train as at 18th April.csv')

In [3]:
# Define the logistic function whose coefficients (a, b, c) will be estimated
def func_logistic(t, a, b, c):
    return c / (1 + a * np.exp(-b*t))

In [4]:
def death_cap():
    
    cap =[]
    
    for r in train['Territory'].unique():
        data = train[train['Territory']==r][['target']]   
        data = data.reset_index(drop=False)
        data.columns = ['Timestep', 'Total Cases']
        if any(data['Total Cases'] > 0):
            data1 = data[data['Total Cases']>0].copy()  # copy so later column assignment does not write to a slice
            if len(data1['Total Cases'].unique()) < 5:
                cap.append(data1['Total Cases'].max()+51)
            else:
                data1['Timestep'] = range(0, len(data1['Timestep']))
        
                # Randomly initialize the coefficients
                np.random.seed(0)
                p0 = np.random.exponential(size=3)

                # Set min bound 0 on all coefficients, and set different max bounds for each coefficient
                bounds = (0, [100000., 1000., 1000000000.])

                # Convert pd.Series to np.Array and use Scipy's curve fit to find the best Nonlinear Least Squares coefficients
                x = np.array(data1['Timestep']) + 1
                y = np.array(data1['Total Cases'])
            
                try:
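                    # NOTE: argsort returns positions, not values; x is already ordered,
                    # so this resets it to 0..n-1 and cancels the +1 offset applied above
                    x = x.argsort()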
                    (a,b,c),cov = optim.curve_fit(func_logistic, x, y, bounds=bounds, p0=p0, maxfev=1000000)
                
                    # The time step at which the growth is fastest
                    t_fastest = np.log(a) / b
                    i_fastest = func_logistic(t_fastest, a, b, c)
                
                    res_df = data1[['Timestep', 'Total Cases']].reset_index(drop=True)
                    res_df['fastest_grow_day'] = t_fastest
                    res_df['fastest_grow_value'] = i_fastest
                    res_df['growth_stabilized'] = t_fastest <= x[-1]
                    res_df['timestep'] = x
                    res_df['res_func_logistic'] = func_logistic(x, a, b, c)
            
                    if t_fastest <= x[-1]:
                        print('Death stabilized:', r, '| Fastest grow day:', t_fastest, '| Death:', i_fastest, '| Total Days:', x[-1])
                        res_df['cap'] = func_logistic(x[-1] + 60, a, b, c)
                        print(res_df['cap'][0])
                        
                    else:
                        print('Death increasing:', r, '| Fastest grow day:', t_fastest, '| Infections:', i_fastest)
                        res_df['cap'] = func_logistic(t_fastest + 60, a, b, c)
                        print(res_df['cap'][0])
                        
                    d = res_df['cap'][0]
                    cap.append(d)    
                    
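                except RuntimeError:
                    print('No fit found for: ', r)
                    # Keep `cap` aligned with train['Territory'].unique(); reusing the
                    # sparse-data fallback here is an assumption, not a fitted value
                    cap.append(data1['Total Cases'].max()+51)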
        
        else:
            cap.append(0)
         
            
    return cap
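
To illustrate the fitting step inside `death_cap`, here is a minimal sketch that fits the same three-parameter logistic curve to synthetic, noise-free data with `scipy.optimize.curve_fit`, then reads off the fastest-growth day and a cap 60 steps past the last observation. The "true" parameter values, the starting guess `p0`, and the variable names are made up for the example; only `func_logistic` and the bounds mirror the notebook.

```python
import numpy as np
import scipy.optimize as optim

def func_logistic(t, a, b, c):
    # Three-parameter logistic curve: c is the upper plateau
    return c / (1 + a * np.exp(-b * t))

# Synthetic cumulative counts generated from known (hypothetical) parameters
true_a, true_b, true_c = 50.0, 0.3, 500.0
t = np.arange(1, 41)
y = func_logistic(t, true_a, true_b, true_c)

# Same bounds as the notebook; p0 is an illustrative starting guess
bounds = (0, [100000., 1000., 1000000000.])
(a, b, c), cov = optim.curve_fit(
    func_logistic, t, y, p0=(40.0, 0.25, 450.0), bounds=bounds
)

# Growth is fastest where the curve crosses c / 2, i.e. at ln(a) / b
t_fastest = np.log(a) / b
i_fastest = func_logistic(t_fastest, a, b, c)

# Cap evaluated 60 steps past the last observation, as in the "stabilized" branch
cap = func_logistic(t[-1] + 60, a, b, c)
```

On real data the fit is far less well behaved than on this clean curve, which is why the notebook guards the call with a `try`/`except` and falls back to a simple margin when a territory has too few distinct values.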

In [5]:
cap = death_cap()



Death stabilized: Afghanistan | Fastest grow day: 24.659238814405448 | Death: 26.74577390549263 | Total Days: 27
53.48700800218842
Death stabilized: Albania | Fastest grow day: 19.514114241637504 | Death: 12.736258259074193 | Total Days: 38
25.47251616304267
Death stabilized: Algeria | Fastest grow day: 26.379638324944263 | Death: 197.30461211941414 | Total Days: 37
394.60920464706857
Death stabilized: Andorra | Fastest grow day: 13.61326404931011 | Death: 17.711430134224905 | Total Days: 27
35.42285638834735
Death stabilized: Argentina | Fastest grow day: 33.892325687133564 | Death: 83.55112291397104 | Total Days: 41
167.0992881427881
Death stabilized: Armenia | Fastest grow day: 17.247008754044785 | Death: 13.976234992216913 | Total Days: 23
27.951329950853584
Death stabilized: Australia | Fastest grow day: 35.72287408666674 | Death: 38.60298138550953 | Total Days: 48
77.20560699211829
Death stabilized: Austria | Fastest grow day: 25.486178402019114 | Death: 240.93549938148524 | Tota

Death stabilized: Mali | Fastest grow day: 13.82081305368525 | Death: 9.291686418662936 | Total Days: 20
18.58324379082612
Death stabilized: Mauritius | Fastest grow day: 9.961089908035534 | Death: 4.624018045013427 | Total Days: 28
9.248035886303336
Death stabilized: Mexico | Fastest grow day: 28.05926115228676 | Death: 448.1667077734985 | Total Days: 30
896.3316955601408
Death increasing: Montenegro | Fastest grow day: 163.52307227224813 | Infections: 38105.23810565301
75111.21376149703
Death stabilized: Morocco | Fastest grow day: 26.308889777940703 | Death: 72.11086245916287 | Total Days: 39
144.2217109843603
Death stabilized: Netherlands (the) | Fastest grow day: 31.030282505543518 | Death: 1914.346552678695 | Total Days: 43
3828.6833389521516
Death stabilized: New Zealand | Fastest grow day: 15.322750407455807 | Death: 6.415204759319552 | Total Days: 20
12.830409518628084
Death stabilized: Niger (the) | Fastest grow day: 12.198933070918772 | Death: 8.695075094055445 | Total Days:

In [6]:
uniq_terr = train['Territory'].unique()

In [7]:
cap_check = pd.DataFrame({'Territory':uniq_terr, 'cap': cap})

In [8]:
train = train.merge(cap_check, on='Territory', how='left')

In [9]:
train.head(2)

| | Territory X Date | target | cases | Territory | Date | cap |
|---|---|---|---|---|---|---|
| 0 | Afghanistan X 1/22/20 | 0 | 0 | Afghanistan | 1/22/20 | 53.487008 |
| 1 | Afghanistan X 1/23/20 | 0 | 0 | Afghanistan | 1/23/20 | 53.487008 |


### Model With Prophet

In [10]:
from tqdm import tqdm

collect = []

for r in tqdm(train['Territory'].unique()):
    if train[train['Territory']==r]['cap'].iloc[0] > 0:
        try:
            to_check = train[train['Territory']==r][['Date', 'target' ,'cap']]
            to_check = to_check[to_check['target']>0].copy()  # copy so new columns are not written to a slice
            to_check.columns = ['ds', 'y', 'cap']
            to_check['ds'] = pd.to_datetime(to_check['ds'])
            to_check['weekday'] = to_check['ds'].apply(lambda x: pd.Timestamp(x).dayofweek)
            m = Prophet(interval_width=0.95, growth='logistic')
            m.add_regressor('weekday')
            m.fit(to_check)
            future = m.make_future_dataframe(periods=51)
            future['cap'] = to_check['cap'].iloc[0]
            future['weekday'] = future['ds'].apply(lambda x: pd.Timestamp(x).dayofweek)
            forecast = m.predict(future)[['ds', 'yhat']][-51:]
            ter_targ =[]
            for d in forecast['ds']:
                a = r + ' X ' + str(d.strftime('%#m/%#d/%y'))
                ter_targ.append(a)
            forecast['Territory X Date'] = ter_targ
            forecast['target'] = forecast['yhat']
            fin_df = forecast.drop(['ds', 'yhat'], axis=1)
            
            to_targ = []
            for e in to_check['ds']:
                b = r + ' X ' + str(e.strftime('%#m/%#d/%y'))
                to_targ.append(b)
            to_check['Territory X Date'] = to_targ
            to_check['target'] = to_check['y']
            to_df = to_check[['Territory X Date', 'target']]
            final_df = pd.concat([to_df, fin_df], axis=0)
            collect.append(final_df)
            
        except Exception:
            # Prophet failed for this territory: fall back to a flat forecast at the cap
            dates = pd.date_range(start='2020-04-19', end='2020-06-08', freq='1d')        
            exc_targ =[]
            for d in dates:
                a = r + ' X ' + str(d.strftime('%#m/%#d/%y'))
                exc_targ.append(a)
            fin_df = pd.DataFrame(exc_targ, columns=['Territory X Date'])
            fin_df['target'] = to_check['cap'].iloc[0]
            to_targ = []
            for e in to_check['ds']:
                b = r + ' X ' + str(e.strftime('%#m/%#d/%y'))
                to_targ.append(b)
            to_check['Territory X Date'] = to_targ
            to_check['target'] = to_check['y']
            to_df = to_check[['Territory X Date', 'target']]
            final_df = pd.concat([to_df, fin_df], axis=0)
            
            collect.append(final_df)
            
    else:
        to_check = train[train['Territory']==r][['Date', 'target']]        
        to_check.columns = ['ds', 'y']
        to_check['ds'] = pd.to_datetime(to_check['ds'])
        m = Prophet(interval_width=0.95)
        m.fit(to_check)
        future = m.make_future_dataframe(periods=51)
        forecast = future[-51:].copy()  # copy so new columns are not written to a slice
        ter_targ =[]
        for d in forecast['ds']:
            a = r + ' X ' + str(d.strftime('%#m/%#d/%y'))
            ter_targ.append(a)
        forecast['Territory X Date'] = ter_targ
        forecast['target'] = 0
        fin_df = forecast.drop(['ds'], axis=1)
        collect.append(fin_df)

100%|████████████████████████████████████████████████████████████████████████████████| 209/209 [13:18<00:00,  3.82s/it]
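
The `'%#m/%#d/%y'` format used to build the `Territory X Date` keys relies on the `#` flag, which is specific to Windows; the Linux and macOS equivalent is `-`, so the loop is not portable as written. A minimal sketch of a platform-independent alternative follows. The helper name `territory_date_key` is hypothetical and not part of the notebook.

```python
from datetime import datetime

def territory_date_key(territory, when):
    # Build "<territory> X m/d/yy" without zero padding, using attribute access
    # instead of the platform-specific '%#m' (Windows) or '%-m' (Linux/macOS) flags
    return f"{territory} X {when.month}/{when.day}/{when:%y}"

print(territory_date_key("Zimbabwe", datetime(2020, 6, 4)))  # Zimbabwe X 6/4/20
```

Because `pd.Timestamp` is a subclass of `datetime`, the same helper should accept the values iterated from `forecast['ds']` and `to_check['ds']`.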


In [11]:
# Stack the per-territory frames in a single concat call
df = pd.concat(collect, axis=0)

In [12]:
df = df.reset_index(drop=True)

In [13]:
df['Territory'] = df['Territory X Date'].apply(lambda x: x.split(' X ')[0]) 

### Correct Predictions

In [14]:
new_df = []


for r in tqdm(df['Territory'].unique()):
    # Copy so the correction does not write to a slice of df
    correct = df[df['Territory']==r][['Territory X Date', 'target', 'Territory']].copy()
    target_col = correct.columns.get_loc('target')
    for i in range(1, len(correct)):
        # Cumulative deaths cannot decrease: carry the previous value forward
        if correct.iloc[i, target_col] < correct.iloc[i-1, target_col]:
            correct.iloc[i, target_col] = correct.iloc[i-1, target_col]
    new_df.append(correct)




100%|███████████████████████████████████████████████████████████████████████████████| 209/209 [00:01<00:00, 161.86it/s]
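
The correction loop replaces any value that falls below its predecessor with that predecessor, which amounts to a running maximum within each territory. A minimal vectorized sketch of the same idea, on a made-up toy frame, might look like this:

```python
import pandas as pd

# Toy frame with a dip in each territory (values are made up)
toy = pd.DataFrame({
    'Territory': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'target':    [1.0, 3.0, 2.5, 4.0, 5.0, 4.0, 4.5],
})

# Running maximum within each territory replaces any drop with the previous peak
toy['target'] = toy.groupby('Territory')['target'].cummax()
print(toy['target'].tolist())  # [1.0, 3.0, 3.0, 4.0, 5.0, 5.0, 5.0]
```

This assumes the rows of each territory are already in date order, as they are after the concatenation step above.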


In [15]:
# Stack the corrected per-territory frames in a single concat call
pred = pd.concat(new_df, axis=0)

In [16]:
pred['target'] = pred['target'].astype('int64')

In [17]:
pred.tail()

| | Territory X Date | target | Territory |
|---|---|---|---|
| 15248 | Zimbabwe X 6/4/20 | 26 | Zimbabwe |
| 15249 | Zimbabwe X 6/5/20 | 27 | Zimbabwe |
| 15250 | Zimbabwe X 6/6/20 | 27 | Zimbabwe |
| 15251 | Zimbabwe X 6/7/20 | 28 | Zimbabwe |
| 15252 | Zimbabwe X 6/8/20 | 29 | Zimbabwe |


In [18]:
pred[['Territory X Date', 'target']].to_csv('Predictions 19-04 to 08-06 Update.csv', index=False)

### Some things that might work better

You can modify the `cap` for specific territories if you have reason to believe the true value will be far higher than the logistic model predicts.

For example, the following raises every cap above 10000 by 4500 and every other non-zero cap by 500: `train['cap'] = train['cap'].apply(lambda x: x+4500 if x>10000 else (x+500 if x>0 else x))`
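
The adjustment can also be targeted at named territories instead of value thresholds. The sketch below uses a small stand-in for the merged `train` frame; the dictionary name `cap_overrides` and every number in it are hypothetical.

```python
import pandas as pd

# Stand-in for the merged `train` frame (values are made up)
train = pd.DataFrame({
    'Territory': ['A', 'A', 'B', 'C'],
    'cap':       [120.0, 120.0, 0.0, 15000.0],
})

# Hypothetical manual caps for territories where the fit looks too low
cap_overrides = {'A': 400.0}

# Territories without an override keep their fitted cap
train['cap'] = train['Territory'].map(cap_overrides).fillna(train['cap'])
print(train['cap'].tolist())  # [400.0, 400.0, 0.0, 15000.0]
```

Any such override has to run before the Prophet loop, since that loop reads each territory's `cap` from `train`.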