# Objective:
As one option, we tried modeling the outbreak with an SEIR model.  
Many pioneers have tried to interpret the spread of COVID-19:  
    https://towardsdatascience.com/modelling-the-coronavirus-epidemic-spreading-in-a-city-with-python-babd14d82fa2  
    https://qiita.com/kotai2003/items/ed28fb723a335a873061 (Japanese)  
    https://arxiv.org/abs/2002.06563  
Big thanks to them.
    
# Result:
Our SEIR model clearly overestimated the number of cases.  
We should adjust the parameters and/or reconsider the approach.

In [None]:
# Libraries
import pandas as pd
pd.set_option('display.max_columns', None)
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

import datetime
from time import time
from scipy import stats

from sklearn.model_selection import GroupKFold
import lightgbm as lgb
import xgboost as xgb
from catboost import CatBoostRegressor, CatBoostClassifier
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
import os
import glob

from scipy.integrate import odeint

# Load Data

In [None]:
ca_train = pd.read_csv('/kaggle/input/covid19-local-us-ca-forecasting-week-1/ca_train.csv')
ca_test = pd.read_csv('/kaggle/input/covid19-local-us-ca-forecasting-week-1/ca_test.csv')
ca_submission = pd.read_csv('/kaggle/input/covid19-local-us-ca-forecasting-week-1/ca_submission.csv')

train_df = ca_train
test_df =  ca_test
submission_df =  ca_submission

In [None]:
train_df.head()

# Simple EDA

In [None]:
x_1 = train_df['Date']
y_1 = train_df['ConfirmedCases']
y_2 = train_df['Fatalities']

fig = make_subplots(rows=1, cols=1)

fig.add_trace(
    go.Scatter(x=x_1, mode='lines+markers', y=y_1, marker=dict(color="mediumaquamarine"), showlegend=False,
               name="Original signal"),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=x_1, mode='lines+markers', y=y_2, marker=dict(color="darkgreen"), showlegend=False,
               name="Original signal"),
    row=1, col=1
)

fig.update_layout(height=400, width=800, title_text="ConfirmedCases (pale) vs. Fatalities (dark) ")
fig.show()

What happened on Mar 10?  
https://apnews.com/96e87b81f05f7ec54fc3e0ad152bd25c  
AP News reports:  
>In Oakland, California, thousands of restless passengers who have been stuck aboard a cruise ship hit by the coronavirus waited their turn to get off the vessel and go to U.S. military bases or back to their home countries for two weeks of quarantine.

Some of these passengers might be included in the number of ConfirmedCases.

In [None]:
x_1 = train_df['Date']
y_1 = train_df['Lat']
y_2 = train_df['Long']

fig = make_subplots(rows=1, cols=1)

fig.add_trace(
    go.Scatter(x=x_1, mode='lines+markers', y=y_1, marker=dict(color="mediumaquamarine"), showlegend=False,
               name="Original signal"),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=x_1, mode='lines+markers', y=y_2, marker=dict(color="darkgreen"), showlegend=False,
               name="Original signal"),
    row=1, col=1
)

fig.update_layout(height=400, width=800, title_text="Lat (pale) vs. Long (dark) ")
fig.show()

The Lat and Long features in the training data set don't seem reliable.

# Modeling by SEIR
In this notebook, we try to predict the number of ConfirmedCases and Fatalities using an SEIR model.  
An SEIR model is a compartmental model commonly used to describe how a disease spreads through a population.

More about SEIR:  
https://towardsdatascience.com/social-distancing-to-slow-the-coronavirus-768292f04296  
https://scipython.com/book/chapter-8-scipy/additional-examples/the-sir-epidemic-model/  
We are indebted to:  
https://towardsdatascience.com/modelling-the-coronavirus-epidemic-spreading-in-a-city-with-python-babd14d82fa2  
https://qiita.com/kotai2003/items/ed28fb723a335a873061 (Japanese)  
https://arxiv.org/abs/2002.06563  
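
For reference, the system that `SEIR_EQ` integrates can be written out as follows. Here $S$, $E$, $I$ and $R$ are the Susceptible, Exposed, Infectious and Removed compartments, $\beta$ is the infection rate, $\epsilon$ is the rate at which exposed individuals become infectious, $\gamma$ is the removal rate, and $N$ is the total population:

$$
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S I}{N} \\
\frac{dE}{dt} &= \frac{\beta S I}{N} - \epsilon E \\
\frac{dI}{dt} &= \epsilon E - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}
$$

The four right-hand sides sum to zero, so $S + E + I + R = N$ stays constant over time.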

In [None]:
# Define the SEIR equations

def SEIR_EQ(v, t, beta, epsilon, gamma, N):
    # v = [S, E, I, R]; returns [dS/dt, dE/dt, dI/dt, dR/dt]
    return [-beta * v[0] * v[2] / N,
            beta * v[0] * v[2] / N - epsilon * v[1],
            epsilon * v[1] - gamma * v[2],
            gamma * v[2]]
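
A quick sanity check on the equations: the flows only move people between compartments, so the four derivatives should sum to zero. The sketch below re-states the right-hand side under a hypothetical name, `seir_rhs`, and uses made-up state values purely for the check.

```python
import numpy as np

def seir_rhs(v, t, beta, epsilon, gamma, N):
    """Right-hand side of the SEIR system (same form as SEIR_EQ)."""
    S, E, I, R = v
    new_exposed = beta * S * I / N
    return [-new_exposed,
            new_exposed - epsilon * E,
            epsilon * E - gamma * I,
            gamma * I]

# Illustrative state and parameters (hypothetical values, not from the data)
state = [990.0, 4.0, 5.0, 1.0]
N = sum(state)
derivs = seir_rhs(state, 0.0, beta=1.0, epsilon=0.5, gamma=1 / 7.4, N=N)

# Population is conserved, so the net flow should be zero up to rounding error
total_flow = float(np.sum(derivs))
print(total_flow)
```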

In [None]:
# parameters
t_max = 100 #days
dt = 1

# initial_state
S_0 = 36000000 #population of California
E_0 = 0  #number of Exposed on Mar 10: it can't be 0 actually
I_0 = 144 #number of Infectious on Mar 10
R_0 = 0
N_pop = S_0 + E_0 + I_0 + R_0
ini_state = [S_0, E_0, I_0, R_0]  # [S(0), E(0), I(0), R(0)]


# infection rate
beta_const = 1

# rate of progression from Exposed to Infectious
latency_period = 2 #days
epsilon_const = 1/latency_period


# removal rate
infectious_period = 7.4 #days
gamma_const = 1/infectious_period

#case fatality rate
death_rate = 0.01
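
It may help to check what these parameters imply. For a standard SEIR model without births and deaths, the basic reproduction number is the infection rate divided by the removal rate. The sketch below computes it from the values above. Note that `R_0` in the parameter cell is the initial Removed count, not the reproduction number, so a separate (hypothetical) name is used here.

```python
# Values copied from the parameter cell
beta_const = 1
infectious_period = 7.4  # days
gamma_const = 1 / infectious_period

# Basic reproduction number of an SEIR model without vital dynamics: beta / gamma
basic_reproduction_number = beta_const / gamma_const
print(round(basic_reproduction_number, 2))
```

With these settings the value is about 7.4, which is high compared with many early published estimates for COVID-19 and may be one reason the model overshoots.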

In [None]:
# numerical integration
times = np.arange(0, t_max, dt)
args = (beta_const, epsilon_const, gamma_const, N_pop)

# Numerical solution using scipy.integrate.odeint
# Solve the SEIR model
result = odeint(SEIR_EQ, ini_state, times, args=args)
# plot
plt.plot(times, result)
plt.legend(['Susceptible', 'Exposed', 'Infectious', 'Removed'])
plt.title("SEIR model  COVID-19")
plt.xlabel('time(days)')
plt.ylabel('population')
plt.grid()

plt.show()

In [None]:
predicted = pd.DataFrame(result)
predicted.columns = ['Susceptible', 'Exposed', 'Infectious', 'Removed']
predicted['death'] = predicted['Infectious']*death_rate

In [None]:
reported = train_df[train_df['Date']>= '2020-03-10'].reset_index()
reported.head()

# Before submitting
We assume that only part of the model's 'Infectious' population is counted as 'ConfirmedCases'; for many reasons, some cases go unreported.  
So we estimate a reporting rate for 'ConfirmedCases' and for 'Fatalities' respectively.  
Each rate is calculated as a simple weighted average.
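
To make the weighting concrete, here is a minimal sketch with made-up numbers (the arrays are hypothetical, not taken from the data). Each day's ratio of reported to modeled cases is weighted by that day's reported count, so days with more reported cases dominate the estimate.

```python
import numpy as np

# Hypothetical numbers, for illustration only
reported_cases = np.array([100.0, 200.0, 400.0])
model_infectious = np.array([1000.0, 1600.0, 2500.0])

# Per-day reporting rate: reported cases divided by modeled Infectious
daily_rate = reported_cases / model_infectious

# Days with more reported cases get proportionally more weight
weighted_rate = np.average(daily_rate, weights=reported_cases)
simple_rate = daily_rate.mean()
print(weighted_rate, simple_rate)
```

Because the later, larger days have higher ratios in this toy example, the weighted rate comes out above the unweighted mean.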

In [None]:
tmp_predicted = predicted[0:len(reported)]

reported_rate = reported['ConfirmedCases']/tmp_predicted['Infectious']
reported_rate_c = np.average(a= reported_rate, weights=reported['ConfirmedCases'])
reported_rate_c

In [None]:
tmp_predicted = predicted[0:len(reported)]

reported_rate = reported['Fatalities']/tmp_predicted['death']
reported_rate_d = np.average(a= reported_rate, weights=reported['Fatalities'])
reported_rate_d

In [None]:
predicted_s = predicted[0:len(submission_df)]
submission_df['ConfirmedCases'] = predicted_s['Infectious']*reported_rate_c
submission_df['Fatalities'] = predicted_s['death']*reported_rate_d
submission_df.head()

In [None]:
x_1 = test_df['Date']
y_1 = submission_df['ConfirmedCases']
y_2 = submission_df['Fatalities']

fig = make_subplots(rows=1, cols=1)

fig.add_trace(
    go.Scatter(x=x_1, mode='lines+markers', y=y_1, marker=dict(color="mediumaquamarine"), showlegend=False,
               name="Original signal"),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=x_1, mode='lines+markers', y=y_2, marker=dict(color="darkgreen"), showlegend=False,
               name="Original signal"),
    row=1, col=1
)

fig.update_layout(height=400, width=800, title_text="Predicted ConfirmedCases (pale) vs. Fatalities (dark) ")
fig.show()

This result clearly seems to be an overestimate, and we believe the actual number of cases will be lower than our prediction.  
From here, we will try adjusting the parameters and our approach to the data.
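
One possible starting point for adjusting the parameters is a small sensitivity sweep over the infection rate. The sketch below is an assumption about how that could look, not a fitted result: it keeps the notebook's initial state and rates, varies only `beta` over arbitrarily chosen candidate values, and reports the peak of the Infectious compartment within 100 days. The function name `seir_rhs` is hypothetical.

```python
import numpy as np
from scipy.integrate import odeint

def seir_rhs(v, t, beta, epsilon, gamma, N):
    """Right-hand side of the SEIR system (same form as SEIR_EQ)."""
    S, E, I, R = v
    new_exposed = beta * S * I / N
    return [-new_exposed,
            new_exposed - epsilon * E,
            epsilon * E - gamma * I,
            gamma * I]

# Same initial state and rates as the notebook; only beta is varied
ini_state = [36000000, 0, 144, 0]
N_pop = sum(ini_state)
epsilon_const = 1 / 2
gamma_const = 1 / 7.4
times = np.arange(0, 100, 1)

peak_infectious = {}
for beta in (1.0, 0.5, 0.3):  # candidate values, chosen arbitrarily for illustration
    sol = odeint(seir_rhs, ini_state, times,
                 args=(beta, epsilon_const, gamma_const, N_pop))
    peak_infectious[beta] = sol[:, 2].max()  # column 2 is the Infectious compartment
    print(f"beta={beta}: peak Infectious within 100 days = {peak_infectious[beta]:,.0f}")
```

A lower `beta` gives a smaller and later peak, so comparing such runs against the reported counts is one way to judge which range of values is plausible.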

In [None]:
submission_df.to_csv("submission.csv", index=False)