# Constrained LS estimation of $R_0$ by AR(h+1) model

This notebook estimates the model
$$
    I_{t+1}=\sum_{j=0}^hR_jI_{t-j}
$$
using time series data of the form $\{I_{T-2h},I_{T-2h+1},\ldots,I_{T-h},\ldots,I_{T},I_{T+1},\ldots\}$, where the parameter $h$ is a time window; we propose $h=14$ for COVID-19. Data after $I_{T+1}$ are ignored by the method. The estimate is obtained by constrained least squares, i.e. by solving the optimization problem
$$
    \begin{split}
    \min_R&\ J=\lVert \mathcal{I}_{t+1}-AR\rVert^2\\
    \text{s.t.}&\ R_j>0,\quad j=0,\ldots,h
    \end{split}
$$
where, with $t=T$,
$$
    \mathcal{I}_{t}=\begin{pmatrix}
    I_t\\
    I_{t-1}\\
    \vdots\\
    I_{t-h}
    \end{pmatrix}\in\mathbb{R}^{h+1},
$$
$A$ is the block matrix
$$
    A = \begin{pmatrix}
    \mathcal{I}_{t}&\mathcal{I}_{t-1}&\ldots&\mathcal{I}_{t-h}
    \end{pmatrix}\in\mathbb{R}^{(h+1)\times(h+1)},
$$
and $R$ is the coefficient vector
$$
        R = \begin{pmatrix}
        R_0\\
        R_1\\
        \vdots\\
        R_h
        \end{pmatrix}\in\mathbb{R}^{h+1}.
$$
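A minimal sketch of how $A$ and $\mathcal{I}_{T+1}$ can be assembled from the $2h+2$ observations $I_{T-2h},\ldots,I_{T+1}$, assuming they are stored oldest first in a NumPy array. The helper `build_system` is hypothetical and not part of the notebook. Row $k$ of $AR=\mathcal{I}_{T+1}$ reads $I_{T+1-k}=\sum_j R_j I_{T-k-j}$, i.e. the model evaluated at $t=T-k$, so the $h+1$ equations reach back exactly to $I_{T-2h}$.

```python
import numpy as np


def build_system(series, h):
    """Return (A, b) for the constrained LS problem (hypothetical helper).

    Assumes `series` holds exactly the 2h+2 values I_{T-2h}, ..., I_{T+1},
    oldest first.
    """
    s = np.asarray(series, dtype=float)
    if s.size != 2 * h + 2:
        raise ValueError("expected exactly 2h+2 observations")
    T = s.size - 2  # array position of I_T; I_{T+1} is the last entry

    def cal_I(t):
        # Vector (I_t, I_{t-1}, ..., I_{t-h}), newest value first
        return s[t - h:t + 1][::-1]

    # Columns are cal_I(T), cal_I(T-1), ..., cal_I(T-h)
    A = np.column_stack([cal_I(T - j) for j in range(h + 1)])
    b = cal_I(T + 1)
    return A, b


# h = 1: observations I_{T-2}, I_{T-1}, I_T, I_{T+1}
A, b = build_system([1.0, 2.0, 3.0, 4.0], h=1)
print(A)  # rows (I_T, I_{T-1}) and (I_{T-1}, I_{T-2})
print(b)  # (I_{T+1}, I_T)
```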
The coefficient vector $R$ that solves this problem is then used to estimate $R_0$ as
$$
\hat{R}_0=\sum_{j=0}^{h}R_j.
$$
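A small worked example on synthetic data, assuming hypothetical coefficients $R=(0.5,0.3,0.2)$ with $h=2$, so the value to recover is $\hat{R}_0=1$. Six observations are generated from the model, the system $AR=\mathcal{I}_{T+1}$ is built, and the bound-constrained problem is solved with `scipy.optimize.lsq_linear`, a linear least-squares routine swapped in here for the notebook's `minimize`. Because these data satisfy the model exactly and $A$ happens to be nonsingular, the fit should return $R$ itself and hence its sum.

```python
import numpy as np
from scipy.optimize import lsq_linear

# Hypothetical "true" coefficients R_0, R_1, R_2 (h = 2); they sum to 1.0
R_true = np.array([0.5, 0.3, 0.2])
h = R_true.size - 1

# Generate 2h+2 observations from the AR model, starting from arbitrary values
series = [1.0, 3.0, 2.0]
while len(series) < 2 * h + 2:
    recent = series[::-1][:h + 1]  # I_t, I_{t-1}, ..., I_{t-h}
    series.append(float(np.dot(R_true, recent)))

s = np.array(series)
T = s.size - 2                            # array position of I_T
cal_I = lambda t: s[t - h:t + 1][::-1]    # (I_t, ..., I_{t-h})
A = np.column_stack([cal_I(T - j) for j in range(h + 1)])
b = cal_I(T + 1)

# Bounds R_j >= 1e-4 stand in for the strict constraint R_j > 0
res = lsq_linear(A, b, bounds=(1e-4, np.inf), method="bvls")
R_hat = res.x
R0_hat = R_hat.sum()
print(R_hat, R0_hat)
```

With real, noisy counts the system is no longer consistent, and $\hat{R}_0$ becomes the sum of a least-squares compromise rather than an exact recovery.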

In [2]:
import json
import matplotlib.pyplot as plt
import numpy as np
from scipy.linalg import norm, solve
from scipy.optimize import Bounds, minimize

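def list2col(lst):
    """Return lst reversed (newest value first) as an (n, 1) column vector."""
    return np.array(lst[::-1]).reshape((len(lst), 1))

# Load the regional data; the file is closed automatically after reading
with open('data/colombia.json') as data_file:
    json_data = json.load(data_file)

# Key of the series used from each region's record
var = 'Y'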

In [3]:
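def estimate_r_opt(time_series, h, show=False):
    """Estimate R_0 from the last 2*h + 2 values of time_series.

    Returns None when the series is too short for an AR(h+1) fit.
    """
    # An AR(h+1) fit needs at least 2*h + 2 observations
    if 2*h >= len(time_series) - 1:
        return None
    
    # Offset of the oldest observation used (I_{T-2h} in the notation above)
    T = len(time_series) - 2*h - 2
    
    # x(t) returns time_series[t+h : t+2h+1] newest first, as a column;
    # x(T + 1 - k) plays the role of \mathcal{I}_{T+1-k} above
    x = lambda t: list2col(time_series[(t + h):(t + 2*h + 1)])
    
    # Build A = (\mathcal{I}_T, \mathcal{I}_{T-1}, ..., \mathcal{I}_{T-h})
    A = np.hstack([x(T - s) for s in range(h + 1)])
    
    # Cost functional J(R) = ||\mathcal{I}_{T+1} - A R||^2
    def of(R):
        aux = np.matmul(A, R.reshape((R.size, 1)))
        return np.power(norm(x(T+1) - aux), 2)
    
    # R_j >= 1e-4 is a numerical stand-in for the strict constraint R_j > 0;
    # start at the lower bound so the initial guess is feasible
    lower = 0.0001
    init = np.full(h + 1, lower)
    bnds = Bounds(np.full(h + 1, lower), np.full(h + 1, np.inf))
    # L-BFGS-B is what minimize uses by default when bounds are given
    sol = minimize(of, init, method='L-BFGS-B', bounds=bnds)
    if show:
        print(sol)
    
    # R_0 estimate: sum of the fitted coefficients
    return sol.x.sum()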

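As a cross-check, the same bound-constrained problem can be handed to `scipy.optimize.lsq_linear`, which solves linear least squares with bounds directly instead of minimizing $J$ with a general-purpose optimizer and finite-difference gradients. The function `estimate_r_lsq` below is a hypothetical sketch that reuses the window convention of `estimate_r_opt` (only the last $2h+2$ values are read). It is a swapped-in solver, not the notebook's method, and on the real data its result may differ slightly from `estimate_r_opt`.

```python
import numpy as np
from scipy.optimize import lsq_linear


def estimate_r_lsq(time_series, h, lower=1e-4):
    """Hypothetical alternative to estimate_r_opt based on lsq_linear."""
    # Same data requirement as estimate_r_opt: at least 2*h + 2 values
    if 2 * h >= len(time_series) - 1:
        return None
    ts = np.asarray(time_series, dtype=float)
    T = ts.size - 2 * h - 2
    # x(t) reads ts[t+h : t+2h+1], newest value first (as in estimate_r_opt)
    x = lambda t: ts[t + h:t + 2 * h + 1][::-1]
    A = np.column_stack([x(T - s) for s in range(h + 1)])
    b = x(T + 1)
    # Bounds R_j >= lower stand in for the strict constraint R_j > 0
    res = lsq_linear(A, b, bounds=(lower, np.inf), method="bvls")
    return res.x.sum()


# Two unrelated leading values, then data that follow the AR model exactly
# with hypothetical coefficients R = (0.5, 0.3, 0.2), so R_0 = 1.0
series = [100.0, 50.0, 1.0, 3.0, 2.0, 2.1, 2.25, 2.155]
print(estimate_r_lsq(series, h=2))
```

The two leading values do not affect the result, since only the last $2h+2 = 6$ entries enter $A$ and $\mathcal{I}_{T+1}$.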
In [4]:
# Number of most recent observations taken from each regional series.
# estimate_r_opt only uses the last 2*h + 2 of them (indices -2*h-2 to -1),
# so m must be at least 2*h + 2.
m = 100

# Time window
h = 14

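To see which observations actually enter the estimate, the index arithmetic of `estimate_r_opt` can be reproduced on its own. The function `used_window` is a hypothetical helper that mirrors the slicing `time_series[t+h : t+2h+1]` for $t=T-h,\ldots,T+1$; with $m=100$ and $h=14$ it indicates that only the last 30 values of each sliced series are used.

```python
def used_window(n, h):
    """(first, last) positions read by estimate_r_opt from a length-n series.

    Hypothetical helper mirroring estimate_r_opt's index arithmetic.
    """
    T = n - 2 * h - 2
    first = (T - h) + h       # start of x(T - h), the oldest column of A
    last = (T + 1) + 2 * h    # last position of x(T + 1), the target vector
    return first, last


m, h = 100, 14
first, last = used_window(m, h)
print(first, last)          # → 70 99 (positions within the sliced series)
print(first - m, last - m)  # → -30 -1 (same positions as negative offsets)
```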
In [5]:
# Colombia
I_Colombia = json_data['co'][var][-m:]
print(estimate_r_opt(I_Colombia, h))

1.0553669492088702


In [6]:
# Bogotá
I_Bogota = json_data['co_11'][var][-m:]
print(estimate_r_opt(I_Bogota, h))

1.0296651668867103


In [7]:
# Medellín
I_Medellin = json_data['co_05001'][var][-m:]
print(estimate_r_opt(I_Medellin, h))

1.100901415438881
