# Efficient Estimation using 1 simulation

1. This notebook shows how to **estimate** a simple model using Simulated Minimum Distance (SMD)
2. It illustrates how an **efficient** estimator can be constructed using only one simulation per observation, following the idea proposed by [Kirill Evdokimov](https://www.mit.edu/~kevdokim/ESMSM_sep16.pdf "Efficient Estimation with a Finite Number of Simulation Draws per Observation")

## Recap: Simulated Minimum Distance

**Data:** We assume that we have data available for $N$ households over $T$ periods, collected in $\{w_i\}_{i=1}^N$.

**Goal:** We wish to estimate the true, unknown parameter vector $\theta_0$. We assume that the model is correctly specified, in the sense that the observed data are generated by the model at $\theta_0$.

The **Simulated Minimum Distance (SMD)** estimator is

$$
\hat{\theta} = \arg\min_{\theta} g(\theta)'Wg(\theta)
$$

where $W$ is a $J\times J$ positive semidefinite **weighting matrix** and

$$
g(\theta)=\Lambda_{data}-\Lambda_{sim}(\theta)
$$

is the difference between the moments calculated in the data and in the simulated data. Concretely,

$$
\Lambda_{data} = \frac{1}{N}\sum_{i=1}^N m(\theta_0|w_i) \\
\Lambda_{sim}(\theta) = \frac{1}{N_{sim}}\sum_{s=1}^{N_{sim}} m(\theta|w_s) 
$$

are $J\times1$ vectors of moments calculated in the data and the simulated data, respectively. 
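To make the criterion concrete, here is a minimal sketch of the SMD objective $g(\theta)'Wg(\theta)$ for the one-moment model $Y_i=\theta_0+\varepsilon_i$ studied below. The helpers `smd_objective` and `sim_mean` are hypothetical (they are not part of `SimulatedMinimumDistanceClass`), and the sketch assumes the simulation draws are held fixed while $\theta$ varies, so the criterion is a smooth function of $\theta$. A crude grid search stands in for a proper optimizer.

```python
import numpy as np


def smd_objective(theta, data_moments, sim_moment_func, draws, W):
    """Quadratic SMD criterion g(theta)' W g(theta), with g = data moments - simulated moments."""
    g = data_moments - sim_moment_func(theta, draws)
    return float(g @ W @ g)


def sim_mean(theta, eps):
    """Simulated moment for Y = theta + eps: the mean of the simulated Y's."""
    return np.array([np.mean(theta + eps)])


rng = np.random.default_rng(0)
N, theta_true = 10_000, 0.2
Y = theta_true + rng.standard_normal(N)   # "observed" data
draws = rng.standard_normal(N)            # simulation draws, held fixed across theta (S = 1)

data_moments = np.array([Y.mean()])
W = np.eye(1)

# crude grid search over [0, 1]; any numerical optimizer could be used instead
grid = np.linspace(0.0, 1.0, 1001)
values = np.array([smd_objective(t, data_moments, sim_mean, draws, W) for t in grid])
theta_hat = grid[values.argmin()]
```

The grid minimizer is the grid point closest to $\overline{Y}-\overline{\varepsilon}$, the exact root of $g(\theta)=0$ in this example.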

**Variance of the estimator:** Recall that the variance of the estimator is
$$
\begin{align}
\text{Var}(\hat{\theta})&=(1+S^{-1})\Gamma\Omega\Gamma'/N \\
\Gamma &= -(G'WG)^{-1}G'W \\
\Omega & = \text{Var}(m(\theta_0|w_i))
\end{align}
$$
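where $S=N_{sim}/N$ is the number of simulations per observation and $G=\partial g(\theta_0)/\partial\theta'$ is the Jacobian of the moments. We implicitly used that $\text{Var}(m(\theta_0|w_i))=\text{Var}(m(\theta_0|w_s))$ and $\text{Cov}(m(\theta_0|w_i),m(\theta_0|w_s))=0$, i.e., the simulation draws are independent of the data.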
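The sandwich formula can be evaluated directly. The sketch below uses a hypothetical helper, `smd_variance`, assuming $G$ is passed as a $J\times K$ array and $W$, $\Omega$ as $J\times J$ arrays. Applied to the mean-only example studied below ($G=-1$, $\Omega=\text{Var}(Y_i)=1$, $S=1$, and $N=1$ so that the result is the scaled variance), it returns $2$.

```python
import numpy as np


def smd_variance(G, W, Omega, S, N):
    """Var(theta_hat) = (1 + 1/S) * Gamma @ Omega @ Gamma.T / N with Gamma = -(G'WG)^{-1} G'W."""
    G, W, Omega = (np.atleast_2d(np.asarray(a, dtype=float)) for a in (G, W, Omega))
    Gamma = -np.linalg.solve(G.T @ W @ G, G.T @ W)
    return (1.0 + 1.0 / S) * Gamma @ Omega @ Gamma.T / N


# Mean-only example: one moment, one parameter, G = -1, Omega = Var(Y_i) = 1, S = 1
V = smd_variance(G=[[-1.0]], W=[[1.0]], Omega=[[1.0]], S=1, N=1)
print(V)  # [[2.]]
```

Letting $S$ grow large in the same call drives the result toward $1$, the variance without simulation noise.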

**Efficient Estimator:** Using the "optimal" weighting matrix, $W=\Omega^{-1}$, gives the *lowest variance* for a given number of simulations, $S$, as
$$
\begin{align}
\text{Var}(\hat{\theta})&=(1+S^{-1})(G'\Omega^{-1}G)^{-1}/N 
\end{align}
$$
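A quick numerical check of the efficiency claim, as a sketch with arbitrary randomly drawn $G$ and $\Omega$ (not taken from any model in this notebook): at $W=\Omega^{-1}$ the sandwich formula collapses to $(1+S^{-1})(G'\Omega^{-1}G)^{-1}/N$, and for another positive definite $W$ the difference in variances is positive semidefinite.

```python
import numpy as np


def smd_variance(G, W, Omega, S, N):
    """Sandwich variance (1 + 1/S) * Gamma Omega Gamma' / N, Gamma = -(G'WG)^{-1} G'W."""
    Gamma = -np.linalg.solve(G.T @ W @ G, G.T @ W)
    return (1.0 + 1.0 / S) * Gamma @ Omega @ Gamma.T / N


rng = np.random.default_rng(1)
J, K, S, N = 3, 2, 1, 1_000                 # 3 moments, 2 parameters (arbitrary sizes)
G = rng.standard_normal((J, K))             # an arbitrary full-rank Jacobian
A = rng.standard_normal((J, J))
Omega = A @ A.T + np.eye(J)                 # an arbitrary positive definite moment covariance

# Sandwich formula at W = Omega^{-1} versus the closed-form efficient variance
V_eff = smd_variance(G, np.linalg.inv(Omega), Omega, S, N)
V_closed = (1.0 + 1.0 / S) * np.linalg.inv(G.T @ np.linalg.inv(Omega) @ G) / N

# Another positive definite W: V_other - V_eff should be positive semidefinite
B = rng.standard_normal((J, J))
W_other = B @ B.T + np.eye(J)
V_other = smd_variance(G, W_other, Omega, S, N)
gap = np.linalg.eigvalsh(V_other - V_eff).min()
```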

> **Observation:** Only as $S\rightarrow\infty$ does the minimum variance of the SMD estimator approach $(G'\Omega^{-1}G)^{-1}/N$, the minimum variance of the efficient GMM estimator based on the exact (non-simulated) moments.

> **Solution:** [Kirill Evdokimov](https://www.mit.edu/~kevdokim/ESMSM_sep16.pdf "Efficient Estimation with a Finite Number of Simulation Draws per Observation") shows how augmenting the moment conditions with restrictions implied by the known distribution of the simulation draws essentially removes the factor $(1+S^{-1})$ from the asymptotic variance of the SMD estimator, using only one(!) simulation, $S=1$!

# Model and Estimators
We use the same example as Kirill Evdokimov. Consider the simple data-generating process (DGP):
$$
\begin{align}
Y_i &= \theta_0 + \varepsilon_i \\
\varepsilon_i &\sim N(0,1) 
\end{align}
$$
**SMD:** With only $S=1$ simulation of $\varepsilon$ per individual, we can use the moment function
$$
g_i(\theta|w_i) = Y_i - \theta -\varepsilon_i
$$
to estimate $\theta$. We call the resulting estimator $\hat{\theta}_{SMD}$. The (one-dimensional) moment vector is
$$
g(\theta) = 
\bigg( \begin{array}{c}
\overline{Y} - \theta -\overline{\varepsilon} \\
\end{array} \bigg)
$$
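where $\overline{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$ and $\overline{\varepsilon} = \frac{1}{N}\sum_{i=1}^{N} \varepsilon_i$. Here, $\varepsilon_i$ denotes the simulated draw for individual $i$, which is independent of the shock in the observed $Y_i$.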
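Because there is one moment and one parameter, the minimizer simply sets $g(\theta)=0$, so $\hat{\theta}_{SMD}=\overline{Y}-\overline{\varepsilon}$ and the simulation noise $\overline{\varepsilon}$ enters the estimate one-for-one. A minimal sketch of this closed form, with an arbitrary seed and sample size:

```python
import numpy as np

rng = np.random.default_rng(2050)
N, theta0 = 100_000, 0.2
Y = theta0 + rng.standard_normal(N)    # observed data from the DGP
eps_sim = rng.standard_normal(N)       # S = 1 simulated draw per individual, independent of Y

# g(theta) = mean(Y) - theta - mean(eps_sim) = 0  =>  closed-form SMD estimate
theta_smd = Y.mean() - eps_sim.mean()
print(theta_smd)
```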

**ES-SMD:** The efficient SMD estimator augments the moment conditions with the fact that the simulated $\varepsilon$'s have mean zero, which gives the augmented moment vector
$$
g_{aug}(\theta) = 
\bigg( \begin{array}{c}
\overline{Y} - \theta -\overline{\varepsilon} \\
0-\overline{\varepsilon} \\
\end{array} \bigg)
$$
We use the optimal weighting matrix $W=\Omega^{-1}$, where
$$
\Omega = Var(g_{i,aug}(\theta|w_i)) =
\bigg( \begin{array}{cc}
2 & 1\\
1 & 1 \\
\end{array} \bigg)
$$
and
$$
\Omega^{-1} = \bigg( \begin{array}{cc}
1 & -1\\
-1 & 2 \\
\end{array} \bigg)
$$

We will call this estimator $\hat{\theta}_{ES-SMD}$.
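Both ingredients can be checked in a few lines. The sketch below (arbitrary seed and sample size) first estimates $\Omega$ from the individual augmented moments and then minimizes the quadratic criterion in closed form: since $g_{aug}(\theta)=c+G\theta$ with $G=(-1,\,0)'$, the minimizer is $\hat{\theta}=-(G'WG)^{-1}G'Wc$. With $W=\Omega^{-1}$ this reduces to $\hat{\theta}_{ES-SMD}=\overline{Y}$: the simulation noise $\overline{\varepsilon}$ cancels.

```python
import numpy as np

rng = np.random.default_rng(2050)
N, theta0 = 1_000_000, 0.2
Y = theta0 + rng.standard_normal(N)    # data
eps_sim = rng.standard_normal(N)       # S = 1 simulated draw per individual

# (i) Omega: sample covariance of the individual augmented moments at theta0
g_aug_i = np.column_stack([Y - theta0 - eps_sim, 0.0 - eps_sim])
Omega_hat = np.cov(g_aug_i, rowvar=False)   # close to [[2, 1], [1, 1]]

Omega = np.array([[2.0, 1.0], [1.0, 1.0]])
W = np.linalg.inv(Omega)                    # [[1, -1], [-1, 2]]

# (ii) g_aug(theta) = c + G * theta is linear in theta, so the minimizer of
# g_aug' W g_aug is theta_hat = -(G'WG)^{-1} G'W c
c = np.array([Y.mean() - eps_sim.mean(), 0.0 - eps_sim.mean()])
G = np.array([[-1.0], [0.0]])
theta_es_smd = -np.linalg.solve(G.T @ W @ G, G.T @ W @ c).item()
theta_smd = c[0]                            # the plain SMD estimate, mean(Y) - mean(eps)
```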

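**Asymptotic Variances** (of $\sqrt{N}(\hat{\theta}-\theta_0)$):
1. For the standard SMD estimator, the weighting matrix does not matter because there is one moment and one parameter (exact identification), and we have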
$$
\begin{align}
AVar(\hat{\theta}_{SMD}) &= Var(g_i(\theta|w_i)) \\
 &= Var(Y_i - \theta -\varepsilon_i)\\
 &= Var(Y_i) +Var(\varepsilon_i) \\
 &= 2
\end{align}
$$
2. For the augmented ES-SMD estimator, we have
$$
\begin{align}
AVar(\hat{\theta}_{ES-SMD}) &= Var((G'WG)^{-1}G'Wg_{i,aug}(\theta|w_i)) \\
 &= Var(-Y_i + \theta)\\
 &= 1
\end{align}
$$
because, with $G=(-1,\,0)'$ and $W=\Omega^{-1}$,
$$
(G'WG)^{-1}G'Wg_{i,aug}(\theta|w_i) = -(Y_i - \theta -\varepsilon_i) - \varepsilon_i.
$$
3. The asymptotic variance of the ES-SMD estimator is thus half that of the SMD estimator: the factor $(1+S^{-1})=2$ has been removed.
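The influence-function argument in points 1 and 2 can also be checked by simulation. In this sketch (arbitrary seed, large $N$), the individual ES-SMD contribution $(G'WG)^{-1}G'Wg_{i,aug}$ coincides with $-(Y_i-\theta_0)$, and the sample variances of the SMD and ES-SMD contributions are close to 2 and 1, respectively.

```python
import numpy as np

rng = np.random.default_rng(7)
N, theta0 = 1_000_000, 0.2
Y = theta0 + rng.standard_normal(N)    # data
eps_sim = rng.standard_normal(N)       # S = 1 simulated draw per individual

# Individual moment contributions evaluated at theta0
g_i = Y - theta0 - eps_sim                           # SMD moment
g_i_aug = np.column_stack([g_i, 0.0 - eps_sim])      # augmented ES-SMD moments

G = np.array([[-1.0], [0.0]])
W = np.linalg.inv(np.array([[2.0, 1.0], [1.0, 1.0]]))
A = np.linalg.solve(G.T @ W @ G, G.T @ W).ravel()    # (G'WG)^{-1} G'W, approximately [-1, 1]

infl_smd = -g_i                                      # G = -1, and W is irrelevant for SMD
infl_es_smd = g_i_aug @ A                            # equals -(Y_i - theta0)
print(np.var(infl_smd), np.var(infl_es_smd))
```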

We now illustrate this result with a **Monte Carlo experiment**.

# Setup

```python
%load_ext autoreload
%autoreload 2

import numpy as np
from types import SimpleNamespace

import sys
sys.path.append('../')

from SimulatedMinimumDistance import SimulatedMinimumDistanceClass
```

# Model construction

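```python
class ModelClass():
    
    def __init__(self,**kwargs):
        
        self.par = SimpleNamespace() # parameters
        self.sim = SimpleNamespace() # simulated data
        
        # default parameters
        self.par.theta = 0.5 # mean of Y
        self.par.simN = 5000 # number of simulated observations
        
        # overwrite defaults with keyword arguments
        for key,val in kwargs.items():
            setattr(self.par,key,val)

    def solve(self,do_print=False): pass # nothing to solve in this model
    
    def simulate(self,seed=None,do_print=False):
        
        if seed is not None:
            np.random.seed(seed)
        self.sim.e = np.random.normal(size=self.par.simN) # shocks, eps ~ N(0,1)
        
        self.sim.Y = self.par.theta + self.sim.e # Y = theta + eps
```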

# Estimation choices

```python
# a. model settings
N = 100_000
N_sim = N # S = N_sim/N = 1 simulation per observation

par = {'theta':0.2,'simN':N_sim}

par_true = par.copy()
par_true['simN'] = N

# b. parameters to estimate
est_par = {
    'theta': {'guess':0.5,'lower':0.0,'upper':1.0,},
}


# c. moment function used in estimation
def mom_func(data,ids=None):
    """ returns the average Y """
    
    if ids is None:
        mean_Y = np.mean(data.Y)
    else:
        mean_Y = np.mean(data.Y[ids])

    return np.array([mean_Y]) # a single moment

# d. augmented moment function used in efficient estimation
def mom_func_aug(data,ids=None):
    """ returns the average Y and the average of the simulations"""
    
    if ids is None:
        mean_Y_e = np.mean([data.Y,data.e],axis=1)
    else:
        mean_Y_e = np.mean([data.Y[ids],data.e[ids]],axis=1) # index the arrays, not the namespace

    return mean_Y_e
```

# Monte Carlo Estimation results

```python
num_boot = 1_000

theta_est = np.empty(num_boot)
theta_est_aug = theta_est.copy()

model = ModelClass(**par)

for b in range(num_boot):
    
    # a. setup model to simulate data
    true = ModelClass(**par_true)
    true.simulate(seed=2050+b) # this seed is different from the default

    # b. data moments
    datamoms = mom_func(true.sim)
    datamoms_aug = np.array([datamoms[0],0.0]) # the simulated shocks have mean zero
    
    # c. setup estimators
    smd = SimulatedMinimumDistanceClass(est_par,mom_func,datamoms=datamoms)
    smd_aug = SimulatedMinimumDistanceClass(est_par,mom_func_aug,datamoms=datamoms_aug)
    
    # d. weighting matrix
    W = np.ones((datamoms.size,datamoms.size)) # does not matter here
    Omega = np.array([[2.0,1.0],[1.0,1.0]]) # covariance matrix of augmented moments
    W_aug = np.linalg.inv(Omega)
    
    # e. estimate the model (can take several minutes)
    est = smd.estimate(model,W,do_print_initial=False)
    est_aug = smd_aug.estimate(model,W_aug,do_print_initial=False)
    
    # f. store the estimates
    theta_est[b] = est['theta']
    theta_est_aug[b] = est_aug['theta']
```

```python
print(f'Variance, SMD:    {np.var(theta_est-par_true["theta"])*N:2.6f}')
print(f'Variance, ES-SMD: {np.var(theta_est_aug-par_true["theta"])*N:2.6f}')
```

```
Variance, SMD:    2.036992
Variance, ES-SMD: 1.096781
```
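The Monte Carlo loop relies on the local `SimulatedMinimumDistance` module and a numerical optimizer. As a self-contained cross-check, here is a minimal sketch that instead uses the closed-form minimizers of the two criteria (both are quadratic in $\theta$ in this example). `monte_carlo` is a hypothetical helper, and the smaller $N$ and the seed are arbitrary choices, so the numbers will differ somewhat from the output of the notebook, but the scaled variances should again be close to 2 (SMD) and 1 (ES-SMD).

```python
import numpy as np


def monte_carlo(num_rep=1_000, N=10_000, theta0=0.2, seed=2050):
    """Return N*Var(theta_hat) across replications for SMD and ES-SMD with S = 1.

    Hypothetical helper: both criteria are quadratic in theta here, so their
    minimizers are computed in closed form instead of with a numerical optimizer.
    """
    rng = np.random.default_rng(seed)
    G = np.array([[-1.0], [0.0]])                               # d g_aug / d theta
    W_aug = np.linalg.inv(np.array([[2.0, 1.0], [1.0, 1.0]]))
    A = np.linalg.solve(G.T @ W_aug @ G, G.T @ W_aug).ravel()   # (G'WG)^{-1} G'W

    est_smd = np.empty(num_rep)
    est_es_smd = np.empty(num_rep)
    for r in range(num_rep):
        Y = theta0 + rng.standard_normal(N)                     # data
        eps_sim = rng.standard_normal(N)                        # one simulated draw per observation
        c = np.array([Y.mean() - eps_sim.mean(), -eps_sim.mean()])  # g_aug evaluated at theta = 0
        est_smd[r] = c[0]                                       # solves mean(Y) - theta - mean(eps) = 0
        est_es_smd[r] = -A @ c                                  # minimizes g_aug' W_aug g_aug
    return N * est_smd.var(), N * est_es_smd.var()


var_smd, var_es_smd = monte_carlo()
print(f'Variance, SMD:    {var_smd:2.6f}')
print(f'Variance, ES-SMD: {var_es_smd:2.6f}')
```

Because the closed forms replace the optimizer, this sketch runs in a second or so and isolates the statistical effect from any numerical tolerance in the estimation routine.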
