# Monte Carlo and Bootstrap: Standard Error Comparison

In this section, we compare plots for independent and clustered observations, first without bootstrapping (Monte Carlo simulation) and then with bootstrapping.

## Monte Carlo

Using Monte Carlo simulation, we show the effects of clustering and stratifying data on the standard errors of a parameter vector. The plots show the sampling distribution of the main parameter, $\beta_1$, after Monte Carlo simulation.

For context, the estimation uses a logit model. The parameter $\beta_1$ is the coefficient on "**average_hours_spent_studying_daily**", and the outcome is whether the student passes the class (=1) or not (=0) at the end of the semester.
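For reference, the model estimated in each replication can be written as a standard logit. Here $x_i$ (our notation) denotes student $i$'s average_hours_spent_studying_daily and $\Lambda$ is the logistic CDF:

$$
\Pr(\text{pass}_i = 1 \mid x_i) = \Lambda(\beta_0 + \beta_1 x_i) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i)}}
$$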

In each Monte Carlo replication, we draw a sample of 200 observations from a population of 2000 students, and we run 500 replications in total. Each replication returns a parameter vector, here of length 2 (the coefficient on average_hours_spent_studying_daily plus the intercept), which is stored in a NumPy array.
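The replication loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code that produced the plots: the population is simulated (hypothetical uniform study hours and assumed true coefficients $\beta_0 = -2$, $\beta_1 = 0.8$), each sample is drawn without replacement (the text does not specify), and the logit is fitted with a hand-written Newton-Raphson routine (`fit_logit`, a hypothetical helper) so that only NumPy is needed.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Logit maximum likelihood via Newton-Raphson; X already contains an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities
        grad = X.T @ (y - p)                        # score vector
        info = X.T @ (X * (p * (1 - p))[:, None])   # information matrix X'WX
        beta = beta + np.linalg.solve(info, grad)   # Newton step
    return beta

def monte_carlo(n_pop=2000, n_sample=200, n_reps=500, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical population: study hours and a pass indicator from an assumed logit
    hours = rng.uniform(0, 6, size=n_pop)
    true_beta = np.array([-2.0, 0.8])               # assumed (intercept, beta_1)
    X_pop = np.column_stack([np.ones(n_pop), hours])
    p_pop = 1.0 / (1.0 + np.exp(-X_pop @ true_beta))
    y_pop = rng.binomial(1, p_pop)

    estimates = np.empty((n_reps, 2))               # one parameter vector per replication
    for r in range(n_reps):
        idx = rng.choice(n_pop, size=n_sample, replace=False)  # simple random sample
        estimates[r] = fit_logit(X_pop[idx], y_pop[idx])
    return estimates

est = monte_carlo()
print("mean of beta_1 estimates:", est[:, 1].mean())
print("SD of beta_1 estimates:  ", est[:, 1].std(ddof=1))
```

The standard deviation of the 500 estimates of $\beta_1$ approximates the standard error of $\hat\beta_1$, which is what the width of each histogram depicts.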

(1) The first plot reports the sampling distribution of $\beta_1$ for independent observations. 

(2) The second plot reports the sampling distribution of $\beta_1$ for clustered observations.

(3) The third plot is for stratified data. Stratification is done using information on income, sex and age.

Explanations of clustered observations and stratified data can be found [here]().

(4) The fourth plot superimposes the three distributions to compare their standard errors.

Two additional plots compare the Monte Carlo simulation with independent observations against the clustered and the stratified simulations, respectively.

```python
from IPython.display import HTML, display

# Independent (left) and clustered (right) sampling distributions side by side
display(
    HTML(
        "<table><tr><td><img src='ind_plot.png'></td>"
        "<td><img src='clu_plot.png'></td></tr></table>"
    )
)
```

```python
# Stratified (left) and all three distributions superimposed (right)
display(
    HTML(
        "<table><tr><td><img src='str_plot.png'></td>"
        "<td><img src='together.png'></td></tr></table>"
    )
)
```

```python
# Independent vs. clustered (left) and independent vs. stratified (right)
display(
    HTML(
        "<table><tr><td><img src='ind_clu_plot.png'></td>"
        "<td><img src='ind_str.png'></td></tr></table>"
    )
)
```

The sampling distributions look fairly similar, but the expected differences are still apparent. With clustered data, the distribution should look "flatter" and have "fatter" tails, because observations within a cluster are correlated and therefore carry less independent information. The effect is stronger with fewer clusters, since each cluster, rather than each observation, behaves as an independent unit. Here, there are 100 clusters for 200 data points. As the number of clusters approaches the number of observations, the sampling distribution of the parameters becomes more similar to the one obtained with independent observations. Likewise, the fewer clusters you have, the "flatter" you can expect your sampling distribution to be.
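To see why fewer clusters flatten the distribution, the sketch below simulates a simplified case: the sample mean (not the logit coefficient) of 200 observations split into equal-sized clusters that share a random cluster effect. The intra-cluster correlation `rho=0.3` and the helper `sd_of_mean` are assumptions for illustration. For equal cluster size $m$, the variance of the mean is inflated by the design effect $1 + (m-1)\rho$ relative to independent sampling.

```python
import numpy as np

def sd_of_mean(n_clusters, cluster_size, rho, n_reps=5000, seed=1):
    """Monte Carlo SD of the sample mean when observations share a cluster effect.

    Each observation is u_g + e_ig with Var(u_g) = rho and Var(e_ig) = 1 - rho,
    so the total variance is 1 and the intra-cluster correlation is rho.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, np.sqrt(rho), size=(n_reps, n_clusters, 1))       # cluster effects
    e = rng.normal(0.0, np.sqrt(1 - rho), size=(n_reps, n_clusters, cluster_size))
    means = (u + e).mean(axis=(1, 2))                                      # one mean per replication
    return means.std(ddof=1)

n = 200
for g in (200, 100, 20, 10):  # number of clusters; n = 200 observations in total
    print(g, "clusters:", round(sd_of_mean(g, n // g, rho=0.3), 4))
```

With 200 clusters (one observation each) the result matches independent sampling; with 10 clusters of 20, the standard deviation should be roughly $\sqrt{1 + 19 \times 0.3} \approx 2.6$ times larger.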

Stratification, on the other hand, is a variance reduction method for Monte Carlo simulation. The last plot shows that the sampling distributions for stratified data and for independent observations both look approximately normal. To draw 200 observations from the stratified population, we extract the same percentage from each stratum and concatenate the draws into a single sample of 200 observations. In the Monte Carlo simulation, this leads to lower variance, apparent from the centered, smooth shape of the stratified plot. We used age, sex, and income to define the strata.
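Proportional stratified sampling, as described above, can be sketched like this. The population, its four strata (standing in for the income, sex, and age cells), and the helper `proportional_stratified_sample` are hypothetical, and the example compares the spread of a simple sample mean rather than of $\beta_1$.

```python
import numpy as np

def proportional_stratified_sample(strata, n, rng):
    """Return indices of a sample that takes the same fraction n / N from every stratum."""
    labels, counts = np.unique(strata, return_counts=True)
    frac = n / len(strata)
    idx = []
    for lab, cnt in zip(labels, counts):
        members = np.flatnonzero(strata == lab)
        k = int(round(cnt * frac))                  # proportional allocation
        idx.append(rng.choice(members, size=k, replace=False))
    return np.concatenate(idx)                      # concatenate the per-stratum draws

rng = np.random.default_rng(2)
# Hypothetical population of 2000 units in 4 strata with different outcome means
strata = np.repeat([0, 1, 2, 3], [800, 600, 400, 200])
y = np.array([1.0, 2.0, 3.0, 4.0])[strata] + rng.normal(0, 0.5, size=strata.size)

srs_means, strat_means = [], []
for _ in range(2000):
    srs = rng.choice(y.size, size=200, replace=False)       # simple random sample
    strat = proportional_stratified_sample(strata, 200, rng)
    srs_means.append(y[srs].mean())
    strat_means.append(y[strat].mean())

print("SD of mean, simple random sample:", np.std(srs_means, ddof=1))
print("SD of mean, stratified sample:   ", np.std(strat_means, ddof=1))
```

The gain comes from removing between-strata variation from the sampling error, so it is largest when strata differ a lot in the outcome. When `cnt * frac` is not a whole number, rounding can make the total sample size differ slightly from `n`.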

## Bootstrapping

For an introduction to bootstrapping, see [here](https://www.ssc.wisc.edu/~bhansen/econometrics/), chapter 10 [1]. In short, the bootstrap is a resampling method that treats the observed data as the population. It draws new samples of the same size from the data with replacement, so a given observation may appear more than once in a bootstrap sample (and others not at all). For each bootstrap sample, we re-estimate the parameters. Bootstrapping can approximate the sampling distribution of virtually any statistic, such as means, variances, or regression coefficients, and the spread of that distribution gives the statistic's standard error. After 2000 draws, we have 2000 estimates, from which the distribution is constructed. Here, we plot the parameter estimates from each bootstrap draw.
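A minimal nonparametric (uniform) bootstrap might look like the following, assuming NumPy and a hypothetical simulated sample. `bootstrap` is an illustrative helper, and the statistic here is the mean rather than the logit coefficients, although any function returning a number or a NumPy vector can be passed in.

```python
import numpy as np

def bootstrap(data, statistic, n_boot=2000, seed=0):
    """Resample rows of `data` with replacement and recompute `statistic` each time."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # n indices drawn with replacement per bootstrap sample
    return np.array([statistic(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)])

rng = np.random.default_rng(42)
x = rng.normal(loc=3.0, scale=1.5, size=400)   # hypothetical sample
draws = bootstrap(x, np.mean)
print("bootstrap SE of the mean:", draws.std(ddof=1))
print("analytic SE, s/sqrt(n):  ", x.std(ddof=1) / np.sqrt(len(x)))
```

For the sample mean, the bootstrap standard error should be close to the textbook $s/\sqrt{n}$, which is a quick sanity check before applying the same loop to a logit fit.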

```python
# Side-by-side bootstrap plots, with a caption row above each panel
display(
    HTML(
        "<table>"
        "<tr><th>1000 observations, 10 clusters</th><th>400 observations, 10 clusters</th></tr>"
        "<tr><td><img src='1000n_10clusters_bootstrap.png'></td>"
        "<td><img src='400n_10clusters_bootstrap.png'></td></tr>"
        "</table>"
    )
)
```



For the bootstrapping, we create two plots: 

(1) the first plot uses a population of $1000$ observations in $10$ clusters;

(2) the second plot uses $400$ observations in $10$ clusters.

The first observation is that, in both plots, bootstrapping on the clustered data leads to greater dispersion ("flatter") than the uniform bootstrap. This is what we expect when observations within a cluster are positively correlated, but it is not guaranteed: if the within-cluster correlation is close to zero, the two bootstrap distributions should be similar. The second observation concerns the ratio of observations to clusters. Clusters are groups of observations that are correlated along some dimension. For example, students in one classroom may be correlated with each other in a way that students *across* classrooms are not. The left plot shows a hypothetical situation of 100 students per classroom ($\frac{1000}{10}$), and the right plot shows 40 students per classroom ($\frac{400}{10}$). The clustered bootstrap in the right plot has a variance proportionally closer to that of the uniform bootstrap because its clusters are smaller: with fewer observations per cluster, within-cluster correlation inflates the variance less. The clustered bootstrap distribution approaches the uniform bootstrap as the number of clusters approaches the number of observations (one observation per cluster). Although the gap is still large, $10$ clusters is relatively closer to $400$ observations than to $1000$.
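The comparison between the cluster and uniform bootstrap can be sketched as follows. The cluster bootstrap resamples whole clusters with replacement and keeps each cluster's observations together. The data are hypothetical: 10 classrooms with a shared classroom effect of variance 1 plus individual noise of variance 1 (an intra-cluster correlation of 0.5), the statistic is the mean rather than $\beta_1$, and `iid_bootstrap` / `cluster_bootstrap` are illustrative helpers.

```python
import numpy as np

def iid_bootstrap(y, statistic, n_boot=2000, seed=0):
    """Uniform bootstrap: resample individual observations with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    return np.array([statistic(y[rng.integers(0, n, size=n)]) for _ in range(n_boot)])

def cluster_bootstrap(y, cluster_ids, statistic, n_boot=2000, seed=0):
    """Cluster bootstrap: resample whole clusters with replacement, keeping members together."""
    rng = np.random.default_rng(seed)
    members = [np.flatnonzero(cluster_ids == c) for c in np.unique(cluster_ids)]
    g = len(members)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        picked = rng.integers(0, g, size=g)                 # g clusters, with replacement
        idx = np.concatenate([members[k] for k in picked])
        draws[b] = statistic(y[idx])
    return draws

rng = np.random.default_rng(7)
ratios = {}
for n in (1000, 400):
    g = 10
    cluster_ids = np.repeat(np.arange(g), n // g)
    # Hypothetical classroom effect (variance 1) plus individual noise (variance 1)
    y = rng.normal(0.0, 1.0, size=g)[cluster_ids] + rng.normal(0.0, 1.0, size=n)
    se_clu = cluster_bootstrap(y, cluster_ids, np.mean).std(ddof=1)
    se_iid = iid_bootstrap(y, np.mean).std(ddof=1)
    ratios[n] = se_clu / se_iid
    print(f"n={n}, {n // g} per cluster: cluster SE / uniform SE = {ratios[n]:.2f}")
```

With only 10 clusters, the ratio from a single simulated dataset is noisy, but it should tend to be larger with 100 students per classroom than with 40, in line with the design effect $1 + (m-1)\rho$ for cluster size $m$.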

## References

[1] Hansen, Bruce. *Econometrics*. University of Wisconsin. 2020.