# Simulation study


Simulation studies are computer experiments that involve creating data by pseudo-random sampling from known probability distributions. 

There are many ways to use simulation studies in statistics. Some examples are:
- To check algebra (and code), or to provide reassurance that no large error has been made, where a new statistical method has been derived mathematically.
- To assess the relevance of large-sample theory approximations (eg, considering the sampling distribution of an estimator) in finite samples.
- For the absolute evaluation of a new or existing statistical method. Often a new method is checked using simulation to ensure it works in the scenarios for which it was designed.
- **For comparative evaluation of two or more statistical methods.**
- For calculation of sample size or power when designing a study under certain assumptions.
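
As an illustration of the second use above, the following minimal sketch checks how well a large-sample approximation holds in finite samples. It assumes exponentially distributed data and a normal-approximation 95% confidence interval for the mean; the function name `ci_coverage`, the distribution, and the sample sizes are illustrative choices, not part of the original material.

```python
import numpy as np

def ci_coverage(n, n_reps=5000, true_mean=1.0, seed=0):
    """Share of replications whose normal-approximation 95% CI covers the true mean."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated dataset of size n drawn from a skewed distribution.
    samples = rng.exponential(scale=true_mean, size=(n_reps, n))
    means = samples.mean(axis=1)
    ses = samples.std(axis=1, ddof=1) / np.sqrt(n)
    lower = means - 1.96 * ses
    upper = means + 1.96 * ses
    # Proportion of intervals that contain the known true mean.
    return np.mean((lower <= true_mean) & (true_mean <= upper))

coverage_small = ci_coverage(n=10)   # small sample: approximation may be poor
coverage_large = ci_coverage(n=500)  # large sample: coverage should be near 0.95
print(f"n=10: {coverage_small:.3f}, n=500: {coverage_large:.3f}")
```

Because the true mean is known by construction, any shortfall from the nominal 95% coverage can be attributed to the approximation rather than to the data.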

<div align="center"><h2>
The plan
</h2></div>

![sim1.jpg](attachment:sim1.jpg)

## Aims

There is a distinction between simulation studies that offer a proof-of-concept, ie, showing that a method is viable (or fallible) in some settings, and those that aim to stretch or break methods, ie, identifying settings where the method may fail. Both are useful and important in statistical research. For example, one may be faced with two competing methods of analysis, both of which are equally easy to implement. Even if the choice is unlikely to materially affect the results, it may be useful to have unrealistically extreme data-generating mechanisms to understand when and how each method fails.

Alternatively, it may be of interest to compare methods where some or all methods have been shown to work in principle but the methods under scrutiny were designed to address slightly different problems. They may be put head-to-head in realistic scenarios. This could be to investigate properties when one method is correct – *How badly do others fail?* – or when all are incorrect in some way – *Which is most robust?* No method will be perfect, and it is useful to understand how methods are likely to perform in the sort of scenarios that might be expected in practice.

## DGPs

**The choice of data-generating processes (DGPs) will depend on the aims.**

For example, we might investigate a method under a simple data-generating mechanism, a realistic mechanism, or a completely unrealistic mechanism designed to stretch a method to breaking point.

Simulation studies provide us with empirical results for specific scenarios. For this reason, simulation studies will often involve more than one data-generating mechanism to ensure coverage of different scenarios. 

There is often more than one factor that will vary across specific DGPs. Factors that are frequently varied are sample size (several values) and true parameter values (for example, setting one or more parameters to be zero or nonzero). Varying these factorially is likely to be more informative than varying them one at a time away from a “base-case” data-generating mechanism, because a factorial design permits the exploration of interactions between factors.
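
The difference between the two designs can be sketched in a few lines. The factor names (`n`, `beta`, `rho`) and their levels below are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical factors and levels for a simulation study.
factors = {
    "n": [50, 200, 1000],   # sample size
    "beta": [0.0, 0.5],     # true coefficient (zero or nonzero)
    "rho": [0.0, 0.8],      # correlation between regressors
}

# Fully factorial design: every combination of levels is one scenario.
factorial = [dict(zip(factors, combo)) for combo in product(*factors.values())]

# One-at-a-time design: change a single factor away from a base case.
base = {"n": 200, "beta": 0.5, "rho": 0.0}
one_at_a_time = [base] + [
    {**base, name: level}
    for name, levels in factors.items()
    for level in levels
    if level != base[name]
]

print(len(factorial), len(one_at_a_time))
```

The one-at-a-time design never contains a scenario such as a small sample combined with highly correlated regressors, so it cannot reveal whether the two factors interact.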

### What can be varied?

- Sample size.
- Variances of the regressors.
- **Covariance/correlation between the regressors.**
- **Number of correlated regressors.**
- **Dependence in the error terms.**
- Coefficients in the time series errors.
- **Sparsity**.
- **Adding higher order terms to the model.**
- Functional form in general.
- **Distribution of the error terms.**
- Distribution of the regressors.
- Discontinuities.
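
Several of the items above can be exposed as arguments of a single data-generating function. The sketch below is one possible way to do this for a linear model; the function `simulate_linear_dgp`, its parameter names, and the specific choices (equicorrelated normal regressors, unit coefficients, t-distributed errors with 3 degrees of freedom, AR(1) errors) are assumptions made for illustration.

```python
import numpy as np

def simulate_linear_dgp(n, p=5, rho=0.0, n_nonzero=2,
                        error_dist="normal", ar_coef=0.0, seed=None):
    """Draw (X, y, beta) from a linear model with configurable features."""
    rng = np.random.default_rng(seed)

    # Correlation between regressors: unit variances, pairwise correlation rho.
    # Valid only for -1/(p-1) < rho < 1, so that the matrix is positive definite.
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)

    # Sparsity: only the first n_nonzero coefficients are nonzero.
    beta = np.zeros(p)
    beta[:n_nonzero] = 1.0

    # Distribution of the error terms: normal or heavy-tailed.
    if error_dist == "normal":
        innovations = rng.standard_normal(n)
    elif error_dist == "t":
        innovations = rng.standard_t(df=3, size=n)
    else:
        raise ValueError(f"unknown error_dist: {error_dist!r}")

    # Dependence in the error terms: AR(1) with coefficient ar_coef
    # (ar_coef=0 gives independent errors).
    errors = np.empty(n)
    errors[0] = innovations[0]
    for t in range(1, n):
        errors[t] = ar_coef * errors[t - 1] + innovations[t]

    y = X @ beta + errors
    return X, y, beta

X, y, beta = simulate_linear_dgp(n=200, rho=0.8, error_dist="t", ar_coef=0.5, seed=42)
```

Keeping every varied feature as a named argument makes it straightforward to loop over a grid of scenarios and to record exactly which DGP produced each result.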