# Plasmode Simulation

Following [Franklin et al. 2014](https://pubmed.ncbi.nlm.nih.gov/24587587/), we resampled the observed covariate and exposure data without modification in all simulated datasets, to <mark>preserve the associations</mark> among the following variables.

::: column-margin
[@franklin2014plasmode]
::: 

## Variables used

**Original demographic variables (9)**


```{verbatim}
age, sex, education, race, marital status, income, born, cycle 
```


**Original behaviour variables (5)**


```{verbatim}
smoking, diet, high cholesterol, physical activity, sleep
```


::: column-margin
Demographic, behaviour and health history / access variables are all binary or categorical variables.
:::

**Original health history / access variables (2)**


```{verbatim}
diabetes family history, medical access
```


**Transformed lab variables (6)** (complex forms)


```{verbatim}
Transformed.var.1 = log(globulin)
Transformed.var.2 = protein*calcium
Transformed.var.3 = (diastolicBP/systolicBP)^2
Transformed.var.4 = sqrt(uric acid+bilirubin)/2
Transformed.var.5 = phosphorus^2/(sodium*potassium)
Transformed.var.6 = log(systolicBP+10)
```
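The transformations listed above can be sketched in Python as follows. This is an illustration only: the function name `transform_labs`, the dictionary keys, and the example values are hypothetical, and variable 3 is read here as the squared ratio `(diastolicBP/systolicBP)^2`.

```python
import math

def transform_labs(lab):
    """Return the six transformed lab variables for one participant.

    `lab` is a hypothetical dict keyed by the original lab variable names.
    """
    return {
        "transformed_var_1": math.log(lab["globulin"]),
        "transformed_var_2": lab["protein"] * lab["calcium"],
        # Assumed reading: the ratio is squared as a whole.
        "transformed_var_3": (lab["diastolicBP"] / lab["systolicBP"]) ** 2,
        "transformed_var_4": math.sqrt(lab["uric_acid"] + lab["bilirubin"]) / 2,
        "transformed_var_5": lab["phosphorus"] ** 2 / (lab["sodium"] * lab["potassium"]),
        "transformed_var_6": math.log(lab["systolicBP"] + 10),
    }

# Made-up values for illustration; they are not taken from the source data.
example = {
    "globulin": 3.0, "protein": 7.0, "calcium": 9.0,
    "diastolicBP": 60.0, "systolicBP": 120.0,
    "uric_acid": 5.5, "bilirubin": 0.75,
    "phosphorus": 4.0, "sodium": 140.0, "potassium": 4.0,
}
transformed = transform_labs(example)
```

The point of such non-linear forms is that a model fitted on the original lab variables cannot recover them with simple main-effect terms.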


::: column-margin
Original lab variables were: uric acid, protein, bilirubin, phosphorus, sodium, potassium, globulin, calcium, systolicBP, diastolicBP.
:::

::: column-margin
Transformed var.1 + var.2 + var.3 + var.4 + var.5 + var.6 were used in the model (transformed lab variables instead of the original lab variables).
:::

**Count based prescription codes (1)** (proxies of comorbidity)


```{verbatim}
Simple count = sum of selected ICD-10 CM codes
```


::: column-margin
`Simple count` $= \sum_{s=1}^{94} R_s$ where $R_s$ are the <mark>selected recurrence covariates</mark>.
:::

::: callout-tip
# Using only partial list of recurrence covariates

We selected only those binary recurrence covariates whose relative risk (RR) with the outcome was <mark>less than 0.8 or greater than 1.2</mark>. Out of 143 recurrence covariates, 94 met this criterion. The remaining 49 covariates were not used in calculating the `Simple count` variable and can be considered noise.
:::
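The selection rule and the resulting count can be sketched as below. This is a minimal illustration under stated assumptions: the RR is taken to be the outcome risk among participants with the code divided by the risk among those without it, and the function names, code names, and toy data are all hypothetical.

```python
def relative_risk(covariate, outcome):
    """Outcome risk where covariate == 1 divided by outcome risk where covariate == 0."""
    with_code = [y for x, y in zip(covariate, outcome) if x == 1]
    without_code = [y for x, y in zip(covariate, outcome) if x == 0]
    if not with_code or not without_code:
        return None  # RR undefined when one group is empty
    risk_without = sum(without_code) / len(without_code)
    if risk_without == 0:
        return None  # avoid division by zero
    return (sum(with_code) / len(with_code)) / risk_without

def select_covariates(covariates, outcome, lower=0.8, upper=1.2):
    """Keep covariates whose RR with the outcome is below `lower` or above `upper`."""
    selected = []
    for name, values in covariates.items():
        rr = relative_risk(values, outcome)
        if rr is not None and (rr < lower or rr > upper):
            selected.append(name)
    return selected

def simple_count(covariates, selected):
    """Per-participant sum of the selected binary covariates."""
    n = len(next(iter(covariates.values())))
    return [sum(covariates[name][i] for name in selected) for i in range(n)]

# Toy data for eight participants (made up for illustration).
outcome = [1, 1, 1, 0, 0, 0, 1, 0]
covariates = {
    "code_a": [1, 1, 1, 0, 0, 0, 0, 0],  # RR = 5, kept
    "code_b": [1, 1, 0, 1, 1, 0, 0, 0],  # RR = 1, treated as noise
    "code_c": [0, 0, 0, 1, 1, 1, 0, 1],  # RR = 0, kept
}
selected = select_covariates(covariates, outcome)
counts = simple_count(covariates, selected)
```

In the source setting the same rule would be applied to the 143 recurrence covariates, leaving the 94 that enter the `Simple count`.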

## True outcome model


```{verbatim}
Diabetes (outcome) =  Obese (exposure) + 
  
                      demographic/behaviour/health history variables + 
  
                      transformed lab variables +
  
                      simple count with selected ICD-10 codes
```
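Written as an equation, the model above might look like the following. This is a sketch that assumes a logistic link (suggested by the odds-ratio target, but not stated explicitly here); the symbols $Y$ (diabetes), $A$ (obese), $X_j$ (demographic, behaviour, and health history / access variables), $T_k$ (transformed lab variables), and $C$ (simple count) are notation introduced only for this illustration.

$$
\operatorname{logit} P(Y = 1) = \beta_0 + \beta_A A + \sum_{j} \beta_j X_j + \sum_{k=1}^{6} \gamma_k T_k + \delta\, C
$$

Under this form, a true exposure odds ratio of 1 corresponds to $\beta_A = \log(1) = 0$.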


::: callout-tip
# Role of variables

The outcome model formula dictates the relationship between the outcome and the exposure and each covariate, whereas the observed associations between the exposure and the covariates are retained from the data. Some of these covariates may not be associated with the exposure, but their association with the outcome is guaranteed by the outcome model. To mimic the original data associations, we also retained the observed associations between the outcome and the covariates.
:::

::: callout-tip
# Simulation setup

-   **Size** of each cohort = 3,000
-   500 **simulations**/cohort generation
-   **Outcome** rate = 0.4
-   **Exposure** prevalence = 0.2
-   True exposure **Odds ratio** = 1 or **Risk difference** = 0

We kept the coefficients of all other covariates consistent with the original data; only the association of the 'simple count' with the outcome was amplified 5 times. By exaggerating the effect of the 'simple count' variable, we aimed to simulate a scenario in which a strong unmeasured confounder may exist.
:::
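The setup above can be sketched roughly as follows. This is not the authors' implementation: it assumes a logistic outcome model, resamples exposed and unexposed rows separately to fix the exposure prevalence, and calibrates the intercept by bisection on the source rows, so the simulated outcome rate is only approximately the target. The coefficient values, variable names (`obese`, `female`, `simple_count`), and toy source data are hypothetical.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_predictor(row, coefs, intercept):
    return intercept + sum(beta * row[name] for name, beta in coefs.items())

def calibrate_intercept(rows, coefs, target_rate, lo=-20.0, hi=20.0, iters=60):
    """Bisection: find the intercept whose mean predicted risk equals target_rate."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        mean_risk = sum(expit(linear_predictor(r, coefs, mid)) for r in rows) / len(rows)
        if mean_risk < target_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def simulate_cohort(rows, coefs, intercept, size, exposure_prev, rng):
    """Resample unmodified rows, then draw a new outcome for each from the true model."""
    exposed = [r for r in rows if r["obese"] == 1]
    unexposed = [r for r in rows if r["obese"] == 0]
    n_exposed = round(size * exposure_prev)
    sample = rng.choices(exposed, k=n_exposed) + rng.choices(unexposed, k=size - n_exposed)
    return [
        {**r, "diabetes": int(rng.random() < expit(linear_predictor(r, coefs, intercept)))}
        for r in sample
    ]

rng = random.Random(2024)
# Toy stand-in for the observed data (made up for illustration).
source = [
    {"obese": int(rng.random() < 0.3), "female": rng.randint(0, 1),
     "simple_count": rng.randint(0, 5)}
    for _ in range(2000)
]
# Hypothetical coefficients standing in for a model fitted to the observed data.
fitted = {"obese": 0.5, "female": -0.2, "simple_count": 0.1}

true_coefs = dict(fitted)
true_coefs["obese"] = math.log(1.0)   # true odds ratio = 1
true_coefs["simple_count"] *= 5       # amplify the simple count effect 5 times
intercept = calibrate_intercept(source, true_coefs, target_rate=0.4)

# The setup uses 500 simulations; 10 are generated here to keep the example quick.
cohorts = [simulate_cohort(source, true_coefs, intercept, 3000, 0.2, rng) for _ in range(10)]
```

Because covariates and exposure are copied from the source rows unchanged, their joint distribution within each exposure group is preserved; only the outcome is generated from the model.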


::: column-margin
[@pang2016effect; @karim2018can]
::: 