# Choosing the number of replications

When running a simulation, you need to decide how many replications (runs) are enough. The **confidence interval method** can be used to help guide this choice.

**Note:** The examples below use the `treat-sim` model. If you haven't run it before, see [Using the example `treat-sim` model](treatsim.ipynb) for set-up and basic usage.

## Imports

In [1]:
# pylint: disable=missing-module-docstring
from treat_sim.model import Scenario, multiple_replications
from sim_tools.output_analysis import (
    confidence_interval_method, plotly_confidence_interval_method
)

## Confidence interval method

In this method, you first **run the simulation** for a set number of replications. Because the model is stochastic, each replication produces a slightly different average for each performance metric.

Once the runs are complete, you work through the results cumulatively (the first run, then the first two runs, the first three runs, and so on), calculating at each step:

* **Cumulative mean**
* **Confidence interval** around that mean

As more replications are included, the interval typically narrows. The required number of replications is the point at which you feel the results are **stable**, i.e. running more replications is unlikely to change your conclusions in a meaningful way.

You can decide this by setting a **desired precision**. Precision here means the **percentage deviation** of the confidence interval's half-width from the mean, expressed as a proportion. For example, if the precision is set to `0.1`, the method identifies the first point where the half-width of the confidence interval is less than or equal to 10% of the mean.
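The steps above can be sketched by hand. This is a minimal illustration, not the `sim_tools` implementation: it assumes a 95% Student's t interval and starts from three replications, which is what the example output later in this page appears to use. The replication means are the first five triage-wait values from that output, and `T_CRIT_95`, `rep_means`, `rows` and `n_required` are names made up for this sketch.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Standard table values (assumption: a 95% interval is wanted);
# scipy.stats.t.ppf(0.975, df) would return the same numbers.
T_CRIT_95 = {2: 4.3027, 3: 3.1824, 4: 2.7764}

# Mean triage wait from the first five replications in the example output.
rep_means = [24.28, 57.12, 28.66, 24.80, 17.68]
desired_precision = 0.1

rows = []
for n in range(3, len(rep_means) + 1):
    sample = rep_means[:n]
    cum_mean = statistics.mean(sample)
    std_dev = statistics.stdev(sample)  # sample standard deviation (n - 1)
    half_width = T_CRIT_95[n - 1] * std_dev / math.sqrt(n)
    deviation = half_width / cum_mean  # half-width as a proportion of the mean
    rows.append(
        (n, cum_mean, cum_mean - half_width, cum_mean + half_width, deviation)
    )

for n, cum_mean, lower, upper, deviation in rows:
    print(
        f"{n}: mean={cum_mean:.2f}, "
        f"CI=({lower:.2f}, {upper:.2f}), deviation={deviation:.2f}"
    )

# First replication count that meets the desired precision (None if not reached).
n_required = next(
    (row[0] for row in rows if row[4] <= desired_precision), None
)
print(n_required)
```

With only five replications the deviation stays far above `0.1`, so `n_required` is `None` here. Reaching the target needs many more runs, which is why the example below starts from 150 replications.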

## Example: Single performance metric

The `confidence_interval_method` function returns a **tuple** consisting of:

1. The minimum number of replications to achieve the desired precision.
2. A detailed DataFrame of statistics for each cumulative number of replications.

In [2]:
scenario = Scenario()
rep_results = multiple_replications(scenario, n_reps=150)

confint_result = confidence_interval_method(
    replications=rep_results["01a_triage_wait"],
    desired_precision=0.1
)

# View results
print(confint_result[0])
confint_result[1].head()

145


| replications | Mean | Cumulative Mean | Standard Deviation | Lower Interval | Upper Interval | % deviation |
|---|---|---|---|---|---|---|
| 1 | 24.28 | 24.28 | | | | |
| 2 | 57.12 | 40.7 | | | | |
| 3 | 28.66 | 36.69 | 17.83 | -7.61 | 80.98 | 1.21 |
| 4 | 24.8 | 33.72 | 15.72 | 8.69 | 58.74 | 0.74 |
| 5 | 17.68 | 30.51 | 15.39 | 11.4 | 49.62 | 0.63 |


## Visualise results

You can plot how the confidence interval narrows as you add more runs using the `plotly_confidence_interval_method` function.

In [3]:
plotly_confidence_interval_method(
    n_reps=confint_result[0],
    conf_ints=confint_result[1],
    metric_name="01a_triage_wait"
)

## Running on multiple performance metrics

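To check several outcomes at once, pass multiple columns to `confidence_interval_method`.

This returns a dictionary keyed by metric name, where each value is a tuple in the same form as for a single metric: the minimum number of replications, followed by the DataFrame of statistics.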

In [4]:
confint_multiple = confidence_interval_method(
    replications=rep_results[
        ["01a_triage_wait", "01b_triage_util", "02a_registration_wait"]
    ],
    desired_precision=0.1
)

# View output dictionary keys
print(confint_multiple.keys())

dict_keys(['01a_triage_wait', '01b_triage_util', '02a_registration_wait'])


In [5]:
# View results from one of the metrics
print(confint_multiple["02a_registration_wait"][0])
confint_multiple["02a_registration_wait"][1].head()

9


| replications | Mean | Cumulative Mean | Standard Deviation | Lower Interval | Upper Interval | % deviation |
|---|---|---|---|---|---|---|
| 1 | 103.24 | 103.24 | | | | |
| 2 | 90.0 | 96.62 | | | | |
| 3 | 112.24 | 101.83 | 11.19 | 74.04 | 129.62 | 0.27 |
| 4 | 121.54 | 106.76 | 13.44 | 85.38 | 128.14 | 0.2 |
| 5 | 103.61 | 106.13 | 11.72 | 91.57 | 120.68 | 0.14 |
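Different metrics can need very different numbers of replications (145 for the triage wait against 9 for the registration wait in the outputs printed on this page). One possible way to settle on a single number, offered as a suggestion rather than something the library does for you, is to take the largest requirement across the metrics you care about. The sketch below uses a hypothetical stand-in dictionary, `results`, holding only the two replication counts printed on this page, with `None` in place of each DataFrame. With real output you would loop over `confint_multiple` instead.

```python
# Stand-in for the returned dictionary: metric name -> (n_reps, DataFrame).
# Only the two counts printed on this page are included; the DataFrames are
# replaced by None because they are not needed for this step.
results = {
    "01a_triage_wait": (145, None),
    "02a_registration_wait": (9, None),
}

# Pull out the minimum number of replications reported for each metric.
min_reps = {metric: n_reps for metric, (n_reps, _) in results.items()}

# The metric with the largest requirement determines the overall choice.
limiting_metric = max(min_reps, key=min_reps.get)
n_overall = min_reps[limiting_metric]

print(f"{limiting_metric} requires the most replications: {n_overall}")
```

Taking the maximum is deliberately conservative: every included metric then meets the desired precision, at the cost of more runs than the steadier metrics need on their own.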
