# Post-Simulation Analysis Walk-through

This notebook provides a detailed walk-through for conducting post-simulation analysis on the output data from an `rEpiabm` simulation. We will use the `EpiEstim` R package to calculate the time-varying reproduction number ($R_t$) and then compare this with the $R_t$ value generated internally by the `rEpiabm` simulation.

This guide assumes you have already successfully run a simulation (e.g., by following the `toypop_example.ipynb` walk-through) and have the necessary output files in your `data/toy/simulation_outputs` directory.

## Step 1: Prerequisites

### Install Required R Packages

Before we begin the analysis, you need to ensure that you have all the necessary R packages installed. These packages are used for data manipulation, statistical estimation, and plotting.

**Instructions:**

1. Open your RStudio console.
2. Run the following commands to install the required packages:

```R
install.packages(c("EpiEstim", "ggplot2", "dplyr", "pracma", "readr", "zoo"))
```

<div style="margin: 1em 0; padding: 1em; border-left: 4px solid var(--jp-info-color0); background-color: var(--jp-layout-color2); color: var(--jp-ui-font-color1);">
<strong>Note:</strong> You only need to do this once. If you have already installed these packages, you can skip this step.
</div>
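
If you rerun this walk-through often, a small check avoids reinstalling packages that are already present. The sketch below is one possible approach and is not part of the `rEpiabm` scripts; `missing_packages` is a hypothetical helper name introduced here for illustration.

```R
# Packages used by the analysis scripts in this walk-through.
required <- c("EpiEstim", "ggplot2", "dplyr", "pracma", "readr", "zoo")

# Hypothetical helper: return the subset of `pkgs` that is not installed.
# system.file() returns "" when a package cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) == 0) {
  message("All required packages are already installed.")
} else if (interactive()) {
  # Only download in an interactive session, so batch runs never contact CRAN.
  install.packages(to_install)
} else {
  message("Missing packages: ", paste(to_install, collapse = ", "))
}
```

Restricting the download to interactive sessions is a design choice: it keeps non-interactive runs (for example via `Rscript`) from failing when no CRAN mirror is configured.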

## Step 2: Calculate $R_t$ with EpiEstim

Our first analysis script, `epiestim_toy.r`, processes the raw simulation outputs to estimate the time-varying reproduction number ($R_t$). It calculates the daily incidence of new infections and the generation time distribution from the simulation data, which are the two key inputs for `EpiEstim`.

**Instruction:**

- Run the R script `epiestim_toy.r`. This will read the simulation outputs, perform the calculations, and save the results in a new sub-directory: `data/toy/simulation_outputs/epiestim`.
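
To make the two inputs concrete, here is a minimal sketch of how daily incidence and an empirical generation time distribution could be derived and passed to `EpiEstim`. It is not the code in `epiestim_toy.r`: the toy data frame and its column names (`id`, `infector_id`, `infection_day`) are hypothetical, and the choice of `method = "non_parametric_si"` is an assumption suggested by the `_np` suffix of the output files.

```R
# Hypothetical infection log: who was infected, by whom, and on which day.
infections <- data.frame(
  id            = 1:12,
  infector_id   = c(NA, 1, 1, 2, 3, 3, 4, 5, 6, 6, 8, 9),
  infection_day = c(0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
)

# Daily incidence: new infections per day, including days with zero cases.
days <- seq(min(infections$infection_day), max(infections$infection_day))
incidence <- as.integer(table(factor(infections$infection_day, levels = days)))

# Generation time: infectee's infection day minus infector's infection day.
infector_day <- infections$infection_day[
  match(infections$infector_id, infections$id)
]
gen_time <- infections$infection_day - infector_day
gen_time <- gen_time[!is.na(gen_time)]  # drop the seed case (no infector)

# Empirical distribution over lags 0, 1, 2, ... (element 1 is lag 0).
# EpiEstim requires the lag-0 probability to be 0 and the vector to sum to 1.
si_distr <- tabulate(gen_time + 1, nbins = max(gen_time) + 1) / length(gen_time)

# Estimate R_t only if EpiEstim is installed.
if (requireNamespace("EpiEstim", quietly = TRUE)) {
  res <- EpiEstim::estimate_R(
    incid  = incidence,
    method = "non_parametric_si",
    config = EpiEstim::make_config(list(si_distr = si_distr))
  )
  cols <- c("t_start", "t_end", "Mean(R)",
            "Quantile.0.025(R)", "Quantile.0.975(R)")
  print(head(res$R[, cols]))
}
```

With so few cases, `EpiEstim` will likely warn that the estimate is unreliable; the toy data is only meant to show the shape of the inputs.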

### Understanding the EpiEstim Outputs

After the script finishes, you will find several new files in the `epiestim` directory, including:
- **`Incidence_plot.png`**: A bar chart showing the number of new infections per day.
- **`Generation_plot.png`**: The generation time distribution, i.e. the distribution of the time between an infector's infection and the infections they cause.
- **`epiestim_detailed_plot_np.png`**: A plot of the estimated $R_t$ over time, with a 95% credible interval (`EpiEstim` is a Bayesian method, so its uncertainty bands are posterior intervals). This is the main output of this step.
- **`R_estimates_np.csv`**: The raw data for the $R_t$ plot.

## Step 3: Compare $R_t$ Estimates

The second script, `compare_r_numbers.r`, compares the $R_t$ we just estimated with `EpiEstim` against the $R_t$ calculated internally by the `rEpiabm` simulation.

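A key challenge is that the two tools report differently defined quantities. `EpiEstim` estimates an **instantaneous** $R_t$ (who gets infected *today*): the average number of secondary infections a person infected at time $t$ would cause if conditions stayed as they are at time $t$. `rEpiabm` tracks a **case** $R_t$ (who infected whom): the average number of secondary infections actually caused by the people in question. The script performs a mathematical conversion to make the two comparable.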
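
One commonly used relation between the two quantities weights future instantaneous values by the generation time distribution $w$: $R_c(t) = \sum_{s \ge 0} R(t+s)\,w(s)$. The sketch below implements that relation under stated assumptions; it is not necessarily the conversion used in `compare_r_numbers.r`, and the function name `instantaneous_to_case_r` is hypothetical.

```R
# Convert instantaneous R to case R by a forward-looking weighted average.
# r_inst: instantaneous R for consecutive days.
# w:      generation time probabilities, where w[1] is lag 0, w[2] is lag 1, ...
# Days whose required future values lie beyond the series are returned as NA.
instantaneous_to_case_r <- function(r_inst, w) {
  n <- length(r_inst)
  lags <- seq_along(w) - 1
  vapply(seq_len(n), function(t) {
    idx <- t + lags
    # Not enough future data to cover all lags with positive weight.
    if (max(idx[w > 0]) > n) {
      return(NA_real_)
    }
    keep <- idx <= n
    sum(r_inst[idx[keep]] * w[keep])
  }, numeric(1))
}

# Toy example: R falls from 3 to 1; generation time is 1 or 2 days.
r_inst <- c(3, 2, 1, 1)
w <- c(0, 0.5, 0.5)
print(instantaneous_to_case_r(r_inst, w))
```

Because the case reproduction number looks forward in time, the last few days of the series cannot be converted, which is why the sketch returns `NA` there rather than a truncated sum.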

**Instructions:**

1. Edit `compare_r_numbers.r` so that both directory variables point to the correct locations. They should read:
    - `epiestim_dir <- "data/toy/simulation_outputs/epiestim"`
    - `epiabm_dir <- "data/toy/simulation_outputs"`

2. Run `compare_r_numbers.r`. It generates and saves the final comparison plot.
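
Both paths are relative, so they only resolve if R's working directory is the folder that contains `data/`. A quick check along the following lines (a sketch, not part of the script) can confirm this before you run the comparison:

```R
# Values taken from the instructions above.
epiestim_dir <- "data/toy/simulation_outputs/epiestim"
epiabm_dir <- "data/toy/simulation_outputs"

# Relative paths are resolved against the current working directory.
message("Working directory: ", getwd())
for (d in c(epiestim_dir, epiabm_dir)) {
  status <- if (dir.exists(d)) "found" else "NOT found"
  message(d, ": ", status)
}
```

If either directory is reported as not found, change the working directory with `setwd()` or adjust the two paths.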

## Step 4: Evaluate the Final Plot

The final output is the `Rt_comparison_plot.png` file, saved in the `epiestim` directory.

<div style="margin: 1em 0; padding: 1em; border-left: 4px solid var(--jp-warn-color0); background-color: var(--jp-layout-color2); color: var(--jp-ui-font-color1);">
<strong>Interpreting the Plot:</strong> You should see three lines:
    <ul>
        <li><b>Epiabm case R_t (orange, dotted)</b>: The ground truth from our simulation.</li>
        <li><b>Mean Instantaneous R (EpiEstim) (blue, solid)</b>: The initial estimate from EpiEstim.</li>
        <li><b>Case R (Converted) (red, dashed)</b>: The EpiEstim value after being converted to be comparable with the simulation's value.</li>
    </ul>
</div>

Ideally, the **Epiabm case R_t** and the **Case R (Converted)** lines should align closely. A good match indicates that `EpiEstim`, a standard tool in real-world epidemiology, can recover the underlying transmission dynamics of our simulation, which in turn gives us confidence in the results of our `rEpiabm` model.
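
If you want a number to go with the visual check, the agreement between the two lines can be summarised over the days where both are defined. The sketch below uses made-up values purely for illustration; `compare_rt` is a hypothetical helper, and in practice the two vectors would come from the simulation output and the converted `EpiEstim` estimates.

```R
# Summarise how closely an estimate tracks a reference series.
# Days where either series is NA are ignored.
compare_rt <- function(reference, estimate) {
  ok <- !is.na(reference) & !is.na(estimate)
  diff <- estimate[ok] - reference[ok]
  c(
    n_days         = sum(ok),
    mean_abs_error = mean(abs(diff)),
    bias           = mean(diff)  # positive: the estimate tends to be too high
  )
}

# Illustrative values only (not real simulation output).
epiabm_case_r    <- c(2.0, 1.8, 1.5, 1.2, 1.0, 0.9)
converted_case_r <- c(NA, 1.9, 1.4, 1.2, 1.1, NA)

print(compare_rt(epiabm_case_r, converted_case_r))
```

A small mean absolute error with a bias near zero corresponds to the two lines overlapping in the plot; a large bias points to a systematic offset rather than noise.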