# EpiGeoPop Simulation and Analysis Walk-through

This notebook provides a detailed guide for running an `rEpiabm` simulation using a realistic population generated by `EpiGeoPop`. It then walks through the post-simulation analysis, including calculating the time-varying reproduction number ($R_t$) with `EpiEstim` and comparing it to the simulation's internal calculations.

This guide assumes you have already followed the setup instructions in the `README.md` to install `rEpiabm` and its dependencies.

## Step 1: Configure and Run the Simulation

The first step is to run the main simulation using a pre-generated population file (e.g., for Andorra). The `simulation_epigeopop.r` script handles this process.

### Configuration

You can modify the simulation parameters directly within the `simulation_epigeopop.r` script. The most common parameters to change are:

- `country`: The name of the country/region you are simulating (e.g., `"Andorra"`). This determines which data folder to use.
- `simulation_duration`: The total number of days the simulation will run.
- `initial_infected`: The number of individuals who are infectious at the start of the simulation.
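As a rough illustration, the parameter block in the script might look like the sketch below. The three variable names come from the list above; the values, the sanity checks, and the `output_dir` variable are assumptions made for this example and may not match the actual script.

```r
# Illustrative values only; the real settings live in simulation_epigeopop.r.
country             <- "Andorra"  # selects the data/<country>/ folder
simulation_duration <- 60         # days to simulate (hypothetical value)
initial_infected    <- 5          # infectious individuals at the start (hypothetical value)

# Basic sanity checks before launching a long run (an addition for this sketch).
stopifnot(
  is.character(country), nzchar(country),
  simulation_duration > 0,
  initial_infected >= 1
)

# Folder where the simulation outputs are expected, following the layout in this guide.
output_dir <- file.path("data", country, "simulation_outputs")
print(output_dir)
```

Checking the values up front is cheap compared with discovering a typo after a long simulation run.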

**Instructions:**

1. Open `simulation_epigeopop.r` and adjust the parameters as needed.
2. Run `simulation_epigeopop.r`. This generates several output files in `data/<your_country>/simulation_outputs/` (replace `<your_country>` with your region of interest).

## Step 2: Run the EpiEstim Post-Simulation Analysis

Next, we will process the raw simulation output with the `epiestim_epigeopop.r` script. This script calculates the daily incidence and generation time distribution, then uses the `EpiEstim` package to estimate $R_t$.
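To make the two intermediate quantities concrete, here is a minimal sketch of how daily incidence and an empirical generation time distribution could be derived from a who-infected-whom log and passed to `EpiEstim`. The data frame and its column names (`id`, `infector`, `infection_day`) are hypothetical and do not reflect the actual `rEpiabm` output format. The choice of `method = "non_parametric_si"` is also an assumption (the `_np` suffix in the output file names hints at it, but the script may differ).

```r
# Hypothetical infection log: one row per infected individual.
infections <- data.frame(
  id            = 1:8,
  infector      = c(NA, 1, 1, 2, 2, 3, 4, 5),  # NA marks the seed case
  infection_day = c(0, 2, 3, 4, 5, 5, 7, 8)
)

# Daily incidence: new infections on every day from 0 to the last day,
# including days with zero infections.
days      <- 0:max(infections$infection_day)
incidence <- as.integer(table(factor(infections$infection_day, levels = days)))

# Generation time: infectee's infection day minus the infector's infection day.
infector_day <- infections$infection_day[match(infections$infector, infections$id)]
gen_times    <- (infections$infection_day - infector_day)[!is.na(infections$infector)]

# Empirical distribution over 0..max(gen_times) days. The first entry (0 days)
# is set to 0 because EpiEstim requires si_distr[1] == 0.
counts    <- tabulate(gen_times, nbins = max(gen_times))  # counts for 1..max days
gen_distr <- c(0, counts / sum(counts))

# Estimate R_t only if EpiEstim is installed.
if (requireNamespace("EpiEstim", quietly = TRUE)) {
  res <- EpiEstim::estimate_R(
    incid  = incidence,
    method = "non_parametric_si",
    config = EpiEstim::make_config(list(si_distr = gen_distr))
  )
  print(head(res$R))
}
```

With so few cases, `EpiEstim` may warn that the estimates are imprecise; a real simulation output would contain far more infections.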

**Instructions:**

Run `epiestim_epigeopop.r` to perform the EpiEstim analysis. The results will be saved in `data/<your_country>/simulation_outputs/epiestim/`.

### Understanding the EpiEstim Outputs

After the script finishes, you will find several new files in the `epiestim` directory, including:
- **`Incidence_plot.png`**: A bar chart showing the number of new infections per day.
- **`Generation_plot.png`**: A plot showing the probability distribution of the time between successive infections.
- **`epiestim_detailed_plot_np.png`**: A plot of the estimated $R_t$ over time, with a 95% credible interval. This is the main output of this step.
- **`R_estimates_np.csv`**: The raw data for the $R_t$ plot.


## Step 3: Compare $R_t$ Estimates

The third script, `compare_r_numbers.r`, compares the $R_t$ we just estimated with `EpiEstim` against the $R_t$ that was calculated internally by the `rEpiabm` simulation.

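A key challenge is that `EpiEstim` estimates an **instantaneous** $R_t$ (the expected number of secondary infections per infectious individual under the transmission conditions on day $t$), while `rEpiabm` tracks a **case** $R_t$ (the number of secondary infections eventually caused by individuals infected on day $t$, based on who infected whom). The script converts the `EpiEstim` estimate into a case $R_t$ so that the two can be compared directly.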
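One commonly used relationship between the two quantities averages the instantaneous value forward over the generation time distribution $w_s$: $R^{\text{case}}_t = \sum_{s \ge 0} R^{\text{inst}}_{t+s}\, w_s$. The sketch below implements that formula; whether `compare_r_numbers.r` uses exactly this conversion is an assumption, and the function name `instantaneous_to_case` is hypothetical.

```r
# Convert an instantaneous R_t series into a case R_t series by averaging
# forward over the generation time distribution w, where w[1] is the
# probability of a 0-day generation time, w[2] of 1 day, and so on.
instantaneous_to_case <- function(r_inst, w) {
  n <- length(r_inst)
  sapply(seq_len(n), function(t) {
    idx <- t + seq_along(w) - 1         # positions t, t + 1, ..., t + length(w) - 1
    if (max(idx) > n) return(NA_real_)  # not enough future days to average over
    sum(r_inst[idx] * w)
  })
}

# Toy example with made-up numbers.
r_inst <- c(3, 2, 1, 1)
w      <- c(0, 0.5, 0.5)
r_case <- instantaneous_to_case(r_inst, w)
print(r_case)
```

The last few days come out as `NA` because the case $R_t$ for a given day depends on transmission that happens on later days, which is why a converted line can end earlier than the instantaneous one.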

**Instructions:**

1. Edit `compare_r_numbers.r` so that both directory variables point to the correct locations (replace `<your_country>` with your region of interest):
    - `epiestim_dir <- "rEpiabm/data/<your_country>/simulation_outputs/epiestim"`
    - `epiabm_dir <- "rEpiabm/data/<your_country>/simulation_outputs"`

2. Run `compare_r_numbers.r` to execute the comparison script. It will generate and save a final comparison plot.
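Before running the comparison, it can help to confirm that the two directories and the Step 2 output actually exist. The following is a small sketch of such a check, not part of the provided scripts. It assumes the paths are relative to the current working directory, and the `country` variable is introduced here only for illustration.

```r
country      <- "Andorra"  # example region; replace with your own
epiabm_dir   <- file.path("rEpiabm", "data", country, "simulation_outputs")
epiestim_dir <- file.path(epiabm_dir, "epiestim")

# File that this guide lists as an output of the EpiEstim step.
expected_file <- file.path(epiestim_dir, "R_estimates_np.csv")

dirs_ok <- dir.exists(c(epiabm_dir, epiestim_dir))
file_ok <- file.exists(expected_file)

if (all(dirs_ok) && file_ok) {
  message("Inputs found; ready to run compare_r_numbers.r")
} else {
  message("Missing inputs; check the working directory and rerun Steps 1 and 2")
}
```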

## Step 4: Evaluate the Final Plot

The final, most important output is the `Rt_comparison_plot.png` file, which you can find in the `data/<your_country>/simulation_outputs/epiestim/` directory.

<div style="margin: 1em 0; padding: 1em; border-left: 4px solid var(--jp-warn-color0); background-color: var(--jp-layout-color2); color: var(--jp-ui-font-color1);">
<strong>Interpreting the Plot:</strong>
    <ul>
        <li>The <b>Epiabm case R_t (orange, dotted)</b> line represents the 'ground truth' from our simulation model.</li>
        <li>The <b>Case R (Converted) (red, dashed)</b> line is the result from the external EpiEstim package after being made comparable.</li>
    </ul>
</div>

A close alignment between these two lines supports the validity of our simulation, showing that it is consistent with the estimates produced by a standard, widely used epidemiological tool.
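If you want a number to accompany the visual check, one simple option is to summarise the gap between the two series. The sketch below uses made-up values purely to show the calculation; the vector names are hypothetical, and the choice of mean absolute difference and correlation as summary measures is an assumption rather than something the scripts compute.

```r
# Made-up example series; replace with the values behind the two plotted lines.
rt_epiabm    <- c(2.1, 1.8, 1.5, 1.2, 1.0, 0.9)
rt_converted <- c(2.0, 1.9, 1.4, 1.3, NA, NA)  # converted series may end earlier

# Compare only the days on which both series have a value.
both <- !is.na(rt_epiabm) & !is.na(rt_converted)

mean_abs_diff <- mean(abs(rt_epiabm[both] - rt_converted[both]))
correlation   <- cor(rt_epiabm[both], rt_converted[both])

cat("Days compared:", sum(both), "\n")
cat("Mean absolute difference:", round(mean_abs_diff, 3), "\n")
cat("Correlation:", round(correlation, 3), "\n")
```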