# Multiple Release Sites with AutoEmulate

This study looks at how to handle multiple release sites in the simulation.
In each simulation, two coordinates are randomly selected as the release sites.
These coordinates then become input parameters for AutoEmulate.

## Generating the Data

To generate the data, we modify the data preparation scripts to randomly select two release sites from the available locations.
This is set up in the `data/generated/multi_release/multi_release_config.yaml` configuration file.
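
The sampling step can be illustrated with a minimal sketch. This is not the project's data preparation code: the function `sample_release_sites`, the five-point coordinate grid, and the `[x1, y1, x2, y2]` layout are all hypothetical and only show one way two distinct sites could be drawn and turned into a row of emulator inputs.

```python
import numpy as np


def sample_release_sites(coords, n_sites=2, rng=None):
    """Pick n_sites distinct rows of coords and flatten them into one parameter row.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample without replacement so the same location is never chosen twice
    idx = rng.choice(len(coords), size=n_sites, replace=False)
    # Flatten to [x1, y1, x2, y2] so the sites can sit alongside other inputs
    return idx, coords[idx].ravel()


# Hypothetical grid standing in for the real coordinate file
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
site_idx, site_params = sample_release_sites(coords, rng=np.random.default_rng(0))
print(site_idx, site_params)
```

Swapping the two sites describes the same release, so a canonical ordering (for example, sorting the indices) may be worth considering. Whether the project's scripts do this is not stated here.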

To run the data preparation with this configuration, use the following commands:

```bash
export WORKERS_FOR_MOZZIE=12
python py_script/generate/build_param_files.py data/generated/multi_release/multi_release_config.yaml
python py_script/generate/build_coord_files.py data/generated/multi_release/multi_release_config.yaml
python py_script/generate/pl_run_full_set.py data/generated/multi_release/multi_release_config.yaml
python py_script/data_prep/load_state_site_data.py data/generated/multi_release/multi_release_config.yaml 460
```

This generates data for two release sites after 460 time steps, which roughly corresponds to one year after the release of the gene drive mosquitoes.
By default this runs 100 simulations, but I would recommend increasing this by changing the `num_samples` parameter in the configuration file.


In [None]:
import pandas as pd
from autoemulate import AutoEmulate
from autoemulate.transforms import PCATransform
from torch import Tensor

from mozzie.parsing import aggregate_mosquito_data, cast_back_data
from mozzie.visualise import plot_map_scatter


In [None]:
# Load the training and test splits written by the data preparation step
X_data = pd.read_csv(
    "../data/generated/multi_release/processed_site_state_460/X_train.csv"
).values
y_data = pd.read_csv(
    "../data/generated/multi_release/processed_site_state_460/y_train.csv"
).values

X_test = pd.read_csv(
    "../data/generated/multi_release/processed_site_state_460/X_test.csv"
).values
y_test = pd.read_csv(
    "../data/generated/multi_release/processed_site_state_460/y_test.csv"
).values

coords = pd.read_csv(
    "../data/generated/multi_release/coords/coords_1000.csv", header=0, sep="\t",
)[['x', 'y']].values

In [None]:
plot_map_scatter(
    aggregate_mosquito_data(cast_back_data(y_data[1]), "total_drive"),
    coords,
    title="Example of a single run of the model"
)


In [None]:
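# Fit a Gaussian process emulator and compare three PCA reductions
# of the outputs (10, 20 and 40 components)
em = AutoEmulate(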
    X_data,
    y_data,
    models=["GaussianProcessRBF"],
    y_transforms_list=[
        [PCATransform(n_components=10)],
        [PCATransform(n_components=20)],
        [PCATransform(n_components=40)]
    ],
)


In [None]:
em.summarise()
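
To get a feel for what the three `n_components` settings trade off, the following sketch computes how much output variance a PCA keeps at 10, 20 and 40 components. It uses plain NumPy on synthetic data rather than `PCATransform`, and the array shapes and the 15 latent factors are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for y_data: 100 runs, 500 outputs,
# driven by 15 latent factors plus a little noise (assumed structure)
latent = rng.normal(size=(100, 15))
mixing = rng.normal(size=(15, 500))
Y = latent @ mixing + 0.1 * rng.normal(size=(100, 500))

# PCA via SVD of the centred data: squared singular values are
# proportional to the variance explained by each component
Y_centred = Y - Y.mean(axis=0)
singular_values = np.linalg.svd(Y_centred, compute_uv=False)
variance_ratio = singular_values**2 / np.sum(singular_values**2)

# Fraction of total output variance kept by the first k components
retained = {k: float(variance_ratio[:k].sum()) for k in (10, 20, 40)}
for k, frac in retained.items():
    print(k, round(frac, 4))
```

On real simulation output the curve will look different, but the same check indicates whether a small number of components already captures most of the variation.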


## Making Predictions

After fitting the AutoEmulate model, you can make predictions on the test data.


In [None]:
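# Take the best-performing emulator from the comparison above
best = em.best_result()

# Use the mean of the prediction as the point estimate for each test run
y_predict = best.model.predict(Tensor(X_test)).mean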

In [None]:
view_idx = 0

plot_map_scatter(
    aggregate_mosquito_data(cast_back_data(y_predict[view_idx]), "total_drive"),
    coords,
    title="Example of a single run of the emulator prediction"
)

In [None]:
plot_map_scatter(
    aggregate_mosquito_data(cast_back_data(y_test[view_idx]), "total_drive"),
    coords,
    title="The actual data for the same run of the simulation"
)
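
The two maps give a visual comparison. A numerical summary can complement them. The sketch below computes a root-mean-square error per run on small synthetic arrays. The helper `per_run_rmse` and the demo arrays are hypothetical, and the `(n_runs, n_outputs)` layout is an assumption about how the data are arranged.

```python
import numpy as np


def per_run_rmse(y_true, y_pred):
    """Root-mean-square error for each run (row) across all outputs (columns)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=1))


# Synthetic stand-ins with the layout (n_runs, n_outputs)
y_true_demo = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
y_pred_demo = np.array([[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]])

rmse = per_run_rmse(y_true_demo, y_pred_demo)
print(rmse)  # one RMSE value per run
```

To apply this to the notebook's arrays, the prediction would first need converting to NumPy, for example with `y_predict.detach().numpy()` if it is a PyTorch tensor.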