# Centre Release Model

This is a simple study of the fitness and dispersal rate of a mosquito population.
In this model, a fixed number of mosquitoes is released at the centre of a defined area.
The notebook tracks how the population changes over time and attempts to model the overall effects.

In [None]:
import pandas as pd
from autoemulate import AutoEmulate
from autoemulate.transforms import PCATransform
from torch import Tensor

from mozzie.parsing import aggregate_mosquito_data, cast_back_data
from mozzie.visualise import plot_map_scatter, plot_total_data

## Generating the Data

The data come from the centre release model, in which a number of mosquitoes are released at the centre of a defined area.

An example configuration file sets up the model parameters, and an associated coordinates file accompanies it:
`data/generated/centre_release/centre_release_config.yaml`

To generate the data, we recommend increasing the `num_samples` parameter to get better coverage of the parameter space.
It is currently set to 25, but a much larger value will be needed to build an effective model.

The following commands will generate the data for the model:

```bash
export WORKERS_FOR_MOZZIE=12
python py_script/generate/build_param_files.py data/generated/centre_release/centre_release_config.yaml
python py_script/generate/pl_run_full_set.py data/generated/centre_release/centre_release_config.yaml
python py_script/data_prep/load_total_data.py data/generated/centre_release/centre_release_config.yaml
python py_script/data_prep/load_state_data.py data/generated/centre_release/centre_release_config.yaml 460
```
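
The same four steps could also be driven from Python, which may be convenient when the state data is wanted for several time points. The sketch below is only an illustration: `build_commands` and `run_pipeline` are hypothetical helpers that do not exist in the repository, and it assumes the final argument of `load_state_data.py` is the simulation day to extract. It defaults to a dry run that only prints the commands.

```python
import os
import shlex
import subprocess

CONFIG = "data/generated/centre_release/centre_release_config.yaml"


def build_commands(config: str, state_day: int) -> list[list[str]]:
    """Return the four pipeline steps, in order, as argument lists."""
    return [
        ["python", "py_script/generate/build_param_files.py", config],
        ["python", "py_script/generate/pl_run_full_set.py", config],
        ["python", "py_script/data_prep/load_total_data.py", config],
        # Assumption: the trailing argument is the simulation day to extract.
        ["python", "py_script/data_prep/load_state_data.py", config, str(state_day)],
    ]


def run_pipeline(
    config: str, state_day: int, workers: int = 12, dry_run: bool = True
) -> None:
    """Print each step and, unless dry_run is set, run it; stop at the first failure."""
    # Mirrors `export WORKERS_FOR_MOZZIE=12` from the shell version.
    env = {**os.environ, "WORKERS_FOR_MOZZIE": str(workers)}
    for cmd in build_commands(config, state_day):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True, env=env)


# Dry run: prints the commands without executing them.
run_pipeline(CONFIG, state_day=460)
```

Calling `run_pipeline(CONFIG, state_day=460, dry_run=False)` from the repository root would execute the steps; `check=True` makes a failing step raise instead of silently continuing.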


## Modelling the Total Population

This section models the total population over time, ignoring local distribution effects.

It starts by setting up a simple emulator on the data already collected.

### Getting the Total Data

To begin, we load the data and visualise a single run of the model.

In [None]:
X_total_data = pd.read_csv(
    "../data/generated/centre_release/processed_total/X_train.csv"
).values
y_total_data = pd.read_csv(
    "../data/generated/centre_release/processed_total/y_train.csv"
).values

X_total_test = pd.read_csv(
    "../data/generated/centre_release/processed_total/X_test.csv"
).values
y_total_test = pd.read_csv(
    "../data/generated/centre_release/processed_total/y_test.csv"
).values

In [None]:
plot_total_data(
    cast_back_data(y_total_data[0]),
    title="Example of a single run of the model"
)

### Running the Total AutoEmulate Model

We then run AutoEmulate on the total data.
For simplicity, we use only a Gaussian Process with an RBF kernel, together with a PCA transform on the output data to reduce its dimensionality.
We try several numbers of PCA components (10, 20 and 40) to see how this affects model performance.
This can be extended to other models and transforms as required.
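
One way to judge which component counts are worth trying is to look at how much of the output variance the leading principal components capture. The sketch below is a rough NumPy illustration on synthetic data, not a description of what `PCATransform` does internally; `cumulative_explained_variance` and `y_demo` are hypothetical names introduced here.

```python
import numpy as np


def cumulative_explained_variance(y: np.ndarray) -> np.ndarray:
    """Cumulative fraction of output variance captured by the first k components."""
    centred = y - y.mean(axis=0)
    # Squared singular values of the centred data are proportional to the
    # variance along each principal component.
    singular_values = np.linalg.svd(centred, compute_uv=False)
    variance = singular_values**2
    return np.cumsum(variance) / variance.sum()


# Synthetic stand-in for the training outputs: 50 runs, 200 outputs per run,
# driven by 3 latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 3))
mixing = rng.normal(size=(3, 200))
y_demo = latent @ mixing + 0.01 * rng.normal(size=(50, 200))

explained = cumulative_explained_variance(y_demo)
for k in (1, 3, 10):
    print(f"{k:>2} components: {explained[k - 1]:.4f} of variance")
```

Applied to the real training outputs, a curve that flattens well before 40 components would suggest the larger settings add little. Note also that centred data from `n` runs has rank at most `n - 1`, so the number of informative components is bounded by the number of training runs.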

In [None]:
em_total = AutoEmulate(
    X_total_data,
    y_total_data,
    models=["GaussianProcessRBF"],
    y_transforms_list=[
        [PCATransform(n_components=10)],
        [PCATransform(n_components=20)],
        [PCATransform(n_components=40)]
    ],
)


In [None]:
em_total.summarise()

### Making Predictions for the Total Data

After fitting the AutoEmulate model, you can make predictions on the test data.

This uses the best model found during the fitting process to predict the outputs for the test inputs.
The best model will most likely be the one with the most PCA components.

The predicted outputs can then be compared to the actual test outputs to evaluate the model's performance.

In [None]:
best_total = em_total.best_result()

y_total_predict = best_total.model.predict(Tensor(X_total_test)).mean

In [None]:
view_idx = 0

plot_total_data(
    cast_back_data(y_total_predict[view_idx]),
    title="Example of a single run of the emulator prediction"
)

In [None]:
plot_total_data(
    cast_back_data(y_total_test[view_idx]),
    title="The actual data for the same run of the simulation"
)
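
The visual comparison can be supplemented with simple numerical measures. The sketch below, which uses small made-up arrays rather than the notebook's data, computes a per-run root-mean-square error and a pooled R² score; `per_run_rmse` and `r2_score` are hypothetical helpers written for this illustration and assume both inputs are 2D NumPy arrays of shape (runs, outputs).

```python
import numpy as np


def per_run_rmse(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Root-mean-square error of each run (row), taken over that run's outputs."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=1))


def r2_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """R² pooled over all entries, with each output column centred on its own mean."""
    residual = np.sum((y_true - y_pred) ** 2)
    total = np.sum((y_true - y_true.mean(axis=0)) ** 2)
    return float(1.0 - residual / total)


# Made-up example: three runs, three outputs, predictions off by a constant 0.1.
y_true_demo = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]])
y_pred_demo = y_true_demo + 0.1

rmse_demo = per_run_rmse(y_true_demo, y_pred_demo)
r2_demo = r2_score(y_true_demo, y_pred_demo)
print("RMSE per run:", rmse_demo)
print("Pooled R2:", r2_demo)
```

To apply this to the notebook's results, the prediction would probably need converting to NumPy first, since it is likely a torch tensor, for example with `y_total_predict.detach().cpu().numpy()`.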

## Spatial Predictions

Finally, we can look at the spatial distribution of the mosquitoes for a given run of the simulation.
This looks at a certain time point and compares the predicted and actual distributions.

Here we look at day 460 of the simulation, which corresponds to around one year after the initial release of mosquitoes.

As with the total data, the first step is to load the spatial data for the model runs.

In [None]:
X_state_data = pd.read_csv(
    "../data/generated/centre_release/processed_state_460/X_train.csv"
).values
y_state_data = pd.read_csv(
    "../data/generated/centre_release/processed_state_460/y_train.csv"
).values

X_state_test = pd.read_csv(
    "../data/generated/centre_release/processed_state_460/X_test.csv"
).values
y_state_test = pd.read_csv(
    "../data/generated/centre_release/processed_state_460/y_test.csv"
).values

coords = pd.read_csv(
    "../data/generated/centre_release/coords.csv", header=0, sep="\t",
)[['x', 'y']].values

In [None]:
plot_map_scatter(
    aggregate_mosquito_data(cast_back_data(y_state_data[1]), "total_drive"),
    coords,
    title="Example of a single run of the model"
)

In [None]:
em_state = AutoEmulate(
    X_state_data,
    y_state_data,
    models=["GaussianProcessRBF"],
    y_transforms_list=[
        [PCATransform(n_components=10)],
        [PCATransform(n_components=20)],
        [PCATransform(n_components=40)]
    ],
)

In [None]:
em_state.summarise()

### Making Predictions for the Spatial Data

As with the total data, the best spatial emulator can then be used to make predictions for the test inputs.

In [None]:
best = em_state.best_result()

y_state_predict = best.model.predict(Tensor(X_state_test)).mean

In [None]:
view_state_idx = 0

plot_map_scatter(
    aggregate_mosquito_data(
        cast_back_data(y_state_predict[view_state_idx]), "total_drive"
    ),
    coords,
    title="Example of a single run of the emulator prediction"
)

In [None]:
plot_map_scatter(
    aggregate_mosquito_data(
        cast_back_data(y_state_test[view_state_idx]), "total_drive"
    ),
    coords,
    title="The actual data for the same run of the simulation"
)
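
The two maps can also be compared site by site. The sketch below uses a made-up four-site grid and assumes, without confirmation from the `mozzie` code, that the aggregated values form a 1D array with one entry per row of `coords`; `site_errors` is a hypothetical helper introduced for this illustration.

```python
import numpy as np


def site_errors(actual: np.ndarray, predicted: np.ndarray, coords: np.ndarray):
    """Return the per-site absolute error and the coordinates of the worst site."""
    abs_error = np.abs(actual - predicted)
    worst = int(np.argmax(abs_error))
    return abs_error, coords[worst]


# Made-up example: four sites on a unit square.
coords_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
actual_demo = np.array([10.0, 4.0, 4.0, 1.0])
predicted_demo = np.array([8.0, 4.5, 3.0, 1.0])

abs_error, worst_xy = site_errors(actual_demo, predicted_demo, coords_demo)
print("Absolute error per site:", abs_error)
print("Largest error at:", worst_xy)
```

If that assumption holds, the per-site error array could itself be passed to `plot_map_scatter` to show where the emulator deviates most from the simulation.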