# CAPTAIN tutorial – Conservation planning on empirical data

**NOTE:** The latest developments are currently available in the [dev branch of captain](https://github.com/captain-project/captain-project/tree/dev).

## Building the environment using Marxan-formatted CSV files.
The following examples show how to set up a CAPTAIN analysis to optimize the placement of protection units (PUs).
The empirical dataset analyzed here is from the paper by [Carrasco et al. 2020](https://link.springer.com/article/10.1007/s10531-020-01947-1) and includes the predicted distribution for ~1500 species of endemic trees in Madagascar across 22,000 potential PUs and a quantification of the disturbance or threat to species across PUs. The files, available [here](https://zenodo.org/api/files/c4663248-5dc2-4932-9d43-ec0c490595ae/empirical_data.zip), contain comma-separated values (CSV) and were formatted to be compatible with the program [Marxan](https://marxansolutions.org).

In CAPTAIN we use these files to initialize an environment, over which we can then apply a pre-trained model to predict areas of priority for conservation.
This is done using the `build_empirical_env()` function and passing four arguments:
1. the working directory where all files are stored
2. a CSV table containing species distribution data, with three columns: a species identifier, a PU identifier, and the abundance of the species in the PU. For each PU there is one entry for each of the species found in it (or predicted to occur in it, e.g. through niche modeling). In the example here we have only species presence/absence, so the abundance column will only have values of 0 and 1.
3. a CSV table with the cost of protection listed for each PU. This can be a relative cost (i.e. indicating the relative difference in cost among PUs) and by default is assumed to be proportional to the anthropogenic disturbance on the PU.
4. a CSV file with the coordinates of the mid-point of each PU. This is currently only used for plotting.
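
To make the input tables more concrete, here is a minimal sketch of what Marxan-style files of this kind might look like, and how the long-format species table maps onto a species-by-PU matrix. The column names (`species`, `pu`, `amount`, `id`, `cost`) follow common Marxan conventions and are an assumption here, as are the toy values; check the headers of the downloaded files before relying on them.

```python
import csv
import io

# Toy stand-ins for the two Marxan-style tables (column names are assumed).
puvsp_csv = """species,pu,amount
1,10,1
1,11,1
2,11,1
3,12,1
"""
pu_csv = """id,cost
10,0.2
11,0.5
12,0.9
"""

# Read the long-format species table: one row per (species, PU) occurrence.
records = [(int(r["species"]), int(r["pu"]), float(r["amount"]))
           for r in csv.DictReader(io.StringIO(puvsp_csv))]
cost = {int(r["id"]): float(r["cost"])
        for r in csv.DictReader(io.StringIO(pu_csv))}

species_ids = sorted({sp for sp, _, _ in records})
pu_ids = sorted(cost)

# Pivot into a species x PU matrix; missing combinations are absences (0).
matrix = [[0.0] * len(pu_ids) for _ in species_ids]
for sp, pu, amount in records:
    matrix[species_ids.index(sp)][pu_ids.index(pu)] = amount

for sp, row in zip(species_ids, matrix):
    print(sp, row)
```

Each row of the resulting matrix is one species and each column one PU, which is the kind of presence/absence layout the species table encodes in long format.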

In [None]:
import os
os.chdir("path_to_captain_library")
import captain as cn

wd = 'empirical_data/'
puvsp_file = 'puvsp.csv'
pu_file = 'pu.csv'
pu_info_file = 'Planning_Units.csv'

# build env
env = cn.build_empirical_env(wd=wd,
                             puvsp_file=puvsp_file,
                             pu_file=pu_file,
                             pu_info_file=pu_info_file)


The function also automatically produces two captain-formatted files that allow the environment to be reloaded more quickly if needed, as shown below:

In [None]:
# fast loading files
puvsp_file = 'puvsp.csv'
pu_file = 'pu.csv'
pu_info_file = 'Planning_Units.csv'

hist_file = 'hist.npy' # species distribution data
puid_file = 'pu_id.npy' # PU identifiers
spid_file = 'sp_id.npy' # Species identifiers

# build env
env = cn.build_empirical_env(wd=wd,
                             hist_file=hist_file,
                             puid_file=puid_file,
                             spid_file=spid_file,
                             pu_file=pu_file,
                             pu_info_file=pu_info_file)

## Running an optimized policy on the empirical environment.
We can now apply a pre-trained model to the loaded environment to perform a conservation planning experiment and predict the areas of highest conservation priority.
The model was trained on simulated datasets using the following settings:

```python
cn.train_model(obsMode=1, # full species monitoring
               observePolicy=2, # recurrent monitoring, at-once protection
               start_protecting=3, # system is monitored for 3 steps before placing protection units
               rewardMode=0,  # objective: minimize species loss
               disturbance=4,
               n_nodes=[8, 0],
               budget=0.11,
               batchSize=6,
               steps=25,
               epochs=1000)
```

The policy run with an empirical dataset can be constrained by a budget and guided by a conservation target, which defines the fraction of a species' range that must be protected for the species to be considered effectively protected. This protection target can be specified as a single value (e.g. `protection_target=0.1` indicates that 10% of the range should be protected for each species) or as a vector of values (of length equal to the number of species in the system) to apply a species-specific conservation target.
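
As a rough illustration of how a single value and a per-species vector differ, the sketch below computes how many PUs would need protecting for each species to count as protected. The range sizes are invented, the helper `required_pus` is hypothetical and not part of the CAPTAIN API, and rounding up is an assumption.

```python
import math

def required_pus(range_sizes, protection_target):
    """Number of PUs to protect per species (hypothetical helper).

    protection_target is either one fraction applied to all species
    or a sequence with one fraction per species.
    """
    if isinstance(protection_target, (int, float)):
        targets = [protection_target] * len(range_sizes)
    else:
        targets = list(protection_target)
        if len(targets) != len(range_sizes):
            raise ValueError("need one target per species")
    # Round up so the protected fraction is at least the target;
    # the inner round() guards against floating-point noise.
    return [math.ceil(round(t * r, 9)) for t, r in zip(targets, range_sizes)]

range_sizes = [20, 200, 1500]  # PUs occupied by three toy species

print(required_pus(range_sizes, 0.1))               # same 10% for every species
print(required_pus(range_sizes, [0.5, 0.25, 0.1]))  # species-specific targets
```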

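The budget is specified as a fraction of the total cost of all PUs. For instance, with `budget=0.1` the cumulative cost of the protected PUs cannot exceed 10% of the total cost. If all PUs have the same cost, this corresponds to at most 10% of the PUs; fewer PUs will be protected if PUs with higher cost are selected.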
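
The budget arithmetic can be sketched as follows. The costs are invented, and the two selection orders are only stand-ins to show that the number of PUs fitting within the budget depends on which PUs are selected; neither is the policy CAPTAIN applies.

```python
costs = [2, 2, 5, 8, 10, 10, 15, 20, 20, 8]  # toy cost per PU
budget = 0.3  # fraction of the total cost of all PUs

total_budget = budget * sum(costs)

def n_affordable(ordered_costs, available):
    """Count how many PUs, taken in the given order, fit within the budget."""
    spent, n = 0.0, 0
    for c in ordered_costs:
        if spent + c > available:
            break
        spent += c
        n += 1
    return n

cheapest_first = n_affordable(sorted(costs), total_budget)
costliest_first = n_affordable(sorted(costs, reverse=True), total_budget)
print(cheapest_first, costliest_first)
```

With these toy values the same budget covers several cheap PUs but only a single expensive one.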

The `update_features` argument specifies how frequently the features should be extracted from the system. Ideally, this is done at each step (`update_features=1`) but, to speed up the run, approximate solutions can be obtained by performing this step less often, e.g. every 10 or 100 steps.
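
The trade-off can be pictured with a schematic loop in which the expensive feature extraction runs only every `update_features` steps. This is a hypothetical sketch of the idea, not CAPTAIN's internal loop, and the step count of 1000 is arbitrary.

```python
def count_extractions(n_steps, update_features):
    """Count feature extractions when refreshing every `update_features` steps."""
    n_extractions = 0
    for step in range(n_steps):
        if step % update_features == 0:
            # Expensive part: recompute features from the current state.
            n_extractions += 1
        # Cheap part: act using the most recently extracted features.
    return n_extractions

for every in (1, 10, 100):
    print(every, count_extractions(1000, every))
```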

The `seed` argument initializes the random number generators, making the analysis fully reproducible. In particular, each analysis assigns random sensitivities to the species in the system, which affect their probability of occurring in areas with anthropogenic disturbance. If an empirical estimate of species sensitivity exists, it can be used when setting up the system with the function `build_empirical_env`, using the argument `species_sensitivities=a`, where `a` is an array of length equal to the number of species and with values ranging from 0 (lowest sensitivity) to 1 (highest sensitivity).
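
The role of the seed can be illustrated with Python's standard library. How CAPTAIN actually draws sensitivities (generator and distribution) is not specified here, so the uniform draw and the helper `random_sensitivities` below are assumptions made only to show that a fixed seed gives identical values.

```python
import random

def random_sensitivities(n_species, seed):
    """Draw one sensitivity in [0, 1) per species from a seeded generator."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_species)]

a = random_sensitivities(5, seed=4321)
b = random_sensitivities(5, seed=4321)
c = random_sensitivities(5, seed=1234)

print(a == b)  # same seed: identical sensitivities
print(a == c)  # different seed: different sensitivities
```

A list of empirical estimates with the same shape (one value per species, between 0 and 1) could be converted to an array and passed as `species_sensitivities`.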

In [None]:
trained_model_file = 'trained_models/full_monitor_protect_at_once_d4_n8-0.log'

output_file = 'output'

env, out_file = cn.run_policy_empirical(env,
                                        trained_model=trained_model_file,
                                        obsMode=1, # full species monitoring
                                        observePolicy=2, # recurrent monitoring, at-once protection
                                        n_nodes=[8, 0],
                                        budget=0.1,
                                        protection_target=0.1,
                                        stop_at_end_budget=True,
                                        seed=4321,
                                        update_features=10,
                                        wd=wd,
                                        result_file=output_file)

The result of this analysis is stored in a pickle file containing the environment at the last step of the policy implementation. The function returns the updated empirical environment and the name of the pickle file storing it. 
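
Since the result is a standard pickle file, it can be reloaded later with Python's `pickle` module. The sketch below uses a stand-in dictionary and a hypothetical file name so that it runs on its own; for a real run, the path would be the returned `out_file`, and the `captain` library needs to be importable for the environment object to be unpickled. Only unpickle files from sources you trust.

```python
import os
import pickle
import tempfile

# Stand-in for the environment object; a real run pickles the CAPTAIN environment.
toy_env = {"protected_pus": [3, 14, 24462], "budget_left": 0.0}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(toy_env, f)

    # Reloading works the same way for any pickle file.
    with open(path, "rb") as f:
        reloaded = pickle.load(f)

print(reloaded == toy_env)
```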

The `run_policy_empirical` function also implements an option to run multiple replicates in which the system is initialized based on different random seeds.  These can be used to average the results across replicates to account for uncertainties around species ranges and species-specific sensitivities.

In [None]:
env_list, out_files = cn.run_policy_empirical(env,
                                              trained_model=trained_model_file,
                                              obsMode=1, # full species monitoring
                                              observePolicy=2, # recurrent monitoring, at-once protection
                                              n_nodes=[8, 0],
                                              budget=0.1,
                                              protection_target=0.1,
                                              stop_at_end_budget=True,
                                              seed=4321,
                                              update_features=100,
                                              result_file=output_file,
                                              replicates=6)

This command runs 6 simulations automatically applying different random seeds in each replicate. 

## Setting conservation target using the ConservationTarget class

The options below are currently available in the [dev branch of captain](https://github.com/captain-project/captain-project/tree/dev). After setting up the environment as shown above, create an object of class `ConservationTarget`. One of the options is to specify the fraction of species range to be protected. This can be either a single number (between 0 and 1), in which case the same fraction is applied to all species (10% in the example below), or an array of values specifying the fraction to be protected for each species.

In [None]:
protect_fraction = 0.1
conservation_target = cn.FractionConservationTarget(protect_fraction=protect_fraction)

# plot the fraction of species range targeted for protection as a function of species range (in this case constant)
cn.plot_target(conservation_target,
               env.bioDivGrid.geoRangePerSpecies())

Another available option is to make the target a function of species range size, specifically aiming to protect a larger fraction of the range in species with small ranges and a smaller fraction in widespread species. This can be achieved using the `RangeConservationTarget` class to specify the minimum fraction of targeted range (applied to species with the largest range; argument `min_fr`) and the maximum fraction (applied to species with the smallest range; argument `max_fr`). The target can be further truncated to a maximum number of protection units regardless of the species range size (argument `max_range`).

In [None]:
conservation_target = cn.RangeConservationTarget(min_fr=0.01, max_fr=1, max_range=250)

# plot the fraction of species range targeted for protection as a function of species range
cn.plot_target(conservation_target,
               env.bioDivGrid.geoRangePerSpecies())
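
The exact curve used by `RangeConservationTarget` is not spelled out in this tutorial, so the following is only a sketch under stated assumptions: the target fraction is interpolated linearly on the logarithm of range size between `max_fr` (smallest range) and `min_fr` (largest range), and the resulting number of PUs is capped at `max_range`. The function `range_target_fraction` and the toy range sizes are hypothetical.

```python
import math

def range_target_fraction(range_size, smallest, largest,
                          min_fr=0.01, max_fr=1.0, max_range=250):
    """Assumed target: log-linear in range size, capped at max_range PUs."""
    # Position of this species between the smallest (0) and largest (1) range.
    span = math.log(largest) - math.log(smallest)
    pos = (math.log(range_size) - math.log(smallest)) / span if span > 0 else 0.0
    fraction = max_fr + pos * (min_fr - max_fr)
    # Truncate so that no species requires more than max_range PUs.
    return min(fraction, max_range / range_size)

ranges = [1, 10, 100, 1000, 10000]  # toy range sizes, in number of PUs
for r in ranges:
    fr = range_target_fraction(r, smallest=min(ranges), largest=max(ranges))
    print(r, round(fr, 4), round(fr * r, 1))
```

Under these assumptions the species with the smallest range is targeted in full, the most widespread species at `min_fr`, and intermediate species are limited by the `max_range` cap once the interpolated target would exceed 250 PUs.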

The conservation target can be applied when running an optimized conservation policy. 

In [None]:
trained_model_file = 'trained_models/full_monitor_protect_at_once_d4_n8-0.log'

env_list, out_files = cn.run_policy_empirical(env,
                                              trained_model=trained_model_file,
                                              obsMode=1, # full species monitoring
                                              observePolicy=2, # recurrent monitoring, at-once protection
                                              n_nodes=[8, 0],
                                              budget=0.1,
                                              conservation_target=conservation_target, # apply conservation target
                                              stop_at_end_budget=True,
                                              seed=4321,
                                              update_features=100,
                                              result_file='output',
                                              replicates=1)

## Summarizing the results 

After running the policy (one or multiple replicates), the results can be summarized using the `summarize_policy_empirical` function. This returns a Pandas DataFrame with the protection unit ID (`PUID`), the geographic coordinates if available, and the `Priority`, i.e. whether the PU was selected for protection or, in the case of multiple replicates, the frequency at which it was selected.

In [None]:
summary_res = cn.summarize_policy_empirical(env_list)


```
>>> summary_res

Out[5]: 
          PUID  Longitude   Latitude  Priority
0          2.0  45.157739 -25.581437       0.0
1          3.0  45.203964 -25.581787       0.2
2          4.0  45.252969 -25.576135       0.0
3         13.0  45.155287 -25.538333       0.0
4         14.0  45.204793 -25.538487       0.1
        ...        ...        ...       ...
22389  24453.0  49.319646 -12.084831       0.0
22390  24457.0  49.228681 -12.041302       0.0
22391  24458.0  49.273839 -12.039891       0.0
22392  24459.0  49.308632 -12.041443       0.0
22393  24462.0  49.273049 -11.994844       0.1

[22394 rows x 4 columns]

```
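
Assuming `Priority` is the per-PU selection frequency across replicates, as described above, the calculation can be sketched with plain Python. The replicate selections and PU IDs below are made up for illustration and do not come from the analysis.

```python
# Toy selections: one set of protected PU IDs per replicate (made-up values).
replicate_selections = [
    {3, 14, 24462},
    {3, 24462},
    {3, 13},
    {14, 3},
]
pu_ids = [2, 3, 4, 13, 14, 24462]

# Priority = fraction of replicates in which each PU was selected.
n_rep = len(replicate_selections)
priority = {pu: sum(pu in sel for sel in replicate_selections) / n_rep
            for pu in pu_ids}

# Rank PUs from most to least frequently selected.
ranked = sorted(priority.items(), key=lambda kv: kv[1], reverse=True)
for pu, p in ranked:
    print(pu, p)
```

PUs that are selected in every replicate receive a priority of 1, while PUs that are never selected receive 0.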