# Examples of how to use our ecological-RL API

Our approach seeks to minimize the amount of code and configuration that the user
needs to provide in order to get an RL algorithm up and running on their population-dynamics
control problem.

## 1. Using Ray RLlib to train

The class `ray_trainer_api.ray_trainer` can be used to define, tune, and train an agent with the Ray RLlib framework.

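```python
# Necessary installations for our package (uncomment to install):

# !pip install "ray[rllib]"
# !pip install dm-tree   # provides the `tree` module; install it if importing RLlib fails with "No module named 'tree'"
# !pip install gymnasium
# !pip install numpy
# !pip install pandas
# !pip install scipy

import numpy as np

from ray_trainer_api import ray_trainer
from dyn_fns import threeSp_1
```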

### Ecological input

The cell below translates the ecological data defining the control problem to the format that our classes use.

The control problem is taken from the [rl-minicourse](https://github.com/cboettig/rl-minicourse) repository (found in the `challenge.ipynb` notebook).

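The `metadata` dictionary encapsulates most of the information about the control problem: its structure (number of species `n_sp`, number of actions `n_act`, harvested species), the episode settings, and the settings for the dynamics and control. The one thing it does not contain is the dynamics themselves, which are given by `dyn_fn`. Note that the number of arguments of `dyn_fn` must match `metadata['n_sp']`.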

```python
metadata = {
    # structure of the control problem
    'name': 'minicourse_challenge',
    'n_sp': 3,                # number of species (= number of arguments of dyn_fn)
    'n_act': 2,               # number of actions
    '_harvested_sp': [0, 1],  # indices of the harvested species
    # episodes
    'init_pop': np.float32([0.5, 0.5, 0.2]),
    'reset_sigma': 0.01,
    'tmax': 800,
    # dynamics / control
    'extinct_thresh': 0.05,
    'penalty_fn': lambda t: -800 / (t + 1),
    'var_bound': 4,
    '_costs': np.zeros(2, dtype=np.float32),
    '_prices': np.ones(2, dtype=np.float32),
}

params = {
    "r_x": np.float32(0.13),
    "r_y": np.float32(0.2),
    "K": np.float32(1),
    "beta": np.float32(0.1),
    "v0": np.float32(0.1),
    "D": np.float32(0.7),
    "tau_yx": np.float32(0),
    "tau_xy": np.float32(0),
    "alpha": np.float32(0.3),
    "dH": np.float32(0.03),
    "sigma_x": np.float32(0.05),
    "sigma_y": np.float32(0.05),
    "sigma_z": np.float32(0.05),
}

def dyn_fn(X, Y, Z):
    """One time step of the three-species model (no harvesting); returns the next (X, Y, Z)."""
    p = params  # read-only access to the global parameters
    return np.float32([
        X + (p["r_x"] * X * (1 - (X + p["tau_xy"] * Y) / p["K"])
             - (1 - p["D"]) * p["beta"] * Z * (X**2) / (p["v0"]**2 + X**2)
             + p["sigma_x"] * X * np.random.normal()
             ),
        Y + (p["r_y"] * Y * (1 - (Y + p["tau_yx"] * X) / p["K"])
             - p["D"] * p["beta"] * Z * (Y**2) / (p["v0"]**2 + Y**2)
             + p["sigma_y"] * Y * np.random.normal()
             ),
        Z + p["alpha"] * p["beta"] * Z * (
            (1 - p["D"]) * (X**2) / (p["v0"]**2 + X**2)
            + p["D"] * (Y**2) / (p["v0"]**2 + Y**2)
        ) - p["dH"] * Z + p["sigma_z"] * Z * np.random.normal(),
    ])
```
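For reference, `dyn_fn` implements the following discrete-time stochastic model, read off term by term from the code, where $\xi_{x,t}, \xi_{y,t}, \xi_{z,t}$ are independent standard normal draws (`np.random.normal()`). Harvesting does not appear here; presumably the environment applies it separately to the species listed in `_harvested_sp`.

$$
\begin{aligned}
X_{t+1} &= X_t + r_x X_t\left(1 - \frac{X_t + \tau_{xy} Y_t}{K}\right) - (1-D)\,\beta Z_t \frac{X_t^2}{v_0^2 + X_t^2} + \sigma_x X_t\,\xi_{x,t},\\
Y_{t+1} &= Y_t + r_y Y_t\left(1 - \frac{Y_t + \tau_{yx} X_t}{K}\right) - D\,\beta Z_t \frac{Y_t^2}{v_0^2 + Y_t^2} + \sigma_y Y_t\,\xi_{y,t},\\
Z_{t+1} &= Z_t + \alpha \beta Z_t \left[(1-D)\frac{X_t^2}{v_0^2 + X_t^2} + D\,\frac{Y_t^2}{v_0^2 + Y_t^2}\right] - d_H Z_t + \sigma_z Z_t\,\xi_{z,t}.
\end{aligned}
$$

With the parameters above ($\tau_{xy} = \tau_{yx} = 0$), the two prey species $X$ and $Y$ interact only through the predator $Z$.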

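Before launching a long training run, it can help to check that `dyn_fn` is consistent with `metadata` and that the unharvested dynamics behave sensibly. The helper `sanity_check` below is a hypothetical sketch, not part of our package: it checks that `dyn_fn` takes `metadata['n_sp']` arguments, then iterates it from `metadata['init_pop']` (without the `reset_sigma` noise or any harvesting), stopping early if any species falls below `metadata['extinct_thresh']`. Treating that threshold as an episode-ending extinction event is an assumption based on the field name.

```python
import inspect

import numpy as np


def sanity_check(dyn_fn, metadata, n_steps=None, seed=None):
    """Hypothetical helper: check dyn_fn's arity, then simulate the unharvested dynamics.

    Returns (trajectory, t_extinct), where trajectory has one row per time step
    (starting with init_pop) and t_extinct is the step at which some species first
    fell below metadata['extinct_thresh'], or None if that never happened.
    """
    n_sp = metadata["n_sp"]
    n_args = len(inspect.signature(dyn_fn).parameters)
    if n_args != n_sp:
        raise ValueError(f"dyn_fn takes {n_args} arguments but metadata['n_sp'] = {n_sp}")

    if seed is not None:
        np.random.seed(seed)  # dyn_fn above draws its noise from NumPy's global generator

    n_steps = metadata["tmax"] if n_steps is None else n_steps
    pop = np.asarray(metadata["init_pop"], dtype=np.float32)
    trajectory = [pop]
    for t in range(n_steps):
        pop = np.asarray(dyn_fn(*pop), dtype=np.float32)  # one species per argument
        trajectory.append(pop)
        # Assumption: falling below the threshold counts as an extinction event.
        if np.any(pop < metadata["extinct_thresh"]):
            return np.stack(trajectory), t
    return np.stack(trajectory), None
```

With the cell above, `traj, t_ext = sanity_check(dyn_fn, metadata, seed=0)` returns the unharvested trajectory. If `t_ext` is not `None`, `metadata['penalty_fn'](t_ext)` would presumably be the corresponding penalty; for the `penalty_fn` defined here it ranges from `-800` at `t = 0` to `-1` at `t = 799`, so earlier extinctions are penalized more heavily.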
### Training

With this setup, we can define a PPO trainer and train it for 300 iterations as shown below.

```python
RT = ray_trainer(
    algo_name="ppo",
    config={
        'metadata': metadata,
        'dyn_fn': dyn_fn,
    },
)
agent = RT.train(iterations=300)
```

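The same `metadata` and `dyn_fn` can also be passed directly to the environment class `base_env.ray_eco_env`, for instance to inspect the environment or evaluate a trained agent:

```python
from base_env import ray_eco_env

env = ray_eco_env(config={'metadata': metadata, 'dyn_fn': dyn_fn})
```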
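Once the environment exists, a policy can be evaluated by rolling it out for one episode. The helper `rollout` below is a hypothetical sketch, not part of our package, and it assumes that `ray_eco_env` follows the Gymnasium API: `reset()` returns `(obs, info)` and `step(action)` returns `(obs, reward, terminated, truncated, info)`. The `policy` argument is any callable mapping an observation to an action.

```python
def rollout(env, policy, max_steps=None):
    """Hypothetical helper: run one episode and return (total_reward, n_steps).

    Assumes the Gymnasium API: reset() -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    """
    obs, _info = env.reset()
    total_reward, n_steps = 0.0, 0
    while max_steps is None or n_steps < max_steps:
        action = policy(obs)
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        n_steps += 1
        if terminated or truncated:  # episode over (e.g. extinction or time limit)
            break
    return total_reward, n_steps
```

For example, `rollout(env, lambda obs: env.action_space.sample())` gives a random-policy baseline. If `agent` returned by `RT.train` is an RLlib `Algorithm`, then `lambda obs: agent.compute_single_action(obs)` is one way to evaluate the trained policy; that method belongs to RLlib's older API stack and may not be available in every Ray version.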