approx-model-or-approx-soln


Pretty darn good control: when are approximate solutions better than approximate models?

Existing methods for optimal control struggle to deal with the complexity commonly encountered in real-world systems, including dimensionality, process error, model bias and heterogeneity. Instead of tackling these complexities directly, researchers have typically sought exact optimal solutions to simplified models of the processes in question. When is the optimal solution to a very approximate, stylized model better than an approximate solution to a more accurate model? This question has largely gone unanswered owing to the difficulty of finding even approximate solutions for complex models. Our approach draws on recent algorithmic and computational advances in deep reinforcement learning. These methods have hitherto focused on problems in games or robotic mechanics, which operate under precisely known rules. We demonstrate the ability of novel algorithms using deep neural networks to successfully approximate such solutions (the "policy function" or control rule) without knowing or ever attempting to infer a model for the process itself. This powerful new technique lets us finally begin to answer the question. We show that in many but not all cases, the optimal policy for a carefully chosen over-simplified model can still out-perform these novel algorithms trained to find approximate solutions to simulations of a realistically complex system. Comparing these two approaches can lead to insights into the importance of real-world features of observation and process error, model biases, and heterogeneity.

Code

All code is in the src directory; the contents of data are generated by running the code. The main scripts are workflow.py, workflow_oneSp.py, and jiggle_workflow.py. The first two scripts produce the main result of the paper: tuning classical management strategies, training a DRL agent, and comparing the performance of the two. The last script produces the stability result and may take a long time to run.

To produce new data, delete the contents of data/results_data and run the scripts above from the src subdirectory. Options inside those scripts allow one to specify the desired dynamical model and the number of fisheries (i.e. the number of harvested species). The agent trained this way is saved at src/cache.
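
For example, the regeneration steps could be scripted roughly as follows (a minimal sketch assuming the directory layout described above; equivalently, delete the directory contents by hand and run python workflow.py from the src subdirectory in a terminal):

```python
import shutil
import subprocess
from pathlib import Path

# Remove previously generated results (run from the repository root).
for item in Path("data/results_data").iterdir():
    if item.is_dir():
        shutil.rmtree(item)
    else:
        item.unlink()

# Re-run a main workflow script from the src subdirectory.
subprocess.run(["python", "workflow.py"], cwd="src", check=True)
```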

Detailed structure of the code:

Main scripts:

  • workflow.py, workflow_oneSp.py: Scripts that train a DRL agent and optimize a constant mortality policy and a constant escapement policy. workflow.py is used for the three-species model cases (Models 2-4 in the manuscript); workflow_oneSp.py is used for the single-species model (Model 1).
  • jiggle_workflow.py: Script to perform the stability analysis

Class files:

  • envs/oneSpecies.py, envs/twoThreeFishing.py: RL environment (env) classes for the single-species and three-species models, respectively; the latter includes env classes for the single-fishery and two-fishery cases (see the sketch after this list).
  • parameters.py: There are two classes in this file:
    1. parameters() objects parametrize the three-species models (Models 2-4 in the manuscript)
    2. parameters_oneSp() objects parametrize the single-species model (Model 1)
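
To show how these pieces fit together, here is a minimal, hypothetical sketch of a gym-style fishing environment of the kind the env classes provide. It assumes the gymnasium API; the class name, dynamics, and reward below are placeholders for illustration, not the repository's actual code:

```python
import numpy as np
import gymnasium as gym

class ToyFishingEnv(gym.Env):
    """Illustrative single-species harvesting env (not the repo's implementation)."""

    def __init__(self, r=0.8, K=1.0, sigma=0.05, n_steps=200):
        self.r, self.K, self.sigma, self.n_steps = r, K, sigma, n_steps
        self.action_space = gym.spaces.Box(0.0, 1.0, shape=(1,))         # harvest fraction
        self.observation_space = gym.spaces.Box(0.0, 2 * K, shape=(1,))  # observed stock

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.stock = 0, self.K / 2
        return np.array([self.stock], dtype=np.float32), {}

    def step(self, action):
        harvest = float(action[0]) * self.stock      # reward is the harvest taken
        self.stock -= harvest
        # logistic growth with multiplicative process noise
        self.stock += self.r * self.stock * (1 - self.stock / self.K)
        self.stock *= np.exp(self.np_random.normal(0, self.sigma))
        self.t += 1
        terminated = self.stock <= 0
        truncated = self.t >= self.n_steps
        return (np.array([self.stock], dtype=np.float32),
                harvest, terminated, truncated, {})
```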

Function files:

  • envs/growth_fns.py: a collection of growth functions (i.e. dynamical recruitment models) that define the actual dynamical model used in policy optimization.
  • eval_util.py: Data-generation utilities: simulating data under DRL and classical policies, interpolating policies, and plotting results.
  • msy_fns.py: Functions used to optimize a constant mortality policy (i.e. find the MSY); both classical policy types are sketched after this list.
  • one_fishery_esc_fns.py: Functions used to optimize a constant escapement policy in the single-fishery cases
  • two_fishery_esc_fns.py: Functions used to optimize a constant escapement policy in the two-fishery cases
  • uncontrolled_fns.py: Functions used to generate and plot the natural dynamics of the system (with no harvest)
  • util.py: general miscellaneous utility functions
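
For orientation, the two classical policy types that msy_fns.py and the escapement-policy files optimize reduce to simple harvest rules. The following is a generic sketch of their standard definitions, not code from the repository:

```python
def constant_escapement_policy(stock, escapement):
    """Harvest everything above a fixed escapement level; otherwise do not harvest."""
    return max(0.0, stock - escapement)

def constant_mortality_policy(stock, mortality):
    """Harvest a fixed fraction of the current stock (an MSY-style rule)."""
    return mortality * stock

# Example: with stock = 0.9 and escapement = 0.6 the harvest is 0.3;
# with mortality = 0.2 the harvest is 0.18.
print(constant_escapement_policy(0.9, 0.6), constant_mortality_policy(0.9, 0.2))
```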

Supporting scripts:

  • delete_cache.py: deletes all DRL policies saved in the 'cache' subdirectory.
  • esc_heatmap.py: generates a heatmap of the constant escapement strategy for different values of constant escapement in the two-fishery case.
  • esc_oneSp_performance.py: plots the performance of constant escapement strategies for several values of constant escapement in the single-fishery case.
  • rl_vs_esc_t.py: plots the average difference between DRL and optimal constant escapement policy actions as a function of time (averaged over time-bins).
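
The time-binned comparison in rl_vs_esc_t.py amounts to averaging a per-timestep difference within equal-width time bins; a generic numpy sketch of that operation (not the script's actual code):

```python
import numpy as np

def binned_mean(diff, n_bins):
    """Average a per-timestep difference series within equal-width time bins."""
    edges = np.linspace(0, len(diff), n_bins + 1, dtype=int)
    return np.array([diff[a:b].mean() for a, b in zip(edges[:-1], edges[1:])])

# e.g. average a 200-step difference series (here random) over 10 time bins
rng = np.random.default_rng(0)
print(binned_mean(rng.normal(size=200), n_bins=10))
```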

Quickstart

Clone this repository into RStudio using the 'new project' menu. When R starts up, follow the renv prompt and run renv::restore() to install the necessary dependencies (for both Python and R). The Python scripts can then be run in RStudio (interactively with the run button, or in batch with the 'source script' button) or in any other standard environment (VSCode, a bash terminal).

Manuscript

Manuscript source is in manuscript/manuscript.Rmd.
