Reinforcement Learning with Dynamic Convex Risk Measures

This GitHub repository contains the Python code to run the actor-critic algorithm and replicate the experiments from the paper Reinforcement Learning with Dynamic Convex Risk Measures by Anthony Coache and Sebastian Jaimungal. There is one folder for each set of experiments: statistical arbitrage, cliff walking, and hedging with friction. There is also a Python notebook that showcases how to use the code and replicate some of the experiments.

For further details on the algorithm and theoretical aspects of the problem, please refer to our paper.

Thank you for your interest in my research work. If you have any additional enquiries, please reach out to me at anthony.coache@mail.utoronto.ca.

Authors

Anthony Coache & Sebastian Jaimungal


All folders have the same structure, with the following files:

  • hyperparams.py
  • main.py
  • main_plot.py
  • actor_critic.py
  • envs.py
  • models.py
  • risk_measure.py
  • utils.py

hyperparams.py

This file contains functions to initialize and print all hyperparameters, both for the environment and the actor-critic algorithm.
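
As an illustration only, such functions might look like the sketch below; every name and value is a placeholder, not the repository's actual interface.

```python
# Illustrative sketch only: names and values are placeholders, not the
# repository's actual hyperparameters.
def init_params():
    """Initialize hyperparameters for the environment and the actor-critic algorithm."""
    env_params = {"n_steps": 5, "init_state": 0.0}
    algo_params = {
        "n_epochs": 300,        # training epochs
        "n_episodes": 500,      # outer episodes simulated per epoch
        "n_transitions": 100,   # inner transitions simulated per visited state
        "batch_size": 64,       # mini-batch size for the updates
        "lr_policy": 1e-3,      # learning rate of the policy network
        "lr_value": 1e-3,       # learning rate of the value-function network
        "hidden_nodes": 32,     # width of the hidden layers
    }
    return env_params, algo_params


def print_params(env_params, algo_params):
    """Print every hyperparameter before training starts."""
    for title, params in [("Environment", env_params), ("Algorithm", algo_params)]:
        print(f"--- {title} ---")
        for key, value in params.items():
            print(f"{key}: {value}")
```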

main.py

This file contains the program that runs the training phase. The first part imports the libraries and initializes all parameters, whether for the environment, the neural networks, or the risk measure. Notable parameters that the user must specify in the hyperparams.py file include the number of epochs, the learning rates, the size of the neural networks, and the number of episodes/transitions, among others. The next section is the training phase itself, whose skeleton is given in the paper; it mostly uses functions from the actor_critic.py file. Finally, the models for the policy and value function are saved in a folder, along with diagnostic plots.
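
To fix ideas, the overall shape of the training phase might resemble the sketch below; the agent object and its simulate, update_value, and update_policy methods are hypothetical names standing in for the repository's own functions (see the ActorCriticPG outline further down).

```python
# Hypothetical outline of the training phase; the agent interface
# (simulate, update_value, update_policy) is assumed, not the actual API.
import torch

def train(agent, params, out_dir="results"):
    for epoch in range(params["n_epochs"]):
        # simulate outer episodes and, from each visited state, inner transitions
        batch = agent.simulate(params["n_episodes"], params["n_transitions"])
        # alternate updates of the value function (critic) and the policy (actor)
        agent.update_value(batch, batch_size=params["batch_size"], lr=params["lr_value"])
        agent.update_policy(batch, batch_size=params["batch_size"], lr=params["lr_policy"])
    # save the trained networks, as main.py does at the end of training
    torch.save(agent.value.state_dict(), f"{out_dir}/value.pt")
    torch.save(agent.policy.state_dict(), f"{out_dir}/policy.pt")
```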

main_plot.py

This file contains the program that runs the testing phase. The first part imports the libraries and initializes all parameters; note that they must be identical to the ones used in main.py. The next section evaluates the policy found by the algorithm by running several simulations under the best behavior found by the actor-critic algorithm. Finally, it outputs graphics to assess the performance of the procedure, such as the preferred action in each possible state and the estimated distribution of the cost when following the best policy.
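
As an example of the kind of diagnostic produced here, the snippet below plots an estimated cost distribution; the costs array is a random placeholder standing in for costs simulated under the learned policy.

```python
# Stand-alone example of one testing-phase diagnostic: the estimated
# distribution of the cost. The `costs` array is simulated here as a
# placeholder for costs obtained by running the learned policy.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
costs = rng.normal(loc=1.0, scale=0.5, size=10_000)

plt.hist(costs, bins=60, density=True, alpha=0.7)
plt.axvline(np.quantile(costs, 0.9), linestyle="--", label="90% quantile")
plt.xlabel("terminal cost")
plt.ylabel("estimated density")
plt.legend()
plt.savefig("cost_distribution.png", dpi=150)
```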

actor_critic.py

The whole algorithm is wrapped into a single class named ActorCriticPG, whose input arguments specify which problem the agent faces. The user must provide an environment, a (convex) risk measure, and two neural network structures that play the roles of the value function and the agent's policy. Each instance of the class has functions to select actions from the policy, either at random or using the best behavior found so far, and to return the set of invalid actions. There is also a function to simulate (outer) episodes and (inner) transitions using the simulation-upon-simulation approach discussed in the paper. The update of the value function is wrapped in a function that takes as inputs the mini-batch size, the number of epochs, and characteristics of the value function's neural network, such as the learning rate and the number of hidden nodes. Similarly, another function implements the update of the policy and takes as inputs the mini-batch size and the number of epochs.
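
The outline below summarizes this structure; the method names and signatures are assumptions made purely for illustration, not the exact interface of ActorCriticPG.

```python
# Schematic outline of an actor-critic class with the responsibilities described
# above; method names and signatures are illustrative assumptions.
import torch

class ActorCriticPG:
    def __init__(self, env, risk_measure, policy, value):
        self.env = env            # environment the agent faces
        self.risk = risk_measure  # one-step (convex) risk measure
        self.policy = policy      # neural network for the policy
        self.value = value        # neural network for the value function

    def select_action(self, state, greedy=False):
        """Sample an action from the policy, or take the best action found so far."""
        logits = self.policy(state)
        if greedy:
            return torch.argmax(logits, dim=-1)
        return torch.distributions.Categorical(logits=logits).sample()

    def simulate(self, n_episodes, n_transitions):
        """Simulate (outer) episodes and (inner) transitions,
        following the simulation-upon-simulation approach."""
        ...

    def update_value(self, batch, batch_size, n_epochs=10, lr=1e-3):
        """Update the value function on mini-batches of simulated transitions."""
        ...

    def update_policy(self, batch, batch_size, n_epochs=10, lr=1e-3):
        """Policy-gradient update using the risk estimates from the critic."""
        ...
```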

envs.py

This file contains the environment class for the RL problem, along with functions to interact with it. It provides both PyTorch and NumPy versions of the simulation engine.
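
A toy environment exposing both versions of the step function could look like this; the dynamics below are purely illustrative and are not those of the repository's experiments.

```python
# Purely illustrative environment with NumPy and PyTorch simulation steps;
# the dynamics are placeholders, not those of the actual experiments.
import numpy as np
import torch

class ToyEnvironment:
    """One-dimensional state driven by additive Gaussian noise."""

    def __init__(self, sigma=0.2, seed=0):
        self.sigma = sigma
        self.rng = np.random.default_rng(seed)

    def step_numpy(self, state, action):
        """NumPy version: single transition returning (next_state, cost)."""
        next_state = state + action + self.sigma * self.rng.normal()
        cost = action**2 + abs(next_state)
        return next_state, cost

    def step_torch(self, state, action):
        """PyTorch version: vectorized over a batch of states and actions."""
        next_state = state + action + self.sigma * torch.randn_like(state)
        cost = action**2 + next_state.abs()
        return next_state, cost
```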

models.py

This file collects the model classes used to build ANN structures with the PyTorch library.
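
For instance, a fully-connected network of the kind typically used for a policy or value function can be built as follows; the sizes and activations are illustrative, not the repository's exact architectures.

```python
# Minimal feedforward network in PyTorch; layer sizes and activations are
# illustrative only.
import torch
import torch.nn as nn

class FeedForwardANN(nn.Module):
    """Fully-connected network mapping a state to an output vector."""

    def __init__(self, in_dim, out_dim, hidden=32, n_layers=2):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(n_layers):
            layers += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        layers.append(nn.Linear(dim, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# example: a policy head over 3 actions for a 2-dimensional state
policy_net = FeedForwardANN(in_dim=2, out_dim=3)
action_probs = torch.softmax(policy_net(torch.zeros(1, 2)), dim=-1)
```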

risk_measure.py

This file contains the class that creates an instance of a risk measure, with functions to compute the risk and calculate its gradient. The risk measures currently implemented are the expectation, the conditional value-at-risk (CVaR), the mean-semideviation, a penalized version of the CVaR, and a linear combination of the mean and CVaR; their precise definitions are given in the paper.

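For reference, the standard textbook forms of most of these risk measures, written for a cost random variable X, are recalled below; the exact parametrizations used in the paper (and the definition of the penalized CVaR) may differ, so please consult the paper for the authoritative statements.

```latex
% Standard forms for a cost X (illustrative; see the paper for the exact
% parametrizations and for the penalized CVaR).
\begin{align*}
  \rho(X) &= \mathbb{E}[X] && \text{(expectation)}\\
  \rho(X) &= \mathrm{CVaR}_\alpha(X)
           = \min_{z \in \mathbb{R}}\Big\{ z + \tfrac{1}{1-\alpha}\,\mathbb{E}\big[(X-z)_+\big] \Big\}
          && \text{(CVaR)}\\
  \rho(X) &= \mathbb{E}[X] + c\,\mathbb{E}\big[(X-\mathbb{E}[X])_+\big] && \text{(mean-semideviation)}\\
  \rho(X) &= \lambda\,\mathbb{E}[X] + (1-\lambda)\,\mathrm{CVaR}_\alpha(X) && \text{(mean and CVaR)}
\end{align*}
```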

utils.py

This file contains some useful functions and variables, such as a function to create new directories and colors for the visualizations.
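
For illustration, two such helpers might be the following; the names and color values are placeholders.

```python
# Illustrative helpers of the kind utils.py provides; names and colors are
# placeholders.
import os

def create_directory(path):
    """Create a new directory if it does not already exist."""
    os.makedirs(path, exist_ok=True)

colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]  # palette for the visualizations
```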

