# CPR appropriation baselines

This notebook contains training runs on Harvest for the DQN baseline described in the original paper, alongside a VPG baseline for comparison. The environment is a custom implementation of Harvest.

## Pre-requisites

The cells below install and import the libraries needed to run the notebook.

In [None]:
import sys
sys.path.append('../')

In [None]:
%%capture
!pip install -r ../init/requirements.txt
!pip install ../src/gym_cpr_grid

In [None]:
import numpy as np
import gym
import ray
import matplotlib.pyplot as plt
from IPython import display

from src import rllib

%load_ext autoreload
%autoreload 2

## Utilities

The cell below defines the variables shared by the rest of the notebook.

In [None]:
n_agents = 1
grid_width = 25
grid_height = 7
max_episodes = 1000
num_workers = 4
seed = 42
tagging_ability = False
gifting_mechanism = None
rllib_log_dir = "../rllib_logs/"
# Read the W&B API key with a context manager so the file handle is closed
with open("../wandb_api_key_file", "r") as f:
    wandb_api_key = f.read().strip()
wandb_project = "cpr-appropriation"
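The key above is read from a local file. As a minimal sketch of a slightly more defensive variant, the helper below first checks the `WANDB_API_KEY` environment variable (the variable that the `wandb` client itself reads) and only then falls back to the file. The function name `load_wandb_api_key` is hypothetical and not part of this repository.

```python
import os
from pathlib import Path


def load_wandb_api_key(key_file, env_var="WANDB_API_KEY"):
    """Hypothetical helper: prefer the environment variable, else read key_file."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    path = Path(key_file)
    if not path.is_file():
        # Fail early with a clear message instead of a bare open() error
        raise FileNotFoundError(
            f"No W&B API key found: set {env_var} or create {key_file}"
        )
    return path.read_text().strip()
```

With this helper, the assignment would read `wandb_api_key = load_wandb_api_key("../wandb_api_key_file")`.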

In [None]:
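# Shut down any running Ray instance so this cell can be re-run safely
ray.shutdown()
# local_mode=True runs Ray tasks serially in a single process, which eases debugging
ray.init(local_mode=True)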

## Random

This section runs a set of random agents in the environment, to illustrate the general Gym workflow and how rendering works.

In [None]:
env = gym.make(
    'gym_cpr_grid:CPRGridEnv-v0',
    n_agents=n_agents,
    grid_width=grid_width,
    grid_height=grid_height,
    initial_resource_probability=0.2
)

In [None]:
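observations = env.reset()
# Draw the initial frame once, then update the image in place at each step
fig, ax, img = env.plot(env.render(mode='rgb_array'))
for _ in range(env._max_episode_steps):
    display.display(plt.gcf())
    # Sample one random action per agent, keyed by agent handle
    action_dict = {h: env.action_space.sample() for h in range(env.n_agents)}
    print(action_dict)
    observations, rewards, dones, infos = env.step(action_dict)
    print(infos)
    display.clear_output(wait=True)
    # Refresh the displayed frame with the new state of the grid
    img.set_data(env.render(mode='rgb_array'))
env.close()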
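The loop above discards the rewards returned by `env.step`. As a rough sketch of how per-agent returns could be accumulated over an episode, the example below uses a small stub environment instead of the real one, so that it runs on its own. `StubMultiAgentEnv` and `rollout_returns` are hypothetical names, and two things are assumptions rather than facts about `CPRGridEnv`: that `rewards` is a dictionary keyed by agent handle (mirroring `action_dict`), and that `dones` carries an RLlib-style `"__all__"` entry.

```python
import random
from collections import defaultdict


class StubMultiAgentEnv:
    """Hypothetical stand-in for the CPR grid environment (not its real API)."""

    def __init__(self, n_agents=2, max_steps=5, seed=0):
        self.n_agents = n_agents
        self.max_steps = max_steps
        self._rng = random.Random(seed)
        self._t = 0

    def reset(self):
        self._t = 0
        return {h: 0 for h in range(self.n_agents)}

    def step(self, action_dict):
        self._t += 1
        observations = {h: self._t for h in action_dict}
        # Assumed reward scheme: 1 when an agent collects a resource, else 0
        rewards = {h: self._rng.choice([0, 1]) for h in action_dict}
        done = self._t >= self.max_steps
        dones = {h: done for h in action_dict}
        dones["__all__"] = done  # RLlib-style episode termination flag
        infos = {h: {} for h in action_dict}
        return observations, rewards, dones, infos


def rollout_returns(env, policy):
    """Run one episode and return the undiscounted return of each agent."""
    returns = defaultdict(float)
    observations = env.reset()
    done = False
    while not done:
        action_dict = {h: policy(obs) for h, obs in observations.items()}
        observations, rewards, dones, infos = env.step(action_dict)
        for handle, reward in rewards.items():
            returns[handle] += reward
        done = dones["__all__"]
    return dict(returns)


env_stub = StubMultiAgentEnv(n_agents=2, max_steps=5)
# Random policy over four hypothetical discrete actions
returns = rollout_returns(env_stub, policy=lambda obs: random.randrange(4))
print(returns)
```

Summing rewards per handle gives a quick sanity check on a random policy before any training is launched.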

## DQN baseline

In this section we train the DQN baseline reported in the original paper, using RLlib's implementation, to check whether our custom environment is implemented correctly.

In [None]:
experiment_analysis = rllib.rllib_baseline(
    "dqn",
    n_agents,
    grid_width,
    grid_height,
    wandb_project,
    wandb_api_key,
    rllib_log_dir,
    max_episodes,
    tagging_ability=tagging_ability,
    gifting_mechanism=gifting_mechanism,
    num_workers=num_workers,
    jupyter=False,
    seed=seed
)

## VPG baseline

In this section we train a Vanilla Policy Gradient (VPG) model, using RLlib's implementation, to see how it compares with the DQN baseline.

In [None]:
experiment_analysis = rllib.rllib_baseline(
    "vpg",
    n_agents,
    grid_width,
    grid_height,
    wandb_project,
    wandb_api_key,
    rllib_log_dir,
    max_episodes,
    tagging_ability=tagging_ability,
    gifting_mechanism=gifting_mechanism,
    num_workers=num_workers,
    jupyter=False,
    seed=seed
)