# Experiments Notebook
This notebook contains the calls needed to replicate the experiments run in this project.


### Reproduction
1. Select the benchmark and tasks by defining a set containing the benchmark (only one at a time) and a set containing all tasks to run. For example:
    ```
    DEFAULT_DATASETS = {"atari100k"}
    ATARI_TASKS = {"atari100k_krull", "atari100k_battle_zone", "atari100k_boxing"}
    ```
    Pass them to the `run_experiment` function as `datasets` and `tasks`, respectively.
2. The configurations defined in `presets.py` override those in `configs.yaml`. Make sure they are set as desired.
3. Run the experiment using the following command:

In [None]:
python experiments/experiment_definitions.py run_standard_dreamer --name "DreamerV3 Baseline" --description "DreamerV3 standard configuration run" --num_seeds 2
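
Before launching, the selection from step 1 can be sanity-checked. The sketch below is a hypothetical helper and not part of the repository: `validate_selection` is an illustrative name, and the check assumes that every task name starts with its benchmark name followed by an underscore, as in the example above.

```python
# Hypothetical helper; `validate_selection` is not part of the repository.
DEFAULT_DATASETS = {"atari100k"}
ATARI_TASKS = {"atari100k_krull", "atari100k_battle_zone", "atari100k_boxing"}


def validate_selection(datasets, tasks):
    """Check that exactly one benchmark is selected and all tasks belong to it."""
    if len(datasets) != 1:
        raise ValueError(f"expected exactly one benchmark, got {sorted(datasets)}")
    (benchmark,) = datasets
    # Assumption: task names are prefixed with "<benchmark>_".
    mismatched = sorted(t for t in tasks if not t.startswith(benchmark + "_"))
    if mismatched:
        raise ValueError(f"tasks not in benchmark {benchmark!r}: {mismatched}")
    return benchmark, sorted(tasks)


benchmark, tasks = validate_selection(DEFAULT_DATASETS, ATARI_TASKS)
print(benchmark, tasks)
```

Running the check first catches a mixed-benchmark selection before any training time is spent.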

### Optimized Replay Buffer
Follow steps 1-3 from the previous section. To activate the prioritized replay buffer, set `replay_context` to `1`. The other important configuration values we used are listed below:
* `"replay.fracs.uniform"`: `0.0`
* `"replay.fracs.priority"`: `1.0`
* `"replay.fracs.recency"`: `0.0`
* `"replay.prio.exponent"`: `0.8`
* `"replay.prio.maxfrac"`: `0.5`
* `"replay.prio.initial"`: `1.0`
* `"replay.prio.zero_on_sample"`: `False`
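
For reference, the values above can be collected as a flat override mapping. This is a minimal sketch: `REPLAY_OVERRIDES` and `nest` are illustrative names, and how `presets.py` actually stores its overrides is not shown here. The final check assumes that the three `replay.fracs.*` values act as mixture weights that should sum to 1.

```python
# Values copied from the list above; the names below are illustrative only.
REPLAY_OVERRIDES = {
    "replay_context": 1,
    "replay.fracs.uniform": 0.0,
    "replay.fracs.priority": 1.0,
    "replay.fracs.recency": 0.0,
    "replay.prio.exponent": 0.8,
    "replay.prio.maxfrac": 0.5,
    "replay.prio.initial": 1.0,
    "replay.prio.zero_on_sample": False,
}


def nest(flat):
    """Expand dotted keys into a nested dict, mirroring a YAML layout."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


config = nest(REPLAY_OVERRIDES)
# Assumption: the sampling fractions are mixture weights and should sum to 1.
fracs_total = sum(config["replay"]["fracs"].values())
if abs(fracs_total - 1.0) > 1e-9:
    raise ValueError(f"replay fractions sum to {fracs_total}, expected 1.0")
print(config["replay"]["prio"])
```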

<br><br>
Once set up, run:

In [None]:
python experiments/experiment_definitions.py run_replay_buffer_experiment --name "DreamerV3 Prioritized Replay Buffer" --description "DreamerV3 optimized replay buffer configuration run" --num_seeds 2

### Latent Reward Disagreement (Exp. Decay)
Follow steps 1-3 from the Reproduction section. To activate the latent reward disagreement, set `agent.use_intrinsic` to `True` and `agent.intrinsic.scheduling_strategy` to `"exp_decay"` for exponential decay scheduling. The other important configuration values we used for our experiments are listed below:
* `"agent.intrinsic.learn_strategy"`: `"joint_mlp"` (other options: `"ema"`, `"perturbed_starts"`)
* `"agent.intrinsic.exploration_type"`: `"reward_variance"` (other option: `"state_disagreement"`)
* `"agent.intrinsic.reward_type"`: `"disagreement"` (other options include `"prediction_error"` and `"max_disagreement"`)
* `"agent.intrinsic.scheduling_strategy"`: `"exp_decay"`
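
To illustrate the idea behind these settings, the sketch below computes a disagreement bonus as the variance of an ensemble's reward predictions and scales it with an exponentially decaying weight. It is a generic sketch under stated assumptions, not the project's implementation: the function names, the initial weight, and the decay rate are all hypothetical.

```python
import math
from statistics import pvariance


def reward_variance(ensemble_predictions):
    """Disagreement bonus: population variance of the ensemble's reward predictions."""
    return pvariance(ensemble_predictions)


def exp_decay_weight(step, initial=1.0, decay_rate=1e-4):
    """Weight that shrinks exponentially with the training step (hypothetical values)."""
    return initial * math.exp(-decay_rate * step)


def shaped_reward(extrinsic, ensemble_predictions, step):
    """Extrinsic reward plus the decayed disagreement bonus."""
    bonus = reward_variance(ensemble_predictions)
    return extrinsic + exp_decay_weight(step) * bonus


# Early in training the bonus counts fully; later it fades out.
predictions = [0.0, 1.0, 2.0]  # rewards predicted by three ensemble members
print(shaped_reward(1.0, predictions, step=0))
print(shaped_reward(1.0, predictions, step=100_000))
```

With a schedule of this shape, exploration driven by disagreement dominates early on and the extrinsic reward takes over as training progresses.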

<br><br>
Once set up, run:

In [None]:
python experiments/experiment_definitions.py run_latent_disagreement_experiment_exp_decay --name "DreamerV3 Latent Reward Disagreement with exponential decay scheduling" --description "DreamerV3 guided by latent reward disagreement with exponential decay scheduling" --num_seeds 2

### Latent Reward Disagreement (Exponential Moving Average Slope)
Follow steps 1-3 from the Reproduction section. To activate the latent reward disagreement, set `agent.use_intrinsic` to `True` and `agent.intrinsic.scheduling_strategy` to `"slope_ema"` for EMA slope scheduling. The other important configuration values we used for our experiments are listed below:
* `"agent.intrinsic.learn_strategy"`: `"joint_mlp"` (other options: `"ema"`, `"perturbed_starts"`)
* `"agent.intrinsic.exploration_type"`: `"reward_variance"` (other option: `"state_disagreement"`)
* `"agent.intrinsic.reward_type"`: `"disagreement"` (other options include `"prediction_error"` and `"max_disagreement"`)
* `"agent.intrinsic.scheduling_strategy"`: `"slope_ema"`
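
As a rough illustration of what an EMA slope signal can look like, the sketch below tracks an exponential moving average of the step-to-step change of a metric such as the episode return. It is only one plausible reading and not the project's implementation: `SlopeEMA`, `intrinsic_weight`, the smoothing factor, and the mapping from slope to weight are all hypothetical.

```python
class SlopeEMA:
    """Exponential moving average of the step-to-step change of a metric."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha      # smoothing factor in (0, 1]; hypothetical default
        self.previous = None    # last observed metric value
        self.slope = 0.0        # smoothed slope estimate

    def update(self, value):
        """Fold a new metric value into the smoothed slope and return it."""
        if self.previous is not None:
            delta = value - self.previous
            self.slope = self.alpha * delta + (1.0 - self.alpha) * self.slope
        self.previous = value
        return self.slope


def intrinsic_weight(slope, scale=1.0):
    """One possible mapping: the weight grows back toward 1 as improvement stalls."""
    return 1.0 / (1.0 + scale * max(slope, 0.0))


tracker = SlopeEMA(alpha=0.5)
for episode_return in [0.0, 2.0, 4.0, 4.0, 4.0]:
    slope = tracker.update(episode_return)
    print(round(slope, 3), round(intrinsic_weight(slope), 3))
```

In this sketch the intrinsic weight is small while the metric improves quickly and rises again once the smoothed slope flattens.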

<br><br>
Once set up, run:

In [None]:
python experiments/experiment_definitions.py run_latent_disagreement_experiment_ema --name "DreamerV3 Latent Reward Disagreement with EMA slope scheduling" --description "DreamerV3 guided by latent reward disagreement with EMA slope scheduling" --num_seeds 2

### Results
The results are logged in the `logdir`. To plot them, please refer to the README.

## Individual Contributions
Most of the ideation behind our extensions was conducted in brainstorming sessions that all team members attended. Even if it was not their main contribution, all team members contributed to all parts of this work. Parts of the implementation were done in peer-coding sessions.
- Lukas Bierling: Major efforts on the implementation of all extensions and their variants. Collaboration on the ideation and interpretation of results. Coordinated workstreams and repository use. General collaboration on ideation & implementation as indicated above.
- Davide Paserio: Design & parts of the Prioritized Replay Buffer implementation, contribution to the implementation of the latent reward disagreement. General collaboration on ideation & implementation as indicated above.
- Jan Henrik Bertrand: Design & parts of the Latent Reward Disagreement implementation. Design & implementation of the experimental framework. Running the experiments. General collaboration on ideation & implementation as indicated above.
- Kiki van Gerwen: Design of the custom plotting tool, contribution to the implementation of the latent reward disagreement. General collaboration on ideation & implementation as indicated above.