Sample Injection for Reinforcement Learning evaluation

This repository aims to analyze whether the method called "Sample Injection for Reinforcement Learning", proposed in the paper "Towards generating complex programs represented as node-trees with reinforcement learning", can in general help different reinforcement learning algorithms and environments achieve better and/or faster results.

Done:

Test sample injection with a state-of-the-art reinforcement learning algorithm in an easy environment:

To do so, the best-performing algorithm (TD3) according to the OpenAI Gym leaderboard for the OpenAI Gym environment "Pendulum-v0", as implemented by Kanishk Navale in a multi-agent manner (Cooperative-Deep-RL-Multi-Agents), was bug-fixed and modified.
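Pendulum-v0 is a continuous-control task with a single torque action, which makes it a cheap testbed for injection experiments. As a point of reference, a minimal interaction loop with the classic Gym API (assuming a gym version old enough to still ship Pendulum-v0) looks like this; the random action is only a stand-in for the TD3 agents' policy:

```python
import gym

env = gym.make("Pendulum-v0")
obs = env.reset()
episode_reward, done = 0.0, False
while not done:
    # Stand-in for the trained TD3 agents' action; Pendulum-v0 ends after 200 steps.
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    episode_reward += reward
print("episode reward:", episode_reward)
```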

To evaluate sample injection against the standard learning approach, three experiments were run.

Preparation:

Train a model from scratch for 1000 episodes and create samples of near-perfect episode execution:

- in MultiAgentProfiling/train.py set:
    - use_predefined_actions_prob = 0
    - load_predefined_actions = False
    - preinject_episodes = 0
    - n_games = 1000
- run `python MultiAgentProfiling/train.py` to train the model.
- Create samples from the trained model (a rough sketch of this recording step follows this list):
    - in test_agents.py set:
        - save_actions = True
        - use_predefined_actions = False
    - run `python test_agents.py` to run inference with the model and save action samples.
    - (optional: verify the good performance of the predefined actions):
        - in test_agents.py set:
            - use_predefined_actions = True
            - save_actions = False
            - uncomment line 91, # env.render(), to view the actions
        - run `python test_agents.py` again.
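Conceptually, the recording step replays the trained agents and stores each episode's action sequence so it can later be injected. A minimal sketch, assuming a NumPy-based format; the file name predefined_actions.npy and the random-action stand-in are hypothetical, not the repository's actual code:

```python
import gym
import numpy as np

env = gym.make("Pendulum-v0")
recorded_episodes = []
for _ in range(10):
    obs = env.reset()
    actions, done = [], False
    while not done:
        # Stand-in for the trained agents' policy; real code would query TD3 here.
        action = env.action_space.sample()
        actions.append(action)
        obs, reward, done, _ = env.step(action)
    recorded_episodes.append(np.array(actions))

# Pendulum-v0 episodes all last 200 steps, so the sequences stack cleanly.
np.save("predefined_actions.npy", np.stack(recorded_episodes))  # hypothetical file name
```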

1. Train the model from scratch for 300 episodes without sample injection:

- in MultiAgentProfiling/train.py set:
    - use_predefined_actions_prob = 0
    - load_predefined_actions = False
    - preinject_episodes = 0
- run `python MultiAgentProfiling/train.py` to train the model.
- Create a new folder in MultiAgentProfiling/data/ called *your experiment name1*
- Move the created train logs from MultiAgentProfiling/data/ to MultiAgentProfiling/data/*your experiment name1*

2. Train the model for 300 episodes with a steady 10% probability per episode of being sample injected (a sketch of this per-episode decision follows the list below):

- in MultiAgentProfiling/train.py set:
    - use_predefined_actions_prob = 0.1
    - load_predefined_actions = True
    - preinject_episodes = 0
- run `python MultiAgentProfiling/train.py` to train the model.
- Create a new folder in MultiAgentProfiling/data/ called *your experiment name2*
- Move the created train logs from MultiAgentProfiling/data/ to MultiAgentProfiling/data/*your experiment name2*
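The use_predefined_actions_prob flag amounts to a per-episode coin flip: with the given probability, a recorded near-perfect action sequence drives the environment instead of the agents' policy, so its high-reward transitions land in the replay buffer. A minimal sketch of that decision; the flag name follows this README, while run_episode is a hypothetical helper (experiment 1 corresponds to a probability of 0, i.e. no episode is ever injected):

```python
import random

use_predefined_actions_prob = 0.1  # experiment 2 setting

def run_episode(use_predefined_actions):
    """Hypothetical helper: run one episode; if use_predefined_actions,
    replay recorded actions instead of querying the agents' policy,
    still storing all transitions in the replay buffer."""

for episode in range(300):
    # Per-episode coin flip: with probability 0.1 this episode is "injected".
    run_episode(use_predefined_actions=random.random() < use_predefined_actions_prob)
```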

3. Train the model for 20 episodes with a steady 50% probability per episode of being sample injected, then reduce the probability of sample injection to 10% until 300 total episodes of training are reached (a sketch of this schedule follows the list below):

- in MultiAgentProfiling/train.py set:
    - use_predefined_actions_prob = 0.1
    - load_predefined_actions = True
    - preinject_episodes = 20
    - preinject_predefined_prob = 0.5
- run `python MultiAgentProfiling/train.py` to train the model.
- Create a new folder in MultiAgentProfiling/data/ called *your experiment name3*
- Move the created train logs from MultiAgentProfiling/data/ to MultiAgentProfiling/data/*your experiment name3*
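Experiment 3 combines the two knobs into a schedule: for the first preinject_episodes episodes the injection probability is preinject_predefined_prob, and afterwards it drops to use_predefined_actions_prob. A sketch of this interpretation of the description above (not the repository's exact code), reusing the hypothetical run_episode helper:

```python
import random

n_games = 300
preinject_episodes = 20
preinject_predefined_prob = 0.5    # injection probability for the first 20 episodes
use_predefined_actions_prob = 0.1  # injection probability for the remaining episodes

def run_episode(use_predefined_actions):
    """See the sketch in experiment 2; hypothetical helper."""

for episode in range(n_games):
    prob = preinject_predefined_prob if episode < preinject_episodes else use_predefined_actions_prob
    run_episode(use_predefined_actions=random.random() < prob)
```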

Results:

In MultiAgentProfiling/profile.py set:

- experiment1 = *your experiment name1*
- experiment2 = *your experiment name2*
- experiment3 = *your experiment name3*

Run `python MultiAgentProfiling/profile.py` to create the plot; a rough sketch of this profiling step follows below.
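Conceptually, the profiling step loads each experiment's per-episode score log, smooths it, and plots the three curves on one figure. A minimal sketch, assuming NumPy score logs; the file name score_log.npy and the folder layout are assumptions, not the repository's actual format:

```python
import numpy as np
import matplotlib.pyplot as plt

experiments = ["your experiment name1", "your experiment name2", "your experiment name3"]
for name in experiments:
    scores = np.load(f"MultiAgentProfiling/data/{name}/score_log.npy")  # hypothetical file name
    # Running mean over 20 episodes to make the convergence trend visible.
    smoothed = np.convolve(scores, np.ones(20) / 20, mode="valid")
    plt.plot(smoothed, label=name)

plt.xlabel("episode")
plt.ylabel("average score")
plt.legend()
plt.savefig("training_profile.png")
```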

*Figure: training profiles of the three experiments, results averaged over the agents.*

While all models eventually reach the same training performance, the models trained with sample injection converge noticeably faster early on, up to roughly episode 110. Injected episodes are excluded from this result plot. This looks promising for harder-to-solve environments with larger action and observation spaces, where "sample injected" algorithms might find high-reward actions faster.

TODO:

- Benchmark sample injection vs. supervised pre-training.
- Test sample injection with a state-of-the-art reinforcement learning algorithm in a more difficult environment such as Ant-v2.
- Apply an actor-critic algorithm on top to decide per episode whether to use sample injection.
