To implement a new agent and/or model, you should only need to create a new file, as outlined below, with a couple of functions implemented. Infrastructure such as the general training loop and the collection of statistics during training is handled by agent_runner.py and agent.py, so you can focus on the model- and agent-specific pieces and not have to worry about the infrastructure.
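For orientation, the layout implied by the files referenced in this README looks roughly like the sketch below. This is an assumption about the repository structure based only on the names mentioned here, not an exact listing:

```
main.py              # entry point; parses the training/testing arguments shown below
agent_runner.py      # general training loop and statistics collection
agent.py             # shared agent functionality (e.g., the buffer mentioned below)
agents/
    sample_agent.py  # reference DQN-style agent to copy from
models/
    sample_model.py  # reference torch model
archive/             # created automatically; one subdirectory per run
```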
- Benchmark numerous agent types across discrete and continuous games
- Models considered:
  - DQN Models:
    - "Vanilla" DQN
    - Dueling DQN
    - Dueling Double DQN
  - Policy Gradient Models:
    - MC (REINFORCE)
    - PPO
    - "Vanilla"
Tasks:
- Create a new agent file in the agents directory (a minimal sketch of such a file appears below, after these steps):
  - Create your optimizer and assign it to self.optimizer.
  - Create your loss function and assign it to self.criterion.
  - Perform any other agent-specific setup needed.
    - EX: In DQN you would create your target model, as seen in sample_agent.py.
  - Note that epsilon and test are passed as arguments here.
  - The buffer already exists for your agent, so you can just copy the line from sample_agent.py unless you have specific functionality to perform.
  - Again, you will likely just need to copy the method from sample_agent.py.
  - can_train is called at each step within an episode:
    - Return True if the agent is able to be trained.
    - Return False if the agent is not able to be trained.
    - EX: For DQN, you would return False until you have enough entries in your replay buffer, as seen in sample_agent.py.
  - train() is called at each step of the training loop if can_train is True. Pseudocode from agent_runner.py:

        for episode:
            for step:
                train()  # <-- this method

  - The current tuple for the step is provided, and any training should be performed here.
  - This is where you calculate the loss and perform gradient descent.
    - EX: In sample_agent.py you will see that this is where the replay buffer is used.
  - NOTE: The loss should be returned at the end of this function.
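To make the steps above concrete, here is a minimal sketch of what a new agent file might look like. It assumes a DQN-style agent; the class name, constructor signature, buffer handling, and discount factor are placeholders rather than the repo's actual API, so defer to agent.py and sample_agent.py for the real interface.

```python
# agents/my_agent.py -- illustrative sketch only, not the repo's actual API
import copy

import torch
import torch.nn as nn
import torch.optim as optim


class MyAgent:  # in the real repo this would follow the base agent in agent.py
    def __init__(self, model, batch_size=32, lr=1e-3, gamma=0.99):
        self.model = model
        self.batch_size = batch_size
        self.gamma = gamma

        # Checklist items: create the optimizer and the loss function.
        self.optimizer = optim.Adam(self.model.parameters(), lr=lr)
        self.criterion = nn.MSELoss()

        # Other agent-specific setup, e.g. a DQN target network.
        self.target_model = copy.deepcopy(self.model)

        # Placeholder; the README notes a buffer already exists for your agent.
        self.buffer = []

    def can_train(self):
        # Called at each step within an episode.
        # For DQN: not trainable until the buffer has enough entries.
        return len(self.buffer) >= self.batch_size

    def train(self, step_tuple):
        # Called at each step of the training loop when can_train() is True.
        # step_tuple is the current transition; sample_agent.py samples a batch
        # from its replay buffer here instead, but a single-transition update
        # keeps this sketch short.
        state, action, reward, next_state, done = step_tuple

        q_value = self.model(state.unsqueeze(0))[0, action]
        with torch.no_grad():
            next_q = self.target_model(next_state.unsqueeze(0)).max().item()
        target = reward + self.gamma * next_q * (1.0 - float(done))
        loss = self.criterion(q_value, torch.tensor(target))

        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss  # NOTE: the loss must be returned at the end
```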
- Create a new model in the models directory. This really just involves creating a normal torch model.
  - See sample_model.py for an example.
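Since the model is just a normal torch model, something along these lines is usually sufficient. The class name, constructor arguments, and layer sizes below are placeholders; mirror sample_model.py for the interface the runner actually expects.

```python
# models/my_model.py -- illustrative sketch; see sample_model.py for the real interface
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self, input_size=4, num_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, 128),
            nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, x):
        # Returns one value (e.g., a Q-value or logit) per action.
        return self.net(x)
```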
    python main.py --list
Training remains largely unchanged from Project 3, though there are now some required arguments and many more optional arguments:
    python main.py --train_dqn --agent sample_agent.SampleAgent --model sample_model.SampleModel --run_name training_run
After a run is executed, an "archive" directory is created containing data from each run performed. Each run's directory name is built from whatever is passed for run_name, with the date appended to the end.
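For example, a run launched with --run_name training_run would produce a directory along the lines of ./archive/training_run_<date>/ (the exact date format is whatever agent_runner.py appends); the saved model and optimizer paths passed to the test command below point into this archive.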
Testing uses the test_dqn argument, along with arguments indicating which saved model and optimizer to use:
    python main.py --test_dqn --agent sample_agent.SampleAgent --model sample_model.SampleModel --run_name test_run --model_path ./archive/training_run/my_model.pth --optimizer_path ./archive/training_run/my_optimizer.pth