
Penalty Shot Task

We provide a platform for pitting state-of-the-art (SOTA) deep reinforcement learning algorithms against each other on the Penalty Shot task, in which two agents simulate a penalty shootout. Do check out our results here and the references here. This was developed as part of a course project for CS698R (Deep Reinforcement Learning) at IIT Kanpur, under Prof. Ashutosh Modi.

Table of Contents

  • Features
  • How to get started
  • Codebase
  • Authors
  • Want to Contribute?
  • Literature References
  • Acknowledgement

Features

  • Rendering of the custom environment at each step
  • Fully configurable environments and policies
  • Async server that lets you play against a policy manually

Back to TOC

How to get started

Install packages

The -e flag installs the project packages in editable (development) mode.

Log in to wandb.ai to record your experiment runs:

pip install -e .
pip install -e ./gym-env
wandb login

Back to TOC

Train and test a model

Use the files in utils/config/ to configure agent-specific policy hyper-parameters and environment parameters.

The example command below trains both the puck and the bar with the PPO algorithm, loads a previously saved policy for each agent, and uses 1 training environment and 2 test environments:

python ./utils/train.py  --wandb-name "Name for Wandb Run" --training-num 1 --test-num 2 --puck ppo --bar ppo --load-puck-id both_ppo --load-bar-id both_ppo 

Back to TOC

To play as the bar

Open three terminals and run one of the following commands in each:

python ./examples/server/start_server.py
python ./examples/server/agent_puck.py
python ./examples/server/agent_bar.py

Click Start and use the mouse slider to control the direction of the bar.

Back to TOC

Codebase

Game Environment

The environment consists of a puck and a bar, with the puck moving towards the bar at a constant horizontal speed. Each is controlled by a separate agent. The puck's goal is to move past the bar and reach the final line, while the bar's goal is to catch the puck before it can reach the final line.

The environment is built with the OpenAI Gym library. Its step function accepts two actions, one for the puck and one for the bar, advances the game by one time step, and returns a tuple of state, reward, completion flag, and an extra-information object. See code. A minimal usage sketch is given below. Back to TOC
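
A minimal interaction sketch, under stated assumptions: the environment id "penalty-shot-v0" and the action ranges below are placeholders, not the repo's registered names, so check gym-env for the actual registration and action space.

import gym
import numpy as np

env = gym.make("penalty-shot-v0")  # hypothetical id; see gym-env for the real one
state = env.reset()
done = False
while not done:
    puck_action = np.random.uniform(-1, 1)  # assumed vertical control for the puck
    bar_action = np.random.uniform(-1, 1)   # assumed vertical control for the bar
    state, reward, done, info = env.step((puck_action, bar_action))
    env.render()
env.close()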

Agents

  • lib-agents: Features trivial, value-based, and policy-based algorithms, including smurves, DQN, TD3, PPO, and DDPG.
  • comm-agents: Implements the hardcoded approach used to establish a baseline and a pure exploration strategy. It also implements the mouse slider.
  • Also implements a TwoAgentPolicyWrapper that combines the puck and bar policies into a single joint policy (see the sketch after this list). See code. Back to TOC
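
A conceptual sketch of the idea behind such a wrapper. The class and method names below (TwoAgentJointPolicy, act) are illustrative assumptions; the actual TwoAgentPolicyWrapper in comm-agents has its own interface.

class TwoAgentJointPolicy:
    # Illustrative only: holds one policy per agent and returns their actions
    # as the (puck, bar) pair that the environment's step function expects.
    def __init__(self, puck_policy, bar_policy):
        self.puck_policy = puck_policy
        self.bar_policy = bar_policy

    def act(self, state):
        puck_action = self.puck_policy.act(state)  # assumed per-policy act(state) method
        bar_action = self.bar_policy.act(state)
        return puck_action, bar_action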

Utils

  • Contains the training script and utility functions implementing wrappers
  • Stores policy and environment configuration information. See code. Back to TOC

Async Communication

To support asynchronous inputs from the agents, we created a main server that controls the environment. Agents use a client class to connect to the server and call its step function to submit their actions and receive the corresponding result tuple. The server collects the actions from both agents, synchronizes them, and advances the environment by one time step. See code. A sketch of the synchronization idea is given below. Back to TOC
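
A minimal sketch of the synchronization idea, assuming an asyncio-style server. The names below (SyncServer and its step method) are illustrative and not the repo's actual classes.

import asyncio

class SyncServer:
    # Illustrative synchronizer: waits until both agents have submitted an
    # action for the current step, then advances the environment once and
    # hands the same result tuple back to both callers.
    def __init__(self, env):
        self.env = env
        self.actions = {}
        self.cond = asyncio.Condition()
        self.step_id = 0
        self.result = None

    async def step(self, agent, action):
        async with self.cond:
            self.actions[agent] = action
            my_step = self.step_id
            if len(self.actions) == 2:
                # Second arrival triggers the actual environment step.
                self.result = self.env.step((self.actions["puck"], self.actions["bar"]))
                self.actions.clear()
                self.step_id += 1
                self.cond.notify_all()
            else:
                # First arrival waits for the step to complete.
                await self.cond.wait_for(lambda: self.step_id > my_step)
            return self.result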

Examples

  • Scripts for playing against the puck as the bar
  • A notebook demonstrating smurves. See code. Back to TOC

Authors

Back to TOC

Want to Contribute?

We welcome everyone to report bugs, raise issues, add features, etc. Drop a mail to the authors to discuss further.

Back to TOC

Literature References

  • McDonald, K.R., Broderick, W.F., Huettel, S.A. et al. Bayesian nonparametric models characterize instantaneous strategies in a competitive dynamic game. Nat Commun 10, 1808 (2019). https://doi.org/10.1038/s41467-019-09789-4
  • Iqbal SN, Yin L, Drucker CB, Kuang Q, Gariépy JF, et al. (2019) Latent goal models for dynamic strategic interaction. PLOS Computational Biology 15(3): e1006895. https://doi.org/10.1371/journal.pcbi.1006895
  • Kelsey R McDonald, John M Pearson, Scott A Huettel, Dorsolateral and dorsomedial prefrontal cortex track distinct properties of dynamic social behavior, Social Cognitive and Affective Neuroscience, Volume 15, Issue 4, April 2020, Pages 383–393, https://doi.org/10.1093/scan/nsaa053
  • He He, Jordan Boyd-Graber, Kevin Kwok, and Hal Daumé III. Opponent Modeling in Deep Reinforcement Learning. arXiv:1609.05559

Back to TOC

Acknowledgement

  • We thank Prof. Ashutosh Modi for providing us with this opportunity to explore and learn more about SOTA algorithms through a project.
  • We thank our mentor, Avisha Gaur, for guiding us and clarifying our doubts related to the environment.
  • We thank our TA, Gagesh Madaan for providing valuable inputs regarding various solution approaches.
  • We also thank the creators of Tianshou and the OpenAI Gym library, which form a core part of our codebase.
  • We thank the open source community for wonderful libraries for everything under the sun!

Back to TOC