We use Multi-Agent Deep Deterministic Policy Gradient (MADDPG) to train agents on our custom environment.
In our setup, each action is a set of control points that define a Bézier curve, which is used to plan the agent's trajectory. The trajectory followed by the agent is taken as the observation.
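As a rough illustration of how control points can be turned into a trajectory, the sketch below evaluates a Bézier curve in its Bernstein form and samples it at evenly spaced parameter values. The function names `bezier_point` and `bezier_trajectory`, the number of samples, and the 2D control points are assumptions made for this example and are not taken from the repository.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] (Bernstein form)."""
    n = len(control_points) - 1          # degree of the curve
    dim = len(control_points[0])         # e.g. 2 for a planar grid
    point = [0.0] * dim
    for i, cp in enumerate(control_points):
        # Bernstein basis polynomial B_{i,n}(t)
        weight = comb(n, i) * (1 - t) ** (n - i) * t ** i
        for d in range(dim):
            point[d] += weight * cp[d]
    return tuple(point)


def bezier_trajectory(control_points, num_samples=20):
    """Sample the curve at evenly spaced t values, endpoints included."""
    return [
        bezier_point(control_points, k / (num_samples - 1))
        for k in range(num_samples)
    ]


# Hypothetical action: three 2D control points (a quadratic curve).
action = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
trajectory = bezier_trajectory(action, num_samples=5)
```

The curve starts at the first control point and ends at the last one, while the intermediate control points only shape the path in between.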
The reward structure is as follows:
- -10 for any collision
- +1 for reaching the target.
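
The reward structure above can be sketched as a small function. This is only an illustration: the name `compute_reward`, the boolean inputs, the zero reward when neither event occurs, and the choice to sum both terms when they occur in the same step are assumptions, not details confirmed by the repository.

```python
COLLISION_REWARD = -10.0  # penalty for any collision
TARGET_REWARD = 1.0       # bonus for reaching the target


def compute_reward(collided, reached_target):
    """Return the per-step reward for one agent.

    Assumption: the two terms are additive, and the reward is 0
    when neither event happens.
    """
    reward = 0.0
    if collided:
        reward += COLLISION_REWARD
    if reached_target:
        reward += TARGET_REWARD
    return reward
```

For example, `compute_reward(True, False)` gives the collision penalty alone, and `compute_reward(False, True)` gives the target bonus alone.
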
git clone https://github.com/COPS-IITBHU/MultiAgent_Grid.git
cd MultiAgent_Grid
pip install -r requirements.txt
python train.py
python eval.py