The configuration for this run is listed at the end. The training process took about 35 minutes.
To get started with this project, follow these steps:

- **Clone the Repository:**

  ```bash
  git clone https://github.com/ProfessorNova/PPO-Car.git
  cd PPO-Car
  ```

- **Set Up Python Environment:** Make sure you have Python installed (tested with Python 3.10.11).

- **Install Dependencies:** Run the following command to install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

  For a proper PyTorch installation, visit [pytorch.org](https://pytorch.org) and follow the instructions for your system configuration.

- **Train the Model:** To start training, run:

  ```bash
  python train.py --run-name "my_run"
  ```

  To train using a GPU, add the `--cuda` flag:

  ```bash
  python train.py --run-name "my_run" --cuda
  ```

  A window will open where you can select the track you want to train on. (You can also create your own track with the `track_editor.py` script; more on that below.)

- **Monitor Training Progress:** You can monitor training by viewing the videos in the `videos` folder or by looking at the graphs in TensorBoard (you might need to install TensorBoard first):

  ```bash
  tensorboard --logdir "logs"
  ```
This environment simulates a simple 2D car driving on a track. The track layout is defined by a JSON file. The objective is for the car to navigate the track, passing through reward gates while avoiding walls. The car has adjustable velocity and can make sharp turns.
The action space is a `Discrete(9)` space with the following actions:

- `0`: Move forward
- `1`: Move backward
- `2`: Turn left
- `3`: Turn right
- `4`: Move forward-left
- `5`: Move forward-right
- `6`: Move backward-left
- `7`: Move backward-right
- `8`: Do nothing (reduces velocity)
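The nine actions are the cross product of throttle (forward, backward, none) and steering (left, right, none). A minimal sketch of such a decoding (the tuple encoding below is an illustration, not the project's actual implementation):

```python
# Map each Discrete(9) action index to a (throttle, steer) pair.
# throttle: +1 forward, -1 backward, 0 none; steer: -1 left, +1 right, 0 none.
# NOTE: this encoding is illustrative; see the project source for the real one.
ACTION_MAP = {
    0: (+1, 0),   # forward
    1: (-1, 0),   # backward
    2: (0, -1),   # turn left
    3: (0, +1),   # turn right
    4: (+1, -1),  # forward-left
    5: (+1, +1),  # forward-right
    6: (-1, -1),  # backward-left
    7: (-1, +1),  # backward-right
    8: (0, 0),    # do nothing (velocity decays)
}

def decode_action(action: int) -> tuple:
    """Return the (throttle, steer) pair for a discrete action index."""
    return ACTION_MAP[action]
```

Combining throttle and steering into single indices like this keeps the policy head a simple 9-way categorical distribution.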
The observation space is a `Box` of shape `(6 + num_rays,)` with the following features:

- `0`: Normalized x position (range: 0 to 1)
- `1`: Normalized y position (range: 0 to 1)
- `2`: Normalized x velocity (range: -1 to 1)
- `3`: Normalized y velocity (range: -1 to 1)
- `4`: Cosine of the car's angle (range: -1 to 1)
- `5`: Sine of the car's angle (range: -1 to 1)
- `6` to `5 + num_rays`: Distance to the nearest wall for each ray
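Assembled in order, the observation is one flat vector. A minimal sketch of how such a vector could be built from the car's state (the function and argument names are assumptions for illustration, not the project's actual code):

```python
import math

def build_observation(x, y, vx, vy, angle, ray_distances):
    """Assemble the flat observation vector described above.

    x, y          -- position already normalized to [0, 1]
    vx, vy        -- velocity already normalized to [-1, 1]
    angle         -- car heading in radians
    ray_distances -- distance to the nearest wall along each ray
    """
    obs = [x, y, vx, vy, math.cos(angle), math.sin(angle)]
    obs.extend(ray_distances)  # one entry per ray
    return obs

# Example with num_rays = 3: the vector has 6 + 3 = 9 entries.
obs = build_observation(0.5, 0.25, 0.1, -0.2, math.pi / 2, [1.0, 0.8, 0.6])
```

Encoding the heading as cosine and sine (rather than a raw angle) avoids the discontinuity at the 0/2π wrap-around.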
The reward structure is as follows:

- `+0.01`: For taking a forward action.
- `+1.0`: For passing through a reward gate.
- `-3.0`: For hitting a wall.
- `+10.0`: For completing a lap.
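Summed per step, these terms could look like the following sketch (the event flags are hypothetical names, not the project's actual API):

```python
def step_reward(moved_forward, passed_gate, hit_wall, completed_lap):
    """Combine the per-step reward terms listed above."""
    reward = 0.0
    if moved_forward:
        reward += 0.01  # small incentive to keep moving
    if passed_gate:
        reward += 1.0   # progress along the track
    if hit_wall:
        reward -= 3.0   # crash penalty (also ends the episode)
    if completed_lap:
        reward += 10.0  # lap bonus
    return reward
```

Note that training also applies a `reward_scaling` factor (see the hyperparameters below), which presumably multiplies this sum before it reaches the learner.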
The car starts at a predefined position and direction, configurable in the track JSON file. An episode ends if the car hits a wall or if the maximum time step count (1000) is reached. You can set the path to the track JSON file in the environment's reset function, as demonstrated in the `train.py` script.
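A track file might carry the borders, gates, and start pose roughly like the sketch below. The schema shown here is an assumption for illustration only; open one of the repository's track files (e.g. `big_track.json`) for the real format:

```python
import json

# Hypothetical track layout; the project's actual schema may differ.
track_json = """
{
    "outer_border": [[0, 0], [100, 0], [100, 60], [0, 60]],
    "inner_border": [[20, 20], [80, 20], [80, 40], [20, 40]],
    "reward_gates": [[[0, 0], [20, 20]], [[100, 0], [80, 20]]],
    "start_position": [10, 10],
    "start_angle": 0.0
}
"""

track = json.loads(track_json)
start = track["start_position"]   # where the car is placed on reset
angle = track["start_angle"]      # its initial heading
```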
To create a custom track, follow these steps:

1. **Run the Track Editor Script:** Execute the following command in your terminal to launch the track editor:

   ```bash
   python track_editor.py
   ```

   This opens a window where you can draw the track layout.

2. **Draw the Outer Border:**
   - Click within the window to place points and define the outer border of the track.
   - If you make a mistake, press `c` to clear the entire track and start over.
   - Once satisfied with the outer border, press `n` to close the loop and move to the next step.

3. **Draw the Inner Border:**
   - Follow the same process to draw the inner border of the track.
   - Press `n` once you've completed the inner border to proceed.

4. **Place Reward Gates:**
   - Place the reward gates along the track. The first gate serves as the finish line.
   - Ensure the gates are placed in the order the car should pass them.
   - Press `n` after placing all the gates.

5. **Set the Start Position and Direction:**
   - Click to place the car's starting position and define its initial direction.
   - When ready, press `s` to save the track.
For a visual guide, refer to this GIF:
The default hyperparameters used in training are defined in the `parse_args()` function inside the `train.py` script. You can also list them by running:

```bash
python train.py --help
```

If your system has limited RAM, consider lowering the `n_envs` parameter to reduce memory usage.
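Those defaults follow the standard `argparse` pattern. A minimal sketch of what `parse_args()` might look like (only a few flags are shown, and apart from `--run-name` and `--cuda` their exact names are assumptions):

```python
import argparse

def parse_args(argv=None):
    """Parse training hyperparameters from the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--run-name", type=str, required=True,
                        help="name used for logs and videos")
    parser.add_argument("--cuda", action="store_true",
                        help="train on the GPU if available")
    parser.add_argument("--n-envs", type=int, default=24,
                        help="number of parallel environments")
    parser.add_argument("--learning-rate", type=float, default=3e-4,
                        help="optimizer step size")
    return parser.parse_args(argv)

# Parsing an explicit argument list, as if run from the shell:
args = parse_args(["--run-name", "my_run", "--cuda"])
```

Because every hyperparameter is a flag with a default, any of them can be overridden per run without editing the script.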
Here are the specifications of the system used for training:
- CPU: AMD Ryzen 9 5900X
- GPU: Nvidia RTX 3080 (12GB VRAM)
- RAM: 64GB DDR4
- OS: Windows 11
The training process utilized the `big_track.json` file with the following hyperparameters:
- n_envs: 24
- n_epochs: 200
- n_steps: 1024
- batch_size: 512
- train_iters: 40
- gamma: 0.99
- gae_lambda: 0.95
- clip_coef: 0.2
- vf_coef: 0.5
- ent_coef: 0.001
- max_grad_norm: 1.0
- learning_rate: 3e-4
- learning_rate_decay: 0.99
- reward_scaling: 0.1
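These settings imply the scale of the run: each epoch collects `n_envs * n_steps = 24 * 1024 = 24,576` transitions, split into 48 minibatches of 512, for about 4.9 million environment steps over 200 epochs, while the learning rate decays geometrically. A quick check of that arithmetic (assuming one decay step per epoch):

```python
n_envs, n_steps, n_epochs = 24, 1024, 200
batch_size, learning_rate, decay = 512, 3e-4, 0.99

samples_per_epoch = n_envs * n_steps            # transitions gathered per epoch
minibatches = samples_per_epoch // batch_size   # minibatches per pass
total_steps = samples_per_epoch * n_epochs      # total environment steps

# Learning rate entering the final epoch, one decay step per epoch assumed.
final_lr = learning_rate * decay ** (n_epochs - 1)
```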
The following charts provide insights into the performance during training: