A reinforcement learning-based multi-agent path planning system using Soft Actor-Critic (SAC) with curriculum learning and attention mechanisms.
- Multi-Agent Coordination: Leader-follower formation control with dynamic agent count adaptation
- Curriculum Learning: Progressive task difficulty with automatic knowledge transfer
- Attention Mechanism: Structured attention networks for agent communication
- Multiple Algorithms:
  - SAC (Soft Actor-Critic)
  - MASAC (Multi-Agent SAC with attention)
  - H-CRRT (Hierarchical RRT* baseline)
- Flexible Environment: Customizable obstacles, goals, and agent configurations
- Visualization: Real-time rendering and trajectory analysis
```
path planning2/
├── main_SAC.py                 # Basic SAC implementation
├── main_SAC_curriculum.py      # Curriculum learning with MASAC
├── requirements.txt            # Python dependencies
│
├── rl_env/                     # Reinforcement learning environment
│   ├── path_env.py             # Main environment interface
│   └── components/             # Entity management, rewards, rendering
│
├── masac_adapter/              # Multi-agent SAC with attention
│   ├── actor_networks.py       # Leader and follower actor networks
│   ├── critic_networks.py      # Structured attention critic
│   ├── smer_memory.py          # Experience replay buffer
│   └── masac_controller.py     # MASAC controller
│
├── curriculum/                 # Curriculum learning framework
│   ├── curriculum_manager.py   # Task progression manager
│   ├── task_generator.py       # Task difficulty generator
│   ├── task_sequencer.py       # Task ordering
│   └── knowledge_transfer.py   # Policy transfer between tasks
│
├── H_CRRT/                     # Baseline RRT* planner
│   ├── rrtstar.py              # RRT* path planning
│   ├── formation.py            # Formation control
│   └── tracking.py             # Path tracking controller
│
├── masac_no_curriculum/        # Ablation study versions
└── masac_no_attention/
```
- Python 3.7+
- CUDA-capable GPU (recommended for training)
Install the dependencies:

```bash
pip install -r requirements.txt
```

Key dependencies:
- PyTorch >= 1.8.0
- NumPy
- Pygame (for visualization)
- Matplotlib (for plotting)
- TensorBoard (for training logs)
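Before a long training run, it can be worth checking that PyTorch actually sees the GPU. This is a generic check using standard PyTorch calls, nothing project-specific:

```python
# Quick sanity check that PyTorch was installed with CUDA support.
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available:  {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU device:      {torch.cuda.get_device_name(0)}")
```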
Train a basic SAC agent:
```bash
python main_SAC.py --mode train
```

Train with progressive curriculum:
```bash
python main_SAC_curriculum.py --use_curriculum
```

Test a trained curriculum model:

```bash
python main_SAC_curriculum.py --test --model_path Path_SAC_curriculum_step4_ep80
```

More usage examples:

```bash
# Basic SAC
python main_SAC.py --mode train
python main_SAC.py --mode train --render
python main_SAC.py --mode test
python main_SAC.py --mode test --model_path "D:/pa/path planning2/Path_SAC_actor_L1.pth"

# Curriculum MASAC
python main_SAC_curriculum.py --use_curriculum
python main_SAC_curriculum.py --use_curriculum --render
python main_SAC_curriculum.py --test
python main_SAC_curriculum.py --log_level warning

# Full test configuration
python main_SAC_curriculum.py --test \
    --model_path "D:/pa/path planning2/Path_SAC_curriculum_step4_ep80" \
    --test_episodes 100 \
    --hero_count 1 \
    --enemy_count 3 \
    --obstacle_count 2
```

| Argument | Type | Default | Description |
|---|---|---|---|
| --mode | str | train | Mode: train or test |
| --render | flag | False | Enable visualization |
| --use_curriculum | flag | False | Enable curriculum learning |
| --test | flag | False | Run in test mode |
| --test_episodes | int | 100 | Number of test episodes |
| --hero_count | int | 1 | Number of leader agents |
| --enemy_count | int | 4 | Number of follower agents |
| --obstacle_count | int | 1 | Number of obstacles |
| --model_path | str | - | Path to saved model |
| --log_level | str | info | Logging level: debug, info, warning, error |
| --test_speed | float | 1.0 | Test simulation speed |
| --analyze | flag | False | Generate analysis plots |
| --result_path | str | results/ | Path to save results |
Run the H-CRRT baseline:

```bash
cd H_CRRT
python run_hcrrt.py --hero_count 1 --enemy_count 3 --obstacle_count 3 --test_episodes 20
```

Edit the environment configuration in `rl_env/components/entities.py`:
```python
SCREEN_W = 800       # Environment width
SCREEN_H = 600       # Environment height
AREA_X = 100         # Valid area boundaries
AREA_Y = 100
AREA_WITH = 600
AREA_HEIGHT = 500
```
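Read together, these constants presumably describe a valid rectangular area (top-left corner at `AREA_X`, `AREA_Y`; extent `AREA_WITH` by `AREA_HEIGHT`) inside the `SCREEN_W` x `SCREEN_H` window. The helper below is only a sketch of that relationship, not code from the repo:

```python
# Sketch only: shows how the boundary constants above relate to each other.
SCREEN_W, SCREEN_H = 800, 600
AREA_X, AREA_Y = 100, 100
AREA_WITH, AREA_HEIGHT = 600, 500   # width/height of the valid area (names as in entities.py)

def clamp_to_area(x, y):
    """Keep a position inside the valid rectangle; illustrative helper only."""
    x = min(max(x, AREA_X), AREA_X + AREA_WITH)
    y = min(max(y, AREA_Y), AREA_Y + AREA_HEIGHT)
    return x, y

print(clamp_to_area(750.0, 50.0))   # -> (700.0, 100.0)
```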
Modify curriculum parameters in `curriculum/utils/config.py`:

```yaml
curriculum_manager:
  max_curriculum_steps: 20
  max_episodes_per_task: 200
  evaluation_window: 15
  success_rate_threshold: 0.9
```
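Based on the parameter names above, the manager presumably advances to the next task once the success rate over the last `evaluation_window` episodes reaches `success_rate_threshold`, or after `max_episodes_per_task` episodes. A minimal sketch of that advancement rule (illustrative only; the actual logic lives in `curriculum/curriculum_manager.py` and may differ):

```python
from collections import deque

# Illustrative advancement rule for the config values above;
# not the actual CurriculumManager implementation.
class AdvancementRule:
    def __init__(self, evaluation_window=15, success_rate_threshold=0.9,
                 max_episodes_per_task=200):
        self.recent = deque(maxlen=evaluation_window)
        self.threshold = success_rate_threshold
        self.max_episodes = max_episodes_per_task
        self.episodes = 0

    def record(self, success):
        """Record one episode result; return True when the task should advance."""
        self.recent.append(1.0 if success else 0.0)
        self.episodes += 1
        window_full = len(self.recent) == self.recent.maxlen
        solved = window_full and sum(self.recent) / len(self.recent) >= self.threshold
        return solved or self.episodes >= self.max_episodes
```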
Adjust training hyperparameters in `main_SAC_curriculum.py`:

```python
batch_size = 256
gamma = 0.99
tau = 0.01
value_lr = 3e-4
policy_lr = 1e-4
```
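For reference, `gamma` discounts the soft TD target and `tau` controls the Polyak averaging of the target critic in a standard SAC update. A condensed sketch of those two steps (generic SAC, not the exact code in `main_SAC_curriculum.py`):

```python
import torch

# Standard SAC pieces that use gamma and tau (illustrative, not project code).
gamma, tau, alpha = 0.99, 0.01, 0.2   # alpha is the entropy temperature

def soft_td_target(reward, done, next_q, next_log_prob):
    # Soft Bellman target: r + gamma * (1 - done) * (Q'(s', a') - alpha * log pi(a'|s'))
    return reward + gamma * (1.0 - done) * (next_q - alpha * next_log_prob)

@torch.no_grad()
def polyak_update(target_net, online_net):
    # target <- (1 - tau) * target + tau * online, applied parameter-wise.
    for t_param, param in zip(target_net.parameters(), online_net.parameters()):
        t_param.mul_(1.0 - tau).add_(tau * param)
```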
Training results are saved in:
- `results/`: Test results and analysis
- `models/`: Saved model checkpoints
- TensorBoard logs for training curves
View the training curves with:

```bash
tensorboard --logdir=runs
```

MASAC with attention:
- Role-specific actor networks (leader/follower)
- Structured attention critic for agent coordination (see the sketch below)
- Shared replay buffer with experience prioritization
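The structured attention critic lets each agent's Q-estimate weight the observations and actions of the other agents. A compressed sketch of that idea using standard multi-head attention (illustrative; the architecture in `masac_adapter/critic_networks.py` may differ):

```python
import torch
import torch.nn as nn

class AttentionCritic(nn.Module):
    """Sketch of an attention-based multi-agent critic (not the repo's exact model)."""

    def __init__(self, obs_dim, act_dim, embed_dim=64, n_heads=4):
        super().__init__()
        self.encoder = nn.Linear(obs_dim + act_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads)
        self.q_head = nn.Sequential(
            nn.Linear(2 * embed_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, obs, act):
        # obs: (batch, n_agents, obs_dim), act: (batch, n_agents, act_dim)
        e = self.encoder(torch.cat([obs, act], dim=-1))        # per-agent embeddings
        e_t = e.transpose(0, 1)                                # attention expects (agents, batch, embed)
        attended, _ = self.attn(e_t, e_t, e_t)                 # each agent attends to all agents
        attended = attended.transpose(0, 1)
        return self.q_head(torch.cat([e, attended], dim=-1))   # one Q-value per agent
```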
Curriculum learning:
- Fixed task progression with increasing difficulty
- Knowledge transfer via policy parameter reuse (see the sketch below)
- Adaptive agent count scaling
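Knowledge transfer via parameter reuse typically means initializing the next task's policy from the previous task's weights, copying only the layers whose shapes still match after the agent count changes. A sketch of that filtering (illustrative; the actual mechanism is in `curriculum/knowledge_transfer.py`):

```python
def transfer_matching_parameters(new_model, old_state_dict):
    """Copy parameters from a previous task's policy wherever names and shapes match.

    Layers whose shapes changed (e.g. because the agent count changed) keep their
    fresh initialization. Illustrative sketch, not the repo's implementation.
    """
    new_state = new_model.state_dict()
    compatible = {
        name: tensor
        for name, tensor in old_state_dict.items()
        if name in new_state and new_state[name].shape == tensor.shape
    }
    new_state.update(compatible)
    new_model.load_state_dict(new_state)
    return sorted(compatible)  # names of the layers that were actually transferred
```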
H-CRRT baseline:
- Hierarchical RRT* for global path planning
- Distributed formation tracking control
- Pure pursuit and speed synchronization (see the sketch below)
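Pure pursuit steers each vehicle toward a lookahead point on the planned path rather than the nearest path point, which keeps tracking smooth. A minimal 2D sketch of that steering step (illustrative; the actual controller is in `H_CRRT/tracking.py`):

```python
import math

def pure_pursuit_heading_error(x, y, heading, path, lookahead=30.0):
    """Heading error toward the first waypoint at least `lookahead` away (sketch only).

    `path` is a list of (px, py) waypoints in the same frame as (x, y);
    `heading` is the current orientation in radians.
    """
    # Pick the first waypoint beyond the lookahead distance, else the final one.
    target = path[-1]
    for px, py in path:
        if math.hypot(px - x, py - y) >= lookahead:
            target = (px, py)
            break
    desired = math.atan2(target[1] - y, target[0] - x)
    # Wrap the error to [-pi, pi] so the turn command stays minimal.
    return (desired - heading + math.pi) % (2 * math.pi) - math.pi
```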
Notes:
- The first training run is slower due to environment initialization
- GPU is highly recommended for training (10-100x speedup)
- Rendering significantly slows down training
- Saved models include both actor and critic networks
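Because checkpoints cover both actor and critic networks, it can help to inspect a saved file before loading it into a network. The snippet below assumes the models were saved as plain state_dicts (an assumption, not confirmed here) and uses the actor file name from the example above:

```python
import torch

# Inspect a saved checkpoint before loading it into a network.
# Assumes the file holds a plain state_dict (a dict of parameter tensors).
state_dict = torch.load("Path_SAC_actor_L1.pth", map_location="cpu")
for name, tensor in state_dict.items():
    print(f"{name:<45} {tuple(tensor.shape)}")
```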
If training runs out of GPU memory, reduce the batch size or replay buffer size:
```bash
python main_SAC_curriculum.py --batch_size 128
```

Run without rendering:
```bash
python main_SAC_curriculum.py --use_curriculum
```

If training is slow:
- Disable rendering during training
- Reduce number of training episodes
- Use GPU acceleration
This project is available for academic and research purposes.
Contributions are welcome! Please feel free to submit issues or pull requests.
For questions or collaboration, please open an issue on GitHub.
Last Updated: February 2026