GitHub - amimayo/Spikerman: Hexapod Robot Locomotion By RL Using Spiking Neural Networks

Introduction

Spikerman is a continuous control project focused on hexapod locomotion using the MuJoCo physics engine. The core system relies on a Central Pattern Generator (CPG) to establish a base walking rhythm, while a Reinforcement Learning agent provides residual joint corrections to maintain balance and optimize forward movement. By building the agent's actor from Spiking Neural Networks, this hybrid approach aims for stable locomotion with sparse, energy-efficient computation.

📁 File Structure


Spikerman/
├── assets/                  # Evaluation test GIFs and videos
├── configs/                 # Hyperparameter configurations
│   ├── popsan_config.yaml   # Config for Spiking Actor Network
│   └── sac_config.yaml      # Config for standard SAC baseline
├── env/                     # Environment definitions
│   ├── hexapod.xml          # MuJoCo XML File
│   └── spikerman_env.py     # Custom Environment With CPG
├── models/                  # Saved model weights
│   ├── popsan/
│   └── sac/
├── results/                 # Training logs and graphs
│   ├── popsan/
│   │   ├── popsan_training_rewards_graph.png
│   │   └── spikerman_popsan_traininglog.csv
│   └── sac/
│       ├── sac_training_rewards_graph.png
│       └── spikerman_sac_traininglog.csv
├── rl/                      # Reinforcement Learning Algorithms
│   ├── popsan/              # PopSAN Soft Actor Critic
│   │   ├── agent.py         # Agent
│   │   └── train_test.py    # Training And Testing
│   └── sac/                 # Standard Soft Actor Critic implementation
│       ├── agent.py         # Agent
│       └── train_test.py    # Training And Testing
├── scripts/                 # Utility scripts
│   └── visualize.py         # Plot Training Rewards
├── requirements.txt         # Project Dependencies
├── spikerman.py             # Main execution script
├── LICENSE                  # MIT License
└── README.md                # Project Documentation

🧠 Background : Spiking Neural Networks & PopSAN

Standard neural networks pass continuous floating point numbers between layers. Spiking Neural Networks (SNNs) operate more like biological brains by passing binary spikes (0 or 1) over discrete time steps.

PopSAN uses the CuBA LIF (Current Based Leaky Integrate and Fire) neuron model. In a CuBA LIF neuron, incoming spikes are converted into current which builds up a "membrane potential" inside the neuron. This potential constantly leaks or decays over time. If the potential crosses a specific threshold, the neuron fires a single spike and its voltage resets.
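The neuron update described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's code: the decay values 0.9 and 0.8 come from the hyperparameter table, but which one applies to the current and which to the membrane potential, the threshold of 1.0, the hard reset to zero, and the names `cuba_lif_step` and `run_neuron` are all assumptions made for this example.

```python
def cuba_lif_step(spike_in, current, voltage, weight=1.0,
                  alpha=0.9, beta=0.8, threshold=1.0):
    """Advance one current-based LIF neuron by a single time step."""
    # Synaptic current decays by alpha (assumed), then adds the weighted input spike.
    current = alpha * current + weight * spike_in
    # Membrane potential leaks by beta (assumed), then integrates the current.
    voltage = beta * voltage + current
    # Fire when the potential reaches the threshold, then hard-reset to zero (assumed).
    spike_out = 1 if voltage >= threshold else 0
    if spike_out:
        voltage = 0.0
    return spike_out, current, voltage


def run_neuron(spike_train, **kwargs):
    """Feed a list of 0/1 input spikes through one neuron and collect its output."""
    current, voltage, out = 0.0, 0.0, []
    for s in spike_train:
        spike, current, voltage = cuba_lif_step(s, current, voltage, **kwargs)
        out.append(spike)
    return out


# A single input spike with weight 0.6 does not fire immediately; the decaying
# current keeps charging the membrane and the neuron fires one step later.
print(run_neuron([1, 0, 0, 0], weight=0.6))
```

The delayed spike in the usage line shows why the current and the potential are tracked separately: the input keeps influencing the neuron for several steps after it arrives.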

PopSAN Architecture

Because SNNs output binary spikes, getting precise continuous values for motor torques is difficult. A single neuron firing a 0 or 1 cannot easily represent a motor torque of 0.45.

PopSAN (Population coded Spiking Actor Network) solves this using a technique called Population Coding. Instead of relying on a single output neuron per motor joint, each action dimension is represented by a population of neurons (a population size of 10 is used here).

The continuous input observation is passed through a learnable encoder that generates a mean and standard deviation for each population's Gaussian receptive field. Using deterministic encoding, the inputs are converted into spike trains that unroll over a simulated time window of 5 steps. After processing through the hidden layers, the final motor actions are decoded. A learnable decoder computes a weighted sum of the output population firing rates and adds a bias to produce the continuous action. This architecture gives us smooth and continuous motor control while keeping the internal hidden layers completely spiking and energy efficient.
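The encoding and decoding steps can be illustrated with a small sketch. It assumes fixed, evenly spaced receptive-field means and a shared standard deviation instead of learned ones, an encoder threshold of 0.999, and it skips the spiking hidden layers entirely; the function names are hypothetical and not taken from the repository.

```python
import math


def gaussian_activation(x, means, stds):
    """Stimulation strength of each population neuron for one observation value."""
    return [math.exp(-0.5 * ((x - m) / s) ** 2) for m, s in zip(means, stds)]


def deterministic_encode(activations, timesteps=5, threshold=0.999):
    """Turn activations into spike trains with non-leaky, soft-reset integrators."""
    potentials = [0.0] * len(activations)
    spikes = []
    for _ in range(timesteps):
        step = []
        for i, a in enumerate(activations):
            potentials[i] += a
            fired = potentials[i] > threshold
            if fired:
                potentials[i] -= threshold  # soft reset keeps the remainder
            step.append(1 if fired else 0)
        spikes.append(step)
    return spikes


def decode_action(output_spikes, weights, bias):
    """Weighted sum of population firing rates plus a bias gives one action value."""
    timesteps = len(output_spikes)
    rates = [sum(step[i] for step in output_spikes) / timesteps
             for i in range(len(weights))]
    return sum(w * r for w, r in zip(weights, rates)) + bias


# Population of 10 neurons covering [-1, 1]; the observation sits on neuron 3's mean.
means = [-1 + 2 * i / 9 for i in range(10)]
stds = [0.2] * 10
spikes = deterministic_encode(gaussian_activation(means[3], means, stds))
spike_counts = [sum(step[i] for step in spikes) for i in range(10)]
print(spike_counts)
```

Neurons whose receptive field is centred near the observation fire on most of the 5 steps, while distant neurons stay silent, so the value is carried by which neurons fire and how often.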

Note: The original PopSAN research paper uses a rectangular surrogate gradient for backpropagation. Since snnTorch does not support rectangular gradients natively, this implementation substitutes the ATAN surrogate gradient with alpha=2.0. The original paper also uses 256 units in each hidden layer, whereas the current iteration of Spikerman uses 512 to handle the complex MuJoCo hexapod dynamics.
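To see how the two surrogates differ, the sketch below compares their gradients as a function of `u`, the membrane potential minus the firing threshold. The ATAN expression is the analytic derivative of the smoothing function (1/pi) * atan(pi * u * alpha / 2), which is assumed here as the form behind the ATAN surrogate; the rectangular window half-width of 0.5 and unit height are illustrative assumptions, not values from the paper or the repository.

```python
import math


def atan_surrogate_grad(u, alpha=2.0):
    """Derivative of (1/pi) * atan(pi * u * alpha / 2) with respect to u."""
    return (alpha / 2.0) / (1.0 + (math.pi * u * alpha / 2.0) ** 2)


def rectangular_surrogate_grad(u, half_width=0.5):
    """Constant gradient inside a window around the threshold, zero outside (assumed shape)."""
    return 1.0 if abs(u) < half_width else 0.0


for u in (0.0, 0.2, 0.7, 2.0):
    print(u, round(atan_surrogate_grad(u), 4), rectangular_surrogate_grad(u))
```

Both surrogates peak at the threshold. The rectangular one cuts off abruptly outside its window, while the ATAN one decays smoothly and never reaches exactly zero, so neurons far from threshold still receive a small gradient.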

⚙️ Environment

The underlying training algorithm is Soft Actor Critic (SAC). Instead of forcing the network to learn locomotion completely from scratch, the environment combines pattern generation with learned residual corrections.

Control Dynamics

Central Pattern Generation (CPG): A CPG network generates phase-coupled sine waves that act as a base rhythm for the hexapod's lift and extend motions, resulting in a standard tripod gait.
Residual Actions: The RL agent observes the state and outputs an action in the range -1.0 to 1.0, which is scaled by a factor of 0.5 and added to the CPG's base action. The agent only learns the micro-adjustments needed to correct for tilt, slip, or orientation errors.
Action Smooth Filtering: To prevent jittery or erratic motor movements, the combined CPG and residual action is passed through an exponential moving average filter (alpha=0.6) before being applied to the MuJoCo actuators.
Hoisting: During early training, a simulated upward force is applied to the robot's torso. This "hoisting" factor decays linearly over the first 150,000 steps, allowing the agent to learn the gait mechanics without instantly collapsing under gravity.
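The four mechanisms above can be sketched as small functions. This is an illustration under stated assumptions rather than the logic in spikerman_env.py: the leg ordering, frequency, amplitude, and the quarter-cycle offset between lift and extend are invented for the example, the mapping onto the actual actuators is omitted, and the EMA is assumed to weight the new target by alpha. Only the 0.5 residual scale, alpha=0.6, and the 150,000 hoist steps come from the text.

```python
import math

# Hypothetical leg ordering: alternating legs share a phase, giving two tripods.
TRIPOD_PHASES = [0.0, math.pi, 0.0, math.pi, 0.0, math.pi]


def cpg_action(t, frequency=1.0, amplitude=0.5):
    """Phase-coupled sine targets: one (lift, extend) pair per leg, flattened."""
    targets = []
    for phase in TRIPOD_PHASES:
        theta = 2.0 * math.pi * frequency * t + phase
        targets.append(amplitude * math.sin(theta))  # lift
        targets.append(amplitude * math.cos(theta))  # extend (assumed offset)
    return targets


def add_residual(cpg, residual, scale=0.5):
    """Clip the agent's output to [-1, 1], scale it, and add it to the CPG prior."""
    clipped = [max(-1.0, min(1.0, r)) for r in residual]
    return [c + scale * r for c, r in zip(cpg, clipped)]


def ema_filter(previous, target, alpha=0.6):
    """Exponential moving average; alpha is assumed to weight the new target."""
    return [alpha * t + (1.0 - alpha) * p for p, t in zip(previous, target)]


def hoist_factor(step, hoist_steps=150_000):
    """Linear decay from 1.0 at step 0 to 0.0 at hoist_steps and beyond."""
    return max(0.0, 1.0 - step / hoist_steps)


# One control step with a zero residual, starting from a resting command.
prior = cpg_action(t=0.1)
command = ema_filter([0.0] * len(prior), add_residual(prior, [0.0] * len(prior)))
print(len(command), hoist_factor(75_000))
```

Because the residual is bounded and halved before it is added, the agent cannot override the gait rhythm outright; it can only nudge each joint target around the CPG value.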

The reward function evaluates multiple factors to ensure stable and efficient walking. An episode is considered a "success" if the agent scores above the 2500 reward threshold. This threshold was determined empirically as the minimum reward at which the hexapod demonstrates stable forward locomotion throughout the full 1000-step episode.

🎛️ Hyperparameters

The following hyperparameters were used to achieve the final evaluation scores for both the standard (non-spiking) SAC baseline and the spiking PopSAN architecture. All of these training parameters are customizable; the exact values are defined in the respective YAML files within the configs/ directory.

Parameter	SAC Baseline	PopSAN
Learning Rate	3e-4	3e-4
Hidden Dimensions	512	512
Batch Size	128	128
Buffer Size	1,000,000	1,000,000
Discount Factor ($\gamma$)	0.99	0.99
Target Smoothing ($\tau$)	0.005	0.005
Temperature	0.05	0.05
Epsilon	1e-6	1e-6
Warmup Steps	10,000	10,000
Hoist Steps	150,000	150,000
Training Episodes	500	500
Population Size	N/A	10
Simulation Timesteps ($T$)	N/A	5
CuBA LIF Alpha (Decay)	N/A	0.9
CuBA LIF Beta	N/A	0.8
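To make the roles of the discount factor, target smoothing, and temperature concrete, the sketch below shows where they enter the standard SAC update. It uses the textbook formulas with the table's values as defaults; it assumes a fixed temperature and scalar inputs, and the function names are hypothetical rather than taken from the repository's agent.py.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: move each target parameter a fraction tau toward the online one."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


def td_target(reward, next_q, next_log_prob, done, gamma=0.99, temperature=0.05):
    """Entropy-regularised SAC target: r + gamma * (1 - done) * (Q' - temperature * log pi)."""
    return reward + gamma * (1.0 - done) * (next_q - temperature * next_log_prob)


print(soft_update([0.0], [1.0]))
print(td_target(reward=1.0, next_q=10.0, next_log_prob=-2.0, done=0.0))
```

With tau=0.005 the target network trails the online critic slowly, which is the usual way SAC keeps its bootstrapped targets stable.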

📊 Results

The PopSAN architecture required far fewer operations per step than the standard ANN baseline: about 1,015 synaptic operations (SOPs) compared with 320,000 multiply-accumulate operations (MACs) for the SAC actor. The ANN achieved a higher average evaluation reward (3706.41 versus 3474.42), but both models cleared the 2500 threshold with a 100% success rate.

Environment	Model Architecture	Training Rewards Graph
Spikerman (MuJoCo)	SAC	[image: SAC training rewards graph]
Spikerman (MuJoCo)	PopSAN	[image: PopSAN training rewards graph]

Performance Metrics

Metric	SAC Baseline	PopSAN
Average Reward	3706.41	3474.42
Success Rate	100%	100%
Temporal Sparsity	N/A	83.1%
SAC Actor MACs	320,000	—
Equivalent Capacity ANN MACs	748,544	—
PopSAN SOPs	—	1,015

SAC Actor MACs are computed for the actual trained MLP (77→512→512→18). Equivalent ANN MACs represent a hypothetical MLP with the same population-expanded dimensions as the PopSAN (770→512→512→180), and reflect the computational cost of an ANN with equivalent representational capacity. PopSAN SOPs count only active synaptic operations from non-silent neurons.
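The two MAC figures can be reproduced by summing the weight-matrix sizes of each MLP. In this sketch the 320,000 figure only comes out if the SAC actor's final layer is counted twice, once for each of two 18-unit output heads (for example mean and log standard deviation); that reading is an inference from the arithmetic, not something the README states. Biases and activation functions are ignored, and `mlp_macs` is a hypothetical helper.

```python
def mlp_macs(layer_sizes, output_heads=1):
    """Multiply-accumulates for one forward pass of a dense MLP (biases ignored)."""
    *body, out = layer_sizes
    # Every consecutive pair of body layers contributes in_features * out_features MACs.
    macs = sum(a * b for a, b in zip(body[:-1], body[1:]))
    # The last body layer feeds one or more output heads of size `out`.
    macs += body[-1] * out * output_heads
    return macs


sac_actor_macs = mlp_macs([77, 512, 512, 18], output_heads=2)
equivalent_ann_macs = mlp_macs([770, 512, 512, 180])
print(sac_actor_macs, equivalent_ann_macs)  # 320000 748544
```

The population-expanded network is larger mainly because its input and output layers are ten times wider, matching the population size of 10.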

The 83.1% temporal sparsity indicates that most neurons stay silent at most time steps, which keeps the number of Synaptic Operations (SOPs) per step low.
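One plausible way to compute these two quantities from recorded spikes is sketched below. The definitions are assumptions: sparsity is taken as the fraction of (time step, neuron) slots with no spike, and each spike is charged one synaptic operation per outgoing connection. The repository may measure them differently, and the toy spike record is made up for the example.

```python
def temporal_sparsity(spike_record):
    """Fraction of (time step, neuron) slots in which no spike occurred."""
    total = sum(len(step) for step in spike_record)
    fired = sum(sum(step) for step in spike_record)
    return 1.0 - fired / total


def synaptic_operations(spike_record, fan_out):
    """Each emitted spike triggers one accumulate per outgoing synapse (assumed)."""
    return sum(sum(step) for step in spike_record) * fan_out


# Toy record: 3 time steps, 4 neurons, 2 spikes in total.
record = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]]
print(round(temporal_sparsity(record), 3), synaptic_operations(record, fan_out=3))
```

Under this definition silent neurons cost nothing, which is why higher sparsity translates directly into fewer operations.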

Despite the slight reduction in raw episode reward (−6.3%), the PopSAN architecture achieves 100% task success, identical to the SAC baseline. The reward gap likely reflects the SNN's inherent regularisation through spike sparsity rather than a reduction in locomotion quality.

🎬 Showcase

Environment	Model Architecture	Demonstration
Spikerman (MuJoCo)	SAC	[image: SAC demonstration]
Spikerman (MuJoCo)	PopSAN	[image: PopSAN demonstration]

🚀 Getting Started

Follow these steps to set up the physics simulation and test the trained models locally.

1. Clone the repository and install dependencies:

Ensure you have Python 3.10 or higher installed. Install the required libraries.


git clone https://github.com/amimayo/Spikerman.git
cd Spikerman
pip install -r requirements.txt

2. Run the main script:

You can train or evaluate the models, or visualize the training rewards graph, using the command line interface.


# To train the PopSAN / SAC model 
python spikerman.py --mode train --algo popsan --config configs/popsan_config.yaml

# To evaluate the PopSAN / SAC Model
python spikerman.py --mode eval --algo popsan --config configs/popsan_config.yaml

# To plot the training rewards graph for PopSAN / SAC Model
python spikerman.py --mode visualize --algo popsan

📚 References

🛠️ To-Do List

[🟩] Complete First Simulation of PopSAN and SAC (512 Hidden Dimensions) and Verify Environment : Run and verify full simulations for the implementations
[🟨] Compute CoT : Compute Cost of Transport
[🟨] Complete Simulation of PopSAN and SAC (256 Hidden Dimensions) : Run and verify full simulations for the implementation with 256 hidden dimensions

📜 License

Distributed under the MIT License.
