Spikerman is a continuous control project focused on hexapod locomotion using the MuJoCo physics engine. The core system relies on a Central Pattern Generator (CPG) to establish a base walking rhythm while a Reinforcement Learning agent provides residual joint corrections to maintain balance and optimize forward movement. By leveraging Spiking Neural Networks, this hybrid approach achieves computationally sparse, energy-efficient, and stable locomotion.
Spikerman/
├── assets/  # Evaluation test GIFs and videos
├── configs/  # Hyperparameter configurations
│   ├── popsan_config.yaml  # Config for Spiking Actor Network
│   └── sac_config.yaml  # Config for standard SAC baseline
├── env/  # Environment definitions
│   ├── hexapod.xml  # MuJoCo XML File
│   └── spikerman_env.py  # Custom Environment With CPG
├── models/  # Saved model weights
│   ├── popsan/
│   └── sac/
├── results/  # Training logs and graphs
│   ├── popsan/
│   │   ├── popsan_training_rewards_graph.png
│   │   └── spikerman_popsan_traininglog.csv
│   └── sac/
│       ├── sac_training_rewards_graph.png
│       └── spikerman_sac_traininglog.csv
├── rl/  # Reinforcement Learning Algorithms
│   ├── popsan/  # PopSAN Soft Actor Critic
│   │   ├── agent.py  # Agent
│   │   └── train_test.py  # Training And Testing
│   └── sac/  # Standard Soft Actor Critic implementation
│       ├── agent.py  # Agent
│       └── train_test.py  # Training And Testing
├── scripts/  # Utility scripts
│   └── visualize.py  # Plot Training Rewards
├── requirements.txt  # Project Dependencies
├── spikerman.py  # Main execution script
├── LICENSE  # MIT License
└── README.md  # Project Documentation
Standard neural networks pass continuous floating point numbers between layers. Spiking Neural Networks (SNNs) operate more like biological brains by passing binary spikes (0 or 1) over discrete time steps.
PopSAN uses the CuBA LIF (Current Based Leaky Integrate and Fire) neuron model. In a CuBA LIF neuron, incoming spikes are converted into current which builds up a "membrane potential" inside the neuron. This potential constantly leaks or decays over time. If the potential crosses a specific threshold, the neuron fires a single spike and its voltage resets.
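The neuron dynamics described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's snnTorch code: it assumes that alpha (0.9) is the synaptic current decay and beta (0.8) the membrane decay, and the threshold of 1.0 and the reset-to-zero rule are hypothetical choices made for the example.

```python
def cuba_lif_step(current, voltage, drive, alpha=0.9, beta=0.8, threshold=1.0):
    """Advance one current-based LIF neuron by a single time step.

    alpha and beta are taken from the hyperparameter table; threshold=1.0
    and the hard reset are assumptions made for this sketch.
    """
    current = alpha * current + drive    # synaptic current decays, then adds input
    voltage = beta * voltage + current   # membrane potential leaks, then integrates current
    spike = 1 if voltage >= threshold else 0
    if spike:
        voltage = 0.0                    # reset to zero after firing (assumed)
    return current, voltage, spike


def simulate(drives, **kwargs):
    """Run the neuron over a sequence of input drives and return its spike train."""
    current, voltage, spikes = 0.0, 0.0, []
    for drive in drives:
        current, voltage, spike = cuba_lif_step(current, voltage, drive, **kwargs)
        spikes.append(spike)
    return spikes


if __name__ == "__main__":
    print(simulate([0.3] * 5))  # [0, 0, 1, 1, 1]
```

With a constant drive of 0.3 the current builds up over the first two steps, the potential crosses the threshold on the third, and the accumulated current keeps the neuron firing afterwards.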
Because SNNs output binary spikes, getting precise continuous values for motor torques is difficult. A single neuron firing a 0 or 1 cannot easily represent a motor torque of 0.45.
PopSAN (Population coded Spiking Actor Network) solves this using a technique called Population Coding. Instead of relying on a single output neuron per motor joint, each action dimension is represented by a population of neurons (a population size of 10 is used here).
The continuous input observation is passed through a learnable encoder that generates a mean and standard deviation for each population's Gaussian receptive field. Using deterministic encoding, the inputs are converted into spike trains that unroll over a simulated time window of 5 steps. After processing through the hidden layers, the final motor actions are decoded. A learnable decoder computes a weighted sum of the output population firing rates and adds a bias to produce the continuous action. This architecture gives us smooth and continuous motor control while keeping the internal hidden layers completely spiking and energy efficient.
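As a rough illustration of the encode and decode steps, the sketch below handles a single scalar input in plain Python. The population size (10) and time window (5) come from the text; the evenly spaced means, `sigma=0.15`, the soft-reset threshold of 0.999, and all function names are assumptions for the example, and in the real network the means, standard deviations, decoder weights, and bias are learned.

```python
import math

POP_SIZE = 10   # neurons per dimension (from the text)
TIMESTEPS = 5   # simulated spike-train length (from the text)


def receptive_field_means(pop_size=POP_SIZE, low=-1.0, high=1.0):
    """Evenly spaced starting means; learnable in the real encoder."""
    step = (high - low) / (pop_size - 1)
    return [low + i * step for i in range(pop_size)]


def encode(x, means, sigma=0.15, timesteps=TIMESTEPS, threshold=0.999):
    """Deterministically encode scalar x into one spike train per neuron."""
    trains = []
    for mu in means:
        # Gaussian receptive field: stimulation is strongest when x is near mu.
        stimulation = math.exp(-0.5 * ((x - mu) / sigma) ** 2)
        voltage, train = 0.0, []
        for _ in range(timesteps):
            voltage += stimulation        # integrate the constant stimulation
            if voltage >= threshold:
                train.append(1)
                voltage -= threshold      # soft reset keeps the remainder
            else:
                train.append(0)
        trains.append(train)
    return trains


def decode(output_trains, weights, bias):
    """Weighted sum of the output population firing rates plus a bias."""
    rates = [sum(train) / len(train) for train in output_trains]
    return sum(w * r for w, r in zip(weights, rates)) + bias


if __name__ == "__main__":
    means = receptive_field_means()
    spike_counts = [sum(train) for train in encode(0.45, means)]
    print(spike_counts)
```

Neurons whose receptive field is centred near the input fire on most steps, while distant neurons stay silent, so the value is carried by which neurons fire and how often rather than by any single spike.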
Note: The original PopSAN research paper uses a rectangular surrogate gradient for backpropagation. Since snnTorch does not support rectangular gradients natively, this implementation approximates it with the ATAN surrogate gradient (alpha=2.0). The original paper also uses 256 units per hidden layer, whereas the current iteration of Spikerman uses 512 to handle the complex MuJoCo hexapod dynamics.
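To make the difference between the two surrogates concrete, the sketch below evaluates both gradient shapes around the firing threshold. It is illustrative only: the threshold of 1.0 and the window width of 0.5 are hypothetical values, and the ATAN form is simply the analytic derivative of a scaled arctangent, not code copied from snnTorch.

```python
import math


def rectangular_grad(v, threshold=1.0, width=0.5):
    """Rectangular surrogate: gradient is 1 inside a window around the threshold."""
    return 1.0 if abs(v - threshold) < width else 0.0


def atan_grad(v, threshold=1.0, alpha=2.0):
    """Derivative of (1/pi) * atan(pi * alpha * u / 2), where u = v - threshold."""
    u = v - threshold
    return (alpha / 2.0) / (1.0 + (math.pi * alpha * u / 2.0) ** 2)


if __name__ == "__main__":
    for v in (0.0, 0.8, 1.0, 1.2, 2.0):
        print(v, rectangular_grad(v), round(atan_grad(v), 4))
```

Both functions peak at the threshold, but the ATAN gradient decays smoothly and never reaches exactly zero, whereas the rectangular one cuts off sharply outside its window.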
The baseline training algorithm used is Soft Actor Critic (SAC). Instead of forcing the network to learn locomotion completely from scratch, the environment logic utilizes a framework that combines pattern generation with learned residual corrections.
- Central Pattern Generation (CPG): A CPG Network generates phase coupled sine waves that act as a base rhythm for the hexapod's lift and extend motions resulting in a standard tripod gait.
- Residual Actions: The RL agent observes the state and outputs a scaled action (-1.0 to 1.0) which is multiplied by a 0.5 factor and added to the CPG's prior action. The agent only learns the micro adjustments needed to correct for tilt, slip, or orientation errors.
- Action Smooth Filtering: To prevent jittery or erratic motor movements, the combined CPG and residual action is passed through an exponential moving average filter (`alpha=0.6`) before being applied to the MuJoCo actuators.
- Hoisting: During early training, a simulated upward force is applied to the robot's torso. This "hoisting" factor linearly decays over the first 150,000 steps allowing the agent to learn the gait mechanics without instantly collapsing under gravity.
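
The four mechanisms above can be sketched as small pure-Python helpers. The 0.5 residual scale, the 0.6 filter coefficient, and the 150,000-step hoisting period come from the text; everything else is an assumption for illustration, including the leg ordering, the use of a cosine for the extend signal, the clipping to [-1, 1], the convention that alpha weights the new target, and all function names.

```python
import math

RESIDUAL_SCALE = 0.5    # from the text
EMA_ALPHA = 0.6         # from the text
HOIST_STEPS = 150_000   # from the text
# Tripod gait: alternating legs run half a cycle apart (assumed leg ordering).
TRIPOD_PHASES = [0.0, math.pi, 0.0, math.pi, 0.0, math.pi]


def cpg_prior(t, frequency=1.0):
    """Return (lift, extend) sine targets for each of the six legs at time t."""
    prior = []
    for phase in TRIPOD_PHASES:
        angle = 2.0 * math.pi * frequency * t + phase
        prior.append((math.sin(angle), math.cos(angle)))
    return prior


def combine(cpg_action, residual):
    """Add the scaled residual to the CPG prior and clip to the actuator range."""
    raw = cpg_action + RESIDUAL_SCALE * residual
    return max(-1.0, min(1.0, raw))


def smooth(previous, target, alpha=EMA_ALPHA):
    """Exponential moving average; alpha weights the new target (assumed convention)."""
    return alpha * target + (1.0 - alpha) * previous


def hoist_factor(step, hoist_steps=HOIST_STEPS):
    """Linearly decay the support force scale from 1 to 0 over the hoisting period."""
    return max(0.0, 1.0 - step / hoist_steps)


if __name__ == "__main__":
    lift, _ = cpg_prior(0.25)[0]
    action = smooth(previous=0.0, target=combine(lift, residual=-0.4))
    print(action, hoist_factor(75_000))
```

Because the residual is scaled by 0.5, the agent can shift each CPG target by at most half the action range, which keeps the learned policy close to the base gait.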
The reward function evaluates multiple factors to ensure stable and efficient walking. An episode is considered a "success" if the agent scores above the 2500 reward threshold, which was determined empirically as the minimum reward at which the hexapod demonstrates stable forward locomotion throughout the full 1000-step episode.
The following hyperparameters were used to achieve the final evaluation scores for both the continuous SAC baseline and the discrete PopSAN architecture. All of these training parameters are fully customizable. The exact values are explicitly defined in their respective YAML files within the configs/ directory.
| Parameter | SAC Baseline | PopSAN |
|---|---|---|
| Learning Rate | 3e-4 | 3e-4 |
| Hidden Dimensions | 512 | 512 |
| Batch Size | 128 | 128 |
| Buffer Size | 1,000,000 | 1,000,000 |
| Discount Factor ($\gamma$) | 0.99 | 0.99 |
| Target Smoothing ($\tau$) | 0.005 | 0.005 |
| Temperature | 0.05 | 0.05 |
| Epsilon | 1e-6 | 1e-6 |
| Warmup Steps | 10,000 | 10,000 |
| Hoist Steps | 150,000 | 150,000 |
| Training Episodes | 500 | 500 |
| Population Size | N/A | 10 |
| Simulation Timesteps ($T$) | N/A | 5 |
| CuBA LIF Alpha (Decay) | N/A | 0.9 |
| CuBA LIF Beta | N/A | 0.8 |
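
For readers unfamiliar with the target smoothing coefficient, the sketch below shows the usual SAC soft (Polyak) update of target network parameters with the value 0.005 from the table. It operates on plain lists of floats for clarity; the function name is hypothetical and the project's agents presumably apply the same rule to network tensors.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Move each target parameter a fraction tau toward its online counterpart."""
    return [
        (1.0 - tau) * target + tau * online
        for target, online in zip(target_params, online_params)
    ]


if __name__ == "__main__":
    target = [0.0]
    for _ in range(1000):
        target = soft_update(target, [1.0])
    print(target)  # close to 1.0 after many small steps
```

A small coefficient makes the target critic trail the online critic slowly, which is the usual way SAC stabilises its bootstrapped value targets.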
The PopSAN architecture required far fewer operations per step than the standard ANN baseline: roughly 1,015 synaptic operations (SOPs) against 320,000 multiply-accumulates (MACs) for the SAC actor. The ANN achieved a somewhat higher average evaluation reward (3706.41 versus 3474.42), but both models cleared the 2500 threshold with a 100% success rate. PopSAN therefore reaches comparable locomotion quality at a far lower operation count, which is what makes it energy efficient.
| Environment | Model Architecture | Training Rewards Graph |
|---|---|---|
| Spikerman (MuJoCo) | SAC | ![]() |
| Spikerman (MuJoCo) | PopSAN | ![]() |
| Metric | SAC Baseline | PopSAN |
|---|---|---|
| Average Reward | 3706.41 | 3474.42 |
| Success Rate | 100% | 100% |
| Temporal Sparsity | N/A | 83.1% |
| SAC Actor MACs | 320,000 | N/A |
| Equivalent Capacity ANN MACs | 748,544 | N/A |
| PopSAN SOPs | N/A | 1,015 |
SAC Actor MACs are computed for the actual trained MLP (77→512→512→18). Equivalent ANN MACs represent a hypothetical MLP with the same population-expanded dimensions as the PopSAN (770→512→512→180), and reflect the computational cost of an ANN with equivalent representational capacity. PopSAN SOPs count only active synaptic operations from non-silent neurons.
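The two MAC figures can be reproduced with simple layer arithmetic. The sketch below counts one multiply-accumulate per weight and ignores biases and activations. Note that the reported 320,000 is matched only if the SAC actor's output layer is counted as two 18-unit heads (for example a mean and a log standard deviation); this is an assumption about the implementation, and a single 18-unit head would give 310,784 instead.

```python
def mlp_macs(layer_sizes, output_heads=1):
    """Count multiply-accumulates for a dense MLP, ignoring biases and activations."""
    *hidden, out = layer_sizes
    macs = sum(a * b for a, b in zip(hidden, hidden[1:]))
    return macs + hidden[-1] * out * output_heads


# Two output heads are assumed for the SAC actor (see the note above).
sac_actor = mlp_macs([77, 512, 512, 18], output_heads=2)
equivalent_ann = mlp_macs([770, 512, 512, 180])

if __name__ == "__main__":
    print(sac_actor, equivalent_ann)  # 320000 748544
```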
The 83.1% temporal sparsity indicates that neurons fire only when needed to correct the hexapod's posture, which results in very few Synaptic Operations (SOPs) per step.
Despite the slight reduction in raw episode reward (-6.3%), the PopSAN architecture achieves 100% task success, identical to the SAC baseline. The marginal reward gap likely reflects the SNN's inherent regularisation through spike sparsity rather than a reduction in locomotion quality.
| Environment | Model Architecture | Demonstration |
|---|---|---|
| Spikerman (MuJoCo) | SAC | ![]() |
| Spikerman (MuJoCo) | PopSAN | ![]() |
Follow these steps to set up the physics simulation and test the trained models locally.
1. Clone the repository and install dependencies:
Ensure you have Python 3.10 or higher installed. Install the required libraries.
git clone https://github.com/amimayo/Spikerman.git
cd Spikerman
pip install -r requirements.txt
2. Run the main script:
You can train or evaluate the models, or plot the training rewards graph, from the command line interface.
# To train the PopSAN / SAC model
python spikerman.py --mode train --algo popsan --config configs/popsan_config.yaml
# To evaluate the PopSAN / SAC Model
python spikerman.py --mode eval --algo popsan --config configs/popsan_config.yaml
# To plot the training rewards graph for PopSAN / SAC Model
python spikerman.py --mode visualize --algo popsan
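For orientation, an argument parser that accepts the commands above could look like the following sketch. It is a hypothetical reconstruction, not the contents of `spikerman.py`: the three flags and the `train`, `eval`, `visualize`, and `popsan` values appear in the commands, while the `sac` value for `--algo` and the optional status of `--config` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser matching the documented flags (illustrative sketch only)."""
    parser = argparse.ArgumentParser(description="Spikerman command line sketch")
    parser.add_argument("--mode", choices=["train", "eval", "visualize"], required=True)
    # "sac" is assumed here; only "popsan" appears in the documented commands.
    parser.add_argument("--algo", choices=["popsan", "sac"], required=True)
    # The visualize command is shown without a config, so it is optional here.
    parser.add_argument("--config", default=None, help="Path to a YAML config file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.mode, args.algo, args.config)
```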
- Deep Reinforcement Learning with Population-Coded Spiking Neural Network for Continuous Control - Tang et al.
- Spiking Neural Network Discovers Energy-Efficient Hexapod Motion in Deep Reinforcement Learning - Naya et al.
- Training Spiking Neural Networks Using Lessons From Deep Learning - Eshraghian et al.
- [🟩] Complete First Simulation of PopSAN and SAC (512 Hidden Dimensions) and Verify Environment : Run and verify full simulations for the implementations
- [🟨] Compute CoT : Compute Cost of Transport
- [🟨] Complete Simulation of PopSAN and SAC (256 Hidden Dimensions) : Run and verify full simulations for the implementation with 256 hidden dimensions
Distributed under the MIT License.






