
amimayo/Spikerman


Python PyTorch snnTorch Gymnasium MuJoCo

Introduction

Spikerman is a continuous control project focused on hexapod locomotion using the MuJoCo physics engine. The core system relies on a Central Pattern Generator (CPG) to establish a base walking rhythm while a Reinforcement Learning agent provides residual joint corrections to maintain balance and optimize forward movement. By leveraging Spiking Neural Networks, this hybrid approach achieves computationally sparse, energy-efficient, and stable locomotion.


📁 File Structure


Spikerman/
├── assets/                  # Evaluation test GIFs and videos
├── configs/                 # Hyperparameter configurations
│   ├── popsan_config.yaml   # Config for Spiking Actor Network
│   └── sac_config.yaml      # Config for standard SAC baseline
├── env/                     # Environment definitions
│   ├── hexapod.xml          # MuJoCo XML File
│   └── spikerman_env.py     # Custom Environment With CPG
├── models/                  # Saved model weights
│   ├── popsan/
│   └── sac/
├── results/                 # Training logs and graphs
│   ├── popsan/
│   │   ├── popsan_training_rewards_graph.png
│   │   └── spikerman_popsan_traininglog.csv
│   └── sac/
│       ├── sac_training_rewards_graph.png
│       └── spikerman_sac_traininglog.csv
├── rl/                      # Reinforcement Learning Algorithms
│   ├── popsan/              # PopSAN Soft Actor Critic
│   │   ├── agent.py         # Agent
│   │   └── train_test.py    # Training And Testing
│   └── sac/                 # Standard Soft Actor Critic implementation
│       ├── agent.py         # Agent
│       └── train_test.py    # Training And Testing
├── scripts/                 # Utility scripts
│   └── visualize.py         # Plot Training Rewards
├── requirements.txt         # Project Dependencies
├── spikerman.py             # Main execution script
├── LICENSE                  # MIT License
└── README.md                # Project Documentation


🧠 Background : Spiking Neural Networks & PopSAN

Standard neural networks pass continuous floating point numbers between layers. Spiking Neural Networks (SNNs) operate more like biological brains by passing binary spikes (0 or 1) over discrete time steps.

[Figure: SNN]

PopSAN uses the CuBA LIF (Current-Based Leaky Integrate-and-Fire) neuron model. In a CuBA LIF neuron, incoming spikes are converted into a current, which builds up a "membrane potential" inside the neuron. This potential constantly leaks, or decays, over time. If it crosses a specific threshold, the neuron fires a single spike and its potential resets.
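The dynamics described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's snnTorch code. It assumes that alpha is the synaptic current decay and beta the membrane potential decay (the naming used by snnTorch's `Synaptic` neuron), that the threshold is 1.0, and that the potential is reset to zero after a spike. The function name `simulate_cuba_lif` is hypothetical.

```python
def simulate_cuba_lif(inputs, alpha=0.9, beta=0.8, threshold=1.0):
    """Simulate one current-based LIF neuron over discrete time steps.

    inputs: weighted input arriving at each step (one float per step).
    alpha: synaptic current decay (assumed meaning of "Alpha").
    beta: membrane potential decay (assumed meaning of "Beta").
    Returns the emitted spikes (0 or 1), one per step.
    """
    current = 0.0    # synaptic current
    potential = 0.0  # membrane potential
    spikes = []
    for x in inputs:
        current = alpha * current + x           # incoming spikes become current
        potential = beta * potential + current  # current charges the leaky membrane
        spike = 1 if potential >= threshold else 0
        if spike:
            potential = 0.0                     # assumed hard reset after firing
        spikes.append(spike)
    return spikes


# A constant sub-threshold input still produces spikes once enough charge builds up.
print(simulate_cuba_lif([0.3] * 10))
```

With no input the neuron stays silent, and a single large input makes it fire immediately, which is the thresholding behaviour the paragraph describes.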

PopSAN Architecture

Because SNNs output binary spikes, getting precise continuous values for motor torques is difficult. A single neuron firing a 0 or 1 cannot easily represent a motor torque of 0.45.

PopSAN (Population-coded Spiking Actor Network) solves this with a technique called population coding. Instead of relying on a single output neuron per motor joint, each action dimension is represented by a population of neurons (a population size of 10 is used here).

The continuous input observation is passed through a learnable encoder that generates a mean and standard deviation for each population's Gaussian receptive field. Using deterministic encoding, the inputs are converted into spike trains that unroll over a simulated time window of 5 steps. After processing through the hidden layers, the final motor actions are decoded. A learnable decoder computes a weighted sum of the output population firing rates and adds a bias to produce the continuous action. This architecture gives us smooth and continuous motor control while keeping the internal hidden layers completely spiking and energy efficient.
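The two ends of this pipeline, encoding and decoding, can be sketched as follows. This is a simplified illustration under several assumptions: the means, standard deviation, decoder weights, and firing threshold are fixed hypothetical values standing in for learned parameters, the encoder neurons use a soft reset, and the spiking hidden layers are omitted entirely, so the decoder is applied directly to the encoder's spike trains only to show how firing rates become a continuous value.

```python
import math


def encode_population(x, means, sigma, timesteps=5, threshold=0.999):
    """Deterministically encode a scalar x into one spike train per neuron."""
    # Gaussian receptive field: neurons whose mean is close to x are driven harder.
    drive = [math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in means]
    potentials = [0.0] * len(means)
    trains = [[0] * timesteps for _ in means]
    for t in range(timesteps):
        for i, a in enumerate(drive):
            potentials[i] += a                 # integrate the constant drive
            if potentials[i] >= threshold:
                trains[i][t] = 1
                potentials[i] -= threshold     # soft reset keeps the remainder
    return trains


def decode_population(trains, weights, bias=0.0):
    """Decode a continuous value as a weighted sum of firing rates plus a bias."""
    rates = [sum(train) / len(train) for train in trains]
    return sum(w * r for w, r in zip(weights, rates)) + bias


# Hypothetical parameters: a population of 10 with evenly spaced means.
means = [-1.0 + 2.0 * i / 9 for i in range(10)]
trains = encode_population(means[3], means, sigma=0.15)
action = decode_population(trains, weights=[0.1] * 10)
print(trains[3], action)
```

The neuron whose mean matches the input fires on every one of the 5 steps, while distant neurons stay silent, so the input value is carried by which neurons fire and how often.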

[Figure: PopSAN]

Note: The original PopSAN research paper uses a rectangular surrogate gradient for backpropagation. Since snnTorch does not support rectangular gradients natively, this implementation approximates it with the ATAN surrogate gradient (alpha=2.0). The original paper also uses 256 units in its hidden layers, whereas the current iteration of Spikerman uses 512 to handle the complex MuJoCo hexapod dynamics.
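To see why an arctangent surrogate can stand in for a rectangular one, the two gradient shapes can be compared directly. The sketch below is plain Python and makes its own assumptions: the rectangular window width of 1.0 is a hypothetical value, and the arctangent gradient is written as the exact derivative of (1/pi) * atan(pi * alpha * u / 2), which may be scaled differently from snnTorch's internal implementation. Here `u` is the membrane potential minus the threshold.

```python
import math


def heaviside(u):
    """Forward pass: emit a spike when the potential reaches the threshold (u >= 0)."""
    return 1.0 if u >= 0.0 else 0.0


def rectangular_grad(u, width=1.0):
    """Rectangular surrogate: constant inside a window around the threshold, zero outside."""
    return 1.0 / width if abs(u) < width / 2.0 else 0.0


def atan_grad(u, alpha=2.0):
    """Derivative of (1/pi) * atan(pi * alpha * u / 2), a smooth bell-shaped surrogate."""
    return (alpha / 2.0) / (1.0 + (math.pi * alpha * u / 2.0) ** 2)


for u in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(u, heaviside(u), rectangular_grad(u), round(atan_grad(u), 4))
```

In this formulation both surrogates peak at 1.0 at the threshold when alpha=2.0. The rectangular version then drops abruptly to zero, while the arctangent version decays smoothly and still passes a small gradient to neurons far from the threshold.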

⚙️ Environment

The baseline training algorithm used is Soft Actor Critic (SAC). Instead of forcing the network to learn locomotion completely from scratch, the environment logic utilizes a framework that combines pattern generation with learned residual corrections.

Control Dynamics

  • Central Pattern Generation (CPG): A CPG network generates phase-coupled sine waves that act as a base rhythm for the hexapod's lift and extend motions, resulting in a standard tripod gait.

  • Residual Actions: The RL agent observes the state and outputs an action in the range -1.0 to 1.0, which is multiplied by a factor of 0.5 and added to the CPG's prior action. The agent only learns the micro-adjustments needed to correct for tilt, slip, or orientation errors.

  • Action Smooth Filtering: To prevent jittery or erratic motor movements, the combined CPG and residual action is passed through an exponential moving average filter (alpha=0.6) before being applied to the MuJoCo actuators.

  • Hoisting: During early training, a simulated upward force is applied to the robot's torso. This "hoisting" factor decays linearly over the first 150,000 steps, allowing the agent to learn the gait mechanics without instantly collapsing under gravity.
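The four mechanisms above can be combined into a short sketch. The constants 0.5, 0.6, and 150,000 come from the text, and the 18 actuators are taken as 6 legs with 3 joints each. Everything else is a hypothetical stand-in: the CPG frequency, amplitude, leg grouping, and per-joint phase shifts are illustrative only, and the filter is assumed to weight the new command by alpha, which the text does not specify.

```python
import math

NUM_LEGS = 6
JOINTS_PER_LEG = 3       # 18 actuators in total
RESIDUAL_SCALE = 0.5     # from the text
EMA_ALPHA = 0.6          # from the text
HOIST_STEPS = 150_000    # from the text


def cpg_action(t, frequency=1.0, amplitude=0.5):
    """Hypothetical tripod-gait prior: alternating legs run half a cycle apart."""
    action = []
    for leg in range(NUM_LEGS):
        phase = 2.0 * math.pi * frequency * t + (math.pi if leg % 2 else 0.0)
        for joint in range(JOINTS_PER_LEG):
            # Illustrative quarter-cycle shift between the joints of one leg.
            action.append(amplitude * math.sin(phase + joint * math.pi / 2.0))
    return action


def control_step(t, residual, previous):
    """Add the scaled residual to the CPG prior, then smooth the result."""
    prior = cpg_action(t)
    raw = [p + RESIDUAL_SCALE * r for p, r in zip(prior, residual)]
    # Assumed EMA convention: alpha weights the new command.
    return [EMA_ALPHA * a + (1.0 - EMA_ALPHA) * b for a, b in zip(raw, previous)]


def hoist_factor(step):
    """Linearly decay the hoisting force from 1 to 0 over HOIST_STEPS."""
    return max(0.0, 1.0 - step / HOIST_STEPS)


previous = [0.0] * 18
command = control_step(0.0, residual=[0.0] * 18, previous=previous)
print(len(command), hoist_factor(75_000))
```

With a zero residual the command simply follows the smoothed CPG rhythm, which is why the agent can start from a working gait and only needs to learn corrections on top of it.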

The reward function evaluates multiple factors to ensure stable and efficient walking. An episode counts as a "success" if the agent scores above the 2500 reward threshold, which was empirically determined as the minimum reward at which the hexapod demonstrates stable forward locomotion throughout the full 1000-step episode.


🎛️ Hyperparameters

The following hyperparameters were used to achieve the final evaluation scores for both the continuous SAC baseline and the discrete PopSAN architecture. All of these training parameters are fully customizable. The exact values are explicitly defined in their respective YAML files within the configs/ directory.

| Parameter | SAC Baseline | PopSAN |
|---|---|---|
| Learning Rate | 3e-4 | 3e-4 |
| Hidden Dimensions | 512 | 512 |
| Batch Size | 128 | 128 |
| Buffer Size | 1,000,000 | 1,000,000 |
| Discount Factor ($\gamma$) | 0.99 | 0.99 |
| Target Smoothing ($\tau$) | 0.005 | 0.005 |
| Temperature | 0.05 | 0.05 |
| Epsilon | 1e-6 | 1e-6 |
| Warmup Steps | 10,000 | 10,000 |
| Hoist Steps | 150,000 | 150,000 |
| Training Episodes | 500 | 500 |
| Population Size | N/A | 10 |
| Simulation Timesteps ($T$) | N/A | 5 |
| CuBA LIF Alpha (Decay) | N/A | 0.9 |
| CuBA LIF Beta | N/A | 0.8 |
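To make the roles of the discount factor, target smoothing, and temperature concrete, here is a sketch of where they appear in the textbook SAC update. It assumes twin critics and a fixed temperature, as in the standard formulation of the algorithm. The repository's agents may differ in detail, and the function names are hypothetical.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: move each target parameter a fraction tau toward the online one."""
    return [tau * w + (1.0 - tau) * w_t for w_t, w in zip(target_params, online_params)]


def sac_td_target(reward, done, next_q1, next_q2, next_log_prob,
                  gamma=0.99, temperature=0.05):
    """Entropy-regularised TD target used to train the critics."""
    # Take the smaller critic estimate and add the entropy bonus.
    soft_value = min(next_q1, next_q2) - temperature * next_log_prob
    # No bootstrapping from the next state once the episode has ended.
    return reward + gamma * (0.0 if done else 1.0) * soft_value


print(soft_update([0.0, 1.0], [1.0, 1.0]))
print(sac_td_target(1.0, False, next_q1=2.0, next_q2=3.0, next_log_prob=-1.0))
```

A small tau such as 0.005 makes the target networks change slowly, which stabilises the bootstrapped targets, while the temperature sets how strongly the agent is rewarded for keeping its policy stochastic.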

📊 Results

The PopSAN architecture needed far fewer operations per step than the standard ANN baseline: 1,015 synaptic operations (SOPs) against 320,000 multiply-accumulate operations (MACs) for the SAC actor. The SAC baseline achieved a higher average evaluation reward (3706.41 versus 3474.42), but both models cleared the 2500 threshold with a 100% success rate, so PopSAN reaches comparable locomotion quality at a much lower computational cost.

| Environment | Model Architecture | Training Rewards Graph |
|---|---|---|
| Spikerman (MuJoCo) | SAC | [Figure: SAC Training Graph] |
| Spikerman (MuJoCo) | PopSAN | [Figure: PopSAN Training Graph] |

Performance Metrics

| Metric | SAC Baseline | PopSAN |
|---|---|---|
| Average Reward | 3706.41 | 3474.42 |
| Success Rate | 100% | 100% |
| Temporal Sparsity | N/A | 83.1% |
| SAC Actor MACs | 320,000 | N/A |
| Equivalent Capacity ANN MACs | 748,544 | N/A |
| PopSAN SOPs | N/A | 1,015 |

SAC Actor MACs are computed for the actual trained MLP (77→512→512→18). Equivalent ANN MACs represent a hypothetical MLP with the same population-expanded dimensions as the PopSAN (770→512→512→180) and reflect the computational cost of an ANN with equivalent representational capacity. PopSAN SOPs count only active synaptic operations from non-silent neurons.
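The two MAC figures can be reproduced with a short calculation. One assumption is needed, and it is an inference rather than something the text states: the 320,000 figure matches the 77→512→512→18 layout only if the actor has two 18-unit output heads (for example a mean and a log standard deviation, as is common for SAC actors), while the equivalent ANN is counted with a single 180-unit output. Biases are ignored, and the helper `mlp_macs` is hypothetical.

```python
def mlp_macs(layer_sizes, output_heads=1):
    """Count multiply-accumulates for one forward pass of a dense MLP (biases ignored)."""
    *body, out = layer_sizes
    # Input-to-hidden and hidden-to-hidden layers.
    macs = sum(a * b for a, b in zip(body, body[1:]))
    # Final layer, repeated once per output head.
    return macs + body[-1] * out * output_heads


sac_actor = mlp_macs([77, 512, 512, 18], output_heads=2)  # assumed two output heads
equivalent_ann = mlp_macs([770, 512, 512, 180])
print(sac_actor, equivalent_ann)  # 320000 748544
```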

The 83.1% temporal sparsity means that most neurons stay silent at any given timestep and fire only when needed to correct the hexapod's posture, which keeps the number of Synaptic Operations (SOPs) per step extremely low.
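One common way to compute these two quantities from recorded spike trains is sketched below. The definitions are assumptions, since the text does not give the exact formulas used: sparsity is taken as the fraction of (neuron, timestep) slots without a spike, and each emitted spike is counted as one accumulate per outgoing synapse. The spike data and fan-out are made up for illustration.

```python
def temporal_sparsity(spike_trains):
    """Fraction of (neuron, timestep) slots in which no spike was emitted."""
    total = sum(len(train) for train in spike_trains)
    fired = sum(sum(train) for train in spike_trains)
    return 1.0 - fired / total


def synaptic_operations(spike_trains, fan_out):
    """Each emitted spike triggers one accumulate per outgoing synapse."""
    return sum(sum(train) * fan_out for train in spike_trains)


# Made-up recording: 4 neurons over 5 timesteps, each connected to 4 targets.
trains = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(temporal_sparsity(trains), synaptic_operations(trains, fan_out=4))
```

Silent neurons contribute nothing to the operation count, which is why high sparsity translates directly into fewer synaptic operations.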

Despite the slightly lower raw episode reward (−6.3%), the PopSAN architecture achieves 100% task success, identical to the SAC baseline. The reward gap likely reflects the SNN's inherent regularisation through spike sparsity rather than a reduction in locomotion quality.


🎬 Showcase

| Environment | Model Architecture | Demonstration |
|---|---|---|
| Spikerman (MuJoCo) | SAC | [Demonstration: SAC] |
| Spikerman (MuJoCo) | PopSAN | [Demonstration: PopSAN] |

🚀 Getting Started

Follow these steps to set up the physics simulation and test the trained models locally.

1. Clone the repository and install dependencies:

Ensure you have Python 3.10 or higher installed. Install the required libraries.


git clone https://github.com/amimayo/Spikerman.git
cd Spikerman
pip install -r requirements.txt

2. Run the main script:

You can train the models, evaluate them, or plot the training rewards graph from the command line.


# To train the PopSAN / SAC model 
python spikerman.py --mode train --algo popsan --config configs/popsan_config.yaml

# To evaluate the PopSAN / SAC Model
python spikerman.py --mode eval --algo popsan --config configs/popsan_config.yaml

# To plot the training rewards graph for PopSAN / SAC Model
python spikerman.py --mode visualize --algo popsan 


📚 References


🛠️ To-Do List

  • [🟩] Complete First Simulation of PopSAN and SAC (512 Hidden Dimensions) and Verify Environment: Run and verify full simulations for the implementations
  • [🟨] Compute CoT: Compute Cost of Transport
  • [🟨] Complete Simulation of PopSAN and SAC (256 Hidden Dimensions): Run and verify full simulations for the implementation with 256 hidden dimensions

📜 License

Distributed under the MIT License.


About

Hexapod Robot Locomotion By RL Using Spiking Neural Networks
