This project explores Kalman-style smoothing as a general and robust strategy for learning under noisy statistics.
Rather than treating Kalman filtering as a strict state-space estimator, we adopt it as a geometric and statistical smoothing principle, applicable across different learning paradigms.
The project consists of two complementary parts:
- Normalization and Optimization Geometry in Deep Networks
- Kalman-Based Reward Normalization in Reinforcement Learning
Together, these parts demonstrate how temporal smoothing of noisy signals can significantly stabilize training dynamics, improve robustness, and provide interpretable geometric structure in learning trajectories.
For part 1, the repository was tested with Python ≥ 3.8 and PyTorch ≥ 1.10.
The codebase is lightweight and does not rely on any non-standard dependencies.
We recommend using a virtual environment.
```bash
conda create -n kalman_norm python=3.9 -y
conda activate kalman_norm
```

Install the required packages:

```bash
pip install torch torchvision numpy matplotlib tqdm
```

Note on a Weird Runtime Issue
This issue is admittedly a bit strange. From our experience, it seems to be caused by an inconsistency between the Gym and NumPy environments, possibly related to version mismatches or how the runtime handles package states.
If rerunning the two cells once does not fix the problem, try rerunning them multiple times (yes, really). In some cases, the error disappears after a few retries. Unfortunately, we do not yet have a clean or principled explanation for why this happens.
This project provides a single, minimal entry point to reproduce the core experimental results used in the final presentation for the Seminar on Geometry and Topology in Deep Learning.
All CIFAR-10 experiments are run via:
```bash
python examples/cifar10_train.py \
    --norm_type gkn \
    --num_groups 4 \
    --p_rate 0.1
```

Normalization layers implicitly define a coordinate system and local geometry in feature space.
However, commonly used methods such as Batch Normalization and Group Normalization rely on instantaneous or weakly smoothed statistics, which can be noisy under small batch sizes or non-ideal training conditions.
We propose Kalman-Style Smoothing for Normalization, instantiated as Group Kalman-Inspired Normalization (GKN), which:
- Computes GroupNorm-style per-sample statistics
- Applies temporal smoothing to group-level moments
- Reduces variance in statistical estimation without sacrificing convergence or accuracy
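The steps above can be sketched as follows. This is a minimal NumPy illustration, not the repository's implementation: the function names (`group_stats`, `gkn_step`, `KalmanSmoother`) and the choice to average per-sample group moments over the batch before smoothing are assumptions made for the sketch.

```python
import numpy as np

def group_stats(x, num_groups):
    """GroupNorm-style per-sample statistics; x has shape (N, C, H, W)."""
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, (c // num_groups) * h * w)
    return g.mean(axis=2), g.var(axis=2)           # each of shape (N, num_groups)

class KalmanSmoother:
    """Scalar Kalman filter applied elementwise to a stream of noisy moments."""
    def __init__(self, q=1e-4, r=1e-2):
        self.q, self.r = q, r                      # process / observation noise (hyperparameters)
        self.est, self.p = None, None              # state estimate and its variance

    def update(self, z):
        if self.est is None:                       # initialize from the first observation
            self.est, self.p = z, np.ones_like(z)
        p_pred = self.p + self.q                   # predict
        k = p_pred / (p_pred + self.r)             # Kalman gain
        self.est = self.est + k * (z - self.est)   # correct with the new observation
        self.p = (1.0 - k) * p_pred
        return self.est

def gkn_step(x, num_groups, mean_f, var_f, eps=1e-5):
    """One normalization step using temporally smoothed group-level moments."""
    mu, var = group_stats(x, num_groups)
    mu_s = mean_f.update(mu.mean(axis=0))          # smooth group moments over time
    var_s = var_f.update(var.mean(axis=0))
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, -1)
    g = (g - mu_s[None, :, None]) / np.sqrt(var_s[None, :, None] + eps)
    return g.reshape(n, c, h, w)
```

Because the smoothed moments integrate information across steps, a single noisy batch perturbs the normalization statistics far less than in plain GroupNorm.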
Training dynamics are viewed as a trajectory on a loss-induced manifold.
Kalman-style smoothing acts as a low-pass filter along the time dimension, leading to:
- Smoother optimization paths
- Reduced curvature and noise in training trajectories
- More stable generalization behavior
- Dataset: CIFAR-10 / CIFAR-100
- Architecture: ResNet variants
- Baselines: BatchNorm (BN), GroupNorm (GN)
- Metrics:
- Train / Test Loss and Accuracy
- Geometric trajectory analysis in loss and accuracy space
- Discrete speed, curvature (turning angle), and path length
| Dataset | Normalization | Test Accuracy (%) |
|---|---|---|
| CIFAR-10 | GN | 88.12 |
| CIFAR-10 | GKN (Ours) | 88.47 |
| CIFAR-100 | GN | 59.63 |
| CIFAR-100 | GKN (Ours) | 62.36 |
We visualize training as curves in:
- Loss space: (L_train, L_test)
- Accuracy space: (Acc_train, Acc_test)
Compared to GN, GKN consistently exhibits:
- Lower trajectory curvature
- Reduced high-frequency oscillations
- More stable coupling between training and generalization
These observations support a geometric interpretation of normalization as a metric regularizer on training dynamics. The corresponding plots can be found in `final_project.ipynb`.
Geometric interpretation (accuracy space).
We view training as a discrete trajectory γ(t) = (train_acc(t), test_acc(t)) in a 2D generalization space.
Compared to GN, GKN yields a substantially shorter trajectory (total arclength 132.7 vs 155.9) and smaller average step length (1.34 vs 1.58), indicating fewer epoch-to-epoch oscillations and less “wandering” in the train–test accuracy coupling.
Although turning-angle statistics are comparable in this projection, the reduced path length and step magnitude strongly support the claim that Kalman-style smoothing stabilizes training dynamics and improves robustness in the generalization trajectory.
Reinforcement learning often suffers from high-variance and non-stationary reward signals, which severely degrade training stability and sample efficiency—especially in policy gradient methods.
In this part, we explore Kalman-based reward normalization, replacing conventional mean–std normalization with a lightweight 1D Kalman filter applied directly to reward streams.
- Rewards are modeled as noisy observations of an underlying latent signal
- A 1D Kalman filter provides adaptive, online normalization
- The approach is plug-and-play, computationally lightweight, and compatible with standard RL pipelines
- Algorithms: PPO (with possible extensions to its variants)
- Environments: CartPole, LunarLander
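A minimal sketch of the three bullet points describing the approach, assuming a scalar latent reward model with fixed noise hyperparameters `q` and `r`; the class name and the choice to scale the innovation by the predictive standard deviation are our illustrative assumptions, not the exact implementation.

```python
import math

class KalmanRewardNormalizer:
    """1D Kalman filter treating each reward as a noisy observation of a latent signal.

    The filtered estimate tracks the underlying reward level online; normalization
    divides the residual by the predictive standard deviation, analogous to
    mean-std scaling but with adaptive, recursive estimates.
    """
    def __init__(self, q=1e-2, r=1.0):
        self.q, self.r = q, r       # process / observation noise variances (hyperparameters)
        self.x, self.p = 0.0, 1.0   # latent estimate and its variance

    def normalize(self, reward):
        p_pred = self.p + self.q                 # predict
        k = p_pred / (p_pred + self.r)           # Kalman gain
        innovation = reward - self.x             # observation residual
        self.x = self.x + k * innovation         # correct
        self.p = (1.0 - k) * p_pred
        # Scale the residual by the predictive std of the observation.
        return innovation / math.sqrt(p_pred + self.r)
```

Being a constant-time scalar update, it can be dropped into a PPO reward pipeline before advantage estimation without touching the rest of the training loop.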
We evaluate the effectiveness of Kalman-based reward normalization within the PPO framework on two classic control benchmarks: CartPole and LunarLander. Across both environments, the proposed normalization strategy consistently demonstrates clear advantages over standard reward normalization.
In the CartPole environment, where the maximum achievable reward is 500, PPO with Kalman-based reward normalization exhibits significantly faster convergence. Specifically, the agent reaches the maximum reward after only 77 episodes. In contrast, when using conventional reward normalization, PPO requires approximately 306 episodes to achieve the same performance. This large gap highlights the substantial acceleration in early-stage learning brought by the Kalman-based approach.
A similar trend is observed in the LunarLander environment, which has a maximum reward of 200. With Kalman-based reward normalization, PPO converges to the maximum reward within 750 episodes, whereas the baseline approach with standard normalization requires around 991 episodes. Although the performance gap is less dramatic than in CartPole, the improvement remains consistent and meaningful.
Overall, these empirical results show that Kalman-based reward normalization in PPO leads to:
- Faster early-stage convergence, as evidenced by the significantly reduced number of episodes needed to reach maximum reward.
- Reduced variance in learning curves, reflected in more predictable and smoother training dynamics.
- Improved stability across random seeds, with more consistent convergence behavior compared to standard normalization.
These findings suggest that incorporating Kalman filtering into reward normalization can be an effective and lightweight modification to enhance PPO training efficiency and robustness.
This reinforcement learning work was previously submitted to the ICML New in ML Workshop, an inclusive venue aimed at encouraging exploratory machine learning research by early-stage researchers.
Although exploratory in nature, subsequent feedback and industry interest suggest that Kalman-based reward normalization is a promising direction worthy of further investigation.
We therefore include this work as the second part of the present project.
Across both parts, this project demonstrates that:
Kalman-style smoothing provides a unifying geometric and statistical framework for stabilizing learning under noise.
- In supervised learning, it smooths normalization statistics and shapes optimization geometry.
- In reinforcement learning, it smooths reward signals and stabilizes policy updates.
Future directions include:
- Learnable Kalman parameters (Q, R)
- Uncertainty-aware normalization layers
- State-dependent reward filtering
- Extensions to continuous control and offline RL
- Connections to information geometry and dynamical systems
MIT License
This project is closely related to our prior work on Kalman-based optimization methods,
which was presented in the Seminar on Geometry and Topology in Deep Learning.
```bibtex
@inproceedings{xiakoala++,
  title     = {KOALA++: Efficient Kalman-Based Optimization with Gradient-Covariance Products},
  author    = {Xia, Zixuan and Davtyan, Aram and Favaro, Paolo},
  booktitle = {The Thirty-ninth Annual Conference on Neural Information Processing Systems},
}
```

Other references for this project include:
```bibtex
@inproceedings{wu2018group,
  title     = {Group Normalization},
  author    = {Wu, Yuxin and He, Kaiming},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  pages     = {3--19},
  year      = {2018}
}

@article{schulman2017proximal,
  title   = {Proximal Policy Optimization Algorithms},
  author  = {Schulman, John and Wolski, Filip and Dhariwal, Prafulla and Radford, Alec and Klimov, Oleg},
  journal = {arXiv preprint arXiv:1707.06347},
  year    = {2017}
}
```