Stochastic Weighted Function Norm Regularization


This code implements a regularization method for neural networks and provides a proof of concept on a simple task: training a classifier on a small subset of MNIST.

It is based on work in which we attempt to bridge the gap between statistical learning theory and developments in deep neural networks. While the former provides solid foundations for regularizing learning algorithms such as SVMs and logistic regression, deep neural networks still lack systematic, mathematically motivated regularizers. In this context, we propose a regularizer inspired by classical learning theory: the function norm. In other words, we restrict the hypothesis set over which the objective is optimized to a ball in the L2 function space. Since we proved that computing this norm exactly is NP-hard, we propose to estimate it stochastically from samples generated by a variational autoencoder.
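As a rough illustration of the stochastic estimate described above, the sketch below approximates the squared weighted L2 norm of a network, E_{x~μ} ||f(x)||², by the sample mean (1/N) Σ_i ||f(x_i)||² over generated inputs x_i, and adds it to a data loss with weight λ. Everything here is hypothetical and not taken from the repository: a toy NumPy MLP stands in for the classifier, a standard Gaussian sampler stands in for the trained VAE decoder, applying the norm to the raw class scores is our assumption, and the names `sample_inputs`, `mlp_forward`, `function_norm_sq`, `regularized_loss` and `lam` are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_inputs(n, dim, rng):
    """Hypothetical stand-in for drawing n samples from a trained VAE decoder."""
    return rng.standard_normal((n, dim))

def mlp_forward(params, x):
    """Toy one-hidden-layer ReLU network returning class scores."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, x @ W1 + b1)
    return h @ W2 + b2

def function_norm_sq(params, samples):
    """Monte Carlo estimate of E_x[||f(x)||^2] from the given samples."""
    out = mlp_forward(params, samples)
    return float(np.mean(np.sum(out ** 2, axis=1)))

def regularized_loss(data_loss, params, samples, lam):
    """Data loss plus lam times the estimated squared function norm."""
    return data_loss + lam * function_norm_sq(params, samples)

dim, hidden, n_classes = 784, 32, 10  # MNIST-sized inputs, 10 classes
params = (
    0.01 * rng.standard_normal((dim, hidden)),
    np.zeros(hidden),
    0.01 * rng.standard_normal((hidden, n_classes)),
    np.zeros(n_classes),
)
samples = sample_inputs(256, dim, rng)
print(function_norm_sq(params, samples))
print(regularized_loss(1.0, params, samples, lam=0.1))
```

In a real training loop the samples would be redrawn at each step, so the penalty is an unbiased but noisy estimate of the norm; a larger batch of generated samples lowers that noise at extra cost.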


The code has been tested on Fedora 27 with:

  1. Python 2.7.11
  2. PyTorch 0.2.0_4


To run the experiments, refer to the notebooks. We test two architectures: one with which Dropout is customarily used and one with which Batch Normalization is customarily used (see the notebooks for details about the architectures and training parameters).

The results of our experiments are displayed in the notebooks. It appears that:

  1. Dropout alone does better than our method alone, but combining the two gives the best results.
  2. Our method slightly outperforms batch normalization alone, but combining the two degrades performance.
  3. In all cases, our method alone substantially outperforms weight decay alone.


We provide 2 modules:


Defines two variational autoencoder architectures. Our experiments use only one of them, the vanilla VAE; the other architecture is provided for further experimentation.
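For readers unfamiliar with the vanilla VAE, the sketch below shows the pieces most relevant here: the reparameterization z = μ + σ ⊙ ε used during training, the closed-form KL divergence between a diagonal Gaussian posterior N(μ, σ²) and the standard normal prior, and prior sampling z ~ N(0, I), which a trained decoder would map to new inputs. This is a generic NumPy illustration under those textbook assumptions, not the repository's module; the function names are hypothetical and the encoder and decoder networks are omitted.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims, per sample."""
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)

def sample_latents(n, latent_dim, rng):
    """Prior samples z ~ N(0, I); a trained decoder would map these to images."""
    return rng.standard_normal((n, latent_dim))

rng = np.random.default_rng(0)
mu = np.zeros((4, 20))
logvar = np.zeros((4, 20))
print(kl_to_standard_normal(mu, logvar))  # KL vanishes when posterior equals prior
z = reparameterize(mu, logvar, rng)
print(z.shape)
```

Parameterizing the variance through its logarithm keeps σ positive without constraints, which is a common design choice for vanilla VAEs.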


Provides the classes needed for random subset selection and model definition.
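A small training subset can be drawn reproducibly by fixing a seed. The plain-Python sketch below shows one way to do it, either uniformly at random or balanced across classes; the function names, the per-class balancing, and the synthetic label list are our assumptions, and the repository's classes may select subsets differently.

```python
import random
from collections import defaultdict

def random_subset(n_total, n_subset, seed=0):
    """Return n_subset distinct indices in [0, n_total), reproducible via seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), n_subset))

def balanced_subset(labels, per_class, seed=0):
    """Pick per_class random indices for each distinct label value."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    chosen = []
    for y in sorted(by_class):
        chosen.extend(rng.sample(by_class[y], per_class))
    return sorted(chosen)

labels = [i % 10 for i in range(60000)]  # hypothetical stand-in for MNIST labels
subset = balanced_subset(labels, per_class=10, seed=42)
print(len(subset))  # 100
```

Balancing per class avoids, on very small subsets, draws in which some digits are missing or badly underrepresented.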

We also provide some utils needed in our training code (see and


  • Amal Rannen Triki, Maxim Berman & Matthew B. Blaschko

** All the authors are with KU Leuven, ESAT-PSI, IMEC, Belgium. ** A. Rannen Triki is also affiliated with Yonsei University, Korea.

** For questions about the code, please contact Amal Rannen Triki (



  title={Stochastic Weighted Function Norm Regularization},
  author={Rannen Triki, Amal and Berman, Maxim and Blaschko, Matthew B},
  journal={arXiv preprint arXiv:1710.06703},