Improving Lipschitz-Constrained Neural Networks by Learning Activation Functions

Implementation of the experiments from the paper: https://www.jmlr.org/papers/volume25/22-1347/22-1347.pdf

Description

Lipschitz-constrained neural networks have several advantages over unconstrained ones and can be applied to a variety of problems. Consequently, they have recently attracted considerable attention in the deep learning community. However, it has been shown both theoretically and empirically that networks with ReLU activation functions perform poorly in this context. In contrast, neural networks with learnable 1-Lipschitz linear splines are known to be more expressive in theory. We propose an efficient method to train such 1-Lipschitz deep spline neural networks. Our numerical experiments on a variety of tasks show that our trained networks match or outperform networks with activation functions specifically tailored towards Lipschitz-constrained architectures.

Requirements

The required packages:

  • pytorch
  • torchvision
  • opencv
  • h5py
  • cvxpy
  • cvxpylayers
  • tqdm

You can install the exact environment I used (with cudatoolkit 10.1 for GPU support):

conda env create -f environment.yml
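
If you prefer pip, a rough equivalent is the following one-liner (package names are assumed from the list above; on PyPI, opencv ships as opencv-python):

pip install torch torchvision opencv-python h5py cvxpy cvxpylayers tqdm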

Training

You can train a model with the following command:

python train.py --exp {1d,wasserstein,wgan,denoising} --config path/to/config --device {cpu,cuda:n}
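
For example, to run the denoising experiment on the first GPU:

python train.py --exp denoising --config configs/config_denoising.json --device cuda:0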

Config file details

Information about the hyperparameters that yield the best performance for the four experiments can be found in the configs folder.

Below we detail the model hyperparameters for the denoising experiment that can be controlled in the config file configs/config_denoising.json. The // comments are explanatory only; JSON itself does not support comments, so they do not appear in the actual config files.

{
    "activation_fn_params": {
        "activation_type": "linearspline",      // choose relu/absolute_value/prelu/groupsort/householder/linearspline
        "groupsort_groupsize": 2,              
        "prelu_init": -1,                       // number in [-1, 1] or "maxmin" (half with 1 and other half with -1)
        "lipschitz_constrained": true,
        "spline_init": "identity",              // can be identity/relu/absolute_value/maxmin
        "spline_range": 0.1,
        "spline_size": 51,                      // number of linear regions +1
        "lmbda": 1e-6                           // TV2 reg for the linear splines
    },
    "exp_name": "name_of_exp",
    "log_dir": "denoising_exps/sigma_5",
    "net_params": {
        "bias": true,
        "kernel_size": 3,
        "num_channels": [1, 64, 64, 64, 64, 64, 64, 64, 1]
    },
    "optimizer": {                              
        "lr_spline_coeffs": 1e-06,
        "lr_spline_scaling_coeffs": 1e-05,
        "lr_weights": 4e-05
    },
    "seed": 42,
    "sigma": 5,                                 // noise level
    "training_options": {
        "epochs": 50,
        "batch_size": 128,
        "num_workers": 1,
        "train_data_file": "path/to/train.file",
        "val_data_file": "path/to/val.file"
    }
}
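
For intuition, here is a minimal PyTorch sketch of a learnable linear-spline activation with the TV(2) penalty weighted by lmbda. It is illustrative only and not the repository's implementation: the class name is hypothetical, inputs outside the spline range are clamped, and the projection that enforces the 1-Lipschitz constraint on the spline is omitted.

import torch
import torch.nn as nn

class LinearSpline(nn.Module):
    def __init__(self, size=51, spline_range=0.1):
        super().__init__()
        self.range = spline_range
        # One learnable value per knot on a uniform grid over
        # [-spline_range, spline_range], initialized to the identity
        # (matching "spline_init": "identity").
        self.coeffs = nn.Parameter(torch.linspace(-spline_range, spline_range, size))

    def forward(self, x):
        n = self.coeffs.numel()
        step = 2 * self.range / (n - 1)
        idx = (x + self.range) / step
        lo = idx.floor().long().clamp(0, n - 2)
        frac = (idx - lo.float()).clamp(0, 1)
        # Piecewise-linear interpolation between neighboring knot values.
        return self.coeffs[lo] * (1 - frac) + self.coeffs[lo + 1] * frac

    def tv2(self):
        # Second-order total variation: sum of absolute second finite
        # differences of the knot values; added to the loss with weight lmbda.
        d2 = self.coeffs[2:] - 2 * self.coeffs[1:-1] + self.coeffs[:-2]
        return d2.abs().sum()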

Below we detail the model hyperparameters that can be controlled in the config file configs/config_wasserstein.json.

{
    "net_params": {
        "bias": true,
        "bjorck_iter": 25,                      // nb of iters for the bjorck algorithm to orthonormalize weight matrices
        "layer_sizes": [
            784,
            1024,
            1024,
            1
        ],
        "projection": "orthonormalize",
        "weight_initialization": "orthonormal" 
    },
    "optimizer": {
        "lr_spline_coeffs": 5e-05,
        "lr_spline_scaling_coeffs": 0.0005,
        "lr_weights": 0.002
    },
    "seed": 42,
    "training_options": {
        "batch_size": 4096,
        "epochs": 1600,
        "num_workers": 1,
        "train_dataset_file": "data/mnist/train.pt",
        "val_dataset_file": "data/mnist/val.pt"
    }
}
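
The Björck iteration referenced by bjorck_iter can be sketched as follows. This is a minimal first-order version, not the repository's implementation, and it assumes the input matrix is pre-scaled so that its spectral norm is close to 1 (otherwise the iteration may diverge).

import torch

def bjorck_orthonormalize(w, iters=25, beta=0.5):
    # Each step pushes w toward the nearest orthonormal matrix:
    #   w <- (1 + beta) * w - beta * (w w^T w)
    for _ in range(iters):
        w = (1 + beta) * w - beta * w @ w.t() @ w
    return w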

Below we detail the model hyperparameters that can be controlled in the config file configs/config_wgan.json.

{
    ...
    "generator_activation_fn_params": {
        "activation_type": "relu"               // we only use ReLU for the generator since it is not 1-lipschitz
    },
    "generator_optimizer": {
        "lr_weights": 0.0002
    }
    ...                                           
}

Below we detail the model hyperparameters that can be controlled in the config file configs/config_denoiser.json.

{
    ...
    "net_params": {
        "bias": true,
        "layer_sizes": [1, 10, 10, 10, 1],          // Chooses the number of neurons of every layers
        "spectral_norm": true,
        "alphas": true,
        "weight_initialization": "Xavier_normal",
        "batch_norm": false
    }                                                 
    ...
    "dataset" : {
        "training_dataset_size": 1000,              // Choose number of training point
        "testing_dataset_size": 10000,              // Choose number of validation point
        "function_type": "f2",                      // Choose between [f1, f2, f3, f4, random_spline]
        "nbr_models": 10,                           // Number of models trained, median or mean results will be reported
        "number_knots": 7                           // Number of knots of the random spline      
    }
    ...
}
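
As an illustration of net_params, below is a hypothetical helper (not from the repository) that builds the fully connected network described by layer_sizes and optionally wraps each linear layer with PyTorch's spectral-norm parametrization, which constrains the largest singular value of each weight matrix to roughly 1 via power iteration. The ReLU activation here is a placeholder for whichever activation the config selects.

import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

def build_mlp(layer_sizes, bias=True, use_spectral_norm=True):
    # layer_sizes, e.g. [1, 10, 10, 10, 1], gives the width of every layer.
    layers = []
    for i in range(len(layer_sizes) - 1):
        linear = nn.Linear(layer_sizes[i], layer_sizes[i + 1], bias=bias)
        if use_spectral_norm:
            # Divides the weight by an estimate of its largest singular value.
            linear = spectral_norm(linear)
        layers.append(linear)
        if i < len(layer_sizes) - 2:
            layers.append(nn.ReLU())  # placeholder activation
    return nn.Sequential(*layers)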
