<h1>Langevin algorithms for very deep Neural Networks with applications to image classification</h1>

We show how to:
<ol>
    <li>Use Langevin optimizers and Layer Langevin optimizers to train any TensorFlow model</li>
    <li>Use our framework to compare different optimizers on the same image classification (or general) problem</li>
</ol>

In [None]:
import tensorflow as tf

<h2>1) Langevin Optimizers</h2>

Optimizers in the <tt>optimizers</tt> directory can be directly used as instances of the TensorFlow <tt>Optimizer</tt> base class.

In [None]:
from optimizers.ladam import LAdam, LayerLAdam
from optimizers.lrmsprop import LRMSprop, LayerLRMSprop
from optimizers.ladadelta import LAdadelta, LayerLAdadelta

optimizer = LAdam(learning_rate=1e-3, sigma=1e-3)
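
To give some intuition for the role of <tt>sigma</tt>, here is a minimal sketch of a plain (non-preconditioned) Langevin gradient step in pure Python. It is not the update rule of <tt>LAdam</tt>: the function names are hypothetical, and the noise scaling <tt>sigma * sqrt(learning_rate)</tt> is an assumption borrowed from the usual discretization of Langevin dynamics. The optimizers in the <tt>optimizers</tt> directory may scale and precondition the noise differently.

```python
import math
import random


def langevin_step(theta, grad, learning_rate, sigma, rng):
    """One plain Langevin step on a list of floats (illustrative only)."""
    # Assumed scaling: Gaussian noise with standard deviation sigma * sqrt(learning_rate).
    noise_scale = sigma * math.sqrt(learning_rate)
    return [t - learning_rate * g + noise_scale * rng.gauss(0.0, 1.0)
            for t, g in zip(theta, grad)]


def minimize_quadratic(sigma, steps=2000, seed=0):
    """Run Langevin steps on f(theta) = 0.5 * ||theta||^2, whose gradient is theta."""
    rng = random.Random(seed)
    theta = [1.0, -2.0]
    for _ in range(steps):
        theta = langevin_step(theta, theta, learning_rate=1e-2, sigma=sigma, rng=rng)
    return theta


noiseless = minimize_quadratic(sigma=0.0)   # sigma = 0 is ordinary gradient descent
noisy = minimize_quadratic(sigma=1e-3)      # iterates keep fluctuating around the minimum
print(noiseless, noisy)
```

With <tt>sigma=0.</tt> the step reduces to ordinary gradient descent, which is why the experiment later in this notebook uses <tt>LAdam(learning_rate=1e-3, sigma=0.)</tt> as the noiseless baseline.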

Schedules from <tt>tf.keras.optimizers.schedules</tt> may be passed as the <tt>learning_rate</tt> and <tt>sigma</tt> arguments.

In [None]:
lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[100], values=[1e-3,1e-4])
sigma_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[100], values=[1e-3,0.])

optimizer = LAdam(learning_rate=lr_schedule, sigma=sigma_schedule)
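
The effect of the two schedules above can be checked without TensorFlow. The sketch below is a pure-Python mirror of a piecewise-constant schedule (the helper name <tt>piecewise_constant</tt> is hypothetical). It follows the documented behaviour of <tt>PiecewiseConstantDecay</tt>, which returns the first value while the step is at most the first boundary, and it assumes that the Langevin optimizers evaluate schedules at the optimizer step count, as Keras optimizers do.

```python
import bisect


def piecewise_constant(step, boundaries, values):
    """Pure-Python mirror of a piecewise-constant schedule.

    Returns values[0] while step <= boundaries[0], values[1] while
    boundaries[0] < step <= boundaries[1], and so on.
    """
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have one more element than boundaries")
    # bisect_left counts the boundaries that are strictly smaller than step.
    return values[bisect.bisect_left(boundaries, step)]


for step in (0, 100, 101, 500):
    lr = piecewise_constant(step, [100], [1e-3, 1e-4])
    sigma = piecewise_constant(step, [100], [1e-3, 0.0])
    print(step, lr, sigma)
```

Under these assumptions, the noise is switched off and the learning rate is divided by ten from step 101 onwards. Note that the boundaries count optimizer steps (batches), not epochs.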

<b>Layer Langevin optimizers</b>

The argument <tt>langevin_layers</tt> specifies the layers of the model that are trained with Langevin noise.

When using Layer Langevin optimizers, <tt>set_langevin(model)</tt> must be called after the <tt>model</tt> has been compiled with this optimizer and built.

In [None]:
from optimizers.base import set_langevin

optimizer = LayerLAdam(learning_rate=1e-3, sigma=1e-3, langevin_layers=[0,1])

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(10)
])

model.compile(optimizer=optimizer, loss=tf.keras.losses.MeanSquaredError())
model(tf.random.normal((1,5))) # build the model
set_langevin(model)
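
As a rough illustration of what <tt>langevin_layers=[0,1]</tt> means for the three-layer model above, the sketch below computes a per-layer noise level. The helper <tt>per_layer_sigma</tt> is hypothetical and is not part of the <tt>optimizers</tt> package; it assumes that the indices refer to positions in <tt>model.layers</tt> and that layers outside the list receive no noise. The internals of <tt>set_langevin</tt> are not shown in this notebook and may work differently.

```python
def per_layer_sigma(num_layers, langevin_layers, sigma):
    """Return the noise level applied to each layer index (hypothetical helper)."""
    for idx in langevin_layers:
        if not 0 <= idx < num_layers:
            raise IndexError(f"layer index {idx} out of range for {num_layers} layers")
    selected = set(langevin_layers)
    # Layers listed in langevin_layers get sigma, all other layers get no noise.
    return [sigma if idx in selected else 0.0 for idx in range(num_layers)]


print(per_layer_sigma(num_layers=3, langevin_layers=[0, 1], sigma=1e-3))
```

Under these assumptions, the two hidden layers are trained with Langevin noise and the output layer is trained without it.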

<h2>2) Running an experiment for Image Classification</h2>

<b>Model Builder:</b>

A <tt>ModelBuilder</tt> is required for the <tt>experiment</tt>.
We provide different <tt>ModelBuilder</tt>s: <tt>DenseModel</tt>, <tt>ConvDense</tt>, <tt>HighwayModel</tt>, <tt>ResNet</tt>, <tt>DenseNet</tt>.
Each <tt>ModelBuilder</tt> has the method <tt>getModel()</tt> which returns a TensorFlow model.

In [None]:
from models.dense import DenseModel
from models.conv_dense import ConvDense
from models.highway import HighwayModel
from models.resnet import ResNet
from models.densenet import DenseNet

<b><tt>DenseModel(nb_units, classes)</tt></b>: Fully connected model.
<ul>
    <li><tt>nb_units</tt>: list of units in each hidden layer; each hidden layer has ReLU activation</li>
    <li><tt>classes</tt>: number of classes for the output.</li>
</ul>


<b><tt>ConvDense(nb_units, classes)</tt></b>: Convolutional layers followed by fully connected layers.
<ul>
    <li><tt>nb_conv</tt>: number of 2D convolutional layers</li>
    <li><tt>filters</tt>, <tt>kernel_size</tt>: parameters of the 2D convolutional layers</li>
    <li><tt>nb_units</tt>: list of units in each hidden dense layer; each hidden layer has ReLU activation</li>
    <li><tt>classes</tt>: number of classes for the output</li>
</ul>


<b><tt>HighwayModel(nb_units, classes)</tt></b>: Same as <tt>DenseModel(nb_units, classes)</tt> but the dense hidden layers are replaced with highway layers and one dense layer is used before the highway layers. All the hidden layers must have the same number of units.


<b><tt>ResNet(input_shape,filters,block_layers,hidden_units,classes,zero_padding,mode)</tt></b>:
<ul>
    <li><tt>input_shape</tt>: 3-tuple of (width, height, channels)</li>
    <li><tt>filters</tt>: initial number of filters in the ResNet architecture (multiplied by two at every new block)</li>
    <li><tt>block_layers</tt>: list of number of residual layers in each block</li>
    <li><tt>hidden_units</tt></li>
    <li><tt>classes</tt>: number of classes for the output</li>
    <li><tt>zero_padding</tt>: 2-tuple for zero padding in each direction</li>
    <li><tt>mode</tt>: either <tt>"resnet"</tt> or <tt>"vgg"</tt>; if <tt>"vgg"</tt> then removes the residual connections</li>
</ul>
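
The doubling rule for <tt>filters</tt> can be made concrete with a little arithmetic. The sketch below uses two hypothetical helpers (they are not part of the <tt>models</tt> package) and only encodes what the parameter list above states: the number of filters starts at <tt>filters</tt> and is multiplied by two at every new block, and <tt>block_layers</tt> gives the number of residual layers per block.

```python
def filters_per_block(filters, block_layers):
    """Number of filters in each block, doubling at every new block."""
    return [filters * 2 ** i for i in range(len(block_layers))]


def total_residual_layers(block_layers):
    """Total number of residual layers across all blocks."""
    return sum(block_layers)


# Values used for the ResNet built in this notebook.
print(filters_per_block(filters=16, block_layers=[5, 5, 5]))
print(total_residual_layers([5, 5, 5]))
```

For <tt>filters=16</tt> and <tt>block_layers=[5,5,5]</tt>, this gives three blocks with 16, 32 and 64 filters and 15 residual layers in total.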


<b><tt>DenseNet(input_shape, classes)</tt></b>: DenseNet-121 architecture. <tt>input_shape</tt> is a 3-tuple whose width and height should both be at least 32.

In [None]:
model_builder = ResNet(
    input_shape=(32, 32, 3),
    filters=16,
    block_layers=[5,5,5],
    hidden_units=512,
    classes=10,
    zero_padding=(0, 0),
    mode='resnet')

<b>Dataloader:</b>

The data is loaded with <tt>ImageLoader(dataset_name,batch_size,rescale,augment)</tt>.
<ul>
    <li><tt>dataset_name</tt>: either "mnist", "cifar10" or "cifar100", or any other dataset available in <tt>tensorflow_datasets</tt></li>
    <li><tt>rescale</tt>, <tt>augment</tt>: batchable TensorFlow functions that take a batch of images as argument and return the rescaled and augmented images, respectively</li>
</ul>

In [None]:
from dataloaders import ImageLoader

batch_size = 512
def rescale(x):
    x = tf.cast(x, tf.float32)
    return x/255.

def augment(x):
    x = tf.image.resize_with_crop_or_pad(x, 32 + 4, 32 + 4)
    x = tf.image.random_crop(x,[32,32,3])
    x = tf.image.random_flip_left_right(x)
    return x

dataloader = ImageLoader(
    dataset_name = 'cifar10',
    batch_size = batch_size,
    rescale = rescale,
    augment = augment)

Then build the <tt>experiment</tt> with the following additional arguments:
<ul>
    <li><tt>optimizers</tt>: list of TensorFlow optimizers to compare</li>
    <li><tt>base</tt>: path where the results are saved</li>
</ul>

In [None]:
from experiment import Experiment

experiment = Experiment(
    model_builder=model_builder,
    dataloader=dataloader,
    EPOCHS=20,
    optimizers=[LAdam(learning_rate=1e-3, sigma=1e-3),
                LAdam(learning_rate=1e-3, sigma=0.)],
    base='./')

<ul>
    <li><tt>experiment.load_data()</tt>: load the data according to the dataloader; must be used before any training</li>
    <li><tt>experiment.run_experiment()</tt>: train the model for each optimizer</li>
    <li><tt>experiment.plot()</tt>: plot the training curves for each optimizer</li>
    <li><tt>experiment.save_data(dir)</tt>: save the training curves as <tt>csv</tt> files in the <tt>experiment.base/dir</tt> directory (the directory is created)</li>
</ul>

In [None]:
experiment.load_data()
experiment.run_experiment()
experiment.plot()

experiment.save_data('cifar10_adam')