
Layer-Folding

Official implementation of "Layer Folding: Neural Network Depth Reduction using Activation Linearization"

Introduction

This paper presents Layer Folding: an approach to reduce the depth of a pre-trained deep neural network by identifying non-linear activations, such as ReLU, Tanh and Sigmoid, that can be removed, and then folding the consecutive linear layers (e.g. fully connected and convolutional layers) into a single linear layer. The depth reduction can lead to lower run times, especially on edge devices. In addition, it can be shown that some tasks are characterized by a so-called “Effective Degree of Non-Linearity” (EDNL), which indicates how many of the model's non-linear activations can be removed without heavily compromising its performance.
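For intuition, here is a minimal sketch (our illustration, not code from this repository) of why two consecutive fully connected layers collapse into a single linear layer once the activation between them is removed:

import torch
import torch.nn as nn

# Illustrative only: y = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
def fold_linear_pair(fc1: nn.Linear, fc2: nn.Linear) -> nn.Linear:
    folded = nn.Linear(fc1.in_features, fc2.out_features)
    with torch.no_grad():
        folded.weight.copy_(fc2.weight @ fc1.weight)
        folded.bias.copy_(fc2.weight @ fc1.bias + fc2.bias)
    return folded

# Sanity check on random inputs
fc1, fc2 = nn.Linear(8, 16), nn.Linear(16, 4)
x = torch.randn(2, 8)
assert torch.allclose(fc2(fc1(x)), fold_linear_pair(fc1, fc2)(x), atol=1e-5)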

Requirements

To install requirements:

pip install -r requirements.txt

In addition, you should download the pre-trained models and save them in the models directory. They can be found here.

Run Experiments

You can simply execute one of the following scripts to train and pre-fold ResNet or VGG on CIFAR10:

python ResNet_Cifar10_prefold.py

or:

python VGG_Cifar10_prefold.py

The hyper-parameters can be controlled by adding arguments. For example:

python ResNet_Cifar10_prefold.py -d 20 -e 100 -lr 0.001 -m 0.9 -l 0.25

where -l sets the hyperparameter λ that balances the task loss against the number of layers to be folded, -d sets the depth of the network, and the remaining arguments control the training process (number of epochs, learning rate and momentum).
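For reference, the following is a hedged sketch of how such an objective can be set up; the actual scripts may parameterize the activations and the penalty differently. Each ReLU is replaced by a PReLU with a learnable slope α (α = 1 makes the unit the identity, so the surrounding linear layers can later be folded), and a λ-weighted term pushes each α toward 1 unless the task loss resists:

import torch
import torch.nn as nn

# Sketch only: replace every ReLU by a PReLU with learnable slope alpha.
# alpha = 0 behaves like plain ReLU; alpha = 1 is the identity (foldable).
def replace_relu_with_prelu(model: nn.Module) -> None:
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, nn.PReLU(init=0.0))
        else:
            replace_relu_with_prelu(child)

# Sketch only: combined objective with a linearization penalty weighted by lambda.
def folding_loss(task_loss: torch.Tensor, model: nn.Module, lam: float = 0.25) -> torch.Tensor:
    penalty = sum((1.0 - m.weight.clamp(0.0, 1.0)).sum()
                  for m in model.modules() if isinstance(m, nn.PReLU))
    return task_loss + lam * penalty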

The following scripts take a pre-folded network, fold its activations, create a shallower network and finally fine-tune the weights (a conceptual sketch of the activation-removal step follows the commands below):

python ResNet_Cifar10_posfold.py

or:

python VGG_Cifar10_posfold.py

Examples of pre-folded networks for both ResNet20 and VGG16 are available in the models directory.
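The selection step can be pictured as follows; this is a hedged sketch rather than the repository's implementation, and the threshold below is illustrative. Activations whose learned slope ended up close to 1 are effectively linear, so they are replaced by identities and the adjacent layers can then be merged before fine-tuning:

import torch.nn as nn

# Sketch only: drop activations that the pre-folding phase drove toward linearity.
def remove_linearized_activations(model: nn.Module, threshold: float = 0.5) -> int:
    removed = 0
    for name, child in model.named_children():
        if isinstance(child, nn.PReLU) and float(child.weight.mean()) > threshold:
            setattr(model, name, nn.Identity())  # now foldable into its neighbours
            removed += 1
        else:
            removed += remove_linearized_activations(child, threshold)
    return removed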

Note: For training networks on CIFAR100, you should load the relevant datasets:

train_dataset = torchvision.datasets.CIFAR100(root='data/', train=True, transform=transform, download=True)

test_dataset = torchvision.datasets.CIFAR100(root='data/', train=False, transform=transforms.Compose([transforms.ToTensor(), normalize,]))
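For completeness, here is a self-contained version of the swap; the normalization statistics are commonly used CIFAR-100 values, and the repository scripts define their own normalize and training transform (which may additionally include augmentation):

import torchvision
import torchvision.transforms as transforms

# Commonly used CIFAR-100 statistics (the scripts define their own normalize).
normalize = transforms.Normalize(mean=(0.5071, 0.4865, 0.4409),
                                 std=(0.2673, 0.2564, 0.2762))
transform = transforms.Compose([transforms.ToTensor(), normalize])

train_dataset = torchvision.datasets.CIFAR100(root='data/', train=True,
                                              transform=transform, download=True)
test_dataset = torchvision.datasets.CIFAR100(root='data/', train=False,
                                             transform=transform)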

And change the number of classes as well:

ResNet:

class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=100)

VGG:

class VGG(nn.Module):
    def __init__(self, features: nn.Module, num_classes: int = 100, init_weights: bool = True)

Results

We utilize our method to optimize networks with respect to both accuracy and efficiency.

We perform our experiments on the ImageNet image classification task and measure the latency of all models on an NVIDIA Titan X Pascal GPU. We consider the commonly used MobileNetV2 and EfficientNet-lite. We focus on these models because of their attractiveness for hardware and edge devices, mostly credited to their competitive latency and their exclusion of the squeeze-and-excite layers employed by other state-of-the-art networks. Folded networks and checkpoints can be found under the Folded-Mobilenet directory.

(Result figures: MNIST, CIFAR10, CIFAR100)

Alpha Progression

Progression of α values corresponding to non-linear layers in ResNet-20 and ResNet-56 throughout the pre-folding phase with λc = 0.25. As expected, all α values are either kept around zero or pushed to one.

(Figure: α progression in ResNet20)
