
REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints

In this work we introduce Resource-Efficient Deep Subnetworks (REDS) to tackle model adaptation to variable resources. In contrast to the state of the art, REDS use structured sparsity constructively by exploiting permutation invariance of neurons, which allows for hardware-specific optimizations. Specifically, REDS achieve computational efficiency by (1) skipping sequential computational blocks identified by a novel iterative knapsack optimizer, and (2) leveraging simple math to re-arrange the order of operations in the REDS computational graph to take advantage of the data cache. REDS support conventional deep networks frequently deployed on the edge and provide computational benefits even for small and simple networks. We evaluate REDS on seven benchmark architectures trained on the Visual Wake Words, Google Speech Commands, Fashion-MNIST and CIFAR10 datasets, and test them on four off-the-shelf mobile and embedded hardware platforms. We provide a theoretical result and empirical evidence for REDS' outstanding performance in terms of the submodels' test set accuracy, and demonstrate an adaptation time in response to dynamic resource constraints of under 40 microseconds for models deployed on the Arduino Nano 33 BLE Sense through TensorFlow Lite for Microcontrollers.

Software prerequisites

Install the software packages required for reproducing the experiment by running the command: pip3 install -r requirements.txt inside the project folder.

Run the setup.sh script file to create the hierarchy of folders used to store the results of the experiments.

Install the GUROBI solver and obtain a license (see free academic license). To link the license and the solver, pass the arguments --gurobi_home and --gurobi_license_file to each program: the former points to the absolute path of the Gurobi installation and the latter to its license file.

python kws_ds_convolution.py --gurobi_license_file path/to/license/gurobi.lic --gurobi_home path/to/installation/gurobi/gurobi1002/linux64

Replace linux64 with your operating system version/type and remember to set the environment variables correctly; see here for more information.
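How the two Gurobi arguments are consumed is handled inside each script; as a rough illustration only (not the repository's actual code), they typically end up as the standard Gurobi environment variables:

import argparse
import os

# Hedged sketch: the argument names match the flags above, everything else is illustrative.
parser = argparse.ArgumentParser()
parser.add_argument("--gurobi_home", type=str, required=True)
parser.add_argument("--gurobi_license_file", type=str, required=True)
args = parser.parse_args()

# Gurobi looks these environment variables up when the solver is created.
os.environ["GUROBI_HOME"] = args.gurobi_home
os.environ["GRB_LICENSE_FILE"] = args.gurobi_license_file
# LD_LIBRARY_PATH usually has to include $GUROBI_HOME/lib and is best exported
# in the shell before launching Python.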

Fine-tuning REDS models

For each program, you can select a GPU by passing its id number via the --cuda_device argument. In the default configuration, all experiment results are stored inside the /logs directory and printed to the screen. For each program, you can specify the solver's maximum running time per iteration by passing a value in seconds to the --solver_time_limit argument. For the DS-CNN size L, the suggested time is at least 3 hours (10800 seconds).
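For example, to fine-tune on GPU 0 with a three-hour solver limit (the flag values are illustrative):

python kws_ds_convolution.py --cuda_device 0 --solver_time_limit 10800 --gurobi_license_file path/to/license/gurobi.lic --gurobi_home path/to/installation/gurobi/gurobi1002/linux64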

All the individual subnetwork architectures can be trained in isolation by running the _full_training.py files.

Fine-tune DS-CNN models

python kws_ds_convolution.py --gurobi_license_file path/to/license/gurobi.lic --gurobi_home path/to/installation/gurobi/gurobi1002/linux64

To train the REDS DS-CNN S models on CIFAR10 or Fashion-MNIST, run vision_ds_convolution_cifar10.py for the former and vision_ds_convolution_fashion_mnist.py for the latter. The pre-trained models are stored in the models/ folder.
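For example, for CIFAR10 (the Gurobi arguments are the same as above):

python vision_ds_convolution_cifar10.py --gurobi_license_file path/to/license/gurobi.lic --gurobi_home path/to/installation/gurobi/gurobi1002/linux64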

Fine-tune DNN models

python kws_dnn.py --gurobi_license_file path/to/license/gurobi.lic --gurobi_home path/to/installation/gurobi/gurobi1002/linux64

Fine-tune CNN models

python kws_convolution_cnn.py --gurobi_license_file path/to/license/gurobi.lic --gurobi_home path/to/installation/gurobi/gurobi1002/linux64

Analysis results on Pixel 6 and Xiaomi Redmi Note 9 Pro

The results for each subnetwork configuration are obtained with the official Google TensorFlow Lite benchmarking tool (an example invocation is shown below). From left to right: number of model parameters, model accuracy, and model inference latency as a function of the MAC percentage in each REDS subnetwork.

(1) Models size S

(2) Models size L
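A typical invocation of the TensorFlow Lite benchmarking tool on an Android device looks as follows; the binary location, model filename and run count are illustrative and not taken from this repository:

adb push benchmark_model /data/local/tmp
adb shell chmod +x /data/local/tmp/benchmark_model
adb push reds_subnetwork.tflite /data/local/tmp
adb shell /data/local/tmp/benchmark_model --graph=/data/local/tmp/reds_subnetwork.tflite --num_threads=1 --num_runs=50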

Visual Wake Words

To download the Visual Wake Words dataset, please refer to this GitHub repository. The tf-record files need to be placed inside visualwakeup_aesd/data; you do not need to convert the TensorFlow Object Detection API proto files because they are already provided as Python files in the visualwakeup_aesd/lib folder.

Fine-tune MobileNet v1 model

After downloading and converting the Visual Wake Words dataset to tf-record files, you can run the MobileNet v1 fine-tuning. Be aware that the GUROBI solver and the Visual Wake Words dataset together can consume up to 30 GB of RAM. Run the following command from the shell:

python knapsack_mobilenetv1_leaky_relu.py --gurobi_license_file path/to/license/gurobi.lic --gurobi_home path/to/installation/gurobi/gurobi1002/linux64

The peak memory usage analysis for the MobileNet v1 backbone is obtained from the tflite tools. The knapsack OR-Tools formulation can be found in the knapsack.py file in the ortools_knapsack_solver_mobilenetv1 function.

To run the solver without the peak memory usage constraint, pass the --peak_memory_constraint flag to the Python script. When doing so, be sure to set a time limit for the solver by passing the --solver_time_limit flag followed by the number of seconds needed (e.g. 40000 should be fine; if it is not, increase it). Be aware that the GUROBI solver can take up to 50 GB of RAM during the subnetwork architecture search process.

In all the analyses we conduct, the peak memory usage is defined as the maximum size in bytes among all the activation maps produced by the model.
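The repository's iterative formulation lives in knapsack.py; as a self-contained sketch only (not the actual REDS formulation), a single knapsack step that selects convolution filters under a MACs budget can be expressed with OR-Tools as follows, with placeholder importance scores and costs:

from ortools.algorithms import pywrapknapsack_solver  # import path may differ across OR-Tools versions

# Illustrative per-filter importance scores (the OR-Tools knapsack solver expects
# integer values) and per-filter MAC costs for one convolutional layer.
importance = [90, 75, 60, 40, 30, 20, 10, 5]
macs_per_filter = [12, 12, 12, 12, 12, 12, 12, 12]
macs_budget = [48]  # keep roughly half of the layer's MACs

solver = pywrapknapsack_solver.KnapsackSolver(
    pywrapknapsack_solver.KnapsackSolver.KNAPSACK_MULTIDIMENSION_BRANCH_AND_BOUND_SOLVER,
    "reds_knapsack_sketch")
solver.Init(importance, [macs_per_filter], macs_budget)
solver.Solve()

kept_filters = [i for i in range(len(importance)) if solver.BestSolutionContains(i)]
print("Filters kept by this subnetwork:", kept_filters)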

TensorFlow Lite for Microcontrollers Analysis

REDS' zero overhead was assessed on TensorFlow Lite for Microcontrollers by implementing runtime dynamic adaptation of the deployed model and by modifying the fully connected floating-point kernel.
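The modified kernel itself is C++ inside TensorFlow Lite for Microcontrollers; conceptually, switching subnetworks only changes how many contiguous units of the shared weight tensors are used, which the following NumPy sketch illustrates (shapes and active-unit counts are illustrative):

import numpy as np

# Shared full-capacity weights of a dense layer: 64 inputs -> 128 units.
weights = np.random.randn(64, 128).astype(np.float32)
bias = np.random.randn(128).astype(np.float32)
x = np.random.randn(1, 64).astype(np.float32)

def dense_subnetwork(x, weights, bias, active_units):
    # A submodel uses only the first `active_units` output units; because neurons
    # are permuted during fine-tuning, the slice is contiguous, so adaptation is
    # just a change of loop bounds (no copying or re-allocation at runtime).
    return x @ weights[:, :active_units] + bias[:active_units]

full_output = dense_subnetwork(x, weights, bias, 128)  # full model
half_output = dense_subnetwork(x, weights, bias, 64)   # 50% subnetwork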

Knapsack for depth-wise convolutions

REDS's iterative knapsack for depth-wise convolutions is modelled with OR-Tools and its implementation can be found here.
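The knapsack items for a depth-wise convolution are individual channels; as a small illustrative helper (not taken from the repository), the MAC cost of keeping one depth-wise channel can be estimated from the kernel and output feature-map sizes:

def depthwise_channel_macs(kernel_h, kernel_w, out_h, out_w):
    # Multiply-accumulate operations contributed by a single depth-wise channel.
    return kernel_h * kernel_w * out_h * out_w

# Example with illustrative shapes: a 3x3 depth-wise kernel over a 25x5 feature map.
print(depthwise_channel_macs(3, 3, 25, 5))  # 2250 MACs per channel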

BibTeX

If you found this repository useful, please consider citing our work.

@article{corti2023reds,
  title={REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints},
  author={Corti, Francesco and Maag, Balz and Schauer, Joachim and Pferschy, Ulrich and Saukh, Olga},
  journal={arXiv preprint arXiv:2311.13349},
  year={2023}
}
