Adacomp

A Zeroth-order Adaptive Learning Rate Method to Reduce Cost of Hyperparameters Tuning for Deep Learning

This provides the code, data, and experiments for article "A Zeroth-order Adaptive Learning Rate Method to Reduce Cost of Hyperparameters Tuning for Deep Learning", which has been submitted to journal Applied Sciences. The method, named Adacomp, adaptively adjusts the learning rate only based on values of loss function. From high abstract, Adacomp penalizes learning rate when loss value decreases and compensates learning rate in the contrast.

Aim

Anyone who is interested in Adacomp can reproduce the experimental results, or makes a further study based on the provided code.

Structure

Each folder contains files main_SGD, main_Momentum, main_Adagrad, main_RMSprop, main_Adadelta, main_Adam, main_Adamax, and main_Adacomp (ours), except that more files are in Code for CIFAR-10.

1. Code for MNIST (10 classification, epochs = 10)

(1) This provides code for Figures 1, 2 (robustness to initial learning rate), Figure 3 (robustness to batch size and network initialization), and Table 1 (convergence speed and computational efficiency). The used network architecture is borrowed from https://github.com/pytorch/examples.

(2) To obtain experimental data in Figures 1, 2, using parallelization method

parallel --eta --jobs 4 python filename.py --lr ::: 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.0, 3.0, 4.0, 5.0 > log.txt

to reduce time. Or else, run "python filename.py --lr=1e-5" to "python filename.py --lr=5.0" one by one. Here, --lr denotes learning rate, filename is selected from main_SGD, main_Momentum, main_Adagrad, main_RMSprop, main_Adadelta, main_Adam, main_Adamax, and main_Adacomp.

(3) To obtain experimental data in Figure 3, using

parallel --eta --jobs 4 python filename.py --batch-size ::: 16, 32, 64, 128 > log.txt

parallel --eta --jobs 4 python filename.py --seed ::: 16, 32, 64, 128 > log.txt

to reproduce it. Here, batch-size denotes the size of training mini-batch, and seed denotes the network initialization.

(4) Explanations about all parameters are referred to details of code.

(5) Experimental data have been rearranged in file "heat_map_mnist.m". One can directly run it to observe experimental results.

2. Code for KMNIST (10 classification, epochs = 20)

(1) This is used for part of Table 2 and Figure 6, which show the robustness to learning rate when dataset changes.

(2) The operation is similar to "1. Code for MNIST", except that dataset is KMNIST here.

(3) Experimental data have been rearranged in file "heat_map_mnist.m". One can directly run it to observe experimental results.

3. Code for Fashion-MNIST (10 classification, epochs=20)

(1) This is used for part of Table 2 and Figure 6, which show the robustness to learning rate when dataset changes.

(2) The operation similar to Code for MNIST, except using the dataset KMNIST here.

4. Code for CIFAR-10 (10 classification, epochs=100)

(1) This is used for Figures 4, 5, which show the robustness comparison with respect to network architecture. In particular, 6 out of 18 network architectures, LeNet, VGG19, ResNet18, MobileNet, SENet18, and SimpleDLA, are borrowed from https://github.com/kuangliu/pytorch-cifar.

(2) To obtain experimental data in Figures 4 and 5, typing

parallel --eta --jobs 4 python filename.py --lr ::: 0.005, 0.05, 0.6 > log.txt

where the filename is composed of algorithm and architecture. For example, main_Adacomp_LeNet means that using method Adacomp to network architecture LeNet, which is defined in https://github.com/kuangliu/pytorch-cifar.

(3) Experimental data have been rearranged in file "heat_map_cifar10". One can directly run it to observe experimental results.

5. Code for CIFAR-100 (100 classification, epochs=150)

(1) This is used for part of Table 2 and Figure 6, which show the robustness to learning rate when dataset changes. Meanwhile, this is used to compare the computational efficiency. The used network architecture is borrowed from https://github.com/junyuseu/367pytorch-cifar-models.git.

(2) To obtain experimental data in Table 2 and Figure 6, typing

parallel --eta --jobs 4 python filename.py --lr ::: 0.001 0.01 0.1 1 > log.txt

(3) Experimental data have been rearranged in file "comparison.m". One can directly run it to observe experimental results.

6. Code for Supplementary Experiments

This is used for experiments in appendix, including

(1) using accuracy, precision, recall, f1-score, on MNIST, Fashion-MNIST, and CIFAR-10；

(2) comparing Adacomp with genetic algorithm and particle swarm optimization algorithm.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.idea		.idea
1 Code for MNIST		1 Code for MNIST
2 Code for KMNIST		2 Code for KMNIST
3 Code for Fashion-MNIST		3 Code for Fashion-MNIST
4 Code for CIFAR-10		4 Code for CIFAR-10
5 Code for CIFAR-100		5 Code for CIFAR-100
Code for Appendix		Code for Appendix
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.idea

.idea

1 Code for MNIST

1 Code for MNIST

2 Code for KMNIST

2 Code for KMNIST

3 Code for Fashion-MNIST

3 Code for Fashion-MNIST

4 Code for CIFAR-10

4 Code for CIFAR-10

5 Code for CIFAR-100

5 Code for CIFAR-100

Code for Appendix

Code for Appendix

README.md

README.md

Repository files navigation

Adacomp

Aim

Structure

1. Code for MNIST (10 classification, epochs = 10)

2. Code for KMNIST (10 classification, epochs = 20)

3. Code for Fashion-MNIST (10 classification, epochs=20)

4. Code for CIFAR-10 (10 classification, epochs=100)

5. Code for CIFAR-100 (100 classification, epochs=150)

6. Code for Supplementary Experiments

About

Releases

Packages

Languages

IoTDATALab/Adacomp

Folders and files

Latest commit

History

Repository files navigation

Adacomp

Aim

Structure

1. Code for MNIST (10 classification, epochs = 10)

2. Code for KMNIST (10 classification, epochs = 20)

3. Code for Fashion-MNIST (10 classification, epochs=20)

4. Code for CIFAR-10 (10 classification, epochs=100)

5. Code for CIFAR-100 (100 classification, epochs=150)

6. Code for Supplementary Experiments

About

Resources

Stars

Watchers

Forks

Languages