
Quantization Results


Results and the hyperparameters used to achieve them.

Outline

  1. MNIST
  2. CIFAR-10
  3. ImageNet

MNIST

LeNet-5

Hyperparameters used to train LeNet-5 on the MNIST dataset, and the corresponding accuracy results.

The batch size was set to 128, the data loader used 4 workers, the learning rate was reduced 10x every 20 epochs, and the reported accuracy is the best validation-set result over 100 training epochs.
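
For reference, the setup described above maps onto a standard PyTorch configuration roughly like the one below. This is a minimal sketch built from the hyperparameters in this paragraph and the baseline row of the table; the model definition and data pipeline are illustrative placeholders, not the repository's actual training code.

```python
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical LeNet-5-style model; the repository's own definition may differ.
model = nn.Sequential(
    nn.Conv2d(1, 6, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),
)

# Hyperparameters from the paragraph above and the baseline table row.
train_loader = DataLoader(
    datasets.MNIST("data", train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=128, shuffle=True, num_workers=4)

optimizer = optim.SGD(model.parameters(), lr=0.01,
                      momentum=0.9, weight_decay=0.0001)
# Reduce the learning rate 10x every 20 epochs.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(100):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # one scheduler step per epoch
```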

The XNOR-Net quantization method uses ReLU after convolutional layers.

* The quantization method uses only 3 of the 4 possible 2-bit values (ternary quantization).

| Method | Weight Bits | Activation Bits | Optimizer | Initial LR | Accuracy |
| --- | --- | --- | --- | --- | --- |
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.01 | 99.59% |
| QNN | 32 | 32 | Adam(weight_decay=0) | 0.001 | 99.69% |
| | 8 | 8 | | 0.01 | 99.52% |
| | 4 | 4 | | 0.001 | 99.55% |
| | 2 | 32 | | | 99.63% |
| | 1 | 32 | | 0.01 | 99.56% |
| | 2 | 2 | | 0.001 | 99.45% |
| | 2 | 1 | | | 98.88% |
| | 1 | 2 | | 0.01 | 99.22% |
| | 1 | 1 | | | 98.90% |
| DoReFa-Net | 32 | 32 | Adam(weight_decay=0) | 0.01 | 99.67% |
| | 8 | 8 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 99.69% |
| | 4 | 4 | Adam(weight_decay=0) | 0.001 | 99.68% |
| | 2 | 32 | | | 99.6% |
| | 1 | 32 | | 0.01 | 99.69% |
| | 2 | 2 | | 0.001 | 99.62% |
| | 2 | 1 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 99.07% |
| | 1 | 2 | | | 99.61% |
| | 1 | 1 | | | 99.06% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 99.63% |
| | 1 | 1 | | | 99.37% |
| TWN | 32 | 32 | Adam(weight_decay=0) | 0.001 | 99.63% |
| | 2 * | 32 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 99.63% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.01 | 99.47% |
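
The XNOR-Net and ternary (TWN-style) entries above quantize weights to two and three levels respectively. A simplified sketch of those quantizers is shown below; it follows the published formulations (per-tensor scaling, 0.7·mean(|w|) ternary threshold) and is not necessarily identical to this repository's implementation.

```python
import torch

def xnor_binarize(w: torch.Tensor) -> torch.Tensor:
    """XNOR-Net-style weight binarization: sign(w) scaled by the mean
    absolute value, giving values in {-alpha, +alpha}. The paper uses a
    per-filter alpha; a single per-tensor alpha is used here for brevity."""
    alpha = w.abs().mean()
    return alpha * torch.sign(w)

def ternarize(w: torch.Tensor) -> torch.Tensor:
    """TWN-style ternary quantization: weights become {-alpha, 0, +alpha},
    i.e. only 3 of the 4 values a 2-bit code can represent. The
    0.7 * mean(|w|) threshold is the heuristic from the TWN paper."""
    delta = 0.7 * w.abs().mean()
    mask = (w.abs() > delta).float()
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1)
    return alpha * torch.sign(w) * mask
```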

CIFAR-10

ResNet-20

Hyperparameters used to train ResNet-20 on the CIFAR-10 dataset, and the corresponding accuracy results.

The batch size was set to 128, the data loader used 1 worker, the learning rate was reduced 10x every 60 epochs, and the reported accuracy is the best validation-set result over 200 training epochs.

The XNOR-Net quantization method uses ReLU after convolutional layers.

* The quantization method uses only 3 of the 4 possible 2-bit values (ternary quantization).

| Method | Weight Bits | Activation Bits | Optimizer | Initial LR | Accuracy |
| --- | --- | --- | --- | --- | --- |
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.1 | 91.70% |
| QNN | 32 | 32 | Adam(weight_decay=0) | 0.01 | 91.06% |
| | 8 | 8 | | | 89.40% |
| | 4 | 4 | | | 89.29% |
| | 2 | 32 | | 0.001 | 89.19% |
| | 1 | 32 | | 0.01 | 89.80% |
| | 2 | 2 | | | 83.14% |
| | 2 | 1 | | 0.001 | 69.94% |
| | 1 | 2 | | 0.01 | 82.73% |
| | 1 | 1 | | | 58.10% |
| DoReFa-Net | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 90.40% |
| | 8 | 8 | Adam(weight_decay=0) | 0.01 | 89.42% |
| | 4 | 4 | | 0.001 | 88.27% |
| | 2 | 32 | | | 89.56% |
| | 1 | 32 | | | 89.42% |
| | 2 | 2 | | 0.01 | 88.24% |
| | 2 | 1 | | 0.001 | 62.17% |
| | 1 | 2 | | 0.01 | 87.11% |
| | 1 | 1 | | 0.001 | 62.70% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 88.95% |
| | 1 | 1 | | | 77.40% |
| TWN | 32 | 32 | Adam(weight_decay=0) | 0.01 | 90.59% |
| | 2 * | 32 | | | 90.60% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.001 | 89.11% |
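
The DoReFa-Net rows use k-bit uniform quantization of weights and activations. The sketch below follows the quantizers described in the DoReFa-Net paper (the straight-through gradient estimator used during training is omitted); it is an illustration, not this repository's code.

```python
import torch

def quantize_k(x: torch.Tensor, k: int) -> torch.Tensor:
    """Uniform quantization of x in [0, 1] to k bits."""
    n = float(2 ** k - 1)
    return torch.round(x * n) / n

def dorefa_weights(w: torch.Tensor, k: int) -> torch.Tensor:
    """DoReFa-Net-style k-bit weights: squash with tanh, rescale to
    [0, 1], quantize uniformly, then map back to [-1, 1]."""
    if k == 32:
        return w  # full precision
    if k == 1:
        # 1-bit case: sign with a mean-absolute-value scale.
        return w.abs().mean() * torch.sign(w)
    t = torch.tanh(w)
    t = t / (2 * t.abs().max()) + 0.5
    return 2 * quantize_k(t, k) - 1

def dorefa_activations(a: torch.Tensor, k: int) -> torch.Tensor:
    """k-bit activations: clip to [0, 1] and quantize uniformly."""
    if k == 32:
        return a
    return quantize_k(torch.clamp(a, 0, 1), k)
```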

ResNet-50

Hyperparameters used to train ResNet-50 on the CIFAR-10 dataset, and the corresponding accuracy results.

The batch size was set to 128, the data loader used 1 worker, the learning rate was reduced 10x every 60 epochs, and the reported accuracy is the best validation-set result over 200 training epochs.

The XNOR-Net quantization method uses ReLU after convolutional layers.

* The quantization method uses only 3 of the 4 possible 2-bit values (ternary quantization).

| Method | Weight Bits | Activation Bits | Optimizer | Initial LR | Accuracy |
| --- | --- | --- | --- | --- | --- |
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.1 | 92.97% |
| QNN | 32 | 32 | Adam(weight_decay=0) | 0.01 | 92.67% |
| | 8 | 8 | | | 88.83% |
| | 4 | 4 | | | 88.66% |
| | 2 | 32 | | | 91.42% |
| | 1 | 32 | | | 91.66% |
| | 2 | 2 | | | 82.92% |
| | 2 | 1 | | 0.001 | 72.99% |
| | 1 | 2 | | 0.01 | 81.93% |
| | 1 | 1 | - | - | - |
| DoReFa-Net | 32 | 32 | Adam(weight_decay=0) | 0.001 | 91.64% |
| | 8 | 8 | | 0.01 | 89.46% |
| | 4 | 4 | | | 89.73% |
| | 2 | 32 | | | 91.26% |
| | 1 | 32 | | 0.001 | 91.37% |
| | 2 | 2 | | 0.01 | 89.11% |
| | 2 | 1 | | 0.001 | 68.61% |
| | 1 | 2 | | 0.01 | 88.78% |
| | 1 | 1 | | 0.001 | 63.46% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 90.44% |
| | 1 | 1 | | | 81.90% |
| TWN | 32 | 32 | Adam(weight_decay=0) | 0.01 | 92.45% |
| | 2 * | 32 | | | 91.89% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.001 | 90.91% |

VGG-16

Hyperparameters used to train VGG-16 on the CIFAR-10 dataset, and the corresponding accuracy results.

The batch size was set to 128, the data loader used 1 worker, the learning rate was reduced 10x every 60 epochs, and the reported accuracy is the best validation-set result over 200 training epochs.

The XNOR-Net quantization method uses ReLU after convolutional layers.

* The quantization method uses only 3 of the 4 possible 2-bit values (ternary quantization).

| Method | Weight Bits | Activation Bits | Optimizer | Initial LR | Accuracy |
| --- | --- | --- | --- | --- | --- |
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.01 | 93.37% |
| QNN | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 92.97% |
| | 8 | 8 | | | 88.40% |
| | 4 | 4 | | | 88.13% |
| | 2 | 32 | | | 91.99% |
| | 1 | 32 | - | - | - |
| | 2 | 2 | SGD(momentum=0.9, weight_decay=0) | 0.001 | 86.45% |
| | 2 | 1 | | 0.01 | 35.99% |
| | 1 | 2 | - | - | - |
| | 1 | 1 | - | - | - |
| DoReFa-Net | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 92.83% |
| | 8 | 8 | Adam(weight_decay=0) | 0.00001 | 70.39% |
| | 4 | 4 | | | 69.43% |
| | 2 | 32 | | 0.0001 | 90.84% |
| | 1 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 92.67% |
| | 2 | 2 | Adam(weight_decay=0) | 0.00001 | 68.24% |
| | 2 | 1 | - | - | - |
| | 1 | 2 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 90.12% |
| | 1 | 1 | | 0.01 | 73.96% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 92.23% |
| | 1 | 1 | | | 77.98% |
| TWN | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 92.93% |
| | 2 * | 32 | | | 92.30% |
| TTQ | 2 * | 32 | - | - | - |

ImageNet

AlexNet

Hyperparameters used to train AlexNet on the ImageNet dataset, and the corresponding accuracy results.

The batch size was set to 128, the data loader used 8 workers, the learning rate was reduced 10x every 30 epochs, and the reported accuracy is the best validation-set result over 100 training epochs.

The XNOR-Net quantization method uses ReLU after convolutional layers.

* The quantization method uses only 3 of the 4 possible 2-bit values (ternary quantization).

| Method | Weight Bits | Activation Bits | Optimizer | Initial LR | Top-1 Accuracy | Top-5 Accuracy |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.01 | 61.01% | 82.95% |
| QNN | 32 | 32 | Adam(weight_decay=0) | 0.001 | 59.26% | 81.56% |
| | 8 | 8 | | | 55.48% | 78.39% |
| | 4 | 4 | | | 53.18% | 76.62% |
| | 2 | 32 | | | 46.44% | 71.19% |
| | 1 | 32 | | | 52.57% | 76.26% |
| | 2 | 2 | | 0.01 | 37.09% | 61.91% |
| | 2 | 1 | | 0.001 | 29.51% | 53.53% |
| | 1 | 2 | | | 45.63% | 70.01% |
| | 1 | 1 | | | 38.42% | 62.96% |
| DoReFa-Net | 32 | 32 | Adam(weight_decay=0) | 0.0001 | 53.39% | 72.90% |
| | 8 | 8 | | | 50.52% | 70.56% |
| | 4 | 4 | | | 50.35% | 70.92% |
| | 2 | 32 | | | 51.10% | 71.67% |
| | 1 | 32 | | | 50.93% | 71.47% |
| | 2 | 2 | | 0.001 | 48.95% | 70.38% |
| | 2 | 1 | | 0.00001 | 30.82% | 53.80% |
| | 1 | 2 | | 0.0001 | 47.96% | 70.17% |
| | 1 | 1 | | 0.00001 | 30.40% | 53.29% |
| XNOR-Net | 1 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.001 | 52.06% | 74.89% |
| | 1 | 1 | | | 42.53% | 66.41% |
| TWN | 32 | 32 | Adam(weight_decay=0) | 0.001 | 59.22% | 81.60% |
| | 2 * | 32 | | | 55.36% | 78.61% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.00001 | 43.17% | 68.07% |
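
The ImageNet tables report both top-1 and top-5 accuracy, i.e. whether the true label is the highest-scoring prediction or among the five highest-scoring ones. A small illustrative helper for computing these metrics is sketched below; it is a generic implementation, not taken from this repository.

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor,
                  ks=(1, 5)) -> list:
    """Fraction of samples whose true label is among the k highest-scoring
    classes, for each k in `ks`. `logits` has shape (batch, num_classes)."""
    maxk = max(ks)
    # Indices of the top-maxk predictions per sample: (batch, maxk).
    _, pred = logits.topk(maxk, dim=1)
    correct = pred.eq(targets.view(-1, 1))  # broadcast comparison
    return [correct[:, :k].any(dim=1).float().mean().item() for k in ks]

# Example: random scores for a batch of 4 samples over 1000 classes.
scores = torch.randn(4, 1000)
labels = torch.randint(0, 1000, (4,))
top1, top5 = topk_accuracy(scores, labels)
```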

ResNet-18

Hyperparameters used to train ResNet-18 on the ImageNet dataset, and the corresponding accuracy results.

The data loader used 8 workers, the learning rate was reduced 10x every 30 epochs, and the reported accuracy is the best validation-set result over 100 training epochs.

The XNOR-Net quantization method uses ReLU after convolutional layers.

* The quantization method uses only 3 of the 4 possible 2-bit values (ternary quantization).

| Method | Weight Bits | Activation Bits | Optimizer | Initial LR | Batch Size | Top-1 Accuracy | Top-5 Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.01 | 128 | 67.05% | 87.66% |
| QNN | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 128 | 62.54% | 84.45% |
| | 8 | 8 | Adam(weight_decay=0) | 0.001 | | 61.43% | 83.31% |
| | 4 | 4 | SGD(momentum=0.9, weight_decay=0) | 0.1 | | 60.31% | 82.67% |
| | 2 | 32 | Adam(weight_decay=0) | 0.001 | | 53.46% | 77.76% |
| | 1 | 32 | | | | 59.09% | 82.01% |
| | 2 | 2 | | 0.0001 | | 38.00% | 63.82% |
| | 2 | 1 | - | - | - | - | - |
| | 1 | 2 | Adam(weight_decay=0) | 0.001 | | 32.95% | 58.57% |
| | 1 | 1 | | | | 15.18% | 34.75% |
| DoReFa-Net | 32 | 32 | Adam(weight_decay=0) | 0.001 | 64 | 62.42% | 84.17% |
| | 8 | 8 | | | | 61.38% | 83.29% |
| | 4 | 4 | | | 48 | 61.11% | 83.21% |
| | 2 | 32 | | | 64 | 62.09% | 83.90% |
| | 1 | 32 | | | | 62.13% | 83.66% |
| | 2 | 2 | | | 128 | 60.73% | 82.99% |
| | 2 | 1 | | 0.00001 | | 26.58% | 49.24% |
| | 1 | 2 | | 0.001 | 64 | 59.53% | 82.03% |
| | 1 | 1 | | 0.0001 | 128 | 41.17% | 65.77% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 256 | 61.24% | 83.33% |
| | 1 | 1 | SGD(momentum=0.9, weight_decay=0) | 0.001 | | 49.26% | 73.53% |
| TWN | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 256 | 62.70% | 84.46% |
| | 2 * | 32 | | | | 60.59% | 82.80% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.0001 | 128 | 51.62% | 74.85% |

VGG-16 (To Do: Finish)

Hyperparameters used to train VGG-16 on the ImageNet dataset, and the corresponding accuracy results.

The data loader used ... workers, the learning rate was reduced 10x every ... epochs, and the reported accuracy is the best validation-set result over ... training epochs.

The XNOR-Net quantization method uses ReLU after convolutional layers.

| Method | Weight Bits | Activation Bits | Optimizer | Initial LR |
| --- | --- | --- | --- | --- |
| QNN | 32 | 32 | SGD | 0.001 |
| | 8 | 8 | | |
| | 4 | 4 | | |
| | 2 | 32 | | |
| | 1 | 32 | | |
| | 2 | 2 | | |
| | 2 | 1 | | |
| | 1 | 2 | | |
| | 1 | 1 | | |
| DoReFa-Net | 32 | 32 | SGD | 0.001 |
| | 8 | 8 | | |
| | 4 | 4 | | |
| | 2 | 32 | | |
| | 1 | 32 | SGD | 0.001 |
| | 2 | 2 | | |
| | 2 | 1 | | |
| | 1 | 2 | | |
| | 1 | 1 | SGD | 0.01 |
| XNOR-Net | 1 | 32 | | |
| | 1 | 1 | | |
| TWN | 32 | 32 | SGD | 0.001 |
| | 2 * | 32 | | |
| TTQ | 2 * | 32 | | |