# Quantization Results
Results, and the hyperparameters used to achieve them.
Hyperparameters used to train LeNet-5 on the MNIST dataset, and the resulting accuracy.
The batch size was 128, the data loader used 4 workers, the learning rate was reduced 10x every 20 epochs, and the reported accuracy is the best validation-set result over 100 training epochs.
The XNOR-Net quantization method uses ReLU after the convolutional layers.
* - The quantization method uses only 3 of the 4 values a 2-bit code can represent (ternary quantization).
| Method | Weight bits | Activation bits | Optimizer | Initial LR | Accuracy |
|---|---|---|---|---|---|
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.01 | 99.59% |
| QNN | 32 | 32 | Adam(weight_decay=0) | 0.001 | 99.69% |
| QNN | 8 | 8 | Adam(weight_decay=0) | 0.01 | 99.52% |
| QNN | 4 | 4 | Adam(weight_decay=0) | 0.001 | 99.55% |
| QNN | 2 | 32 | Adam(weight_decay=0) | 0.001 | 99.63% |
| QNN | 1 | 32 | Adam(weight_decay=0) | 0.01 | 99.56% |
| QNN | 2 | 2 | Adam(weight_decay=0) | 0.001 | 99.45% |
| QNN | 2 | 1 | Adam(weight_decay=0) | 0.001 | 98.88% |
| QNN | 1 | 2 | Adam(weight_decay=0) | 0.01 | 99.22% |
| QNN | 1 | 1 | Adam(weight_decay=0) | 0.01 | 98.90% |
| DoReFa-Net | 32 | 32 | Adam(weight_decay=0) | 0.01 | 99.67% |
| DoReFa-Net | 8 | 8 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 99.69% |
| DoReFa-Net | 4 | 4 | Adam(weight_decay=0) | 0.001 | 99.68% |
| DoReFa-Net | 2 | 32 | Adam(weight_decay=0) | 0.001 | 99.60% |
| DoReFa-Net | 1 | 32 | Adam(weight_decay=0) | 0.01 | 99.69% |
| DoReFa-Net | 2 | 2 | Adam(weight_decay=0) | 0.001 | 99.62% |
| DoReFa-Net | 2 | 1 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 99.07% |
| DoReFa-Net | 1 | 2 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 99.61% |
| DoReFa-Net | 1 | 1 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 99.06% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 99.63% |
| XNOR-Net | 1 | 1 | Adam(weight_decay=0) | 0.001 | 99.37% |
| TWN | 32 | 32 | Adam(weight_decay=0) | 0.001 | 99.63% |
| TWN | 2 * | 32 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 99.63% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.01 | 99.47% |
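The step schedule used in all of these runs (learning rate reduced 10x every N epochs) can be sketched as a small helper. `stepped_lr` is a hypothetical name for illustration; in a PyTorch training loop, `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)` implements the same policy.

```python
def stepped_lr(initial_lr, epoch, step=20, factor=0.1):
    """Learning rate at a given epoch under a step schedule:
    the rate is multiplied by `factor` once every `step` epochs."""
    return initial_lr * factor ** (epoch // step)

# Baseline LeNet-5 run: initial LR 0.01, reduced 10x every 20 epochs,
# so epochs 0-19 train at 0.01, epochs 20-39 at 0.001, and so on.
```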
Hyperparameters used to train ResNet-20 on the CIFAR-10 dataset, and the resulting accuracy.
The batch size was 128, the data loader used 1 worker, the learning rate was reduced 10x every 60 epochs, and the reported accuracy is the best validation-set result over 200 training epochs.
The XNOR-Net quantization method uses ReLU after the convolutional layers.
* - The quantization method uses only 3 of the 4 values a 2-bit code can represent (ternary quantization).
| Method | Weight bits | Activation bits | Optimizer | Initial LR | Accuracy |
|---|---|---|---|---|---|
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.1 | 91.70% |
| QNN | 32 | 32 | Adam(weight_decay=0) | 0.01 | 91.06% |
| QNN | 8 | 8 | Adam(weight_decay=0) | 0.01 | 89.40% |
| QNN | 4 | 4 | Adam(weight_decay=0) | 0.01 | 89.29% |
| QNN | 2 | 32 | Adam(weight_decay=0) | 0.001 | 89.19% |
| QNN | 1 | 32 | Adam(weight_decay=0) | 0.01 | 89.80% |
| QNN | 2 | 2 | Adam(weight_decay=0) | 0.01 | 83.14% |
| QNN | 2 | 1 | Adam(weight_decay=0) | 0.001 | 69.94% |
| QNN | 1 | 2 | Adam(weight_decay=0) | 0.01 | 82.73% |
| QNN | 1 | 1 | Adam(weight_decay=0) | 0.01 | 58.10% |
| DoReFa-Net | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 90.40% |
| DoReFa-Net | 8 | 8 | Adam(weight_decay=0) | 0.01 | 89.42% |
| DoReFa-Net | 4 | 4 | Adam(weight_decay=0) | 0.001 | 88.27% |
| DoReFa-Net | 2 | 32 | Adam(weight_decay=0) | 0.001 | 89.56% |
| DoReFa-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 89.42% |
| DoReFa-Net | 2 | 2 | Adam(weight_decay=0) | 0.01 | 88.24% |
| DoReFa-Net | 2 | 1 | Adam(weight_decay=0) | 0.001 | 62.17% |
| DoReFa-Net | 1 | 2 | Adam(weight_decay=0) | 0.01 | 87.11% |
| DoReFa-Net | 1 | 1 | Adam(weight_decay=0) | 0.001 | 62.70% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 88.95% |
| XNOR-Net | 1 | 1 | Adam(weight_decay=0) | 0.001 | 77.40% |
| TWN | 32 | 32 | Adam(weight_decay=0) | 0.01 | 90.59% |
| TWN | 2 * | 32 | Adam(weight_decay=0) | 0.01 | 90.60% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.001 | 89.11% |
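The "2 *" rows hold ternary weights: every weight becomes one of {-alpha, 0, +alpha}. A minimal sketch of TWN-style ternarization follows, using the threshold and scale approximations from the TWN paper; `ternarize` is a hypothetical helper written here for illustration, not code from this repository, and it works per-tensor rather than per-filter for brevity.

```python
def ternarize(weights, t=0.7):
    """TWN-style ternary quantization: map each weight to {-alpha, 0, +alpha}.

    The threshold delta = t * mean(|w|) follows the TWN approximation;
    weights inside (-delta, delta) become 0, so only 3 of the 4 values a
    2-bit code could represent are actually used."""
    delta = t * sum(abs(w) for w in weights) / len(weights)
    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0  # scale of the kept weights
    return [alpha if w > delta else -alpha if w < -delta else 0.0
            for w in weights]
```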
Hyperparameters used to train ResNet-50 on the CIFAR-10 dataset, and the resulting accuracy.
The batch size was 128, the data loader used 1 worker, the learning rate was reduced 10x every 60 epochs, and the reported accuracy is the best validation-set result over 200 training epochs.
The XNOR-Net quantization method uses ReLU after the convolutional layers.
* - The quantization method uses only 3 of the 4 values a 2-bit code can represent (ternary quantization).
| Method | Weight bits | Activation bits | Optimizer | Initial LR | Accuracy |
|---|---|---|---|---|---|
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.1 | 92.97% |
| QNN | 32 | 32 | Adam(weight_decay=0) | 0.01 | 92.67% |
| QNN | 8 | 8 | Adam(weight_decay=0) | 0.01 | 88.83% |
| QNN | 4 | 4 | Adam(weight_decay=0) | 0.01 | 88.66% |
| QNN | 2 | 32 | Adam(weight_decay=0) | 0.01 | 91.42% |
| QNN | 1 | 32 | Adam(weight_decay=0) | 0.01 | 91.66% |
| QNN | 2 | 2 | Adam(weight_decay=0) | 0.01 | 82.92% |
| QNN | 2 | 1 | Adam(weight_decay=0) | 0.001 | 72.99% |
| QNN | 1 | 2 | Adam(weight_decay=0) | 0.01 | 81.93% |
| QNN | 1 | 1 | - | - | - |
| DoReFa-Net | 32 | 32 | Adam(weight_decay=0) | 0.001 | 91.64% |
| DoReFa-Net | 8 | 8 | Adam(weight_decay=0) | 0.01 | 89.46% |
| DoReFa-Net | 4 | 4 | Adam(weight_decay=0) | 0.01 | 89.73% |
| DoReFa-Net | 2 | 32 | Adam(weight_decay=0) | 0.01 | 91.26% |
| DoReFa-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 91.37% |
| DoReFa-Net | 2 | 2 | Adam(weight_decay=0) | 0.01 | 89.11% |
| DoReFa-Net | 2 | 1 | Adam(weight_decay=0) | 0.001 | 68.61% |
| DoReFa-Net | 1 | 2 | Adam(weight_decay=0) | 0.01 | 88.78% |
| DoReFa-Net | 1 | 1 | Adam(weight_decay=0) | 0.001 | 63.46% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 90.44% |
| XNOR-Net | 1 | 1 | Adam(weight_decay=0) | 0.001 | 81.90% |
| TWN | 32 | 32 | Adam(weight_decay=0) | 0.01 | 92.45% |
| TWN | 2 * | 32 | Adam(weight_decay=0) | 0.01 | 91.89% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.001 | 90.91% |
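XNOR-Net keeps 1-bit weights usable by rescaling: each weight tensor W is approximated as alpha * sign(W), where alpha = mean(|W|) minimizes the L2 reconstruction error. A minimal per-tensor sketch (the paper computes alpha per filter; `binarize` is a hypothetical helper for illustration):

```python
def binarize(weights):
    """XNOR-Net-style binarization: approximate W by alpha * sign(W),
    where the scaling factor alpha is the mean absolute weight."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]
```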
Hyperparameters used to train VGG-16 on the CIFAR-10 dataset, and the resulting accuracy.
The batch size was 128, the data loader used 1 worker, the learning rate was reduced 10x every 60 epochs, and the reported accuracy is the best validation-set result over 200 training epochs.
The XNOR-Net quantization method uses ReLU after the convolutional layers.
* - The quantization method uses only 3 of the 4 values a 2-bit code can represent (ternary quantization).
| Method | Weight bits | Activation bits | Optimizer | Initial LR | Accuracy |
|---|---|---|---|---|---|
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.01 | 93.37% |
| QNN | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 92.97% |
| QNN | 8 | 8 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 88.40% |
| QNN | 4 | 4 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 88.13% |
| QNN | 2 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 91.99% |
| QNN | 1 | 32 | - | - | - |
| QNN | 2 | 2 | SGD(momentum=0.9, weight_decay=0) | 0.001 | 86.45% |
| QNN | 2 | 1 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 35.99% |
| QNN | 1 | 2 | - | - | - |
| QNN | 1 | 1 | - | - | - |
| DoReFa-Net | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 92.83% |
| DoReFa-Net | 8 | 8 | Adam(weight_decay=0) | 0.00001 | 70.39% |
| DoReFa-Net | 4 | 4 | Adam(weight_decay=0) | 0.00001 | 69.43% |
| DoReFa-Net | 2 | 32 | Adam(weight_decay=0) | 0.0001 | 90.84% |
| DoReFa-Net | 1 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 92.67% |
| DoReFa-Net | 2 | 2 | Adam(weight_decay=0) | 0.00001 | 68.24% |
| DoReFa-Net | 2 | 1 | - | - | - |
| DoReFa-Net | 1 | 2 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 90.12% |
| DoReFa-Net | 1 | 1 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 73.96% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 92.23% |
| XNOR-Net | 1 | 1 | Adam(weight_decay=0) | 0.001 | 77.98% |
| TWN | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 92.93% |
| TWN | 2 * | 32 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 92.30% |
| TTQ | 2 * | 32 | - | - | - |
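DoReFa-Net reaches arbitrary bit widths with a uniform quantizer over [0, 1]. A minimal sketch of the forward pass of its k-bit weight quantization; training additionally relies on a straight-through estimator for the backward pass (not shown), and `dorefa_weight` / `quantize_k` are hypothetical helper names for illustration.

```python
import math

def quantize_k(x, k):
    """DoReFa-Net's uniform quantizer: snap x in [0, 1] to one of 2**k levels."""
    n = (1 << k) - 1
    return round(x * n) / n

def dorefa_weight(w, max_abs_tanh_w, k):
    """Forward pass of DoReFa-Net k-bit weight quantization.
    `max_abs_tanh_w` is max(|tanh(w)|) taken over the whole weight tensor."""
    x = math.tanh(w) / (2.0 * max_abs_tanh_w) + 0.5  # rescale into [0, 1]
    return 2.0 * quantize_k(x, k) - 1.0              # map levels back to [-1, 1]
```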
Hyperparameters used to train AlexNet on the ImageNet dataset, and the resulting accuracy.
The batch size was 128, the data loader used 8 workers, the learning rate was reduced 10x every 30 epochs, and the reported accuracy is the best validation-set result over 100 training epochs.
The XNOR-Net quantization method uses ReLU after the convolutional layers.
* - The quantization method uses only 3 of the 4 values a 2-bit code can represent (ternary quantization).
| Method | Weight bits | Activation bits | Optimizer | Initial LR | Accuracy (top-1 / top-5) |
|---|---|---|---|---|---|
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.01 | 61.01% / 82.95% |
| QNN | 32 | 32 | Adam(weight_decay=0) | 0.001 | 59.26% / 81.56% |
| QNN | 8 | 8 | Adam(weight_decay=0) | 0.001 | 55.48% / 78.39% |
| QNN | 4 | 4 | Adam(weight_decay=0) | 0.001 | 53.18% / 76.62% |
| QNN | 2 | 32 | Adam(weight_decay=0) | 0.001 | 46.44% / 71.19% |
| QNN | 1 | 32 | Adam(weight_decay=0) | 0.001 | 52.57% / 76.26% |
| QNN | 2 | 2 | Adam(weight_decay=0) | 0.01 | 37.09% / 61.91% |
| QNN | 2 | 1 | Adam(weight_decay=0) | 0.001 | 29.51% / 53.53% |
| QNN | 1 | 2 | Adam(weight_decay=0) | 0.001 | 45.63% / 70.01% |
| QNN | 1 | 1 | Adam(weight_decay=0) | 0.001 | 38.42% / 62.96% |
| DoReFa-Net | 32 | 32 | Adam(weight_decay=0) | 0.0001 | 53.39% / 72.90% |
| DoReFa-Net | 8 | 8 | Adam(weight_decay=0) | 0.0001 | 50.52% / 70.56% |
| DoReFa-Net | 4 | 4 | Adam(weight_decay=0) | 0.0001 | 50.35% / 70.92% |
| DoReFa-Net | 2 | 32 | Adam(weight_decay=0) | 0.0001 | 51.10% / 71.67% |
| DoReFa-Net | 1 | 32 | Adam(weight_decay=0) | 0.0001 | 50.93% / 71.47% |
| DoReFa-Net | 2 | 2 | Adam(weight_decay=0) | 0.001 | 48.95% / 70.38% |
| DoReFa-Net | 2 | 1 | Adam(weight_decay=0) | 0.00001 | 30.82% / 53.80% |
| DoReFa-Net | 1 | 2 | Adam(weight_decay=0) | 0.0001 | 47.96% / 70.17% |
| DoReFa-Net | 1 | 1 | Adam(weight_decay=0) | 0.00001 | 30.40% / 53.29% |
| XNOR-Net | 1 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.001 | 52.06% / 74.89% |
| XNOR-Net | 1 | 1 | SGD(momentum=0.9, weight_decay=0) | 0.001 | 42.53% / 66.41% |
| TWN | 32 | 32 | Adam(weight_decay=0) | 0.001 | 59.22% / 81.60% |
| TWN | 2 * | 32 | Adam(weight_decay=0) | 0.001 | 55.36% / 78.61% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.00001 | 43.17% / 68.07% |
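Top-1 / top-5 in the ImageNet tables mean: a prediction counts as correct if the true label is the single highest-scoring class (top-1), or among the five highest-scoring classes (top-5). A minimal sketch, with `topk_correct` a hypothetical helper for illustration:

```python
def topk_correct(scores, label, k):
    """True if `label` is among the k highest-scoring class indices."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return label in ranked[:k]

# Top-k accuracy over a dataset is the fraction of samples where this holds.
```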
Hyperparameters used to train ResNet-18 on the ImageNet dataset, and the resulting accuracy.
The data loader used 8 workers, the learning rate was reduced 10x every 30 epochs, and the reported accuracy is the best validation-set result over 100 training epochs.
The XNOR-Net quantization method uses ReLU after the convolutional layers.
* - The quantization method uses only 3 of the 4 values a 2-bit code can represent (ternary quantization).
| Method | Weight bits | Activation bits | Optimizer | Initial LR | Batch Size | Accuracy (top-1 / top-5) |
|---|---|---|---|---|---|---|
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.01 | 128 | 67.05% / 87.66% |
| QNN | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 128 | 62.54% / 84.45% |
| QNN | 8 | 8 | Adam(weight_decay=0) | 0.001 | 128 | 61.43% / 83.31% |
| QNN | 4 | 4 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 128 | 60.31% / 82.67% |
| QNN | 2 | 32 | Adam(weight_decay=0) | 0.001 | 128 | 53.46% / 77.76% |
| QNN | 1 | 32 | Adam(weight_decay=0) | 0.001 | 128 | 59.09% / 82.01% |
| QNN | 2 | 2 | Adam(weight_decay=0) | 0.0001 | 128 | 38.00% / 63.82% |
| QNN | 2 | 1 | - | - | - | - |
| QNN | 1 | 2 | Adam(weight_decay=0) | 0.001 | 128 | 32.95% / 58.57% |
| QNN | 1 | 1 | Adam(weight_decay=0) | 0.001 | 128 | 15.18% / 34.75% |
| DoReFa-Net | 32 | 32 | Adam(weight_decay=0) | 0.001 | 64 | 62.42% / 84.17% |
| DoReFa-Net | 8 | 8 | Adam(weight_decay=0) | 0.001 | 64 | 61.38% / 83.29% |
| DoReFa-Net | 4 | 4 | Adam(weight_decay=0) | 0.001 | 48 | 61.11% / 83.21% |
| DoReFa-Net | 2 | 32 | Adam(weight_decay=0) | 0.001 | 64 | 62.09% / 83.90% |
| DoReFa-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 64 | 62.13% / 83.66% |
| DoReFa-Net | 2 | 2 | Adam(weight_decay=0) | 0.001 | 128 | 60.73% / 82.99% |
| DoReFa-Net | 2 | 1 | Adam(weight_decay=0) | 0.00001 | 128 | 26.58% / 49.24% |
| DoReFa-Net | 1 | 2 | Adam(weight_decay=0) | 0.001 | 64 | 59.53% / 82.03% |
| DoReFa-Net | 1 | 1 | Adam(weight_decay=0) | 0.0001 | 128 | 41.17% / 65.77% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 256 | 61.24% / 83.33% |
| XNOR-Net | 1 | 1 | SGD(momentum=0.9, weight_decay=0) | 0.001 | 256 | 49.26% / 73.53% |
| TWN | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 256 | 62.70% / 84.46% |
| TWN | 2 * | 32 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 256 | 60.59% / 82.80% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.0001 | 128 | 51.62% / 74.85% |
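The 1-bit QNN rows binarize values with sign(x) in the forward pass and train through it with a straight-through estimator, cancelling the gradient where |x| > 1 as in Hubara et al. A minimal sketch with hypothetical helper names (`sign_binarize`, `sign_binarize_grad`); an autograd framework would wrap both into one custom op.

```python
def sign_binarize(x):
    """Forward pass: deterministic binarization of a value to -1 or +1."""
    return 1.0 if x >= 0 else -1.0

def sign_binarize_grad(x, upstream_grad):
    """Backward pass via the straight-through estimator: sign(x) has zero
    derivative almost everywhere, so the upstream gradient is passed through
    unchanged where |x| <= 1 and cancelled elsewhere."""
    return upstream_grad if abs(x) <= 1.0 else 0.0
```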
Hyperparameters used to train VGG-16 on the ImageNet dataset, and the resulting accuracy.
The number of workers in the data loader was set to ..., the learning rate was reduced 10x every ... epochs, and the reported accuracy is the best validation-set result over ... training epochs.
The XNOR-Net quantization method uses ReLU after the convolutional layers.
* - The quantization method uses only 3 of the 4 values a 2-bit code can represent (ternary quantization).
| Method | Weight bits | Activation bits | Optimizer | Initial LR |
|---|---|---|---|---|
| QNN | 32 | 32 | SGD | 0.001 |
| QNN | 8 | 8 | | |
| QNN | 4 | 4 | | |
| QNN | 2 | 32 | | |
| QNN | 1 | 32 | | |
| QNN | 2 | 2 | | |
| QNN | 2 | 1 | | |
| QNN | 1 | 2 | | |
| QNN | 1 | 1 | | |
| DoReFa-Net | 32 | 32 | SGD | 0.001 |
| DoReFa-Net | 8 | 8 | | |
| DoReFa-Net | 4 | 4 | | |
| DoReFa-Net | 2 | 32 | | |
| DoReFa-Net | 1 | 32 | SGD | 0.001 |
| DoReFa-Net | 2 | 2 | | |
| DoReFa-Net | 2 | 1 | | |
| DoReFa-Net | 1 | 2 | | |
| DoReFa-Net | 1 | 1 | SGD | 0.01 |
| XNOR-Net | 1 | 32 | | |
| XNOR-Net | 1 | 1 | | |
| TWN | 32 | 32 | SGD | 0.001 |
| TWN | 2 * | 32 | | |
| TTQ | 2 * | 32 | | |