
Quantization Results


Results and the hyperparameters used to achieve them.

Outline

  1. MNIST
  2. CIFAR-10
  3. ImageNet

MNIST

LeNet-5

Hyperparameters used to train LeNet-5 on the MNIST dataset, and the corresponding accuracy results.

The batch size was set to 128, the data loader used 4 workers, the learning rate was reduced 10x every 20 epochs, and the reported accuracy is the best validation-set result over 100 training epochs.
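
For reference, the setup described above maps onto a standard PyTorch configuration roughly like the one below. This is a minimal sketch built from the hyperparameters in this paragraph and the baseline row of the table; the model definition and data pipeline are illustrative placeholders, not the repository's actual training code.

```python
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical LeNet-5-style model; the repository's own definition may differ.
model = nn.Sequential(
    nn.Conv2d(1, 6, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),
)

# Hyperparameters from the paragraph above and the baseline table row.
train_loader = DataLoader(
    datasets.MNIST("data", train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=128, shuffle=True, num_workers=4)

optimizer = optim.SGD(model.parameters(), lr=0.01,
                      momentum=0.9, weight_decay=0.0001)
# Reduce the learning rate 10x every 20 epochs.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(100):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # one scheduler step per epoch
```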

The XNOR-Net quantization method uses ReLU after convolutional layers.

* The quantization method uses only 3 of the 4 possible 2-bit values (ternary quantization).

| Method | Weight Bits | Activation Bits | Optimizer | Initial LR | Accuracy |
| --- | --- | --- | --- | --- | --- |
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.01 | 99.59% |
| QNN | 32 | 32 | Adam(weight_decay=0) | 0.001 | 99.69% |
| | 8 | 8 | | 0.01 | 99.52% |
| | 4 | 4 | | 0.001 | 99.55% |
| | 2 | 32 | | | 99.63% |
| | 1 | 32 | | 0.01 | 99.56% |
| | 2 | 2 | | 0.001 | 99.45% |
| | 2 | 1 | | | 98.88% |
| | 1 | 2 | | 0.01 | 99.22% |
| | 1 | 1 | | | 98.90% |
| DoReFa-Net | 32 | 32 | Adam(weight_decay=0) | 0.01 | 99.67% |
| | 8 | 8 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 99.69% |
| | 4 | 4 | Adam(weight_decay=0) | 0.001 | 99.68% |
| | 2 | 32 | | | 99.6% |
| | 1 | 32 | | 0.01 | 99.69% |
| | 2 | 2 | | 0.001 | 99.62% |
| | 2 | 1 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 99.07% |
| | 1 | 2 | | | 99.61% |
| | 1 | 1 | | | 99.06% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 99.63% |
| | 1 | 1 | | | 99.37% |
| TWN | 32 | 32 | Adam(weight_decay=0) | 0.001 | 99.63% |
| | 2 * | 32 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 99.63% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.01 | 99.47% |
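
The XNOR-Net and ternary (TWN-style) entries above quantize weights to two and three levels respectively. A simplified sketch of those quantizers is shown below; it follows the published formulations (per-tensor scaling, 0.7·mean(|w|) ternary threshold) and is not necessarily identical to this repository's implementation.

```python
import torch

def xnor_binarize(w: torch.Tensor) -> torch.Tensor:
    """XNOR-Net-style weight binarization: sign(w) scaled by the mean
    absolute value, giving values in {-alpha, +alpha}. The paper uses a
    per-filter alpha; a single per-tensor alpha is used here for brevity."""
    alpha = w.abs().mean()
    return alpha * torch.sign(w)

def ternarize(w: torch.Tensor) -> torch.Tensor:
    """TWN-style ternary quantization: weights become {-alpha, 0, +alpha},
    i.e. only 3 of the 4 values a 2-bit code can represent. The
    0.7 * mean(|w|) threshold is the heuristic from the TWN paper."""
    delta = 0.7 * w.abs().mean()
    mask = (w.abs() > delta).float()
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1)
    return alpha * torch.sign(w) * mask
```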

CIFAR-10

ResNet-20

Hyperparameters used to train ResNet-20 on the CIFAR-10 dataset, and the corresponding accuracy results.

The batch size was set to 128, the data loader used 1 worker, the learning rate was reduced 10x every 60 epochs, and the reported accuracy is the best validation-set result over 200 training epochs.

The XNOR-Net quantization method uses ReLU after convolutional layers.

* The quantization method uses only 3 of the 4 possible 2-bit values (ternary quantization).

| Method | Weight Bits | Activation Bits | Optimizer | Initial LR | Accuracy |
| --- | --- | --- | --- | --- | --- |
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.1 | 91.70% |
| QNN | 32 | 32 | Adam(weight_decay=0) | 0.01 | 91.06% |
| | 8 | 8 | | | 89.40% |
| | 4 | 4 | | | 89.29% |
| | 2 | 32 | | 0.001 | 89.19% |
| | 1 | 32 | | 0.01 | 89.80% |
| | 2 | 2 | | | 83.14% |
| | 2 | 1 | | 0.001 | 69.94% |
| | 1 | 2 | | 0.01 | 82.73% |
| | 1 | 1 | | | 58.10% |
| DoReFa-Net | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 90.40% |
| | 8 | 8 | Adam(weight_decay=0) | 0.01 | 89.42% |
| | 4 | 4 | | 0.001 | 88.27% |
| | 2 | 32 | | | 89.56% |
| | 1 | 32 | | | 89.42% |
| | 2 | 2 | | 0.01 | 88.24% |
| | 2 | 1 | | 0.001 | 62.17% |
| | 1 | 2 | | 0.01 | 87.11% |
| | 1 | 1 | | 0.001 | 62.70% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 88.95% |
| | 1 | 1 | | | 77.40% |
| TWN | 32 | 32 | Adam(weight_decay=0) | 0.01 | 90.59% |
| | 2 * | 32 | | | 90.60% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.001 | 89.11% |
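
The DoReFa-Net rows use k-bit uniform quantization of weights and activations. The sketch below follows the quantizers described in the DoReFa-Net paper (the straight-through gradient estimator used during training is omitted); it is an illustration, not this repository's code.

```python
import torch

def quantize_k(x: torch.Tensor, k: int) -> torch.Tensor:
    """Uniform quantization of x in [0, 1] to k bits."""
    n = float(2 ** k - 1)
    return torch.round(x * n) / n

def dorefa_weights(w: torch.Tensor, k: int) -> torch.Tensor:
    """DoReFa-Net-style k-bit weights: squash with tanh, rescale to
    [0, 1], quantize uniformly, then map back to [-1, 1]."""
    if k == 32:
        return w  # full precision
    if k == 1:
        # 1-bit case: sign with a mean-absolute-value scale.
        return w.abs().mean() * torch.sign(w)
    t = torch.tanh(w)
    t = t / (2 * t.abs().max()) + 0.5
    return 2 * quantize_k(t, k) - 1

def dorefa_activations(a: torch.Tensor, k: int) -> torch.Tensor:
    """k-bit activations: clip to [0, 1] and quantize uniformly."""
    if k == 32:
        return a
    return quantize_k(torch.clamp(a, 0, 1), k)
```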

ResNet-50

Hyperparameters used to train ResNet-50 on the CIFAR-10 dataset, and the corresponding accuracy results.

The batch size was set to 128, the data loader used 1 worker, the learning rate was reduced 10x every 60 epochs, and the reported accuracy is the best validation-set result over 200 training epochs.

The XNOR-Net quantization method uses ReLU after convolutional layers.

* The quantization method uses only 3 of the 4 possible 2-bit values (ternary quantization).

| Method | Weight Bits | Activation Bits | Optimizer | Initial LR | Accuracy |
| --- | --- | --- | --- | --- | --- |
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.1 | 92.97% |
| QNN | 32 | 32 | Adam(weight_decay=0) | 0.01 | 92.67% |
| | 8 | 8 | | | 88.83% |
| | 4 | 4 | | | 88.66% |
| | 2 | 32 | | | 91.42% |
| | 1 | 32 | | | 91.66% |
| | 2 | 2 | | | 82.92% |
| | 2 | 1 | | 0.001 | 72.99% |
| | 1 | 2 | | 0.01 | 81.93% |
| | 1 | 1 | - | - | - |
| DoReFa-Net | 32 | 32 | Adam(weight_decay=0) | 0.001 | 91.64% |
| | 8 | 8 | | 0.01 | 89.46% |
| | 4 | 4 | | | 89.73% |
| | 2 | 32 | | | 91.26% |
| | 1 | 32 | | 0.001 | 91.37% |
| | 2 | 2 | | 0.01 | 89.11% |
| | 2 | 1 | | 0.001 | 68.61% |
| | 1 | 2 | | 0.01 | 88.78% |
| | 1 | 1 | | 0.001 | 63.46% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 90.44% |
| | 1 | 1 | | | 81.90% |
| TWN | 32 | 32 | Adam(weight_decay=0) | 0.01 | 92.45% |
| | 2 * | 32 | | | 91.89% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.001 | 90.91% |

VGG-16

Hyperparameters used to train VGG-16 on the CIFAR-10 dataset, and the corresponding accuracy results.

The batch size was set to 128, the data loader used 1 worker, the learning rate was reduced 10x every 60 epochs, and the reported accuracy is the best validation-set result over 200 training epochs.

The XNOR-Net quantization method uses ReLU after convolutional layers.

* The quantization method uses only 3 of the 4 possible 2-bit values (ternary quantization).

| Method | Weight Bits | Activation Bits | Optimizer | Initial LR | Accuracy |
| --- | --- | --- | --- | --- | --- |
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.01 | 93.37% |
| QNN | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 92.97% |
| | 8 | 8 | | | 88.40% |
| | 4 | 4 | | | 88.13% |
| | 2 | 32 | | | 91.99% |
| | 1 | 32 | - | - | - |
| | 2 | 2 | SGD(momentum=0.9, weight_decay=0) | 0.001 | 86.45% |
| | 2 | 1 | | 0.01 | 35.99% |
| | 1 | 2 | - | - | - |
| | 1 | 1 | - | - | - |
| DoReFa-Net | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 92.83% |
| | 8 | 8 | Adam(weight_decay=0) | 0.00001 | 70.39% |
| | 4 | 4 | | | 69.43% |
| | 2 | 32 | | 0.0001 | 90.84% |
| | 1 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 92.67% |
| | 2 | 2 | Adam(weight_decay=0) | 0.00001 | 68.24% |
| | 2 | 1 | - | - | - |
| | 1 | 2 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 90.12% |
| | 1 | 1 | | 0.01 | 73.96% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 92.23% |
| | 1 | 1 | | | 77.98% |
| TWN | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.01 | 92.93% |
| | 2 * | 32 | | | 92.30% |
| TTQ | 2 * | 32 | - | - | - |

ImageNet

AlexNet

Hyperparameters used to train AlexNet on the ImageNet dataset, and the corresponding accuracy results.

The batch size was set to 128, the data loader used 8 workers, the learning rate was reduced 10x every 30 epochs, and the reported accuracy is the best validation-set result over 100 training epochs.

The XNOR-Net quantization method uses ReLU after convolutional layers.

* The quantization method uses only 3 of the 4 possible 2-bit values (ternary quantization).

| Method | Weight Bits | Activation Bits | Optimizer | Initial LR | Top-1 Accuracy | Top-5 Accuracy |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.01 | 61.01% | 82.95% |
| QNN | 32 | 32 | Adam(weight_decay=0) | 0.001 | 59.26% | 81.56% |
| | 8 | 8 | | | 55.48% | 78.39% |
| | 4 | 4 | | | 53.18% | 76.62% |
| | 2 | 32 | | | 46.44% | 71.19% |
| | 1 | 32 | | | 52.57% | 76.26% |
| | 2 | 2 | | 0.01 | 37.09% | 61.91% |
| | 2 | 1 | | 0.001 | 29.51% | 53.53% |
| | 1 | 2 | | | 45.63% | 70.01% |
| | 1 | 1 | | | 38.42% | 62.96% |
| DoReFa-Net | 32 | 32 | Adam(weight_decay=0) | 0.0001 | 53.39% | 72.90% |
| | 8 | 8 | | | 50.52% | 70.56% |
| | 4 | 4 | | | 50.35% | 70.92% |
| | 2 | 32 | | | 51.10% | 71.67% |
| | 1 | 32 | | | 50.93% | 71.47% |
| | 2 | 2 | | 0.001 | 48.95% | 70.38% |
| | 2 | 1 | | 0.00001 | 30.82% | 53.80% |
| | 1 | 2 | | 0.0001 | 47.96% | 70.17% |
| | 1 | 1 | | 0.00001 | 30.40% | 53.29% |
| XNOR-Net | 1 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.001 | 52.06% | 74.89% |
| | 1 | 1 | | | 42.53% | 66.41% |
| TWN | 32 | 32 | Adam(weight_decay=0) | 0.001 | 59.22% | 81.60% |
| | 2 * | 32 | | | 55.36% | 78.61% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.00001 | 43.17% | 68.07% |
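
The ImageNet tables report both top-1 and top-5 accuracy, i.e. whether the true label is the highest-scoring prediction or among the five highest-scoring ones. A small illustrative helper for computing these metrics is sketched below; it is a generic implementation, not taken from this repository.

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor,
                  ks=(1, 5)) -> list:
    """Fraction of samples whose true label is among the k highest-scoring
    classes, for each k in `ks`. `logits` has shape (batch, num_classes)."""
    maxk = max(ks)
    # Indices of the top-maxk predictions per sample: (batch, maxk).
    _, pred = logits.topk(maxk, dim=1)
    correct = pred.eq(targets.view(-1, 1))  # broadcast comparison
    return [correct[:, :k].any(dim=1).float().mean().item() for k in ks]

# Example: random scores for a batch of 4 samples over 1000 classes.
scores = torch.randn(4, 1000)
labels = torch.randint(0, 1000, (4,))
top1, top5 = topk_accuracy(scores, labels)
```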

ResNet-18

Hyperparameters used to train ResNet-18 on the ImageNet dataset, and the corresponding accuracy results.

The data loader used 8 workers, the learning rate was reduced 10x every 30 epochs, and the reported accuracy is the best validation-set result over 100 training epochs.

The XNOR-Net quantization method uses ReLU after convolutional layers.

* The quantization method uses only 3 of the 4 possible 2-bit values (ternary quantization).

| Method | Weight Bits | Activation Bits | Optimizer | Initial LR | Batch Size | Top-1 Accuracy | Top-5 Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 32 | 32 | SGD(momentum=0.9, weight_decay=0.0001) | 0.01 | 128 | 67.05% | 87.66% |
| QNN | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 128 | 62.54% | 84.45% |
| | 8 | 8 | Adam(weight_decay=0) | 0.001 | | 61.43% | 83.31% |
| | 4 | 4 | SGD(momentum=0.9, weight_decay=0) | 0.1 | | 60.31% | 82.67% |
| | 2 | 32 | Adam(weight_decay=0) | 0.001 | | 53.46% | 77.76% |
| | 1 | 32 | | | | 59.09% | 82.01% |
| | 2 | 2 | | 0.0001 | | 38.00% | 63.82% |
| | 2 | 1 | - | - | - | - | - |
| | 1 | 2 | Adam(weight_decay=0) | 0.001 | | 32.95% | 58.57% |
| | 1 | 1 | | | | 15.18% | 34.75% |
| DoReFa-Net | 32 | 32 | Adam(weight_decay=0) | 0.001 | 64 | 62.42% | 84.17% |
| | 8 | 8 | | | | 61.38% | 83.29% |
| | 4 | 4 | | | 48 | 61.11% | 83.21% |
| | 2 | 32 | | | 64 | 62.09% | 83.90% |
| | 1 | 32 | | | | 62.13% | 83.66% |
| | 2 | 2 | | | 128 | 60.73% | 82.99% |
| | 2 | 1 | | 0.00001 | | 26.58% | 49.24% |
| | 1 | 2 | | 0.001 | 64 | 59.53% | 82.03% |
| | 1 | 1 | | 0.0001 | 128 | 41.17% | 65.77% |
| XNOR-Net | 1 | 32 | Adam(weight_decay=0) | 0.001 | 256 | 61.24% | 83.33% |
| | 1 | 1 | SGD(momentum=0.9, weight_decay=0) | 0.001 | | 49.26% | 73.53% |
| TWN | 32 | 32 | SGD(momentum=0.9, weight_decay=0) | 0.1 | 256 | 62.70% | 84.46% |
| | 2 * | 32 | | | | 60.59% | 82.80% |
| TTQ | 2 * | 32 | Adam(weight_decay=0) | 0.0001 | 128 | 51.62% | 74.85% |

VGG-16 (To Do: Finish)

Hyperparameters used to train VGG-16 on the ImageNet dataset, and the corresponding accuracy results.

The data loader used ... workers, the learning rate was reduced 10x every ... epochs, and the reported accuracy is the best validation-set result over ... training epochs.

The XNOR-Net quantization method uses ReLU after convolutional layers.

| Method | Weight Bits | Activation Bits | Optimizer | Initial LR |
| --- | --- | --- | --- | --- |
| QNN | 32 | 32 | SGD | 0.001 |
| | 8 | 8 | | |
| | 4 | 4 | | |
| | 2 | 32 | | |
| | 1 | 32 | | |
| | 2 | 2 | | |
| | 2 | 1 | | |
| | 1 | 2 | | |
| | 1 | 1 | | |
| DoReFa-Net | 32 | 32 | SGD | 0.001 |
| | 8 | 8 | | |
| | 4 | 4 | | |
| | 2 | 32 | | |
| | 1 | 32 | SGD | 0.001 |
| | 2 | 2 | | |
| | 2 | 1 | | |
| | 1 | 2 | | |
| | 1 | 1 | SGD | 0.01 |
| XNOR-Net | 1 | 32 | | |
| | 1 | 1 | | |
| TWN | 32 | 32 | SGD | 0.001 |
| | 2 * | 32 | | |
| TTQ | 2 * | 32 | | |