Measure accuracy on ResNet50 #7

psyhtest · 2018-02-19T12:37:07Z

Install Intel Caffe from the ReQuEST branch.
Install or detect the full ImageNet validation dataset.
Install the ResNet50 model.
Run the following while selecting all the above if prompted:

$ ck benchmark program:caffe \
  --cmd_key=test_cpu --repetitions=1 --env.CK_CAFFE_BATCH_SIZE=64 \
  --record --record_repo=local \
  --record_uoa=ck-request-asplos18-caffe-intel-accuracy-resnet50 \
  --tags=request,request-asplos18,caffe,intel,accuracy,resnet50
...
{
  "accuracy/top-1": 0.729173, 
  "accuracy/top-5": 0.911852, 
  "post_processed": "yes"
}

psyhtest · 2018-02-19T16:28:53Z

To compare with BVLC Caffe with cuDNN:

$ ck install package:lib-caffe-bvlc-master-cudnn-universal
$ ck benchmark program:caffe \
  --cmd_key=test_gpu --repetitions=1 --env.CK_CAFFE_BATCH_SIZE=32 \
  --record --record_repo=local \
  --record_uoa=ck-request-asplos18-caffe-cudnn-accuracy-resnet50 \
  --tags=request,request-asplos18,caffe,cudnn,accuracy,resnet50
...
{
  "accuracy/top-1": 0.729173, 
  "accuracy/top-5": 0.911852, 
  "post_processed": "yes"
}

NB: A batch size of 64 could not be used on the NVIDIA GTX 1080 (8 GB of memory) because the GPU ran out of memory. By contrast, during the Intel Caffe evaluation on ResNet with a batch size of 32, memory usage was only 2.3 GB.

NB-2: After investigating a bit further, the maximum viable batch size was 24 for the time_gpu command and 39 for the test_gpu command, with the following statistics from nvidia-smi for the latter in the steady state:

$ watch -n 0.1 nvidia-smi
Mon Feb 19 18:43:44 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.12                 Driver Version: 390.12                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:02:00.0 Off |                  N/A |
| 37%   59C    P2   126W / 180W |   7943MiB /  8116MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    133858      C   caffe                                       7933MiB |
+-----------------------------------------------------------------------------+

psyhtest · 2018-02-19T17:06:50Z

When the calibration tool is supported (#8), we will measure accuracy with quantised weights.

psyhtest · 2018-02-19T18:18:41Z

The measured accuracy is actually higher than the authors reported in the submission. This may be because they used the default imagenet_mean.binaryproto from BVLC, whereas we used the ResNet_mean.binaryproto provided specifically for ResNet:

$ ls -la /home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-aux/imagenet_mean.binaryproto
-rwxr-xr-x 1 anton anton 786446 Feb 25  2014 /home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-aux/imagenet_mean.binaryproto
$ ls -la /home/anton/CK_TOOLS/caffemodel-resnet50/ResNet_mean.binaryproto 
-rw-rw-r-- 1 anton anton 602126 Feb 19 14:52 /home/anton/CK_TOOLS/caffemodel-resnet50/ResNet_mean.binaryproto
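
As a sanity check on which file is which, the two sizes are consistent with float32 mean blobs of 256x256 and 224x224 pixels. A minimal sketch of that arithmetic, assuming 3 channels and 4 bytes per value, and treating whatever is left over as serialization overhead (the helper name is made up for illustration):

```python
def mean_blob_overhead(file_size, side, channels=3, bytes_per_value=4):
    """Bytes left over after subtracting a side x side x channels float32 payload."""
    payload = side * side * channels * bytes_per_value
    return file_size - payload

# File sizes taken from the ls output above.
print(mean_blob_overhead(786446, 256))  # imagenet_mean.binaryproto
print(mean_blob_overhead(602126, 224))  # ResNet_mean.binaryproto
```

Both files leave the same 14-byte remainder, which fits a 256px mean for the BVLC file and a 224px mean for the ResNet file.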

psyhtest · 2018-02-20T09:59:08Z

In fact, the special ResNet_mean.binaryproto (224*224*3*4+14 bytes) was not used:

name: "ResNet-50"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 224
    mean_file: "/home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-aux/imagenet_mean.binaryproto"
  }
  data_param {
    source: "/data/dataset-imagenet-ilsvrc2012-val-lmdb-dataset.imagenet.val-ilsvrc2012_val_full-resize-224/data"
    batch_size: 64
    backend: LMDB
    prefetch: 2
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 224
    mean_file: "/home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-aux/imagenet_mean.binaryproto"
  }
  data_param {
    source: "/data/dataset-imagenet-ilsvrc2012-val-lmdb-dataset.imagenet.val-ilsvrc2012_val_full-resize-224/data"
    batch_size: 64
    backend: LMDB
  }
}

psyhtest · 2018-02-20T10:01:18Z

So, we need to fix two things for ResNet50:

Ensure it uses ResNet_mean.binaryproto.
Feed it a 224x224 LMDB.
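
For the first point, the change amounts to pointing the mean_file entries of the prototxt at the ResNet mean. A rough sketch of that substitution on the prototxt text, assuming the file is edited as a plain string; this is not how CK generates the prototxt, and set_mean_file is a hypothetical helper:

```python
import re

def set_mean_file(prototxt_text, new_path):
    """Point every mean_file entry in a prototxt string at new_path."""
    pattern = r'(mean_file:\s*)"[^"]*"'
    # A function replacement avoids any escape handling of new_path.
    return re.sub(pattern, lambda m: m.group(1) + '"' + new_path + '"', prototxt_text)

# Fragment of the TEST-phase data layer quoted earlier in this thread.
example = '''transform_param {
  mirror: false
  crop_size: 224
  mean_file: "/home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-aux/imagenet_mean.binaryproto"
}'''

patched = set_mean_file(
    example,
    "/home/anton/CK_TOOLS/caffemodel-resnet50/ResNet_mean.binaryproto",
)
print(patched)
```

The same call would need to cover both the TRAIN and TEST data layers, since both reference imagenet_mean.binaryproto.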

Chunosov · 2018-02-21T09:56:08Z

Here are the results for ResNet50 with 224px input and the ResNet_mean.binaryproto mean:

float32

{
  "accuracy/top-1": 0.707186, 
  "accuracy/top-5": 0.898327, 
  "post_processed": "yes"
}

int8

{
  "accuracy/top-1": 0.704806, 
  "accuracy/top-5": 0.896127, 
  "post_processed": "yes"
}

psyhtest · 2018-02-21T10:33:10Z

So we have validated a minor (0.22-0.24%) loss of accuracy (110-120 images) with int8.
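
The figures above follow from the float32 and int8 numbers in the previous comment. A small sketch of the arithmetic, assuming the full 50,000-image validation set was used for both runs (the function name is illustrative only):

```python
VAL_IMAGES = 50000  # full ImageNet validation set, assumed for both runs

fp32 = {"top-1": 0.707186, "top-5": 0.898327}
int8 = {"top-1": 0.704806, "top-5": 0.896127}

def accuracy_drop(before, after, n_images=VAL_IMAGES):
    """Return the drop in percentage points and in number of images."""
    drop = before - after
    return round(drop * 100, 3), round(drop * n_images)

for metric in fp32:
    points, images = accuracy_drop(fp32[metric], int8[metric])
    print(metric, points, images)
```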

Chunosov · 2018-02-21T11:03:13Z

Here are the results for ResNet50 with 320px input and a 320px mean generated over 500 images:

float32

{
  "accuracy/top-1": 0.731754, 
  "accuracy/top-5": 0.912832, 
  "post_processed": "yes"
}

int8

{
  "accuracy/top-1": 0.727713, 
  "accuracy/top-5": 0.910831, 
  "post_processed": "yes"
}

psyhtest · 2018-02-21T11:05:15Z

The mean was generated over 500 images? Was the evaluation also over 500 images or 50,000 images?

Chunosov · 2018-02-21T11:07:12Z

The full 50,000-image LMDB. It just was not ready yet when I generated this mean.
I'll repeat this evaluation using the mean from Intel.

psyhtest · 2018-02-21T13:04:50Z

So it looks like the higher the input resolution, the higher the accuracy?

float32

Resolution	Top-1	Top-5
224	0.707186	0.898327
256	0.729173	0.911852
320	0.731754	0.912832
320'	0.736136	0.914713
Intel	0.725000	0.908700

int8

Resolution	Top-1	Top-5
224	0.704806	0.896127
320	0.727713	0.910831
320'	0.734055	0.913712
Intel	0.718400	0.904900

320': measured with 320px input and the mean file from Intel.
NB: The "Intel" results are from Table 3 in the submitted abstract.
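
To make the comparison with the reported numbers easier to read, here is a sketch that expresses each float32 row as a gap, in percentage points, to the "Intel" row. The values are copied from the float32 table; gap_pp is a made-up helper:

```python
# (Top-1, Top-5) per input resolution, copied from the float32 table.
fp32 = {
    "224": (0.707186, 0.898327),
    "256": (0.729173, 0.911852),
    "320": (0.731754, 0.912832),
    "320'": (0.736136, 0.914713),
}
intel_fp32 = (0.725000, 0.908700)  # the "Intel" row

def gap_pp(measured, reference):
    """Difference to the reference in percentage points, per metric."""
    return tuple(round((m - r) * 100, 2) for m, r in zip(measured, reference))

for resolution, accuracy in fp32.items():
    print(resolution, gap_pp(accuracy, intel_fp32))
```

On these numbers, only the 224px run falls below the "Intel" row; the 256px and 320px runs are above it.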

Chunosov · 2018-02-22T05:47:31Z

I've added the results of the evaluation using the 320px mean file from Intel to the previous tables.
We can conclude that it's important to have a proper mean file :)

psyhtest · 2018-03-18T19:17:56Z

I also see the test_cpu command going much slower with the i8 version. Perhaps the batch size of 64 is not optimal in this case.

Intriguingly, I see only 10 threads being used in both cases, which is equivalent to the number of cores but is only half of the number of available hyperthreads. Can you confirm please?

psyhtest · 2018-03-19T16:01:49Z

Here's the experimental data obtained when following the README and using:

$ ck benchmark program:caffe \
--cmd_key=test_cpu --repetitions=1 \
--env.CK_CAFFE_BATCH_SIZE=64 \
--record --record_repo=local \
--record_uoa=ck-request-asplos18-caffe-intel-accuracy-resnet50 \
--tags=request,request-asplos18,caffe,intel,accuracy,resnet50,vfp32,vfloat32

$ ck benchmark program:caffe \
--cmd_key=test_cpu --repetitions=1 \
--env.CK_CAFFE_BATCH_SIZE=64 \
--record --record_repo=local \
--record_uoa=ck-request-asplos18-caffe-intel-accuracy-resnet50-i8 \
--tags=request,request-asplos18,caffe,intel,accuracy,resnet50,vi8,vint8

$ ck zip local:experiment:ck-request-asplos18-caffe-intel-accuracy-resnet50* \
--archive_name=~/ck-request-asplos18-caffe-intel-accuracy-resnet50.zip

Precision	Top-1	Top-5
`fp32`	0.707186	0.898327
`int8`	0.704405	0.896507

psyhtest self-assigned this Feb 19, 2018

psyhtest closed this as completed Mar 19, 2018

psyhtest mentioned this issue Apr 16, 2018

ReQuEST accuracy evaluation on ImageNet: ArmCL 18.03 vs. TensorFlow 1.7 ARM-software/ComputeLibrary#431

Closed

psyhtest mentioned this issue May 23, 2018

Formalize input preprocessing mlcommons/training#48

Closed

psyhtest mentioned this issue Jun 14, 2018

Formalize input preprocessing mlcommons/training_policies#30

Closed
