Measure accuracy on ResNet50 #7

Closed
4 tasks done
psyhtest opened this issue Feb 19, 2018 · 14 comments

@psyhtest
Contributor

psyhtest commented Feb 19, 2018

$ ck benchmark program:caffe \
  --cmd_key=test_cpu --repetitions=1 --env.CK_CAFFE_BATCH_SIZE=64 \
  --record --record_repo=local \
  --record_uoa=ck-request-asplos18-caffe-intel-accuracy-resnet50 \
  --tags=request,request-asplos18,caffe,intel,accuracy,resnet50
...
{
  "accuracy/top-1": 0.729173, 
  "accuracy/top-5": 0.911852, 
  "post_processed": "yes"
}
@psyhtest psyhtest self-assigned this Feb 19, 2018
@psyhtest
Contributor Author

psyhtest commented Feb 19, 2018

To compare with BVLC Caffe with cuDNN:

$ ck install package:lib-caffe-bvlc-master-cudnn-universal
$ ck benchmark program:caffe \
  --cmd_key=test_gpu --repetitions=1 --env.CK_CAFFE_BATCH_SIZE=32 \
  --record --record_repo=local \
  --record_uoa=ck-request-asplos18-caffe-cudnn-accuracy-resnet50 \
  --tags=request,request-asplos18,caffe,cudnn,accuracy,resnet50
...
{
  "accuracy/top-1": 0.729173, 
  "accuracy/top-5": 0.911852, 
  "post_processed": "yes"
}

NB: The batch size of 64 could not be used on the NVIDIA GTX 1080 (8.0 GB of memory) because the run went out of memory. By contrast, during the Intel Caffe evaluation on ResNet with a batch size of 32, memory usage was only 2.3 GB.

NB-2: After investigating a bit further, the maximum viable batch size was 24 for the time_gpu command and 39 for the test_gpu command, with the following statistics from nvidia-smi for the latter in the steady state:

$ watch -n 0.1 nvidia-smi
Mon Feb 19 18:43:44 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.12                 Driver Version: 390.12                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:02:00.0 Off |                  N/A |
| 37%   59C    P2   126W / 180W |   7943MiB /  8116MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    133858      C   caffe                                       7933MiB |
+-----------------------------------------------------------------------------+
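
One way to find such a limit is a binary search over batch sizes. The sketch below is not how the values above were obtained (the thread does not say); it assumes that memory use grows with batch size, and `fits` is a hypothetical callback that would launch a run and report whether it completed without running out of memory.

```python
def max_viable_batch(fits, low, high):
    """Return the largest batch size in [low, high] for which fits(batch) is True.

    Assumes fits is monotonic: if a batch size fits, every smaller one fits too.
    Returns None when no batch size in the range fits.
    """
    best = None
    while low <= high:
        mid = (low + high) // 2
        if fits(mid):
            best = mid        # mid works; look for something larger
            low = mid + 1
        else:
            high = mid - 1    # mid is too large; look lower
    return best


# Stand-in predicate (hypothetical): pretend the limit is 39, as reported for test_gpu.
print(max_viable_batch(lambda batch: batch <= 39, 1, 64))
```

In practice `fits` would wrap a benchmark run with the batch size set via CK_CAFFE_BATCH_SIZE, so each probe costs one full run.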

@psyhtest
Contributor Author

When the calibration tool is supported (#8), we will measure accuracy with quantised weights.

@psyhtest
Contributor Author

The measured accuracy is actually higher than the authors reported in the submission. This may be because they used the default imagenet_mean.binaryproto from BVLC, whereas we used the ResNet-specific ResNet_mean.binaryproto:

$ ls -la /home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-aux/imagenet_mean.binaryproto
-rwxr-xr-x 1 anton anton 786446 Feb 25  2014 /home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-aux/imagenet_mean.binaryproto
$ ls -la /home/anton/CK_TOOLS/caffemodel-resnet50/ResNet_mean.binaryproto 
-rw-rw-r-- 1 anton anton 602126 Feb 19 14:52 /home/anton/CK_TOOLS/caffemodel-resnet50/ResNet_mean.binaryproto
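
The two file sizes above hint at the resolution each mean file was built for. The sketch below assumes a square, 3-channel mean stored as 4-byte floats plus a small fixed header; those are assumptions about the file layout, not something read from the files themselves. The function name `infer_mean_side` is made up for illustration.

```python
import math


def infer_mean_side(size_bytes, channels=3, bytes_per_value=4):
    """Guess the side of a square mean image from its file size.

    Returns (side, overhead_bytes), where overhead_bytes is whatever is left
    after side * side * channels values of bytes_per_value bytes each.
    """
    values = size_bytes // bytes_per_value
    side = math.isqrt(values // channels)
    overhead = size_bytes - side * side * channels * bytes_per_value
    return side, overhead


print(infer_mean_side(786446))  # imagenet_mean.binaryproto -> (256, 14)
print(infer_mean_side(602126))  # ResNet_mean.binaryproto   -> (224, 14)
```

Under these assumptions the BVLC file matches a 256x256 mean and the ResNet file a 224x224 mean, each with 14 bytes left over.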

@psyhtest
Contributor Author

In fact, the special ResNet_mean.binaryproto (224*224*3*4+14 bytes) was not used:

name: "ResNet-50"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 224
    mean_file: "/home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-aux/imagenet_mean.binaryproto"
  }
  data_param {
    source: "/data/dataset-imagenet-ilsvrc2012-val-lmdb-dataset.imagenet.val-ilsvrc2012_val_full-resize-224/data"
    batch_size: 64
    backend: LMDB
    prefetch: 2
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 224
    mean_file: "/home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-aux/imagenet_mean.binaryproto"
  }
  data_param {
    source: "/data/dataset-imagenet-ilsvrc2012-val-lmdb-dataset.imagenet.val-ilsvrc2012_val_full-resize-224/data"
    batch_size: 64
    backend: LMDB
  }
}

@psyhtest
Contributor Author

So, we need to fix two things for ResNet50:

  1. Ensure it uses ResNet_mean.binaryproto
  2. Feed it 224x224 LMDB.
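
For the first item, a minimal sketch of repointing every mean_file entry in a prototxt at a different file. It treats the prototxt as plain text; `set_mean_file` is a hypothetical helper, and how the CK package actually generates the prototxt is not shown in this thread.

```python
import re


def set_mean_file(prototxt_text, new_path):
    """Return prototxt_text with every mean_file entry pointing at new_path."""
    # A function replacement avoids backslash-escape surprises in new_path.
    return re.sub(
        r'(mean_file:\s*)"[^"]*"',
        lambda match: f'{match.group(1)}"{new_path}"',
        prototxt_text,
    )


sample = '''transform_param {
  mirror: false
  crop_size: 224
  mean_file: "/home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-aux/imagenet_mean.binaryproto"
}'''

print(set_mean_file(sample, "/home/anton/CK_TOOLS/caffemodel-resnet50/ResNet_mean.binaryproto"))
```

The second item (a 224x224 LMDB) still needs the dataset to be regenerated; changing the path in data_param alone is not enough.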

@Chunosov
Collaborator

Chunosov commented Feb 21, 2018

Here are the results for ResNet50 with 224px input and the ResNet_mean.binaryproto mean:

float32

{
  "accuracy/top-1": 0.707186, 
  "accuracy/top-5": 0.898327, 
  "post_processed": "yes"
}

int8

{
  "accuracy/top-1": 0.704806, 
  "accuracy/top-5": 0.896127, 
  "post_processed": "yes"
}

@psyhtest
Contributor Author

So we have validated a minor loss of accuracy with int8: 0.22-0.24 percentage points, i.e. 110-120 of the 50,000 validation images.
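
The arithmetic behind these figures can be checked with a short sketch. It uses the 224px float32 and int8 results above and assumes the full 50,000-image validation set; `accuracy_drop` is a name made up for illustration.

```python
def accuracy_drop(fp32, int8, num_images=50_000):
    """Return (drop in percentage points, drop in number of images)."""
    delta = fp32 - int8
    return round(delta * 100, 3), round(delta * num_images)


print(accuracy_drop(0.707186, 0.704806))  # top-1 -> (0.238, 119)
print(accuracy_drop(0.898327, 0.896127))  # top-5 -> (0.22, 110)
```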

@Chunosov
Collaborator

Chunosov commented Feb 21, 2018

Here are the results for ResNet50 with 320px input and a 320px mean generated over 500 images:

float32

{
  "accuracy/top-1": 0.731754, 
  "accuracy/top-5": 0.912832, 
  "post_processed": "yes"
}

int8

{
  "accuracy/top-1": 0.727713, 
  "accuracy/top-5": 0.910831, 
  "post_processed": "yes"
}

@psyhtest
Contributor Author

The mean was generated over 500 images? Was the evaluation also over 500 images or 50,000 images?

@Chunosov
Collaborator

Chunosov commented Feb 21, 2018

The evaluation used the full 50,000-image LMDB. It just was not prepared yet when I generated this mean.
I'll repeat this evaluation using the mean from Intel.

@psyhtest
Contributor Author

psyhtest commented Feb 21, 2018

So it looks like the higher the input resolution, the higher the accuracy?

float32

| Resolution | Top-1    | Top-5    |
|------------|----------|----------|
| 224        | 0.707186 | 0.898327 |
| 256        | 0.729173 | 0.911852 |
| 320        | 0.731754 | 0.912832 |
| 320'       | 0.736136 | 0.914713 |
| Intel      | 0.725000 | 0.908700 |

int8

| Resolution | Top-1    | Top-5    |
|------------|----------|----------|
| 224        | 0.704806 | 0.896127 |
| 320        | 0.727713 | 0.910831 |
| 320'       | 0.734055 | 0.913712 |
| Intel      | 0.718400 | 0.904900 |

NB: The 320' values were measured with 320px input and the mean file from Intel.
NB: The "Intel" results are from Table 3 in the submitted abstract.

@Chunosov
Collaborator

I've added the results of the evaluation using the 320px mean file from Intel to the previous tables.
We can conclude that it's important to have a proper mean file :)

@psyhtest
Contributor Author

I also see the test_cpu command going much slower with the i8 version. Perhaps the batch size of 64 is not optimal in this case.

Intriguingly, I see only 10 threads being used in both cases, which equals the number of cores but is only half the number of available hyperthreads. Can you confirm, please?

@psyhtest
Contributor Author

psyhtest commented Mar 19, 2018

Here's the experimental data obtained when following the README and using:

$ ck benchmark program:caffe \
--cmd_key=test_cpu --repetitions=1 \
--env.CK_CAFFE_BATCH_SIZE=64 \
--record --record_repo=local \
--record_uoa=ck-request-asplos18-caffe-intel-accuracy-resnet50 \
--tags=request,request-asplos18,caffe,intel,accuracy,resnet50,vfp32,vfloat32

$ ck benchmark program:caffe \
--cmd_key=test_cpu --repetitions=1 \
--env.CK_CAFFE_BATCH_SIZE=64 \
--record --record_repo=local \
--record_uoa=ck-request-asplos18-caffe-intel-accuracy-resnet50-i8 \
--tags=request,request-asplos18,caffe,intel,accuracy,resnet50,vi8,vint8

$ ck zip local:experiment:ck-request-asplos18-caffe-intel-accuracy-resnet50* \
--archive_name=~/ck-request-asplos18-caffe-intel-accuracy-resnet50.zip

| Precision | Top-1    | Top-5    |
|-----------|----------|----------|
| fp32      | 0.707186 | 0.898327 |
| int8      | 0.704405 | 0.896507 |
