ReQuEST accuracy evaluation on ImageNet: ArmCL 18.03 vs. TensorFlow 1.7 #431

Closed
psyhtest opened this issue Apr 16, 2018 · 8 comments

@psyhtest

psyhtest commented Apr 16, 2018

While evaluating MobileNets using ArmCL for the ReQuEST@ASPLOS'18 competition, we observed noticeable discrepancies between the predictions of ArmCL 18.03 and TensorFlow 1.7, even though both used the same pretrained MobileNets-v1 weights shared by Google in 2017.

For example, the tables below give the standard accuracy metrics (Top 1 and Top 5, in percent, on the 50,000-image ImageNet validation set) for 4 models with a width multiplier of 1.0.

Top 1 (%)

| Model | TensorFlow 1.1 (?) (claimed) | TensorFlow 1.7 (measured) | ArmCL 18.03 (measured) |
|---|---|---|---|
| MobileNet_v1_1.0_224 | 70.700 | 70.466 | 70.464 |
| MobileNet_v1_1.0_192 | 69.300 | 68.824 | 68.830 |
| MobileNet_v1_1.0_160 | 67.200 | 66.504 | 66.458 |
| MobileNet_v1_1.0_128 | 64.100 | 63.580 | 63.586 |

Top 5 (%)

| Model | TensorFlow 1.1 (?) (claimed) | TensorFlow 1.7 (measured) | ArmCL 18.03 (measured) |
|---|---|---|---|
| MobileNet_v1_1.0_224 | 89.500 | 89.410 | 89.398 |
| MobileNet_v1_1.0_192 | 88.900 | 88.466 | 88.474 |
| MobileNet_v1_1.0_160 | 87.500 | 87.084 | 87.088 |
| MobileNet_v1_1.0_128 | 85.300 | 84.928 | 84.940 |

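
Since 0.002% of 50,000 images is exactly one image, the measured percentages in these tables can be converted back into counts of correctly classified images. A minimal Python sketch for the TensorFlow 1.7 and ArmCL 18.03 columns (the numbers are copied from the tables; the helper names are illustrative only):

```python
# Accuracies in percent copied from the tables above:
# model -> (TensorFlow 1.7 measured, ArmCL 18.03 measured)
NUM_IMAGES = 50_000

TOP1 = {
    "MobileNet_v1_1.0_224": (70.466, 70.464),
    "MobileNet_v1_1.0_192": (68.824, 68.830),
    "MobileNet_v1_1.0_160": (66.504, 66.458),
    "MobileNet_v1_1.0_128": (63.580, 63.586),
}
TOP5 = {
    "MobileNet_v1_1.0_224": (89.410, 89.398),
    "MobileNet_v1_1.0_192": (88.466, 88.474),
    "MobileNet_v1_1.0_160": (87.084, 87.088),
    "MobileNet_v1_1.0_128": (84.928, 84.940),
}


def percent_to_images(percent, total=NUM_IMAGES):
    """Convert an accuracy in percent to a whole number of correctly classified images."""
    return round(percent * total / 100)


def image_gaps(table):
    """Return {model: (tf_images, armcl_images, tf_minus_armcl)} for one metric."""
    gaps = {}
    for model, (tf_pct, armcl_pct) in table.items():
        tf_images = percent_to_images(tf_pct)
        armcl_images = percent_to_images(armcl_pct)
        gaps[model] = (tf_images, armcl_images, tf_images - armcl_images)
    return gaps


if __name__ == "__main__":
    for name, table in (("Top 1", TOP1), ("Top 5", TOP5)):
        print(name)
        for model, (tf_n, armcl_n, diff) in image_gaps(table).items():
            print(f"  {model}: TF {tf_n}, ArmCL {armcl_n}, TF - ArmCL = {diff:+d}")
```

With these values, the Top-1 gap ranges from ArmCL ahead by 3 images to TensorFlow ahead by 23 images (MobileNet_v1_1.0_160), while the Top-5 gap stays within 6 images either way.
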
@psyhtest
Author

We first noticed an accuracy degradation (of up to 0.74 percentage points in Top 1, for MobileNet_v1_1.0_160) when comparing the ArmCL results with the TensorFlow accuracy figures claimed by Google researchers. However, when we measured the TensorFlow accuracy ourselves using the same methodology as for ArmCL, it became apparent that the degradation is most likely due to differences in input preprocessing (e.g. resizing to a larger input resolution and then cropping to the model resolution; cf. the ReQuEST artifact from Intel).
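
To make the preprocessing difference concrete, here is a minimal sketch of the geometry of two approaches: squashing the whole image directly to the model resolution, versus resizing the shorter side to a larger resolution and then taking a central crop. The crop fraction of 0.875 (224 to 256) is an assumption for illustration, not a value taken from either program, and which approach each program actually uses is not established here.

```python
def resize_then_center_crop(width, height, model_res, crop_fraction=0.875):
    """Geometry for 'resize to a larger resolution, then central-crop to model_res'.

    ASSUMPTION: crop_fraction=0.875 (e.g. 224 / 0.875 = 256) is illustrative,
    not taken from the TensorFlow or ArmCL programs.
    Returns ((resized_w, resized_h), (left, upper, right, lower)).
    """
    # Scale the shorter side to the intermediate resolution, keeping the aspect ratio.
    target_short = int(round(model_res / crop_fraction))
    scale = target_short / min(width, height)
    resized_w = max(model_res, int(round(width * scale)))
    resized_h = max(model_res, int(round(height * scale)))
    # Take a model_res x model_res crop from the centre of the resized image.
    left = (resized_w - model_res) // 2
    upper = (resized_h - model_res) // 2
    return (resized_w, resized_h), (left, upper, left + model_res, upper + model_res)


def direct_resize(width, height, model_res):
    """Geometry for the alternative: squash the whole image to model_res x model_res
    (the aspect ratio is not preserved and nothing is cropped)."""
    return (model_res, model_res), (0, 0, model_res, model_res)


if __name__ == "__main__":
    for res in (224, 192, 160, 128):
        print(res, resize_then_center_crop(500, 375, res), direct_resize(500, 375, res))
```

The returned box uses the (left, upper, right, lower) form accepted by Pillow's `Image.crop`, so it could be applied as `img.resize(size).crop(box)`.
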

@psyhtest
Author

psyhtest commented Apr 16, 2018

Now, a difference of a fraction of a percent might not sound like much (e.g. 0.002% is just 1 image out of 50,000, and 0.006% is 3 images), but aggregate accuracy over 50,000 images smooths out per-image disagreements. In fact, among the first 500 images, our ArmCL-based implementation disagrees with TensorFlow (and with the correct label) on 3 images:

  1. ILSVRC2012_val_00000060.JPEG
$ ck run program:image-classification-tf-py \
--env.CK_IMAGE_FILE=/home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-val-min/ILSVRC2012_val_00000060.JPEG

---------------------------------------
ILSVRC2012_val_00000060.JPEG - (588) n03482405 hamper
0.47 - (588) n03482405 hamper
0.43 - (492) n03014705 chest
0.02 - (626) n03666591 lighter, light, igniter, ignitor
0.01 - (526) n03179701 desk
0.01 - (681) n03832673 notebook, notebook computer
---------------------------------------

$ ck run program:mobilenets-armcl-opencl \
--env.CK_IMAGE_FILE=/home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-val-min/ILSVRC2012_val_00000060.JPEG

---------------------------------------
ILSVRC2012_val_00000060.JPEG - (588) n03482405 hamper
0.45 - (492) n03014705 chest
0.45 - (588) n03482405 hamper
0.02 - (626) n03666591 lighter, light, igniter, ignitor
0.01 - (526) n03179701 desk
0.01 - (681) n03832673 notebook, notebook computer
---------------------------------------
  2. ILSVRC2012_val_00000302.JPEG
$ ck run program:image-classification-tf-py \
--env.CK_IMAGE_FILE=/home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-val-min/ILSVRC2012_val_00000302.JPEG

---------------------------------------
ILSVRC2012_val_00000302.JPEG - (469) n02939185 caldron, cauldron
0.47 - (469) n02939185 caldron, cauldron
0.47 - (926) n07590611 hot pot, hotpot
0.02 - (925) n07584110 consomme
0.01 - (996) n13052670 hen-of-the-woods, hen of the woods, Poly...
0.01 - (809) n04263257 soup bowl
---------------------------------------

$ ck run program:mobilenets-armcl-opencl \
--env.CK_IMAGE_FILE=/home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-val-min/ILSVRC2012_val_00000302.JPEG

---------------------------------------
ILSVRC2012_val_00000302.JPEG - (469) n02939185 caldron, cauldron
0.48 - (926) n07590611 hot pot, hotpot
0.47 - (469) n02939185 caldron, cauldron
0.02 - (925) n07584110 consomme
0.01 - (996) n13052670 hen-of-the-woods, hen of the woods, Poly...
0.01 - (809) n04263257 soup bowl
---------------------------------------
  3. ILSVRC2012_val_00000313.JPEG
$ ck run program:image-classification-tf-py \
--env.CK_IMAGE_FILE=/home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-val-min/ILSVRC2012_val_00000313.JPEG

---------------------------------------
ILSVRC2012_val_00000313.JPEG - (979) n09468604 valley, vale
0.17 - (979) n09468604 valley, vale
0.17 - (984) n11879895 rapeseed
0.12 - (978) n09428293 seashore, coast, seacoast, sea-coast
0.09 - (525) n03160309 dam, dike, dyke
0.07 - (975) n09332890 lakeside, lakeshore
---------------------------------------

$ ck run program:mobilenets-armcl-opencl \
--env.CK_IMAGE_FILE=/home/anton/CK_TOOLS/dataset-imagenet-ilsvrc2012-val-min/ILSVRC2012_val_00000313.JPEG

---------------------------------------
ILSVRC2012_val_00000313.JPEG - (979) n09468604 valley, vale
0.18 - (984) n11879895 rapeseed
0.17 - (979) n09468604 valley, vale
0.11 - (978) n09428293 seashore, coast, seacoast, sea-coast
0.09 - (525) n03160309 dam, dike, dyke
0.07 - (483) n02980441 castle
---------------------------------------

It's peculiar that for ILSVRC2012_val_00000060.JPEG TensorFlow separates "hamper" and "chest" considerably (47% vs. 43% probability), while ArmCL gives them roughly the same probability (45% each). On the other hand, for the other two images, where TensorFlow gives the top two classes roughly the same probability, ArmCL separates them more (but pushes the incorrect class to the top).
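
This observation can be checked mechanically from the printed top-2 predictions. A minimal sketch with the probabilities copied from the outputs above; they are rounded to two decimals, so a margin of 0.0 only means a tie at printed precision. The dictionary layout and the shortened labels are illustrative.

```python
# Top-2 predictions as printed above: (probability, class_id, short label).
# Probabilities are rounded to two decimals by the printing code.
CORRECT = {"00000060": 588, "00000302": 469, "00000313": 979}

TF = {
    "00000060": [(0.47, 588, "hamper"), (0.43, 492, "chest")],
    "00000302": [(0.47, 469, "caldron"), (0.47, 926, "hot pot")],
    "00000313": [(0.17, 979, "valley"), (0.17, 984, "rapeseed")],
}
ARMCL = {
    "00000060": [(0.45, 492, "chest"), (0.45, 588, "hamper")],
    "00000302": [(0.48, 926, "hot pot"), (0.47, 469, "caldron")],
    "00000313": [(0.18, 984, "rapeseed"), (0.17, 979, "valley")],
}


def top2_margin(preds):
    """Difference between the top-1 and top-2 probabilities, at printed precision."""
    return round(preds[0][0] - preds[1][0], 2)


def compare(image_id):
    """Whether each framework's top-1 is correct, and its top-2 margin."""
    tf, armcl = TF[image_id], ARMCL[image_id]
    return {
        "tf_top1_correct": tf[0][1] == CORRECT[image_id],
        "armcl_top1_correct": armcl[0][1] == CORRECT[image_id],
        "tf_margin": top2_margin(tf),
        "armcl_margin": top2_margin(armcl),
    }


if __name__ == "__main__":
    for image_id in CORRECT:
        print(f"ILSVRC2012_val_{image_id}.JPEG", compare(image_id))
```
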

@AnthonyBarbier
Contributor

Hi Anton,
I only skimmed the "MobileNets using ArmCL" repo while triaging this bug and didn't find the answer: how is ACL integrated with TensorFlow to run the graphs? Is it using the graph API? Hand-written networks? Through Android NN?

@psyhtest
Author

psyhtest commented Apr 16, 2018

Hi @AnthonyARM, the ArmCL-based implementation is built on top of your graph API example. The TensorFlow implementation is a separate Python program (which, however, uses exactly the same pretrained weights).

I'll shortly provide full instructions on how to reproduce this issue using Collective Knowledge.

@gmiodice
Contributor

Hi @psyhtest,

Many thanks for reporting this. Do you know whether the same discrepancy occurs with both the NEON and CL backends?

Thanks

@psyhtest
Author

psyhtest commented Apr 17, 2018

Please find ArmCL instructions below. (TensorFlow instructions to follow shortly.)

Please let me know if anything is unclear (or ask a friendly person from @sztaylor's team :)).

Installing artifact dependencies

$ sudo apt install python python-pip
$ sudo apt install libblas-dev liblapack-dev libatlas-base-dev
$ sudo apt install python-numpy python-scipy
$ sudo pip install pillow
$ sudo pip install ck

Detecting GCC, Python, OpenCL

$ ck detect soft:compiler.gcc
$ ck detect soft:compiler.python
$ ck detect platform.gpgpu --opencl

Installing the MobileNets artifact

$ ck pull repo --url=https://github.com/dividiti/ck-request-asplos18-mobilenets-armcl-opencl
$ ck install ck-env:package:imagenet-2012-aux
$ ck install ck-env:package:imagenet-2012-val-min-resized
$ ck install ck-math:package:lib-armcl-opencl-18.03 --env.USE_GRAPH=ON --env.USE_NEON=ON
$ ck install ck-request-asplos18-mobilenets-armcl-opencl:package:weights-mobilenet-v1-1.0-224-npy

Running the classification program

To classify ILSVRC2012_val_00000001.JPEG:

$ ck compile program:mobilenets-armcl-opencl
$ ck run program:mobilenets-armcl-opencl

To classify another image (e.g. ILSVRC2012_val_00000060.JPEG):

$ ck virtual env --tags=imagenet,small_dataset
$ ck run program:mobilenets-armcl-opencl \
--env.CK_IMAGE_FILE=$CK_ENV_DATASET_IMAGENET_VAL/ILSVRC2012_val_00000060.JPEG
...
  (run ...)
executing code ...
Kernel path: /home/anton/CK_TOOLS/lib-armcl-opencl-18.03-gcc-6.3.0-linux-64/src/src/core/CL/cl_kernels/
Image list file: ../images-224-1-1.txt
Image count in file: 1
Batch list file: ../batches-224-1-1.txt
Batch count in file: 1

Prepare graph...

Run graph...

Batch 1 of 1
File: ../batches-224-1-1/ILSVRC2012_val_00000060.JPEG.npy
Loaded in 0.0215138 s
Classified in 0.0713635s
Test passed
-------------------------------
Graph loaded in 1.36456 s
All batches loaded in 0.0215138 s
All batches classified in 0.0713635 s
Average classification time: 0.0713635 s
-------------------------------

  (post processing: "python /home/anton/CK_REPOS/ck-request-asplos18-mobilenets-armcl-opencl/program/mobilenets-armcl-opencl/postprocess.py"

--------------------------------
Process results in predictions
---------------------------------------
ILSVRC2012_val_00000060.JPEG - (588) n03482405 hamper
0.45 - (492) n03014705 chest
0.45 - (588) n03482405 hamper
0.02 - (626) n03666591 lighter, light, igniter, ignitor
0.01 - (526) n03179701 desk
0.01 - (681) n03832673 notebook, notebook computer
---------------------------------------
Accuracy top 1: 0.000000 (0 of 1)
Accuracy top 5: 1.000000 (1 of 1)
--------------------------------

  (reading fine grain timers from tmp-ck-timer.json ...)

{
  "accuracy_top1": 0.0,
  "accuracy_top5": 1.0,
  "execution_time": 0.071363,
  "execution_time_sum": 1.45744,
  "frame_predictions": [
    {
      "accuracy_top1": "no",
      "accuracy_top5": "yes",
      "class_correct": 588,
      "class_topmost": 492,
      "file_name": "ILSVRC2012_val_00000060.JPEG"
    }
  ],
  "images_load_time_avg_s": 0.021514,
  "images_load_time_s": 0.021514,
  "prediction_time_avg_s": 0.071363,
  "prediction_time_total_s": 0.071363,
  "setup_time_s": 1.364563,
  "test_time_s ": 0.103095
}

Execution time: 0.071 sec.
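
When more than one image is processed, the `frame_predictions` list in the JSON above is enough to recompute Top 1 / Top 5 and to list the misclassified files. A minimal sketch, assuming the JSON keeps the same structure for several images; where CK saves this JSON on disk is not covered here, so the example embeds a trimmed copy of the output above, and `summarize` is a hypothetical helper.

```python
import json

# A trimmed copy of the JSON printed above (only the fields used below).
EXAMPLE_OUTPUT = """
{
  "accuracy_top1": 0.0,
  "accuracy_top5": 1.0,
  "frame_predictions": [
    {
      "accuracy_top1": "no",
      "accuracy_top5": "yes",
      "class_correct": 588,
      "class_topmost": 492,
      "file_name": "ILSVRC2012_val_00000060.JPEG"
    }
  ]
}
"""


def summarize(results):
    """Recompute Top-1/Top-5 accuracy and list top-1 misses from frame_predictions."""
    frames = results["frame_predictions"]
    if not frames:
        return 0.0, 0.0, []
    top1 = sum(f["accuracy_top1"] == "yes" for f in frames)
    top5 = sum(f["accuracy_top5"] == "yes" for f in frames)
    # (file name, correct class, predicted top-1 class) for every top-1 miss.
    misses = [
        (f["file_name"], f["class_correct"], f["class_topmost"])
        for f in frames
        if f["accuracy_top1"] == "no"
    ]
    return top1 / len(frames), top5 / len(frames), misses


if __name__ == "__main__":
    top1, top5, misses = summarize(json.loads(EXAMPLE_OUTPUT))
    print(f"Top 1: {top1:.3f}  Top 5: {top5:.3f}")
    for name, correct, predicted in misses:
        print(f"{name}: expected {correct}, got {predicted}")
```
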

@psyhtest
Author

@gmiodice I am afraid I don't know. Our ArmCL program currently supports only OpenCL. It should not be hard to extend it to support Neon.

@psyhtest
Author

psyhtest commented Apr 17, 2018

Please find TensorFlow instructions below (assuming CK has been installed as per the ArmCL instructions above).

Installing artifact dependencies

$ sudo apt install liblapack-dev libatlas-dev
$ sudo pip install enum34 mock pillow wheel absl-py scipy
$ ck install ck-env:package:tool-bazel-0.11.1-linux

Installing the artifact

$ ck pull repo:ck-tensorflow
$ ck install package:tensorflowmodel-mobilenet-v1-1.0-224-py
$ ck install package:lib-tensorflow-1.7.0-src-cpu [--env.CK_HOST_CPU_NUMBER_OF_PROCESSORS=1]

NB: You may want to restrict the number of build threads to 1 or 2 on a dev board with 4 GB of RAM or less. For example, add --env.CK_HOST_CPU_NUMBER_OF_PROCESSORS=2 on HiKey 960 (3 GB RAM with swap enabled) or --env.CK_HOST_CPU_NUMBER_OF_PROCESSORS=1 on Tegra TX1 (4 GB RAM without swap enabled).

Running the classification program

To classify e.g. ILSVRC2012_val_00000060.JPEG:

$ ck virtual env --tags=imagenet,small_dataset
$ ck run program:image-classification-tf-py \
--env.CK_IMAGE_FILE=$CK_ENV_DATASET_IMAGENET_VAL/ILSVRC2012_val_00000060.JPEG
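
To run both programs on the three images discussed earlier in this thread, the `ck run` commands above can be generated in a loop. A minimal Python sketch, assuming both programs have been installed and compiled as described and that it is run inside `ck virtual env --tags=imagenet,small_dataset` (so that `$CK_ENV_DATASET_IMAGENET_VAL` is set); it only prints the commands, and `build_commands` is a hypothetical helper.

```python
import os
import shlex

# Program names and the CK_IMAGE_FILE variable are taken from the instructions above.
PROGRAMS = ["image-classification-tf-py", "mobilenets-armcl-opencl"]


def image_file_name(index):
    """Validation file name for a 1-based index, e.g. 60 -> ILSVRC2012_val_00000060.JPEG."""
    return f"ILSVRC2012_val_{index:08d}.JPEG"


def build_commands(indices, dataset_dir):
    """Return one 'ck run' argument list per (image, program) pair."""
    commands = []
    for index in indices:
        image_path = os.path.join(dataset_dir, image_file_name(index))
        for program in PROGRAMS:
            commands.append([
                "ck", "run", f"program:{program}",
                f"--env.CK_IMAGE_FILE={image_path}",
            ])
    return commands


if __name__ == "__main__":
    # Set inside `ck virtual env --tags=imagenet,small_dataset`; "." is only a fallback.
    dataset_dir = os.environ.get("CK_ENV_DATASET_IMAGENET_VAL", ".")
    for cmd in build_commands([60, 302, 313], dataset_dir):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each argument list could also be passed directly to `subprocess.run(cmd, check=True)` instead of being printed.
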
