
EfficientNet Model Inference

EfficientNet inference using Intel® Extension for PyTorch. This sample uses the EfficientNet model implementations from torchvision:

| Model           | Documentation                       | Weights                               |
|-----------------|-------------------------------------|---------------------------------------|
| efficientnet_b0 | torchvision.models.efficientnet_b0  | EfficientNet_B0_Weights.IMAGENET1K_V1 |
| efficientnet_b1 | torchvision.models.efficientnet_b1  | EfficientNet_B1_Weights.IMAGENET1K_V1 |
| efficientnet_b2 | torchvision.models.efficientnet_b2  | EfficientNet_B2_Weights.IMAGENET1K_V1 |
| efficientnet_b3 | torchvision.models.efficientnet_b3  | EfficientNet_B3_Weights.IMAGENET1K_V1 |
| efficientnet_b4 | torchvision.models.efficientnet_b4  | EfficientNet_B4_Weights.IMAGENET1K_V1 |
| efficientnet_b5 | torchvision.models.efficientnet_b5  | EfficientNet_B5_Weights.IMAGENET1K_V1 |
| efficientnet_b6 | torchvision.models.efficientnet_b6  | EfficientNet_B6_Weights.IMAGENET1K_V1 |
| efficientnet_b7 | torchvision.models.efficientnet_b7  | EfficientNet_B7_Weights.IMAGENET1K_V1 |
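
Any of these variants can be loaded directly from torchvision. As a quick sanity check that the pinned weights resolve (a minimal sketch, assuming torchvision 0.13 or later, which introduced the weights API):

python3 -c "from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights; m = efficientnet_b0(weights=EfficientNet_B0_Weights.IMAGENET1K_V1); print(sum(p.numel() for p in m.parameters()), 'parameters')"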

Dataset

Note

Throughput and latency benchmarking can be done with dummy data (./run_model.sh --dummy). In that case the dataset setup can be skipped. As a downside, expect to see low accuracy on the dummy data.

Note

~13.3 GB of free disk space is required to download and extract the ImageNet dataset.

The ImageNet validation dataset is required to measure accuracy during inference. Visit the ImageNet site and download the following files:

  • ILSVRC2012_img_val.tar
  • ILSVRC2012_devkit_t12.tar.gz

Note

Both dataset components must be downloaded to the same folder. This folder must be the $DATASET_DIR referenced in the following sections.

The get_dataset.sh script can be used to download these files. There is no need to extract or format them before running this sample. On the first run, the sample script extracts the archives with torchvision.datasets.ImageNet; subsequent runs skip the extraction.
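
The extraction performed on the first run is roughly equivalent to instantiating the dataset once, as in this minimal sketch (assumes both archives sit in $DATASET_DIR; torchvision uses scipy to parse the devkit archive):

python3 -c "from torchvision.datasets import ImageNet; ImageNet(root='$DATASET_DIR', split='val')"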

Prerequisites

Hardware:

Software:

Run the model under container

Note

The sample requires a network connection to download the model via HTTPS. Make sure to set https_proxy inside the running container if you work behind a proxy.

Pull the pre-built image with the sample:

docker pull intel/image-recognition:pytorch-flex-gpu-efficientnet-inference

or build it locally:

docker build \
  $(env | grep -E '(_proxy=|_PROXY)' | sed 's/^/--build-arg /') \
  -f docker/flex-gpu/pytorch-efficientnet-inference/pytorch-flex-series-efficientnet-inference.Dockerfile \
  -t intel/image-recognition:pytorch-flex-gpu-efficientnet-inference .

Run the sample as follows:

  • With dummy data:

    • Running with dummy data is recommended for performance benchmarking (throughput and latency measurements)
    • Use higher NUM_ITERATIONS and lower NUM_IMAGES values (e.g. use NUM_IMAGES=$BATCH_SIZE) for more precise performance results
    • NOTE: Accuracy will be zero when using dummy data
    mkdir -p /tmp/output && rm -f /tmp/output/* && chmod -R 777 /tmp/output
    export BATCH_SIZE=1
    docker run -it --rm --ipc=host \
      $(env | grep -E '(_proxy=|_PROXY)' | sed 's/^/-e /') \
      --cap-add SYS_NICE \
      --device /dev/dri/ \
      -e MODEL_NAME=efficientnet_b0 \
      -e PLATFORM=Flex \
      -e NUM_ITERATIONS=32 \
      -e NUM_IMAGES=${BATCH_SIZE} \
      -e BATCH_SIZE=${BATCH_SIZE} \
      -e OUTPUT_DIR=/tmp/output \
      -v /tmp/output:/tmp/output \
      intel/image-recognition:pytorch-flex-gpu-efficientnet-inference \
        /bin/bash -c "./run_model.sh --dummy"
    
  • With the ImageNet dataset (assumes the dataset was downloaded to the $DATASET_DIR folder):

    • Running with dataset images is recommended for accuracy measurements
    • Use higher NUM_IMAGES (e.g. 50000 for full ImageNet set) and lower NUM_ITERATIONS for more precise (and fast) accuracy results
    • NOTE: Performance results (throughput and latency measurements) may be impacted due to data handling overhead
    mkdir -p /tmp/output && rm -f /tmp/output/* && chmod -R 777 /tmp/output
    export BATCH_SIZE=1
    docker run -it --rm --ipc=host \
      $(env | grep -E '(_proxy=|_PROXY)' | sed 's/^/-e /') \
      --cap-add SYS_NICE \
      --device /dev/dri/ \
      -e MODEL_NAME=efficientnet_b0 \
      -e PLATFORM=Flex \
      -e NUM_ITERATIONS=1 \
      -e NUM_IMAGES=50000 \
      -e BATCH_SIZE=${BATCH_SIZE} \
      -e OUTPUT_DIR=/tmp/output \
      -v /tmp/output:/tmp/output \
      -e DATASET_DIR=/dataset \
      -v $DATASET_DIR:/dataset \
      intel/image-recognition:pytorch-flex-gpu-efficientnet-inference \
        /bin/bash -c "./run_model.sh"
    

Mind the following docker run arguments:

  • HTTPS proxy is required to download the model over the network (-e https_proxy=<...>)
  • --cap-add SYS_NICE is required for numactl
  • --device /dev/dri is required to expose the GPU device to the running container
  • --ipc=host is required for multi-stream benchmarking (./run_model.sh --dummy --streams 2, see the sketch after this list) or large dataset cases
  • -v $DATASET_DIR:/dataset is required when the dataset is used; replace $DATASET_DIR with the actual path to the ImageNet dataset
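
As an illustration of the multi-stream case mentioned above, the dummy-data command can be rerun with the --streams option (a sketch; all other arguments stay as in the earlier example):

docker run -it --rm --ipc=host \
  --cap-add SYS_NICE \
  --device /dev/dri/ \
  -e MODEL_NAME=efficientnet_b0 \
  -e PLATFORM=Flex \
  -e BATCH_SIZE=1 \
  -e OUTPUT_DIR=/tmp/output \
  -v /tmp/output:/tmp/output \
  intel/image-recognition:pytorch-flex-gpu-efficientnet-inference \
    /bin/bash -c "./run_model.sh --dummy --streams 2"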

Run the model on baremetal

Note

The sample requires a network connection to download the model via HTTPS. Make sure to set https_proxy before running run_model.sh if you work behind a proxy.
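
For example (the address below is a hypothetical placeholder; substitute your actual proxy):

export https_proxy=http://proxy.example.com:8080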

  1. Download the sample:

    git clone https://github.com/IntelAI/models.git
    cd models/models_v2/pytorch/efficientnet/inference/gpu
    
  2. Create a virtual environment venv and activate it:

    python3 -m venv venv
    . ./venv/bin/activate
    
  3. Install the sample's Python dependencies:

    python3 -m pip install -r requirements.txt
    
  4. Install Intel® Extension for PyTorch
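
    After installation, a quick sanity check that the XPU build of the extension is active (a minimal sketch; the XPU builds expose torch.xpu):

    python3 -c "import torch, intel_extension_for_pytorch as ipex; print(ipex.__version__, torch.xpu.is_available())"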

  5. Setup required environment variables and run the sample with ./run_model.sh:

    • With dummy data:

      • Running with dummy data is recommended for performance benchmarking (throughput and latency measurements)
      • Use higher NUM_ITERATIONS and lower NUM_IMAGES values (e.g. use NUM_IMAGES=$BATCH_SIZE) for more precise performance results
      • NOTE: Accuracy will be zero when using dummy data
      export MODEL_NAME=efficientnet_b0
      export PLATFORM=Flex
      export BATCH_SIZE=1
      export NUM_ITERATIONS=32
      export NUM_IMAGES=${BATCH_SIZE}
      export OUTPUT_DIR=/tmp/output
      ./run_model.sh --dummy
      
  • With the ImageNet dataset (assumes the dataset was downloaded to the $DATASET_DIR folder):

    • Running with dataset images is recommended for accuracy measurements
    • Use higher NUM_IMAGES (e.g. 50000 for full ImageNet set) and lower NUM_ITERATIONS for more precise (and fast) accuracy results
    • NOTE: Performance results (throughput and latency measurements) may be impacted due to data handling overhead
    export MODEL_NAME=efficientnet_b0
    export PLATFORM=Flex
    export BATCH_SIZE=1
    export NUM_ITERATIONS=1
    export NUM_IMAGES=50000
    export OUTPUT_DIR=/tmp/output
    export DATASET_DIR=$DATASET_DIR
    ./run_model.sh
    

Runtime arguments and environment variables

run_model.sh accepts a number of arguments to tune its behavior. Each argument can be specified either as an environment variable or as a command line argument (see the table below for details).

Before running the run_model.sh script, you are required to:

  • Set the OUTPUT_DIR environment variable (or use --output-dir) to the location where the script should write logs.
  • Use dummy data (--dummy) or set the DATASET_DIR environment variable (or use --data) pointing to the ImageNet dataset.

Other arguments and/or environment variables are optional and should be used as needed (see the examples above).

| Argument         | Environment variable | Valid Values                        | Purpose                                                             |
|------------------|----------------------|-------------------------------------|---------------------------------------------------------------------|
| --amp            | AMP                  | yes, no                             | Use AMP on model conversion to the desired precision (default: yes)  |
| --arch           | MODEL_NAME           | efficientnet_b0 through efficientnet_b7 | Torchvision model to run (default: efficientnet_b0)             |
| --batch-size     | BATCH_SIZE           | >=1                                 | Batch size to use (default: 1)                                      |
| --data           | DATASET_DIR          | String                              | Location to load images from                                        |
| --dummy          | DUMMY                |                                     | Use randomly generated dummy dataset in place of --data argument    |
| --jit            | JIT                  | none, trace, script                 | JIT method to use (default: trace)                                  |
| --load           | LOAD_PATH            | String                              | Local path to load model from (default: disabled)                   |
| --num-images     | NUM_IMAGES           | >=1                                 | Number of images to load (default: 1)                               |
| --num-iterations | NUM_ITERATIONS       | >=1                                 | Number of times to test each batch (default: 100)                   |
| --output-dir     | OUTPUT_DIR           | String                              | Location to write output                                            |
| --proxy          | https_proxy          | String                              | System proxy                                                        |
| --precision      | PRECISION            | bf16, fp16, fp32                    | Precision to use for the model (default: fp32)                      |
| --save           | SAVE_PATH            | String                              | Local path to save model to (default: disabled)                     |
| --streams        | STREAMS              | >=1                                 | Number of parallel streams to do inference on (default: 1)          |
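
For example, a hypothetical run that sets everything on the command line instead of through the environment:

./run_model.sh --arch efficientnet_b4 --precision fp16 --batch-size 32 \
  --num-iterations 32 --num-images 32 --dummy --output-dir /tmp/output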

For more details, check the help output with run_model.sh --help

Example output

Script output is written to the console as well as to the file output.log in the output directory.

For multi-stream cases, per-stream results can be found in the results_[STREAM_INSTANCE].json files.

Final results of the inference run can be found in the results.yaml file. More verbose result summaries are in the results.json file.

The yaml file contents will look like:

results:
 - key: throughput
   value: 9199.48
   unit: img/s
 - key: latency
   value: 31.394199
   unit: ms
 - key: accuracy
   value: 76.06
   unit: percents
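
The file can also be consumed programmatically, for instance with this minimal sketch that flattens the results list into a dict (assumes PyYAML is installed):

python3 -c "import yaml; r = yaml.safe_load(open('/tmp/output/results.yaml')); print({e['key']: e['value'] for e in r['results']})"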