Mixed Precision

  1. Introduction
  2. Mixed Precision Support Matrix
  3. Get Started with the Mixed Precision API
  4. Examples

Introduction

The recent growth of deep learning has driven the development of more complex models that require significantly more compute and memory. Several low-precision numeric formats have been proposed to address this problem. Google's bfloat16 and FP16 (IEEE half precision) are two of the most widely used 16-bit formats. Mixed precision training and inference using these low-precision formats reduce compute and bandwidth requirements.
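For intuition, the two formats split their 16 bits differently: FP16 spends more bits on the mantissa, while bfloat16 keeps FP32's 8 exponent bits and therefore its dynamic range. A minimal sketch, assuming a PyTorch build with bfloat16 support:

import torch

x = torch.tensor([3.141592653589793, 1e-8, 65504.0], dtype=torch.float32)

fp16 = x.to(torch.float16)    # IEEE half precision: 5 exponent bits, 10 mantissa bits
bf16 = x.to(torch.bfloat16)   # bfloat16: 8 exponent bits, 7 mantissa bits

print(fp16)  # finer precision, but 1e-8 underflows to 0 at half-precision range
print(bf16)  # FP32-like dynamic range, at the cost of coarser precision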

The recently launched 3rd Gen Intel® Xeon® Scalable processor (codenamed Cooper Lake), featuring Intel® Deep Learning Boost, is the first general-purpose x86 CPU to support the bfloat16 format. Specifically, three new bfloat16 instructions are added as part of the AVX512_BF16 extension within Intel Deep Learning Boost: VCVTNE2PS2BF16, VCVTNEPS2BF16, and VDPBF16PS. The first two instructions convert to and from the bfloat16 data type, while the last one performs a dot product of bfloat16 pairs. Further details can be found in the hardware numerics document published by Intel.

[Architecture diagram]

Mixed Precision Support Matrix

Framework     BF16
TensorFlow    ✔
PyTorch       ✔
ONNX          plan to support in the future
MXNet         ✔

During quantization, BF16 conversion can also be executed if it is forcibly enabled. Please refer to this document for the workflow.
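As a hedged sketch of what force-enabling looks like (the FORCE_BF16 environment variable name is taken from Intel® Neural Compressor's quantization documentation and should be verified against the workflow document above for your installed version):

import os

# Assumption: FORCE_BF16=1 forces BF16 conversion during the quantization flow.
# Set it before the quantization run starts, then invoke the usual entry point.
os.environ["FORCE_BF16"] = "1"

# ... run the standard quantization flow here as usual.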

Get Started with the Mixed Precision API

To get a BF16 model, use the Mixed Precision API step by step; a complete end-to-end sketch follows the steps below.

  • First, initialize an object of MixedPrecision.
from neural_compressor.experimental import MixedPrecision
converter = MixedPrecision()
  • Next, assign the target precision and the FP32 model to the attributes of MixedPrecision.

BF16 conversion may lead to an accuracy drop. Intel® Neural Compressor provides an accuracy-aware tuning function to reduce the accuracy loss: it automatically falls back converted ops to FP32 to recover accuracy. To enable this function, users only need to provide an evaluation function (or a dataloader plus a metric).

converter.precisions = 'bf16'
converter.model = model # model is a fp32 model
converter.eval_func = user_defined_function # optional; this function accepts only the model as input and returns a higher-is-better scalar as the accuracy
  • Last, execute the conversion and save the model.
converted_model = converter()
converted_model.save(output_path)
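Putting the three steps together, a minimal end-to-end sketch might look like the following; the model path, output path, and the body of eval_func are placeholders, and my_validation_loop is a hypothetical user helper, not part of the Neural Compressor API:

from neural_compressor.experimental import MixedPrecision

def eval_func(model):
    # Placeholder: run your own validation loop on `model` and return a single
    # higher-is-better accuracy value. my_validation_loop is a hypothetical helper.
    return my_validation_loop(model)

converter = MixedPrecision()
converter.precisions = 'bf16'
converter.model = './fp32_saved_model'   # FP32 model: a path or a framework model object
converter.eval_func = eval_func          # optional; enables accuracy-aware tuning
converted_model = converter()            # runs the BF16 conversion (and tuning, if enabled)
converted_model.save('./bf16_model')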

Examples

There are two prerequisites for running the BF16 mixed precision examples:

  • Hardware: a CPU that supports the AVX512_BF16 instruction set.
  • Software: intel-tensorflow >= 2.3.0 or torch >= 1.11.0.

If either prerequisite is not met, the program will exit. A quick self-check is sketched below.
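The following sketch checks both prerequisites on Linux before launching an example; the /proc/cpuinfo flag name and the version thresholds are assumptions based on the requirements listed above:

import importlib

def cpu_has_avx512_bf16():
    # Assumption: supported CPUs expose the 'avx512_bf16' flag in /proc/cpuinfo (Linux only).
    try:
        with open('/proc/cpuinfo') as f:
            return 'avx512_bf16' in f.read()
    except OSError:
        return False

def framework_version(name):
    # Returns the installed version string, or None if the framework is missing.
    try:
        return importlib.import_module(name).__version__
    except ImportError:
        return None

print('avx512_bf16 supported:', cpu_has_avx512_bf16())
print('tensorflow version   :', framework_version('tensorflow'))  # need intel-tensorflow >= 2.3.0
print('torch version        :', framework_version('torch'))       # need torch >= 1.11.0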