Add half precision / float16 support in Paddle #4853

Closed
kexinzhao opened this issue Oct 17, 2017 · 2 comments
Currently, the half precision floating point (float16) data type is not supported in Paddle.

Adding the float16 data type could potentially:

  • reduce storage space
  • save memory bandwidth usage
  • speed up arithmetic where the hardware supports it

A brief survey of float16 support on different hardware:

ARM processor:

float16 storage and conversion to/from float32 are generally supported on armv7 and armv8.

However, float16 arithmetic is only supported starting with Armv8.2-A (quote: "IEEE754-2008 formatted half-precision floating point data processing is added to Armv8.2-A").

There are currently very few devices using CPUs with the Armv8.2-A architecture (the only one I found is the newly launched Cortex-A75, which will be used in the Qualcomm Snapdragon 845).
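
For illustration, a minimal sketch of what storage-plus-conversion support looks like with NEON intrinsics (this is not Paddle code, and the helper names are mine; it assumes a toolchain/target with NEON half-precision conversion support, e.g. -mfpu=neon-fp16 on armv7, or any AArch64 target):

#include <arm_neon.h>

// Widen four packed float16 values to float32 so arithmetic can be done in float32.
float32x4_t LoadHalf4(const float16_t* src) {
  float16x4_t h = vld1_f16(src);    // load 4 x float16 from memory
  return vcvt_f32_f16(h);           // convert to 4 x float32
}

// Narrow four float32 results back to float16 for storage.
void StoreHalf4(float16_t* dst, float32x4_t v) {
  float16x4_t h = vcvt_f16_f32(v);  // convert to 4 x float16
  vst1_f16(dst, h);                 // store 4 x float16 to memory
}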

x86/x64 CPU:

float16 is only supported as a storage type; intrinsics (the F16C instruction set) are available for conversion between float16 and float32.
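
A rough sketch of scalar conversion using the F16C intrinsics (helper names are mine, not Paddle's; requires a CPU and compiler with F16C enabled, e.g. -mf16c):

#include <immintrin.h>
#include <cstdint>

// Convert one float32 to the IEEE float16 bit pattern (round to nearest even).
uint16_t FloatToHalf(float f) {
  __m128i h = _mm_cvtps_ph(_mm_set_ss(f),
                           _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
  return static_cast<uint16_t>(_mm_extract_epi16(h, 0));
}

// Convert one IEEE float16 bit pattern back to float32.
float HalfToFloat(uint16_t h) {
  return _mm_cvtss_f32(_mm_cvtph_ps(_mm_set1_epi16(static_cast<short>(h))));
}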

Nvidia GPU:

float16 storage and arithmetic have been available since CUDA 7.5 on supported GPUs (e.g. Pascal GPUs).
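
A minimal CUDA C++ sketch of what native float16 arithmetic looks like (the kernel name is hypothetical; __half and __hadd come from cuda_fp16.h, and the half add instruction needs compute capability 5.3+, hence the fallback branch):

#include <cuda_fp16.h>

__global__ void HalfAdd(const __half* x, const __half* y, __half* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
#if __CUDA_ARCH__ >= 530
    out[i] = __hadd(x[i], y[i]);  // native fp16 add
#else
    // Older GPUs: widen to float32, add, and narrow back.
    out[i] = __float2half(__half2float(x[i]) + __half2float(y[i]));
#endif
  }
}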

@kexinzhao kexinzhao self-assigned this Oct 17, 2017
@kexinzhao (Contributor, Author) commented:

A brief survey of how float16 arithmetic works in the ARM ComputeLibrary:

https://github.com/ARM-software/ComputeLibrary/blob/master/SConstruct#L125

elif env['arch'] == 'arm64-v8.2-a':
    env.Append(CXXFLAGS = ['-march=armv8.2-a+fp16+simd'])
    env.Append(CPPDEFINES = ['ARM_COMPUTE_ENABLE_FP16'])

ARM_COMPUTE_ENABLE_FP16 is defined only when the target ARM processor is a 64-bit Armv8.2-A. All float16 arithmetic is used only when this flag is defined, such as this code.

We can follow a similar procedure to enable float16 arithmetic only on supported ARM processors; a rough sketch of such a guard is below.
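
A sketch of what the analogous guard could look like on the C++ side (the macro name PADDLE_WITH_NATIVE_FP16 is hypothetical; the build system would define it only when compiling for armv8.2-a with +fp16, mirroring ARM_COMPUTE_ENABLE_FP16 above, and the fallback assumes the NEON conversion support described in the survey):

#include <arm_neon.h>

#if defined(PADDLE_WITH_NATIVE_FP16)
// Only compiled when the build targets armv8.2-a+fp16: use the native
// half-precision vector add instruction.
inline float16x8_t AddFp16(float16x8_t a, float16x8_t b) {
  return vaddq_f16(a, b);
}
#else
// On other ARM targets, only storage/conversion support is available:
// widen to float32, add, and narrow back.
inline float16x4_t AddFp16(float16x4_t a, float16x4_t b) {
  return vcvt_f16_f32(vaddq_f32(vcvt_f32_f16(a), vcvt_f32_f16(b)));
}
#endif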

@Xreki Xreki added the mobile label Oct 17, 2017
@kexinzhao kexinzhao added this to Neon Optimize & Low Precision in Embedded and Mobile Deployment Oct 18, 2017
@kexinzhao kexinzhao reopened this Nov 16, 2017
@kexinzhao (Contributor, Author) commented Nov 16, 2017:

  • Add float16 data type (a rough sketch of such a type follows this list)
  • Update pybind/tensor_py.h to bind c++ float16 with numpy float16
  • Modify GetKernelType() method in framework/operator.h to make it compatible with float16
  • Create a type-casting operator that can convert the data type in tensor between float16 and other types
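
As a starting point, here is a self-contained sketch of what such a float16 type could look like (the struct name and layout are illustrative, not Paddle's actual implementation): it stores the raw IEEE 754 binary16 bits and does arithmetic by converting through float32 in software, so it works even where no fp16 hardware support exists.

#include <cstdint>
#include <cstring>

struct float16 {
  uint16_t x;  // raw IEEE 754 binary16 bits

  float16() : x(0) {}
  explicit float16(float f) : x(FloatToBits(f)) {}
  explicit operator float() const { return BitsToFloat(x); }

  // Arithmetic is done in float32 and the result narrowed back to float16.
  friend float16 operator+(float16 a, float16 b) {
    return float16(static_cast<float>(a) + static_cast<float>(b));
  }

  // float32 -> float16 bits, round to nearest even.
  static uint16_t FloatToBits(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    uint16_t sign = static_cast<uint16_t>((bits >> 16) & 0x8000u);
    int32_t exp = static_cast<int32_t>((bits >> 23) & 0xffu) - 127 + 15;
    uint32_t man = bits & 0x7fffffu;
    if (((bits >> 23) & 0xffu) == 0xffu)                 // Inf or NaN
      return static_cast<uint16_t>(sign | 0x7c00u | (man ? 0x200u : 0u));
    if (exp >= 0x1f) return static_cast<uint16_t>(sign | 0x7c00u);  // overflow -> Inf
    if (exp <= 0) {                                      // subnormal or zero
      if (exp < -10) return sign;                        // underflow -> +/-0
      man |= 0x800000u;                                  // implicit leading 1
      int shift = 14 - exp;
      uint32_t hman = man >> shift;
      uint32_t rem = man & ((1u << shift) - 1u);
      uint32_t half = 1u << (shift - 1);
      if (rem > half || (rem == half && (hman & 1u))) ++hman;
      return static_cast<uint16_t>(sign | hman);
    }
    uint32_t hman = man >> 13;                           // normal number
    uint32_t rem = man & 0x1fffu;
    uint16_t h = static_cast<uint16_t>(sign | (exp << 10) | hman);
    // Rounding up may carry into the exponent; the result is still correct.
    if (rem > 0x1000u || (rem == 0x1000u && (hman & 1u))) ++h;
    return h;
  }

  // float16 bits -> float32 (exact).
  static float BitsToFloat(uint16_t h) {
    uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1fu;
    uint32_t man = h & 0x3ffu;
    uint32_t bits;
    if (exp == 0) {
      if (man == 0) {
        bits = sign;                                     // +/- zero
      } else {                                           // subnormal: renormalize
        exp = 113;
        while ((man & 0x400u) == 0) { man <<= 1; --exp; }
        bits = sign | (exp << 23) | ((man & 0x3ffu) << 13);
      }
    } else if (exp == 0x1f) {
      bits = sign | 0x7f800000u | (man << 13);           // Inf or NaN
    } else {
      bits = sign | ((exp + 112u) << 23) | (man << 13);
    }
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
  }
};

With a plain 16-bit struct like this, binding to numpy's float16 in pybind/tensor_py.h should then mostly be a matter of mapping the 16-bit storage, since numpy's float16 uses the same IEEE binary16 layout.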
