General support for Float16 and other DTypes #2302

Closed
11 of 24 tasks
vchuravy opened this issue Jun 1, 2016 · 30 comments

@vchuravy
Contributor

vchuravy commented Jun 1, 2016

From what I can tell, the following operators currently don't support anything other than real_t (e.g. Float32). I am going to fix the ones important for my research, and I would welcome any help. I feel that comprehensive support for other data types is important for MXNet.

Up for grabs

  • crop
  • slice_channel
  • softmax_activation
  • matrix_op
  • l2_normalization
  • make_loss
  • identity_attach_KL_sparse_reg
  • broadcast_reduce
  • embedding
  • smooth_l1_unary

Depending on a resolution to dmlc/mshadow#125

Done

@Godricly
Contributor

Godricly commented Jun 2, 2016

May I ask how half precision is supported on the CPU?
Can I use the pooling PR #2280 as an example of how to do this?

@vchuravy
Contributor Author

vchuravy commented Jun 2, 2016

I am following the way convolution is implemented, and I think on the CPU Float16 is handled by promoting to Float32: https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html
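A tiny illustration of that promote-compute-demote pattern (not MXNet code, just a numpy sketch of the idea):

```python
# Emulate fp16 arithmetic on the CPU by promoting to fp32 for the math
# and casting the result back to fp16 for storage.
import numpy as np

a = np.random.rand(4).astype(np.float16)
b = np.random.rand(4).astype(np.float16)

c = (a.astype(np.float32) * b.astype(np.float32)).astype(np.float16)
print(c.dtype)  # float16
```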

@piiswrong
Contributor

I have a 1080 coming in next week. Let's try to merge in the common ones so we can have benchmark numbers released asap

@vchuravy
Contributor Author

vchuravy commented Jun 3, 2016

If you start working on one, just post it here so that we don't duplicate effort.

@vchuravy
Contributor Author

vchuravy commented Jun 3, 2016

@Godricly #2322 is probably a good template for how to do it.

@vchuravy
Contributor Author

I updated the list a bit.

@tqchen
Member

tqchen commented Jun 21, 2016

I need to mention one thing: simply supporting the half_t type is not enough to make things faster with fp16. Usually explicit vectorization of the code is needed, so unless values are operated on together in a Packet structure with intrinsics, there is unlikely to be a speedup.

@Godricly
Contributor

Godricly commented Jun 22, 2016

It sounds like the underlying mshadow needs optimization for data alignment. Another thing I'm thinking about is whether we should add an option to do the backward computation in higher precision (float) for the half_t type, since half_t cannot represent very small gradients. It will be messy in code to enable this, though; a rough sketch of the idea is below.
Working on embedding now.
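A minimal sketch of that higher-precision update idea (plain numpy with made-up values, not MXNet's optimizer API): keep a float32 master copy of the weights, apply the update there, and refresh the fp16 copy the network actually uses.

```python
import numpy as np

master_w = np.full(10, 0.5, dtype=np.float32)    # fp32 master copy of the weights
w16 = master_w.astype(np.float16)                # fp16 weights the network uses

lr = 0.01
grad16 = np.full(10, 1e-4, dtype=np.float16)     # small gradient from backward

# The resulting 1e-6 update on a weight around 0.5 would be lost entirely
# if accumulated in fp16 (only ~3 decimal digits of precision), but fp32 keeps it.
master_w -= lr * grad16.astype(np.float32)
w16 = master_w.astype(np.float16)                # refresh the network's fp16 copy
```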

@Godricly
Contributor

@vchuravy I have an updated DType pooling branch based on your work. Can you double check it and submit a PR? My fork of MXNet is kind of messy now.

@xlvector
Contributor

@Godricly Hi, do you have some examples of using fp16? Is it used in training or inference?

@Godricly
Contributor

Godricly commented Jul 25, 2016

Not yet. Basically, you can insert cast layers to transform the inputs (data and label) into fp16 so that the network flows in fp16 (see the sketch after this comment). Currently, MXNet has compatibility issues with fp16, so I cancelled my previous PR #2564.

If you are interested in fp16, you can follow my branch to enable fp16 parameter initialization and single-machine training; the multi-machine case depends on ps-lite, which is a little hard to get working.

You also need to make some modifications to the optimizer so that it uses float to update the weights and converts them back to fp16 for the network.

For the LSTM case, the provided data type is needed, which is painful. If you have any better solution, please let me know. 😆

BTW, the DType BN is only functional using cuDNN.
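For reference, a minimal sketch of the cast-layer approach using the MXNet symbol API (the layer names and sizes are made up for illustration; casting back to fp32 before the loss is a common convention, not something mandated above):

```python
import mxnet as mx

data = mx.sym.Variable('data')
label = mx.sym.Variable('softmax_label')

# cast the input so the rest of the network flows in fp16
data16 = mx.sym.Cast(data=data, dtype='float16')

fc1 = mx.sym.FullyConnected(data=data16, num_hidden=128, name='fc1')
act1 = mx.sym.Activation(data=fc1, act_type='relu', name='relu1')
fc2 = mx.sym.FullyConnected(data=act1, num_hidden=10, name='fc2')

# cast back to fp32 before the softmax/loss for numerical stability
fc2_32 = mx.sym.Cast(data=fc2, dtype='float32')
net = mx.sym.SoftmaxOutput(data=fc2_32, label=label, name='softmax')
```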

@xlvector
Contributor

@Godricly Thanks very much.

@Godricly
Contributor

Pooling and Dropout have been merged.

@Godricly
Contributor

@vchuravy Can you update the todo list please? Or create a new issue to track the progress?
DType regression is submitted in #3018.

@vchuravy
Contributor Author

Updated it. What is your current status on BatchNorm? #2562 is my latest state, but you mentioned that you made some updates?

@Godricly
Contributor

There is a branch under my mxnet fork. If you are only using the cuDNN BN, it should be good enough to start with.

  • It is functional with cuDNN, but not with the native mshadow one.
  • The cuDNN BN for FP16 uses float for mean and variance. I haven't figured out how to make infer_type compatible both with and without cuDNN; the macro I used breaks the non-cuDNN version.

Considering these two issues, I didn't submit it.

@lygstate

What's the status of fp16 support in MXNet?

@ysh329
Contributor

ysh329 commented May 21, 2017

@lygstate Inference or training acceleration on mobile or embedded devices, etc.?

@lygstate

I mean the progress of fp16 support. If it's not finished, what can I do to help?

@ysh329
Contributor

ysh329 commented May 22, 2017

@lygstate You can train an fp16 model from scratch by using the cast function in the symbol file.
See: How to set int8 or float16 to predict? · dmlc/mxnet#5822

@lygstate

I want to do training in fp32 but predict in fp16. Is that possible?

@ysh329
Contributor

ysh329 commented May 23, 2017

@piiswrong @Godricly

@Godricly
Contributor

@lygstate
There is an fp16 example for image classification; you can refer to that one.
You can predict with a trained fp32 model using fp16 (with proper clipping), but I think the performance will drop.
However, I don't think you can deploy fp16 on mobile devices with MXNet; the current implementation relies on the cuDNN backend.
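A rough sketch of casting an fp32-trained checkpoint to fp16 for inference ('some_model' and the epoch are placeholders, not from this thread; the input data would also need a Cast to float16):

```python
import mxnet as mx

# load the fp32 checkpoint (symbol plus parameter dictionaries)
sym, arg_params, aux_params = mx.model.load_checkpoint('some_model', 0)

# cast all parameters to fp16 before binding them for inference
arg_fp16 = {k: v.astype('float16') for k, v in arg_params.items()}
aux_fp16 = {k: v.astype('float16') for k, v in aux_params.items()}
```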

@lygstate

lygstate commented May 23, 2017

@Godricly Yes, I want to use fp16 with cuDNN for performance reasons :) Thanks a lot.

@lisa-imagia

Quick fix for monitoring weights on float16: #8506

@haojin2
Contributor

haojin2 commented Mar 15, 2018

It seems like crop, slice_channel, and softmax_activation are all deprecated operators, so maybe we can skip FP16 support for those?

@ChaiBapchya
Contributor

@eric-haibin-lin Since we have quite a few separate (specific) requests for FP16 support, do we merge them into one and close out the redundant ones, or keep the issues the way they are?

@eric-haibin-lin
Member

@ChaiBapchya This list might actually be outdated now.

@ChaiBapchya
Contributor

Do you recommend closing this issue in that case?

@eric-haibin-lin
Member

I do see that many ops are going to be deprecated. Closing this now. Please file a separate GitHub issue when an unsupported fp16 op is encountered.
