
Float16 status/follow-up #2908

Open
abergeron opened this Issue May 14, 2015 · 37 comments

@abergeron
Member

abergeron commented May 14, 2015

Summary: this works with the new GPU back-end.
To use it:
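
A minimal usage sketch (not from the original issue; the flag values, shapes and names below are assumptions, and the device name depends on your libgpuarray install):

# Hypothetical example: float16 storage on the new gpuarray back-end.
# Run with something like: THEANO_FLAGS=device=cuda0,floatX=float16 python example.py
import numpy as np
import theano
import theano.tensor as T

w = theano.shared(np.zeros((128, 128), dtype='float16'), name='w')  # stored as float16
x = T.matrix('x', dtype='float16')
f = theano.function([x], T.tanh(T.dot(x, w)))  # arithmetic is upcast internally (see the caveats below)
print(f(np.ones((4, 128), dtype='float16')).dtype)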

Other float16 upgrades:

  • Have a cleaner way to do values_eq_approx() (currently it transfers to the CPU).
  • List all ops with _f16_ok=True (see the sketch after this list).
  • Make sure all ops that support float16 are tested in float16.
  • GpuElemwise assumes all scalar ops work with float16. Test that this is true.
  • Double-check MRG with float16: are we still getting random numbers of the same quality? We should check the original code for the float64 configuration; maybe it would help us find the float16 configuration numbers.
  • ScalarSoftplus C code threshold?
  • GpuCarReduceDtype: make pre_scalar_op work with float16.
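
As a hedged illustration of the flag mentioned in the list above (the attribute name _f16_ok comes from this issue; the op itself is hypothetical):

# Hypothetical op opting in to the float16 safeguard via the _f16_ok class attribute.
from theano.gof import Op

class MyFloat16FriendlyOp(Op):
    # Declares that this op's C code can handle float16 inputs/outputs,
    # so Theano does not have to fall back to the Python implementation.
    _f16_ok = True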

Caveats:

  • This is a beta version.
  • It has C code only on the new libgpuarray back-end.
    • It is not supported by the old GPU back-end; the CPU uses Python code.
  • There are some safeguards: if an op isn't marked as supporting float16, we won't use its C code and will fall back to Python code (which is slow).
  • Not all GPU code in the new back-end has been updated to support it.
  • The new back-end does not have all the ops from the old back-end ported to it.
  • User code could need some modification. One example: lisa-lab/DeepLearningTutorials#86
  • This could change the best hyper-parameters and/or require changes to the models so they can cope with the lower precision.
  • This could be slower, the same speed, or faster. You should benchmark it.

If you still want to try it, use a configuration like the sketch above. If you have problems, tell us about them. Benchmark your code (very important): it could run, but much more slowly.

@nouiz


Member

nouiz commented May 14, 2015

I added many things.

@abergeron


Member

abergeron commented May 14, 2015

I fail to see the point of making a list of all ops and then dividing it in two.

GpuElemwise does not assume that scalars work with float16; it assumes that they work with float32. Also, they don't work with float16 (I tried).

With the fix I've done, MRG should no longer have a bias towards 0 in float16.

@nouiz


Member

nouiz commented May 14, 2015

We can make a list of all CPU/GPU ops and record two pieces of information for each: do they support float16, and does that case have tests?

@nouiz


Member

nouiz commented May 22, 2015

@abergeron I updated this ticket with instructions for using it and with warnings. Can you check it?

@nouiz nouiz changed the title from Float16 follow-up to Float16 status/follow-up May 25, 2015

@Thrandis


Contributor

Thrandis commented Sep 11, 2015

@abergeron, @nouiz Maybe it would be worth adding that Nervana only supports Maxwell and future GPU architectures. This means that it can only be used on GTX750, GTX750Ti, GTX960, GTX970, GTX980 and Titan X!

@abergeron


Member

abergeron commented Sep 12, 2015

There is a new PR which makes use of cublasSgemmEx (CUDA 7.5 only) to perform the dot product; it should work on all supported cards. It's not merged yet.

@nouiz


Member

nouiz commented Sep 12, 2015

#3355 is the PR @abergeron is talking about. I added a note about Nervana.

@Thrandis


Contributor

Thrandis commented Sep 12, 2015

That's great! Thanks for the info!

abergeron referenced this issue in abergeron/Theano Sep 30, 2015

@Darwin2011


Darwin2011 commented Jan 18, 2016

Hi @abergeron, @Thrandis.

May I ask whether Theano's float16 feature only supports float16 storage, or also native float16 arithmetic?

From NVIDIA's CUDA feature list, http://devblogs.nvidia.com/parallelforall/new-features-cuda-7-5/, currently all GPUs except the Tegra X1 support only FP16 storage and FP16/FP32/FP64 mixed-precision computation. So my understanding is that CUDA only supports FP16 storage rather than FP16 computation.

Thanks in advance.
Yan

@nouiz


Member

nouiz commented Jan 18, 2016

For all GPUs except the TX1, yes, CUDA only supports float16 storage. Theano also only supports float16 storage.


@abergeron


Member

abergeron commented Jan 18, 2016

While we only support storage, we fake arithmetic using float32 computations. So you can still do stuff with your float16 data.

@Darwin2011


Darwin2011 commented Jan 19, 2016

Thanks! @abergeron @nouiz

@sunshineatnoon


sunshineatnoon commented Apr 21, 2016

I don't quite understand this: "It is not supported by the old GPU back-end; the CPU uses Python code". Does this mean that if I use Theano on the CPU, I can't use FP16?

@nouiz


Member

nouiz commented Apr 21, 2016

If you use the CPU, float16 will be slow, as we will use our Python back-end.


@sunshineatnoon


sunshineatnoon commented Apr 21, 2016

@nouiz Thanks for your reply. So this means I can use float16; I don't care about the speed right now, just the precision. Another question: as long as I have CUDA 7.5, I can use float16 on any NVIDIA card, right?

@nouiz


Member

nouiz commented Apr 21, 2016

Yes. With CUDA 7.5, you can use float16 on NVIDIA GPUs.

For precision: on the GPU, we store the data in float16, but the computation is done in float32, as current GPUs don't support float16 computation (Pascal GPUs will support it, but that will require some changes in Theano).

On the CPU, we let NumPy handle the float16. I don't know if the computation is in float16 or float32.
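
As a small hedged check one can run (this only shows the result dtype NumPy reports; it does not reveal the precision used internally):

# NumPy reports float16 result dtypes, but that alone doesn't tell us whether
# the arithmetic is carried out internally in float16 or float32.
import numpy as np

a = np.ones(3, dtype=np.float16)
print((a + a).dtype)       # float16
print(np.dot(a, a).dtype)  # float16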


@sunshineatnoon


sunshineatnoon commented Apr 25, 2016

@nouiz What about precision on the CPU? Do you use float16 for both computation and storage? I use device=cpu and floatX=float16, and then I get the following error. Thanks!

  File "/usr/lib/python2.7/site-packages/Theano-0.9.0dev0-py2.7.egg/theano/compile/function.py", line 322, in function
    output_keys=output_keys)
  File "/usr/lib/python2.7/site-packages/Theano-0.9.0dev0-py2.7.egg/theano/compile/pfunc.py", line 443, in pfunc
    no_default_updates=no_default_updates)
  File "/usr/lib/python2.7/site-packages/Theano-0.9.0dev0-py2.7.egg/theano/compile/pfunc.py", line 208, in rebuild_collect_shared
    raise TypeError(err_msg, err_sug)
TypeError: ('An update must have the same type as the original shared variable (shared_var=<TensorType(float16, matrix)>, shared_var.type=TensorType(float16, matrix), update_val=Elemwise{sub,no_inplace}.0, update_val.type=TensorType(float32, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')
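
A hedged sketch of a workaround for the error above: cast the update expression back to the shared variable's dtype so the two types match (the variable names and the update rule here are illustrative, not taken from the original script):

import numpy as np
import theano
import theano.tensor as T

w = theano.shared(np.zeros((3, 3), dtype='float16'), name='w')
g = T.matrix('g', dtype='float16')   # stand-in for a gradient
lr = np.float32(0.1)                 # a float32 scalar silently upcasts the update

new_w = w - lr * g                          # dtype becomes float32 -> the TypeError above
updates = [(w, T.cast(new_w, w.dtype))]     # cast back to float16 so the types match
f = theano.function([g], [], updates=updates)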

@nouiz


Member

nouiz commented Apr 25, 2016

We let NumPy do the computation for us in that case. I don't remember what it does and don't have time to look. I think it does the computation in float16, but I'm not sure. If you find the information, can you post it here? Other people will probably be interested to know.


@sunshineatnoon


sunshineatnoon commented Apr 26, 2016

@nouiz Thanks for your reply. NumPy does the computation in float32 too; see numpy/numpy#7571 (comment).

@nouiz


Member

nouiz commented Apr 29, 2016

Thanks for the update.


@brendanruff


brendanruff commented Jun 10, 2016

Hi to anyone who is watching. It is now June 2016, a year after this post. Is there a resolution for float16 in Theano, assuming at least CUDA 7.5 and the latest-and-greatest top-end NVIDIA GPU (and any other upgrade you might consider vital for an ideal system; money is no object)? Halving the memory footprint is very important for deep learning; a speed increase does not matter, but it would clearly be nice to have.

@nouiz


Member

nouiz commented Jun 10, 2016

It works in the new GPU back-end (gpuarray):

http://deeplearning.net/software/theano/tutorial/using_gpu.html#gpuarray

The current implementation supports storage in float16, but computation in float32. When we have a GPU that does computation in float16 (we ordered a few 1080s for this), we can implement the computation in float16.

There are one or two PRs that would bring float16 to a few more operations in the new back-end.


@rshpeley


rshpeley commented Jul 15, 2016

@nouiz

When we have a GPU that does computation in float16 (we ordered a few 1080s for this), we can implement the computation in float16.

Do you mean NVIDIA did not send you early production of the 1080?

@brendanruff


brendanruff commented Jul 16, 2016

Hi Frederic

Thanks for the email. I may try to get 4x 1080s to try out the float16. For now I am just saving intermediate results in lower precision to save memory. Hopefully 1080s will be plentiful soon :)!

Thanks again for your replies.

Brendan


@nouiz


Member

nouiz commented Jul 19, 2016

We don't get gamer GPUs from NVIDIA, and we rarely buy them anymore; mostly Titan GPUs.

We have two 1080s now. I'm not sure when we can work on computation in float16. Just don't forget that computation in float16 will be slow on the 1080: they kept the minimal number of float16 units to make a smaller chip (so it costs less). I expect a slowdown using float16 on the 1080 vs float32.


@brendanruff


brendanruff commented Jul 19, 2016

Hi Frederic

I think I next need to follow up with NVIDIA about the status of float16. The working assumption was that the float32 units would reconfigure into 2x float16 units. Since this is not the case, float32 is the way forward. I am also interested in lower precision and even integer types (e.g. int16 or int8), but I think I will need to follow that up through direct use of the CUDA libraries.

Thanks for the help!

Brendan Ruff


@guoxuesong


guoxuesong commented Oct 6, 2016

I have a TX1. Theano requires CUDA 7.5 for the float16 dot, but the TX1 only has CUDA 7.0. As mentioned in https://devtalk.nvidia.com/default/topic/931678/jetson-tx1/64-bit-cuda-7-5-on-the-tx1/, instead of releasing a separate CUDA toolkit 7.5 for the Jetson TX1, features and maintenance fixes relevant to the TX1 were backported and released as v7.0.

The following code in theano/gpuarray/opt.py disables float16 dot support for the TX1:

if nvcc_compiler.nvcc_version < '7.5':
    _logger.warning("Not performing dot of float16 on the GPU since "
                    "cuda 7.5 is not available. Updating could speed up "
                    "your code.")
    return

I tried commenting this out and got Theano working on my TX1. Using the Lasagne mnist.py example, the best result is: the MLP works and is faster than float32; the CNN works but is a bit slower than float32. When I changed dnn.conv.precision from as_input_f32 to float16, it was much slower. But memory was saved, which is good.

But now it's broken: the GPU is idle and only the CPU is used by Theano. (2016-10-06)

@astooke


astooke commented Apr 19, 2017

Wondering what the latest is on float16 computation on GPUs that support it? cuDNN v6 seems to have pretty good float16 support by now, with at least one ConvolutionForward operation and one ConvolutionBackward operation truly computing in float16 (the cuDNN User Guide has the details). For my purposes I'm interested in the computation speedup, not the memory savings.

I received a note at Theano/libgpuarray#411 that no float16 computations are supported yet?

@astooke


astooke commented Apr 19, 2017

Found a good summary from earlier today by @nouiz under #5868 ...

The way Theano added float16 support to the GPU ops is that when we read/write GPU memory, we convert to float32 and do the computation in float32. This way, all GPUs get the storage saving.

Sadly, it is harder to implement float16 computation, as the C language doesn't define a float16 type. So instead of doing x+y, we need to call functions like cuda_add_float16(x, y). A workaround is to make a class that implements + and other such operators. This isn't done.

Getting the speedup from cuBLAS and cuDNN is probably trivial: just pass the right parameters to those functions or call the right function. But we didn't do it yet. We aren't sure of the user interface for now.

But anyway, the first step is to make it work for float16 storage! Only when that is done does moving to float16 computation make sense.

Warning: I have heard it isn't trivial to use float16 for computation. It doesn't have enough precision in many cases and needs adjustments to the model. At GTC, NVIDIA had a talk about that.

@nouiz


nouiz commented Apr 19, 2017

@astooke


astooke commented Apr 20, 2017

Sure! I will try that soon.

@nouiz


nouiz commented Apr 20, 2017

@aditbhrgv


aditbhrgv commented Jul 25, 2017

Hello,

I am looking for an FP16/INT8 implementation in Theano for quantized (FP32 to INT8) matrix-matrix multiplications, like TensorRT, which supports both alongside cuDNN v6. I intend to use a GTX 1080 Ti NVIDIA GPU with DP4A and DP2A support.
Can anyone please let me know what the latest is with Theano?

Thanks
Adit

@nouiz


nouiz commented Aug 8, 2017

@aditbhrgv


aditbhrgv commented Sep 9, 2017

Hello,
I want to implement a Theano op like T.dot that multiplies using 8-bit quantized calculations (weights, biases and activations) to make use of Pascal-architecture GPUs like the 1080 Ti, which have the 4x 8-bit multiplication feature (the dp4a instruction).

Currently, we typecast the inputs and weights to fp32.

Can you please let me know how to implement the line below?

# TODO: write a custom 8-bit dot product
zq = T.dot(T.cast(xq, 'float32'), T.cast(wq, 'float32'))

@nouiz


nouiz commented Oct 2, 2017

Using an unrelated issue doesn't help to bring attention to your need.

Normally, I would suggest opening a new issue about that.

But due to https://groups.google.com/d/topic/theano-users/7Poq8BZutbY/discussion, I don't think we will do it.

But if someone makes a PR, we will review it.

What needs to be done (see the sketch after this list):

  • Modify the ops GpuDot22 and/or GpuGemm in gpuarray/blas.py to accept int8 as input (doing GpuDot22 first would be easier and probably meet your need; GpuGemm is mostly an optimization used during training).
  • Add an optimization in gpuarray/opt.py from the op tensor.basic.Dot() to GpuDot22 (int8 won't have all the CPU optimizations that convert it to dot22, as CPU BLAS doesn't support that).
  • Modify libgpuarray to support it.
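
A very rough sketch of the second bullet, under heavy assumptions: it presumes GpuDot22 has already been extended to accept int8 (first bullet), uses the default GPU context, and leaves out the registration and transfer-lifting machinery that the real optimizations in gpuarray/opt.py rely on:

from theano.gof import local_optimizer
from theano.tensor.basic import Dot
from theano.gpuarray.basic_ops import as_gpuarray_variable, host_from_gpu
from theano.gpuarray.blas import GpuDot22

@local_optimizer([Dot])
def local_int8_dot_to_gpudot22(node):
    # Replace an int8 matrix-matrix Dot with GpuDot22 on the default context.
    x, y = node.inputs
    if x.dtype == y.dtype == 'int8' and x.ndim == y.ndim == 2:
        gx = as_gpuarray_variable(x, None)  # None = default context (sketch only)
        gy = as_gpuarray_variable(y, None)
        return [host_from_gpu(GpuDot22()(gx, gy))]
    return False
# This would still need to be registered (e.g. via the gpuarray register_opt
# mechanism), and GpuDot22/libgpuarray would need the actual int8 kernels.
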
@nouiz


Member

nouiz commented Oct 2, 2017

Note: I consider int8 dot support a different feature than float16 support.
