
Configurable dtypes for all layers and initializers #513

Closed
wants to merge 2 commits

Conversation

@eamartin commented Dec 9, 2018

Hi,
I thought Flux could benefit from easier configuration of dtypes, so I tried implementing something myself.
I based the FloatX default dtype variable on Theano's floatX config: http://deeplearning.net/software/theano/library/config.html

Things I'm looking for feedback on:

  • Currently there's no good way to set FloatX. Should we make FloatX configurable via an environment variable or a config variable (a sketch follows this list)? Is there a good way for the user to set FloatX before all of the functions with the optional argument dtype=FloatX are evaluated?
  • This will break any user's initializers that do not have a (::Type, size...) method. Is that OK at this point?
  • This is some of the first Julia code I've written, so interested in anywhere I violated best practice / code hygiene / etc.
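
On the first question, one hedged possibility (the environment-variable name below is illustrative, not something this PR or Flux defines):

```julia
# Hypothetical: choose the default float type from an environment variable at
# package load time; the variable name "FLUX_FLOATX" is illustrative only.
const FloatX = get(ENV, "FLUX_FLOATX", "Float32") == "Float64" ? Float64 : Float32
```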

Changes:

  • All layers and initializers now optionally take a dtype argument (a self-contained sketch of the pattern follows this list).
  • There is a new variable Flux.FloatX that defines the default
    floating point type (currently Float32).
  • defined eltype(TrackedReal{T}) = T
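
A self-contained sketch of the pattern described above, using a toy layer rather than Flux's actual Dense (all names and signatures here are illustrative):

```julia
# Toy illustration: a global default FloatX, initializers with a
# (::Type, size...) method, and layers taking an optional dtype argument.
const FloatX = Float32

my_init(::Type{T}, dims...) where {T} = randn(T, dims...) .* T(0.01)

struct ToyDense{T}
    W::Matrix{T}
    b::Vector{T}
end

ToyDense(in::Integer, out::Integer; dtype::Type = FloatX, init = my_init) =
    ToyDense(init(dtype, out, in), zeros(dtype, out))

ToyDense(784, 128)                   # defaults to FloatX, i.e. Float32
ToyDense(784, 128, dtype = Float64)  # per-layer override
```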

@eamartin (Author) commented Dec 9, 2018

I realized that I forgot to modify the optimizers to take an optional dtype argument defaulting to FloatX.
After a brief look, I don't fully understand what's going on there (particularly with the deprecations). If we go with this system of configuring floating-point precision, I think it should also be extended to the optimizers at some point.

@Roger-luo (Contributor)

I personally prefer just using a type parameter. It might not be type stable with dtype, and a type parameter is consistent with most other Julia objects, e.g. Array.
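
For comparison, the Base pattern being referred to, where the element type is part of the type rather than a keyword:

```julia
# Element type as a type parameter (or leading type argument), as in Base:
Array{Float32}(undef, 3, 2)   # 3×2 Matrix{Float32}
zeros(Float16, 3, 2)          # same idea via a leading type argument
```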

@mcabbott (Member)

The function gpu converts to Float32, and acts on arrays, layers, models. Perhaps it and cpu should take a type argument, like Dense(784,128) |> cpu(Float16)?
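
A minimal sketch of what this could look like, assuming the fmap model traversal that Flux uses internally; toprecision is a hypothetical stand-in for the proposed cpu(Float16) form, not actual Flux API:

```julia
using Flux

# Hypothetical helper: returns a function that converts every floating-point
# array in a model to element type T, so it composes with |> as proposed.
toprecision(::Type{T}) where {T<:AbstractFloat} =
    m -> Flux.fmap(x -> x isa AbstractArray{<:AbstractFloat} ? T.(x) : x, m)

m16 = Dense(784 => 128) |> toprecision(Float16)   # weights and bias in Float16
```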

@eamartin (Author)

@Roger-luo Would adding a type parameter still support a user directly passing in their own weight matrices? For instance, if a user tries to create a Dense layer with Float32 weights and a Float64 bias, what should happen? (Pardon my lack of Julia knowledge here.)

@mcabbott I like this syntax. It would be nice to avoid the initial allocation in Float64 (or any undesired precision). This could be particularly painful with large embedding layers and/or when running in a memory-limited environment.

@Roger-luo (Contributor) commented Dec 27, 2018

Yes, it will prevent users from using different precisions within a single layer, e.g. Dense can only have one element type, either Float64 or Float32. I don't think mixing Float32 and Float64 will be a common case; in most cases the element type within a single layer should be the same. And this should always error at construction time (this is sometimes painful in PyTorch, since it only errors at runtime).
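
For illustration, a minimal sketch of this point with a toy layer (not Flux's actual Dense): a shared element-type parameter rejects mixed precision when the layer is built.

```julia
# Both fields share the element type T, so a Float32/Float64 mix fails at
# construction rather than surfacing later during a forward pass.
struct MyDense{T<:AbstractFloat}
    W::Matrix{T}
    b::Vector{T}
end

MyDense(rand(Float32, 3, 2), rand(Float32, 3))   # ok: MyDense{Float32}
MyDense(rand(Float32, 3, 2), rand(Float64, 3))   # MethodError at construction time
```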

Apparently, when you pass an instance of Dense to cpu(Float32), you mean you want to convert all of its elements to Float32 on the CPU, and it is rare to convert a Dense{Float64, Float32} to something else like Dense{Float32, Float64}. Thus I believe it is not necessary to allow this flexibility as a built-in feature.

If someone wants to use Float64 and Float32 within a single layer, it is always easy to create a custom wrapper for that.

And I'll vote for @mcabbott's syntax as well.

@mcabbott (Member)

One more thought: if I understand right, Dense() etc. always creates the object on the CPU. If things are so large that creating and then converting / moving is an issue, then perhaps the first problem to solve is not creating it with Float16, but with CuArray{Float16}. Could some macro @gpu Float16 Chain( Dense( ... do this?
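
A hedged sketch of one way to get part of that effect without a macro (keyword names follow current Flux rather than the 2018 API): pass an init that already produces the target element type, so nothing wider is allocated first. With CUDA.jl loaded, such an init could likewise return a CuArray{Float16} directly.

```julia
using Flux

# Initializer that yields Float16 directly; Flux.glorot_uniform returns Float32.
init16(dims...) = Float16.(Flux.glorot_uniform(dims...))

layer = Dense(784 => 128; init = init16)   # weight (and, in recent Flux, the
                                           # default bias) ends up Float16
```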

@MikeInnes (Member)

Thanks for this patch, and apologies for not getting back on it sooner. I think this is generally the right idea and very useful functionality, but I decided to do it a little differently; in particular, rather than adding options to layers, we can make this more similar to gpu and cpu, i.e. functions of any model that will work even with user-defined layers.
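
For reference, Flux's f32/f64 model-conversion functions follow this design (f16 also exists in newer releases); like gpu and cpu, they walk an arbitrary model, including user-defined layers:

```julia
using Flux

m   = Chain(Dense(784 => 128, relu), Dense(128 => 10))
m64 = f64(m)   # every floating-point parameter converted to Float64
```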

Thanks for pushing this along anyhow! Let me know if I can help out with any other contributions.

@MikeInnes closed this Mar 26, 2019