
Configurable dtypes for all layers and initializers #513

Closed
wants to merge 2 commits

Conversation

@eamartin commented Dec 9, 2018

Hi,
I thought Flux could benefit from easier configuration of dtypes, so I tried implementing something myself.
I based the FloatX default dtype variable on Theano's floatX config: http://deeplearning.net/software/theano/library/config.html

Things I'm looking for feedback on:

  • Currently there's no good way to set FloatX. Should we make FloatX configurable via an environment variable or a config variable (a sketch follows this list)? Is there a good way for the user to set FloatX before all of the functions with the optional argument dtype=FloatX are evaluated?
  • This will break any user's initializers that do not have a (::Type, size...) method. Is that OK at this point?
  • This is some of the first Julia code I've written, so interested in anywhere I violated best practice / code hygiene / etc.
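
On the first question, one hedged possibility (the environment-variable name below is illustrative, not something this PR or Flux defines):

```julia
# Hypothetical: choose the default float type from an environment variable at
# package load time; the variable name "FLUX_FLOATX" is illustrative only.
const FloatX = get(ENV, "FLUX_FLOATX", "Float32") == "Float64" ? Float64 : Float32
```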

Changes:

  • All layers and initializers now optionally take a dtype argument (a self-contained sketch of the pattern follows this list).
  • There is a new variable Flux.FloatX that defines the default
    floating point type (currently Float32).
  • defined eltype(TrackedReal{T}) = T
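
A self-contained sketch of the pattern described above, using a toy layer rather than Flux's actual Dense (all names and signatures here are illustrative):

```julia
# Toy illustration: a global default FloatX, initializers with a
# (::Type, size...) method, and layers taking an optional dtype argument.
const FloatX = Float32

my_init(::Type{T}, dims...) where {T} = randn(T, dims...) .* T(0.01)

struct ToyDense{T}
    W::Matrix{T}
    b::Vector{T}
end

ToyDense(in::Integer, out::Integer; dtype::Type = FloatX, init = my_init) =
    ToyDense(init(dtype, out, in), zeros(dtype, out))

ToyDense(784, 128)                   # defaults to FloatX, i.e. Float32
ToyDense(784, 128, dtype = Float64)  # per-layer override
```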

@eamartin (Author) commented Dec 9, 2018

I realized that I forgot to modify the optimizers to take an optional dtype argument defaulting to FloatX.
After a brief look, I don't fully understand what's going on there (particularly with the deprecations). If we go with this system of configuring floating-point precision, I think it should also be extended to the optimizers at some point.

@Roger-luo (Contributor)

I personally prefer just using a type parameter. It might not be type stable with dtype, and a type parameter is consistent with most other Julia objects, e.g. Array.
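
For comparison, the Base pattern being referred to, where the element type is part of the type rather than a keyword:

```julia
# Element type as a type parameter (or leading type argument), as in Base:
Array{Float32}(undef, 3, 2)   # 3×2 Matrix{Float32}
zeros(Float16, 3, 2)          # same idea via a leading type argument
```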

@mcabbott (Member)

The function gpu converts to Float32, and acts on arrays, layers, models. Perhaps it and cpu should take a type argument, like Dense(784,128) |> cpu(Float16)?
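
A minimal sketch of what this could look like, assuming the fmap model traversal that Flux uses internally; toprecision is a hypothetical stand-in for the proposed cpu(Float16) form, not actual Flux API:

```julia
using Flux

# Hypothetical helper: returns a function that converts every floating-point
# array in a model to element type T, so it composes with |> as proposed.
toprecision(::Type{T}) where {T<:AbstractFloat} =
    m -> Flux.fmap(x -> x isa AbstractArray{<:AbstractFloat} ? T.(x) : x, m)

m16 = Dense(784 => 128) |> toprecision(Float16)   # weights and bias in Float16
```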

@eamartin (Author)

@Roger-luo Would adding a type parameter still support a user directly passing in their own weight matrices? For instance, if a user tries to create a Dense layer with Float32 weights and a Float64 bias, what should happen? (Pardon my lack of Julia knowledge here.)

@mcabbott I like this syntax. It would be nice to avoid the initial allocation in Float64 (or any undesired precision). This could be particularly painful with large embedding layers and/or when running in a memory-limited environment.

@Roger-luo (Contributor) commented Dec 27, 2018

Yes, it will prevent users from using different precisions within a single layer, e.g. Dense can only have one element type, either Float64 or Float32. I don't think mixing Float32 and Float64 will be a common case; in most cases the element type within a single layer should be the same. And this should always error at construction time (this is sometimes painful in PyTorch, since it only errors at runtime).
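
For illustration, a minimal sketch of this point with a toy layer (not Flux's actual Dense): a shared element-type parameter rejects mixed precision when the layer is built.

```julia
# Both fields share the element type T, so a Float32/Float64 mix fails at
# construction rather than surfacing later during a forward pass.
struct MyDense{T<:AbstractFloat}
    W::Matrix{T}
    b::Vector{T}
end

MyDense(rand(Float32, 3, 2), rand(Float32, 3))   # ok: MyDense{Float32}
MyDense(rand(Float32, 3, 2), rand(Float64, 3))   # MethodError at construction time
```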

Apparently, when you pass an instance of Dense to cpu(Float32), you mean you want to convert all of its elements to Float32 on the CPU, and it is rare to convert a Dense{Float64, Float32} to something else like Dense{Float32, Float64}. Thus I believe it is not necessary to allow this flexibility as a built-in feature.

If someone wants to use Float64 and Float32 within a single layer, it is always easy to create a custom wrapper for that.

And I'll vote for @mcabbott's syntax as well.

@mcabbott (Member)

One more thought: if I understand right, Dense() etc. always creates the object on the CPU. If things are so large that creating and then converting / moving is an issue, then perhaps the first problem to solve is not creating it with Float16, but with CuArray{Float16}. Could some macro @gpu Float16 Chain( Dense( ... do this?
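
A hedged sketch of one way to get part of that effect without a macro (keyword names follow current Flux rather than the 2018 API): pass an init that already produces the target element type, so nothing wider is allocated first. With CUDA.jl loaded, such an init could likewise return a CuArray{Float16} directly.

```julia
using Flux

# Initializer that yields Float16 directly; Flux.glorot_uniform returns Float32.
init16(dims...) = Float16.(Flux.glorot_uniform(dims...))

layer = Dense(784 => 128; init = init16)   # weight (and, in recent Flux, the
                                           # default bias) ends up Float16
```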

@MikeInnes (Member)

Thanks for this patch, and apologies for not getting back on it sooner. I think this is generally the right idea and very useful functionality, but I decided to do it a little differently; in particular, rather than adding options to layers, we can make this more similar to gpu and cpu, i.e. functions of any model that will work even with user-defined layers.
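
For reference, Flux's f32/f64 model-conversion functions follow this design (f16 also exists in newer releases); like gpu and cpu, they walk an arbitrary model, including user-defined layers:

```julia
using Flux

m   = Chain(Dense(784 => 128, relu), Dense(128 => 10))
m64 = f64(m)   # every floating-point parameter converted to Float64
```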

Thanks for pushing this along anyhow! Let me know if I can help out with any other contributions.

@MikeInnes closed this Mar 26, 2019