This repository has been archived by the owner on May 21, 2022. It is now read-only.

Save/load model/state and Specs #9

Open
tbreloff opened this issue Oct 11, 2016 · 0 comments

Comments

@tbreloff (Member)

Frequently it's not enough to have an in-memory representation of a model or the optimization algorithm state. We want to serialize the structure, parameters, and state of the models and sub-learners that we're working with.

I'd like to take this one step further and define a generic "spec" for these stateful items: something we can serialize/deserialize, but which could also be loaded easily into another "backend". I'm using the same terminology as Plots on purpose... I think there are a lot of similarities in the problems we're looking to solve.
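
To make this concrete, here's a minimal sketch of what such a spec might look like as plain Julia data. All of the names (AbstractSpec, LayerSpec, ModelSpec) are hypothetical and not part of any existing package; the point is just that a spec holds structure and parameters, with no backend objects in it:

```julia
# Hypothetical spec types -- plain data describing structure/parameters,
# with no reference to any particular backend.
abstract type AbstractSpec end

# One dense layer: number of nodes, activation name, and optional initial weights.
struct LayerSpec <: AbstractSpec
    nodes::Int
    activation::Symbol                              # e.g. :relu, :tanh, :identity
    init_weights::Union{Nothing,Matrix{Float64}}    # nothing => let the backend initialize
end
LayerSpec(nodes::Integer, activation::Symbol) = LayerSpec(nodes, activation, nothing)

# A model spec is just an input dimension plus an ordered list of layer specs.
struct ModelSpec <: AbstractSpec
    input_dim::Int
    layers::Vector{LayerSpec}
end
```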

Some examples of backends:

  • Transformations, etc
  • Optim
  • TensorFlow
  • Theano
  • BayesNet

The idea is that, wherever there is overlapping functionality, there is an opportunity to generalize. Suppose we have the general concept "I want a 3-layer neural net with these numbers of nodes and relu activations, and these initial weight values". Lots of software implements this. If we capture that information in a generic spec (similar to the Plot object in Plots), then we only need to connect the spec to a backend's constructor, and we gain the ability to convert and transfer models between backends.
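
As a rough sketch of what "connect the spec to a backend's constructor" could look like, reusing the hypothetical LayerSpec/ModelSpec types above (the backend type and the build function are also assumptions, not existing APIs):

```julia
# Hypothetical backend singletons; each backend package would add a `build` method
# for its own type, so the same ModelSpec can target any of them.
abstract type AbstractBackend end
struct PlainArrayBackend <: AbstractBackend end    # toy "backend": just weight matrices

function build(::PlainArrayBackend, spec::ModelSpec)
    layers = Tuple{Matrix{Float64},Symbol}[]
    indim = spec.input_dim
    for l in spec.layers
        # use the spec's initial weights when given, otherwise initialize randomly
        W = l.init_weights === nothing ? randn(l.nodes, indim) : l.init_weights
        push!(layers, (W, l.activation))
        indim = l.nodes
    end
    return layers
end

# "a 3-layer neural net with these numbers of nodes and relu activations"
spec  = ModelSpec(784, [LayerSpec(128, :relu), LayerSpec(64, :relu), LayerSpec(10, :identity)])
model = build(PlainArrayBackend(), spec)
```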

The same goes for optimization routines... often there is a 1-1 mapping between, for example, an Adam optimizer in TensorFlow and one in Theano. But each backend reimplements the concept in its own way and reinvents the wheel completely. The Plots model applies here as well: define a generic "Adam updater" concept, then map from that to each backend's implementation.
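
A sketch of the updater side, under the same caveat that AdamSpec and the per-backend mapping names are hypothetical, not existing APIs:

```julia
# Hypothetical generic "Adam updater" spec: just the hyperparameters, no backend state.
struct AdamSpec
    lr::Float64
    beta1::Float64
    beta2::Float64
    eps::Float64
end
AdamSpec(; lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8) = AdamSpec(lr, beta1, beta2, eps)

# Each backend would then provide a one-line mapping, e.g. (names illustrative only):
#   build_updater(::TensorFlowBackend, s::AdamSpec)      -> that backend's Adam optimizer
#   build_updater(::TransformationsBackend, s::AdamSpec) -> the pure-Julia Adam updater
```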

The end result is that I'd like models and learning algorithms to be built from specs, which then get mapped to backend-specific sub-learners. This would allow us to serialize/deserialize a backend-agnostic spec, not actual Julia objects.
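
For instance, a spec like the AdamSpec sketched above could round-trip through plain data (a Dict, which in turn could be written as JSON/BSON/whatever) without ever touching a backend object; again, these helper names are hypothetical:

```julia
# Hypothetical (de)serialization helpers for the AdamSpec sketch above.
to_dict(s::AdamSpec) = Dict("type" => "Adam", "lr" => s.lr,
                            "beta1" => s.beta1, "beta2" => s.beta2, "eps" => s.eps)

function from_dict(d::Dict)
    d["type"] == "Adam" || error("unknown updater spec: $(d["type"])")
    AdamSpec(lr = d["lr"], beta1 = d["beta1"], beta2 = d["beta2"], eps = d["eps"])
end

d     = to_dict(AdamSpec(lr = 0.001))
spec2 = from_dict(d)    # reconstructs the backend-agnostic spec, not a backend object
```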

For example, it would allow us to experiment with structures/models/algos in something more flexible (JuliaML, I hope), and then convert to a TensorFlow graph for pounding the GPU or for sharing with other stubborn researchers who aren't using Julia.

This is closely related to JuliaPlots/Plots.jl#390, and I think the design decisions can be shared between the two.
