
VarInfo refactor #5

Closed
willtebbutt opened this issue Nov 27, 2019 · 5 comments
Comments

@willtebbutt (Contributor)

Directly copied from discussion in TuringLang/Turing.jl#965 to avoid side-tracking the PR:

I'm trying to work through this compiler change and have some thoughts on VarInfo that I would appreciate some feedback on. I've also left some naming-related comments that I believe would be helpful to address for readability's sake.

Regarding the internals of VarInfo / Metadata: many operations pull an element out of a field of Metadata via metadata.idcs[vn], e.g.

```julia
md.ranges[md.idcs[vn]]
md.orders[md.idcs[vn]]
md.gids[md.idcs[vn]]
md.dists[md.idcs[vn]]
```

It's also the case that everything is linearised when running the model. While it's clearly necessary to be able to linearise the state of the model at some point for the current implementation of HMC, it's not clear to me that linearisation needs to be the default mode of operation, and it appears to add considerable complexity to the internals.

As such, I would like to propose the following design change for discussion (perhaps a separate issue is required; again, advice appreciated):

- Introduce a new type Variable with fields val, gid (possibly not this? Presumably it really belongs in the Gibbs sampler object), name, dist, and any other information that needs to be associated with a particular random variable on a particular run. The key point here is that information specific to a particular RV is stored with that RV, so there would be no need to linearise the data into a vector or to construct an index map from location(s) in the vector to random variables.
- Refactor VarInfo to contain a Vector (or another container, obviously one that is amenable to Zygote-based AD and type stability) of Variables, in addition to logp and num_produce.
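A minimal sketch of what this might look like. All names and fields here are hypothetical, chosen for illustration only; they are not the existing DynamicPPL types:

```julia
# Illustrative sketch only: names and fields are hypothetical.
# Each random variable carries its own metadata, in its natural shape,
# so no index map from vector positions to variables is needed.
struct Variable{Tval,Tdist}
    name::Symbol   # the variable's name, e.g. :m
    val::Tval      # the value, stored unlinearised
    dist::Tdist    # the distribution it was drawn from
    gid::Int       # sampler/group id (arguably belongs in the Gibbs sampler)
end

# VarInfo then just collects Variables plus the usual bookkeeping.
mutable struct VarInfo{Tvars}
    vars::Tvars       # e.g. a Vector or NamedTuple of Variables
    logp::Float64     # accumulated log density
    num_produce::Int
end
```

Because each Variable keeps its value in its natural shape and type, a model mixing scalars, vectors, and matrices never needs a single widely typed vals vector.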
This would:

- Enhance readability and reduce mental overhead (imho).
- Enable better mixing of types. Currently, the vals vector is only simple if the values of every variable have the same type (usually Float64). The proposed design avoids storing everything in one monolithic vector, and thus avoids the need for a widely typed Vector of values.
- Separate linearisation of model state from running of the model. We would implement a separate function, call it to_vec, which would efficiently convert a VarInfo instance into a vector of numbers (potentially of several different types, or perhaps of a single mutually acceptable type), along with a function that converts a vector of the same length back into a VarInfo object identical to the current one in every way other than the vals of the Variables. Separating things out like this has the distinct advantage that if linearisation is not required for a particular algorithm (e.g. MH, SMC, Gibbs sampling, particle MCMC, etc.) then it never needs to happen; conversely, if linearisation is required, it can still be achieved efficiently. Moreover, I anticipate linearisation becoming a less important feature over time as the AD systems develop and begin to rely on ChainRules / ChainRulesCore, which play nicely with interestingly-structured data types. It's also worth noting that indexing, especially scalar indexing, is a particularly inefficient operation for most reverse-mode AD systems (including Zygote and Tracker), and this is unlikely to change fundamentally any time soon. As such, avoiding the indexing-heavy conversion between a linear Vector and structured data may well be desirable from a performance perspective.
- Create opportunities for mixing static and dynamic structure. It seems like this ought to make mixing static and dynamic model components more straightforward, but this is more of a hunch than anything else, as I'm not sure what that mixing really looks like at the minute.
- Allow choice over the type of the Variable "vector". It could actually be a Vector, but this doesn't seem likely to lead to type-stable code. Alternatively, the variables could be stored in a NamedTuple, if we're able to enumerate all possible variable names that could crop up in a program from the program code itself (this seems feasible to me, but I might well be missing something). A counterexample where this doesn't hold (i.e. where the number of possible variable names is dynamic; it's fine for some of them not to exist dynamically) would be very much appreciated.
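The proposed separation of linearisation could be sketched as follows. to_vec and from_vec are hypothetical names, and a real implementation would also need to handle constrained-to-unconstrained transforms; this is only the shape-flattening part:

```julia
# Hypothetical linearisation helpers: flattening happens only when an
# algorithm such as HMC actually asks for it, not on every model run.
to_vec(v::Real) = [float(v)]
to_vec(v::AbstractArray{<:Real}) = vec(float.(v))

# Rebuild a value with the same shape as `template` from a flat vector.
from_vec(template::Real, x::AbstractVector) = x[1]
from_vec(template::AbstractArray, x::AbstractVector) =
    reshape(copy(x), size(template))
```

For example, to_vec([1.0 2.0; 3.0 4.0]) flattens column-major to [1.0, 3.0, 2.0, 4.0], and from_vec on the result recovers the original matrix, so a round trip is lossless for the sampler while the model itself never sees the flat form.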
I would appreciate any feedback. If I'm missing something glaringly obvious here, it may be that I lack some general understanding of the internals.

@mohamed82008 (Member)

As you said, a Vector of variables won't be type stable; a NamedTuple of Variables would be. Currently, we use Metadata to refer to what is basically Variable, but it's all linearized. Looking for ways to avoid the linearization is an interesting direction to pursue, but the main problem is that we don't know the shape and number of random variables ahead of time, nor the order in which they pop up. In particle samplers, variables can also be deleted halfway through, and VarInfo was designed to handle such cases.

There is room for alternative implementations of Metadata or Variable which make certain assumptions about the model, though. For example, if we can assume that the size of x is constant between one model run and the next, we can infer the shape and type of x after a single run of the model. This kind of restriction requires explicit "consent" from the user, in the form of another macro like @fixed_size_model or something similar.

Another idea would be to provide a way for the user to specify the type and size of the random variable x used in x[i] ~ .... That is, we would tell the user to use some other line instead of x = Vector{T}(undef, ...), e.g. x = Vector{T}(undef, ...) |> FixedSize. Then in the ~ lines, we can check whether x is FixedSize or not and handle it differently using dispatch. Variables which only appear in the form x ~ dist, and never as x[..] ~ dist, can also be assumed to be fixed-size by default.

This is the kind of thing that DynamicPPL should make easier to experiment with. Ideally, we should have a minimal API between the Core module and the rest of Turing that doesn't depend on the implementation details of Core. That way I am free to swap things around in Core without (significantly) affecting the rest of Turing. I will play with this some more during the Christmas break.
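The FixedSize idea above could be prototyped with a simple wrapper plus dispatch. FixedSize is the name suggested in the comment; assume_path is a hypothetical stand-in for the ~ machinery's internal branch:

```julia
# Hypothetical opt-in wrapper: the user promises that the wrapped array's
# size is constant between model runs, unlocking a type-stable fast path.
struct FixedSize{T<:AbstractArray}
    data::T
end

# The ~ machinery could then pick an implementation by dispatch,
# rather than inspecting the value at runtime:
assume_path(x::FixedSize) = :fixed    # shape known between runs
assume_path(x) = :dynamic             # shape may change between runs
```

Usage would match the suggested style: x = Vector{Float64}(undef, 3) |> FixedSize, after which assume_path(x) selects the fixed-size method, while a bare array falls through to the generic one.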

@mohamed82008 (Member)

This is also in line with TuringLang/Turing.jl#888.

@mohamed82008 mohamed82008 transferred this issue from TuringLang/Turing.jl Dec 25, 2019
@devmotion (Member)

I think we should prioritize this issue, see also my comment in TuringLang/Turing.jl#1199 (comment):

I'm just dreaming, but maybe we could avoid these issues with mutation and type instabilities by replacing VarInfo with a strictly typed structure that's basically an extended NamedTuple, such that the type tells us which VarNames we have sampled from and of which type their samples are. Instead of push!ing, we would just create a new VarInfo with one additional sample (like merging/extending a NamedTuple).

Additionally, the samples would not be vectorized or transformed; we would just store them as they are and get rid of all the vectorization and transformation problems we have currently. Transformation or vectorization would only be done when it is actually needed (e.g., when computing HMC steps), since all information about the samples (such as dimension etc.) is available at all times without any hacks. When executing assume statements with SampleFromPrior or SampleUniform, we would just dispatch to different methods that either return the current sample or add a new one. It seems then that every execution of the model could be type stable, and computing gradients should be fine with Zygote as well.

Regarding @mohamed82008's comment above,

Looking for ways to avoid the linearization is an interesting direction to pursue, but the main problem here is that we don't know the shape and number of random variables ahead of time and the order in which they pop up.

I don't think it's necessary to know anything about the order or shape or number of random variables ahead of time. We just shouldn't try to create a container ahead of time but add samples whenever they appear. We just run the model function, and whenever a new sample is needed, we add it to the "NamedTuple on steroids". At this time point we know the type and the value of the sample.
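The "NamedTuple on steroids" could be sketched like this. TypedVarInfo, setsample, and getsample are hypothetical names for illustration; a real design would need to support structured VarNames (e.g. x[1]), not just Symbols:

```julia
# Hypothetical immutable VarInfo built on a NamedTuple: adding a sample
# returns a *new* object whose type records the extra variable name and
# the sample's type, so no container is allocated ahead of time.
struct TypedVarInfo{T<:NamedTuple}
    samples::T
end

# "push!" becomes a pure merge, analogous to extending a NamedTuple.
setsample(vi::TypedVarInfo, name::Symbol, val) =
    TypedVarInfo(merge(vi.samples, NamedTuple{(name,)}((val,))))

getsample(vi::TypedVarInfo, name::Symbol) = getfield(vi.samples, name)
```

Starting from TypedVarInfo(NamedTuple()), each setsample call yields a fresh, fully typed value while leaving the old one untouched, which is exactly the merge-instead-of-push! behaviour described above.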

@phipsgabler (Member)

"NamedTuple on steroids": so, persistent, strongly-typed extensible records? With lenses?

@phipsgabler (Member)

Closing in favour of AbstractPPL discussion.
