
Variable ordering in VarInfo #833

@mhauru


The order of the variables in a VarInfo is important, because in sampling we frequently vectorise and unflatten: vi_new = unflatten(vi, do_a_thing_with_vec(vi[:])). There's an implicit assumption in sample that the VarInfo has the same ordering from one iteration to the next. However, we don't have a very robust way of specifying the order in a VarInfo: it's currently determined by the order in which values are inserted. subset used to mess with the order before #832, which caused a bug in Gibbs. subset isn't the only problem, though. For instance, the following model could cause all sorts of trouble, because x and y are inserted in a different order depending on the sign of z:

@model function f()
    z ~ Normal()
    if z > 0
        x ~ Normal(1.0)
        y ~ Normal(2.0)
    else
        y ~ Normal(2.0)
        x ~ Normal(1.0)
    end
end

In general the current reliance on insertion order feels fragile.
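
To make the failure mode concrete, here is a minimal sketch in plain Julia. It does not use the DynamicPPL API: run_model, flatten and names_of are hypothetical stand-ins for a container that, like VarInfo today, stores values in insertion order.

```julia
# Toy stand-in for an insertion-ordered VarInfo (hypothetical, not DynamicPPL).
# Mirrors the branching model above, with fixed values instead of random draws.
function run_model(z)
    store = Pair{Symbol,Float64}[]
    push!(store, :z => z)
    if z > 0
        push!(store, :x => 1.0)
        push!(store, :y => 2.0)
    else
        push!(store, :y => 2.0)
        push!(store, :x => 1.0)
    end
    return store
end

# "Vectorise": drop the names and keep only the values, in storage order.
flatten(store) = [last(p) for p in store]
names_of(store) = [first(p) for p in store]

pos = run_model(0.5)    # insertion order: z, x, y
neg = run_model(-0.5)   # insertion order: z, y, x

# Reading a vector produced under one ordering with the other ordering's
# layout silently swaps x and y.
misread = Dict(zip(names_of(pos), flatten(neg)))
```

Here flatten(pos) is [0.5, 1.0, 2.0] while flatten(neg) is [-0.5, 2.0, 1.0], so misread assigns 2.0 to x and 1.0 to y without raising any error.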

Could we develop a more robust approach to variable ordering? For instance, could we say that variables are ordered alphabetically by their VarName? This would make inserting new variables slower, but shouldn't have a performance impact once all variables are in place, which is the case for the vast majority of sampling.
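
As a rough sketch of what name-based ordering could look like, the plain-Julia toy below keeps entries sorted by the string form of their name. SortedStore and setvalue! are hypothetical names invented for this illustration, and variable names are assumed to be plain strings rather than real VarNames.

```julia
# Hypothetical container that keeps entries sorted by name (not DynamicPPL).
struct SortedStore
    names::Vector{String}
    values::Vector{Float64}
end
SortedStore() = SortedStore(String[], Float64[])

function setvalue!(s::SortedStore, name::AbstractString, value::Real)
    # Binary search for the position of `name` in the sorted name list.
    i = searchsortedfirst(s.names, name)
    if i <= length(s.names) && s.names[i] == name
        s.values[i] = value            # existing variable: overwrite in place
    else
        insert!(s.names, i, name)      # new variable: shifts later entries
        insert!(s.values, i, value)
    end
    return s
end

# Insert the same variables in the two orders the branching model produces.
a = SortedStore()
for (n, v) in [("z", 0.5), ("x", 1.0), ("y", 2.0)]
    setvalue!(a, n, v)
end
b = SortedStore()
for (n, v) in [("z", 0.5), ("y", 2.0), ("x", 1.0)]
    setvalue!(b, n, v)
end
```

Both stores end up with names ["x", "y", "z"] and values [1.0, 2.0, 0.5], regardless of insertion order. Inserting a new name shifts later entries, while updating an existing one costs only the binary search, which matches the trade-off described above. One caveat of naive string comparison is that "x[10]" sorts before "x[2]", so a real implementation would probably want a dedicated ordering on VarNames.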


Only partially related, but note that even fixing the ordering wouldn't fix the problem below:

@model function f()
    z ~ Normal()
    if z > 0
        x_dim = 1
        y_dim = 2
    else
        x_dim = 2
        y_dim = 1
    end
    x = Vector{Float64}(undef, x_dim)
    y = Vector{Float64}(undef, y_dim)
    x .~ Normal(1.0)
    y .~ Normal(2.0)
end

Sampling this currently crashes for various reasons, but there's still a philosophical issue: if a sampler gives you the state [0.1, 0.2, 0.3, 0.4] for this model, you cannot know whether that should be read as z = 0.1, x = [0.2], y = [0.3, 0.4] or as z = 0.1, x = [0.2, 0.3], y = [0.4] without inspecting f in detail. I'm not sure there's any way to fix this as long as samplers only see vectors of floats and are blind to varnames, but we should keep this case in mind in case we can think of a redesign that would fix it. Any such redesign would fix many variable ordering problems too.
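
The ambiguity can be shown with a small plain-Julia sketch. split_by_layout and the two layouts are hypothetical helpers written for this illustration. The point is that the same vector of floats decodes differently depending on a layout that the vector itself does not carry.

```julia
# Hypothetical helper: split a flat vector using a list of (name, length) pairs.
function split_by_layout(v::Vector{Float64}, layout)
    out = Dict{Symbol,Vector{Float64}}()
    offset = 0
    for (name, len) in layout
        out[name] = v[offset+1:offset+len]
        offset += len
    end
    @assert offset == length(v)   # the layout must account for every element
    return out
end

state = [0.1, 0.2, 0.3, 0.4]

# Same variable order in both cases; only the dimensions differ.
layout_a = [(:z, 1), (:x, 1), (:y, 2)]   # x_dim = 1, y_dim = 2
layout_b = [(:z, 1), (:x, 2), (:y, 1)]   # x_dim = 2, y_dim = 1

read_a = split_by_layout(state, layout_a)
read_b = split_by_layout(state, layout_b)
```

With layout_a, x is [0.2] and y is [0.3, 0.4]. With layout_b, x is [0.2, 0.3] and y is [0.4]. Both readings are valid splits of the same four numbers, which is why a fixed variable order alone does not resolve this case.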
