## Silverbox

Silverbox is a benchmark for nonlinear system identification, proposed in 2004. Data, baselines and further information are available at http://nonlinearbenchmark.org/#Silverbox.

State-space formulation of Silverbox's dynamics:

$$\begin{align}
\mu \frac{d^2 x(t)}{dt^2} + \nu \frac{d x(t)}{dt} + \kappa(x(t)) x(t) =&\ u(t) + w(t) \\
\kappa(x(t)) =&\ \alpha + \beta x^2(t) \\
y(t) =&\ x(t) + e(t)
\end{align}$$

where
$$\begin{align}
\mu     =&\ \text{mass} \\
\nu     =&\ \text{viscous damping} \\
\kappa(x(t)) =&\ \text{nonlinear spring} \\
y(t)    =&\ \text{observation (displacement)} \\
x(t)    =&\ \text{state (displacement)} \\
u(t)    =&\ \text{force} \\
e(t)    =&\ \text{measurement noise} \\
w(t)    =&\ \text{process noise}
\end{align}$$
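To get a feel for these dynamics, the equation of motion can be integrated numerically. The following is a minimal sketch and not part of the benchmark. The parameter values, the sinusoidal input and the helper names `accel` and `simulate` are assumptions made purely for illustration, the process noise $w(t)$ is omitted, and the integrator is a simple semi-implicit Euler scheme with a small step size.

```julia
# Hypothetical parameter values (not fitted to the Silverbox data)
μ, ν = 1.0, 0.2      # mass and viscous damping
α, β = 1.0, 0.5      # linear and cubic spring coefficients

# Nonlinear spring coefficient κ(x) = α + β x²
κ(x) = α + β * x^2

# Acceleration implied by μ ẍ + ν ẋ + κ(x) x = u (process noise omitted)
accel(x, v, u) = (u - ν * v - κ(x) * x) / μ

# Semi-implicit Euler: update the velocity first, then the displacement
function simulate(u::Vector{Float64}, h::Float64)
    x, v = 0.0, 0.0
    xs = similar(u)
    for k in eachindex(u)
        v += h * accel(x, v, u[k])
        x += h * v
        xs[k] = x
    end
    return xs
end

h = 0.01
u = [sin(h * k) for k in 1:5000]   # hypothetical input force
xs = simulate(u, h)
```

The vector `xs` holds the noise-free displacement, so a noisy observation would be `xs .+ e` for some measurement noise `e`.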

### Steps to solve

I now take a series of steps to re-write this problem:

#### 1. Assume constant spring coefficient κ

$$ \mu \frac{d^2 x(t)}{dt^2} + \nu \frac{d x(t)}{dt} + κ x(t) = u(t) + w(t)$$

#### 2. Divide by leading coefficient

$$ \frac{d^2 x(t)}{dt^2} + \frac{\nu}{\mu} \frac{d x(t)}{dt} + \frac{κ}{\mu} x(t) = \frac{1}{\mu} u(t) + \frac{1}{\mu} w(t)$$

#### 3. Substitute standard variables

$$ \frac{d^2 x(t)}{dt^2} + 2\zeta \omega_0 \frac{d x(t)}{dt} + \omega_0^2 x(t) - \frac{u(t)}{\mu} = \frac{w(t)}{\mu}$$

where $$\begin{align} 
\zeta =&\ \frac{\nu}{2\sqrt{\mu \kappa}} \\ 
\omega_0 =&\ \sqrt{\frac{\kappa}{\mu}} \, .
\end{align}$$
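As a quick check of this substitution, the sketch below computes $\zeta$ and $\omega_0$ for some hypothetical values of $\mu$, $\nu$ and $\kappa$ (chosen only for illustration) and confirms that they reproduce the coefficients $\nu/\mu$ and $\kappa/\mu$ from step 2.

```julia
# Hypothetical physical parameters
μ, ν, κ = 2.0, 0.4, 8.0

# Standard variables: damping ratio and natural frequency
ζ = ν / (2 * sqrt(μ * κ))
ω0 = sqrt(κ / μ)

# The substituted coefficients should match those of step 2
damping_matches = isapprox(2 * ζ * ω0, ν / μ)
stiffness_matches = isapprox(ω0^2, κ / μ)
println((ζ = ζ, ω0 = ω0, damping_matches = damping_matches, stiffness_matches = stiffness_matches))
```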

#### 4. Apply Euler's method to obtain difference equation (step size is 1)

-> Forward Euler:

$$\begin{align}
\frac{x(t+2h)-2x(t+h)+x(t)}{h^2} + 2\zeta \omega_0 \frac{x(t+h)-x(t)}{h} + \omega_0^2 x(t) - \frac{u(t)}{\mu} =&\ \frac{w(t)}{\mu} \\
x(t+2) + 2(\zeta \omega_0 - 1) x(t+1) + (1 - 2 \zeta \omega_0 + \omega_0^2) x(t) - \frac{u(t)}{\mu} =&\ \frac{w(t)}{\mu} 
\end{align}$$
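The forward Euler recursion can be checked numerically. This is a small sketch with hypothetical values for $\zeta$, $\omega_0$, $\mu$ and the states; `forward_step` is a name I made up. It solves the second line for $x(t+2)$ in the noise-free case and plugs the result back into the finite-difference equation with $h = 1$.

```julia
# Hypothetical parameters
ζ, ω0, μ = 0.05, 0.3, 1.0

# Noise-free forward Euler recursion (h = 1), solved for x(t+2)
forward_step(x1, x0, u) = -2 * (ζ * ω0 - 1) * x1 - (1 - 2 * ζ * ω0 + ω0^2) * x0 + u / μ

x0, x1, u = 0.1, 0.2, 0.5
x2 = forward_step(x1, x0, u)

# Residual of the finite-difference equation with h = 1; should vanish
residual = (x2 - 2 * x1 + x0) + 2 * ζ * ω0 * (x1 - x0) + ω0^2 * x0 - u / μ
println(residual)
```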

-> Backward Euler:

$$\begin{align}
\frac{x(t)-2x(t-h)+x(t-2h)}{h^2} + 2\zeta \omega_0 \frac{x(t)-x(t-h)}{h} + \omega_0^2 x(t) - \frac{u(t)}{\mu} =&\ \frac{w(t)}{\mu}  \\
(1 + 2 \zeta \omega_0 + \omega_0^2)x(t) - 2(1 + \zeta \omega_0)x(t-1) + x(t-2) - \frac{u(t)}{\mu} =&\ \frac{w(t)}{\mu} \\
x(t) - \frac{2(1 + \zeta \omega_0)}{(1 + 2\zeta \omega_0 + \omega_0^2)}x(t-1) + \frac{1}{(1 + 2\zeta \omega_0 + \omega_0^2)}x(t-2) - \frac{u(t)}{\mu(1 + 2\zeta \omega_0 + \omega_0^2)} =&\ \frac{w(t)}{\mu(1 + 2\zeta \omega_0 + \omega_0^2)}
\end{align}$$

Change to shorthand notation (note that $\alpha$ and $\beta$ are redefined here and no longer denote the spring coefficients):

$$x_t - \alpha x_{t-1} + \beta x_{t-2} - \gamma u_t = \gamma w_t$$

where 
$$\begin{align} 
\alpha =&\ \frac{2(1 + \zeta \omega_0)}{1 + 2\zeta \omega_0 + \omega_0^2} \\
\beta =&\ \frac{1}{1 + 2\zeta \omega_0 + \omega_0^2} \\
\gamma =&\ \frac{1}{\mu(1 + 2 \zeta \omega_0 + \omega_0^2)}
\end{align}$$

#### 5. Convert to multivariate first-order difference form

Stick to backward Euler (matches AR structure)
- Backward Euler:

    $$z_t = M z_{t-1} + N u_t + N w_t$$

    where $z_t = [x_t\ \ x_{t-1}]^\top$, $M = [α \ \ -β; 1\ \ 0]$, $N = [γ\ \ 0]^\top$
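As a sanity check on this step, the sketch below builds the backward Euler coefficients from hypothetical values of $\zeta$, $\omega_0$ and $\mu$. It confirms numerically that the noise-free second-order recursion $x_t = \alpha x_{t-1} - \beta x_{t-2} + \gamma u_t$, which is what the backward differences give when solved for $x_t$, produces the same trajectory as the first-order form. The function names and the input signal are assumptions for illustration.

```julia
# Hypothetical parameters
ζ, ω0, μ = 0.05, 0.3, 1.0

# Backward Euler coefficients (h = 1)
d = 1 + 2 * ζ * ω0 + ω0^2
α = 2 * (1 + ζ * ω0) / d
β = 1 / d
γ = 1 / (μ * d)

# First-order (companion) form with z_t = [x_t, x_{t-1}]
M = [α (-β); 1.0 0.0]
N = [γ, 0.0]

# Second-order scalar recursion with zero initial conditions
function scalar_recursion(u)
    x = zeros(length(u) + 2)          # x[1], x[2] are the initial conditions
    for t in eachindex(u)
        x[t + 2] = α * x[t + 1] - β * x[t] + γ * u[t]
    end
    return x[3:end]
end

# Same system written as z_t = M z_{t-1} + N u_t
function state_space(u)
    z = zeros(2)
    xs = zeros(length(u))
    for t in eachindex(u)
        z = M * z + N * u[t]
        xs[t] = z[1]
    end
    return xs
end

u = [sin(0.1 * t) for t in 1:200]     # hypothetical input
xs_scalar = scalar_recursion(u)
xs_state = state_space(u)
println(maximum(abs.(xs_scalar .- xs_state)))
```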

#### 6. Convert to Gaussian probability

- Backward Euler:

$$z_t \sim \mathcal{N}(M z_{t-1} + N u_t, \tau N N^\top)$$

where $w_t \sim \mathcal{N}(0, \tau)$
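To make the transition concrete, here is a small sampling sketch. It assumes that $\tau$ is the variance of $w_t$, and the numerical values of $M$, $N$, $\tau$, $z_{t-1}$ and $u_t$ are hypothetical. Because the noise enters only through $N$, only the first component of $z_t$ is random; the second is a deterministic copy of the previous displacement.

```julia
using Random, Statistics

rng = MersenneTwister(1)

# Hypothetical transition matrices, noise variance, previous state and input
M = [1.8 (-0.9); 1.0 0.0]
N = [0.9, 0.0]
τ = 0.01                       # assumed to be the variance of w_t
z_prev = [0.5, 0.4]
u = 1.0

# One draw of z_t = M z_{t-1} + N u_t + N w_t with w_t ~ N(0, τ)
sample_transition(rng) = M * z_prev + N * u + N * (sqrt(τ) * randn(rng))

samples = [sample_transition(rng) for _ in 1:10_000]
first_components = [s[1] for s in samples]
second_components = [s[2] for s in samples]

println(mean(first_components))   # close to the first entry of M * z_prev + N * u
println(var(first_components))    # close to τ * N[1]^2
```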

#### 7. Observation likelihood

$$y_t \sim \mathcal{N}(c z_t, σ)$$

where $e_t \sim \mathcal{N}(0, \sigma)$, $c = [1\ \ 0]$
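The observation model simply picks the first component of $z_t$ and adds noise. A minimal sketch, assuming here that $\sigma$ denotes the noise variance and using a hypothetical state; `loglik` is a helper name of my own:

```julia
using LinearAlgebra, Random

rng = MersenneTwister(7)

c = [1.0, 0.0]          # observation selection vector
σ = 0.01                # assumed here to be the measurement noise variance
z = [0.3, 0.25]         # hypothetical state [x_t, x_{t-1}]

# Draw one observation y_t = c' z_t + e_t
y = dot(c, z) + sqrt(σ) * randn(rng)

# Gaussian log-likelihood of an observation given the state
loglik(y, z) = -0.5 * (log(2π * σ) + (y - dot(c, z))^2 / σ)

println(loglik(y, z))
```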

Now, I need priors for $\alpha$, $\beta$, $\gamma$, $\tau$, $\sigma$. The definitions of $\alpha$, $\beta$ and $\gamma$ give three equations in three unknowns, so I can recover $\zeta$, $\omega_0$ and $\mu$ from them. The variables are all strictly positive, which means they can be modeled by gamma distributions:

$$\begin{align}
\alpha \sim&\ Γ(1, 1e3) \\
\beta \sim&\ Γ(1, 1e3) \\
\gamma \sim&\ Γ(1, 1e3) \\
\tau \sim&\ Γ(1, 1e3) \\
\sigma \sim&\ Γ(1, 1e3) 
\end{align}$$
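The recovery of $\zeta$, $\omega_0$ and $\mu$ can be written out explicitly. From the definitions of the coefficients, $\mu = \beta/\gamma$, $\omega_0^2 = (1 - \alpha + \beta)/\beta$ and $\zeta \omega_0 = \alpha/(2\beta) - 1$. The sketch below checks this round trip for hypothetical parameter values; the function names are my own.

```julia
# Forward map: physical parameters to AR coefficients
function ar_coefficients(ζ, ω0, μ)
    d = 1 + 2 * ζ * ω0 + ω0^2
    return (α = 2 * (1 + ζ * ω0) / d, β = 1 / d, γ = 1 / (μ * d))
end

# Inverse map: AR coefficients back to physical parameters
function physical_parameters(α, β, γ)
    μ = β / γ
    ω0 = sqrt((1 - α + β) / β)
    ζ = (α / (2 * β) - 1) / ω0
    return (ζ = ζ, ω0 = ω0, μ = μ)
end

# Round trip with hypothetical values
coeffs = ar_coefficients(0.05, 0.3, 2.0)
recovered = physical_parameters(coeffs.α, coeffs.β, coeffs.γ)
println(recovered)
```

Note that not every positive triple $(\alpha, \beta, \gamma)$ maps back to positive $\zeta$ and $\omega_0$: the inverse requires $1 - \alpha + \beta > 0$ and $\alpha > 2\beta$.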

--> Implementation with ForneyLab and AR node

In [1]:
# Generate time-series
using CSV
using DataFrames
using Plots

# Read data from CSV file
df = CSV.read("../data/SNLS80mV.csv", ignoreemptylines=true)
df = select(df, [:V1, :V2])

# Shorthand
input = df[:,1]
output = df[:,2]

# Time horizon
T = size(df, 1)

131072

In [2]:
using ForneyLab
using LAR
using LAR.Node, LAR.Data
using ProgressMeter

I introduce another shorthand, $\theta = [\alpha\ \ \beta]$, and use the Nonlinear node to map a Gaussian variable $\eta$ through $g(\eta) = \exp(\eta)$. This keeps $\theta$ positive while providing the AR node with a Gaussian form for $\theta$.
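To illustrate what an exponential map does to a Gaussian variable, here is a plain Monte Carlo sketch. It is not what ForneyLab does internally, and the mean and variances of $\eta$ are hypothetical. Pushing Gaussian samples through $\exp$ gives strictly positive values whose mean matches the log-normal formula $\exp(m + v/2)$.

```julia
using Random, Statistics

rng = MersenneTwister(42)

# Hypothetical mean and (diagonal) variances of η
m = [0.5, -0.2]
v = [0.04, 0.09]

# Elementwise exponential map
g(x) = exp.(x)

# Sample η ~ N(m, diag(v)) and push each sample through g
η_samples = [m .+ sqrt.(v) .* randn(rng, 2) for _ in 1:50_000]
θ_samples = g.(η_samples)

mc_mean = mean(θ_samples)            # Monte Carlo estimate of E[θ]
closed_form = exp.(m .+ v ./ 2)      # log-normal mean

println(mc_mean)
println(closed_form)
```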

In [8]:
# Start graph
graph = FactorGraph()

# Static parameters
@RV η ~ GaussianMeanPrecision(placeholder(:η_m, dims=(2,)), placeholder(:η_w, dims=(2,2)))
@RV τ ~ Gamma(placeholder(:τ_a), placeholder(:τ_b))

# Nonlinear function
g(x) = exp.(x)
g_inv(x) = log.(x)

# Nonlinear node
# @RV θ ~ Nonlinear(log_θ; g=g, g_inv=g_inv, dims=(2,))
@RV θ ~ Nonlinear(η; g=g, dims=(2,))

# Fix the measurement noise parameter σ (passed as the precision of the likelihood below)
σ = 100.

# Observation selection variable
c = [1, 0]

# State prior
@RV z_t ~ GaussianMeanPrecision(placeholder(:z_m, dims=(2,)), placeholder(:z_w, dims=(2, 2)))

# Autoregressive node
@RV x_t ~ Autoregressive(θ, z_t, τ)

# Specify likelihood
@RV y_t ~ GaussianMeanPrecision(dot(c, x_t), σ)

# Placeholder for data
placeholder(y_t, :y_t)

# Draw time-slice subgraph
ForneyLab.draw(graph)

In [9]:
# Infer an algorithm
q = PosteriorFactorization(z_t, x_t, θ, η, τ, ids=[:z, :x, :θ, :η, :τ])
algo = variationalAlgorithm(q, free_energy=true)
source_code = algorithmSourceCode(algo, free_energy=true)
eval(Meta.parse(source_code));
println(source_code)

begin

function stepτ!(data::Dict, marginals::Dict=Dict(), messages::Vector{Message}=Array{Message}(undef, 2))

messages[1] = ruleVBGammaOut(nothing, ProbabilityDistribution(Univariate, PointMass, m=data[:τ_a]), ProbabilityDistribution(Univariate, PointMass, m=data[:τ_b]))
messages[2] = ruleVariationalARIn3PPPN(marginals[:x_t], marginals[:z_t], marginals[:θ], nothing)

marginals[:τ] = messages[1].dist * messages[2].dist

return marginals

end

function stepz!(data::Dict, marginals::Dict=Dict(), messages::Vector{Message}=Array{Message}(undef, 2))

messages[1] = ruleVBGaussianMeanPrecisionOut(nothing, ProbabilityDistribution(Multivariate, PointMass, m=data[:z_m]), ProbabilityDistribution(MatrixVariate, PointMass, m=data[:z_w]))
messages[2] = ruleVariationalARIn1PNPP(marginals[:x_t], nothing, marginals[:θ], marginals[:τ])

marginals[:z_t] = messages[1].dist * messages[2].dist

return marginals

end

function initθ()

messages = Array{Message}(undef, 4)


return messages

end

function ste

In [10]:
# Inference parameters
num_iterations = 10

# Initialize marginal distribution and observed data dictionaries
data = Dict()
marginals = Dict()

# Initialize arrays of parameterizations
params_x = (zeros(2,T+1), repeat(1. .*float(eye(2)), outer=(1,1,T+1)))
params_z = (zeros(2,T+1), repeat(1. .*float(eye(2)), outer=(1,1,T+1)))
params_θ = (zeros(2,T+1), repeat(1. .*float(eye(2)), outer=(1,1,T+1)))
params_η = (zeros(2,T+1), repeat(1. .*float(eye(2)), outer=(1,1,T+1)))
params_τ = (1e-3.*ones(1,T+1), ones(1,T+1))

# Start progress bar
p = Progress(T, 1, "At time ")

# Perform inference at each time-step
for t = 1:T

    # Update progress bar
    update!(p, t)

    # Initialize marginals
    marginals[:x_t] = ProbabilityDistribution(Multivariate, GaussianMeanPrecision, m=params_x[1][:,t], w=params_x[2][:,:,t])
    marginals[:z_t] = ProbabilityDistribution(Multivariate, GaussianMeanPrecision, m=params_z[1][:,t], w=params_z[2][:,:,t])
    marginals[:η] = ProbabilityDistribution(Multivariate, GaussianMeanPrecision, m=params_η[1][:,t], w=params_η[2][:,:,t])
    marginals[:θ] = ProbabilityDistribution(Multivariate, GaussianMeanPrecision, m=params_θ[1][:,t], w=params_θ[2][:,:,t])
    marginals[:τ] = ProbabilityDistribution(Univariate, Gamma, a=params_τ[1][1,t], b=params_τ[2][1,t])
    
    data = Dict(:y_t => output[t],
                :θ_m => params_θ[1][:,t],
                :η_m => params_η[1][:,t],
                :z_m => params_z[1][:,t],
                :θ_w => params_θ[2][:,:,t],
                :η_w => params_η[2][:,:,t],
                :z_w => params_z[2][:,:,t],
                :τ_a => params_τ[1][1,t],
                :τ_b => params_τ[2][1,t])

    # Iterate variational parameter updates
    for i = 1:num_iterations

        stepx!(data, marginals)
        stepz!(data, marginals)
        stepθ!(data, marginals)
        stepη!(data, marginals)
        stepτ!(data, marginals)
    end

    # Store current parameterizations of marginals
    params_x[1][:,t+1] = mean(marginals[:x_t])
    params_z[1][:,t+1] = mean(marginals[:z_t])
    params_η[1][:,t+1] = mean(marginals[:η])
    params_θ[1][:,t+1] = mean(marginals[:θ])
    params_x[2][:,:,t+1] = marginals[:x_t].params[:w]
    params_z[2][:,:,t+1] = marginals[:z_t].params[:w]
    params_η[2][:,:,t+1] = marginals[:η].params[:w]
    params_θ[2][:,:,t+1] = marginals[:θ].params[:w]
    params_τ[1][1,t+1] = marginals[:τ].params[:a]
    params_τ[2][1,t+1] = marginals[:τ].params[:b]

end

At time   0%|                                           |  ETA: 1 days, 11:42:21

ErrorException: mean(𝒩(xi=[1.50, 0.07], w=[[3.80e+13, 1.36e+02][1.36e+02, 52.75]])
) is undefined because the distribution is improper.

In [6]:
function stepτ!(data::Dict, marginals::Dict=Dict(), messages::Vector{Message}=Array{Message}(undef, 2))

messages[1] = ruleVBGammaOut(nothing, ProbabilityDistribution(Univariate, PointMass, m=data[:τ_a]), ProbabilityDistribution(Univariate, PointMass, m=data[:τ_b]))
messages[2] = ruleVariationalARIn3PPPN(marginals[:x_t], marginals[:z_t], marginals[:θ], nothing)

marginals[:τ] = messages[1].dist * messages[2].dist

return marginals

end

function stepz!(data::Dict, marginals::Dict=Dict(), messages::Vector{Message}=Array{Message}(undef, 2))

messages[1] = ruleVBGaussianMeanPrecisionOut(nothing, ProbabilityDistribution(Multivariate, PointMass, m=data[:z_m]), ProbabilityDistribution(MatrixVariate, PointMass, m=data[:z_w]))
messages[2] = ruleVariationalARIn1PNPP(marginals[:x_t], nothing, marginals[:θ], marginals[:τ])

marginals[:z_t] = messages[1].dist * messages[2].dist

return marginals

end

function stepθ!(data::Dict, marginals::Dict=Dict(), messages::Vector{Message}=Array{Message}(undef, 4))

messages[1] = ruleVBGaussianMeanPrecisionOut(nothing, ProbabilityDistribution(Multivariate, PointMass, m=data[:η_m]), ProbabilityDistribution(MatrixVariate, PointMass, m=data[:η_w]))
messages[2] = ruleSPNonlinearUTOutNG(g, nothing, messages[1])
messages[3] = ruleVariationalARIn2PPNP(marginals[:x_t], marginals[:z_t], nothing, marginals[:τ])
messages[4] = ruleSPNonlinearUTIn1GG(g, g_inv, messages[3], nothing)

marginals[:η] = messages[1].dist * messages[4].dist
marginals[:θ] = messages[2].dist * messages[3].dist

return marginals

end

function stepη!(data::Dict, marginals::Dict=Dict(), messages::Vector{Message}=Array{Message}(undef, 4))

messages[1] = ruleVBGaussianMeanPrecisionOut(nothing, ProbabilityDistribution(Multivariate, PointMass, m=data[:η_m]), ProbabilityDistribution(MatrixVariate, PointMass, m=data[:η_w]))
messages[2] = ruleSPNonlinearUTOutNG(g, nothing, messages[1])
messages[3] = ruleVariationalARIn2PPNP(marginals[:x_t], marginals[:z_t], nothing, marginals[:τ])
messages[4] = ruleSPNonlinearUTIn1GG(g, g_inv, messages[3], nothing)

marginals[:η] = messages[1].dist * messages[4].dist
marginals[:θ] = messages[2].dist * messages[3].dist

return marginals

end

function stepx!(data::Dict, marginals::Dict=Dict(), messages::Vector{Message}=Array{Message}(undef, 4))

messages[1] = ruleVariationalAROutNPPP(nothing, marginals[:z_t], marginals[:θ], marginals[:τ])
messages[2] = ruleVBGaussianMeanPrecisionM(ProbabilityDistribution(Univariate, PointMass, m=data[:y_t]), nothing, ProbabilityDistribution(Univariate, PointMass, m=100.0))
messages[3] = ruleSPDotProductIn1GNP(messages[2], nothing, Message(Multivariate, PointMass, m=[1, 0]))
messages[4] = ruleSPDotProductOutNGP(nothing, messages[1], Message(Multivariate, PointMass, m=[1, 0]))

marginals[:variable_1] = messages[4].dist * messages[2].dist
marginals[:x_t] = messages[1].dist * messages[3].dist

return marginals

end

stepx! (generic function with 3 methods)