# Design note

## Motivation
Ultimately, we want to compile the model definition into GraphPPL's `Model`, which gives us the following tasks to complete.
1. Because array elements can have different types, every element needs to be an individual node in the final graph, so we 
    need a way to support this. 
2. Figure out whether the model is fully defined: most notably, given the input parameters/data, all the loop bounds (and `if` conditions)
    must be resolvable by partial evaluation.
3. Do some static checking, for example: the RHS of a stochastic assignment must be a distribution, the 
    number of arguments passed to each distribution must be correct, etc.
4. ...

## Design
### Partial Evaluation
I plan to rely on [Symbolics.substitute](https://symbolics.juliasymbolics.org/dev/manual/expression_manipulation/#SymbolicUtils.substitute) 
to do the partial evaluation. One important observation is that we don't need to evaluate stochastic assignments, as they can be translated 
(more or less) directly. So the idea is to build a dictionary of symbolic rules from the input parameters/data and the logical assignments, and then
use `substitute` to try to evaluate an expression to a concrete value. 
```julia
using Symbolics: @variables, substitute

@variables a b
c = a + b
# the dictionary in the following line is called "rules"
substitute(c, Dict(a => 10, b => 5)) # 15
```
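
The example above resolves everything in one step. As a rough illustration of what happens when rules chain through logical assignments, or when a variable has no rule at all, here is a minimal sketch. It deliberately works on plain Julia `Expr` trees with Base only instead of Symbolics, and `OPS` and `partial_eval` are hypothetical names, not part of Symbolics or BugsModels. It assumes the rules contain no cycles.

```julia
# Hypothetical table of operators the sketch knows how to fold.
const OPS = Dict{Symbol,Function}(:+ => (+), :- => (-), :* => (*), :/ => (/))

# Hypothetical helper: partially evaluate `ex` against a rules dictionary.
# Assumes the rules are acyclic; a cycle would recurse forever.
function partial_eval(ex, rules::Dict{Symbol,Any})
    if ex isa Symbol
        # Follow the rule if there is one; otherwise the symbol stays unresolved.
        return haskey(rules, ex) ? partial_eval(rules[ex], rules) : ex
    elseif ex isa Expr && ex.head == :call
        f = ex.args[1]
        args = Any[partial_eval(a, rules) for a in ex.args[2:end]]
        # Fold only when every argument is concrete and the operator is known.
        if haskey(OPS, f) && all(a -> a isa Number, args)
            return OPS[f](args...)
        end
        return Expr(:call, f, args...)
    end
    return ex  # numbers and anything else are returned unchanged
end

# `a` comes from the data, `b` from a logical assignment `b <- a + 5`.
rules = Dict{Symbol,Any}(:a => 10, :b => :(a + 5))

resolved   = partial_eval(:(a + b), rules)  # 25
unresolved = partial_eval(:(a + m), rules)  # :(10 + m), since `m` has no rule
```

A result that is still an `Expr` or a `Symbol` signals that the model is not fully defined for that expression, which is the check needed for loop bounds.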

### Array
In short, every array element will be represented by its own symbolic variable, and I plan to organize them in arrays of symbolic variables, 
i.e., a reference such as `Expr(:ref, :g, 3)` (that is, `g[3]`) will give back a symbolic variable. 
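
A minimal sketch of this lookup, under simplifying assumptions: each element is stood in for by a plain `Symbol` named like `g[3]` (the real design would store a Symbolics variable there), only one-dimensional references are handled, and `element_name`, `get_element!` and `lookup!` are hypothetical names rather than BugsModels functions.

```julia
# Hypothetical stand-in for a symbolic variable: a Symbol named like "g[3]".
element_name(name::Symbol, i::Int) = Symbol(name, "[", i, "]")

# Return the element for `name[i]`, growing the array when `i` is past its end.
function get_element!(arrays::Dict{Symbol,Vector{Symbol}}, name::Symbol, i::Int)
    elems = get!(arrays, name, Symbol[])
    for k in (length(elems) + 1):i
        push!(elems, element_name(name, k))
    end
    return elems[i]
end

# Translate a reference expression such as :(g[3]), i.e. Expr(:ref, :g, 3).
function lookup!(arrays::Dict{Symbol,Vector{Symbol}}, ex::Expr)
    @assert ex.head == :ref && length(ex.args) == 2 "only 1-D references are handled in this sketch"
    return get_element!(arrays, ex.args[1], ex.args[2])
end

arrays_sketch = Dict{Symbol,Vector{Symbol}}()
elem = lookup!(arrays_sketch, :(g[3]))  # Symbol("g[3]")
```

After the call, `arrays_sketch[:g]` holds three elements, one per index up to the largest one requested.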

### Passes
After some thought, I think a sensible way to implement this is to adopt the idea of compiler-like passes.
Here are the passes I plan to implement at the moment (in order):
1. Analyze the input parameters/data and save them into a global dictionary used as the `rules` for `substitute`.
2. Unroll the loops with constant loop bounds. The rationale is that users may define constant arrays using for loops, 
    and we may need these values to determine other loop bounds.
3. Resolve `if` statements.
4. For every logical assignment that is not inside a loop (after the first unrolling), create a symbolic variable for each variable 
    that is not yet defined, then add `LHS => RHS` to the `rules` dictionary. If the RHS is a constant, simply 
    add this mapping to the `rules` dictionary. 
5. Now we should be able to evaluate variable loop bounds to numbers, provided the model definition is correct, so we unroll again. (This approach has an
    underlying assumption: all variables used in loop bounds are defined outside other for loops with variable bounds. I suspect there 
    are contrived ways to write a program that break this, but it should cover almost all sensible programs.) 
6. At this point, we should have a program with only simple statements, with all array references replaced by simple variables, so we can 
    translate it into GraphPPL input line by line.
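
To make the unrolling passes more concrete, here is a minimal sketch of the idea. It is not the implementation of `BugsModels.unroll_for_loops!`: it operates on ordinary Julia `for` expressions rather than BUGS syntax, `subst`, `bound` and `unroll` are hypothetical names, and it assumes every loop has the form `for i in lo:hi` with bounds that are either integer literals or symbols present in `rules`. Shadowed loop variables are not handled.

```julia
# Replace every occurrence of the symbol `var` in `ex` with `val`.
subst(ex, var::Symbol, val) = ex
subst(ex::Symbol, var::Symbol, val) = ex == var ? val : ex
subst(ex::Expr, var::Symbol, val) =
    Expr(ex.head, (subst(a, var, val) for a in ex.args)...)

# Resolve a loop bound: an integer literal, or a symbol looked up in `rules`.
bound(b::Integer, rules) = b
bound(b::Symbol, rules) = rules[b]

# Flatten `ex` into a list of simple statements, unrolling every `for` loop.
function unroll(ex, rules)
    if ex isa LineNumberNode
        return Any[]                       # drop source-location nodes
    elseif ex isa Expr && ex.head == :block
        out = Any[]
        for a in ex.args
            append!(out, unroll(a, rules))
        end
        return out
    elseif ex isa Expr && ex.head == :for
        header, body = ex.args
        var, rng = header.args             # `i` and the call `lo:hi`
        lo = bound(rng.args[2], rules)
        hi = bound(rng.args[3], rules)
        out = Any[]
        for v in lo:hi
            # Substituting first turns inner bounds such as `1:i` into literals.
            append!(out, unroll(subst(body, var, v), rules))
        end
        return out
    end
    return Any[ex]
end

prog = quote
    for i in 1:2
        x[i] = i + 1
        for k in 1:i
            y[i, k] = 2
        end
    end
    for i in 1:n
        z[i] = i
    end
end
stmts = unroll(prog, Dict(:n => 2))  # 7 simple statements, no loops left
```

Substituting the loop variable before recursing is what lets an inner bound that depends on an outer loop variable resolve without any extra bookkeeping.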

### Come Later
1. link functions
2. `if` statements

## Examples demonstrating current progress

```julia
using BugsModels
using PrettyPrinting
```

```julia
# Data is given in the form of a dictionary
data = Dict(
    :a => 1,
    :n => 3,
    :g => [1, 2, 3]
)
arrays = Dict()
rules = Dict()
analyze_data!(data, rules, arrays)
```

```julia
@show rules;
```

```
rules = Dict{Any, Any}(a => 1, var"g[1]" => 1, n => 3, var"g[2]" => 2, var"g[3]" => 3)
```


```julia
@show arrays;
```

```
arrays = Dict{Any, Any}(:g => Symbolics.Num[var"g[1]", var"g[2]", var"g[3]"])
```


```julia
# Partial evaluation is done with `Symbolics.substitute`.
# `resolve` is a wrapper around `Symbolics.substitute` that also handles the case
# where the thing to resolve is just a number.
# `get_sym_var` takes a Symbol and returns a symbolic variable.
BugsModels.resolve(get_sym_var(:n), rules)
# n is provided in the `data` dictionary; here it is 3
```

```
3
```

### Unroll
As discussed in the Design section, the reason for unrolling is to make the model definition contain only simple expressions, so that translation to GraphPPL is automatic.  

The following example contains several cases for unrolling.

```julia
## tests for unroll
expr = bugsmodel"""
### Likelihood
    # dummy assignment for easy understanding
    variable.0 <- 1

    # nested loop
    for (i in 1:3) {
        # constant assignment
        array.variable.0[i] <- 1
        # assignment using loop variable
        array.variable.1[i] <- i + 1
        
        # nested loops in another for loop
        for (j in 1:2) {
            # loop bound depends on loop variable
            for (k in 1:j) {
                array.variable.2[i, j, k] = 2
            }
        }
    }

    # variable loop bound that can be resolved from user input
    for (i in 1:n) {
        array.variable.3[i] <- i
    }

    # dummy assignment for easy understanding
    variable.1 <- 1
"""
BugsModels.unroll_for_loops!(expr, rules);
```

```julia
# After unrolling
# Note the loop with variable loop bound `n` is also unrolled
expr
```

```
quote
    var"variable.0" = 1
    var"array.variable.0"[1] = 1
    var"array.variable.1"[1] = 1 + 1
    var"array.variable.2"[1, 1, 1] = 2
    var"array.variable.2"[1, 2, 1] = 2
    var"array.variable.2"[1, 2, 2] = 2
    var"array.variable.0"[2] = 1
    var"array.variable.1"[2] = 2 + 1
    var"array.variable.2"[2, 1, 1] = 2
    var"array.variable.2"[2, 2, 1] = 2
    var"array.variable.2"[2, 2, 2] = 2
    var"array.variable.0"[3] = 1
    var"array.variable.1"[3] = 3 + 1
    var"array.variable.2"[3, 1, 1] = 2
    var"array.variable.2"[3, 2, 1] = 2
    var"array.variable.2"[3, 2, 2] = 2
    var"array.variable.3"[1] = 1
    var"array.variable.3"[2] = 2
    var"array.variable.3"[3] = 3
    var"variable.1" = 1
end
```

### Parse logical assignments and add them to `rules`

```julia
expr = bugsmodel"""
    v[2] <- h[3] + (x[2] * d) * c
    w[6] <- f[2] / (y[4] + e)
"""

println("Before parsing:")
print("arrays = \n")
pprint(arrays)
print("\nrules = \n")
pprint(rules)

BugsModels.parse_logical_assignments!(expr, arrays, rules)
```

```
Before parsing:
arrays = 
Dict(:g => [var"g[1]", var"g[2]", var"g[3]"])
rules = 
Dict(a => 1, var"g[1]" => 1, n => 3, var"g[2]" => 2, var"g[3]" => 3)
```

```julia
println("After parsing:")
print("arrays = \n")
pprint(arrays)
print("\nrules = \n")
pprint(rules)
```

```
After parsing:
arrays = 
Dict(:f => [var"f[1]", var"f[2]"],
     :w => [var"w[1]", var"w[2]", var"w[3]", var"w[4]", var"w[5]", var"w[6]"],
     :y => [var"y[1]", var"y[2]", var"y[3]", var"y[4]"],
     :v => [var"v[1]", var"v[2]"],
     :h => [var"h[1]", var"h[2]", var"h[3]"],
     :g => [var"g[1]", var"g[2]", var"g[3]"],
     :x => [var"x[1]", var"x[2]"])
rules = 
Dict(a => 1,
     var"g[1]" => 1,
     n => 3,
     var"g[2]" => 2,
     var"w[6]" => var"f[2]" / (e + var"y[4]"),
     var"g[3]" => 3,
     var"v[2]" => var"h[3]" + c*d*var"x[2]")
```

Most notably, arrays are handled automatically: every array in the expression is sized either by the input data or by the largest index that appears in the model definition.  
Each logical assignment is also added to the `rules` dictionary.
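
The sizing rule can be sketched as a walk over the expression tree that records the largest literal index seen for each array name. This is only an illustration of the rule, not the code behind `parse_logical_assignments!`: `max_indices!` is a hypothetical name, the statements are written as ordinary Julia assignments, and only one-dimensional references with literal indices are considered.

```julia
# Record, for every 1-D reference `name[i]` in `ex`, the largest literal index seen.
function max_indices!(sizes::Dict{Symbol,Int}, ex)
    if ex isa Expr
        if ex.head == :ref && length(ex.args) == 2 &&
           ex.args[1] isa Symbol && ex.args[2] isa Integer
            name, i = ex.args
            sizes[name] = max(get(sizes, name, 0), i)
        end
        # Recurse into sub-expressions (both sides of each assignment).
        foreach(a -> max_indices!(sizes, a), ex.args)
    end
    return sizes
end

prog_sizes = quote
    v[2] = h[3] + (x[2] * d) * c
    w[6] = f[2] / (y[4] + e)
end

# `g` starts at size 3 because it is sized by the input data.
sizes = max_indices!(Dict{Symbol,Int}(:g => 3), prog_sizes)
```

For the two statements above, the recorded sizes agree with the array lengths printed after parsing: `v` and `x` and `f` get 2, `h` gets 3, `y` gets 4, `w` gets 6, and `g` keeps its size of 3.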