# Juice Examples
This notebook includes:
- Logic Circuits
    - Construct a circuit
    - Structure Properties
    - Transformations
    - Queries
- Probabilistic Circuits
## Logic Circuits

In [57]:
using Revise # TODO remove 

In [58]:
using LogicCircuits

### Construct / Load / Compile a logic circuit
1. Handcraft a logic circuit

In [59]:
lit1, litn1 = compile(PlainLogicCircuit, Lit(1)), compile(PlainLogicCircuit, Lit(- 1))
lit2, litn2 = compile(PlainLogicCircuit, Lit(2)), compile(PlainLogicCircuit, Lit(- 2))
and1 = lit1 & lit2
true_node = compile(PlainLogicCircuit, true)
and2 = conjoin([litn1, litn2, true_node])
c1 = and1 | and2
println(tree_formula_string(c1))

((1 ⋀ 2) ⋁ (-1 ⋀ -2 ⋀ true))


In [60]:
# structured-decomposable circuit and its vtree
v1, v2 = Vtree(Var(1)), Vtree(Var(2))
v_and = Vtree(v1, v2)
sl1, sln1 = compile(StructLogicCircuit, v1, Lit(1)), compile(StructLogicCircuit, v1, Lit(- 1))
sl2, sln2 = compile(StructLogicCircuit, v2, Lit(2)), compile(StructLogicCircuit, v2, Lit(- 2))
c2 = v_and(v_and(sl1 & sl2) | v_and(sln1 & sln2))
println(tree_formula_string(c2))

((1 ⋀ 2) ⋁ (-1 ⋀ -2))


2. Load from a file or the model zoo

In [61]:
lc = load_logic_circuit(zoo_sdd_file("random.sdd"))
lc2 = load_smooth_logic_circuit(zoo_psdd_file("nltcs.psdd"))
lc, v = load_struct_smooth_logic_circuit(zoo_psdd_file("nltcs.psdd"), zoo_vtree_file("nltcs.vtree"))

(PlainStruct⋁Node(997515069187283418), PlainVtreeInnerNode(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16))

3. Fully factorized circuit

In [62]:
c3 = fully_factorized_circuit(PlainLogicCircuit, 3)
c4 = fully_factorized_circuit(PlainStructLogicCircuit, v_and)
println(tree_formula_string(c3))
println(tree_formula_string(c4))

(((1 ⋁ -1) ⋀ (2 ⋁ -2) ⋀ (3 ⋁ -3)))
((((1 ⋁ -1) ⋀ (2 ⋁ -2))))


4. SDD compilation 

In [63]:
mgr1 = SddMgr(3, :balanced)

cnf1 = zoo_cnf("easy/C17_mince.cnf")
vtree1 = zoo_vtree("easy/C17_mince.min.vtree");

mgr2 = SddMgr(vtree1)
cnfΔ = compile(mgr2, cnf1)
println(tree_formula_string(cnfΔ))

((-3 ⋀ ((((2 ⋀ true) ⋁ (-2 ⋀ -4)) ⋀ false) ⋁ (((2 ⋀ false) ⋁ (-2 ⋀ 4)) ⋀ ((((7 ⋀ 8) ⋁ (-7 ⋀ -8)) ⋀ false) ⋁ (((-7 ⋀ 8) ⋁ (7 ⋀ false)) ⋀ ((-6 ⋀ ((-9 ⋀ ((10 ⋀ ((11 ⋀ ((12 ⋀ ((13 ⋀ ((14 ⋀ ((-15 ⋀ ((-16 ⋀ 17) ⋁ (16 ⋀ false))) ⋁ (15 ⋀ false))) ⋁ (-14 ⋀ false))) ⋁ (-13 ⋀ ((-14 ⋀ ((15 ⋀ ((-16 ⋀ 17) ⋁ (16 ⋀ false))) ⋁ (-15 ⋀ false))) ⋁ (14 ⋀ false))))) ⋁ (-12 ⋀ false))) ⋁ (-11 ⋀ false))) ⋁ (-10 ⋀ false))) ⋁ (9 ⋀ false))) ⋁ (6 ⋀ false))) ⋁ (((7 ⋀ -8) ⋁ (-7 ⋀ false)) ⋀ ((-6 ⋀ ((9 ⋀ ((-10 ⋀ ((11 ⋀ ((-12 ⋀ ((13 ⋀ ((14 ⋀ ((-15 ⋀ ((-16 ⋀ 17) ⋁ (16 ⋀ false))) ⋁ (15 ⋀ false))) ⋁ (-14 ⋀ false))) ⋁ (-13 ⋀ ((-14 ⋀ ((15 ⋀ ((16 ⋀ -17) ⋁ (-16 ⋀ false))) ⋁ (-15 ⋀ false))) ⋁ (14 ⋀ false))))) ⋁ (12 ⋀ false))) ⋁ (-11 ⋀ false))) ⋁ (10 ⋀ false))) ⋁ (-9 ⋀ false))) ⋁ (6 ⋀ false))))))) ⋁ (3 ⋀ ((1 ⋀ ((((2 ⋀ 4) ⋁ (-2 ⋀ true)) ⋀ false) ⋁ (((2 ⋀ -4) ⋁ (-2 ⋀ false)) ⋀ ((((-7 ⋀ -8) ⋁ (7 ⋀ true)) ⋀ false) ⋁ (((-7 ⋀ 8) ⋁ (7 ⋀ false)) ⋀ ((-5 ⋀ ((-6 ⋀ ((9 ⋀ ((-10 ⋀ ((11 ⋀ ((-12 ⋀ ((13 ⋀ ((14 ⋀ ((-15 ⋀ ((-16 ⋀ 17) ⋁ (16 ⋀ fals

2. Circuit Statistics

In [64]:
println(tree_formula_string(c1))
println("Number of variables : ", num_variables(c1))
println("Number of nodes : ", num_nodes(c1))
println("Number of edges : ", num_edges(c1))

((1 ⋀ 2) ⋁ (-1 ⋀ -2 ⋀ true))
Number of variables : 2
Number of nodes : 8
Number of edges : 7


In [65]:
println("Number of children of root node : ", num_children(c1))
println("Inner nodes : ", innernodes(c1), "\n")
println("Leaf nodes : ", leafnodes(c1), "\n")
println("Conjunction/And nodes : ", and_nodes(c1), "\n")
println("Disjunction/Or nodes : ", or_nodes(c1), "\n")
println("Literal nodes : ", literal_nodes(c1), "\n")
println("Constant nodes : ", canonical_constants(c1), "\n")

Number of children of root node : 2
Inner nodes : PlainLogicInnerNode[Plain⋀Node(5808418406115376116), Plain⋀Node(10715694377331970021), Plain⋁Node(8474986747586460844)]

Leaf nodes : PlainLogicCircuit[PlainLiteralNode(3268897131303528460), PlainLiteralNode(3491761497713639586), PlainLiteralNode(14744975093845599863), PlainLiteralNode(17360079655964980085), PlainTrueNode(16845230597971374003)]

Conjunction/And nodes : Plain⋀Node[Plain⋀Node(5808418406115376116), Plain⋀Node(10715694377331970021)]

Disjunction/Or nodes : Plain⋁Node[Plain⋁Node(8474986747586460844)]

Literal nodes : PlainLiteralNode[PlainLiteralNode(3268897131303528460), PlainLiteralNode(3491761497713639586), PlainLiteralNode(14744975093845599863), PlainLiteralNode(17360079655964980085)]

Constant nodes : (nothing, PlainTrueNode(16845230597971374003))



3. Structure properties

In [66]:
println(isfalse(compile(PlainLogicCircuit, false)))
println(istrue(true_node))
println(isliteralgate(lit1))

true
true
true


In [67]:
println("Is ", tree_formula_string(c1), " decomposable : ", isdecomposable(c1))
t = lit1 & litn1
println("Is ", tree_formula_string(t), " decomposable : ", isdecomposable(t))

Is ((1 ⋀ 2) ⋁ (-1 ⋀ -2 ⋀ true)) decomposable : true
Is (1 ⋀ -1) decomposable : false


In [68]:
println("Is ", tree_formula_string(c1), " smooth : ", issmooth(c1))
t = lit1 | lit2
println("Is ", tree_formula_string(t), " smooth : ", issmooth(t))

Is ((1 ⋀ 2) ⋁ (-1 ⋀ -2 ⋀ true)) smooth : true
Is (1 ⋁ 2) smooth : false


In [69]:
println("Is ", tree_formula_string(c1), " deterministic : ", isdeterministic(c1))
t = (lit1 & lit2) | (lit1 & lit2)
println("Is ", tree_formula_string(t), " deterministic : ", isdeterministic(t))

Is ((1 ⋀ 2) ⋁ (-1 ⋀ -2 ⋀ true)) deterministic : true
Is ((1 ⋀ 2) ⋁ (1 ⋀ 2)) deterministic : false


In [70]:
println("Is ", tree_formula_string(c1), " structured decomposable : ", isstruct_decomposable(c1))

Is ((1 ⋀ 2) ⋁ (-1 ⋀ -2 ⋀ true)) structured decomposable : false


In [71]:
iscanonical(c1, 10)

true

4. Transformations

In [72]:
c2 = lit1 | lit2
issmooth(c2)
println(tree_formula_string(c2))
println(tree_formula_string(smooth(c2)))

(1 ⋁ 2)
((1 ⋀ (2 ⋁ -2)) ⋁ (2 ⋀ (1 ⋁ -1)))


In [73]:
c3 = lit1 | lit2
c4 = forget(c3, x -> x == 1)
println(tree_formula_string(c3))
println(tree_formula_string(c4))

(1 ⋁ 2)
(true ⋁ 2)


In [74]:
c5 = conjoin([litn1, litn2, true_node]) | conjoin([true_node, lit2, compile(PlainLogicCircuit, false)])
println(tree_formula_string(c5))
println(tree_formula_string(propagate_constants(c5)))

((-1 ⋀ -2 ⋀ true) ⋁ (true ⋀ 2 ⋀ false))
((-1 ⋀ -2))


In [75]:
c6 = (lit1 | litn1) & (lit2 | litn2)
println(tree_formula_string(c6))
c7 = condition(c6, Lit(1))
println(tree_formula_string(c7))

((1 ⋁ -1) ⋀ (2 ⋁ -2))
(1 ⋀ (2 ⋁ -2))


In [76]:
c8 = disjoin([c6])
c9, _ = split(c8, (c8, c8.children[1]), Var(2))
println(tree_formula_string(c8))
println(tree_formula_string(c9))

(((1 ⋁ -1) ⋀ (2 ⋁ -2)))
(((1 ⋁ -1) ⋀ 2) ⋁ ((1 ⋁ -1) ⋀ -2))


In [77]:
or = lit1 | litn1
and1, and2 = conjoin([or]), conjoin([or])
c10 = and1 | and2
c11 = clone(c10, and1, and2, or)
println(tree_formula_string(c10))
println(tree_formula_string(c11))
println(num_nodes(c10))
println(num_nodes(c11))

(((1 ⋁ -1)) ⋁ ((1 ⋁ -1)))
(((1 ⋁ -1)) ⋁ ((1 ⋁ -1)))
6
7


In [78]:
or1 = lit1 | (litn1 & true_node)
or2 = lit1 | litn1
c12 = disjoin([or1 & or2])
c13 = merge(c12, or1, or2)
println(tree_formula_string(c12))
println(tree_formula_string(c13))

(((1 ⋁ (-1 ⋀ true)) ⋀ (1 ⋁ -1)))
(((1 ⋁ -1) ⋀ (1 ⋁ -1)))


In [79]:
c12 = deepcopy(c11, typemax(Int))
all(isliteralgate, intersect(linearize(c12), linearize(c11)))

true

In [80]:
# struct learn 
c0 = fully_factorized_circuit(PlainLogicCircuit, 5)
println(tree_formula_string(c0))
c1 = struct_learn(c0, maxiter=10)
println(tree_formula_string(c1))

(((1 ⋁ -1) ⋀ (2 ⋁ -2) ⋀ (3 ⋁ -3) ⋀ (4 ⋁ -4) ⋀ (5 ⋁ -5)))
((1 ⋀ -2 ⋀ (3 ⋁ -3) ⋀ 4 ⋀ 5) ⋁ (1 ⋀ -2 ⋀ (3 ⋁ -3) ⋀ -4 ⋀ 5) ⋁ (-1 ⋀ -2 ⋀ 3 ⋀ -4 ⋀ -5) ⋁ (-1 ⋀ -2 ⋀ -3 ⋀ -4 ⋀ -5) ⋁ ((1 ⋁ -1) ⋀ 2 ⋀ -3 ⋀ 4 ⋀ 5) ⋁ ((1 ⋁ -1) ⋀ 2 ⋀ -3 ⋀ -4 ⋀ 5) ⋁ (-1 ⋀ -2 ⋀ (3 ⋁ -3) ⋀ 4 ⋀ -5) ⋁ (-1 ⋀ 2 ⋀ (3 ⋁ -3) ⋀ (4 ⋁ -4) ⋀ -5) ⋁ (1 ⋀ (2 ⋁ -2) ⋀ (3 ⋁ -3) ⋀ (4 ⋁ -4) ⋀ -5) ⋁ ((1 ⋁ -1) ⋀ 2 ⋀ 3 ⋀ (4 ⋁ -4) ⋀ 5) ⋁ (-1 ⋀ -2 ⋀ (3 ⋁ -3) ⋀ (4 ⋁ -4) ⋀ 5))


5. Queries

In [81]:
using DataFrames
r = fully_factorized_circuit(PlainLogicCircuit, 3)
input = DataFrame(BitArray([1 0 1]))
println(tree_formula_string(r))
r(input)

(((1 ⋁ -1) ⋀ (2 ⋁ -2) ⋀ (3 ⋁ -3)))


1-element BitArray{1}:
 1

In [82]:
model_count(r)

8

In [83]:
sat_prob(r)

1//1
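
Every one of the $2^3$ assignments satisfies the fully factorized circuit `r`, which is consistent with the model count of 8 and the satisfaction probability of `1//1` shown above. The sketch below reproduces both numbers by brute-force enumeration in plain Julia. It makes no LogicCircuits calls, and `brute_model_count`, `brute_sat_prob`, `tautology`, and `equal12` are hypothetical names introduced only for illustration.

```julia
# Count satisfying assignments of a Boolean function over `n` variables
# by enumerating all 2^n worlds (only feasible for tiny n).
function brute_model_count(formula, n)
    count(formula(bits) for bits in Iterators.product(fill((false, true), n)...))
end

# Fraction of worlds that satisfy the formula, as an exact rational.
brute_sat_prob(formula, n) = brute_model_count(formula, n) // (2^n)

# Same formula as the fully factorized circuit over 3 variables.
tautology(x) = (x[1] || !x[1]) && (x[2] || !x[2]) && (x[3] || !x[3])

# Same formula as the handcrafted circuit (1 ⋀ 2) ⋁ (-1 ⋀ -2).
equal12(x) = (x[1] && x[2]) || (!x[1] && !x[2])

println(brute_model_count(tautology, 3))
println(brute_sat_prob(tautology, 3))
println(brute_sat_prob(equal12, 2))
```

Enumeration is exponential in the number of variables; the circuit queries above avoid it by exploiting the circuit's structural properties.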

In [84]:
prob_equiv_signature(r, 1)

Dict{Union{UInt32, Node},Array{Rational{BigInt},1}} with 14 entries:
  PlainLiteralNode(14596195639733900933) => Rational{BigInt}[1//2041]
  0x00000002                             => Rational{BigInt}[1//3870]
  Plain⋁Node(8825307834801993994)        => Rational{BigInt}[1//1]
  PlainLiteralNode(5110327579983448590)  => Rational{BigInt}[2040//2041]
  Plain⋀Node(8120995872607302037)        => Rational{BigInt}[1//1]
  PlainLiteralNode(8432648329670027548)  => Rational{BigInt}[1//7228]
  PlainLiteralNode(5468631190534968285)  => Rational{BigInt}[3869//3870]
  0x00000003                             => Rational{BigInt}[1//7228]
  PlainLiteralNode(1837171924962064366)  => Rational{BigInt}[7227//7228]
  Plain⋁Node(1519525625220096303)        => Rational{BigInt}[1//1]
  Plain⋁Node(4856940599559684659)        => Rational{BigInt}[1//1]
  Plain⋁Node(5726244210746914356)        => Rational{BigInt}[1//1]
  PlainLiteralNode(14625622398862940725) => Rational{BigInt}[1//3870]
  0x00000001               

6. Visualizations

In [89]:
using TikzGraphs
v = PlainVtree(10, :balanced)
plot(v)

[Plot output: balanced vtree over variables 1 to 10, with 9 inner nodes; raw TikzPicture source omitted]

In [90]:
lc = fully_factorized_circuit(PlainLogicCircuit, 5)
plot(lc)

[Plot output: fully factorized logic circuit over 5 variables (root ⋁ node, one ⋀ node, five ⋁ nodes over literal pairs); raw TikzPicture source omitted]

## Probabilistic Circuits

In [91]:
using ProbabilisticCircuits

In [92]:
pc = fully_factorized_circuit(ProbCircuit, 5)
uniform_parameters(pc)
plot(pc)

[Plot output: the same structure as a probabilistic circuit with + and * nodes; the root edge is labeled 1.0 and each literal edge 0.5; raw TikzPicture source omitted]

Helper functions

In [93]:
# You can skip this part. It defines helper functions that build partial observations
# from arrays of strings, so that the examples are easier to present.

# Make one observation from a list of strings describing the observation.
#
# For example, ["smoker", "male"] sets
#   1) The mentioned features to the correct values.
#   2) Every feature not mentioned to missing.
FEATURES = 36;
function make_one_observation(obs)
    result = missings(Bool, FEATURES)
    for k in obs
        # Smoking
        if lowercase(k) == "smoker"
            result[7:8] .= [0, 1]
        elseif lowercase(k) == "!smoker"
            result[7:8] .= [1, 0]
        # Gender
        elseif lowercase(k) == "male"
            result[13:14] .= [1, 0]
        elseif lowercase(k) == "female"
            result[13:14] .= [0, 1]
        # Region
        elseif lowercase(k) == "southeast"
            result[9:12] .= [0, 0, 1, 0]
        elseif lowercase(k) == "southwest"
            result[9:12] .= [0, 1, 0, 0]
        # Child
        elseif lowercase(k) == "1-child"
            result[1:6] .= [0,1,0,0,0,0]
        end
    end
    result
end;

function make_observations(obs)
    count = size(obs)[1]
    result = missings(Bool, count, FEATURES)
    for i=1:count
        result[i, :] .= make_one_observation(obs[i])
    end
    DataFrame(result)
end;

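# Randomly hide entries of `d`: each value is kept with probability `keep_prob`
# and replaced by `missing` otherwise. The default of 0.8 is an assumption;
# the original cell left `keep_prob` undefined.
function flip_coin(d::DataFrame; keep_prob=0.8)
    m = Matrix{Union{Missing,Bool}}(Matrix(d))
    drop = rand(num_examples(d), num_features(d)) .> keep_prob
    m[drop] .= missing
    DataFrame(m)
end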

Load data

In [94]:
using CSV
train_x = DataFrame(BitArray(Matrix(CSV.read("insurance/insurance_train_x.csv"))))
println("\"Insurance\" training set has $(num_features(train_x)) variables and $(num_examples(train_x)) samples.")
train_y = CSV.read("insurance/insurance_train_y.csv");

"Insurance" training set has 36 variables and 935 samples.


Here for the purpose of this demo, we load a pretrained probabilistic circuit:

In [95]:
pc = load_prob_circuit(zoo_psdd_file("insurance.psdd"))
println("Probabilistic Circuit with $(num_nodes(pc)) nodes.")

Probabilistic Circuit with 27493 nodes.


### Queries
#### EVI: Complete Evidence Query

When all features are observed, we want to compute the probability of the complete assignment: $$ P(x) $$

In [96]:
log_likelihood_avg(pc, train_x)

-9.71171374929969

#### MAR: Marginal Query (partial evidence)

Now, what happens if we only observe a subset of the features $X^o$? We want to compute:

$$ P(X^o) = \sum_{x^m} P(X^o, x^m) $$

**Problem:** Computing this query is usually intractable, because it sums over exponentially many (or, for continuous variables, infinitely many) possible worlds.

**Good News:** In probabilistic circuits, if the circuit is **smooth** and **decomposable**, we can do this tractably. No need to enumerate all possible worlds.
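
To see why no enumeration is needed, here is a minimal sketch on a toy model: a fully factorized distribution over three binary variables, which is trivially smooth and decomposable. It is plain Julia and does not use the library's `marginal`; the parameters `theta` and the names `toy_marginal` and `brute_marginal` are assumptions made for illustration. A missing variable contributes the total mass of its sum node, which is 1, so the single pass agrees with the explicit sum over all completions.

```julia
# Toy model (an assumption for illustration): three independent binary
# variables, i.e. a fully factorized circuit with one sum node per variable.
theta = [0.2, 0.7, 0.5]                     # theta[i] = P(X_i = true)

# Probability of a complete assignment: product of the per-variable leaves.
joint(x) = prod(x[i] ? theta[i] : 1 - theta[i] for i in eachindex(theta))

# Marginal in one pass: a `missing` variable contributes its sum node's
# total mass (theta + (1 - theta) = 1), so it drops out of the product.
function toy_marginal(obs)
    p = 1.0
    for i in eachindex(theta)
        ismissing(obs[i]) && continue
        p *= obs[i] ? theta[i] : 1 - theta[i]
    end
    return p
end

# Brute force: enumerate every completion of the missing variables.
function brute_marginal(obs)
    total = 0.0
    for bits in Iterators.product(fill((false, true), length(theta))...)
        all(ismissing(obs[i]) || obs[i] == bits[i] for i in eachindex(theta)) || continue
        total += joint(bits)
    end
    return total
end

obs = [true, missing, missing]
println(toy_marginal(obs))
println(brute_marginal(obs))
```

The one-pass version touches each variable once, while the brute-force version visits all $2^n$ worlds.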

In [97]:
marg_data = make_observations([["smoker"], 
                       ["female"], 
                       ["female", "smoker"], 
                       ["southeast", "male", "1-child", "smoker"]],
                    )
prob = exp.(marginal(pc, marg_data))
println("Probability of being smoker? ", prob[1])
println("Probability of being female smoker? ", prob[3])
println("Probability of being male smoker with one child living in the southeast? ", prob[4])

Probability of being smoker? 0.18403563
Probability of being female smoker? 0.096237846
Probability of being male smoker with one child living in the southeast? 0.0009639263


#### CON: Conditional Queries

Given some observations $X^o$, we want to compute probabilities conditioned on the observations:

$$ P(Q \mid X^o) $$

If we can do marginals tractably, we can also do conditionals tractably:

$$ P(Q \mid X^o) = \cfrac{P(Q, X^o)}{P(X^o)} $$
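
As a small worked instance of this ratio, the sketch below uses a hypothetical joint table over two binary variables. The numbers are made up and are not taken from the insurance model, and `marginal_prob` and `conditional` are illustrative names rather than library functions.

```julia
# Hypothetical joint distribution over (smoker, female); the numbers are
# made up for illustration and are not taken from the insurance model.
joint = Dict(
    (true,  true)  => 0.10,
    (true,  false) => 0.08,
    (false, true)  => 0.40,
    (false, false) => 0.42,
)

# Marginal: sum the joint over every world consistent with the evidence.
# `nothing` marks an unobserved variable.
function marginal_prob(smoker, female)
    sum(p for ((s, f), p) in joint
        if (smoker === nothing || s == smoker) && (female === nothing || f == female))
end

# Conditional as a ratio of two marginals: P(Q | X^o) = P(Q, X^o) / P(X^o).
conditional(q_smoker, female) =
    marginal_prob(q_smoker, female) / marginal_prob(nothing, female)

println(conditional(true, true))   # P(smoker | female) under the toy table
```

With a circuit, both marginals in the ratio come from marginal queries, which is what the next cell does with `prob[3] / prob[2]`.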

In [98]:
println(" P('smoker' | 'female') = $(prob[3]/prob[2])")

 P('smoker' | 'female') = 0.20398754


#### MPE: Most Probable Explanation
aka Maximum A Posteriori (MAP)

Given some observations $X^o$, we want to compute the event that is most likely to happen:

$$ \arg\max_{q} P(q \mid X^o) $$

In probabilistic circuits, if the circuit is **deterministic** and **decomposable**, we can do this tractably.
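
Since the MPE cell below is still a TODO, here is a minimal sketch of the idea on a toy fully factorized model, which is deterministic and decomposable. It is plain Julia and not the library's implementation; `theta`, `toy_mpe`, and `brute_mpe` are assumptions made for illustration. Each sum node picks its most likely child unless evidence fixes the variable, and the result matches brute-force search. Maximizing $P(q \mid X^o)$ is the same as maximizing the joint $P(q, X^o)$, because $P(X^o)$ does not depend on $q$.

```julia
theta = [0.2, 0.7, 0.6]   # assumed P(X_i = true); chosen to avoid ties

# Probability of a complete assignment under the factorized model.
joint(x) = prod(x[i] ? theta[i] : 1 - theta[i] for i in eachindex(theta))

# Max-product pass: each sum node picks its most likely child,
# unless evidence already fixes the variable.
function toy_mpe(obs)
    [ismissing(obs[i]) ? theta[i] >= 0.5 : obs[i] for i in eachindex(theta)]
end

# Brute force: search all completions consistent with the evidence.
function brute_mpe(obs)
    best, best_p = nothing, -1.0
    for bits in Iterators.product(fill((false, true), length(theta))...)
        all(ismissing(obs[i]) || obs[i] == bits[i] for i in eachindex(theta)) || continue
        p = joint(bits)
        if p > best_p
            best, best_p = collect(bits), p
        end
    end
    return best
end

obs = [true, missing, missing]
println(toy_mpe(obs))
println(brute_mpe(obs))
```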

In [99]:
# TODO

#### Advanced queries: expected predictions
What about reasoning about predictive models, such as regression models?


We are interested in computing **expected predictions**

- Appears all the time in machine learning, such as handling missing data
- We can do this tractably!


$$ \Large \mathbb{E}_{\mathbf{x}^m\ \sim\ p(\mathbf{x}^m\ \mid\ \mathbf{x}^o )}\left[\ f( \mathbf{x}^o, \mathbf{x}^m) \ \right] $$

- In the above equation, $ \mathbf{x}^m $ = missing features, and $ \mathbf{x}^o $ = observed features.

- We have two separate models $p$ and $f$.

- Expected predictions are useful for:
  - Handling missing values at test time
  - Reasoning about behaviour of predictive models
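
The simplest instance of an expected prediction can be checked by hand. The sketch below assumes a fully factorized density $p$ and a linear regressor $f$; both are toy stand-ins with made-up parameters (`theta`, `w`, `b`), not the circuits loaded in this notebook, and the library's `Expectation` handles far more general pairs of models. In this special case the expectation reduces to plugging in the mean of each missing feature, which the brute-force average confirms.

```julia
theta = [0.2, 0.7, 0.6]          # assumed P(X_i = true) for the density p
w, b  = [3.0, -1.0, 2.0], 0.5    # assumed weights and bias of the regressor f

# Linear regressor on a complete assignment.
f(x) = b + sum(w[i] * x[i] for i in eachindex(w))

# Closed form for this toy case: replace each missing feature by its mean.
expected_f(obs) =
    b + sum(w[i] * (ismissing(obs[i]) ? theta[i] : obs[i]) for i in eachindex(w))

# Brute force: average f over every completion, weighted by p(x^m | x^o).
function brute_expected_f(obs)
    num, den = 0.0, 0.0
    for bits in Iterators.product(fill((false, true), length(theta))...)
        all(ismissing(obs[i]) || obs[i] == bits[i] for i in eachindex(theta)) || continue
        p = prod(bits[i] ? theta[i] : 1 - theta[i] for i in eachindex(theta))
        num += p * f(bits)
        den += p
    end
    return num / den
end

obs = [true, missing, missing]
println(expected_f(obs))
println(brute_expected_f(obs))
```

The mean-imputation shortcut is exact only because $f$ is linear and the features are independent here; for general models it is an approximation, which is why a dedicated expectation algorithm is needed.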

In [100]:
rc = load_logistic_circuit(zoo_lc_file("insurance.circuit"), 1)
println("Regression Circuit with $(num_nodes(rc)) nodes.")

Regression Circuit with 1076 nodes.


##### Sample Queries
1. How different are the insurance costs between smokers and non smokers?

In [101]:
data = make_observations([["!smoker"], 
                 ["smoker"]])
exps, exp_cache = Expectation(pc, rc, data)
println("Smoker    : \$ $(exps[2])");
println("Non-Smoker: \$ $(exps[1])");
println("Difference: \$ $(exps[2] - exps[1])");

Smoker    : $ 31355.332794478192
Non-Smoker: $ 8741.747204995649
Difference: $ 22613.585589482544


2. Is the predictive model biased by gender?

In [102]:
data = make_observations([["male"],
                 ["female"]])
exps, exp_cache = Expectation(pc, rc, data);
println("Female  : \$ $(exps[2])");
println("Male    : \$ $(exps[1])");
println("Diff    : \$ $(exps[2] - exps[1])");

Female  : $ 14170.128975800484
Male    : $ 13196.549488456776
Diff    : $ 973.579487343708


3. Expectation and standard deviation of a few subpopulations

In [103]:
data = make_observations( [["southeast", "male", "1-child", "smoker"], 
                 ["southwest", "male", "1-child", "smoker"]])
exps, exp_cache = Expectation(pc, rc, data);
# Computes the second moment
mom2, mom_cache = Moment(pc, rc, data, 2);
# Computing Standard Deviation
stds = sqrt.( mom2 - exps.^2 );
# Living in South East, Smoker, Male, One child
println("mu: $(round(exps[1])), std = $(round(stds[1]))")

mu: 30975.0, std = 11229.0
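
The standard deviation in the cell above relies on the identity $\mathrm{Var}[f] = \mathbb{E}[f^2] - (\mathbb{E}[f])^2$. A minimal check of that identity on made-up numbers, unrelated to the insurance data, with all variable names chosen only for illustration:

```julia
# Made-up predictions and their probabilities (illustrative only).
vals  = [10.0, 20.0, 40.0]
probs = [0.5, 0.3, 0.2]

mom1 = sum(probs .* vals)        # first moment, E[f]
mom2 = sum(probs .* vals .^ 2)   # second moment, E[f^2]

# Standard deviation from the two moments, as in the cell above.
std_from_moments = sqrt(mom2 - mom1^2)

# Direct definition for comparison: sqrt(E[(f - E[f])^2]).
std_direct = sqrt(sum(probs .* (vals .- mom1) .^ 2))

println(std_from_moments)
println(std_direct)
```

Because the two moments are subtracted, rounding error can make the difference slightly negative when the variance is close to zero, so clamping at zero before taking the square root is a reasonable precaution.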


## Benchmarks
#### Load circuit and data

In [113]:
using BenchmarkTools
pc = zoo_psdd("plants.psdd")
println("Loaded a circuit with $(num_nodes(pc)) nodes and $(num_parameters(pc)) parameters.")
data, _, _ = twenty_datasets("plants")
println("Loaded data with $(num_features(data)) features and $(num_examples(data)) examples.")

Loaded a circuit with 153021 nodes and 91380 parameters.
Loaded data with 69 features and 17412 examples.


#### Does the data satisfy the logical constraints?

In [109]:
@btime satisfies(pc, data);

  242.001 ms (926339 allocations: 189.61 MiB)


##### GPU computation

In [115]:
gpu_data = to_gpu(data)
@btime satisfies(pc, gpu_data);

  102.614 ms (930755 allocations: 57.47 MiB)


#### EVI queries

In [116]:
@btime EVI(pc, data);

  529.217 ms (926493 allocations: 328.13 MiB)


##### GPU computation

In [117]:
@btime EVI(pc, gpu_data);

  95.877 ms (933369 allocations: 63.70 MiB)


#### MAR queries

In [None]:
@btime MAR(pc, data)