# Learning as Conditional Inference

March 2015  
[Wannes Meert](mailto:wannes.meert@cs.kuleuven.be)  
[Anton Dries](mailto:anton.dries@cs.kuleuven.be)

Based on [Church](http://projects.csail.mit.edu/church/wiki/Church)'s [ProbMods book](https://probmods.org/learning-as-conditional-inference.html).

In [2]:
import sys, os

sys.path.append(os.path.abspath('../..'))

## Example: Learning About Coins

This example computes the probability that a coin is fair given a sequence of observations. It applies Bayes' rule:
\begin{equation*}
    Pr(fair \mid data) = \frac{Pr(data \mid fair) \cdot Pr(fair)}{Pr(data)}
\end{equation*}

where $Pr(data \mid fair)$ is the expression modelled by the ProbLog program, and $Pr(fair \mid data)$ is obtained using the `query/1` and `evidence/2` predicates.

In [23]:
%%script problog-cli.py

%% Prior
0.999::fair_coin.
%% Normal and biased coin
0.50::coin(h,T) ; 0.50::coin(t,T) :-   fair_coin.
0.95::coin(h,T) ; 0.05::coin(t,T) :- \+fair_coin.

%% Probability of a sequence of coin tosses
tosses(Cs) :- tosses(Cs,0).
tosses([],T).
tosses([C|R],T) :-
    coin(C,T),
    Tn is T + 1,
    tosses(R,Tn).

%% What is the observed sequence of coin tosses?
evidence(tosses([h,h,h,h,h]), true).
%evidence(tosses([h,h,h,h,h,h,h,h,h,h]), true).
%evidence(tosses([h,h,h,h,h,h,h,h,h,h,h,h,h,h,h]), true).

query(fair_coin).

	fair_coin : 0.9758137
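
As a cross-check, the same posterior can be computed by hand. The following is a minimal Python sketch, not part of ProbLog; the function name `posterior_fair` is made up for illustration, and the numbers are taken from the program above. It also evaluates the longer sequences of 10 and 15 heads that are commented out in the program.

```python
def posterior_fair(n_heads, prior_fair=0.999, p_heads_fair=0.5, p_heads_biased=0.95):
    """Pr(fair | n_heads heads in a row), by direct application of Bayes' rule."""
    # Pr(data | fair) * Pr(fair)
    joint_fair = prior_fair * p_heads_fair ** n_heads
    # Pr(data | not fair) * Pr(not fair)
    joint_biased = (1.0 - prior_fair) * p_heads_biased ** n_heads
    # Pr(data) is the sum of both joint probabilities
    return joint_fair / (joint_fair + joint_biased)


for n in (5, 10, 15):
    print(n, posterior_fair(n))
```

For five heads this reproduces the ProbLog result; with more heads in a row the posterior probability of a fair coin drops.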


## Learning a Continuous Parameter

Suppose that the bias of the coin is a continuous number. What is the posterior distribution over this bias given a sequence of observations? The result can be used to update our prior belief about the bias of the coin.

In [32]:
%%script problog-cli.py

%% Uniform prior on coin weights (cw)
%% Discretised for ProbLog (TODO: can we do better?)
P::cw(0) ; P::cw(0.25) ; P::cw(0.5) ; P::cw(0.75) ; P::cw(1.0) :- P is 1.0/5.
%% Normal and biased coin
Ph::coin(h,T) ; Pt::coin(t,T) :- cw(Ph), Pt is 1.0-Ph.

%% Probability of a sequence of coin tosses
tosses(Cs) :- tosses(Cs,0).
tosses([],T).
tosses([C|R],T) :-
    coin(C,T),
    Tn is T + 1,
    tosses(R,Tn).

%% What is the observed sequence of coin tosses?
evidence(tosses([h,h,h,h,h]), true).

query(cw(V)).

	   cw(0) : 0
	cw(0.25) : 0.00076923077
	 cw(0.5) : 0.024615385
	cw(0.75) : 0.18692308
	 cw(1.0) : 0.78769231
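
The discretised posterior can also be reproduced outside ProbLog. The sketch below is plain Python written for illustration (the helper `grid_posterior` is an assumed name, not a ProbLog function): it multiplies each candidate weight's prior by the likelihood of the observed tosses and normalises.

```python
def grid_posterior(weights, prior, tosses):
    """Posterior over candidate coin weights given a list of 'h'/'t' tosses."""
    unnormalised = []
    for w, p in zip(weights, prior):
        likelihood = 1.0
        for toss in tosses:
            likelihood *= w if toss == 'h' else 1.0 - w
        unnormalised.append(p * likelihood)
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


weights = [0.0, 0.25, 0.5, 0.75, 1.0]
prior = [1.0 / len(weights)] * len(weights)  # uniform prior, as in the program
posterior = grid_posterior(weights, prior, ['h'] * 5)
for w, p in zip(weights, posterior):
    print("cw({}) : {:.8f}".format(w, p))
```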


### Bayesian Learning

We can integrate Bayesian learning into the ProbLog system by combining ProbLog and Python.

First, we can sample from a simple program to generate data:

In [1]:
!cat biased_coin_sample.pl


0.9::coin(h) ; 0.1::coin(t).

query(coin(S)).



In [30]:
!../../examples/example_sampling_alt.py -N 4 biased_coin_sample.pl

coin(h).
% Probability: 0.9
coin(h).
% Probability: 0.9
coin(h).
% Probability: 0.9
coin(h).
% Probability: 0.9


Or we can sample a list of random length:

In [12]:
!cat biased_coin_sample2.pl


0.9::coin(h,T) ; 0.1::coin(t,T).

0.1::stop(T).

tosses(C) :- tosses(C,0).
tosses([],T) :- stop(T).
tosses([H|R],T) :- \+stop(T), coin(H,T), Tn is T+1, tosses(R,Tn).

query(tosses(C)).



In [15]:
!../../examples/example_sampling_alt.py -N 1 biased_coin_sample2.pl

tosses([h, h, h, h, h, h, h, h, t, h, h, h, h, h]).
% Probability: 0.00022358488


Then we can condition on an increasing number of sampled observations and apply the same Bayesian approach as above:

**TODO**: Use Python output of ProbLog

In [None]:
from problog import get_evaluatable
from problog.program import PrologString

# TODO: integrate ProbLog better into Python
# Discretised uniform prior over nine coin weights, plus the toss model
model_str = """
P::cw(0); P::cw(0.125); P::cw(0.25); P::cw(0.375); P::cw(0.5); P::cw(0.625); P::cw(0.75);P::cw(0.875); P::cw(1.0) :- P is 1.0/9.
Ph::coin(h,T) ; Pt::coin(t,T) :- cw(Ph), Pt is 1.0-Ph.

tosses(Cs) :- tosses(Cs,0).
tosses([],T).
tosses([C|R],T) :-
    coin(C,T),
    Tn is T + 1,
    tosses(R,Tn).

query(cw(V)).

"""

# Program that generates the data (not used yet, see the TODO below)
sample_str = """
0.9::coin(h,T) ; 0.1::coin(t,T).
"""

results = []

#for observed_data_size in [1, 3, 6, 10, 20, 30, 50, 70, 100]:
for observed_data_size in [1]:
    # TODO: grab sampled data from sample_str
    observed_data = "[h,t,t,h,h,h,t,h,h,t,h,h,h]"
    observation_str = "evidence(tosses({}), true).\n".format(observed_data)
    model = PrologString(model_str+observation_str)
    # PrologString has no evaluate() method: compile the program into an
    # evaluatable form first and evaluate that
    result = get_evaluatable().create_from(model).evaluate()
    results.append(result)
    print(result)

#TODO: plot results
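
Until the sampled data is wired in, the intended learning curve can be sketched in plain Python. This is an illustration under stated assumptions and not the ProbLog pipeline: Python's `random` module stands in for the ProbLog sampler, the true bias of 0.9 is taken from `sample_str`, and the posterior is updated toss by toss so that each posterior serves as the prior for the next observation.

```python
import random

random.seed(0)                               # fixed seed so the run is repeatable
true_weight = 0.9                            # same bias as in sample_str
weights = [i / 8.0 for i in range(9)]        # 0, 0.125, ..., 1.0
belief = [1.0 / len(weights)] * len(weights) # uniform prior
checkpoints = {1, 3, 6, 10, 20, 30, 50, 70, 100}
curve = []                                   # (number of tosses, posterior mean)

for n in range(1, 101):
    toss = 'h' if random.random() < true_weight else 't'
    # Bayesian update: the previous posterior acts as the prior for this toss
    belief = [b * (w if toss == 'h' else 1.0 - w)
              for w, b in zip(weights, belief)]
    total = sum(belief)
    belief = [b / total for b in belief]
    if n in checkpoints:
        curve.append((n, sum(w * b for w, b in zip(weights, belief))))

for n, mean in curve:
    print(n, round(mean, 3))
```

The list `curve` holds the posterior mean of the coin weight at each data size and could be passed to a plotting library.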

### Beta prior



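The prior (discretized). The reported values are beta density values at the grid points, not normalized probabilities, which is why the value for `cw(0.5)` exceeds 1: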

In [14]:
%%script problog-cli.py

:- load_external('beta.py').

%% Prior on coin weights (cw)
%% TODO: make better/easier
P0::cw(0) ; P25::cw(0.25) ; P50::cw(0.5) ; P75::cw(0.75) ; P100::cw(1.0) :-
    call_external(beta_pdf(10,10,[0.0,0.25,0.50,0.75,1.00]), R),
    [P0,P25,P50,P75,P100] = R.

query(cw(V)).

	   cw(0) : 0
	cw(0.25) : 0.26459401
	 cw(0.5) : 3.523941
	cw(0.75) : 0.26459401
	 cw(1.0) : 0


The beta distribution is loaded from Python with:

In [1]:
!cat beta.py

from scipy.stats import beta

def beta_pdf(a, b, values):
    pdf = beta.pdf(values, a, b)
    return pdf.tolist()


*The discretisation is ugly now.*

**TODO**: figure out easier syntax/method

<div style="background-color:#FB9496;margin:5px 0;padding:3px;">Proposal:</div>

    _::cw(_) :- call_external(beta_pdf(10,10,0.25), R), expandhead(R).

- beta_pdf returns binary tuples [(prob,real)]
- `expandhead/1` is a predicate that maps the head atom to the given list where the number of elements in a tuple match the underscores in the head atom.
- How similar is this internally to `P::a :- P is 1/2.` ?


And after observing evidence:

In [11]:
%%script problog-cli.py

:- load_external('beta.py').

%% Prior on coin weights (cw)
P0::cw(0) ; P25::cw(0.25) ; P50::cw(0.5) ; P75::cw(0.75) ; P100::cw(1.0) :-
    call_external(beta_pdf(10,10,[0.0,0.25,0.50,0.75,1.00]), R),
    [P0,P25,P50,P75,P100] = R.

%% Normal and biased coin
Ph::coin(h,T) ; Pt::coin(t,T) :- cw(Ph), Pt is 1.0-Ph.

%% Probability of a sequence of coin tosses
tosses(Cs) :- tosses(Cs,0).
tosses([],T).
tosses([C|R],T) :-
    coin(C,T),
    Tn is T + 1,
    tosses(R,Tn).

%% What is the observed sequence of coin tosses?
evidence(tosses([h,h,h,h,h]), true).

query(cw(V)).

	   cw(0) : 0
	cw(0.25) : 0.0014921243
	 cw(0.5) : 0.63592166
	cw(0.75) : 0.36258621
	 cw(1.0) : 0
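
Both the prior values and this posterior can be checked with a few lines of Python. The sketch below is only an illustration: it computes the beta density with `math.lgamma` from the standard library as a stand-in for `scipy.stats.beta`, and the local `beta_pdf` takes a single point rather than a list, unlike the one in `beta.py`.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, assuming a, b > 1 (density is 0 at x = 0 and x = 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1.0 - x))


weights = [0.0, 0.25, 0.5, 0.75, 1.0]
density = [beta_pdf(w, 10, 10) for w in weights]        # unnormalised prior weights
joint = [d * w ** 5 for w, d in zip(weights, density)]  # times likelihood of five heads
posterior = [j / sum(joint) for j in joint]

for w, d, p in zip(weights, density, posterior):
    print("cw({}) : prior density {:.6f}, posterior {:.8f}".format(w, d, p))
```

Because the posterior is normalised over the grid, it does not matter that the prior weights are densities rather than probabilities.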


### Example: Estimating Causal Power

...

In [19]:
%%script problog-cli.py

%% causal power of C to cause E (prior)
P::cpw(0) ; P::cpw(0.25) ; P::cpw(0.5) ; P::cpw(0.75) ; P::cpw(1.0) :- P is 1.0/5.
%% background probability of E (prior)
P::bw(0) ; P::bw(0.25) ; P::bw(0.5) ; P::bw(0.75) ; P::bw(1.0) :- P is 1.0/5.

%P::cp(T) :- cpw(P).
%P::b(T) :- bw(P).
%e_if_c(C,T) :- cp(T), C=true.
%e_if_c(C,T) :- b(T).

P::e_if_c(C,T) :- cpw(P), C=true.
P::e_if_c(C,T) :- bw(P).

% TODO: represent as list?
evidence(e_if_c(true,0),  true).
evidence(e_if_c(true,1),  true).
evidence(e_if_c(false,2), false).
evidence(e_if_c(true,3),  true).


query(cpw(V)).

	   cpw(0) : 0.03546988
	cpw(0.25) : 0.066048193
	 cpw(0.5) : 0.13551807
	cpw(0.75) : 0.26946988
	 cpw(1.0) : 0.49349398


We can express this in a more ProbLog-like way, which results in a model that is much easier to comprehend than the Church version:

(It is also slower; too many variables?)

In [21]:
%%script problog-cli.py

%% causal power of C to cause E (prior)
P::cpw(0) ; P::cpw(0.25) ; P::cpw(0.5) ; P::cpw(0.75) ; P::cpw(1.0) :- P is 1.0/5.
%% background probability of E (prior)
P::bw(0) ; P::bw(0.25) ; P::bw(0.5) ; P::bw(0.75) ; P::bw(1.0) :- P is 1.0/5.

0.5::c(T). % Prior on c. Will not be important because fully observed.

P::e(T) :- cpw(P), c(T).
P::e(T) :- bw(P).

% TODO: represent as list?
evidence(c(0), true).
evidence(e(0), true).
evidence(e(1), true).
evidence(c(1), true).
evidence(e(2), false).
evidence(c(2), false).
evidence(e(3), true).
evidence(c(3), true).

query(cpw(V)).

	   cpw(0) : 0.03546988
	cpw(0.25) : 0.066048193
	 cpw(0.5) : 0.13551807
	cpw(0.75) : 0.26946988
	 cpw(1.0) : 0.49349398
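
In both programs, the two clauses with the same head combine as a noisy-or: the effect occurs unless both the causal power (when the cause is present) and the background fail to produce it. The following Python sketch, written for illustration only, enumerates the grid of causal and background weights and sums out the background weight.

```python
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
# (cause present, effect observed) pairs matching the evidence for T = 0..3
observations = [(True, True), (True, True), (False, False), (True, True)]


def likelihood(cp, b, observations):
    """Probability of the observations for causal power cp and background rate b."""
    result = 1.0
    for c, e in observations:
        # noisy-or: the effect is absent only if every active cause fails
        p_e = 1.0 - (1.0 - b) * ((1.0 - cp) if c else 1.0)
        result *= p_e if e else 1.0 - p_e
    return result


# Uniform priors on both weights cancel out; sum over the background weight b
unnormalised = [sum(likelihood(cp, b, observations) for b in grid) for cp in grid]
total = sum(unnormalised)
posterior = [u / total for u in unnormalised]

for cp, p in zip(grid, posterior):
    print("cpw({}) : {:.8f}".format(cp, p))
```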


## Grammar-based Concept Induction


### Example: Inferring an Arithmetic Expression

For this we would again integrate ProbLog and Python: ProbLog for the probabilistic reasoning and Python for the arithmetic expressions (in principle, everything available in Python can be used).

In [43]:
!cat arithmetic_expression_sample.pl

0.7::leaf(T).
0.5::operator('+',T) ; 0.5::operator('-',T).
Px::l('x',T); P::l(0,T) ; P::l(1,T) ; P::l(2,T) ; P::l(3,T) ; P::l(4,T) ; P::l(5,T) ; P::l(6,T) ; P::l(7,T) ; P::l(8,T) ; P::l(9,T) :- P is 0.5/10, Px is 0.5.

expr(L) :- expr(L,0,Tr).
expr([L,T],T,T) :- leaf(T), l(L,T).
expr([L,[O,T],R],T,Tr) :-
    \+leaf(T), operator(O,T),
    Tn1 is T+1, expr(L,Tn1,Tr1),
    Tn2 is Tr1+1, expr(R,Tn2,Tr).

query(expr(E)).



In [40]:
!../../examples/example_sampling_alt.py -N 4 arithmetic_expression_sample.pl

expr(['x', 0]).
% Probability: 0.35
expr(['x', 0]).
% Probability: 0.35
expr([['x', 1], ['+', 0], [3, 2]]).
% Probability: 0.0018375
expr([[[5, 2], ['+', 1], [[[8, 5], ['-', 4], ['x', 6]], ['+', 3], ['x', 7]]], ['+', 0], [3, 8]]).
% Probability: 2.6589199e-09


**TODO**: Parse the sampled expressions and keep those that evaluate to 3 for x=1.
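
One possible way to do the parsing and filtering is sketched below. It assumes the sampler prints one `expr(...)` term per line in exactly the format shown above, which happens to be valid Python literal syntax; the helper names `parse_sample` and `evaluate` are made up for this illustration, and the third sample in the usage example is hypothetical.

```python
import ast


def parse_sample(line):
    """Turn a sampler output line such as "expr(['x', 0])." into nested lists."""
    text = line.strip()
    assert text.startswith("expr(") and text.endswith(").")
    return ast.literal_eval(text[len("expr("):-len(").")])


def evaluate(expr, x):
    """Evaluate a sampled expression tree for a given value of x."""
    if len(expr) == 2:                       # leaf: [value, node_id]
        value = expr[0]
        return x if value == 'x' else value
    left, (op, _node_id), right = expr       # inner node: [left, [operator, node_id], right]
    a, b = evaluate(left, x), evaluate(right, x)
    return a + b if op == '+' else a - b


samples = [
    "expr(['x', 0]).",
    "expr([['x', 1], ['+', 0], [3, 2]]).",
    "expr([['x', 1], ['+', 0], [2, 2]]).",   # hypothetical sample, added for illustration
]
matching = [s for s in samples if evaluate(parse_sample(s), x=1) == 3]
print(matching)
```

Filtering samples in this way is rejection sampling: only expressions consistent with the observation are kept.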

### Example: Rational Rules

*Note*: this example shows that psychologists assumed that we learn concepts by combining logical concepts. Unfortunately, such deterministic rule-based systems did not generalize well. Researchers later turned towards probabilistic models, which predicted behavioral data very well but lacked compositional conceptual structure. The point of Church is that it combines both. We should be able to do the same in ProbLog, no?

**TODO**: Isn't this a good application for ProbFoil? We could sample random DNF formulas but it is more interesting to learn a ProbLog theory directly from the given data.

## More Information

See the [ProbLog website](https://dtai.cs.kuleuven.be/problog).