In [1]:
import numpy as np

### Factors: What are they?
Earlier, the conditional probability tables (e.g. $p(A|B,C)$) were referred to as factors. But factors are actually a much more general way of defining a distribution, so they need a proper introduction: <br>
In the most basic formulation, factors are a way of writing a probability as a product of functions of the variables. By their nature they constrain what the probability function can be. For instance, say I have 3 binary variables $A$, $B$, $C$. Say I also define two functions, $\phi_1(A,B)$ and $\phi_2(B,C)$, which we call factors. Then we can describe a probability function with these, say: <br>
$$p(A,B,C) = \frac{1}{Z} \phi_1(A,B)\phi_2(B,C)$$ <br>
where $Z$ is a normalizing constant chosen so that the joint probability sums to 1. By definition, $Z=\sum_A \sum_B \sum_C \phi_1(A,B)\phi_2(B,C)$ <br>
The multiplication here is called factor multiplication, but it behaves just like normal multiplication. <br>
**An example with the above: (note, factors can be any non-negative function)** <br>
$\phi_1(0,0)=2$ <br>
$\phi_1(0,1)=0$ <br>
$\phi_1(1,0)=1$ <br>
$\phi_1(1,1)=1$ <br>

$\phi_2(0,0)=0$ <br>
$\phi_2(0,1)=1$ <br>
$\phi_2(1,0)=2$ <br>
$\phi_2(1,1)=1$ <br>

Then we can calculate $Z$:

In [2]:
# Each factor is stored as a lookup table from variable assignments to values.
def phi_1(A, B):
    table = {(0, 0): 2, (0, 1): 0, (1, 0): 1, (1, 1): 1}
    return table[(A, B)]

def phi_2(B, C):
    table = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 1}
    return table[(B, C)]

# Z is the sum of the factor product over every joint assignment of (A, B, C).
total = 0
for A in [0, 1]:
    for B in [0, 1]:
        for C in [0, 1]:
            total += phi_1(A, B) * phi_2(B, C)

print(total)

6


Therefore $Z=6$, and the normalization factor of $p(A,B,C)$ is $\frac{1}{6}$. <br>
We can also calculate this by conveniently pushing the sums inside the product, which is valid because $\phi_1(A,B)$ does not depend on $C$:<br>
$Z=\sum_A \sum_B \sum_C \phi_1(A,B)\phi_2(B,C)$ <br>
$Z=\sum_A \sum_B \phi_1(A,B) \sum_C \phi_2(B,C)$ <br>

$\sum_C \phi_2(0,C) = 1$ <br>
$\sum_C \phi_2(1,C) = 3$<br>

$\phi_1(0,0) \times 1 = 2\times 1 = 2$<br>
$\phi_1(0,1) \times 3 = 0\times 3 = 0$<br>
$\phi_1(1,0) \times 1 = 1\times 1 = 1$<br>
$\phi_1(1,1) \times 3 = 1\times 3 = 3$<br>

$ \sum_A \sum_B \sum_C \phi_1(A,B)\phi_2(B,C) = 2+0+1+3 = 6 $
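
The same comparison can be sketched with NumPy arrays. This is only an illustration under my own assumptions: the array names `phi1` and `phi2` and the index convention (`phi1[a, b]`, `phi2[b, c]`) are not from the text above, they just store the example tables.

```python
import numpy as np

# Example factor tables (assumed layout): phi1[a, b] = phi_1(A=a, B=b), phi2[b, c] = phi_2(B=b, C=c)
phi1 = np.array([[2, 0],
                 [1, 1]])
phi2 = np.array([[0, 1],
                 [2, 1]])

# Brute force: build the full 2x2x2 table of products, then sum every entry
Z_brute = np.einsum('ab,bc->abc', phi1, phi2).sum()

# Pushed-in sums: first sum phi_2 over C, then weight phi_1 by the result
sum_over_C = phi2.sum(axis=1)          # one value per setting of B
Z_pushed = (phi1 * sum_over_C).sum()   # broadcasting multiplies along the B axis

print(Z_brute, Z_pushed)
```

Both routes give the same $Z$, but the second never builds the full joint table, which is what makes pushing sums inward useful when there are many variables.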

So, probability functions can be defined by factors. However, you often have to calculate the normalization $Z$, which can be intractable because the sum ranges over every joint assignment of the variables.  <br>
We can write any probability distribution as a product of factors, and the way it factorizes conveys conditional independence information. For instance, in the above case: 
$$p(A,B,C) = \frac{1}{6} \phi_1(A,B)\phi_2(B,C)$$ <br>
If we know $B$, then $A$ and $C$ are independent. For example, fixing $B=0$:
$$p(A,B=0,C) = \frac{1}{6} \phi_1(A,0)\phi_2(0,C)$$ <br>
If we define $\tau_1(A) = \phi_1(A,0)$ and $\tau_2(C) = \phi_2(0,C)$, then:
$$p(A,B=0,C) = \frac{1}{6} \tau_1(A)\tau_2(C)$$ <br>
The expression splits into a function of $A$ alone times a function of $C$ alone, and the same holds for $B=1$. Thus the above factor description of a probability distribution conveys conditional independence between $A$ and $C$ given $B$. In that sense it is the graph: $A-B-C$ <br>
There are some nice proofs that make this rigorous, but I omit them here.
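
As a numerical sanity check (not a proof), the sketch below builds the joint table from the example factors and tests whether $p(A,C|B=b)$ equals $p(A|B=b)\,p(C|B=b)$ for each $b$. The array layout and variable names are my own assumptions.

```python
import numpy as np

# Example factor tables (assumed layout): phi1[a, b], phi2[b, c]
phi1 = np.array([[2, 0], [1, 1]])
phi2 = np.array([[0, 1], [2, 1]])

# Normalized joint table: joint[a, b, c] = p(A=a, B=b, C=c)
joint = np.einsum('ab,bc->abc', phi1, phi2).astype(float)
joint /= joint.sum()

# Given B=b, compare p(A, C | B=b) with the product of its own marginals
independent_given_B = []
for b in (0, 1):
    p_ac = joint[:, b, :] / joint[:, b, :].sum()   # p(A, C | B=b)
    p_a = p_ac.sum(axis=1)                         # p(A | B=b)
    p_c = p_ac.sum(axis=0)                         # p(C | B=b)
    independent_given_B.append(bool(np.allclose(p_ac, np.outer(p_a, p_c))))

# Without conditioning on B, the same test is applied to p(A, C)
p_ac_marginal = joint.sum(axis=1)
independent_marginally = bool(np.allclose(
    p_ac_marginal,
    np.outer(p_ac_marginal.sum(axis=1), p_ac_marginal.sum(axis=0))))

print(independent_given_B, independent_marginally)
```

For these particular tables the factorization holds for both values of $B$, while $A$ and $C$ are not independent once $B$ is summed out.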

### 3 operations
There are 3 operations which can be done on factors: 
1. Factor multiplication. Defined: $\phi_1(A,B)\cdot \phi_2(B,C)=(\phi_1\cdot \phi_2)(A,B,C)=\phi_1(A,B)\phi_2(B,C)$. As shown above, this works just like normal multiplication, and it limits the possible functions which can describe $p(A,B,C)$: a distribution in which $A$ and $C$ are dependent even given $B$ cannot be written in the above form. 
2. Marginalization. This just means summing out the values of one particular variable. E.g., if you have the joint $p(A,B)$ but only want $p(A)$, you sum out $B$: $p(A)=\sum_B p(A,B)$.
3. Conditioning. This refers to the process of taking a factor and reducing it using observed ("seen") values, i.e. $\phi_1(A,B)$ becomes some $\tau_1(A)$ when $B$ is known. Conditioning returns a function in which certain arguments are fixed. E.g. $f(x,y) = x^2 + y^2$ becomes $g(x) = x^2 + 4$ when $y=2$.
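
The three operations can be sketched on table-valued factors with plain NumPy indexing. This is a minimal illustration using the example tables from earlier; storing a factor as an array with one axis per variable is my own assumption, not something the text prescribes.

```python
import numpy as np

# Example factor tables (assumed layout): phi1[a, b], phi2[b, c]
phi1 = np.array([[2, 0], [1, 1]])
phi2 = np.array([[0, 1], [2, 1]])

# 1. Factor multiplication: align the shared B axis and multiply entrywise,
#    giving a table over all three variables, product[a, b, c]
product = phi1[:, :, None] * phi2[None, :, :]

# 2. Marginalization: sum out B (axis 1), leaving a factor over (A, C)
tau_ac = product.sum(axis=1)

# 3. Conditioning: fix B=0 in phi_1, leaving a factor over A only
tau1 = phi1[:, 0]

print(product.shape)
print(tau_ac)
print(tau1)
```

Note that none of these results is normalized: they are factors, and only become probabilities after dividing by the appropriate sum.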

Factors are a very powerful way of describing a probability distribution. We can easily turn any expanded probability distribution into a product of factors, e.g.: <br>
*Normal (chain rule) expansion:*
$$p(A,B,C)=p(C|A,B)p(B|A)p(A)$$
*With conditional independence assumptions ($C$ independent of $B$ given $A$, and $B$ independent of $A$):*
$$p(A,B,C)=p(C|A)p(B)p(A)$$
*Factor representation which accounts for the same conditions:*
$$p(A,B,C)=\frac{1}{Z} \phi_1(C,A)\phi_2(B)\phi_3(A)$$

*In the case that $\phi_1(C,A) = p(C|A)$, and $\phi_2(B)$, $\phi_3(A)$ each sum to 1 (so they are the distributions $p(B)$ and $p(A)$)*:
$$p(A,B,C)=\phi_1(C,A)\phi_2(B)\phi_3(A)$$

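This is why factors in a conditional diagram are simply the conditional probabilities. There is no need to calculate $Z$, as it equals 1. To see this, push the sums inside the product as was done for $Z$ earlier: $Z=\sum_A \phi_3(A) \sum_B \phi_2(B) \sum_C \phi_1(C,A) = 1$, because $\sum_C \phi_1(C,A)=1$ for every value of $A$, and the remaining two sums each equal 1.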
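
A quick numerical check of the $Z=1$ claim is sketched below. The probability values are made up purely for illustration (they do not come from the text), and the array names are hypothetical.

```python
import numpy as np

# Made-up illustrative distributions (assumptions, not from the text)
p_A = np.array([0.3, 0.7])             # phi_3(A) = p(A)
p_B = np.array([0.6, 0.4])             # phi_2(B) = p(B)
p_C_given_A = np.array([[0.9, 0.1],    # row a=0: p(C | A=0)
                        [0.2, 0.8]])   # row a=1: p(C | A=1)

# Factor product over all assignments: joint[a, b, c] = p(C=c|A=a) p(B=b) p(A=a)
joint = np.einsum('ac,b,a->abc', p_C_given_A, p_B, p_A)

# The normalization constant is the sum over every joint assignment
Z = joint.sum()
print(Z)
```

Because each row of `p_C_given_A` sums to 1 and so do `p_A` and `p_B`, the total comes out as 1 up to floating-point rounding, so the product is already a valid joint distribution.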

So: <br>
Factors can be any non-negative functions. They are a way of describing a probability that can make calculation easier. However, finding the normalization $Z$ can be difficult if there is a large number of variables involved. Luckily, if the factors are the conditional distribution functions (summing over all variables not being conditioned on gives 1), $Z$ equals 1 by default, so no normalization needs to be computed. <br>