## Structured CPDs

**Tabular representations can be impractical**, e.g. when a variable has many parents. Even if every variable is binary, a table CPD for a variable with $k$ parents has $O(2^k)$ entries.

So instead of a table, we can use a more general CPD representation that specifies the distribution of each variable given its parents.

We can use any function to specify a factor $\phi (X, Y_1, \dotsc , Y_k)$ such that

$$ \sum_x \phi (x, y_1, \dotsc , y_k) = 1 \quad \text{for all } y_1, \dotsc , y_k$$
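The normalization constraint above can be checked mechanically. The following is a minimal sketch, assuming discrete variables with small finite domains; the helper `is_valid_cpd` and the two example factors are hypothetical names introduced only for illustration.

```python
from itertools import product
import math


def is_valid_cpd(phi, x_values, parent_domains, tol=1e-9):
    """Return True if sum_x phi(x, y_1..y_k) == 1 for every parent assignment."""
    for parents in product(*parent_domains):
        total = sum(phi(x, *parents) for x in x_values)
        if not math.isclose(total, 1.0, abs_tol=tol):
            return False
    return True


def deterministic_or(x, y1, y2):
    """Deterministic CPD: X equals (Y1 OR Y2) with probability 1."""
    return 1.0 if x == (y1 | y2) else 0.0


def unnormalized(x, y1, y2):
    """Hypothetical factor that is NOT a CPD: it sums to 1.2 when y1 == 1."""
    return 0.5 + 0.1 * y1


binary = [0, 1]
print(is_valid_cpd(deterministic_or, binary, [binary, binary]))
print(is_valid_cpd(unnormalized, binary, [binary, binary]))
```

Any function that passes this check for all parent assignments can serve as a CPD, whether or not it is stored as a table.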

There are many ways to do this:

* Deterministic CPDs

* Tree-structured CPDs

* Logistic CPDs

* Linear Gaussians

### Context-Specific Independence

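$X$ and $Y$ are independent given $Z$ in the specific context $c$, where $c$ is a particular assignment to a set of variables $C$:

$$P \models (X \perp_c Y \ | \ Z, c)$$

i.e. $P(X,Y \ | \ Z,c) = P(X \ | \ Z,c)P(Y \ | \ Z,c)$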

### Tree-structured CPDs

<img src="imgs/tree_pgm.png" width=80%>

Here we only need to represent 4 parameters instead of 8.

Some tree-structured CPDs can also represent **non-context specific** independencies:

<img src="imgs/ncs_treepgm.png" width=80%>

This structure is a **multiplexer CPD** 

<img src="imgs/multiplex_cpd.png" width=80%>

i.e. $A$ tells us which version of $Y$ we need to copy.
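A multiplexer CPD is deterministic, so it can be written as a short function. This is a minimal sketch; the figure's variable names are not visible in the text, so it assumes a selector `a` (0-indexed) and a tuple of candidate parents `zs`, and the function name `multiplexer_cpd` is hypothetical.

```python
def multiplexer_cpd(y, a, zs):
    """P(Y = y | A = a, Z_1..Z_k = zs): Y copies the parent selected by A.

    `a` is a 0-based index into `zs` (an assumption of this sketch).
    """
    return 1.0 if y == zs[a] else 0.0


# With A = 0, only the first candidate matters; the others are ignored.
print(multiplexer_cpd(1, 0, (1, 0, 0)))
print(multiplexer_cpd(1, 0, (1, 1, 1)))
# With A = 1, the second candidate is copied instead.
print(multiplexer_cpd(1, 1, (1, 0, 0)))
```

Once the selector is known, the output depends on exactly one of the candidates, which is the context-specific independence this structure encodes.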


## Independence of Causal Influence

### Noisy OR CPD

<img src="imgs/noisy_or.png" width=85%>


$$\large P(Y = 0 \mid X_1, \dotsc , X_k) = (1 - \lambda_0 ) \prod_{i:X_i = 1} (1 - \lambda_i)$$

$\lambda_0$ is the *leak term*: the probability that $Y$ turns on even when no cause $X_i$ is active.
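The noisy OR formula translates directly into code. This is a minimal sketch, assuming binary causes given as a list of 0/1 values; the function names and the example parameter values are hypothetical.

```python
def noisy_or_p_y0(x, lambdas, leak):
    """P(Y = 0 | x): the leak fails AND every active cause fails to turn Y on."""
    p = 1.0 - leak
    for xi, lam in zip(x, lambdas):
        if xi == 1:
            p *= 1.0 - lam
    return p


def noisy_or_p_y1(x, lambdas, leak):
    """P(Y = 1 | x) is the complement of P(Y = 0 | x)."""
    return 1.0 - noisy_or_p_y0(x, lambdas, leak)


lambdas = [0.8, 0.5]  # hypothetical per-cause success probabilities
leak = 0.1            # hypothetical leak probability (lambda_0)
for x in ([0, 0], [1, 0], [0, 1], [1, 1]):
    print(x, noisy_or_p_y1(x, lambdas, leak))
```

The CPD needs only $k + 1$ parameters (one per cause plus the leak) rather than the $2^k$ rows of a full table.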

Knowing that $Y = 0$ **blocks** the trail of influence between the $X_i$:

<img src="imgs/noisyor_condind.png" width=85%>

### Aggregation


<img src="imgs/aggregate.png" width=85%>


### Sigmoid CPD

If the parents $X_i$ are discrete, each $X_i$ influences a continuous variable $Z$ with *weight $w_i$*:

$Z = w_0 + \sum_{i=1}^{k} w_i X_i$

We then pass $Z$ through the sigmoid function to turn it into a probability, $P(Y = 1 \mid X_1, \dotsc , X_k) = \sigma (Z)$:

$\sigma (z) = \Large \frac{e^z}{1+e^z}$
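The two steps (weighted sum, then sigmoid) can be sketched as follows. This assumes the usual reading of a sigmoid CPD, $P(Y = 1 \mid X_1, \dotsc , X_k) = \sigma (Z)$; the function names and the example weights are hypothetical.

```python
import math


def sigmoid(z):
    """sigma(z) = e^z / (1 + e^z), computed in a way that avoids overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sigmoid_cpd_p_y1(x, weights, bias):
    """P(Y = 1 | x) = sigma(w_0 + sum_i w_i * x_i)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


weights = [2.0, -1.0]  # hypothetical w_1, w_2
bias = -0.5            # hypothetical w_0
for x in ([0, 0], [1, 0], [0, 1], [1, 1]):
    print(x, sigmoid_cpd_p_y1(x, weights, bias))
```

A positive weight $w_i$ raises the probability of $Y = 1$ when $X_i$ is on, and a negative weight lowers it.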