# Causal and statistical dependence
---

## Causal Dependence
Probabilistic programs encode knowledge about the world in the form of causal models, and it is useful to understand how their function relates to their structure by thinking about some of the intuitive properties of causal relations. Causal relations are local, modular, and directed. They are *modular* in the sense that any two arbitrary events in the world are most likely causally unrelated, or independent. If they are related, or dependent, the relation is only very weak and liable to be ignored in our mental models. Causal structure is *local* in the sense that many events that are related are not related directly: They are connected only through causal chains of several steps, a series of intermediate and more local dependencies. And the basic dependencies are *directed*: when we say that A causes B, it means something different than saying that B causes A. The *causal influence* flows only one way along a causal relation—we expect that manipulating the cause will change the effect, but not vice versa—but *information* can flow both ways—learning about either event will give us information about the other.

Let’s examine this notion of “causal dependence” a little more carefully. What does it mean to believe that A depends causally on B? Viewing cognition through the lens of probabilistic programs, the most basic notions of causal dependence are in terms of the structure of the program and the flow of evaluation (or “control”) in its execution. We say that expression A causally depends on expression B if it is necessary to evaluate B in order to evaluate A. (More precisely, expression A depends on expression B if it is ever necessary to evaluate B in order to evaluate A.) For instance, in this program `a` depends on `b` but not on `c` (the final expression `a || c` depends on both `a` and `c`):

In [6]:
using Gen: bernoulli
include("auxilery.jl")

c = bernoulli(0.5)
b = bernoulli(0.5)
a = b ? bernoulli(0.1) : bernoulli(0.4)

a || c

false

Note that causal dependence order is weaker than a notion of ordering in time: one expression might happen to be evaluated before another in time (for instance `c` before `a`) without the second expression requiring the first. (This notion of causal dependence is related to the notion of flow dependence in the programming language literature.)
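To make "necessary to evaluate B in order to evaluate A" concrete, here is a minimal sketch in plain Python rather than Gen. It re-expresses the small program lazily, so an expression is only evaluated when another one asks for it, and records who asked for whom over many runs. The names `program`, `ev`, and `run_once` are illustrative assumptions, not part of the original notebook.

```python
import random

def bernoulli(rng, p):
    return rng.random() < p

# Each named expression receives `ev`, which it must call to obtain another
# expression's value. That call is what we record as a dependence.
program = {
    "c": lambda ev, rng: bernoulli(rng, 0.5),
    "b": lambda ev, rng: bernoulli(rng, 0.5),
    "a": lambda ev, rng: bernoulli(rng, 0.1) if ev("b") else bernoulli(rng, 0.4),
    "result": lambda ev, rng: ev("a") or ev("c"),
}

def run_once(rng):
    """Evaluate `result` once and return the set of (needer, needed) pairs."""
    cache, edges, stack = {}, set(), []

    def ev(name):
        if stack:
            edges.add((stack[-1], name))  # the expression on top of the stack asked
        if name not in cache:
            stack.append(name)
            cache[name] = program[name](ev, rng)
            stack.pop()
        return cache[name]

    ev("result")
    return edges

rng = random.Random(0)
needs = set()
for _ in range(1000):
    needs |= run_once(rng)  # "ever necessary": union over many executions

print(sorted(needs))
```

Under these assumptions the recorded pairs should show `a` asking for `b`, and `result` asking for `a` and `c`, but never `a` asking for `c`, regardless of the order in which the original program happens to sample `c`.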

For example, consider a simpler variant of our medical diagnosis scenario:

In [None]:
# var marg = Infer({method: 'enumerate'}, function() {
#      smokes = bernoulli(0.2)
#      lungDisease = (smokes && bernoulli(0.1)) || bernoulli(0.001)
#      cold = bernoulli(0.02)
#      cough = (cold && bernoulli(0.5)) || (lungDisease && bernoulli(0.5)) || bernoulli(0.001)
#      fever = (cold && bernoulli(0.3)) || bernoulli(0.01)
#      chestPain = (lungDisease && bernoulli(0.2)) || bernoulli(0.01)
#      shortnessOfBreath = (lungDisease && bernoulli(0.2)) || bernoulli(0.01)

#    condition(cough)
#    return {cold: cold, lungDisease: lungDisease}
# })

# viz.marginals(marg)


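using Gen: @gen, @trace, bernoulli

# Gen port of the WebPPL model above. Every random choice in a trace needs its
# own address, so the coin flips inside one expression get distinct names.
@gen function marg()
    smokes = @trace(bernoulli(0.2), :smokes)
    lungDisease = (smokes && @trace(bernoulli(0.1), :lungDiseaseFromSmoking)) || @trace(bernoulli(0.001), :lungDiseaseBase)
    cold = @trace(bernoulli(0.02), :cold)
    cough = (cold && @trace(bernoulli(0.5), :coughFromCold)) || (lungDisease && @trace(bernoulli(0.5), :coughFromLungDisease)) || @trace(bernoulli(0.001), :coughBase)
    fever = (cold && @trace(bernoulli(0.3), :feverFromCold)) || @trace(bernoulli(0.01), :feverBase)
    chestPain = (lungDisease && @trace(bernoulli(0.2), :chestPainFromLungDisease)) || @trace(bernoulli(0.01), :chestPainBase)
    shortnessOfBreath = (lungDisease && @trace(bernoulli(0.2), :shortnessFromLungDisease)) || @trace(bernoulli(0.01), :shortnessBase)

    # The WebPPL `condition(cough)` step is not reproduced in this port.
    return (cold = cold, lungDisease = lungDisease)
end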

viz.marginals(marg)

Here, `cough` depends causally on both `lungDisease` and `cold`, while `fever` depends causally on `cold` but not `lungDisease`. We can see that `cough` depends causally on `smokes` but only indirectly: although `cough` does not call `smokes` directly, in order to evaluate whether a patient coughs, we first have to evaluate the expression `lungDisease` that must itself evaluate `smokes`.
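The distinction between direct and indirect dependence can be sketched as a small graph computation. The table `direct` below is transcribed by hand from the model's definitions, and `all_causes` is a hypothetical helper that follows those direct links backwards; neither appears in the original notebook.

```python
# Direct dependencies, read off the right-hand side of each definition.
direct = {
    "smokes": set(),
    "lungDisease": {"smokes"},
    "cold": set(),
    "cough": {"cold", "lungDisease"},
    "fever": {"cold"},
    "chestPain": {"lungDisease"},
    "shortnessOfBreath": {"lungDisease"},
}

def all_causes(name):
    """Every variable reachable by following direct dependencies backwards."""
    seen, frontier = set(), list(direct[name])
    while frontier:
        parent = frontier.pop()
        if parent not in seen:
            seen.add(parent)
            frontier.extend(direct[parent])
    return seen

print(sorted(all_causes("cough")))  # ['cold', 'lungDisease', 'smokes']
print(sorted(all_causes("fever")))  # ['cold']
```

`smokes` shows up among the causes of `cough` only because `lungDisease` sits between them, which is the indirect dependence described above.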

We haven’t made the notion of “direct” causal dependence precise: do we want to say that `cough` depends directly on `cold`, or only directly on the expression `(cold && bernoulli(0.5)) || ...`? This can be resolved in several ways that all result in similar intuitions. For instance, we could first re-write the program into a form where each intermediate expression is named (called A-normal form) and then say direct dependence is when one expression immediately includes the name of another.
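As a rough illustration of the "includes the name of another" criterion, the sketch below uses Python's standard `ast` module to read direct dependencies off a program in which every variable is already named. The model is rewritten in Python syntax purely for parsing (it is never executed), and `direct_dependencies` is an assumed helper name. It does not perform a full A-normal-form conversion; it only inspects the names on each right-hand side.

```python
import ast

source = """
smokes = bernoulli(0.2)
lungDisease = (smokes and bernoulli(0.1)) or bernoulli(0.001)
cold = bernoulli(0.02)
cough = (cold and bernoulli(0.5)) or (lungDisease and bernoulli(0.5)) or bernoulli(0.001)
fever = (cold and bernoulli(0.3)) or bernoulli(0.01)
"""

def direct_dependencies(src):
    """Map each assigned name to the earlier assigned names its definition mentions."""
    deps = {}
    for stmt in ast.parse(src).body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            used = {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
            # Keep only names defined earlier in the program (this drops `bernoulli`).
            deps[stmt.targets[0].id] = used & set(deps)
    return deps

for name, parents in direct_dependencies(source).items():
    print(name, "<-", sorted(parents))
```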

There are several special situations that are worth mentioning. In some cases, whether expression A requires expression B will depend on the value of some third expression C. For example, here is a particular way of writing a noisy-AND relationship:

In [14]:
C = bernoulli(0.5)
B = bernoulli(0.5)
A = (C ? (B ? bernoulli(0.85) : false) : false)
A

true

A always requires C, but only evaluates B if C returns true. Under the above definition of causal dependence A depends on B (as well as C). However, one could imagine a more fine-grained notion of causal dependence that would be useful here: we could say that A depends causally on B only in certain *contexts* (just those where C happens to return true and thus A calls B).
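The context-specific reading can be checked by hand. This minimal sketch writes out the probability that A is true for each fixed pair of values of C and B, and asks in which contexts changing B makes any difference. The function name `prob_A` is an assumption introduced for illustration.

```python
def prob_A(c, b):
    """P(A = true) in the noisy-AND program for fixed values of C and B."""
    return 0.85 if (c and b) else 0.0

for c in (True, False):
    sensitive = prob_A(c, True) != prob_A(c, False)
    print(f"C={c}: does B make a difference to A? {sensitive}")
# C=True: does B make a difference to A? True
# C=False: does B make a difference to A? False
```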

Another nuance is that an expression that occurs inside a function body may get evaluated several times in a program execution. In such cases it is useful to speak of causal dependence between specific evaluations of two expressions. (However, note that if a specific evaluation of A depends on a specific evaluation of B, then any other specific evaluation of A will depend on *some* specific evaluation of B. Why?)

## Detecting Dependence Through Intervention
The causal dependence structure is not always immediately clear from examining a program, particularly where there are complex function calls. Another way to detect (or according to some philosophers, such as Jim Woodward, to *define*) causal dependence is more operational, in terms of “difference making”: If we manipulate A, does B tend to change? By *manipulate* here we don’t mean an assumption in the sense of `condition`. Instead we mean actually edit, or *intervene on*, the program in order to make an expression have a particular value independent of its (former) causes. If setting A to different values in this way changes the distribution of values of B, then B causally depends on A.

In [None]:
# var BdoA = function(Aval) {
#     return Infer({method: 'enumerate'}, function() {
#       var C = flip()
#       var A = Aval //we directly set A to the target value
#       var B = A ? flip(.1) : flip(.4)
#       return {B: B}
#     })
#   }
  
#   viz(BdoA(true))
#   viz(BdoA(false))

This method is known in the causal Bayesian network literature as the “do operator” or graph surgery (Pearl, 1988). It is also the basis for interesting theories of counterfactual reasoning by Pearl and colleagues (Halpern, Hitchcock and others).
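A minimal Python sketch of this kind of intervention, assuming the same three-variable program as the commented WebPPL cell: `A` is set directly to a chosen value instead of being sampled, and the distribution of `B` is computed by exact enumeration. The name `b_do_a` mirrors the WebPPL function and is otherwise an illustrative choice.

```python
from itertools import product

def b_do_a(a_val):
    """P(B = true) after forcing A to a_val, by enumerating the remaining coins."""
    p_b_true = 0.1 if a_val else 0.4   # B's definition reads the forced value of A
    total = 0.0
    for c, b in product((True, False), repeat=2):
        p_c = 0.5                      # C is still sampled, but B never looks at it
        p_b = p_b_true if b else 1 - p_b_true
        if b:
            total += p_c * p_b
    return total

print(round(b_do_a(True), 3))   # 0.1
print(round(b_do_a(False), 3))  # 0.4
```

Because the two settings of `A` give different distributions for `B`, `B` causally depends on `A` by the difference-making criterion.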

For example, this code represents whether a patient is likely to have a cold or a cough *a priori*, without conditions or observations:

In [None]:
# var medicalDist = Infer({method: 'enumerate'}, function() {
#   var smokes = flip(.2)
#   var lungDisease = flip(0.001) || (smokes && flip(0.1))
#   var cold = flip(0.02)

#   var cough = (cold && flip(0.5)) || (lungDisease && flip(0.5)) || flip(0.001)
#   var fever = (cold && flip(0.3)) || flip(0.01)
#   var chestPain = (lungDisease && flip(0.2)) || flip(0.01)
#   var shortnessOfBreath = (lungDisease && flip(0.2)) || flip(0.01)

#   return {cough: cough, cold: cold}
# })
# viz.marginals(medicalDist)

Imagine we now give our hypothetical patient a cold, for example by exposing him to a strong cocktail of cold viruses. We should not model this as an observation (e.g. by conditioning on having a cold), because we have taken direct action to change the normal causal structure. Instead we implement intervention by directly editing the program: first set `var cold = true`, then `var cold = false`:

In [None]:
# var medicalDist = Infer({method: 'enumerate'}, function() {
#   var smokes = flip(.2)
#   var lungDisease = flip(0.001) || (smokes && flip(0.1))
#   var cold = true // we intervene to make cold true

#   var cough = (cold && flip(0.5)) || (lungDisease && flip(0.5)) || flip(0.001)
#   var fever = (cold && flip(0.3)) || flip(0.01)
#   var chestPain = (lungDisease && flip(0.2)) || flip(0.01)
#   var shortnessOfBreath = (lungDisease && flip(0.2)) || flip(0.01)

#   return {cough: cough, cold: cold}
# })
# viz.marginals(medicalDist)

You should see that the distribution on `cough` changes: coughing becomes more likely if we know that a patient has been given a cold by external intervention. But the reverse is not true: try forcing the patient to have a cough (e.g., with some unusual drug or by exposure to some cough-inducing dust) by writing `var cough = true`; the distribution on `cold` is unaffected. We have captured a familiar fact: treating the symptoms of a disease directly doesn’t cure the disease (taking cough medicine doesn’t make your cold go away), but treating the disease does relieve the symptoms.
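The contrast between observing and intervening can be sketched with exact enumeration in plain Python. This is an assumption-laden re-expression of the model, restricted to the variables needed here: each coin flip gets a name in `FLIPS`, and `model`, `worlds`, and `prob` are hypothetical helpers. An entry in `do` replaces a variable's definition, while `given` keeps the definition and filters on the outcome.

```python
from itertools import product

FLIPS = {  # name -> probability that this coin comes up true
    "smokes": 0.2, "ld_base": 0.001, "ld_smoke": 0.1, "cold": 0.02,
    "cough_cold": 0.5, "cough_ld": 0.5, "cough_base": 0.001,
}

def worlds():
    """Yield every joint outcome of the coins together with its probability."""
    names = list(FLIPS)
    for values in product((True, False), repeat=len(names)):
        f = dict(zip(names, values))
        weight = 1.0
        for n in names:
            weight *= FLIPS[n] if f[n] else 1 - FLIPS[n]
        yield f, weight

def model(f, do=None):
    """Compute the model's variables; entries in `do` override a definition."""
    do = do or {}
    v = {}
    v["smokes"] = do.get("smokes", f["smokes"])
    v["lungDisease"] = do.get("lungDisease", f["ld_base"] or (v["smokes"] and f["ld_smoke"]))
    v["cold"] = do.get("cold", f["cold"])
    v["cough"] = do.get("cough", (v["cold"] and f["cough_cold"])
                        or (v["lungDisease"] and f["cough_ld"]) or f["cough_base"])
    return v

def prob(query, do=None, given=None):
    """P(query = true), optionally under an intervention and/or an observation."""
    num = den = 0.0
    for f, w in worlds():
        v = model(f, do)
        if given is None or v[given]:
            den += w
            num += w if v[query] else 0.0
    return num / den

print("P(cough)                 ", round(prob("cough"), 4))
print("P(cough | do(cold=true)) ", round(prob("cough", do={"cold": True}), 4))
print("P(cold)                  ", round(prob("cold"), 4))
print("P(cold | do(cough=true)) ", round(prob("cold", do={"cough": True}), 4))
print("P(cold | observe cough)  ", round(prob("cold", given="cough"), 4))
```

In this sketch, forcing a cough leaves the probability of a cold at its prior of 0.02, while merely observing a cough raises it well above that: information flows back from effect to cause, but causal influence does not.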

Verify in the program above that the method of manipulation works also to identify causal relations that are only indirect: for example, force a patient to smoke and show that it increases their probability of coughing, but not vice versa.

If we are given a program representing a causal model, and the model is simple enough, it is straightforward to read off causal dependencies from the program code itself. However, the notion of causation as difference-making may be easier to compute in much larger, more complex models—and it does not require an analysis of the program code. As long as we can modify (or imagine modifying) the definitions in the program and can run the resulting model, we can compute whether two events or functions are causally related by the difference-making criterion.
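The point that difference-making only requires being able to edit and run the model can be illustrated with a sampling-based sketch. The model is treated as a black box that accepts overrides, and the effect of an intervention is estimated by comparing two batches of runs. `sample_patient` and `effect_of` are hypothetical names, and reusing the same seed for both batches is a design choice (common random numbers) that keeps the comparison from being dominated by sampling noise.

```python
import random

def sample_patient(rng, do):
    """One run of the medical model; entries in `do` override a variable's definition."""
    # Draw every coin up front so an intervention never shifts the random stream.
    u = [rng.random() for _ in range(7)]
    smokes = do.get("smokes", u[0] < 0.2)
    lung_disease = do.get("lungDisease", u[1] < 0.001 or (smokes and u[2] < 0.1))
    cold = do.get("cold", u[3] < 0.02)
    cough = do.get("cough", (cold and u[4] < 0.5)
                   or (lung_disease and u[5] < 0.5) or u[6] < 0.001)
    return {"smokes": smokes, "lungDisease": lung_disease, "cold": cold, "cough": cough}

def effect_of(cause, effect, n=20000, seed=0):
    """Estimate P(effect | do(cause=true)) - P(effect | do(cause=false))."""
    def rate(value):
        rng = random.Random(seed)  # same seed for both settings
        return sum(sample_patient(rng, {cause: value})[effect] for _ in range(n)) / n
    return rate(True) - rate(False)

print("smokes -> cough:", effect_of("smokes", "cough"))
print("cough -> smokes:", effect_of("cough", "smokes"))
```

Forcing `smokes` shifts the estimated rate of `cough`, which exposes the indirect dependence through `lungDisease`. Forcing `cough` leaves `smokes` untouched, so the second difference is exactly zero in this sketch.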