# Causal inference applied to an electric circuit
This is a case study of causal inference applied to a simple electric circuit. The primary function of this circuit is defined by a specific causal relationship: the position of the switch should *cause* the light to be on or off. Failure modes *cause* specific perturbations of this primary function. Statistical associations are generally insufficient to identify causal relationships. A Directed Acyclic Graph (DAG) will be introduced to represent the expert knowledge that must be presumed in order to identify a causal relationship from a statistical association. This case will *apply* machine learning to detect faults using causal inference. The focus is on practical challenges rather than on theoretical foundations.

The case first introduces the circuit. It then infers a causal relationship between the switch and the light. Next, it introduces a single failure mode that *causes* a perturbation of the primary function. Finally, the case is generalised to multiple failure modes. The accompanying script revisits the case with extended, realistic time series, using a random forest algorithm and k-means clustering.


## Introduction to the electric circuit

<table>
    <thead>
        <tr>
            <th> <th>Time <th>V<sub>0</sub> (V) <th> V<sub>1</sub> (V) <th> S<sub>1</sub>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td rowspan=11> <img src="figures/CausalInference01.png" width="350"/>  <td>[  0:  4] <td> 0,0 <td> 0,0 <td> 0
        </tr>
        <tr>
            <td>[  5:14] <td> 3,4 <td> 1,5 <td> 1
        </tr>
        <tr>
            <td>[15:18] <td> 0,0 <td> 0,0 <td> 0
        </tr>
        <tr>
            <td>[19:23] <td> 3,4 <td> 1,5 <td> 1
        </tr>
        <tr>
            <td>[24:30] <td> 0,0 <td> 0,1 <td> 0
        </tr>
        <tr>
            <td>[31:36] <td> 3,5 <td> 1,4 <td> 1
        </tr>
        <tr>
            <td>[37:38] <td> 0,0 <td> 0,0 <td> 0
        </tr>
        <tr>
            <td>[39:40] <td> 0,0 <td> 0,0 <td> 0
        </tr>
        <tr>
            <td>[41:49] <td> 2,5 <td> 2,5 <td> 1
        </tr>
        <tr>
            <td>[50:53] <td> 3,5 <td> 1,5 <td> 1
        </tr>
        <tr>
            <td>[54:60] <td> 0,0 <td> 0,0 <td> 0
        </tr>
    </tbody>    
</table>

The electric circuit in Figure 1 consists of a light, two resistors, a ground connection and a switch. From this circuit, the following measurements have been recorded: 
- the position of the switch (S<sub>1</sub>), 
- the voltage V<sub>0</sub> across one resistor and the light,
- the voltage V<sub>1</sub> across one resistor.

Let us presume that the time series on the right-hand side of Figure 1 were collected by non-experimental (observational) research, i.e. they merely record part of the circuit's normal course of operation.
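For computations on these data, the piecewise-constant time series of Figure 1 can be expanded into one record per time step. The sketch below is a minimal Python encoding, assuming that both interval bounds are inclusive (consecutive intervals such as [0:4] and [5:14] abut) and writing the decimal commas as decimal points. The names `INTERVALS` and `expand` are illustrative and not taken from the accompanying script.

```python
# Interval table of Figure 1: (t_start, t_end, V0 [V], V1 [V], S1).
# Assumption: both bounds are inclusive, so [0:4] covers t = 0, 1, 2, 3, 4.
INTERVALS = [
    (0, 4, 0.0, 0.0, 0),
    (5, 14, 3.4, 1.5, 1),
    (15, 18, 0.0, 0.0, 0),
    (19, 23, 3.4, 1.5, 1),
    (24, 30, 0.0, 0.1, 0),
    (31, 36, 3.5, 1.4, 1),
    (37, 38, 0.0, 0.0, 0),
    (39, 40, 0.0, 0.0, 0),
    (41, 49, 2.5, 2.5, 1),
    (50, 53, 3.5, 1.5, 1),
    (54, 60, 0.0, 0.0, 0),
]


def expand(intervals):
    """Expand piecewise-constant intervals into one record per time step."""
    records = []
    for t_start, t_end, v0, v1, s1 in intervals:
        for t in range(t_start, t_end + 1):
            records.append({"t": t, "V0": v0, "V1": v1, "S1": s1})
    return records


series = expand(INTERVALS)
print(len(series), series[5])  # → 61 {'t': 5, 'V0': 3.4, 'V1': 1.5, 'S1': 1}
```

Under this reading the recording spans 61 samples, 34 of them with a closed switch.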

![Figure 2](figures/CausalInference02.png)

Figure 2 shows that these time series form clusters. These clusters may indicate the state of the circuit, which is defined by:
- the position of the switch $S_{1}$;
- the presence or absence of four failure modes $F_{1},F_{2},F_{3},F_{4}$.

These elements $S_{1},F_{1},F_{2},F_{3},F_{4}$ are assumed to be *causes* of the voltages. Hence, a decision maker who *cannot* directly observe the failure modes $F_{1},F_{2},F_{3},F_{4}$ may still learn about them by observing their effect on the voltages. Therefore, the clusters in Figure 2 may indicate the hidden state of the circuit.
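Clusters like those in Figure 2 can be recovered from the voltages alone by k-means clustering, which the accompanying script also uses. The sketch below is a plain-Python implementation of Lloyd's algorithm on the $(V_{0}, V_{1})$ pairs, not the script's code; the deterministic farthest-first initialisation, the choice of three clusters and the inclusive interval bounds are assumptions made for this illustration.

```python
import math

# Figure 1 as (t_start, t_end, V0, V1, S1); bounds assumed inclusive.
INTERVALS = [(0, 4, 0.0, 0.0, 0), (5, 14, 3.4, 1.5, 1), (15, 18, 0.0, 0.0, 0),
             (19, 23, 3.4, 1.5, 1), (24, 30, 0.0, 0.1, 0), (31, 36, 3.5, 1.4, 1),
             (37, 38, 0.0, 0.0, 0), (39, 40, 0.0, 0.0, 0), (41, 49, 2.5, 2.5, 1),
             (50, 53, 3.5, 1.5, 1), (54, 60, 0.0, 0.0, 0)]
# One (V0, V1) point per time step; S1 is not used, and the list index equals t.
points = [(v0, v1) for t0, t1, v0, v1, _s1 in INTERVALS for _t in range(t0, t1 + 1)]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next initial center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(max_iter):
        # Assignment step: label every point with the index of its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members (keep it if empty).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:  # converged: the centers no longer move
            break
        centers = new_centers
    return labels, centers


labels, centers = kmeans(points, k=3)
for j, (c0, c1) in enumerate(centers):
    print(f"cluster {j}: center = ({c0:.2f}, {c1:.2f}) V, {labels.count(j)} samples")
```

With these data the three clusters should correspond to an open switch (27 samples), a closed switch with the usual voltages (25 samples) and the closed-switch samples at $t=[41:49]$ (9 samples).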

In the absence of any failure modes, the switch $S_{1}$ should *cause* a very specific voltage $V_{0}$. This causal relationship can be represented:
- by the Directed Acyclic Graph (DAG) $S_{1} \rightarrow V_{0}$, or 
- by the difference in potential outcomes (Rubin) $V_{0}(S_{1}=1)- V_{0}(S_{1}=0)$. 

A causal relationship is *not* a statistical association. Statistical (in)dependence is symmetric: claiming the independence $Pr(V_{0}|S_{1})=Pr(V_{0})$ is equivalent to claiming the independence $Pr(S_{1}|V_{0})=Pr(S_{1})$. Causation, however, is directional: claiming that the voltage $V_{0}$ does not cause the switch position $S_{1}$ does not exclude the possibility that the switch position $S_{1}$ causes the voltage $V_{0}$. This example illustrates that a causal relationship is *generally not* equivalent to a statistical dependency. Still, a causal effect can *sometimes* be identified from a statistical dependency. When the causal interactions can be represented by a DAG, Judea Pearl showed under which conditions (e.g. the back-door criterion) a causal effect can be identified from statistical dependencies. This is an important result because statistical dependencies can be estimated from observed frequencies.
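The symmetry of statistical dependence can be checked on the recorded data. The sketch below discretises $V_{0}$ with an assumed threshold of 1 V (event "V0 high") and compares $Pr(V_{0}|S_{1})/Pr(V_{0})$ with $Pr(S_{1}|V_{0})/Pr(S_{1})$, using exact fractions and inclusive interval bounds.

```python
from fractions import Fraction

# Figure 1 as (t_start, t_end, V0, S1); bounds assumed inclusive, V1 omitted.
INTERVALS = [(0, 4, 0.0, 0), (5, 14, 3.4, 1), (15, 18, 0.0, 0), (19, 23, 3.4, 1),
             (24, 30, 0.0, 0), (31, 36, 3.5, 1), (37, 38, 0.0, 0), (39, 40, 0.0, 0),
             (41, 49, 2.5, 1), (50, 53, 3.5, 1), (54, 60, 0.0, 0)]
samples = [(v0, s1) for t0, t1, v0, s1 in INTERVALS for _t in range(t0, t1 + 1)]

n = len(samples)
high = [v0 > 1.0 for v0, _ in samples]    # event A: V0 above 1 V (assumed threshold)
closed = [s1 == 1 for _, s1 in samples]   # event B: switch closed
p_a = Fraction(sum(high), n)
p_b = Fraction(sum(closed), n)
p_ab = Fraction(sum(a and b for a, b in zip(high, closed)), n)

# Both directions reduce to the same ratio P(A, B) / (P(A) P(B)).
ratio_a_given_b = (p_ab / p_b) / p_a   # Pr(A|B) / Pr(A)
ratio_b_given_a = (p_ab / p_a) / p_b   # Pr(B|A) / Pr(B)
print(ratio_a_given_b, ratio_b_given_a)  # → 61/34 61/34
```

A ratio different from 1 signals dependence in both directions at once; the direction $S_{1} \rightarrow V_{0}$ cannot be read from these numbers.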

A final technical note: only statistical expectations $E[.]$, rather than full probability distributions $Pr(.)$, will be estimated in the case below.


## Example of a causal inference

This Section infers a causal effect from the time series in Figure 1 together with the presumed DAG $V_{0} \leftarrow S_{1} \rightarrow V_{1}$. This DAG implies that the position of the switch $S_{1}$ *causes* the voltages. A direct assessment of a causal effect is problematic because the counterfactual remains unobserved. For example, the time series show that the switch $S_{1}$ closed at time $t=5$, but they do not show what would have happened had the switch *not* been closed at $t=5$. Therefore, the effect of that specific closure of the switch is *not* directly observable: the time series only list the values that *occurred*, not the values that *would have occurred* had the switch been in another position. The challenge is to infer the causal effect of an intervention on the switch.

Still, the average voltage given a switch position can be calculated. For example, the average voltage $\overline{V}_{0}$ given an open switch $S_{1}=0$ is:

$  [\overline{V}_{0}|S_{1}=0]=0,0V $

The average voltage $\overline{V}_{0}$ given a closed switch $S_{1}=1$ is:

$ [\overline{V}_{0}|S_{1}=1]=3,2V $

From the above equations, the difference in expectations can be estimated:

$ E[V_{0}|S_{1}=1]-E[V_{0}|S_{1}=0] = [\overline{V}_{0}|S_{1}=1] - [\overline{V}_{0}|S_{1}=0] = 3,2 - 0,0 = 3,2V $
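The two conditional means and their difference can be reproduced from the interval table of Figure 1 (34 samples at a closed switch and 27 at an open one, assuming inclusive interval bounds); unrounded, the difference is about 3,19 V. As one possible check of statistical significance, the sketch also runs a one-sided permutation test. This test is an assumption, not the author's method: it treats the 61 samples as exchangeable and thereby ignores the strong autocorrelation of the time series, so its p-value is indicative only. The permutation count and seed are arbitrary.

```python
import random

# Figure 1 as (t_start, t_end, V0, S1); bounds assumed inclusive, V1 omitted.
INTERVALS = [(0, 4, 0.0, 0), (5, 14, 3.4, 1), (15, 18, 0.0, 0), (19, 23, 3.4, 1),
             (24, 30, 0.0, 0), (31, 36, 3.5, 1), (37, 38, 0.0, 0), (39, 40, 0.0, 0),
             (41, 49, 2.5, 1), (50, 53, 3.5, 1), (54, 60, 0.0, 0)]
samples = [(v0, s1) for t0, t1, v0, s1 in INTERVALS for _t in range(t0, t1 + 1)]
closed = [v0 for v0, s1 in samples if s1 == 1]
open_ = [v0 for v0, s1 in samples if s1 == 0]


def avg(xs):
    return sum(xs) / len(xs)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """One-sided permutation test: how often does a random relabelling of the
    pooled samples give a difference in means at least as large as observed?"""
    rng = random.Random(seed)
    observed = avg(x) - avg(y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the shuffled pool into groups of the original sizes.
        if avg(pooled[:len(x)]) - avg(pooled[len(x):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps the estimate above zero


diff = avg(closed) - avg(open_)
print(f"{avg(closed):.2f} - {avg(open_):.2f} = {diff:.2f} V")  # → 3.19 - 0.00 = 3.19 V
print(f"permutation p-value: {permutation_p_value(closed, open_):.4f}")
```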

In this example, the voltage recordings at a closed switch are completely separated from those at an open switch, as illustrated in Figure 2. These voltages are therefore very likely drawn from different distributions, which makes the difference in means above *statistically significant*. Still, this statistical dependency may be explained in various ways. By presuming the DAG $V_{0} \leftarrow S_{1} \rightarrow V_{1}$, the difference in expectations equals the average treatment effect (ATE) of the switch $S_{1}$:

$ E[V_{0}|S_{1}=1]-E[V_{0}|S_{1}=0] = 3,2V \;\;\;\;\;\;\;-^{DAG}\rightarrow\;\;\;\;\;\;\;  ATE= E[V_{0}(S_{1}=1)]-E[V_{0}(S_{1}=0)] = 3,2V $

The ATE of the switch $S_{1}$ on the voltage $V_{0}$ quantifies the *causal* response to the switch. This step from a difference in expectations to an ATE is *not* trivial. For an in-depth explanation of how to identify an ATE from statistical associations, a textbook on causal inference should be consulted. Here, it is simply stated that the additional knowledge represented by the DAG $V_{0} \leftarrow S_{1} \rightarrow V_{1}$ is needed to identify the ATE of the switch $S_{1}$ on the voltage $V_{0}$. This knowledge may seem evident to those who took classes in electronics, and one may design various experiments to substantiate this DAG, but the DAG does *not* follow from the observed time series in Figure 1. For example, the DAG $V_{1} \leftarrow V_{0}\rightarrow S_{1}$ may well produce the same time series, but then an *intervention* on the switch $S_{1}$ would not change the voltages. Therefore, the time series by themselves do not suffice to identify the ATE of the switch; the presumptions represented in the DAG are essential.
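The counterexample above can be made concrete with two toy structural causal models. Both are assumptions for illustration only: noise and failure modes are ignored, and the fault-free voltages 3,44 V and 1,48 V are taken from the tables of this case. Model A encodes $V_{0} \leftarrow S_{1} \rightarrow V_{1}$; model B encodes $V_{1} \leftarrow V_{0} \rightarrow S_{1}$. The function names and the `do_s1` argument (an intervention that overrides the switch) are hypothetical.

```python
import math

# Fault-free closed-switch voltages taken from the tables; noise is ignored.
V0_ON, V1_ON = 3.44, 1.48


def model_a(u, do_s1=None):
    """DAG V0 <- S1 -> V1: exogenous u sets the switch, the switch sets the voltages."""
    s1 = u if do_s1 is None else do_s1
    return s1, V0_ON * s1, V1_ON * s1


def model_b(u, do_s1=None):
    """DAG V1 <- V0 -> S1: exogenous u sets V0, and V0 'sets' the switch reading."""
    v0 = V0_ON * u
    s1 = int(v0 > 1.0) if do_s1 is None else do_s1
    return s1, v0, v0 * V1_ON / V0_ON


def ate(model, units):
    """E[V0(S1=1)] - E[V0(S1=0)], obtained by intervening on S1 in every unit."""
    treated = [model(u, do_s1=1)[1] for u in units]
    control = [model(u, do_s1=0)[1] for u in units]
    return (sum(treated) - sum(control)) / len(units)


units = [0] * 27 + [1] * 34  # open/closed proportions as in Figure 1
obs_a = [model_a(u) for u in units]
obs_b = [model_b(u) for u in units]
same = all(math.isclose(x, y) for ra, rb in zip(obs_a, obs_b) for x, y in zip(ra, rb))
print(same, round(ate(model_a, units), 2), ate(model_b, units))  # → True 3.44 0.0
```

Both models generate the same observations $(S_{1}, V_{0}, V_{1})$, yet intervening on the switch changes $V_{0}$ only in model A.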


## Example of causal inference extended with a single failure mode

A more careful analysis of the time series in Figure 1 reveals that the voltages at a closed switch $S_{1}=1$ form more than one cluster. In particular, the voltages during the time interval $t=[41:49]$ differ from the others. A *causal explanation* is desirable for this anomalous behaviour. Because the switch position is the same ($S_{1}=1$) in all these intervals, and a constant cause cannot explain a varying effect, the switch position cannot explain this anomaly; an unobserved background variable $B$ must be at work. This Section extends the example with the background variable $B$, presuming a DAG in which both $S_{1}$ and $B$ are common causes of the voltages $V_{0}$ and $V_{1}$. So, the extended DAG is:

$V_{0} \leftarrow S_{1} \rightarrow V_{1}$ and $V_{0} \leftarrow B \rightarrow V_{1}$

The Table below extends the time series from Figure 1 with a time series of a tentative background variable $B$ that could explain the anomalous behaviour of the circuit.

<table>
    <thead>
        <tr>
            <th>Time <th>V<sub>0</sub> (V) <th> V<sub>1</sub> (V) <th> S<sub>1</sub> <th> B
        </tr>
    </thead>
        <tr>
            <td>[  0:  4] <td> 0,0 <td> 0,0 <td> 0 <td> 0
        </tr>
        <tr>
            <td>[  5:14] <td> 3,4 <td> 1,5 <td> 1 <td> 0
        </tr>
        <tr>
            <td>[15:18] <td> 0,0 <td> 0,0 <td> 0 <td> 0
        </tr>
        <tr>
            <td>[19:23] <td> 3,4 <td> 1,5 <td> 1 <td> 0
        </tr>
        <tr>
            <td>[24:30] <td> 0,0 <td> 0,1 <td> 0 <td> 0
        </tr>
        <tr>
            <td>[31:36] <td> 3,5 <td> 1,4 <td> 1 <td> 0
        </tr>
        <tr>
            <td>[37:38] <td> 0,0 <td> 0,0 <td> 0 <td> 0
        </tr>
        <tr>
            <td>[39:40] <td> 0,0 <td> 0,0 <td> 0 <td> 1
        </tr>
        <tr>
            <td>[41:49] <td> 2,5 <td> 2,5 <td> 1 <td> 1
        </tr>
        <tr>
            <td>[50:53] <td> 3,5 <td> 1,5 <td> 1 <td> 0
        </tr>
        <tr>
            <td>[54:60] <td> 0,0 <td> 0,0 <td> 0 <td> 0
        </tr>    
</table>

Now, the average voltage $V_{0}$ given an open switch $S_{1}=0$ conditional on $B=0$ is given by:

$  [\overline{V}_{0}|S_{1}=0, B=0]=0,0V $

The average voltage $V_{0}$ given a closed switch $S_{1}=1$ conditional on $B=0$ is given by:

$ [\overline{V}_{0}|S_{1}=1, B=0]=3,44V $

From the above equations, a difference in conditional expectations can be estimated:

$ E[V_{0}|S_{1}=1,B=0]-E[V_{0}|S_{1}=0,B=0] = [\overline{V}_{0}|S_{1}=1,B=0] - [\overline{V}_{0}|S_{1}=0,B=0] = 3,44 - 0,0 = 3,44V $

and:

$ E[V_{0}|S_{1}=1,B=1]-E[V_{0}|S_{1}=0,B=1] = [\overline{V}_{0}|S_{1}=1,B=1] - [\overline{V}_{0}|S_{1}=0,B=1] = 2,50 - 0,0 = 2,50V $
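The stratified means can be reproduced by conditioning on $B$ as well as on $S_{1}$, again assuming inclusive interval bounds. Note that the stratum $S_{1}=0, B=1$ contains only the two samples at $t=[39:40]$.

```python
# Extended table: (t_start, t_end, V0, S1, B); bounds assumed inclusive, V1 omitted.
INTERVALS = [(0, 4, 0.0, 0, 0), (5, 14, 3.4, 1, 0), (15, 18, 0.0, 0, 0),
             (19, 23, 3.4, 1, 0), (24, 30, 0.0, 0, 0), (31, 36, 3.5, 1, 0),
             (37, 38, 0.0, 0, 0), (39, 40, 0.0, 0, 1), (41, 49, 2.5, 1, 1),
             (50, 53, 3.5, 1, 0), (54, 60, 0.0, 0, 0)]
samples = [(v0, s1, b) for t0, t1, v0, s1, b in INTERVALS for _t in range(t0, t1 + 1)]


def cond_mean(s1, b):
    """Sample mean of V0 within the stratum (S1 = s1, B = b)."""
    vals = [v0 for v0, s, bb in samples if s == s1 and bb == b]
    return sum(vals) / len(vals)


for b in (0, 1):
    diff = cond_mean(1, b) - cond_mean(0, b)
    print(f"B={b}: {cond_mean(1, b):.2f} - {cond_mean(0, b):.2f} = {diff:.2f} V")
# → B=0: 3.44 - 0.00 = 3.44 V
# → B=1: 2.50 - 0.00 = 2.50 V
```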

Again, these conditional differences in expectation are *statistically significant*, because the data points at a closed switch are entirely separated from those at an open switch. Moreover, conditioning on the tentative background variable $B$ markedly reduces the spread in the voltage $V_{0}$. Now, by presuming the DAG:

$V_{0} \leftarrow S_{1} \rightarrow V_{1}$ and $V_{0} \leftarrow B \rightarrow V_{1}$

conditional average treatment effects (ATEs) of the switch again follow from the above differences in conditional expectations:

$ E[V_{0}|S_{1}=1,B=0]-E[V_{0}|S_{1}=0,B=0] = 3,44V   \;\;\;\;\;\;\;-^{DAG}\rightarrow\;\;\;\;\;\;\;   ATE_{@B=0} = E[V_{0}(S_{1}=1)|B=0]-E[V_{0}(S_{1}=0)|B=0]=3,44V  $

$ E[V_{0}|S_{1}=1,B=1]-E[V_{0}|S_{1}=0,B=1] = 2,50V   \;\;\;\;\;\;\;-^{DAG}\rightarrow\;\;\;\;\;\;\;   ATE_{@B=1} = E[V_{0}(S_{1}=1)|B=1]-E[V_{0}(S_{1}=0)|B=1]=2,50V  $

Up until now, $B$ remained an unobserved (hidden) variable. However, an engineer may identify the anomalous behaviour at $t=[41:49]$ as a short circuit of the light. The occurrence of this hidden failure mode was eventually recorded, which enables verification. In general, any causal inference from observations is vulnerable to unobserved background variables, because observations are incomplete. The presumed DAG may be highly controversial in some cases, but at least it is a precise description of the expert knowledge required to causally explain this statistical association.


## Example of causal inference extended with multiple failure modes

This Section considers multiple hidden failure modes that perturb the primary function. Technical systems are typically designed to have a specific effect; in this case, the light should respond to the switch in a specific way. Let the Failure Mode and Effect Analysis (FMEA) below identify the hidden variables (failure modes) that may perturb the ATE of the switch.

<table>
    <tr>
        <th>Component <th>Failure mode <th>Code <th>Failure effect
    </tr>
    <tr>
        <td>wire to light <td>break circuit <td> $F_{1}$ <td>No light when switched on
    </tr>
    <tr>
        <td>light <td>break circuit <br /> short circuit <td>$F_{2}$ <br />$F_{3}$ <td>No light when switched on
    </tr>
    <tr>
        <td>wire to ground <td>break circuit <td>$F_{4}$ <td>No light when switched on
    </tr>
</table>

As each failure mode can be either absent or present, the electric circuit can exhibit 2<sup>4</sup> = 16 fault states that could perturb the causal effect of the switch $S_{1}$; combined with the two switch positions, this yields 2<sup>5</sup> = 32 system states. Let the voltages of the time series in the example be *caused* by the following states of the circuit:

<table>
    <tr>
        <th> $E[V_{0}|S_{1},F_{1},F_{2},F_{3},F_{4}]$ <th> $E[V_{1}|S_{1},F_{1},F_{2},F_{3},F_{4}]$ <th> $S_{1}$ <th> $F_{1}$ <th> $F_{2}$ <th> $F_{3}$ <th> $F_{4}$
    </tr>
    <tr>
        <td> $0$     <td> $0$     <td> $0$     <td> $0$           <td> $0$           <td> $0$           <td> $0$
    </tr>
    <tr>
        <td> $0$     <td> $0$     <td> $0$     <td> $0$           <td> $0$           <td> $1$           <td> $0$
    </tr>
    <tr>
        <td> $3,44$     <td> $1,48$     <td> $1$     <td> $0$           <td> $0$           <td> $0$           <td> $0$
    </tr>
    <tr>
        <td> $2,5$     <td> $2,5$     <td> $1$     <td> $0$           <td> $0$           <td> $1$           <td> $0$
    </tr>
</table>

Evidently, the Table above should comprise 32 rows to be complete, which implies that the time series in the example do not cover all states of the circuit. In practice, time series collected during a system's normal course of operation are similarly incomplete. This precludes a purely data-driven estimation of the ATE in the unobserved states. Still, even for the observed states, a DAG must be presumed to assess a specific conditional ATE from the first two columns of the Table. In particular, to assess the conditional ATE of the switch $S_{1}$ on a voltage $V_{x}$:

$E[V_{x}(S_{1}=1)|F_{1},F_{2},F_{3},F_{4}] - E[V_{x}(S_{1}=0)|F_{1},F_{2},F_{3},F_{4}]$

a DAG is presumed in which the switch $S_{1}$ and the failure modes $F_{y}$ collide on a voltage $V_{x}$:

$S_{1} \rightarrow V_{x} \leftarrow F_{y}$

So, the switch $S_{1}$ and the failure modes $F_{y}$ jointly cause the voltages $V_{x}$. As the number of system states grows exponentially with the number of failure modes (and is multiplied by the number of operating states, such as switch positions), larger electric circuits rapidly become too complex to observe or estimate all possible system states. Data-driven estimations of ATEs are therefore only applicable to cases of *low complexity*.
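The state space, and the small part of it covered by the example, can be counted directly. The sketch below enumerates all 2<sup>5</sup> = 32 combinations of switch position and failure modes, compares them with the four states in the Table above, and then evaluates the growth $2 \cdot 2^{n}$ for $n$ binary failure modes and two switch positions. The tuple layout $(S_{1}, F_{1}, F_{2}, F_{3}, F_{4})$ and the function name are illustrative assumptions.

```python
from itertools import product

# Tuple layout (S1, F1, F2, F3, F4), following the column order of the Table above.
all_states = list(product((0, 1), repeat=5))
# States that actually occur in the example time series (rows of the Table above).
OBSERVED = {(0, 0, 0, 0, 0), (0, 0, 0, 1, 0), (1, 0, 0, 0, 0), (1, 0, 0, 1, 0)}

coverage = len(OBSERVED) / len(all_states)
# Failure modes that are never recorded as present in any observed state.
never_present = [f"F{i}" for i in range(1, 5) if all(s[i] == 0 for s in OBSERVED)]
print(f"{len(all_states)} states, coverage {coverage:.1%}, never present: {never_present}")
# → 32 states, coverage 12.5%, never present: ['F1', 'F2', 'F4']


def n_system_states(n_failure_modes, n_operating_states=2):
    """System states when every failure mode is binary (absent/present)."""
    return n_operating_states * 2 ** n_failure_modes


for n in (4, 10, 20, 30):
    print(f"{n:>2} failure modes: {n_system_states(n):,} system states")
# →  4 failure modes: 32 system states
# → 10 failure modes: 2,048 system states
# → 20 failure modes: 2,097,152 system states
# → 30 failure modes: 2,147,483,648 system states
```

Three of the four failure modes never occur in the example data, so the data carry no information on the voltages in any state in which they are present.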

The assessment of the fault state $F_{1}F_{2}F_{3}F_{4}$ in the Table above has not been discussed so far. If each fault state $F_{1}F_{2}F_{3}F_{4}$ corresponded one-to-one (bijectively) to a triplet of voltages and switch position $V_{0}V_{1}S_{1}$, *autonomous online* fault detection would be straightforward. The Table above, however, shows that an open switch $S_{1}=0$ maps to several fault states. Therefore, the triplet $V_{0}V_{1}S_{1}$ is insufficient to identify the fault state. Still, knowledge of the triplet $V_{0}V_{1}S_{1}$ constrains the possible fault states $F_{1}F_{2}F_{3}F_{4}$ and contributes to *expert-based online* fault detection. If the circuit is taken *offline*, it is often possible to control the triplet $V_{0}V_{1}S_{1}$, which typically enables a further reduction of the possible fault states $F_{1}F_{2}F_{3}F_{4}$.
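The many-to-one mapping can be illustrated with a lookup that returns every tabulated fault state consistent with an observed triplet $V_{0}V_{1}S_{1}$. This is a minimal sketch, not the script's method: the function name and the 0,2 V tolerance are assumptions, and only the four tabulated states are known, so states missing from the Table cannot be ruled out.

```python
import math

# Expected voltages (E[V0], E[V1]) per system state (S1, F1, F2, F3, F4),
# copied from the Table above; the other 28 states are unknown.
EXPECTED = {
    (0, 0, 0, 0, 0): (0.0, 0.0),
    (0, 0, 0, 1, 0): (0.0, 0.0),
    (1, 0, 0, 0, 0): (3.44, 1.48),
    (1, 0, 0, 1, 0): (2.5, 2.5),
}


def candidate_fault_states(v0, v1, s1, tol=0.2):
    """Fault states (F1, F2, F3, F4) whose expected voltages lie within tol volts
    (Euclidean distance) of the observed (v0, v1) at switch position s1."""
    return [state[1:] for state, (e0, e1) in EXPECTED.items()
            if state[0] == s1 and math.hypot(v0 - e0, v1 - e1) <= tol]


for triplet in [(0.0, 0.1, 0), (3.5, 1.4, 1), (2.5, 2.5, 1)]:
    print(triplet, "->", candidate_fault_states(*triplet))
# → (0.0, 0.1, 0) -> [(0, 0, 0, 0), (0, 0, 1, 0)]
# → (3.5, 1.4, 1) -> [(0, 0, 0, 0)]
# → (2.5, 2.5, 1) -> [(0, 0, 1, 0)]
```

With the switch open, the short circuit $F_{3}$ cannot be distinguished from a fault-free circuit; closing the switch, for instance during an offline test, separates the two candidates.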


# [Click here to see the script](https://nbviewer.jupyter.org/github/chrisrijsdijk/RAMS/blob/master/notebook/Arduino_3Vars/Arduino_diagnostics_ensemble3VarsVal.ipynb) 
