# Chapter 1 - Introduction to Causality


## Anything wrong with data science?

Let's start with a simple image (Courtesy of Markus Elsholz). 
What is the object in the image below? A chair, right?

![img](img/ch1/Beuchet_chair_a.png)


<br/><br/>
What if we look at the object from a different angle? 

![img](img/ch1/Beuchet_chair_b.png)


The object is not a chair! The illusion of a chair appears only when the parts are viewed from one specific angle. This is the **Beuchet chair experiment**, which demonstrates how perception depends on the observer's viewpoint.

Do you have any other examples?

<br/><br/>
Unfortunately, most of our work in statistics, data science, and machine learning is based on **observations**! However, we cannot rely on observations alone to model and understand our world. 

![img](img/ch1/Math_Learning.jpeg)     


That said, let's see what causality is and what it is NOT.


<br/><br/>
## What is NOT Causality?

<br/><br/>
### Causality is not Algebra

Dario is a little boy who feels feverish, so his mom measures his temperature with a thermometer. Hopefully, he can skip school today. 
From an algebraic point of view, the height of the mercury column $X$ in the tube is related to Dario's body temperature $Y$ by a constant $k$. 

$Y = k \cdot X$

<img src="img/ch1/little-sick-boy.jpeg" width="250"/>

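For our algebraic equation, it does not matter whether Dario's body temperature raises the mercury column or the other way around. The equation can be rearranged to $X = Y / k$ without losing anything, so it carries no information about which variable causes the other.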
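
The asymmetry that the equation hides can be sketched in a few lines of Python. This is only an illustration: the constant `K`, the function `mercury_height`, and the numbers are hypothetical, and the only assumption is that body temperature drives the mercury column and not the reverse.

```python
import math

K = 0.5  # hypothetical constant relating mercury height to temperature

def mercury_height(body_temp):
    """Assumed causal mechanism: body temperature sets the mercury height."""
    return body_temp / K

# Algebra is symmetric: either variable can be recovered from the other.
temp = 39.0
height = mercury_height(temp)
assert math.isclose(K * height, temp)    # Y = k * X
assert math.isclose(temp / K, height)    # X = Y / k

# Interventions are not symmetric.
# 1) Raise Dario's temperature by one degree: the mercury column follows.
height_after_fever = mercury_height(temp + 1.0)

# 2) Force the mercury column up (say, by warming the thermometer under a lamp):
#    Dario's temperature is set by his body, so it does not move.
forced_height = height + 10.0
temp_after_forcing = temp

print(height_after_fever - height)   # 2.0, i.e. 1 / K
print(temp_after_forcing - temp)     # 0.0
```

Both rearrangements of the equation hold, yet only one of the two interventions changes the other variable. That directional information is exactly what the algebra leaves out.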

<br/><br/>
### Causality is not Statistics

- Most scientific inquiries and data analyses have one of two goals:

    - **Association/prediction**, i.e., determine predictors or variables associated with the outcome of interest.
    - **Causality**, i.e., understanding factors that cause or influence the outcome of interest.

- Statistical concepts are those expressible in terms of the joint distribution of observed variables.

- We are often told that association is not causation, yet we tend to forget it. As a result, we run into numerous spurious (and funny) correlations, like the examples in the [Spurious Correlations collection](https://tylervigen.com/spurious-correlations). 

![img](img/ch1/Spurious_Correlations_Muzzarella.png)     
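
A spurious correlation of this kind is easy to manufacture. The sketch below uses simulated data, not the data behind the chart above: two made-up series (`series_a`, `series_b`) that share nothing except an upward drift over time. The trend slopes and noise levels are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
t = np.arange(n)

# Two simulated series that both drift upward over time but never
# influence each other: their noise terms are drawn independently.
series_a = 1.0 * t + rng.normal(0, 200, n)
series_b = 2.0 * t + rng.normal(0, 200, n)

raw_corr = np.corrcoef(series_a, series_b)[0, 1]

def detrend(series, time):
    """Subtract a fitted straight-line time trend from a series."""
    slope, intercept = np.polyfit(time, series, 1)
    return series - (slope * time + intercept)

detrended_corr = np.corrcoef(detrend(series_a, t), detrend(series_b, t))[0, 1]

print(f"raw correlation:       {raw_corr:.2f}")
print(f"detrended correlation: {detrended_corr:.2f}")
```

The raw correlation is high only because both series follow time. Once the shared trend is removed, what remains is close to zero.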

<br/><br/>
Another example comes from cigarette commercials in the USA in the 1950s, which claimed that smoking helps treat coughs and even helps you keep a fit body!

![img](img/ch1/Cigarette_Commercials.png)  

<br/><br/>
### Causality is not Machine Learning

We hear about rapid advances in machine learning systems every day, such as deep-learning algorithms in self-driving cars, speech-recognition systems, image processing, and virtual reality. Nevertheless, deep learning has succeeded primarily by performing repeatable tasks to answer specific questions that we thought were difficult, but that turned out not to be that difficult. 

<br/><br/>
Machine learning has not addressed the tough questions that prevent us from achieving human-level AI. The public believes that AI machines can think like humans. In reality, computers don't even have animal-like cognitive abilities yet. See [Adnan Darwiche's paper, Human-Level Intelligence or Animal-Like Abilities?](https://arxiv.org/abs/1707.04327). 

<br/><br/>
The field of artificial intelligence is "bursting with micro discoveries" (the sort of things that make good press releases), but machines are still disappointingly far from human-like cognition. See [Gary Marcus's book, Rebooting AI: Building Machines We Can Trust](http://garymarcus.com/index.html).

![img](img/ch1/ML_Animal_Abilities.png)  

<br/><br/>
Machine learning systems usually operate in complex environments governed by rich webs of causal relations, while having access only to surface representations of those relations through observation and measurement. Medicine, economics, education, climatology, and politics are typical examples of such environments. In other words, machine-learning methods today provide us with an efficient way of going from finite sample estimates to probability distributions. However, we still need to move from distributions to cause-effect relations in the real world. Machine learning is trapped in Plato's cave. See [Judea Pearl's book, The Book of Why](http://bayes.cs.ucla.edu/WHY/). 

![img](img/ch1/Plato_Cave.jpeg)  


<br/><br/>
The following are some shortcomings of machine learning when it comes to causal inference. 

- Machine learning is limited in its transferability to new problems and in any form of generalization to data drawn from a different distribution. 

- Machine learning often disregards information that even animals use heavily, e.g., interventions, domain shifts, and temporal structure. 

- Most current successes of machine learning boil down to large-scale pattern recognition on suitably collected independent and identically distributed (i.i.d.) data, an assumption that rarely holds in reality!
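
The first and last points can be illustrated with a small simulation. This is a minimal sketch under assumed conditions: the mechanism `y = x**2 + noise`, the input ranges, and the straight-line model are all chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(low, high, n=500):
    """Draw (x, y) pairs from y = x**2 + noise, with x uniform on [low, high]."""
    x = rng.uniform(low, high, n)
    y = x**2 + rng.normal(0, 0.05, n)
    return x, y

# Fit a straight line on data from one region of the input space.
x_train, y_train = sample(0.0, 1.0)
slope, intercept = np.polyfit(x_train, y_train, 1)

def mse(x, y):
    """Mean squared error of the fitted line on the given data."""
    return float(np.mean((slope * x + intercept - y) ** 2))

x_iid, y_iid = sample(0.0, 1.0)      # same distribution as the training data
x_shift, y_shift = sample(2.0, 3.0)  # shifted inputs, same underlying mechanism

mse_iid = mse(x_iid, y_iid)
mse_shift = mse(x_shift, y_shift)

print(f"MSE on i.i.d. test data: {mse_iid:.3f}")
print(f"MSE on shifted data:     {mse_shift:.3f}")
```

The mechanism that generates `y` never changes. Only the distribution of the inputs does, and that alone is enough to make the fitted pattern useless.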



<br/><br/>
## Going Beyond Machine Learning to Answer a Different Kind of Question

Machine Learning is currently very good at answering **prediction-type questions**. As the authors of the book *Prediction Machines* put it, "the new wave of artificial intelligence does not bring us intelligence but instead a critical component of intelligence - prediction." We can do all sorts of fantastic things with machine learning. The only requirement is that we frame our problems as prediction problems. 


<font color='blue'>*Do we want to translate from English to Italian?*</font>

<font color='blue'>*Do we want to recognize human faces?*</font>


An ML algorithm can work wonders, but only within very strict boundaries, and it fails miserably if the data it is applied to deviate even a little from the data it was trained on. ML is notoriously poor at inverse-causality problems, which require us to answer **what if** questions, or **counterfactuals**. 

<font color='blue'>*What would happen if I followed a low-sugar diet instead of the low-fat diet I'm on?*</font>

<font color='blue'>*What would happen if I charged a different price for my merchandise instead of the price I'm currently asking?*</font>

At the heart of these questions, there is a causal inquiry we wish to know the answer to. **Causal Questions** permeate everyday problems, like figuring out how to make sales go up. Still, they also play an important role in very personal and dear dilemmas: 

<font color='blue'>*Do I have to go to an expensive school to be successful in life (does education cause earnings)?*</font>

<font color='blue'>*Does the public healthcare system increase life expectancy?*</font>


Unfortunately for Machine Learning, we can't rely on correlation-type predictions to answer causal questions. Answering this kind of question is more challenging than most people appreciate. Your teachers have probably repeated, over and over, that "association is not causation." Understanding why, and what to do about it, is what this course is all about. 

![img](img/ch1/Courtroom.png)  
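
To make the pricing question above concrete, here is a small simulated sketch. Everything in it is an assumption made for illustration: the `sales` mechanism, the `TRUE_EFFECT` of price on sales, and a seller who raises prices in high season.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
TRUE_EFFECT = -2.0  # assumed: each extra unit of price lowers sales by 2

def sales(price, high_season):
    """Assumed mechanism: high season boosts demand, price reduces it."""
    return 100 + 40 * high_season + TRUE_EFFECT * price + rng.normal(0, 5, n)

high_season = rng.integers(0, 2, n)  # 0 = low season, 1 = high season

# Observational data: the seller charges more in high season.
price_obs = 10 + 10 * high_season + rng.normal(0, 1, n)
sales_obs = sales(price_obs, high_season)

# Experimental data: prices are assigned at random, ignoring the season.
price_rct = rng.uniform(5, 25, n)
sales_rct = sales(price_rct, high_season)

# Slope of a straight-line fit of sales on price in each data set.
slope_obs = np.polyfit(price_obs, sales_obs, 1)[0]
slope_rct = np.polyfit(price_rct, sales_rct, 1)[0]

print(f"slope from observational data: {slope_obs:+.2f}")
print(f"slope from randomized prices:  {slope_rct:+.2f}")
```

In the observational data, higher prices go together with higher sales, because both rise in high season. A purely predictive model would happily exploit that pattern, yet acting on it by raising the price would reduce sales. Only the randomized prices recover an estimate close to the assumed effect.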


<br/><br/>
## What is Causality?

<br/><br/>
### Traditional Statistical Inference Paradigm

To explain causality, we go back to the fundamentals of statistics. Statistics summarizes a population into a distribution, based on samples (observations) drawn from that population. Remember that we cannot derive causal claims from observational data alone.

Causal inference is the scientific process in which cause-and-effect relationships are inferred from observational data, but only after assuming a **causal model** that drives the relationships between random variables. 

We use an analogy originally proposed by [Judea Pearl, 2016](http://bayes.cs.ucla.edu/jsm-august2016-bw.pdf) and later used by [Camilo Hurtado, 2017](https://repositorio.unal.edu.co/handle/unal/59495) to explain causal inference. We assume that an unknown, invariant, and true data-generating process, $M$, generates a set of observed random variables (data), $D$, and an associated multivariate probability distribution, $P$. 
The target of scientific inquiry in traditional statistical analysis is a probabilistic quantity, $Q(P)$, which summarizes some attribute of $D$ that is of interest to us. $Q(P)$ can be estimated from $P$ and $D$ alone. 

![img](img/ch1/Stat_Paradigm.png)

However, causal analysis is different from statistical analysis. Causal inference is interested in the effect of an external intervention (treatment) on the causal system $M$ when experimental conditions change. This **intervention** acts as a specific modification to the data-generating model $M$, leading to an **unobserved (counterfactual) set of data $D'$ and a distribution $P'$**. The resulting change from $D$ and $P$ to the hypothetical $D'$ and $P'$ is known as the **causal effect of an intervention**. A causal target parameter $Q(P')$ is then computed, which summarizes the causal effect of the given intervention (or treatment). 

![img](img/ch1/Causal_Paradigm.png)

The problem is that in observational studies we only have access to $D$, and therefore to $P$, while $D'$ and $P'$ remain unknown. Therefore, $D$ or $P$ alone cannot determine the causal quantity of interest. That is why we use a set of (un)testable causal assumptions to estimate $Q(P')$ from $D$ and $P$. With these assumptions at hand, we can mathematically express $Q(P')$ in terms of $D$ and $P$ only, leaving $D'$ and $P'$ out.
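
As one illustration of such an expression (it is not part of the analogy above), consider a treatment $T$, an outcome $Y$, and a set of observed covariates $Z$. These symbols, and the $do(\cdot)$ notation for an intervention, are introduced here only for the example. If we are willing to assume that $Z$ blocks every confounding path between $T$ and $Y$ (Pearl's back-door criterion), then the interventional distribution can be written entirely in terms of the observational distribution $P$:

$$
P'(y) \;=\; P\big(y \mid do(T = t)\big) \;=\; \sum_{z} P(y \mid T = t, Z = z)\, P(Z = z)
$$

Every term on the right-hand side can be estimated from $D$. The left-hand side is a statement about $D'$, and the equality holds only because of the causal assumption about $Z$, which the data alone cannot verify.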

<br/><br/>
### Causality is Beyond Statistics

- Causal inference requires extra information. There is nothing in the distribution of the data alone that tells us how it should change when conditions change.
- To make causal inferences we must make **assumptions** about the processes that generated the data. These are not statistical assumptions.
- **Causal assumptions** come from the expertise and previous experience of the researcher.
- **Causal questions** are questions about what happens when we change the way data are generated.


<br/><br/>
To summarize, we have two schools of thought for inference:

- **Associational Inference:** it includes any relationship that can be defined in terms of a joint distribution of observed variables
    - Correlation, conditional independence, dependence, likelihood, confidence level…
    - Testable in principle

- **Causal Inference:** it includes any relationship that cannot be defined in terms of the joint distribution alone
    - Randomization, confounding, mediation, attribution, effect, …
    - Not testable in principle (without experimental control)
    - We can only test them if we can intervene on the system and see what happens. Even if we can intervene, there are some complications. But there are some tricks…
    - By capturing the patterns in the data, we aim to capture the principal patterns of the process that generates the data.
    
![img](img/ch1/DataGeneration.gif)


<br/><br/>
### Causality Ladder

In [The Book of Why](http://bayes.cs.ucla.edu/WHY/), Judea Pearl proposed the **Ladder of Causation**, which represents three levels of causality with different organisms at each level. 
* Most animals and present-day learning machines are on the first level, learning from association. 
* Tool users, such as early humans, are on the second level if they act by planning and not merely by imitation. We can also use experiments to learn the effects of interventions, and presumably, this is how babies acquire much of their causal knowledge. 
* Finally, on the top level, counterfactual learners can imagine worlds that do not exist and infer reasons for observed phenomena. 

![img](img/ch1/Causal_Ladder.png)
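
The top level can be made concrete with a toy example. The sketch below assumes a hypothetical structural model `Y := 2 * X + U`, where `U` is an unobserved individual-specific term, and walks through the usual three steps of a counterfactual query (abduction, action, prediction). The model and the numbers are illustrative assumptions only.

```python
def outcome(x, u):
    """Hypothetical structural equation: Y := 2 * X + U."""
    return 2.0 * x + u

# What we actually observed for one individual.
x_obs, y_obs = 1.0, 3.0

# Step 1 (abduction): infer this individual's unobserved term from the observation.
u = y_obs - 2.0 * x_obs

# Step 2 (action): replace X with the hypothetical value we want to imagine.
x_cf = 2.0

# Step 3 (prediction): recompute Y with the same unobserved term.
y_cf = outcome(x_cf, u)

print(y_cf)  # 5.0
```

The answer, "had `X` been 2, `Y` would have been 5 for this individual", describes a world that was never observed. It can only be computed because a structural model was assumed, which is what separates this level from association and intervention.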

<br/><br/>

<h3 style="color: green;">Ask ChatGPT</h3>

<font color='green'>**Prompt: What is the difference between causal inference and machine learning?**</font>


Causal inference and machine learning are two distinct but related fields with different goals and approaches, although there is some overlap between them. Here are the key differences:

**Goal:**

Causal Inference: The primary goal of causal inference is to understand cause-and-effect relationships and make causal claims. It focuses on identifying the causal effects of specific factors or interventions on outcomes of interest.

Machine Learning: The primary goal of machine learning is to develop algorithms and models that can learn patterns and make predictions or classifications based on data. It focuses on optimizing predictive accuracy and generalization to new data.

**Focus:**

Causal Inference: Causal inference is concerned with understanding the underlying mechanisms and factors that lead to a particular outcome. It seeks to answer questions like "What is the effect of X on Y?" or "What would have happened if X had not occurred?"

Machine Learning: Machine learning focuses on building models that can accurately predict outcomes based on input variables. It emphasizes finding patterns and relationships in the data that allow for accurate predictions without necessarily aiming to establish causal relationships.

**Data Requirements:**

Causal Inference: Causal inference often requires specific types of data, such as randomized controlled trial (RCT) data or carefully designed observational studies with controls for confounding factors. It typically involves collecting data that allow for the identification of causal effects.

Machine Learning: Machine learning can work with various types of data, including observational data, as long as it provides enough information to train a model and make accurate predictions. It can handle complex and large-scale datasets without necessarily requiring specific experimental designs.

**Methods and Approaches:**

Causal Inference: Causal inference relies on a range of statistical and econometric techniques, such as propensity score matching, instrumental variables, difference-in-differences, or structural equation modeling. These methods are designed to control for confounding and estimate causal effects.

Machine Learning: Machine learning encompasses a broad set of algorithms and techniques, such as decision trees, neural networks, support vector machines, and ensemble methods. These methods are primarily focused on learning patterns and relationships in data to optimize predictive performance.

It's important to note that while causal inference and machine learning have distinct goals, they can also be complementary. Causal inference can benefit from machine learning techniques for data analysis and prediction, while machine learning can benefit from causal inference to improve interpretability and make more informed decisions based on identified causal relationships.


## The Danger of Oversimplification

* Be aware of **our temptation to oversimplify**.
* Simple models are more appealing to human intuition. But intuition is not enough to help us make sense of complex data; that is why we have statistics.
* Before using a simpler model, let's make sure that our own knowledge, or an expert's, justifies it.
* **Oversimplification leads to misinformation**.

![img](img/ch1/misinformation.png)  

Here is an oversimplified model compared with a more complete one:

![img](img/ch1/Oversimplified-model.png)  
![img](img/ch1/normal-model.png) 


## References

Most of the ideas in this chapter are taken from Judea Pearl's books. 

* [Causality, 2nd Edition](http://bayes.cs.ucla.edu/BOOK-2K/)
* [The Book of Why](http://bayes.cs.ucla.edu/WHY/)

We would also like to reference the open-source book on causality by Matheus Facure Alves. He did a great job of explaining causal concepts with examples and funny memes.

* [Causal Inference for The Brave and True](https://matheusfacure.github.io/python-causality-handbook/landing-page.html)

[Ilya Shpitser](https://www.cs.jhu.edu/~ilyas/) from JHU also did a great job with his causal inference course. 