# Chapter 1 - Introduction to Causality


Our journey into the world of Applied Causal Inference begins here. In this chapter, we'll grapple with fundamental questions about causality: 

- What is it? 
- How does causal inference differ from statistical inference? 
- In an age of remarkable machine learning achievements, do we still need causality?

Over the past decade, the landscape of data science and artificial intelligence has been transformed by the "unreasonable effectiveness" of machine learning algorithms. From computer vision systems that outperform humans in image recognition to natural language models capable of generating coherent, context-aware text, the capabilities of AI have grown exponentially. Models like **Claude Sonnet** and **GPT-4** have not only revolutionized research but have also captured the public imagination, leading some to question the need for alternative approaches to data analysis.


Indeed, if you've been following the rapid evolution of machine learning, you've likely encountered numerous examples of its prowess across various domains. This might prompt you to ask: *If these algorithms work so well, why should we bother looking into something else?*

The answer lies in the unique insights and capabilities that causal inference offers. Despite the power of modern machine learning, there are scenarios where understanding the underlying causal mechanisms becomes crucial. In this chapter, we'll explore:


- The historical development of causal thinking and its impact on scientific inquiry
- Specific cases where causal models provide advantages over purely statistical methods
- Common misconceptions and oversimplifications in causal reasoning


By examining these aspects, we'll uncover why causal inference remains a critical tool in the data scientist's arsenal, complementing rather than competing with machine learning approaches.

<br/><br/>
## A Brief History of Causality

The concept of causality has been a cornerstone of human understanding across civilizations and throughout history. Its study has evolved significantly, shaping our approach to scientific inquiry and our understanding of the world around us.

### Ancient Foundations: Aristotle's Four Causes

In ancient Greece, Aristotle laid the groundwork for causal thinking that would influence Western philosophy for centuries. He posited that true knowledge of any process necessitates an understanding of its causal structure. Aristotle's framework included four types of causes:

- **Material cause**: The substance from which something is made
- **Formal cause**: The essential nature or form of the thing
- **Efficient cause**: The agent of change or the maker
- **Final cause**: The purpose or end for which something exists

While this categorization might seem counterintuitive to modern scientists, it represents one of the earliest systematic attempts to categorize different aspects of causation. Aristotle argued that answering "why" questions—explaining these causes—forms the essence of scientific explanation.

<img src="img/ch1/Aristotle_and_causal_thinking.png" alt="Aristotle and Causal Thinking" width="500"/>


By the way, what do you think of the picture of Aristotle?

### The Enlightenment Shift: David Hume's Empiricism
Fast forward to the 18th century, and we encounter David Hume, a Scottish philosopher who revolutionized causal thinking. Hume's approach marked a significant departure from Aristotelian ideas, focusing instead on empirical observation and human psychology.
Hume's key insight was that we never directly observe cause-effect relationships in the world. Instead, we only experience the conjunction of events. As he famously wrote:


"We only find, that the one does actually, in fact, follow the other. The impulse of one billiard-ball is attended with motion in the second. This is the whole that appears to the outward senses." *(original spelling; Hume & Millican, 2007; originally published in 1739)*.


Hume's theory of causality, simplified for clarity, can be summarized as follows:

- We observe sequences of events (e.g., object A moves, then object B moves)
- Repeated observations of such sequences create an expectation in our minds
- This feeling of expectation is what we call **"causality"**

In essence, Hume argued that causality is not an inherent property of the world, but a psychological construct arising from our experiences.

<img src="img/ch1/David_Hume_theory_of_causality_18th_century.png" alt="Hume and Causal Theory" width="500"/>


### Implications for Modern Causal Inference
The historical evolution of causal thinking, from Aristotle to Hume, set the stage for modern approaches to causal inference. These early philosophers grappled with fundamental questions that still resonate today:

- How can we distinguish genuine causal relationships from mere correlations?
- To what extent can we infer causal structures from observational data alone?
- What role does human cognition play in our understanding of causality?

As we delve deeper into contemporary methods of causal inference, it's crucial to remember that we're building upon centuries of philosophical and scientific thought. The challenges we face in identifying and quantifying causal relationships echo those pondered by thinkers throughout history.

Here are some of the main theorists of causal inference, along with their key contributions and primary academic affiliations:

**[Judea Pearl](https://bayes.cs.ucla.edu/jp_home.html)**, Professor Emeritus, Department of Computer Science, University of California, Los Angeles (UCLA)
*Key Contribution*: Developed causal diagrams (DAGs) and do-calculus, laying the groundwork for modern approaches to causal inference, particularly in distinguishing correlation from causation.

**[Donald Rubin](https://statistics.fas.harvard.edu/people/donald-b-rubin)**, Professor Emeritus, Department of Statistics, Harvard University
*Key Contribution*: Creator of the Rubin Causal Model (RCM) or potential outcomes framework, which is central to the analysis of causal effects, particularly in the context of randomized and observational studies.

**[James Heckman](https://cehd.uchicago.edu/?page_id=71)**, Professor of Economics, University of Chicago
*Key Contribution*: Significant contributions to econometrics, including the Heckman correction for addressing selection bias and the estimation of treatment effects in observational data.

**[Guido Imbens](https://www.gsb.stanford.edu/faculty-research/faculty/guido-w-imbens)**, Professor of Economics, Stanford University
*Key Contribution*: Known for work on instrumental variables and local average treatment effects (LATE), providing practical methods for estimating causal effects in both observational and quasi-experimental contexts.

<br/><br/>

## What babies are telling us about Causality

While Hume's theory of causation provided a foundational understanding of cause and effect, it left some questions unanswered. To address these gaps, we turn to an unexpected source: human infants. The study of how babies develop their understanding of the world offers profound insights into the nature of causal reasoning and its importance in human cognition.

<img src="img/ch1/baby_learning_brain.png" alt="Baby and Learning" width="500"/>


### Beyond Passive Observation: The Active Learner

Alison Gopnik, a prominent developmental psychologist, has made significant contributions to our understanding of how children construct their models of the world. Her work, bridging developmental psychology and computer science, reveals that children are far more than passive observers of their environment [Gopnik,(2012)](https://doi.org/10.1126/science.1223416).

Key insights from Gopnik's research include:

1. Children as Scientists: Babies and young children engage in behaviors that, while sometimes interpreted as disruptive or random, are actually systematic experiments to understand their environment [Gopnik,(2009)](https://books.google.no/books/about/The_Philosophical_Baby.html).

2. Preference for the Unpredictable: Infants as young as 11 months show a preference for objects that behave in unpredictable ways. This preference drives them to explore and learn about novel phenomena efficiently [Stahl,(2015)](https://doi.org/10.1126/science.aaa3799).

3. Active Interaction: Unlike Hume's theory, which focuses on passive observation, babies actively interact with their environment to test hypotheses and build causal models.


### The Power of Intervention

What sets the infant's approach apart from Hume's conception is the crucial element of intervention. In the context of causal inference, these interactions are termed **interventions**, and they form the backbone of modern experimental design [Pearl,(2009)](https://doi.org/10.1017/CBO9780511803161).


Interventions allow us to:

- Distinguish between correlation and causation
- Test hypotheses about causal relationships
- Build more robust and accurate models of the world

This concept of intervention is not just a quirk of infant behavior; it's at the heart of scientific inquiry. The gold standard of scientific experimentation, the **Randomized Controlled Trial (RCT)**, is essentially a formalized, rigorous application of the same principle that drives a baby to repeatedly drop a spoon from their high chair [5].
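
To make the contrast between observing and intervening concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the hypothetical patients, the hidden `health` variable, and the true drug effect of `1.0`. It compares the difference in mean outcomes when patients choose treatment themselves with the difference when treatment is assigned by a coin flip.

```python
import random


def simulate(n, randomized, seed=0):
    """Simulate n hypothetical patients; return (treated, control) outcomes."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)  # hidden common cause of treatment and outcome
        if randomized:
            takes_drug = rng.random() < 0.5  # coin flip, independent of health
        else:
            takes_drug = health > 0  # healthier patients self-select into treatment
        # Assumed true causal effect of the drug on the outcome: +1.0
        outcome = 1.0 * takes_drug + 2.0 * health + rng.gauss(0, 1)
        (treated if takes_drug else control).append(outcome)
    return treated, control


def difference_in_means(n, randomized):
    """Naive effect estimate: mean treated outcome minus mean control outcome."""
    treated, control = simulate(n, randomized)
    return sum(treated) / len(treated) - sum(control) / len(control)


observational_estimate = difference_in_means(20_000, randomized=False)
rct_estimate = difference_in_means(20_000, randomized=True)
print(f"observational: {observational_estimate:.2f}, randomized: {rct_estimate:.2f}")
```

In this toy setup the observational difference lands far above the assumed effect because healthier patients both take the drug and do better anyway, while the randomized difference stays close to `1.0`. Randomization breaks the link between `health` and treatment, which is what the intervention buys us.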


### Implications for Causal Inference

The insights from developmental psychology have profound implications for how we approach causal inference:

1. **Active Learning**: We should design algorithms and studies that don't just passively observe data but actively interact with systems to uncover causal structures [6].

2. **Embracing Uncertainty**: Like infants who are drawn to unpredictable phenomena, our causal inference methods should be capable of identifying and exploring areas of uncertainty [7].

3. **Iterative Experimentation**: The scientific process, mirroring a child's repeated experiments, should involve iterative interventions and observations to refine our causal models [8].

As we delve deeper into the methods and applications of causal inference in subsequent chapters, keep in mind this fundamental insight: a true understanding of causality comes not just from observing the world, but from interacting with it. This principle, so naturally embodied in the behavior of infants, forms the cornerstone of modern causal inference techniques and experimental design.

<br/><br/>

## Anything wrong with data science?

Let's start with a simple image (Courtesy of Markus Elsholz). 
What is the object in the image below? A chair, right?

![img](img/ch1/Beuchet_chair_a.png)


<br/><br/>
What if we look at the object from a different angle? 

![img](img/ch1/Beuchet_chair_b.png)


The object is not a chair! The illusion of a chair appears only when the parts are viewed from one specific angle. This is the **Beuchet chair experiment**, which shows how perception changes with the point of observation.


<font color='blue'>*Do you have any other examples?*</font>


<br/><br/>
Unfortunately, most of our work in statistics, data science, and machine learning is based on **observations**! However, we cannot rely on observations alone to model and understand our world.

![img](img/ch1/Math_Learning.jpeg)


That said, let's look at what causality is and what it is NOT.


<br/><br/>
## What is NOT Causality?

<br/><br/>
### Causality is not Algebra

Dario is a little boy who feels feverish, so his mom measures his temperature with a thermometer. With luck, he can skip school today.
From an algebraic point of view, the height of the mercury column $X$ in the tube is related to Dario's body temperature $Y$ by a constant $k$.

$Y = k \cdot X$

<img src="img/ch1/little-sick-boy.jpeg" width="250"/>

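For our algebraic equation, it does not matter whether Dario's body temperature raises the mercury column or the other way around. The equation is symmetric: it can be rearranged to $X = Y / k$ without losing anything, so it carries no information about which variable causes the other.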
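
The asymmetry that algebra misses can be sketched in a few lines of Python. The function names, the constant `K`, and the `forced_height` argument are all hypothetical, chosen only for illustration. The idea is that the temperature *sets* the mercury height, so changing the temperature moves the mercury, while tampering with the mercury does nothing to Dario.

```python
K = 4.0  # hypothetical constant: degrees Celsius per centimetre of mercury


def mercury_height(temperature):
    """Structural assignment: body temperature determines the mercury height."""
    return temperature / K


def world(temperature, forced_height=None):
    """Simulate Dario and the thermometer.

    forced_height models an intervention on the mercury column itself
    (for example, warming the thermometer under a lamp).
    """
    if forced_height is None:
        height = mercury_height(temperature)
    else:
        height = forced_height  # the usual mechanism is overridden
    return {"temperature": temperature, "height": height}


baseline = world(temperature=36.5)
fever = world(temperature=39.0)  # changing the cause moves the effect
tampered = world(temperature=36.5, forced_height=10.0)  # changing the effect leaves the cause alone

print(baseline, fever, tampered)
```

In the `tampered` world the equation $Y = k \cdot X$ no longer holds, because an intervention replaced the mechanism that produced $X$. Algebra has no vocabulary for this distinction; a causal model does.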

<br/><br/>
### Causality is not Statistics

- Most scientific inquiries and data analyses have one of two goals:

    - **Association/prediction**, i.e., determine predictors or variables associated with the outcome of interest.
    - **Causality**, i.e., understanding factors that cause or influence the outcome of interest.

- Statistical concepts are those expressible in terms of the joint distribution of observed variables.

- We are often told that association is not causation, yet we tend to forget it. As a result, we see numerous spurious (and often funny) correlations, like the examples in the [Spurious Correlations collection](https://tylervigen.com/spurious-correlations).

![img](img/ch1/Spurious_Correlations_Muzzarella.png)     
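
A quick way to see how such correlations arise is to generate two series that have nothing to do with each other except that both grow over time. The sketch below uses made-up numbers (the series names, slopes, and noise levels are assumptions, not real data) and only the Python standard library.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)
periods = range(500)
# Two made-up series that both grow over time for unrelated reasons.
cheese = [10 + 0.02 * t + rng.gauss(0, 0.3) for t in periods]
doctorates = [500 + 1.2 * t + rng.gauss(0, 20) for t in periods]

raw_corr = pearson(cheese, doctorates)

# Period-to-period changes strip out the shared upward trend.
cheese_changes = [b - a for a, b in zip(cheese, cheese[1:])]
doctorate_changes = [b - a for a, b in zip(doctorates, doctorates[1:])]
change_corr = pearson(cheese_changes, doctorate_changes)

print(f"correlation of levels:  {raw_corr:.2f}")
print(f"correlation of changes: {change_corr:.2f}")
```

The levels are strongly correlated only because time drives both series. Once the shared trend is removed, the association largely disappears, which is a hint that neither series causes the other.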

<br/><br/>
Another example comes from cigarette commercials in the USA in the 1950s, which claimed that smoking helps treat coughs and even helps you keep a fitter body!

![img](img/ch1/Cigarette_Commercials.png)  

<br/><br/>
### Causality is not Machine Learning

* We hear about rapid advances in machine learning systems every day, such as deep-learning algorithms in self-driving cars, speech-recognition systems, image processing, virtual reality, and LLMs. Nevertheless, deep learning has succeeded primarily on repeatable tasks that answer specific questions. We once thought those questions were difficult, but they turned out to be easier than the ones that remain open.


* Machine learning has not addressed the tough questions that prevent us from achieving human-level AI. Many people believe that AI machines can think like humans. In reality, computers don't even have animal-like cognitive abilities yet. See Gary Marcus's paper, [The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence](https://arxiv.org/pdf/2002.06177).


* The field of artificial intelligence is **bursting with micro discoveries**—the sort of things that make good press releases—but machines are still disappointingly far from human-like cognition. See [Gary Marcus's book, Rebooting AI: Building Machines We Can Trust](http://garymarcus.com/index.html)

![img](img/ch1/ML_Animal_Abilities.png)  

<br/><br/>


Machine learning is trapped in **Plato's Cave**: it sees only the shadows (observations) and not the objects that cast them. See [Judea Pearl's book, The Book of Why](http://bayes.cs.ucla.edu/WHY/).


![img](img/ch1/Plato_Cave.jpeg)  


<br/><br/>
The following are some shortcomings of machine learning when it comes to causal inference:

- Machine learning is limited in its ability to transfer to new problems and to generalize to data with a different distribution.

- Machine learning often disregards information that even animals use heavily, e.g., interventions, domain shifts, and temporal structure.

- Most current successes of machine learning boil down to large-scale pattern recognition on suitably collected independent and identically distributed (i.i.d.) data, an assumption that rarely holds in reality!
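
The i.i.d. limitation can be illustrated with a small sketch. The scenario is hypothetical: temperature drives both ice-cream sales `x` and sunburn cases `y`, and a one-variable linear model learns to predict `y` from `x`. All numbers are assumptions for illustration. The model is then evaluated on fresh data from the same process and on data where sales no longer follow the weather.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sample(n, rng, shifted):
    """Hypothetical data: temperature drives ice-cream sales (x) and sunburn cases (y)."""
    xs, ys = [], []
    for _ in range(n):
        temperature = rng.gauss(25, 5)
        if shifted:
            x = rng.gauss(25, 5)  # sales now follow a promotion, not the weather
        else:
            x = temperature + rng.gauss(0, 1)
        y = 2 * temperature + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return xs, ys


rng = random.Random(0)
model = fit_line(*sample(5000, rng, shifted=False))
iid_mse = mse(model, *sample(5000, rng, shifted=False))
shifted_mse = mse(model, *sample(5000, rng, shifted=True))
print(f"MSE on i.i.d. test data: {iid_mse:.1f}")
print(f"MSE after the shift:     {shifted_mse:.1f}")
```

The model predicts well as long as the test data come from the same distribution, and its error grows sharply once the mechanism behind `x` changes. It learned an association that was useful for prediction but was never a causal relationship.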



<br/><br/>
## Going Beyond Machine Learning to Answer a Different Kind of Question

Machine learning is currently very good at answering **prediction-type questions**. As the authors put it in the book *Prediction Machines*, "the new wave of artificial intelligence does not bring us intelligence but instead a critical component of intelligence - prediction." We can do all sorts of fantastic things with machine learning. The only requirement is that we frame our problems as prediction problems.


<font color='blue'>*Do we want to translate from English to Italian?*</font>

<font color='blue'>*Do we want to recognize human faces?*</font>


An ML algorithm can work wonders within very strict boundaries, but it fails miserably if the data deviates even a little from what the model was trained on. ML is notoriously poor at inverse-causality problems, which require us to answer **what if** questions, or **counterfactuals**.

<font color='blue'>*What would happen if I went on a low-sugar diet instead of the low-fat diet I'm on?*</font>

<font color='blue'>*What would happen if I used another price instead of this price I'm currently asking for my merchandise?*</font>

At the heart of these questions, there is a causal inquiry we wish to know the answer to. **Causal Questions** permeate everyday problems, like figuring out how to make sales go up. Still, they also play an important role in very personal and dear dilemmas: 

<font color='blue'>*Do I have to go to an expensive school to be successful in life (does education cause earnings)?*</font>

<font color='blue'>*Does the public healthcare system increase life expectancy?*</font>


Unfortunately for machine learning, we can't rely on correlation-type predictions to answer causal questions. Answering this kind of question is more challenging than most people appreciate. Your teachers have probably repeated, over and over, that "association is not causation." Explaining why, and what to do about it, is what this course is all about.



<br/><br/>
## What is Causality?

To understand causality, let's first revisit the fundamentals of **statistical inference** and then contrast it with **causal inference**.

<br/><br/>
### Traditional Statistical Inference Paradigm

Statistics summarize a population/set/observation into a distribution based on samples drawn from that population. Remember that we cannot derive causal claims from observational data alone.

Causal inference is the scientific process in which **cause-and-effect** relationships are inferred from observational data, but only after assuming a **causal model** that drives the relationships between random variables. 

We use an analogy originally proposed by [Judea Pearl, 2016](http://bayes.cs.ucla.edu/jsm-august2016-bw.pdf) and later used by [Camilo Hurtado, 2017](https://repositorio.unal.edu.co/handle/unal/59495) to explain causal inference.

- We assume an unknown, invariant, and true data generating process, $M$, generates a set of observed random variables (data), $D$, and associated multivariate probability distribution, $P$. 
- The target of scientific inquiry in traditional statistical analysis is a probabilistic quantity, $Q(P)$, which summarizes some attribute of $D$ that is of interest to us.
- $Q(P)$ can be estimated from $P$ and $D$ alone. 

![img](img/ch1/Stat_Paradigm.png)


However, causal analysis is different from statistical analysis. Causal inference is interested in an external **intervention (treatment)** effect on the causal system $M$ when experimental conditions change. 

- This **intervention** acts as a specific modification to the data-generating model $M$, leading to an **unobserved (counterfactual) set of data $D'$ and a distribution $P'$**. This change is known as the **causal effect of an intervention**. 
- In other words, it is the changes in the data generating process $M$ that generate hypothetical (unobserved) $D'$ and $P'$. 
- Then, a causal target parameter $Q(P')$ is computed, which summarizes the causal effect of the given intervention (or treatment). 

![img](img/ch1/Causal_Paradigm.png)


The challenge: 

- The problem is that in observational studies we only have access to $D$, and therefore $P$, while $D'$ and $P'$ remain unknown. Therefore, $D$ or $P$ alone cannot identify the causal quantity of interest.
- That is why we use a set of causal assumptions, some of them untestable, to estimate $Q(P')$ from $D$ and $P$.
- With these assumptions in hand, we can mathematically express $Q(P')$ in terms of $D$ and $P$ alone, leaving $D'$ and $P'$ out.
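
As one illustration of this idea, the sketch below builds a toy data-generating process $M$ with a binary confounder `z`, a binary treatment `x`, and a binary outcome `y`. All probabilities are made up. Under the assumption that `z` is observed and is the only common cause of `x` and `y`, the back-door adjustment $E[Y \mid do(X=x)] = \sum_z E[Y \mid X=x, Z=z]\,P(Z=z)$ expresses the causal target using the observed data $D$ alone. Because this is a simulation, we can also generate $D'$ directly by modifying $M$ and check the answer, something that is not possible with real observational data.

```python
import random


def draw(rng, do_x=None):
    """One draw (z, x, y) from a hypothetical data-generating process M.

    Passing do_x replaces the mechanism for x, i.e. samples from the modified model.
    """
    z = rng.random() < 0.5  # confounder
    if do_x is None:
        x = rng.random() < (0.8 if z else 0.2)  # z influences who gets treated
    else:
        x = do_x  # intervention: x no longer depends on z
    y = rng.random() < 0.1 + 0.3 * x + 0.4 * z  # assumed true effect of x: +0.3
    return z, x, y


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def adjusted_effect(data):
    """Back-door adjustment over z, computed from observational data only."""
    total = 0.0
    for z_value in (False, True):
        stratum = [(x, y) for z, x, y in data if z == z_value]
        p_z = len(stratum) / len(data)
        effect_in_stratum = (mean(y for x, y in stratum if x)
                             - mean(y for x, y in stratum if not x))
        total += p_z * effect_in_stratum
    return total


rng = random.Random(0)
D = [draw(rng) for _ in range(50_000)]  # observed data

naive = mean(y for _, x, y in D if x) - mean(y for _, x, y in D if not x)
adjusted = adjusted_effect(D)

# Ground truth from the modified model; available only because this is a simulation.
D_treated = [draw(rng, do_x=True) for _ in range(50_000)]
D_control = [draw(rng, do_x=False) for _ in range(50_000)]
interventional = mean(y for _, _, y in D_treated) - mean(y for _, _, y in D_control)

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, interventional: {interventional:.2f}")
```

The naive contrast overstates the effect because `z` raises both the chance of treatment and the outcome. The adjusted estimate agrees with the interventional one, but only because the causal assumption about `z` is true here by construction. With real data, that assumption has to come from domain knowledge.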


<br/><br/>
### Causality is Beyond Statistics

- Causal inference requires extra information. There is nothing in the distribution of the data alone that tells us how it should change when conditions change.
- To make causal inferences we must make **assumptions** about the processes that generated the data. These are not statistical assumptions.
- **Causal assumptions** come from the expertise and previous experience of the researcher.
- **Causal questions** are questions about what happens when we change the way data are generated.


<br/><br/>
To summarize, we have two schools of thought for inference:

- **Associational Inference:** it includes any relationship that can be defined in terms of a joint distribution of observed variables
    - Correlation, conditional independence, dependence, likelihood, confidence level…
    - Testable in principle

- **Causal Inference:** it includes any relationship that cannot be defined in terms of the joint distribution alone
    - Randomization, confounding, mediation, attribution, effect, …
    - Not testable in principle (without experimental control)
    - We can only test them if we can intervene on the system and see what happens. Even if we can intervene, there are some complications. But there are some tricks…
    - By capturing the patterns in the data, we capture the principal patterns of the process that generates the data.
    
![img](img/ch1/DataGeneration.gif)


<br/><br/>
### Causality Ladder

In [The Book of Why](http://bayes.cs.ucla.edu/WHY/), Judea Pearl proposed the **Ladder of Causation**, which describes three levels of causal reasoning, with different kinds of learners at each level.

* Most animals and present-day learning machines are on the first level, learning from association.
* Tool users, such as early humans, are on the second level if they act by planning and not merely by imitation. We can also use experiments to learn the effects of interventions, and presumably, this is how babies acquire much of their causal knowledge. 
* Finally, on the top level, counterfactual learners can imagine worlds that do not exist and infer reasons for observed phenomena. 

![img](img/ch1/Causal_Ladder.png)
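
The top rung can be sketched with a tiny structural model. The pricing equation and all numbers below are hypothetical. Given one observed day, the three-step counterfactual procedure described by Pearl (abduction, action, prediction) answers "what would sales have been on *that same day* at a different price?"

```python
def sales(price, demand):
    """Assumed structural equation: sales fall with price and rise with demand."""
    return 100 - 2 * price + demand


# What we actually observed on one particular day.
observed_price, observed_sales = 20, 70

# Step 1 (abduction): recover that day's unobserved demand from the evidence.
demand = observed_sales - (100 - 2 * observed_price)

# Step 2 (action): replace the price with the alternative we want to imagine.
alternative_price = 15

# Step 3 (prediction): rerun the model with the same demand and the new price.
counterfactual_sales = sales(alternative_price, demand)

print(f"inferred demand: {demand}, counterfactual sales: {counterfactual_sales}")
```

An associational model could tell us what sales look like on days when the price happens to be 15. The counterfactual instead holds the specific day's circumstances fixed, which requires knowing (or assuming) the structural equation itself.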

<br/><br/>


## The Danger of Oversimplification

In our quest to understand causality, it's crucial to address a common pitfall: the temptation to oversimplify.

* **Human Intuition vs. Complex Reality**: While simple models are more appealing to human intuition, they often fail to capture the intricacies of complex systems. This is precisely why we rely on statistics and advanced causal inference methods.
  
* **The Need for Justified Models**: Before adopting simpler models, we must ensure they are justified by expert knowledge or thorough analysis. Simplicity should not come at the cost of accuracy or completeness.
  
* **Misinformation Risk**: Oversimplification can lead to misinformation. In causal inference, this is particularly dangerous as it may result in incorrect conclusions about cause-effect relationships.
  
* **Balancing Simplicity and Accuracy**: The challenge lies in finding the right balance between model simplicity and accurate representation of causal relationships. This balance is crucial for both understanding and practical application.

Oversimplification in causal inference can lead to misleading conclusions. Always strive for models that are as simple as possible, but no simpler than the complexity of the system under study requires.


Further Reading and Media on Oversimplification:

**Book**: [Thinking, Fast and Slow](https://www.goodreads.com/book/show/11468377-thinking-fast-and-slow) by Kahneman, D., 2011, explores various cognitive biases, including our tendency to prefer simple explanations over complex ones.

**Podcast**: The "[How we transferred our biases into our machines and what we can do about it](https://youarenotsosmart.com/2017/11/20/yanss-115-how-we-transferred-our-biases-into-our-machines-and-what-we-can-do-about-it/)" episode from the podcast "You Are Not So Smart".
This episode discusses how our tendency to oversimplify can lead to biases in machine learning and AI systems.




## Causal Inference in Business: The Promise of Causal AI

As we explore the principles of causal inference, its applications within business contexts stand out as particularly transformative. The intersection of causality and artificial intelligence, known as Causal AI, promises to reshape how businesses approach decision-making and solve complex, multidimensional problems. Traditional analytics often rely on correlations, but Causal AI enables businesses to pinpoint cause-and-effect relationships, thereby driving more informed and effective decisions. This section is inspired by the book [Causal Artificial Intelligence](https://www.oreilly.com/library/view/causal-artificial-intelligence/9781394184132/) by Hurwitz & Thompson, 2023.


### The Limitations of Traditional Data Analysis
Over the past decade, the prevailing trend in data science has been the mantra of "more data is better." Organizations invested heavily in data collection, assuming that the sheer volume of information would inevitably lead to more profound insights and improvements in business performance. However, this data-centric approach has its limitations:

* **Correlation vs. Causation**: While traditional analysis can identify correlations, it often struggles to differentiate them from true causal relationships.
* **Data Overload**: Simply gathering more data does not guarantee better insights, especially if the underlying business questions are not well-posed.
* **Lack of Context**: Data-driven approaches often miss the critical contextual knowledge that domain experts can provide, leading to incomplete or misleading conclusions.

### The Causal AI Advantage
Causal AI offers a solution to these limitations by combining causal inference techniques with the predictive power of AI models. The key benefits of Causal AI for businesses include:

* **Understanding 'Why'**: Instead of merely forecasting what may happen, Causal AI seeks to uncover why events occur, leading to interventions that target root causes rather than symptoms.
* **Enhanced Decision Making**: By understanding causal relationships, organizations can make better decisions about resource allocation and strategic planning.
* **Robust Predictions**: Causal models tend to be more resilient to changes in external conditions, making them particularly useful in dynamic and uncertain business environments.

### Practical Applications of Causal AI in Business
Causal AI has practical uses across various business functions:

* **Supply Chain Optimization**: By identifying the true causal factors behind supply chain disruptions, businesses can build more resilient and efficient operations.
* **Marketing Attribution**: Understanding which marketing channels truly drive consumer behavior helps companies optimize their marketing expenditures.
* **Product Development**: Causal AI helps to identify which features of a product directly contribute to customer satisfaction and long-term success.
* **Risk Management**: Businesses can use Causal AI to understand how different risk factors interact, leading to more effective risk mitigation strategies.

<font color='blue'>What do you think?</font>


### Challenges and Considerations
Despite its potential, implementing Causal AI in business comes with several challenges:

* **Data Requirements**: Effective causal inference often requires data that businesses may not yet be collecting, such as longitudinal or experimental data.
* **Interdisciplinary Collaboration**: Successful Causal AI projects rely on collaboration between data scientists, domain experts, and business strategists, highlighting the need for a multi-disciplinary approach.


By embracing Causal AI, businesses can unlock new efficiencies, foster innovation, and gain a competitive edge in a rapidly changing business environment.


## Acknowledgement

Most of the ideas in this chapter are taken from Judea Pearl's books:

* [Causality, 2nd Edition](http://bayes.cs.ucla.edu/BOOK-2K/)
* [The Book of Why](http://bayes.cs.ucla.edu/WHY/)

We would also like to reference the open-source book on causality by Matheus Facure Alves. He did a great job of explaining causal concepts with examples and funny memes.

* [Causal Inference for The Brave and True](https://matheusfacure.github.io/python-causality-handbook/landing-page.html)

[Ilya Shpitser](https://www.cs.jhu.edu/~ilyas/) from JHU also did a great job with his causal inference course. 