# 9. Experimental Design & Causality

While the previous notebooks focused on analyzing existing data, this one covers the principles of **designing experiments** to collect new data. Good experimental design is the foundation for making strong causal claims, moving beyond 'correlation does not imply causation' to understanding what *causes* an effect.

## 9.1 Randomized Controlled Trials (RCTs)

The **Randomized Controlled Trial (RCT)** is the gold standard for establishing causality.

**Key Components:**
- **Control Group:** A group that does not receive the treatment or intervention. They provide a baseline to compare against.
- **Treatment Group:** The group that receives the treatment.
- **Random Assignment:** Subjects are randomly assigned to either the control or treatment group. This is the most critical step. Randomization ensures that, on average, the two groups are balanced on every characteristic, measured or unmeasured, so the only systematic difference between them is the treatment. This is what guards against confounding variables.

**Example:** To test a new drug, you would randomly assign participants to receive either the new drug (treatment) or a placebo (control). By comparing the outcomes, you can isolate the effect of the drug itself.
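
The random assignment step can be sketched in a few lines. This is a minimal illustration, assuming 20 participants identified by the integers 1 to 20; the function name `assign_groups` and the fixed seed are choices made for this example, not part of any standard procedure.

```python
import random

def assign_groups(subject_ids, seed=0):
    """Randomly split subjects into two equal-sized groups."""
    rng = random.Random(seed)        # seeded so the assignment is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)            # every ordering is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical study with 20 participants
groups = assign_groups(range(1, 21))
print(len(groups["treatment"]), len(groups["control"]))  # prints: 10 10
```

Shuffling and then splitting guarantees equal group sizes, which a per-subject coin flip would not.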

## 9.2 Blocking and Stratification

Sometimes, you know that a certain characteristic will have a large effect on the outcome. To control for this, you can use blocking or stratification.

- **Blocking:** Group subjects into blocks based on a known variable that strongly influences the outcome (e.g., age, gender). Then, randomize the treatment *within each block*. This ensures that the treatment and control groups are balanced with respect to that variable, so it cannot be mistaken for a treatment effect.

- **Stratification:** Similar to blocking, but used in the sampling phase. You divide the population into strata and sample from each to ensure representation.
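
Randomizing within blocks might look like the sketch below. It assumes each subject is a dictionary with an `id` and a blocking field; the field name `age_group`, the helper `block_randomize`, and the 12 subjects are all hypothetical.

```python
import random
from collections import defaultdict

def block_randomize(subjects, block_key, seed=0):
    """Randomize treatment separately inside each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[subject[block_key]].append(subject)   # group subjects by block

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)                         # randomize within the block
        half = len(members) // 2
        for i, subject in enumerate(members):
            assignment[subject["id"]] = "treatment" if i < half else "control"
    return assignment

# Hypothetical subjects: six under 40, six aged 40 or over
subjects = [
    {"id": i, "age_group": "under_40" if i < 6 else "40_plus"}
    for i in range(12)
]
assignment = block_randomize(subjects, "age_group")
```

Because the split happens inside each age group, both groups end up with the same number of younger and older subjects by construction rather than by chance.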

## 9.3 Factorial Designs

A factorial design allows you to test the effect of **two or more** independent variables (factors) at the same time. It also allows you to test for **interaction effects**—where the effect of one factor depends on the level of another factor.

**Example:** To test a new fertilizer (`Factor A`: new vs. old) and a new watering schedule (`Factor B`: daily vs. weekly), you would have four groups:
1. Old Fertilizer, Daily Water
2. New Fertilizer, Daily Water
3. Old Fertilizer, Weekly Water
4. New Fertilizer, Weekly Water

This design lets you see the main effect of the fertilizer, the main effect of the watering schedule, and whether they have a combined, interactive effect.
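
To make main effects and interaction concrete, the sketch below enumerates the four conditions and computes each effect from cell means. The plant heights are invented for illustration, and the interaction is defined here as the difference between the fertilizer effect under daily watering and under weekly watering (some texts halve this value).

```python
from itertools import product

fertilizers = ["old", "new"]       # Factor A
watering = ["daily", "weekly"]     # Factor B
conditions = list(product(fertilizers, watering))   # the four groups

# Hypothetical mean plant height (cm) in each group, made up for this example
cell_means = {
    ("old", "daily"): 20.0,
    ("new", "daily"): 26.0,
    ("old", "weekly"): 15.0,
    ("new", "weekly"): 17.0,
}

# Main effect of fertilizer: new minus old, averaged over watering schedules
fertilizer_effect = (
    (cell_means[("new", "daily")] + cell_means[("new", "weekly")])
    - (cell_means[("old", "daily")] + cell_means[("old", "weekly")])
) / 2

# Main effect of watering: daily minus weekly, averaged over fertilizers
watering_effect = (
    (cell_means[("old", "daily")] + cell_means[("new", "daily")])
    - (cell_means[("old", "weekly")] + cell_means[("new", "weekly")])
) / 2

# Interaction: does the fertilizer effect differ between watering schedules?
interaction = (
    (cell_means[("new", "daily")] - cell_means[("old", "daily")])
    - (cell_means[("new", "weekly")] - cell_means[("old", "weekly")])
)

print(fertilizer_effect, watering_effect, interaction)  # prints: 4.0 7.0 4.0
```

In these made-up numbers the new fertilizer adds 6 cm under daily watering but only 2 cm under weekly watering, so the interaction is nonzero. A real analysis would also test whether such differences exceed sampling noise, for example with a two-way ANOVA.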

## 9.4 Quasi-Experimental Design

In many real-world situations, a true RCT is not possible due to ethical or practical constraints (e.g., you can't randomly assign people to smoke). A **quasi-experiment** is similar to an RCT but lacks random assignment.

**Common Methods:**
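- **Difference-in-Differences:** Compares the change in outcomes over time between a treatment group and a control group. It relies on the assumption that, without the treatment, both groups would have followed parallel trends.
- **Regression Discontinuity:** Studies the effect of an intervention by comparing subjects just above and just below a cutoff point that determines who receives it, on the assumption that subjects close to the cutoff are otherwise comparable.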

These methods are more complex and require careful consideration of potential biases, but they are powerful tools when RCTs are not feasible.
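
The difference-in-differences calculation itself is simple arithmetic, as the sketch below shows. The outcome values are invented, the function name `diff_in_diff` is a label chosen for this example, and the estimate is only meaningful if the two groups would have followed parallel trends without the treatment.

```python
from statistics import mean

def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in the treatment group minus change in the control group."""
    treat_change = mean(treat_post) - mean(treat_pre)
    ctrl_change = mean(ctrl_post) - mean(ctrl_pre)
    return treat_change - ctrl_change

# Hypothetical outcomes before and after an intervention
treat_pre, treat_post = [10.0, 12.0, 11.0], [15.0, 17.0, 16.0]   # means 11 and 16
ctrl_pre, ctrl_post = [9.0, 10.0, 11.0], [11.0, 12.0, 13.0]      # means 10 and 12

estimate = diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post)
print(estimate)  # prints: 3.0
```

The treatment group improved by 5 and the control group by 2, so 3 of the 5 points are attributed to the intervention and the rest to whatever changed for everyone over time.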

## 9.5 Confounding and Bias

A **confounding variable** is a third variable that influences both the treatment and the outcome, producing a spurious association between them. Randomization is the best tool to combat confounding, because it breaks the link between the confounder and who receives the treatment.
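
A small simulation can show confounding at work. In this sketch, which is entirely hypothetical, a `health` variable drives both the outcome and the chance of taking a treatment whose true effect is zero. The function name `simulate`, the probabilities, and the effect sizes are all assumptions made for illustration.

```python
import random
from statistics import mean

def simulate(randomized, n=20000, seed=0):
    """Return the difference in mean outcome, treated minus untreated."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)                 # the confounder
        if randomized:
            takes_treatment = rng.random() < 0.5             # coin flip
        else:
            # Healthier subjects are more likely to choose the treatment
            takes_treatment = rng.random() < (0.8 if health > 0 else 0.2)
        # Outcome depends on health only: the treatment has NO true effect
        outcome = 2.0 * health + rng.gauss(0, 1)
        (treated if takes_treatment else untreated).append(outcome)
    return mean(treated) - mean(untreated)

observational_diff = simulate(randomized=False)
randomized_diff = simulate(randomized=True)
print(round(observational_diff, 2), round(randomized_diff, 2))
```

When subjects self-select, the treated group looks much better even though the treatment does nothing. Under random assignment the difference should be close to zero.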

**Bias** is a systematic error that can creep into a study:
- **Selection Bias:** When the groups being compared are not similar (e.g., the treatment group is younger than the control group).
- **Observer Bias (or Experimenter Bias):** When the researcher's expectations influence the results (e.g., they treat one group differently). This can be mitigated by **blinding**, where the researcher doesn't know who is in which group.
- **Placebo Effect:** When subjects improve simply because they believe they are receiving a treatment. If the control group receives nothing, this belief-driven improvement in the treatment group can be mistaken for a real treatment effect. It is mitigated by giving the control group a **placebo** and by **double-blinding**, where neither the subjects nor the researchers know who is receiving the real treatment.

## 9.6 Causal Inference Basics

**Causal inference** is the process of determining whether, and by how much, a treatment or intervention changes an outcome. The goal is to answer "what if" questions, such as what the outcome would have been under a different treatment.

The key idea is to estimate the **counterfactual**: what would have happened to the treatment group if they had *not* received the treatment?

Since we can never observe the counterfactual directly, the goal of good experimental design (like RCTs) is to create a control group that is as close to a perfect counterfactual as possible.
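
The counterfactual idea can be illustrated with simulated data, where, unlike in real life, both potential outcomes of every unit are known. The sketch below assumes a constant treatment effect of 5 and normally distributed baseline outcomes. These numbers and all variable names are made up for the example.

```python
import random
from statistics import mean

rng = random.Random(42)
n = 10000

# Each unit has two potential outcomes: y0 (untreated) and y1 (treated).
# In a real study only one of the two is ever observed.
units = []
for _ in range(n):
    y0 = rng.gauss(50, 10)
    y1 = y0 + 5.0                  # assumed constant treatment effect
    units.append((y0, y1))

# Knowable only in a simulation: the true average treatment effect
true_ate = mean(y1 - y0 for y0, y1 in units)

# Random assignment reveals y1 for treated units and y0 for control units
is_treated = [rng.random() < 0.5 for _ in units]
observed_treated = [y1 for (y0, y1), t in zip(units, is_treated) if t]
observed_control = [y0 for (y0, y1), t in zip(units, is_treated) if not t]

estimated_ate = mean(observed_treated) - mean(observed_control)
print(round(true_ate, 2), round(estimated_ate, 2))
```

The control group's observed outcomes stand in for the treated group's unobservable counterfactual, so the simple difference in means should land near the true effect.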