# Probability, Likelihood, and Probability Distributions: A Conceptual Clarification

Probability, likelihood, and probability distributions are closely related mathematical objects, but they play fundamentally different roles in statistical reasoning.  
The key distinction between them lies **not in the formula itself**—often the same expression appears in all three—but in **which quantities are treated as known, which are unknown, and what inference task is being performed**.

At a high level:

- **Probability** is about **prediction**:  
  *What outcomes should we expect if the model is already known?*

- **Likelihood** is about **inference**:  
  *Which model parameters best explain the data we have observed?*

- **Probability distributions** are **models**:  
  They describe the entire space of possible outcomes and how probability mass or density is allocated across that space.

---

## 1. Probability — Forward Reasoning (Prediction)

**Probability answers the question:**

> Given a model with known parameters, how likely is a particular outcome?

Mathematically, probability is written as

$$
P(x \mid \theta)
$$

where:

- $\theta$ represents **fixed, known model parameters**
- $x$ represents a **random outcome**

Here, randomness lies entirely in the data $x$, **not** in the parameters.

### Interpretation

- Probability measures the chance or long-run frequency with which an outcome would occur if the data-generating process were repeated many times.
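- It is **normalized**:
  - For discrete outcomes, $0 \le P(x \mid \theta) \le 1$. For continuous outcomes, $P(x \mid \theta)$ is a density: it is non-negative but can exceed 1 at a point.
  - The total probability across all possible outcomes must sum (or integrate) to 1.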

### Example

Given that a coin is fair (probability of heads $\theta = 0.5$), what is the probability of observing 5 heads in 10 tosses?

This is a **predictive** question: the model is assumed correct, and we ask what data it produces.
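
As a minimal sketch of this forward calculation, assume the tosses are independent, so the number of heads follows a binomial model with $P(k \mid n, \theta) = \binom{n}{k}\,\theta^k (1-\theta)^{n-k}$. The helper name `binomial_pmf` is chosen here for illustration and is not part of any library.

```python
from math import comb


def binomial_pmf(k, n, theta):
    """Probability of k heads in n independent tosses with P(heads) = theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


n, theta = 10, 0.5  # the model is fixed and known

# Probability of the single outcome "5 heads".
p_five = binomial_pmf(5, n, theta)

# Normalization: the probabilities of all outcomes 0..n add up to 1.
total = sum(binomial_pmf(k, n, theta) for k in range(n + 1))

print(f"P(5 heads | theta=0.5) = {p_five:.4f}")
print(f"sum over all outcomes  = {total:.4f}")
```

The first value is $252 / 1024 \approx 0.246$, so even for a fair coin, exactly 5 heads occurs in only about a quarter of 10-toss experiments.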

---

## 2. Likelihood — Reverse Reasoning (Inference)

**Likelihood answers the question:**

> Given observed data, how plausible are different parameter values?

Likelihood uses the *same mathematical expression* as probability, but it is interpreted differently:

$$
L(\theta \mid x) = P(x \mid \theta)
$$

The crucial difference is **conceptual**:

- The data $x$ is now **fixed and observed**
- The parameters $\theta$ are treated as **variable and unknown**

### Interpretation

- Likelihood is **not** a probability distribution over parameters.
- It is a **relative measure of support** that the observed data provides for different parameter values.
- Likelihoods:
  - Do **not** need to sum (or integrate) to 1 over $\theta$
  - Can take values greater than 1 when $P(x \mid \theta)$ is a continuous density
- What matters is **comparison**, not absolute scale:

> Which parameter value makes the observed data most plausible?
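
Both properties can be checked numerically. The sketch below makes two assumptions that go beyond the text: a binomial model for 5 heads in 10 tosses, and, for the "greater than 1" case, a normal model with a small standard deviation of 0.1. The function names are illustrative.

```python
from math import comb, exp, pi, sqrt


def binomial_likelihood(theta, k=5, n=10):
    """Likelihood of theta for a fixed observation of k heads in n tosses."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def normal_density(x, mu, sigma):
    """Normal density at x; read as a likelihood of mu when x is held fixed."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


# Integrate the likelihood over theta in [0, 1] with a midpoint rule.
steps = 10_000
width = 1.0 / steps
area = sum(binomial_likelihood((i + 0.5) * width) for i in range(steps)) * width

# Likelihood of mu = 0 for a single observation x = 0 with sigma = 0.1.
peak = normal_density(0.0, mu=0.0, sigma=0.1)

print(f"area under the likelihood curve = {area:.4f}")
print(f"normal likelihood at its peak   = {peak:.4f}")
```

For this binomial case the area is $1/(n+1) = 1/11$, not 1, and the normal likelihood at its peak is well above 1. Neither value is a problem, because only ratios between parameter values are interpreted.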

### Example

Given that we observed 5 heads in 10 tosses, how plausible is a probability of heads of $\theta = 0.5$ compared to $\theta = 0.7$?

This is an **inferential** question: the data is known, and we reason backward toward the model.
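
One way to make this comparison concrete is a likelihood ratio. The sketch below again assumes independent tosses (a binomial model); `likelihood` is a hypothetical helper name.

```python
from math import comb


def likelihood(theta, k, n):
    """L(theta | k heads in n tosses), with the data held fixed."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


k_obs, n = 5, 10  # the observed data is fixed

l_fair = likelihood(0.5, k_obs, n)
l_biased = likelihood(0.7, k_obs, n)

# Only the ratio is meaningful; the absolute values are not probabilities of theta.
ratio = l_fair / l_biased

print(f"L(0.5) = {l_fair:.4f}")
print(f"L(0.7) = {l_biased:.4f}")
print(f"ratio  = {ratio:.2f}")
```

Under these assumptions the data supports $\theta = 0.5$ a little more than twice as strongly as $\theta = 0.7$. That is a statement about relative support, not a probability that the coin is fair.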

---

## 3. Probability Distributions — The Generative Model

A **probability distribution** is a mathematical function that defines:

- The set of all possible outcomes
- The probability (or density) assigned to each outcome

Formally, a distribution specifies

$$
P(x \mid \theta)
$$

for **all possible values** of $x$.

### Role

Probability distributions are the **generative mechanisms** behind both probability and likelihood.

They describe how data would be generated if the parameters were known.

### Examples

- **Binomial distribution**: models the number of successes in a fixed number of independent trials
- **Normal distribution**: models continuous variation around a mean
- **Poisson distribution**: models the number of events in a fixed interval of time or space

Without a probability distribution, **neither probability nor likelihood can be defined**.
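
The three example distributions can be written as plain functions of an outcome and a set of parameters. This is a sketch using only the standard library; the function names and the parameter values in the usage lines are illustrative choices, not taken from the text.

```python
from math import comb, exp, factorial, pi, sqrt


def binomial_pmf(k, n, theta):
    """P(k successes | n trials, success probability theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def poisson_pmf(k, rate):
    """P(k events | expected number of events = rate)."""
    return exp(-rate) * rate**k / factorial(k)


def normal_pdf(x, mu, sigma):
    """Density at x for a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


# Each function assigns mass (or density) to every possible outcome.
binomial_total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
poisson_total = sum(poisson_pmf(k, 3.0) for k in range(50))  # tail beyond 49 is negligible
density_at_mean = normal_pdf(0.0, mu=0.0, sigma=1.0)

print(binomial_total, poisson_total, density_at_mean)
```

Fixing the parameters and varying the first argument gives probabilities; fixing the first argument and varying the parameters gives a likelihood.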

---

## 4. Side-by-Side Conceptual Comparison

| Aspect | Probability | Likelihood |
|------|------------|------------|
| Fixed quantity | Parameters $\theta$ | Observed data $x$ |
| Variable quantity | Data $x$ | Parameters $\theta$ |
| Question asked | “What data should I expect?” | “Which parameters explain the data?” |
| Purpose | Prediction | Inference |
| Normalization | Sums/integrates to 1 over $x$ | Need not sum/integrate to 1 over $\theta$ |
| Typical use | Simulation, forecasting | Estimation, model fitting |
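
The "typical use" row can be illustrated with a small round trip: simulate data from a known parameter, then estimate the parameter from that data alone. This is a sketch under the assumption of independent coin tosses, where the maximum-likelihood estimate of $\theta$ is the observed proportion of heads. The seed, sample size, and function names are arbitrary choices for illustration.

```python
import random


def simulate_tosses(n, theta, rng):
    """Forward direction: generate n tosses (1 = heads) from a known theta."""
    return [1 if rng.random() < theta else 0 for _ in range(n)]


def estimate_theta(tosses):
    """Backward direction: maximum-likelihood estimate, the proportion of heads."""
    return sum(tosses) / len(tosses)


rng = random.Random(0)  # fixed seed so the run is repeatable
true_theta = 0.7

tosses = simulate_tosses(10_000, true_theta, rng)  # probability: model -> data
theta_hat = estimate_theta(tosses)                 # likelihood: data -> model

print(f"true theta = {true_theta}, estimated theta = {theta_hat:.3f}")
```

With this many tosses the estimate should land close to the true value, though it will not match it exactly.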

---

## 5. The Core Insight

The most important idea to internalize is this:

> **Probability and likelihood are the same function viewed from opposite directions.**

- Probability looks **forward**:  
  $$ \text{model} \;\rightarrow\; \text{data} $$

- Likelihood looks **backward**:  
  $$ \text{data} \;\rightarrow\; \text{model} $$

Both are grounded in the same probability distribution, but they serve **entirely different epistemic purposes**.
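
The two directions can be seen by slicing one function two ways. The sketch below assumes a binomial model with 10 tosses and uses a coarse grid over $\theta$; both the grid and the name `binomial` are illustrative choices.

```python
from math import comb


def binomial(k, n, theta):
    """One expression, P(k | n, theta), used for both views below."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


n = 10

# Forward view: fix theta, vary the outcome k. This is a probability distribution.
forward = [binomial(k, n, 0.5) for k in range(n + 1)]

# Backward view: fix the observed k, vary theta. This is a likelihood function.
k_obs = 5
grid = [i / 100 for i in range(101)]
backward = [binomial(k_obs, n, t) for t in grid]

# The grid value with the highest likelihood.
best_index = max(range(len(grid)), key=lambda i: backward[i])
theta_best = grid[best_index]

print(f"forward slice sums to {sum(forward):.4f}")
print(f"backward slice sums to {sum(backward):.4f} (no normalization expected)")
print(f"most supported theta on the grid: {theta_best}")
```

The forward slice sums to 1 because it ranges over outcomes. The backward slice has no such constraint, and its peak sits at the observed proportion $k/n = 0.5$.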

---

## Final Summary

- **Probability** is used when the model is known and we want to predict outcomes.
- **Likelihood** is used when the data is known and we want to infer parameters.
- **Probability distributions** are the underlying generative structures that make both possible.

In short:

- Probability predicts data.  
- Likelihood evaluates models.  
- Distributions define the world in which both live.
