# Content

[Statistics Intro: Mean, Median, & Mode](#statistics-intro-mean-median--mode)

# Statistics Intro: Mean, Median, & Mode

These three measures are fundamental concepts in descriptive statistics. They are known as **measures of central tendency**, which means they try to describe the "center" or "typical" value of a dataset.

## Part 1: Theory with Examples

---

### 1. Mean (Average)

**Definition:**
The mean is the sum of all values in a dataset divided by the number of values. It's what most people commonly refer to as the "average."

**Formula:**
Mean = (Sum of all values) / (Number of values)
Or, using mathematical notation:
x̄ = Σx / n (sample mean). The population mean μ is computed the same way and is conventionally written μ = Σx / N.
Where:
*   Σ (sigma) denotes the sum over all values in the dataset
*   x represents each individual value in the dataset
*   n (or N for a population) represents the total number of values

**When to Use:**
*   When the data is numerical and symmetrically distributed (e.g., follows a normal distribution).
*   When you want a measure that incorporates every value in the dataset.

**When to be Cautious:**
*   The mean is sensitive to **outliers** (extreme values). A single very high or very low value can pull the mean strongly toward it, making it less representative of the typical value.

**Example:**
Consider the dataset: `[2, 4, 6, 8, 10]`

1.  **Sum of all values:** 2 + 4 + 6 + 8 + 10 = 30
2.  **Number of values:** 5
3.  **Mean:** 30 / 5 = 6
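
The same calculation can be sketched in Python. This is a minimal illustration that uses only the standard library's `statistics` module (a choice made here for self-containment); the variable names are illustrative.

```python
import statistics

values = [2, 4, 6, 8, 10]

# Direct translation of the formula: sum of the values divided by their count.
manual_mean = sum(values) / len(values)

# statistics.fmean computes the same quantity and always returns a float.
library_mean = statistics.fmean(values)

print(manual_mean)   # 6.0
print(library_mean)  # 6.0
```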

**Example with an Outlier:**
Consider the dataset: `[2, 4, 6, 8, 100]` (100 is an outlier)

1.  **Sum of all values:** 2 + 4 + 6 + 8 + 100 = 120
2.  **Number of values:** 5
3.  **Mean:** 120 / 5 = 24
    *Notice how the mean (24) is pulled towards the outlier (100) and isn't very representative of the other values (2, 4, 6, 8).*
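
To see how much a single value moves the mean, the two datasets can be compared directly. This is a small sketch with illustrative variable names; it assumes the only difference between the datasets is the last value.

```python
import statistics

without_outlier = [2, 4, 6, 8, 10]
with_outlier = [2, 4, 6, 8, 100]  # only the last value differs

mean_before = statistics.fmean(without_outlier)
mean_after = statistics.fmean(with_outlier)

# Changing one value by 90 shifts the mean by 90 / n, here 90 / 5.
shift = mean_after - mean_before

print(mean_before, mean_after, shift)  # 6.0 24.0 18.0
```

Because every value contributes equally to the sum, the shift shrinks as the dataset grows: the same outlier in a dataset of 500 values would move the mean far less.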

---

### 2. Median

**Definition:**
The median is the middle value of a dataset that has been sorted in ascending (or descending) order. Half of the values lie at or below it and half lie at or above it.

**How to Find:**
1.  **Sort the data:** Arrange all values from smallest to largest.
2.  **Find the middle value:**
    *   **If the number of values (n) is odd:** The median is the single middle value, at position `(n+1)/2` in the sorted list (counting from 1).
    *   **If the number of values (n) is even:** The median is the average of the two middle values, at positions `n/2` and `(n/2) + 1` in the sorted list (counting from 1).
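
The two steps above can be sketched as a small function. The name `median` and the sample inputs are illustrative; note that the positions in the steps count from 1, while Python list indices count from 0.

```python
def median(values):
    """Median by the sort-then-pick-the-middle rule (illustrative helper)."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)  # step 1: sort ascending
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 counting from 1 is index n // 2.
        return ordered[n // 2]
    # Even n: positions n/2 and n/2 + 1 are indices n // 2 - 1 and n // 2.
    lower, upper = ordered[n // 2 - 1], ordered[n // 2]
    return (lower + upper) / 2

print(median([9, 1, 4]))     # 4
print(median([9, 1, 4, 6]))  # 5.0
```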

**When to Use:**
*   When the data is numerical and might be skewed or contain outliers. The median is **robust** to outliers.
*   For ordinal data (data that can be ranked).
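
For ordinal data, averaging two middle labels is not meaningful, so one option is to report the lower of the two middle values instead. The sketch below does this with `statistics.median_low`; the rating scale and responses are hypothetical, and the "lower middle" convention is an assumption that differs from the averaging rule described above.

```python
import statistics

# Hypothetical ordered scale, lowest to highest.
scale = ["poor", "fair", "good", "excellent"]
responses = ["good", "poor", "excellent", "good", "fair", "fair"]

# Convert each label to its rank so the responses can be ordered.
ranks = [scale.index(label) for label in responses]

# median_low returns an actual data point, so no labels are averaged.
median_label = scale[statistics.median_low(ranks)]

print(median_label)  # fair
```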

**Example (Odd number of values):**
Dataset: `[7, 2, 10, 5, 3]`

1.  **Sort the data:** `[2, 3, 5, 7, 10]`
2.  **Number of values (n):** 5 (odd)
3.  **Position of median:** (5 + 1) / 2 = 3rd value
4.  **Median:** 5

**Example (Even number of values):**
Dataset: `[7, 2, 10, 5, 3, 12]`

1.  **Sort the data:** `[2, 3, 5, 7, 10, 12]`
2.  **Number of values (n):** 6 (even)
3.  **Positions of middle values:**
    *   `n/2` = 6/2 = 3rd value (which is 5)
    *   `(n/2) + 1` = (6/2) + 1 = 4th value (which is 7)
4.  **Median:** (5 + 7) / 2 = 12 / 2 = 6

**Example with an Outlier (revisiting the mean's outlier example):**
Dataset: `[2, 4, 6, 8, 100]`

1.  **Sort the data:** `[2, 4, 6, 8, 100]`
2.  **Number of values (n):** 5 (odd)
3.  **Median:** 6
    *Notice how the median (6) is much more representative of the typical non-outlier values compared to the mean (24) in this case.*
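
The robustness of the median can be checked by making the outlier even more extreme. This is a brief sketch using the standard library; the second dataset is made up for illustration.

```python
import statistics

data = [2, 4, 6, 8, 100]
extreme = [2, 4, 6, 8, 1_000_000]  # hypothetical, far larger outlier

mean_data = statistics.fmean(data)
median_data = statistics.median(data)
mean_extreme = statistics.fmean(extreme)
median_extreme = statistics.median(extreme)

print(mean_data, median_data)        # 24.0 6
print(mean_extreme, median_extreme)  # 200004.0 6
```

The mean follows the outlier, while the median depends only on which value sits in the middle position.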

---

### 3. Mode

**Definition:**
The mode is the value that appears most frequently in a dataset. A dataset can have:
*   **No mode:** If all values appear with the same frequency (e.g., each value appears only once).
*   **One mode (unimodal):** If one value appears more frequently than any other.
*   **Multiple modes (bimodal, trimodal, multimodal):** If two or more values appear with the same highest frequency.

**When to Use:**
*   For categorical data (e.g., colors, types of cars).
*   For numerical data to identify the most common value(s).
*   It's the only measure of central tendency that can be used with nominal data (data that can be named or labeled but not ordered or measured).
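
Since the mode only requires counting, it works on labels as well as numbers. Here is a minimal sketch with `collections.Counter`; the color data is made up for illustration.

```python
from collections import Counter

colors = ["red", "blue", "green", "blue", "red", "blue"]

# Count how often each label occurs, then take the most frequent one.
counts = Counter(colors)
label, frequency = counts.most_common(1)[0]

print(label, frequency)  # blue 3
```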

**Example (Unimodal):**
Dataset: `[1, 2, 2, 3, 4, 4, 4, 5]`
*   The value `4` appears 3 times, which is more than any other value.
*   **Mode:** 4

**Example (Bimodal):**
Dataset: `[1, 2, 2, 3, 3, 4, 5]`
*   The values `2` and `3` both appear 2 times, which is the highest frequency.
*   **Modes:** 2, 3

**Example (No Mode):**
Dataset: `[1, 2, 3, 4, 5]`
*   Each value appears only once.
*   **Mode:** No mode (or sometimes all values are considered modes, depending on convention, but "no mode" is common).
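
The three cases above can be handled by one small function. The helper name `modes` is illustrative, and it follows the "no mode" convention used in this section: when every value is equally frequent, it returns an empty list.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes, or [] if all values tie (illustrative)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Convention from the text: if all distinct values tie, report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3, 4, 4, 4, 5]))  # [4]
print(modes([1, 2, 2, 3, 3, 4, 5]))     # [2, 3]
print(modes([1, 2, 3, 4, 5]))           # []
```

Libraries may follow the other convention. For example, `statistics.multimode([1, 2, 3, 4, 5])` in the standard library returns all five values rather than an empty list.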

---

### Summary Comparison

| Feature                 | Mean                                  | Median                                     | Mode                                       |
| :---------------------- | :------------------------------------ | :----------------------------------------- | :----------------------------------------- |
| **Definition**          | Sum of values / Number of values      | Middle value of a sorted dataset           | Most frequent value(s)                     |
| **Calculation**         | Arithmetic                            | Sorting & identifying middle               | Counting frequencies                       |
| **Data Type**           | Numerical                             | Numerical, Ordinal                         | Numerical, Ordinal, Categorical (Nominal)  |
| **Sensitivity to Outliers** | High                                  | Low (Robust)                               | Low (an outlier rarely repeats)            |
| **Uniqueness**          | Always unique                         | Always unique                              | Can have no mode, one, or multiple modes   |
| **Uses all data?**      | Yes                                   | No (primarily position-based)              | No (primarily frequency-based)             |

---

## Part 2: Python Code Example with Illustrations

We'll use Python with common libraries like `numpy` for numerical operations (mean, median) and `scipy.stats` for statistical functions (mode). We'll also use `matplotlib` for a simple visualization.