# NumPy: Mean, Median, and Mode Summary

This notebook summarizes the three most common measures of central tendency in statistics and shows how to calculate each one with NumPy.

## 1. MEAN (Average)

**Definition**: The sum of all values divided by the count of values.

**Formula**: Mean = Sum / Count

### Method 1: Manual Calculation

In [1]:
import numpy as np

array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
mean_manual = np.sum(array) / len(array)
print(f"Manual Mean: {mean_manual}")  # = 45 / 9 = 5.0

Manual Mean: 5.0


### Method 2: Using np.mean()

In [2]:
import numpy as np

array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
mean_function = np.mean(array)
print(f"Using np.mean(): {mean_function}")  # = 5.0

Using np.mean(): 5.0


---

## 2. MEDIAN (Middle Value)

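**Definition**: The middle value when the data is sorted. With an even number of values, it is the average of the two middle values.

**Note**: `np.median()` works on unsorted input, so you do not need to sort the array yourself. The explicit sort below is only there to make the middle value easy to see.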

In [3]:
import numpy as np

array = np.array([9, 8, 6, 5, 7, 1, 3, 2, 4])
print(f"Original array: {array}")

# Sort only to make the middle value visible (np.median does not require it)
sorted_array = np.sort(array)
print(f"Sorted array: {sorted_array}")

# Find median
median = np.median(sorted_array)
print(f"Median: {median}")  # Middle value is 5

Original array: [9 8 6 5 7 1 3 2 4]
Sorted array: [1 2 3 4 5 6 7 8 9]
Median: 5.0
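
The cell above uses an odd number of values that were sorted beforehand. As a minimal sketch of the two remaining cases, the example below passes unsorted data straight to `np.median()` and then uses an even number of values. The arrays are made up for illustration.

```python
import numpy as np

# Unsorted input can be passed directly.
unsorted = np.array([9, 8, 6, 5, 7, 1, 3, 2, 4])
median_odd = np.median(unsorted)
print(f"Median (odd count): {median_odd}")

# With an even count, the median is the mean of the two middle values (2 and 3).
even = np.array([4, 1, 3, 2])
median_even = np.median(even)
print(f"Median (even count): {median_even}")
```

The input arrays are left unchanged, because `np.median()` works on a copy by default.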


---

## 3. MODE (Most Frequent Value)

**Definition**: The value that appears most frequently in the data.

**Note**: NumPy has no built-in mode() function. For non-negative integer data, we can combine np.bincount() with np.argmax() (single mode) or np.where() (tied modes).

### What is np.bincount()?

Counts how many times each non-negative integer appears in an array. Position `i` of the result holds the number of occurrences of the value `i`.

In [4]:
import numpy as np

array = np.array([1, 1, 2, 2, 2, 3, 3])
counts = np.bincount(array)
print(f"Array: {array}")
print(f"Counts: {counts}")
# Printed counts: [0 2 3 2]  (index = value, entry = how often it appears)
# Value 0: appears 0 times (not in array)
# Value 1: appears 2 times
# Value 2: appears 3 times  ← Most frequent!
# Value 3: appears 2 times

Array: [1 1 2 2 2 3 3]
Counts: [0 2 3 2]
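
Because the result is indexed by value, `np.bincount()` has two properties worth keeping in mind. The sketch below, using made-up arrays, shows that gaps between values appear as zeros and that negative values are rejected.

```python
import numpy as np

# The result has max(value) + 1 entries, so missing values 1..4 show up as zeros.
sparse_counts = np.bincount(np.array([0, 5, 5]))
print(f"Counts with gaps: {sparse_counts}")

# Negative values cannot be used as indices, so bincount raises ValueError.
try:
    np.bincount(np.array([-1, 2, 2]))
    negative_ok = True
except ValueError:
    negative_ok = False
print(f"Negative input accepted: {negative_ok}")
```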


### What is np.argmax()?

Finds the INDEX (position) of the largest value in an array. If several positions share the largest value, it returns only the first one.

In [5]:
import numpy as np

counts = np.array([0, 2, 3, 2])
mode_index = np.argmax(counts)
print(f"Counts: {counts}")
print(f"Mode index: {mode_index}")  # 2
# Index 2 has the highest count (3)

Counts: [0 2 3 2]
Mode index: 2


### Finding Mode - Single Value

In [6]:
import numpy as np

array = np.array([1, 1, 2, 2, 2, 3, 3])
counts = np.bincount(array)
mode = np.argmax(counts)  # Find which value appears most
print(f"Array: {array}")
print(f"Mode: {mode}")  # Output: 2 (appears 3 times)

Array: [1 1 2 2 2 3 3]
Mode: 2


### What is np.where()?

Finds ALL indices where a condition is True.

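**Syntax**: `np.where(condition)[0]`. Called with only a condition, `np.where()` returns a tuple with one index array per dimension, so `[0]` picks out the index array for a 1-D input.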

In [7]:
import numpy as np

counts = np.array([0, 2, 4, 2, 4, 4, 4])
max_count = np.max(counts)
print(f"Counts: {counts}")
print(f"Max count: {max_count}")

# Step by step:
condition = counts == max_count
print(f"Condition (== 4): {condition}")

indices = np.where(condition)[0]
print(f"Indices where count == max: {indices}")  # [2, 4, 5, 6]

Counts: [0 2 4 2 4 4 4]
Max count: 4
Condition (== 4): [False False  True False  True  True  True]
Indices where count == max: [2 4 5 6]


### Finding Mode - Multiple Values (Tied Modes)

In [8]:
import numpy as np

array = np.array([9, 9, 9, 9, 8, 3, 4, 5, 2, 8, 2, 7, 2, 6, 7, 7, 4, 5, 5, 3, 1, 1, 5, 8, 8, 4, 6, 9, 7, 6, 5])
counts = np.bincount(array)
max_count = np.max(counts)
modes = np.where(counts == max_count)[0]

print(f"Array: {array}")
print(f"All modes: {modes}")
print(f"Frequency: {max_count}")
# Output shows values 5 and 9 both appear 5 times (tied)

Array: [9 9 9 9 8 3 4 5 2 8 2 7 2 6 7 7 4 5 5 3 1 1 5 8 8 4 6 9 7 6 5]
All modes: [5 9]
Frequency: 5
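
The `np.bincount()` approach is limited to non-negative integers. One possible alternative for other data, sketched here under that assumption, is `np.unique()` with `return_counts=True`. The helper name `find_modes` is hypothetical and not part of NumPy.

```python
import numpy as np

def find_modes(values):
    """Return (modes, frequency) for a 1-D array; ties yield several modes."""
    # unique_values is sorted; counts[i] is how often unique_values[i] occurs.
    unique_values, counts = np.unique(values, return_counts=True)
    max_count = counts.max()
    return unique_values[counts == max_count], max_count

# Made-up data containing negative and fractional values.
data = np.array([-2.5, -2.5, 0.0, 1.5, 1.5, 3.0])
modes, frequency = find_modes(data)
print(f"All modes: {modes}")
print(f"Frequency: {frequency}")
```

This trades the index trick of `np.bincount()` for a sort, which is slower on large integer arrays but places no restriction on the sign of the values.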


### Alternative: Using SciPy stats.mode()

In [9]:
from scipy import stats
import numpy as np

array = np.array([9, 9, 9, 9, 8, 3, 4, 5, 2, 8, 2, 7, 2, 6, 7, 7, 4, 5, 5, 3, 1, 1, 5, 8, 8, 4, 6, 9, 7, 6, 5])
result = stats.mode(array)
print(result)  # Reports only one mode (here 5) and its count; the tied value 9 is not returned

ModeResult(mode=np.int64(5), count=np.int64(5))


---

## Summary Table

| Measure | Definition | When to Use | NumPy Function |
|---------|-----------|------------|----------------|
| **Mean** | Sum of all values divided by their count | Roughly symmetric data without extreme outliers | `np.mean()` |
| **Median** | Middle value of the sorted data | Skewed data or data with outliers | `np.median()` |
| **Mode** | Most frequent value | Categorical or discrete data | `np.bincount()` + `np.argmax()` (use `np.where()` for ties) |
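
To illustrate the "When to Use" column, the sketch below compares mean and median before and after adding a single extreme value. The numbers are hypothetical.

```python
import numpy as np

values = np.array([30, 32, 35, 38, 40])   # hypothetical, roughly symmetric data
with_outlier = np.append(values, 1000)    # same data plus one extreme value

mean_before, median_before = np.mean(values), np.median(values)
mean_after, median_after = np.mean(with_outlier), np.median(with_outlier)

print(f"Without outlier: mean={mean_before}, median={median_before}")
print(f"With outlier:    mean={mean_after:.2f}, median={median_after}")
```

The mean jumps from 35 to roughly 196, while the median only moves from 35 to 36.5.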

## Use Cases in Machine Learning

1. **Data Scaling**: Standardize features with (x - mean) / std
2. **Feature Analysis**: Summarize each feature with its mean and median
3. **Outlier Detection**: Flag values that fall outside mean ± 3×std
4. **Categorical Features**: Fill missing values with the mode
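
Items 1, 3, and 4 in the list above can be sketched in a few lines. The data is invented, and using `-1` as the marker for a missing category code is an assumption made only for this example.

```python
import numpy as np

# Hypothetical numeric feature with one extreme value.
feature = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 11.0,
                    10.0, 12.0, 13.0, 11.0, 12.0, 95.0])
mean, std = feature.mean(), feature.std()  # population std (ddof=0)

# 1. Scaling: z-scores have mean 0 and standard deviation 1.
z_scores = (feature - mean) / std

# 3. Outlier detection: select values more than 3 std away from the mean.
outliers = feature[np.abs(z_scores) > 3]
print(f"Outliers: {outliers}")

# 4. Imputation: -1 is an assumed marker for a missing category code.
codes = np.array([0, 1, 1, 2, 1, -1, 0, -1])
observed = codes[codes != -1]            # drop missing entries before counting
mode = np.argmax(np.bincount(observed))  # most frequent observed code
imputed = np.where(codes == -1, mode, codes)
print(f"Imputed codes: {imputed}")
```

With very small samples, a single extreme value inflates the standard deviation so much that the 3×std rule may not flag it, so the threshold is a rule of thumb rather than a guarantee.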