## 📊 Phase 1 (Continued): Essential Math — Descriptive Statistics

**📘 1. Mean (Average)**

The mean is the sum of all values divided by the number of values.

📌 Formula:
    
    mean = (sum of all values) / (number of values)

- 🧠 Use in Data Science: Used in aggregations, loss functions (like MSE), as a measure of central tendency, etc.
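To make the MSE mention concrete, here is a minimal sketch of mean squared error written as a plain mean. The `actual` and `predicted` lists are made-up numbers for illustration, not data from this notebook.

```python
# Hypothetical values, invented for this illustration only.
actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 8.0]

# Squared error for each (actual, predicted) pair.
squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]

# MSE is simply the mean of those squared errors.
mse = sum(squared_errors) / len(squared_errors)
print(f"MSE: {mse:.4f}")
```

The last two lines have the same shape as the mean formula above: a sum divided by a count.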

🔢 Example:


In [18]:
salaries = [4000, 4500, 6000, 7000, 8000, 9000]
mean = sum(salaries) / len(salaries)
#    = (4000 + 4500 + 6000 + 7000 + 8000 + 9000) / 6
#    = 38500 / 6
#    = 6416.666666666667
#    = 6416.67 (rounded to 2 decimal places)

📌 Python:

In [19]:
print(f"Mean salary: {mean:.2f}")

Mean salary: 6416.67


**📘 2. Median**

    The median is the middle value in a sorted list.
    If there is an even number of values, it is the average of the middle two.

🔢 Example:

Sorted: [4000, 4500, 4700, 4700, 5000, 10000]

Middle values = 4700 and 4700

Median = (4700 + 4700) / 2 = 4700


**🧠 Why Median?**

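The median is more robust than the mean when there are outliers (like 10000 here): a single extreme value pulls the mean toward it but barely moves the median.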
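A small sketch of that outlier effect, reusing the sorted list from the example above. It compares the mean and median with and without the 10000 value; the variable names are chosen here for illustration.

```python
import statistics

values = [4000, 4500, 4700, 4700, 5000, 10000]  # sorted list from the example
without_outlier = values[:-1]                     # same list with 10000 dropped

mean_with = statistics.mean(values)
median_with = statistics.median(values)
mean_without = statistics.mean(without_outlier)
median_without = statistics.median(without_outlier)

print(f"With outlier:    mean={mean_with:.2f}, median={median_with:.2f}")
print(f"Without outlier: mean={mean_without:.2f}, median={median_without:.2f}")
```

Dropping the outlier moves the mean by roughly 900, while the median stays at 4700 in both cases.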

In [20]:
import statistics
median_salary = statistics.median(salaries)
# `salaries` is still the list from the mean example:
# sorted: [4000, 4500, 6000, 7000, 8000, 9000], middle two values are 6000 and 7000
#    = (6000 + 7000) / 2
#    = 6500.0
print(f"Median salary: {median_salary:.2f}")

Median salary: 6500.00


**📘 3. Mode**

    The mode is the most frequent value in a list.

🧠 Use Case: Useful in categorical data ("most used browser", "most common job title", etc.)
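As a sketch of the categorical use case, the snippet below finds the most frequent label in a list of strings. The browser names are invented for illustration, and `statistics.multimode` assumes Python 3.8 or newer.

```python
import statistics
from collections import Counter

# Hypothetical browser labels, invented for this illustration.
browsers = ["Chrome", "Firefox", "Chrome", "Safari", "Chrome", "Firefox"]

most_used = statistics.mode(browsers)  # mode works on strings, not only numbers
counts = Counter(browsers)             # frequency of every label
print(f"Most used browser: {most_used}")
print(counts.most_common(2))

# When several values tie for most frequent, multimode returns all of them.
ties = statistics.multimode(["a", "b", "a", "b", "c"])
print(ties)
```

`Counter` is handy when the full frequency table is wanted rather than just the winner.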

🔢 Example:

In [21]:
salaries = [4000, 4500, 4700, 4700, 5000, 10000]
# mode = 4700 (appears twice, more than any other number)

mode_salary = statistics.mode(salaries)
#    = 4700 (most common value in the list [4000, 4500, 4700, 4700, 5000, 10000])
print(f"Mode salary: {mode_salary:.2f}")

Mode salary: 4700.00


**📘 4. Range, Variance & Standard Deviation**

- 🔸 Range = Max - Min

        range = 10000 - 4000 = 6000

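🔸 Variance: The average squared distance of the values from the mean (spread). Note that `statistics.variance` computes the sample variance, which divides by n - 1 rather than n.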


In [22]:
statistics.variance(salaries)

5005666.666666667
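To see where that number comes from, here is a step-by-step sketch of the same calculation by hand. It assumes the sample formula (divide by n - 1) that `statistics.variance` uses, and also shows the population version (divide by n) for comparison.

```python
import statistics

salaries = [4000, 4500, 4700, 4700, 5000, 10000]

n = len(salaries)
mean = sum(salaries) / n

# Squared distance of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in salaries]

sample_variance = sum(squared_deviations) / (n - 1)  # matches statistics.variance
population_variance = sum(squared_deviations) / n    # matches statistics.pvariance

print(f"Sample variance:     {sample_variance:.2f}")
print(f"Population variance: {population_variance:.2f}")
```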

🔸 Standard Deviation: Square root of variance, easier to interpret (same unit as data)

🧠 Real Use: Tells us whether values are tightly packed or spread out.



In [23]:
statistics.stdev(salaries)

2237.3347238772
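A quick sketch of the "tightly packed or spread out" idea: two made-up lists with the same mean but very different standard deviations. The lists `tight` and `spread` are hypothetical and not part of the salary data.

```python
import math
import statistics

tight = [4900, 5000, 5100]   # hypothetical: values close to the mean
spread = [1000, 5000, 9000]  # hypothetical: same mean, values far apart

results = {}
for name, values in [("tight", tight), ("spread", spread)]:
    sd = statistics.stdev(values)
    # Standard deviation is the square root of the (sample) variance.
    assert math.isclose(sd, math.sqrt(statistics.variance(values)))
    results[name] = sd
    print(f"{name}: mean={statistics.mean(values):.0f}, stdev={sd:.2f}")
```

Both lists have a mean of 5000, so only the standard deviation reveals how different they are.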

**📘 5. Percentiles & Quartiles**

These help understand distribution.

25th percentile (Q1): Lower quarter

50th percentile (Q2): Median

75th percentile (Q3): Upper quarter

🧠 Use: Used to detect outliers, calculate IQR, draw box plots, etc.

📌 Python (with NumPy):

In [24]:
import numpy as np

np.percentile(salaries, 25)  # Q1
np.percentile(salaries, 50)  # Q2 (median)
np.percentile(salaries, 75)  # Q3
# A notebook cell displays only the value of its last expression, so the output below is Q3.

4925.0
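The outlier-detection use mentioned above can be sketched with the common 1.5 × IQR rule of thumb. To keep the snippet standard-library only, it swaps in `statistics.quantiles` with `method="inclusive"` (Python 3.8+) in place of `np.percentile`; for this list it gives the same Q3 as the cell above. The names `lower_fence` and `upper_fence` are chosen here for illustration.

```python
import statistics

salaries = [4000, 4500, 4700, 4700, 5000, 10000]

# quantiles(n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: spread of the middle 50%

# Rule of thumb: values more than 1.5 * IQR outside Q1..Q3 are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in salaries if x < lower_fence or x > upper_fence]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"Fences: [{lower_fence}, {upper_fence}]")
print(f"Outliers: {outliers}")
```

With this data only 10000 falls outside the fences, which agrees with the intuition from the median section.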

## Practice Time

In [None]:
data = [65, 70, 75, 80, 85, 90, 100]

mean = statistics.mean(data)
mode = statistics.mode(data)  # all values are unique, so Python 3.8+ returns the first one (65)
median = statistics.median(data)
range_val = max(data) - min(data)  # named `range_val` to avoid shadowing the built-in `range`
variance = statistics.variance(data)
stdev = statistics.stdev(data)

q1 = np.percentile(data, 25)
q2 = np.percentile(data, 50)
q3 = np.percentile(data, 75)

print(f"Mean: {mean:.2f}")
print(f"Mode: {mode:.2f}")
print(f"Median: {median:.2f}")
print(f"Range: {range_val:.2f}")
print(f"Variance: {variance:.2f}")
print(f"Standard Deviation: {stdev:.2f}")

print(f"Q1: {q1:.2f}")
print(f"Q2: {q2:.2f}")
print(f"Q3: {q3:.2f}")

Mean: 80.71
Mode: 65.00
Median: 80.00
Range: 35.00
Variance: 145.24
Standard Deviation: 12.05
Q1: 72.50
Q2: 80.00
Q3: 87.50
