# Day 8 of #100 Days of Machine Learning
---

# 🔹Statistics Fundamentals - Mean, Median, Mode, Variance, and Standard Deviation

## 🔸1. Measures of Central Tendency
### These measures help us understand the "center" or typical value of a dataset.

## 1.1 Mean (Average)
### Definition: The mean is the sum of all values in a dataset divided by the total number of values.

### Formula:
### Mean = (Sum of all observations) / (Total number of observations)
$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$
### where:

- μ (mu) represents the population mean (often estimated by the sample mean, denoted $\bar{x}$)
- xi represents each individual value in the dataset
- n represents the total number of values in the dataset
- ∑ denotes the sum
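As a minimal sketch, the formula can be computed directly: a loop performs the summation, then the total is divided by n. The helper name `arithmetic_mean` is hypothetical and for illustration only. In practice `statistics.mean` does the same job.

```python
def arithmetic_mean(values):
    """Hypothetical helper: mean = (sum of all x_i) / n."""
    if not values:
        raise ValueError("the mean needs at least one value")
    total = 0
    for x in values:            # the summation term, x_1 + x_2 + ... + x_n
        total += x
    return total / len(values)  # divide by n

data = [4, 8, 6, 5, 3, 8, 9, 7, 8]
print(arithmetic_mean(data))                  # 58 / 9 = 6.444444444444445
print(arithmetic_mean([15, 20, 35, 40, 50]))  # 160 / 5 = 32.0
```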

```python
import numpy as np
from statistics import mean, median, mode, variance, stdev
```

```python
data = [4, 8, 6, 5, 3, 8, 9, 7, 8]
print("Mean:", mean(data))
```
```
Mean: 6.444444444444445
```


```python
data2 = [15, 20, 35, 40, 50]
print("Mean:", mean(data2))
```
```
Mean: 32
```


## 1.2 Median
### Definition: The median is the middle value in a dataset that is ordered from least to greatest. If there's an even number of data points, the median is the average of the two middle values.   
## Steps to find the median:

- Sort the dataset in ascending order.
- If the number of data points (n) is odd, the median is the value at position $\frac{n+1}{2}$.
- If the number of data points (n) is even, the median is the average of the values at positions $\frac{n}{2}$ and $\frac{n}{2}+1$.
### Example:
### Consider the quiz scores [78, 85, 88, 90, 92] (already sorted).
### There are 5 values, so the median is the 3rd value: 88.

### Consider a dataset with an even number of values: [78, 85, 88, 90]
### The two middle values are 85 and 88.
### Median = (85+88)/2 = 86.5
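The steps above can be sketched as a small function, assuming a non-empty list of numbers. `median_from_steps` is a hypothetical name. Python indexes from 0, so the 1-based positions in the steps shift down by one in the code.

```python
def median_from_steps(values):
    """Hypothetical helper: sort, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    if not values:
        raise ValueError("the median needs at least one value")
    ordered = sorted(values)   # step 1: sort in ascending order
    n = len(ordered)
    mid = n // 2               # 0-based index of the upper-middle value
    if n % 2 == 1:
        return ordered[mid]    # the ((n + 1) / 2)-th value, counting from 1
    # the (n / 2)-th and (n / 2 + 1)-th values, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_from_steps([78, 85, 88, 90, 92]))  # 88
print(median_from_steps([78, 85, 88, 90]))      # 86.5
```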

```python
data = [4, 8, 6, 5, 3, 8, 9, 7, 8]
print("Median:", median(data))
```
```
Median: 7
```


```python
print("Median:", median(data2))
```
```
Median: 35
```


## 1.3 Mode
### Definition: The mode is the value that appears most frequently in a dataset. A dataset can have no mode (if all values are unique), one mode (unimodal), or multiple modes (bimodal, trimodal, etc.).   
## Example:
### Consider the dataset of the number of siblings of students: [1, 2, 0, 1, 3, 2, 1]
### The number 1 appears most frequently (3 times). Therefore, the mode is 1.
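A sketch that returns every mode, so it covers the no-mode, unimodal, and multimodal cases from the definition. It assumes the convention stated above that a dataset whose values are all unique has no mode. `modes` is a hypothetical helper built on `collections.Counter`.

```python
from collections import Counter

def modes(values):
    """Hypothetical helper: all most-frequent values; [] means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:  # every value appears once, so there is no mode
        return []
    # keep every value that ties for the highest count, in first-seen order
    return [value for value, count in counts.items() if count == top]

print(modes([1, 2, 0, 1, 3, 2, 1]))  # [1]      unimodal
print(modes([1, 1, 2, 2, 3]))        # [1, 2]   bimodal
print(modes([5, 6, 7]))              # []       no mode
```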

```python
data = [4, 8, 6, 5, 3, 8, 9, 7, 8]
print("Mode:", mode(data))
```
```
Mode: 8
```


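```python
from statistics import multimode
data2 = [15, 20, 15, 40, 50]
# Since Python 3.8, mode() no longer raises when all values are unique; it returns the
# first value seen. multimode() lists every tied value, so n ties means no mode.
modes_found = multimode(data2)
if len(modes_found) == len(data2):
    print("Mode: No mode found (all values are unique)")
else:
    print("Mode:", mode(data2))
```
```
Mode: 15
```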


## 🔸2. Measures of Dispersion
### These measures describe the spread or variability of the data around the central tendency.
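To see why spread matters on its own, consider two made-up datasets with the same mean but very different spread. This sketch uses the standard-library `statistics` module; the dataset values are illustrative only.

```python
from statistics import mean, stdev

tight = [30, 32, 34]
wide = [0, 32, 64]

for name, values in [("tight", tight), ("wide", wide)]:
    print(name, "mean:", mean(values), "stdev:", stdev(values))
# tight mean: 32 stdev: 2.0
# wide mean: 32 stdev: 32.0
```

The mean alone cannot tell these two datasets apart; a measure of dispersion can.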
## 2.1 Variance
### Definition: Variance measures how far the data points are spread out from the mean: it is the average of the squared differences from the mean. Squaring the differences ensures that all deviations (positive and negative) contribute positively to the measure.
### Formula for Population Variance:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
#### where:
- σ² (sigma squared) represents the population variance
- N represents the size of the population
### Formula for Sample Variance (Bessel's Correction):
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
#### where:
- s² represents the sample variance
- $\bar{x}$ represents the sample mean
- n represents the size of the sample
### We use n−1 in the denominator for sample variance to provide an unbiased estimate of the population variance.
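A minimal sketch of both formulas, assuming integer data and using exact `Fraction` arithmetic so results print without rounding noise. The second part checks the unbiasedness claim on a toy population [1, 2, 3]: averaged over all nine equally likely samples of size 2 drawn with replacement, the n−1 version recovers σ² exactly, while dividing by n underestimates it. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def population_variance(values):
    """Hypothetical helper: sigma^2 = sum((x_i - mu)^2) / N (integer data assumed)."""
    n = len(values)
    if n == 0:
        raise ValueError("population variance needs at least one value")
    mu = Fraction(sum(values), n)
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """Hypothetical helper: s^2 = sum((x_i - x_bar)^2) / (n - 1), Bessel's correction."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = Fraction(sum(values), n)
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

data2 = [15, 20, 35, 40, 50]
print(population_variance(data2), sample_variance(data2))  # 166 415/2 (i.e. 207.5)

# Unbiasedness check: enumerate every ordered sample of size 2 drawn with
# replacement from the toy population and average each estimator over them.
population = [1, 2, 3]
sigma2 = population_variance(population)
samples = list(product(population, repeat=2))
avg_bessel = sum(sample_variance(s) for s in samples) / len(samples)
avg_divide_by_n = sum(population_variance(s) for s in samples) / len(samples)
print(sigma2, avg_bessel, avg_divide_by_n)  # 2/3 2/3 1/3
```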

```python
data = [4, 8, 6, 5, 3, 8, 9, 7, 8]
print("Variance:", variance(data))
```
```
Variance: 4.277777777777778
```


```python
data2 = [15, 20, 35, 40, 50]
print("Variance:", variance(data2))
```
```
Variance: 207.5
```


## 2.2 Standard Deviation
### Definition: Standard deviation is the square root of the variance. It provides a measure of the spread of the data in the original units, making it easier to interpret than variance.
### Formula for Population Standard Deviation:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
### Formula for Sample Standard Deviation:
$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
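A short check, using the standard-library `statistics` module, that each standard deviation is the square root of the matching variance. `pvariance`/`pstdev` follow the population formulas (divide by N) and `variance`/`stdev` follow the sample formulas (divide by n−1).

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data2 = [15, 20, 35, 40, 50]

# Population: square root of the variance that divides by N.
print(math.sqrt(pvariance(data2)), pstdev(data2))
# Sample: square root of the variance that divides by n - 1.
print(math.sqrt(variance(data2)), stdev(data2))
```

The sample version is always a little larger for the same data, because it divides the same sum of squares by a smaller number.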

```python
data = [4, 8, 6, 5, 3, 8, 9, 7, 8]
print("Standard Deviation:", stdev(data))
```
```
Standard Deviation: 2.068278940998476
```


```python
data2 = [15, 20, 35, 40, 50]
print("Standard Deviation:", stdev(data2))
```
```
Standard Deviation: 14.404860290887934
```


## 📌 Key Takeaways
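- Use the mean to summarize the typical value when the data are roughly symmetric
- Use the median when the data are skewed or contain outliers
- Use the mode for the most frequent value (it also works for categorical data)
- Variance and standard deviation measure how spread out the data are around the mean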
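A small illustration of why the median is preferred when there are outliers, reusing the `data2` values from the cells above plus one hypothetical extreme value of 500. The mean jumps from 32 to 110, while the median barely moves.

```python
from statistics import mean, median

scores = [15, 20, 35, 40, 50]
with_outlier = scores + [500]  # hypothetical extreme value

print("mean:", mean(scores), "->", mean(with_outlier))        # mean: 32 -> 110
print("median:", median(scores), "->", median(with_outlier))  # median: 35 -> 37.5
```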