# Descriptive Statistics

## **1. Measures of Central Tendency**
Measures of central tendency summarize a dataset with a single value that represents its center.

### **Mean**
- **Definition**: The sum of all data points divided by the number of points (the arithmetic average).
- **Formula**:  
  $ \text{Mean} (\mu) = \frac{\sum_{i=1}^n x_i}{n} $  
  where $x_i$ are the data points, and $n$ is the total number of points.
- **Example**:  
  Data: [2, 4, 6, 8]  
  Mean = $(2 + 4 + 6 + 8)/4 = 5$

---

### **Median**
- **Definition**: The middle value of the data when sorted.
  - If $n$ is odd: Median = middle value.
  - If $n$ is even: Median = average of the two middle values.
- **Example**:  
  Data: [2, 4, 6, 8, 10]  
  Median = $6$  
  Data: [2, 4, 6, 8]  
  Median = $(4 + 6)/2 = 5$

---

### **Mode**
- **Definition**: The most frequently occurring value(s) in the dataset.
- **Example**:  
  Data: [2, 2, 3, 4, 4, 4, 5]  
  Mode = $4$ (appears most often).
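
The three measures above can be checked with Python's standard `statistics` module. This is a minimal sketch; the helper name `central_tendency` is hypothetical and only used for illustration.

```python
import statistics

def central_tendency(data):
    """Return the mean, median, and list of modes for a sequence of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # multimode returns every value tied for the highest frequency
        "modes": statistics.multimode(data),
    }

print(central_tendency([2, 4, 6, 8]))           # mean and median are both 5
print(central_tendency([2, 4, 6, 8, 10]))       # odd n: median is the middle value
print(central_tendency([2, 2, 3, 4, 4, 4, 5]))  # the single mode is 4
```

Note that when every value occurs equally often, as in [2, 4, 6, 8], `multimode` returns all of them, so a dataset can have more than one mode.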

---

## **2. Measures of Dispersion**
Dispersion measures how spread out the data is.

### **Range**
- **Definition**: The difference between the maximum and minimum values.
- **Formula**:  
  $ \text{Range} = \text{Max} - \text{Min} $
- **Example**:  
  Data: [2, 4, 6, 8, 10]  
  Range = $10 - 2 = 8$

---

### **Variance**
- **Definition**: The average of the squared deviations from the mean. Dividing by $n$ gives the population variance; the sample variance divides by $n - 1$ instead.
- **Formula**:  
  $ \text{Variance} (\sigma^2) = \frac{\sum_{i=1}^n (x_i - \mu)^2}{n} $
- **Example**:  
  Data: [2, 4, 6]  
  Mean = $4$  
  Variance = $((2-4)^2 + (4-4)^2 + (6-4)^2)/3 = (4 + 0 + 4)/3 = 8/3 \approx 2.67$
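
A minimal sketch of the calculation above, written out step by step. The function name `variance` and the `ddof` parameter (borrowed from the convention used by numerical libraries) are illustrative choices, not part of the original notes. With `ddof=0` the function divides by $n$ as in the formula above; with `ddof=1` it divides by $n - 1$, which gives the sample variance.

```python
def variance(data, ddof=0):
    """Average squared deviation from the mean.

    ddof=0 divides by n (population variance);
    ddof=1 divides by n - 1 (sample variance).
    """
    n = len(data)
    if n - ddof <= 0:
        raise ValueError("need more data points than ddof")
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - ddof)

data = [2, 4, 6]
print(round(variance(data), 2))          # 2.67 (population variance)
print(round(variance(data, ddof=1), 2))  # 4.0  (sample variance)
```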

---

### **Standard Deviation**
- **Definition**: The square root of the variance; it measures the typical distance of data points from the mean, in the same units as the data.
- **Formula**:  
  $ \text{Standard Deviation} (\sigma) = \sqrt{\text{Variance}} $
- **Example**:  
  Data: [2, 4, 6]  
  Variance $\approx 2.67$  
  Standard Deviation = $\sqrt{2.67} \approx 1.63$
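
As a quick check of the example above, the sketch below takes the square root of the population variance and compares it with `statistics.pstdev`, which computes the population standard deviation directly. The variable names are illustrative.

```python
import math
import statistics

data = [2, 4, 6]
pop_variance = statistics.pvariance(data)  # population variance, 8/3
std_dev = math.sqrt(pop_variance)          # square root of the variance

print(round(std_dev, 2))
# pstdev gives the same value without the intermediate step
print(math.isclose(std_dev, statistics.pstdev(data)))
```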

---

### **Interquartile Range (IQR)**
- **Definition**: The range of the middle 50% of the data.
- **Formula**:  
  $ \text{IQR} = Q_3 - Q_1 $  
  where:
  - $Q_1$ = 25th percentile (lower quartile)
  - $Q_3$ = 75th percentile (upper quartile)
- **Example**:  
  Data: [1, 3, 5, 7, 9, 11, 13]  
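  $Q_1 = 3$, $Q_3 = 11$ (the medians of the lower and upper halves, excluding the overall median $7$; other quartile conventions can give slightly different values)  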
  IQR = $11 - 3 = 8$
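
Quartiles can be computed under more than one convention, so software may not always reproduce hand-calculated values. The sketch below uses `statistics.quantiles` from the standard library; the helper name `iqr` is hypothetical. For this dataset the `"exclusive"` method reproduces the example above, while the `"inclusive"` method interpolates differently and gives a smaller IQR.

```python
import statistics

def iqr(data, method="exclusive"):
    """Interquartile range Q3 - Q1 using the chosen quartile method."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method=method)
    return q3 - q1

data = [1, 3, 5, 7, 9, 11, 13]
print(statistics.quantiles(data, n=4, method="exclusive"))  # Q1, median, Q3
print(iqr(data, method="exclusive"))
print(iqr(data, method="inclusive"))
```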

---

## **3. Distribution Shapes**
### **Skewness**
- **Definition**: Measures asymmetry of the distribution.
  - Positive Skew: Tail is longer on the right.
  - Negative Skew: Tail is longer on the left.
  - Zero Skew: Symmetric distribution.
- **Example**:  
  - Positively skewed: Income data (few high values).  
  - Negatively skewed: Exam scores (most high, few low).
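
The notes do not give a formula for skewness. One common choice, assumed here, is the moment-based coefficient $m_3 / m_2^{3/2}$, where $m_k$ is the average of $(x_i - \mu)^k$. The function name and the small datasets below are made up purely for illustration.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (population form)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]      # illustrative data, no tail on either side
right_tail = [1, 2, 2, 3, 10]    # one large value stretches the right tail
left_tail = [1, 8, 9, 9, 10]     # one small value stretches the left tail

print(skewness(symmetric))   # zero for symmetric data
print(skewness(right_tail))  # positive
print(skewness(left_tail))   # negative
```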

---

### **Kurtosis**
- **Definition**: Measures the "tailedness" of the distribution.
  - High Kurtosis: Heavy tails (outliers likely).
  - Low Kurtosis: Light tails (outliers unlikely).
  - Normal Kurtosis: Similar to a normal distribution.
- **Example**:  
  - High kurtosis: Stock market returns.  
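  - Low kurtosis: Uniformly distributed values, such as rolls of a fair die (heights of people are roughly normal, so their kurtosis is close to normal rather than low).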
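
Kurtosis is often reported as excess kurtosis, $m_4 / m_2^2 - 3$, so that a normal distribution scores $0$, heavy tails score above $0$, and light tails score below $0$. The sketch below assumes that definition; the function name and both datasets are hypothetical.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (population form)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3

evenly_spread = [1, 2, 3, 4, 5]                  # light tails
heavy_tails = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]  # mostly central, two extremes

print(excess_kurtosis(evenly_spread))  # negative
print(excess_kurtosis(heavy_tails))    # positive
```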

---





# **Variance and Standard Deviation**

Variance and standard deviation are two of the most commonly used measures of dispersion. They quantify how spread out data is around the mean.

---

## **Variance**

### **Definition**
Variance measures the average squared difference between each data point and the mean. It indicates how far the values in a dataset deviate from the mean, expressed **in squared units** of the data.

### **Formula**
$ \text{Variance} (\sigma^2) = \frac{\sum_{i=1}^n (x_i - \mu)^2}{n} $

where:
- $x_i$: Each data point  
- $\mu$: Mean of the data  
- $n$: Total number of data points  

### **Importance**
1. **Understanding Spread**: Variance tells us whether the data points are closely clustered or widely spread around the mean.  
2. **Foundation for Standard Deviation**: Variance is the foundation for calculating the standard deviation, which is more interpretable in real-world terms.

---

## **Standard Deviation**

### **Definition**
Standard deviation is the square root of the variance. It measures the typical distance of data points from the mean (strictly, the root-mean-square distance) **in the same units as the data**.

### **Formula**
$ \text{Standard Deviation} (\sigma) = \sqrt{\text{Variance}} $

### **Importance**
1. **Intuitive Interpretation**: Since it has the same units as the data, it is easier to understand and relate to real-world scenarios.  
2. **Comparisons Across Datasets**: Standard deviation helps compare the spread of different datasets with the same units.  
3. **Understanding Variability**: It indicates how much variation or "risk" is inherent in the dataset.

---

## **Key Differences Between Variance and Standard Deviation**
| **Aspect**           | **Variance**                             | **Standard Deviation**                 |
|-----------------------|------------------------------------------|-----------------------------------------|
| **Definition**        | Average of squared deviations from mean | Square root of the variance             |
| **Units**             | Squared units of data                   | Same as the data units                  |
| **Interpretability**  | Less intuitive                          | More intuitive and interpretable        |
| **Use**               | Theoretical or intermediate calculation | Final measure of spread for interpretation |

---

## **Real-World Examples**

### **Example 1: Test Scores**
- Imagine a class of students with test scores: [70, 75, 80, 85, 90].
- Mean = $80$.
- Variance: The **average squared difference** between each score and the mean of $80$.  
  $ \text{Variance} = \frac{(70-80)^2 + (75-80)^2 + (80-80)^2 + (85-80)^2 + (90-80)^2}{5} = 50 $
- Standard Deviation: The square root of variance.  
  $ \text{Standard Deviation} = \sqrt{50} \approx 7.07 $
- **Interpretation**: A typical test score lies about $7.07$ points from the mean score of $80$.
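
The test-score calculation can be traced one step at a time. This is a small sketch with illustrative variable names; it uses the population formula (dividing by $n = 5$), matching the worked example.

```python
import statistics

scores = [70, 75, 80, 85, 90]

mean = statistics.mean(scores)                # 80
deviations = [x - mean for x in scores]       # distance of each score from the mean
squared = [d ** 2 for d in deviations]        # squaring removes the signs
variance = sum(squared) / len(scores)         # average squared deviation
std_dev = variance ** 0.5                     # back to the original units (points)

print(deviations)
print(squared)
print(variance, round(std_dev, 2))
```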

---

### **Example 2: Stock Market Returns**
- Suppose you track the daily returns of two stocks:
  - Stock A: Returns are consistently between $1\%$ and $3\%$.
  - Stock B: Returns vary widely from $-5\%$ to $10\%$.
- Variance and Standard Deviation:
  - Stock A has low variance and standard deviation, indicating low risk and stable returns.
  - Stock B has high variance and standard deviation, indicating higher risk and volatility.
- **Decision**: Stock A suits an investor who prioritizes stable returns. Stock B suits an investor willing to accept more volatility in exchange for potentially higher returns.
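
To make the comparison concrete, the sketch below uses made-up daily returns that merely respect the ranges stated above (Stock A between 1% and 3%, Stock B between -5% and 10%). The numbers are hypothetical and not real market data.

```python
import statistics

# Hypothetical daily returns in percent, chosen only to fit the stated ranges
stock_a = [1.0, 2.0, 3.0, 2.5, 1.5]
stock_b = [-5.0, 10.0, 2.0, -3.0, 7.0]

for name, returns in (("Stock A", stock_a), ("Stock B", stock_b)):
    variance = statistics.pvariance(returns)
    std_dev = statistics.pstdev(returns)
    print(f"{name}: variance={variance:.2f}, std dev={std_dev:.2f}")
```

With these illustrative numbers, Stock B's standard deviation is several times larger than Stock A's, which is what "higher volatility" means in practice.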

---

### **Example 3: Heights of People**
- Group A: Heights of students in a classroom.
  - Mean height = $165$ cm.
  - Standard deviation = $5$ cm.
  - Interpretation: If heights are roughly normally distributed, about 68% of students are between $160$ and $170$ cm (within one standard deviation of the mean).
- Group B: Heights of players in a basketball team.
  - Mean height = $190$ cm.
  - Standard deviation = $12$ cm.
  - Interpretation: The larger standard deviation shows that the players' heights are more spread out around their mean than the students' heights.
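
How many values fall within one standard deviation depends on the shape of the distribution. The sketch below assumes heights are normally distributed, which the example does not state, and uses `statistics.NormalDist` to compute the share of values within $k$ standard deviations of the mean. The helper name `fraction_within` is hypothetical.

```python
from statistics import NormalDist

def fraction_within(mean, std_dev, k=1.0):
    """Share of a normal distribution lying within k standard deviations."""
    dist = NormalDist(mu=mean, sigma=std_dev)
    return dist.cdf(mean + k * std_dev) - dist.cdf(mean - k * std_dev)

print(round(fraction_within(165, 5), 3))   # students: 160 to 170 cm
print(round(fraction_within(190, 12), 3))  # players: 178 to 202 cm
```

Under this assumption the share is the same for both groups (about 68%); what differs is the width of the interval, 10 cm for the students versus 24 cm for the players.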

---

## **Why Are Variance and Standard Deviation Important?**
1. **Data Analysis**: They provide insights into the distribution and variability of data, which is crucial for decision-making.  
2. **Risk Assessment**: In finance, standard deviation helps measure investment risk.  
3. **Quality Control**: In manufacturing, low variance in product dimensions indicates a consistent process.  
4. **Predictions and Models**: In machine learning, understanding variability helps refine predictive models.

### **Summary**
- Variance helps quantify the overall spread but is less interpretable due to squared units.  
- Standard deviation, with its same-unit measure, is the go-to metric for understanding data variability.  
- Together, they are fundamental tools for descriptive statistics and real-world problem-solving.


