### Measures of Central Tendency & Dispersion


#### 1️⃣ Measures of Central Tendency (describe the center of a dataset)

| **Measure** | **Definition**                        | **Use Case**                          | **Python Function**           |
| ----------- | ------------------------------------- | ------------------------------------- | ----------------------------- |
| **Mean**    | Arithmetic average of the data values | Roughly symmetric data without extreme outliers (e.g. approximately normal) | `np.mean()` / `df.mean()`     |
| **Median**  | Middle value when data is sorted      | Skewed data or when outliers exist    | `np.median()` / `df.median()` |
| **Mode**    | Most frequently occurring value       | Categorical or repeated discrete data | `stats.mode()` / `df.mode()`  |


#### 2️⃣ Measures of Dispersion (describe the spread or variability)
| **Measure**                   | **Definition**                                               | **Use Case**                                     | **Python Function**                |
| ----------------------------- | ------------------------------------------------------------ | ------------------------------------------------ | ---------------------------------- |
| **Range**                     | Difference between maximum and minimum values                | Quick measure of spread                          | `np.ptp()` / `df.max() - df.min()` |
| **Variance**                  | Average of squared deviations from the mean                  | Basis for standard deviation; appears in many ML metrics | `np.var()` (default `ddof=0`) / `df.var()` (default `ddof=1`) |
| **Standard Deviation**        | Square root of the variance; spread in the same units as the data | Widely used in ML, z-scores, normal distribution | `np.std()` (default `ddof=0`) / `df.std()` (default `ddof=1`) |
| **Interquartile Range (IQR)** | Difference between Q3 and Q1 (spread of the middle 50% of data) | Robust to outliers, used in box plots            | `stats.iqr()` / `df.quantile(0.75) - df.quantile(0.25)` |
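
The definitions in the table can be checked by hand. The following is a minimal sketch, assuming the population formulas (divide by n) and NumPy's default linear interpolation for quartiles; the variable names are illustrative.

```python
import numpy as np

data = np.array([10, 20, 20, 30, 40, 1000])

# Range: largest value minus smallest value
range_val = data.max() - data.min()

# Population variance: mean of the squared deviations from the mean
deviations = data - data.mean()
variance = np.mean(deviations ** 2)

# Standard deviation: square root of the variance
std_dev = np.sqrt(variance)

# IQR: third quartile minus first quartile
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

print(f"Range: {range_val}, Variance: {variance:.2f}, Std Dev: {std_dev:.2f}, IQR: {iqr}")
```

Each hand-computed value should agree with the corresponding library call (`np.ptp`, `np.var`, `np.std`, `stats.iqr`) at its default settings.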



In [None]:
import numpy as np
import pandas as pd
from scipy import stats

data = [10, 20, 20, 30, 40, 1000]

# Central Tendency
mean = np.mean(data)           # ~186.67: arithmetic average, pulled up by the outlier 1000
median = np.median(data)       # 25.0: middle of the sorted data (average of 20 and 30)
mode = stats.mode(data).mode   # 20: most frequently occurring value

# Dispersion (np.std and np.var default to ddof=0, the population formulas)
range_val = np.ptp(data)       # 990 (1000 - 10)
std_dev = np.std(data)         # ~363.85
variance = np.var(data)        # ~132388.89
iqr = stats.iqr(data)          # 17.5 (Q3 = 37.5, Q1 = 20.0)

print(f"Mean: {mean}, Median: {median}, Mode: {mode}")
print(f"Range: {range_val}, Std Dev: {std_dev}, Variance: {variance}, IQR: {iqr}")


Mean: 186.66666666666666, Median: 25.0, Mode: 20
Range: 990, Std Dev: 363.8528396053669, Variance: 132388.88888888888, IQR: 17.5
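
One subtlety worth checking when comparing results across libraries: `np.std()` and `np.var()` default to the population formulas (`ddof=0`), while the pandas `std()` and `var()` methods default to the sample formulas (`ddof=1`). The sketch below illustrates the difference on the same data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = [10, 20, 20, 30, 40, 1000]
series = pd.Series(data)

# NumPy default: population formulas (divide by n, ddof=0)
pop_var = np.var(data)
pop_std = np.std(data)

# pandas default: sample formulas (divide by n - 1, ddof=1)
sample_var = series.var()
sample_std = series.std()

# Passing ddof explicitly makes the two libraries agree
numpy_sample_var = np.var(data, ddof=1)
pandas_pop_std = series.std(ddof=0)

print(f"Population variance: {pop_var:.2f}, sample variance: {sample_var:.2f}")
print(f"Population std: {pop_std:.2f}, sample std: {sample_std:.2f}")
```

With only six values the gap is noticeable; it shrinks as the dataset grows, since n and n - 1 become nearly equal.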


#### 🎯 Interview Questions and Answers
| **Question**                                                     | **Answer**                                                                           |
| ---------------------------------------------------------------- | ------------------------------------------------------------------------------------ |
| What is the difference between mean and median?                  | The mean is the arithmetic average and is sensitive to outliers; the median is the middle value of the sorted data and is robust to them. |
| Why is standard deviation preferred over variance in some cases? | Because it's in the same units as the original data, making it easier to interpret.  |
| When would you use the median instead of the mean?               | When data is skewed or contains outliers.                                            |
| What is IQR and why is it useful?                                | IQR = Q3 - Q1, helps understand spread while being resistant to outliers.            |
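| Can standard deviation be negative?                              | No. It is the non-negative square root of the variance, which is an average of squared deviations. It is zero only when all values are identical. |
| What is the limitation of range as a dispersion measure?         | It depends only on two values (max and min), so a single outlier can dominate it, and it ignores how the remaining values are distributed. |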
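
Because the IQR is resistant to outliers, it is often used to flag them. The sketch below assumes the common 1.5 × IQR convention (the same rule most box plots use for their whiskers); that threshold is a convention rather than something stated in the notes above, and the names `lower_fence` and `upper_fence` are illustrative.

```python
import numpy as np

data = np.array([10, 20, 20, 30, 40, 1000])

# Quartiles and IQR (NumPy's default linear interpolation)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the quartiles are outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = data[(data < lower_fence) | (data > upper_fence)]
print(f"Fences: [{lower_fence}, {upper_fence}], outliers: {outliers.tolist()}")
```

The fences are built from quartiles, so the extreme value 1000 barely affects them, which is why it is still flagged.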


#### 📌 Summary Table
| **Metric**         | **Use For**                          | **Sensitive to Outliers?** |
| ------------------ | ------------------------------------ | -------------------------- |
| Mean               | Roughly symmetric data without extreme outliers | ✅ Yes                      |
| Median             | Skewed data or outliers              | ❌ No                       |
| Mode               | Categorical/discrete repetitive data | ❌ No                       |
| Standard Deviation | Spread of normally distributed data  | ✅ Yes                      |
| IQR                | Robust spread measure                | ❌ No                       |
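
The "Sensitive to Outliers?" column can be checked empirically. This is a minimal sketch that compares each metric before and after appending one extreme value; the helper `summarize` and the two sample lists are illustrative.

```python
import numpy as np

clean = [10, 20, 20, 30, 40]
with_outlier = clean + [1000]

def summarize(values):
    """Return the summary metrics from the table for a list of numbers."""
    q1, q3 = np.percentile(values, [25, 75])
    return {
        "mean": np.mean(values),
        "median": np.median(values),
        "std": np.std(values),   # population std (ddof=0)
        "iqr": q3 - q1,
    }

before = summarize(clean)
after = summarize(with_outlier)

# Print each metric without and with the outlier
for name in before:
    print(f"{name:>6}: {before[name]:8.2f} -> {after[name]:8.2f}")
```

The mean and standard deviation jump by an order of magnitude, while the median and IQR move only slightly.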


#### Theory: Descriptive vs. Inferential Statistics
| **Aspect**         | **Descriptive Statistics**                                                       | **Inferential Statistics**                                                |
| ------------------ | -------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
| **Definition**     | Summarizes and presents data in a meaningful way                                 | Draws conclusions and makes predictions from a sample to a population     |
| **Objective**      | Organize, summarize, and describe the features of a dataset                      | Make generalizations, test hypotheses, and estimate population parameters |
| **Data Scope**     | Works with the **entire dataset (population or sample)**                         | Works with **samples** to infer about a **larger population**             |
| **Key Techniques** | Measures of central tendency (mean, median, mode), dispersion (std, IQR), charts | Hypothesis testing, confidence intervals, regression, correlation         |
| **Output**         | Tables, charts, summary metrics                                                  | Probabilities, confidence levels, p-values, statistical significance      |
| **Use Case**       | Reporting sales summaries, average user session time, survey results             | Predicting election outcomes, estimating population means, A/B testing    |


#### 🧪 Python Examples
| **Statistic Type** | **Code Snippet**                                                                             | **Explanation**                          |
| ------------------ | -------------------------------------------------------------------------------------------- | ---------------------------------------- |
| **Descriptive**    | `import numpy as np`<br>`data = [10, 20, 30, 40, 50]`<br>`np.mean(data), np.std(data)` | Computes mean and standard deviation     |
| **Inferential**    | `from scipy import stats`<br>`stats.ttest_1samp(data, popmean=35)`                   | Performs one-sample t-test (inferential) |
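
The two snippets in the table can be combined into one runnable cell. This is a minimal sketch using the same `data` and the same hypothesized mean of 35; the 0.05 significance level is an assumption made for illustration.

```python
import numpy as np
from scipy import stats

data = [10, 20, 30, 40, 50]

# Descriptive: summarize the data we actually have
mean = np.mean(data)
std = np.std(data)  # population std (ddof=0)
print(f"Mean: {mean}, Std Dev: {std:.2f}")

# Inferential: test whether the population mean could plausibly be 35
result = stats.ttest_1samp(data, popmean=35)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")

# Assumed significance level of 0.05, chosen only for illustration
alpha = 0.05
reject_null = result.pvalue < alpha
print("Reject H0" if reject_null else "Fail to reject H0")
```

The descriptive step says something only about these five numbers; the t-test makes a probabilistic statement about the population they were drawn from.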


#### 🎯 Interview Questions and Answers
| **Question**                                               | **Answer**                                                                                                   |
| ---------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ |
| What is the primary goal of descriptive statistics?        | To summarize and describe the features of a dataset.                                                         |
| What is inferential statistics used for?                   | To make generalizations or predictions about a population based on sample data.                              |
| Give an example of descriptive vs. inferential statistics. | Descriptive: Average score in a class. Inferential: Predicting national exam results from that class sample. |
| Why is sampling important in inferential statistics?       | Because it allows us to estimate characteristics of a large population without studying every member.        |
| What is a confidence interval?                             | A range computed from sample data by a procedure that captures the true population parameter in a stated proportion (e.g. 95%) of repeated samples. |
| Name a few methods used in inferential statistics.         | T-tests, chi-square tests, ANOVA, regression analysis, confidence intervals.                                 |
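
A confidence interval for a mean can be computed with `scipy.stats`. The following is a minimal sketch, assuming a small sample, a t-distribution with n - 1 degrees of freedom, and a 95% confidence level; the sample values and variable names are illustrative.

```python
import numpy as np
from scipy import stats

sample = np.array([10, 20, 30, 40, 50])
n = sample.size

sample_mean = sample.mean()
standard_error = stats.sem(sample)  # sample std (ddof=1) divided by sqrt(n)

# 95% confidence interval for the population mean, based on the t-distribution
ci_low, ci_high = stats.t.interval(0.95, df=n - 1, loc=sample_mean, scale=standard_error)

print(f"Sample mean: {sample_mean}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval is wide because the sample is tiny; a larger sample shrinks the standard error and narrows the interval.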


#### 📌 Summary Table
| **Criteria**     | **Descriptive**           | **Inferential**                              |
| ---------------- | ------------------------- | -------------------------------------------- |
| Data Scope       | Whole dataset             | Sample to population                         |
| Focus            | Summarization             | Generalization and prediction                |
| Tools/Techniques | Mean, median, std, charts | Hypothesis testing, p-values, CI, regression |
| Output Type      | Concrete values           | Probabilistic inferences                     |
| Use in ML / AI   | Data preprocessing, EDA   | Model evaluation, A/B testing                |
