# 📘 01 - Descriptive Statistics

🔹 **Objective**: Explain how to summarize and describe the main features of a dataset.

## 🔍 Introduction to Descriptive Statistics

Descriptive statistics is the branch of statistics that deals with summarizing and organizing data so it can be easily understood.

### What is Descriptive Statistics?
Descriptive statistics involves techniques such as:
- Measures of central tendency (mean, median, mode)
- Measures of dispersion (range, variance, standard deviation)
- Graphical representations (histograms, box plots)

### Descriptive vs Inferential Statistics
- **Descriptive**: Summarize or describe the characteristics of a dataset.
- **Inferential**: Use a sample of data to make inferences or generalizations about a population.

## 📏 Measures of Central Tendency
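The center of a dataset is usually described with three measures:
- **Mean**: The arithmetic average, i.e. the sum of all values divided by their count
- **Median**: The middle value of the sorted data (the average of the two middle values when the count is even)
- **Mode**: The most frequent value; a dataset can have more than one mode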
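As a quick check of these definitions, here is a tiny made-up list whose statistics can be verified by hand. The values are illustrative only and are not part of the salary example.

```python
import pandas as pd

# Small hypothetical dataset chosen so every result can be checked by hand
values = pd.Series([2, 3, 3, 5, 7, 10])

mean = values.mean()           # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 30 / 6
median = values.median()       # even count: average of the two middle values, 3 and 5
mode = values.mode().tolist()  # 3 appears twice, every other value appears once

print("Mean:", mean)      # Mean: 5.0
print("Median:", median)  # Median: 4.0
print("Mode:", mode)      # Mode: [3]
```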

We will use a sample salary dataset to explore these concepts.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

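# Sample dataset: 95 salaries drawn from a normal distribution (mean 50,000, std 8,000)
# plus 5 very high salaries that act as outliers and skew the data to the right
np.random.seed(42)
salaries = pd.Series(np.append(np.random.normal(50000, 8000, 95), [100000, 120000, 130000, 150000, 200000]))

print("Mean:", salaries.mean())
print("Median:", salaries.median())
# Continuous values almost never repeat, so every salary would count as a mode.
# Rounding to the nearest 1,000 first gives a more meaningful mode.
print("Mode (rounded to nearest 1,000):", salaries.round(-3).mode().values)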

### Handling Skewed Data
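In a skewed distribution, the long tail pulls the mean toward it, while the median is barely affected. For skewed data such as salaries, the median is therefore usually the better summary of a "typical" value.
- Right skew (positive): typically Mean > Median
- Left skew (negative): typically Mean < Median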
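To see why the median is the safer summary for skewed data, the sketch below compares a small, hypothetical set of salaries (in thousands) before and after a single extreme value is introduced.

```python
import pandas as pd

# Hypothetical salaries in thousands
balanced = pd.Series([40, 45, 50, 55, 60])
# Same data, but the top earner is replaced by an extreme value
skewed = pd.Series([40, 45, 50, 55, 600])

print("Balanced -> mean:", balanced.mean(), "median:", balanced.median())  # mean 50.0, median 50.0
print("Skewed   -> mean:", skewed.mean(), "median:", skewed.median())      # mean 158.0, median 50.0
```

One outlier moves the mean from 50 to 158 while the median stays at 50, which is why medians are often reported for income and salary data.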

## 📊 Measures of Dispersion
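- **Range**: Difference between the maximum and minimum values
- **Variance**: Average squared deviation from the mean (the sample variance divides by n − 1 instead of n)
- **Standard Deviation**: Square root of the variance, expressed in the same units as the data
- **Interquartile Range (IQR)**: Difference between Q3 (75th percentile) and Q1 (25th percentile), i.e. the spread of the middle 50% of the data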
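The sketch below computes each measure by hand on a small hypothetical dataset and checks the results against pandas and NumPy. One detail worth knowing: `Series.var()` and `Series.std()` default to `ddof=1` (sample variance, dividing by n − 1), whereas `numpy.var` and `numpy.std` default to `ddof=0` (population variance, dividing by n).

```python
import numpy as np
import pandas as pd

data = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical values, mean = 5

# Range: largest value minus smallest value
value_range = data.max() - data.min()

# Sum of squared deviations from the mean
squared_dev = ((data - data.mean()) ** 2).sum()

pop_var = squared_dev / len(data)           # divide by n
sample_var = squared_dev / (len(data) - 1)  # divide by n - 1

# pandas defaults to the sample version, NumPy to the population version
assert np.isclose(sample_var, data.var())
assert np.isclose(pop_var, np.var(data.to_numpy()))

# IQR using pandas' default linear interpolation between data points
iqr = data.quantile(0.75) - data.quantile(0.25)

print("Range:", value_range)                     # 7
print("Population variance:", pop_var)           # 4.0
print("Sample variance:", round(sample_var, 3))  # 4.571
print("Population std:", np.sqrt(pop_var))       # 2.0
print("IQR:", iqr)                               # 1.5
```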

In [None]:
print("Range:", salaries.max() - salaries.min())
# pandas computes the sample variance and standard deviation (ddof=1) by default
print("Variance:", salaries.var())
print("Standard Deviation:", salaries.std())
print("IQR:", salaries.quantile(0.75) - salaries.quantile(0.25))

## 📈 Visualizing Distributions
We’ll use histograms, boxplots, violin plots, and kernel density estimate (KDE) plots to see the shape and spread of the distribution.

In [None]:
# Histogram
sns.histplot(salaries, kde=False)
plt.title("Histogram of Salaries")
plt.xlabel("Salary")
plt.show()

In [None]:
# Boxplot
sns.boxplot(x=salaries)
plt.title("Boxplot of Salaries")
plt.show()
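The points drawn beyond the whiskers of a seaborn or matplotlib boxplot are, by default (`whis=1.5`), values lying more than 1.5 × IQR below Q1 or above Q3. The sketch below applies that rule (often called Tukey's fences) directly, regenerating the salary data with the same seed so the cell runs on its own.

```python
import numpy as np
import pandas as pd

# Recreate the salary data from earlier (same seed, same construction)
np.random.seed(42)
salaries = pd.Series(np.append(np.random.normal(50000, 8000, 95),
                               [100000, 120000, 130000, 150000, 200000]))

q1, q3 = salaries.quantile([0.25, 0.75])
iqr = q3 - q1

# Tukey fences: the 1.5 * IQR rule used for boxplot whiskers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)]
print(f"Fences: {lower_fence:,.0f} to {upper_fence:,.0f}")
print("Flagged outliers:")
print(outliers.sort_values())
```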

In [None]:
# Violin plot
sns.violinplot(x=salaries)
plt.title("Violin Plot of Salaries")
plt.show()

In [None]:
# KDE plot
sns.kdeplot(salaries)
plt.title("KDE Plot of Salaries")
plt.xlabel("Salary")
plt.show()

## 🧬 Shape of Distributions
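- **Skewness**: Measure of asymmetry; about 0 for a symmetric distribution, positive for a long right tail, negative for a long left tail
- **Kurtosis**: Measure of tailedness; pandas reports *excess* kurtosis, which is about 0 for a normal distribution and positive for heavier tails
- Together, these metrics help flag outliers and departures from normality.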
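To get a feel for the scale of these numbers, the sketch below computes them on large simulated samples from two reference distributions. pandas' `Series.kurt()` reports excess kurtosis (Fisher's definition), so a normal distribution scores about 0 rather than 3; an exponential distribution has theoretical skewness 2 and excess kurtosis 6. The seed and sample size are arbitrary choices, so the printed values will only be close to these theoretical figures.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

# Symmetric with normal tails: skewness ~ 0, excess kurtosis ~ 0
normal = pd.Series(rng.normal(loc=0, scale=1, size=100_000))
# Right-skewed with a heavy right tail: theoretical skewness 2, excess kurtosis 6
exponential = pd.Series(rng.exponential(scale=1, size=100_000))

for name, sample in [("normal", normal), ("exponential", exponential)]:
    print(f"{name:>11}: skew = {sample.skew():.2f}, kurtosis = {sample.kurt():.2f}")
```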

In [None]:
print("Skewness:", salaries.skew())
print("Kurtosis:", salaries.kurt())

## 📋 Summary Functions with pandas

In [None]:
salaries.describe()
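`describe()` returns the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%) and maximum. When a different set of statistics is needed in one table, `Series.agg` accepts a list of method names; the selection below, and the extra percentiles passed to `describe`, are just one possible choice.

```python
import numpy as np
import pandas as pd

# Recreate the salary data from earlier so this cell is self-contained
np.random.seed(42)
salaries = pd.Series(np.append(np.random.normal(50000, 8000, 95),
                               [100000, 120000, 130000, 150000, 200000]))

# Add the 10th and 90th percentiles to the default summary
print(salaries.describe(percentiles=[0.1, 0.25, 0.5, 0.75, 0.9]))

# Custom summary: each string is the name of a pandas Series method
summary = salaries.agg(["mean", "median", "std", "skew", "kurt"])
print(summary.round(2))
```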

## 🧪 Practice Exercises
1. Create a new Series of student scores and calculate all the measures discussed above.
2. Visualize it with a histogram and a boxplot.
3. Compute the skewness and interpret its sign and magnitude.