# Descriptive Statistics in Pandas
**`10-statistics.ipynb`**

Pandas makes it easy to calculate **statistical measures** for data analysis.  
We’ll explore methods like mean, median, mode, variance, standard deviation, correlation, and more.

---



## Step 1: Import Libraries

```python
import pandas as pd
import numpy as np
```



---


## Step 2: Create a Sample DataFrame

```python
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 28, 35, 40],
    "Salary": [50000, 60000, 55000, 65000, 70000],
    "Experience": [1, 3, 4, 6, 8]
}

df = pd.DataFrame(data)
print(df)
```

```
      Name  Age  Salary  Experience
0    Alice   25   50000           1
1      Bob   30   60000           3
2  Charlie   28   55000           4
3    David   35   65000           6
4      Eva   40   70000           8
```




---


## Step 3: Quick Overview

```python
# Summary statistics for numeric columns
# (std is the sample standard deviation; 25%/50%/75% are quartiles)
print(df.describe())

# include='all' also summarizes the text column "Name" (count, unique, top, freq);
# every name occurs once, so "top" is simply one of the tied values
print(df.describe(include='all'))
```

```
            Age       Salary  Experience
count   5.00000      5.00000    5.000000
mean   31.60000  60000.00000    4.400000
std     5.94138   7905.69415    2.701851
min    25.00000  50000.00000    1.000000
25%    28.00000  55000.00000    3.000000
50%    30.00000  60000.00000    4.000000
75%    35.00000  65000.00000    6.000000
max    40.00000  70000.00000    8.000000
         Name       Age       Salary  Experience
count       5   5.00000      5.00000    5.000000
unique      5       NaN          NaN         NaN
top     Alice       NaN          NaN         NaN
freq        1       NaN          NaN         NaN
mean      NaN  31.60000  60000.00000    4.400000
std       NaN   5.94138   7905.69415    2.701851
min       NaN  25.00000  50000.00000    1.000000
25%       NaN  28.00000  55000.00000    3.000000
50%       NaN  30.00000  60000.00000    4.000000
75%       NaN  35.00000  65000.00000    6.000000
max       NaN  40.00000  70000.00000    8.000000
```

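The next sketch shows where the 25%, 50% and 75% rows come from. It recomputes them with `DataFrame.quantile`, which uses linear interpolation by default, and then asks `describe()` for custom percentiles. The Step 2 data is rebuilt so the cell runs on its own. The name `quartiles` is just for illustration.

```python
import pandas as pd

# Same sample data as in Step 2
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 28, 35, 40],
    "Salary": [50000, 60000, 55000, 65000, 70000],
    "Experience": [1, 3, 4, 6, 8],
})

# The 25%/50%/75% rows of describe() are quantiles (linear interpolation by default);
# numeric_only=True skips the text column "Name"
quartiles = df.quantile([0.25, 0.5, 0.75], numeric_only=True)
print(quartiles)

# describe() also accepts custom percentiles; the median (50%) is always added
print(df["Salary"].describe(percentiles=[0.1, 0.9]))
```

With only five rows, every quartile here lands exactly on an observed value. On larger or even-sized data, expect interpolated values in between.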

---



## Step 4: Measures of Central Tendency



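```python
# Mean
print("Average Age:", df['Age'].mean())
print("Average Salary:", df['Salary'].mean())

# Median (middle value of the sorted column)
print("Median Age:", df['Age'].median())

# Mode: mode() returns a Series of the most frequent value(s). Every Experience
# value occurs exactly once, so all five are returned and [0] just picks the smallest.
print("Most common Experience:", df['Experience'].mode()[0])
```

```
Average Age: 31.6
Average Salary: 60000.0
Median Age: 30.0
Most common Experience: 1
```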

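`mode()` returns a Series rather than a single number, so it helps to check how it behaves with repeats and ties. The series below are made-up examples and are not part of the sample DataFrame.

```python
import pandas as pd

# Hypothetical data: 3 occurs twice, every other value once
with_repeat = pd.Series([1, 3, 3, 4, 6])
print(with_repeat.mode().tolist())   # [3]

# Two values tie for most frequent, so mode() returns both (sorted)
tied = pd.Series([2, 2, 5, 5, 7])
print(tied.mode().tolist())          # [2, 5]

# No repeats at all (like the Experience column): every value is returned
no_repeat = pd.Series([1, 3, 4, 6, 8])
print(no_repeat.mode().tolist())     # [1, 3, 4, 6, 8]

# A quick way to tell whether a "real" mode exists: does any value repeat?
counts = no_repeat.value_counts()
print(counts.max() > 1)              # False
```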


---


## Step 5: Measures of Dispersion


```python
# Variance (pandas computes the sample variance, ddof=1, by default)
print("Variance in Salary:", df['Salary'].var())

# Standard deviation (square root of the variance, also ddof=1)
print("Standard Deviation in Salary:", df['Salary'].std())

# Range (max - min)
salary_range = df['Salary'].max() - df['Salary'].min()
print("Range of Salary:", salary_range)
```

```
Variance in Salary: 62500000.0
Standard Deviation in Salary: 7905.694150420948
Range of Salary: 20000
```

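The difference between sample and population variance is easy to miss. This sketch recomputes the salary variance by hand, dividing by n - 1. It then compares the result with `ddof=0` and with NumPy's `np.var`, which defaults to the population formula.

```python
import numpy as np
import pandas as pd

salary = pd.Series([50000, 60000, 55000, 65000, 70000])

# Sample variance by hand: sum of squared deviations divided by (n - 1)
deviations = salary - salary.mean()
sample_var = (deviations ** 2).sum() / (len(salary) - 1)
print(sample_var)                 # 62500000.0, same as salary.var()

# Population variance divides by n instead; pass ddof=0 to get it
print(salary.var(ddof=0))         # 50000000.0

# NumPy's np.var defaults to ddof=0, so it differs from pandas' default
print(np.var(salary.to_numpy()))  # 50000000.0
```

With small samples like this one, the choice of `ddof` visibly changes the answer. State which one you used when reporting results.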


---



## Step 6: Correlation and Covariance



```python
# Pearson correlation coefficient (the default method), ranging from -1 to 1;
# numeric_only=True skips the text column "Name"
print(df.corr(numeric_only=True))

# Covariance (sample covariance, normalized by N - 1)
print(df.cov(numeric_only=True))
```

```
                 Age    Salary  Experience
Age         1.000000  0.984656    0.962453
Salary      0.984656  1.000000    0.936329
Experience  0.962453  0.936329    1.000000
                 Age      Salary  Experience
Age            35.30     46250.0       15.45
Salary      46250.00  62500000.0    20000.00
Experience     15.45     20000.0        7.30
```

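Correlation is covariance rescaled by the two standard deviations: r = cov(x, y) / (std(x) * std(y)). The sketch below checks that formula against pandas for Age and Salary. It then shows the `method="spearman"` option of `DataFrame.corr`, which correlates ranks instead of raw values.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 28, 35, 40],
    "Salary": [50000, 60000, 55000, 65000, 70000],
    "Experience": [1, 3, 4, 6, 8],
})

# Pearson r = cov(x, y) / (std(x) * std(y)); all three use ddof=1, so N - 1 cancels
cov_age_salary = df["Age"].cov(df["Salary"])
r_manual = cov_age_salary / (df["Age"].std() * df["Salary"].std())
print(round(r_manual, 6))                        # 0.984656
print(round(df["Age"].corr(df["Salary"]), 6))    # 0.984656

# Spearman correlation: Pearson correlation of the ranks, so it only
# measures whether the relationship is monotonic, not whether it is linear
spearman = df.corr(method="spearman")
print(spearman)
```

Age and Salary rise in exactly the same order, so their Spearman correlation is 1.0 even though the Pearson value is slightly below 1.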


---


## Step 7: Value Counts & Unique Values

```python
# Frequency of each value (most useful for categorical data;
# here every name occurs exactly once)
print(df['Name'].value_counts())

# Unique values, in order of first appearance (not sorted)
print("Unique Ages:", df['Age'].unique())
```

```
Name
Alice      1
Bob        1
Charlie    1
David      1
Eva        1
Name: count, dtype: int64
Unique Ages: [25 30 28 35 40]
```

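Every name in the sample appears once, so `value_counts()` isn't very informative here. A hypothetical `Dept` column with repeats shows the more typical case. It also shows `normalize=True` for proportions and `nunique()` for the number of distinct values.

```python
import pandas as pd

# Hypothetical column with repeated values (not part of the sample DataFrame)
dept = pd.Series(["HR", "IT", "IT", "Sales", "IT"], name="Dept")

# Absolute counts, most frequent first
print(dept.value_counts())

# Relative frequencies (proportions that sum to 1)
print(dept.value_counts(normalize=True))

# Number of distinct values
print(dept.nunique())   # 3
```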


---



## Step 8: Apply Statistical Functions Across Rows/Columns

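```python
# Column-wise mean (axis=0, the default)
print(df.mean(numeric_only=True))

# Row-wise sum (axis=1): adds Age + Salary + Experience for each person.
# The total has no real-world meaning; it only demonstrates the axis argument.
print(df.sum(axis=1, numeric_only=True))
```

```
Age              31.6
Salary        60000.0
Experience        4.4
dtype: float64
0    50026
1    60033
2    55032
3    65041
4    70048
dtype: int64
```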

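Summing Age, Salary and Experience per row only shows the mechanics of `axis=1`. A hypothetical table of exam scores makes the axis choice easier to read, because every column shares the same unit. Here `axis=0` gives one result per column and `axis=1` gives one result per row.

```python
import pandas as pd

# Hypothetical exam scores: one row per person, one column per exam
scores = pd.DataFrame(
    {"Exam1": [80, 90, 70], "Exam2": [70, 95, 80], "Exam3": [90, 85, 75]},
    index=["Alice", "Bob", "Charlie"],
)

# axis=0 (default): one mean per column, i.e. the average score on each exam
print(scores.mean())

# axis=1: one mean per row, i.e. each person's average across all exams
print(scores.mean(axis=1))
```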


---



## Step 9: Custom Aggregations

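```python
# Apply different functions to different columns; a function that was
# not requested for a column shows up as NaN in the result
print(df.agg({
    "Age": ["min", "max", "mean"],
    "Salary": ["min", "max", "median"]
}))
```

```
         Age   Salary
min     25.0  50000.0
max     40.0  70000.0
mean    31.6      NaN
median   NaN  60000.0
```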

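Here are two more `agg()` patterns, sketched on the Age and Salary columns. A plain list applies every function to every column, so no NaN gaps appear. Named aggregation lets you choose the row labels. The labels `range`, `youngest` and `top_salary` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 28, 35, 40],
    "Salary": [50000, 60000, 55000, 65000, 70000],
})

# A plain list applies every function to every column, so there are no NaN gaps
stats = df.agg(["min", "max", "mean"])

# Derived statistics can be added as new rows afterwards, e.g. the range
stats.loc["range"] = stats.loc["max"] - stats.loc["min"]
print(stats)

# Named aggregation: each keyword becomes a row label, paired with (column, function)
named = df.agg(youngest=("Age", "min"), top_salary=("Salary", "max"))
print(named)
```

The named form still leaves NaN in cells it did not compute. It suits a handful of hand-picked figures rather than a full grid.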


---

## Step 10: Summary

* `describe()` → quick summary
* Central tendency: **mean, median, mode**
* Dispersion: **variance, std, range**
* Relationships: **correlation, covariance**
* Frequency analysis: **value\_counts, unique**
* Custom aggregations with **agg()**

These tools are essential for **exploratory data analysis (EDA)**.

---