# Descriptive Statistics Notebook

### **Explanation of Topics:**

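#### 1. **Statistical Moments**
   - **Mean**: The average value of the dataset (the first moment).
   - **Variance**: The average squared deviation from the mean (the second central moment).
   - **Skewness**: A measure of the asymmetry of the distribution. Positive skew means a longer tail on the right; negative skew means a longer tail on the left.
   - **Kurtosis**: A measure of the "tailedness" of the distribution. Higher kurtosis means heavier tails and more extreme outliers. `scipy.stats.kurtosis` reports excess kurtosis by default, which is 0 for a normal distribution.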
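
To make the four moments concrete, here is a minimal sketch that computes them directly from their definitions with NumPy. The helper name `moments` and the small `sample` list are illustrative assumptions, not part of the notebook. The formulas are the biased (population) versions, which should agree with the defaults of `np.var`, `scipy.stats.skew`, and `scipy.stats.kurtosis`.

```python
import numpy as np

def moments(x):
    """Return mean, population variance, skewness, and excess kurtosis."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    dev = x - mean
    m2 = np.mean(dev ** 2)  # second central moment (population variance)
    m3 = np.mean(dev ** 3)  # third central moment
    m4 = np.mean(dev ** 4)  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return mean, m2, skewness, excess_kurtosis

# Hypothetical sample, chosen so the arithmetic is easy to check by hand
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, variance, skewness, excess_kurtosis = moments(sample)
print("Mean:", mean)
print("Variance:", variance)
print("Skewness:", skewness)
print("Excess kurtosis:", excess_kurtosis)
```

The positive skewness here reflects the longer right tail created by the values 7 and 9.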

#### 2. **Measures of Central Tendency**
   - **Mean**: The arithmetic average of the data points.
   - **Median**: The middle value that separates the higher half from the lower half of the dataset.
   - **Mode**: The value that appears most frequently in the dataset.
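
The three measures can disagree sharply when the data contain an outlier. The sketch below uses a small made-up array (an assumption for illustration) and finds the mode with `collections.Counter`, which is one simple option for discrete values.

```python
import numpy as np
from collections import Counter

# Hypothetical values; 100 is a deliberate outlier
values = np.array([3, 5, 5, 6, 7, 8, 100])

mean_value = values.mean()          # pulled upward by the outlier
median_value = np.median(values)    # middle value, robust to the outlier
# most_common(1) returns a list with one (value, count) pair
mode_value, mode_count = Counter(values.tolist()).most_common(1)[0]

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value, "appears", mode_count, "times")
```

Note that for continuous data, such as the random normal sample used in this notebook, every value is typically unique, so the mode carries little information.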

#### 3. **Measures of Dispersion**
   - **Range**: The difference between the maximum and minimum values.
   - **Standard Deviation**: The square root of the variance. It describes the typical distance of data points from the mean, in the same units as the data.
   - **Interquartile Range (IQR)**: The difference between the 75th percentile and the 25th percentile. It helps understand the spread of the middle 50% of the data.
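
As a minimal sketch of these three measures, the example below uses a small made-up array (an assumption for illustration). It also shows the difference between the population standard deviation (`ddof=0`, NumPy's default) and the sample standard deviation (`ddof=1`), which matters for small datasets.

```python
import numpy as np

# Hypothetical values for illustration
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

value_range = values.max() - values.min()  # equivalent to np.ptp(values)
pop_std = np.std(values)                   # ddof=0: divides by n
sample_std = np.std(values, ddof=1)        # ddof=1: divides by n - 1
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1                              # spread of the middle 50%

print("Range:", value_range)
print("Population std:", pop_std)
print("Sample std:", sample_std)
print("IQR:", iqr)
```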

#### 4. **Percentile and Quantile**
   - **Percentile**: The value below which a given percentage of observations fall. For example, the 25th percentile (Q1) is the value below which 25% of the data points fall.
   - **Quantile**: The same idea expressed as a fraction between 0 and 1 instead of a percentage. Quartiles are the quantiles at 0.25, 0.5, and 0.75.
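
The relationship between the two can be checked directly: `np.percentile` with 25, 50, 75 should return the same values as `np.quantile` with 0.25, 0.5, 0.75. The array below is a made-up example. By default NumPy interpolates linearly between neighbouring data points, so other software using a different convention may report slightly different quartiles.

```python
import numpy as np

# Hypothetical values: 1, 2, ..., 10
values = np.arange(1, 11)

percentiles = np.percentile(values, [25, 50, 75])   # arguments on a 0-100 scale
quantiles = np.quantile(values, [0.25, 0.50, 0.75]) # arguments on a 0-1 scale
same = np.allclose(percentiles, quantiles)

print("Percentiles:", percentiles)
print("Quantiles:", quantiles)
print("Identical:", same)
```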

#### 5. **5-Number Summary**
   - A statistical summary that consists of:
     - **Min**: The smallest value in the dataset.
     - **Q1**: The 25th percentile (1st Quartile).
     - **Median**: The 50th percentile (2nd Quartile).
     - **Q3**: The 75th percentile (3rd Quartile).
     - **Max**: The largest value in the dataset.
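
One way to collect the five values in a single call is sketched below. The helper name `five_number_summary` and the sample array are assumptions for illustration; the 0th and 100th percentiles are used as the minimum and maximum.

```python
import numpy as np

def five_number_summary(x):
    """Return min, Q1, median, Q3, and max as a dictionary of floats."""
    x = np.asarray(x, dtype=float)
    # The 0th and 100th percentiles are the minimum and maximum
    minimum, q1, median, q3, maximum = np.percentile(x, [0, 25, 50, 75, 100])
    return {
        "Min": float(minimum),
        "Q1": float(q1),
        "Median": float(median),
        "Q3": float(q3),
        "Max": float(maximum),
    }

# Hypothetical values: 1, 2, ..., 10
summary = five_number_summary(np.arange(1, 11))
print(summary)
```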

### Usage Instructions:
1. **Run the code** to compute various descriptive statistics and visualize the data distribution.
2. Modify the `data` variable to analyze your own dataset.
3. Customize the plots and visualizations for further analysis if needed.
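
As a rough sketch of step 2, the snippet below replaces `data` with a plain list of measurements. The values in `my_values` are made up for illustration. It drops missing entries first, because a single NaN would otherwise make `np.mean`, `np.var`, and similar functions return NaN.

```python
import numpy as np

# Hypothetical measurements standing in for your own dataset
my_values = [12.5, 15.0, float("nan"), 14.2, 13.8]

data = np.asarray(my_values, dtype=float)
data = data[~np.isnan(data)]  # keep only the non-missing values

print("Observations kept:", data.size)
print("Mean:", data.mean())
```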

In [1]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt

# Sample data for demonstration
data = np.random.normal(loc=50, scale=10, size=1000)  # Random normal data with mean=50, std=10

# 1. Statistical Moments: Mean, Variance, Skewness, Kurtosis
## Mean
mean_value = np.mean(data)
print("Mean:", mean_value)

## Variance
variance_value = np.var(data)  # population variance (ddof=0); pass ddof=1 for the sample variance
print("Variance:", variance_value)

## Skewness
skewness_value = stats.skew(data)
print("Skewness:", skewness_value)

## Kurtosis
kurtosis_value = stats.kurtosis(data)  # excess kurtosis (Fisher definition); 0 for a normal distribution
print("Kurtosis:", kurtosis_value)

# 2. Measures of Central Tendency
## Mean (already computed)
## Median
median_value = np.median(data)
print("Median:", median_value)

## Mode
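mode_value = stats.mode(data)
# Newer SciPy releases return a scalar `.mode` for 1-D input, so `.mode[0]` raises
# "IndexError: invalid index to scalar variable". np.atleast_1d handles both a
# scalar and an array, so this line works across SciPy versions.
print("Mode:", np.atleast_1d(mode_value.mode)[0])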

# 3. Measures of Dispersion
## Range
range_value = np.ptp(data)  # Peak-to-peak (Max - Min)
print("Range:", range_value)

## Standard Deviation
std_dev = np.std(data)  # population standard deviation (ddof=0); pass ddof=1 for the sample version
print("Standard Deviation:", std_dev)

## Interquartile Range (IQR)
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1
print("Interquartile Range (IQR):", IQR)

# 4. Percentile and Quantile
## Percentiles
percentile_25 = np.percentile(data, 25)
percentile_50 = np.percentile(data, 50)
percentile_75 = np.percentile(data, 75)

print("25th Percentile (Q1):", percentile_25)
print("50th Percentile (Median):", percentile_50)
print("75th Percentile (Q3):", percentile_75)

## Quantiles
quantiles = np.quantile(data, [0.25, 0.5, 0.75])
print("Quantiles:", quantiles)

# 5. 5-Number Summary
min_value = np.min(data)
max_value = np.max(data)
Q1, median, Q3 = np.percentile(data, [25, 50, 75])  # all three quartiles in one call

five_number_summary = {
    "Min": min_value,
    "Q1 (25th Percentile)": Q1,
    "Median": median,
    "Q3 (75th Percentile)": Q3,
    "Max": max_value
}
print("5-Number Summary:", five_number_summary)

# Optional: Visualizing the data distribution
plt.hist(data, bins=30, color='blue', alpha=0.7)
plt.title('Histogram of Data')
plt.axvline(mean_value, color='red', label='Mean')
plt.axvline(median_value, color='green', label='Median')
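# np.atleast_1d handles both a scalar and an array `.mode`, avoiding the IndexError on newer SciPy
plt.axvline(np.atleast_1d(mode_value.mode)[0], color='purple', label='Mode')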
plt.legend()
plt.show()

Mean: 49.43526115464168
Variance: 102.79998776954453
Skewness: 0.12485866844908776
Kurtosis: 0.06591586378605907
Median: 49.24031077609418


IndexError: invalid index to scalar variable.