# Statistics & Data Analysis with Python

## Project Overview
This project demonstrates **descriptive** and **inferential statistics** in Python. It covers generating and inspecting a synthetic dataset, summary statistics, visualization, and hypothesis testing.

## Tech Stack
- **Data Manipulation**: Pandas, NumPy
- **Visualization**: Matplotlib, Seaborn
- **Statistics**: SciPy

---


In [None]:
# Install required packages (run once)
# %pip installs into the environment of the running kernel
%pip install -q pandas numpy matplotlib seaborn scipy


## 1. Load & Inspect Data
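We'll create a synthetic dataset of 100 customers with the columns ID, Age, Income, Satisfaction_Score, Gender, Region, and Purchases_Last_Year. Each column is drawn independently at random, so the data contains no built-in relationships between variables.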


In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats

# Create synthetic data
np.random.seed(42)
data = {
    'ID': range(1, 101),
    'Age': np.random.randint(18, 65, 100),
    'Income': np.random.normal(50000, 15000, 100).round(2),
    'Satisfaction_Score': np.random.randint(1, 11, 100),
    'Gender': np.random.choice(['Male', 'Female'], 100),
    'Region': np.random.choice(['North', 'South', 'East', 'West'], 100),
    'Purchases_Last_Year': np.random.randint(1, 20, 100)
}

df = pd.DataFrame(data)
df.head()
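
Beyond `df.head()`, a few more calls help confirm the table's shape, column types, and missing-value counts before any statistics are computed. The following is a minimal sketch of such an inspection. To stay runnable on its own it builds a small stand-in table; the name `sample` and its five rows are assumptions for illustration, and the same calls apply to `df`.

```python
import numpy as np
import pandas as pd

# Small stand-in for df (hypothetical values), so the block runs on its own
rng = np.random.default_rng(42)
sample = pd.DataFrame({
    'Age': rng.integers(18, 65, 5),
    'Income': rng.normal(50000, 15000, 5).round(2),
    'Gender': rng.choice(['Male', 'Female'], 5),
})

shape = sample.shape                       # (rows, columns)
dtypes = sample.dtypes                     # data type of each column
missing = sample.isna().sum()              # missing values per column
n_duplicates = sample.duplicated().sum()   # fully duplicated rows

print(shape)
print(dtypes)
print(missing)
print(f"Duplicate rows: {n_duplicates}")
```

Since the data is generated rather than loaded from a file, no missing values are expected; on real data these checks are where cleaning would start.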

## 2. Descriptive Statistics
Summarize the central tendency, dispersion, and shape of the dataset's distribution.


In [None]:
# Summary statistics for numerical columns
display(df.describe())

# Summary statistics for categorical columns
display(df.describe(include='object'))

# Calculate Skewness of Income
skewness = df['Income'].skew()
print(f"Income Skewness: {skewness:.2f}")
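
`describe()` reports the mean, standard deviation, and quartiles, but not every measure of dispersion and shape. The sketch below computes one statistic of each kind explicitly and shows that pandas' `skew()` is the bias-adjusted estimator, matching `scipy.stats.skew(..., bias=False)`. The `income` series is a hypothetical stand-in for `df['Income']`.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for df['Income'], so the block runs on its own
rng = np.random.default_rng(0)
income = pd.Series(rng.normal(50000, 15000, 100))

# Central tendency
mean, median = income.mean(), income.median()

# Dispersion
std = income.std()                                    # sample std (ddof=1)
iqr = income.quantile(0.75) - income.quantile(0.25)   # interquartile range

# Shape
skew_pd = income.skew()                                    # bias-adjusted
skew_biased = stats.skew(income.to_numpy())                # SciPy default, biased
skew_adjusted = stats.skew(income.to_numpy(), bias=False)  # matches pandas
excess_kurtosis = income.kurt()                            # 0 for a normal distribution

print(f"mean={mean:.2f}, median={median:.2f}, std={std:.2f}, IQR={iqr:.2f}")
print(f"skew (pandas)={skew_pd:.4f}, skew (SciPy, biased)={skew_biased:.4f}")
print(f"excess kurtosis={excess_kurtosis:.4f}")
```

For a sample drawn from a normal distribution, both skewness and excess kurtosis should be close to 0, and the mean and median should be close to each other.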

## 3. Data Visualization
Plot the distribution of Income, then compare it between genders.


In [None]:
# Histogram of Income
plt.figure(figsize=(8, 5))
sns.histplot(df['Income'], kde=True, color='skyblue')
plt.title(f'Distribution of Income (Skewness: {skewness:.2f})')
plt.xlabel('Income')
plt.show()

In [None]:
# Boxplot of Income by Gender
plt.figure(figsize=(8, 5))
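# hue='Gender' with legend=False avoids the palette-without-hue deprecation (requires seaborn >= 0.13)
sns.boxplot(x='Gender', y='Income', hue='Gender', data=df, palette='pastel', legend=False)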
plt.title('Income Distribution by Gender')
plt.show()
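
The numbers behind a boxplot can also be computed directly, which helps when reading the figure. The sketch below assumes the conventional 1.5 × IQR rule (seaborn's default `whis=1.5`): the box spans the first to third quartile, whiskers reach the most extreme points inside the fences, and anything beyond is drawn as an outlier. The `demo` table and the helper `box_summary` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df[['Gender', 'Income']]
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    'Gender': rng.choice(['Male', 'Female'], 100),
    'Income': rng.normal(50000, 15000, 100),
})

def box_summary(values: pd.Series) -> pd.Series:
    """Return the quartiles, whisker ends, and outlier count of one group."""
    q1, med, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    return pd.Series({
        'q1': q1, 'median': med, 'q3': q3,
        'whisker_low': inside.min(),    # lowest point within the fences
        'whisker_high': inside.max(),   # highest point within the fences
        'n_outliers': len(values) - len(inside),
    })

# One row of boxplot statistics per gender
summary = pd.DataFrame(
    {gender: box_summary(vals) for gender, vals in demo.groupby('Gender')['Income']}
).T
print(summary.round(2))
```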

## 4. Inferential Statistics (Hypothesis Testing)
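**Test:** Is there a significant difference in mean Income between Male and Female customers? We use a two-sided, two-sample t-test at a significance level of 0.05.
- **Null Hypothesis (H0):** The mean income is the same for both groups.
- **Alternative Hypothesis (H1):** The mean income differs between the groups.

Because Gender and Income are generated independently in this dataset, H0 is true by construction; a rejection here would be a false positive.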


In [None]:
male_income = df[df['Gender'] == 'Male']['Income']
female_income = df[df['Gender'] == 'Female']['Income']

# Two-sided, two-sample t-test; ttest_ind assumes equal variances by default (equal_var=True)
t_stat, p_value = stats.ttest_ind(male_income, female_income)

print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("Reject Null Hypothesis: Significant difference in income found.")
else:
    print("Fail to Reject Null Hypothesis: No significant difference in income found.")
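
The test above assumes equal variances in both groups. One way to check that assumption and to avoid relying on it is sketched below: Levene's test for equal variances, Welch's t-test (`equal_var=False`), and Cohen's d as an effect size, since a p-value alone says nothing about the size of a difference. The arrays `group_a` and `group_b` are hypothetical stand-ins for `male_income` and `female_income`.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-ins for male_income and female_income
rng = np.random.default_rng(7)
group_a = rng.normal(50000, 15000, 48)
group_b = rng.normal(50000, 15000, 52)

# Levene's test: H0 is that both groups have equal variances
levene_stat, levene_p = stats.levene(group_a, group_b)

# Welch's t-test: does not assume equal variances
welch_t, welch_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Cohen's d: mean difference in units of the pooled standard deviation
n_a, n_b = len(group_a), len(group_b)
pooled_var = (
    (n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)
) / (n_a + n_b - 2)
cohens_d = (group_a.mean() - group_b.mean()) / np.sqrt(pooled_var)

# Cross-check: the equal-variance t statistic equals d / sqrt(1/n_a + 1/n_b)
student_t, student_p = stats.ttest_ind(group_a, group_b)

print(f"Levene p-value: {levene_p:.4f}")
print(f"Welch t: {welch_t:.4f}, p-value: {welch_p:.4f}")
print(f"Student t: {student_t:.4f}, Cohen's d: {cohens_d:.4f}")
```

When the group variances and sizes are similar, as they are by construction here, Welch's test and the equal-variance test give nearly identical results.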

## Conclusion
This notebook covered:
1.  **Data Generation**: Creating a synthetic customer dataset with NumPy and pandas.
2.  **Descriptive Stats**: Summarizing the data's distribution (mean, standard deviation, skewness).
3.  **Visualization**: Using a histogram and a boxplot to examine the Income distribution and compare it across groups.
4.  **Hypothesis Testing**: Using a two-sample t-test to compare group means.
