# 📓 Data Science Homework: Descriptive Stats & Inference
Name: *Your Name Here*  
Date: *Submission Date Here*

This notebook applies descriptive statistics and statistical inference to simulated study data, exploring the relationship between study hours and exam scores.
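The original `study_data.csv` is not included here. To run the notebook without it, a hypothetical stand-in can be simulated. The generating model below is an assumption made for illustration, not how the original data were produced. It draws hours uniformly from 0 to 10 and sets each score to `50 + 4 * hours` plus Gaussian noise, clipped to the range 0 to 100. The helper name `simulate_study_data` is hypothetical.

```python
import numpy as np
import pandas as pd


def simulate_study_data(n=100, seed=0):
    """Hypothetical generator: exam score rises linearly with study hours, plus noise."""
    rng = np.random.default_rng(seed)
    # Assumed model (illustrative only): hours ~ Uniform(0, 10)
    hours = rng.uniform(0, 10, size=n)
    # score = 50 + 4 * hours + N(0, 8), clipped to a 0-100 exam scale
    scores = np.clip(50 + 4 * hours + rng.normal(0, 8, size=n), 0, 100)
    return pd.DataFrame({"Study Hours": hours.round(1), "Exam Score": scores.round(1)})


df_sim = simulate_study_data()
print(df_sim.head())
# To create a stand-in file for the notebook:
# df_sim.to_csv("study_data.csv", index=False)
```

The column names match the ones the later cells use (`'Study Hours'`, `'Exam Score'`), so the rest of the analysis runs unchanged on the simulated file.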

## 🔢 Section 1: Descriptive Statistical Analysis
We'll calculate measures of central tendency (mean, median, mode) and spread (standard deviation, variance, interquartile range).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from sklearn.utils import resample

sns.set_theme(style="darkgrid")

# Load the dataset (expects the columns 'Study Hours' and 'Exam Score')
df = pd.read_csv("study_data.csv")
df.head()
```

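```python
# Keep only numeric columns so the statistics also work if the CSV has ID or label columns
num = df.select_dtypes(include="number")

# Central tendency
print("Mean:\n", num.mean())
print("\nMedian:\n", num.median())
print("\nMode (first mode if there are several):\n", num.mode().iloc[0])

# Spread (pandas uses the sample formulas, dividing by n - 1)
print("\nStandard Deviation:\n", num.std())
print("\nVariance:\n", num.var())
print("\nIQR:\n", num.quantile(0.75) - num.quantile(0.25))
```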
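One detail worth checking in the spread measures: pandas `std()` and `var()` default to the sample versions (`ddof=1`, dividing by n - 1), whereas NumPy's `np.std` and `np.var` default to the population versions (`ddof=0`, dividing by n). The small sketch below uses made-up illustrative scores, not values from `study_data.csv`, to show both formulas side by side and the IQR from linearly interpolated quantiles.

```python
import numpy as np
import pandas as pd

# Illustrative values only (not from study_data.csv)
scores = pd.Series([62.0, 70.0, 75.0, 81.0, 92.0])

sample_var = scores.var()                     # pandas default ddof=1: divide by n - 1
population_var = np.var(scores.to_numpy())    # NumPy default ddof=0: divide by n

# Recompute both directly from the sum of squared deviations
n = len(scores)
squared_dev = ((scores - scores.mean()) ** 2).sum()
print("sample variance:", sample_var, "manual:", squared_dev / (n - 1))
print("population variance:", population_var, "manual:", squared_dev / n)

# IQR from the 25th and 75th percentiles (pandas interpolates linearly by default)
iqr = scores.quantile(0.75) - scores.quantile(0.25)
print("IQR:", iqr)
```

With only a handful of observations the two variances differ noticeably; for large samples the gap shrinks.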

## 📊 Section 2: Visualizations
We visualize the distribution and relationships within the data.

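```python
num = df.select_dtypes(include="number")

# Histograms (one panel per numeric column)
num.hist(bins=10, figsize=(10, 4))
plt.suptitle("Histograms of Study Hours & Exam Scores")
plt.show()

# KDEs on separate panels, because hours and scores are on very different scales
num.plot(kind="kde", subplots=True, layout=(1, -1), sharex=False, figsize=(10, 4),
         title="KDE of Study Hours & Exam Scores")
plt.show()

# Boxplots on separate panels for the same reason
num.plot(kind="box", subplots=True, layout=(1, -1), figsize=(8, 4), title="Boxplots")
plt.show()

# Correlation heatmap (Pearson r, fixed to the full -1..1 color range)
plt.figure(figsize=(6, 4))
sns.heatmap(num.corr(), annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation Heatmap")
plt.show()
```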
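The heatmap shows Pearson's r, which `DataFrame.corr()` computes by default. As a sketch of what that number means, the block below computes Pearson's r (linear association) and Spearman's rho (rank-based, monotonic association) with `scipy.stats`, and recomputes r as covariance divided by the product of standard deviations. The data are simulated under an assumed noisy linear relationship, not taken from the study.

```python
import numpy as np
from scipy import stats

# Hypothetical data: a noisy positive linear relationship (assumption, not the real dataset)
rng = np.random.default_rng(1)
hours = rng.uniform(0, 10, size=50)
scores = 50 + 4 * hours + rng.normal(0, 8, size=50)

# Pearson r: strength of the linear association (what df.corr() reports by default)
r, p_pearson = stats.pearsonr(hours, scores)
# Spearman rho: Pearson r applied to ranks, so less sensitive to outliers
rho, p_spearman = stats.spearmanr(hours, scores)

# Pearson r = cov(x, y) / (s_x * s_y), using sample (ddof=1) estimates throughout
r_manual = np.cov(hours, scores)[0, 1] / (np.std(hours, ddof=1) * np.std(scores, ddof=1))

print(f"Pearson r = {r:.3f} (p = {p_pearson:.2g}), manual r = {r_manual:.3f}")
print(f"Spearman rho = {rho:.3f} (p = {p_spearman:.2g})")
```

If the scatter of scores against hours looks curved or has outliers, comparing r with rho is a quick way to see whether the linear summary is misleading.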

## 🧪 Section 3: Bootstrap Confidence Interval
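We calculate a 95% confidence interval for the mean exam score with the percentile bootstrap: resample the scores with replacement many times, compute the mean of each resample, and take the 2.5th and 97.5th percentiles of those means.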

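```python
n_boot = 10_000
rng = np.random.RandomState(42)  # fixed seed so the interval is reproducible
scores = df["Exam Score"].dropna()

bootstrap_means = []
for _ in range(n_boot):
    # Resample with replacement, same size as the original sample
    sample = resample(scores, replace=True, n_samples=len(scores), random_state=rng)
    bootstrap_means.append(sample.mean())

# Percentile method: the middle 95% of the bootstrap means
conf_int = np.percentile(bootstrap_means, [2.5, 97.5])
print(f"95% bootstrap confidence interval for the mean exam score: {conf_int}")
```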
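The loop above calls `resample` once per replicate. A vectorized sketch of the same percentile bootstrap draws every resample index at once with NumPy. The function name `bootstrap_mean_ci` and its defaults are hypothetical, and the scores are illustrative values, not the study data.

```python
import numpy as np


def bootstrap_mean_ci(values, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean (vectorized sketch)."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # One row of resample indices per bootstrap replicate: shape (n_boot, n)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    alpha = 1 - level
    # Middle `level` fraction of the bootstrap distribution of the mean
    lower, upper = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper


# Illustrative scores only
scores = np.array([62, 70, 75, 81, 92, 68, 77, 85, 59, 88], dtype=float)
low, high = bootstrap_mean_ci(scores)
print(f"Sample mean {scores.mean():.1f}, 95% CI [{low:.1f}, {high:.1f}]")
```

In the notebook this would be called as `bootstrap_mean_ci(df["Exam Score"].dropna())`. Memory grows with `n_boot * n`, which is negligible for a class-sized dataset.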

## 🧠 Section 4: Hypothesis Test
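We test whether students who studied more than 5 hours scored significantly higher than students who studied 5 hours or fewer, using Welch's t-test (which does not assume the two groups have equal variances).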
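Stated formally, as a sketch with subscript 1 for the group that studied more than 5 hours and subscript 2 for the group that studied 5 hours or fewer, the directional question corresponds to a one-sided alternative. Welch's statistic and its Welch–Satterthwaite degrees of freedom use the sample means $\bar{x}_i$, sample variances $s_i^2$ and group sizes $n_i$:

$$
H_0:\ \mu_1 \le \mu_2 \qquad\qquad H_1:\ \mu_1 > \mu_2
$$

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
$$

The one-sided p-value is $P(T_\nu \ge t)$ for a t-distributed $T_\nu$ with $\nu$ degrees of freedom.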

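```python
# Define groups
more_study = df.loc[df["Study Hours"] > 5, "Exam Score"].dropna()
less_study = df.loc[df["Study Hours"] <= 5, "Exam Score"].dropna()

# One-sided Welch's t-test (equal_var=False); H1: the >5 h group has the higher mean score.
# A two-sided p-value would not say which group scored higher.
t_stat, p_val = stats.ttest_ind(more_study, less_study, equal_var=False, alternative="greater")

print("T-statistic:", t_stat)
print("P-value (one-sided):", p_val)

alpha = 0.05
if p_val < alpha:
    print("Reject the null hypothesis: students who studied >5 hours scored significantly higher.")
else:
    print("Fail to reject the null hypothesis: no significant evidence that the >5 hour group scored higher.")
```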
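To see what `stats.ttest_ind(..., equal_var=False, alternative="greater")` computes, the sketch below evaluates Welch's t statistic, the Welch–Satterthwaite degrees of freedom and the one-sided p-value by hand, then compares them with SciPy. The two groups are small illustrative samples, not the study data; `alternative=` requires SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

# Illustrative groups only (not from study_data.csv)
more = np.array([78, 85, 82, 90, 74, 88, 81], dtype=float)  # studied > 5 hours
less = np.array([65, 72, 70, 68, 77, 63], dtype=float)      # studied <= 5 hours

n1, n2 = len(more), len(less)
# Squared standard errors of each group mean (sample variances, ddof=1)
v1, v2 = more.var(ddof=1) / n1, less.var(ddof=1) / n2

t_manual = (more.mean() - less.mean()) / np.sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
# One-sided p-value P(T >= t) from the t distribution's survival function
p_manual = stats.t.sf(t_manual, df_welch)

t_scipy, p_scipy = stats.ttest_ind(more, less, equal_var=False, alternative="greater")
print(f"manual: t = {t_manual:.4f}, df = {df_welch:.2f}, p = {p_manual:.4g}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4g}")
```

The degrees of freedom are usually non-integer; that is expected for Welch's test and is why it is more reliable than Student's t-test when group sizes and variances differ.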

## ✅ Conclusion
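- Study hours and exam scores showed a generally positive correlation (Pearson r in the correlation heatmap).
- The bootstrap confidence interval quantified the uncertainty in the mean exam score without assuming the scores are normally distributed.
- The t-test supported the claim that students who studied more than 5 hours scored higher on average. Because the data are observational, this shows an association, not that more study hours cause higher scores.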

*End of analysis.*