# Practical: A & B

This notebook contains solutions for Practicals A and B, based on the provided `mystery_data.csv` dataset.

In [None]:
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ttest_1samp

## Part A: Data Preparation

In [None]:
# Load the data without headers
df = pd.read_csv("mystery_data.csv", header=None)

# Rename columns
df.columns = ["ID", "Gender", "Math", "Science", "StudyHours"]

# Display first few rows
df.head()

**Justification for Column Names:**
- `ID`: Unique identifier for each student
- `Gender`: Categorical values (e.g., M/F)
- `Math`, `Science`: Subject scores
- `StudyHours`: Weekly study hours (numeric)
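
The column names above are inferred from the values, so it can help to check them against the data. The following is a minimal sketch of such checks; the rows in `sample` are hypothetical stand-ins, not values from `mystery_data.csv`.

```python
import pandas as pd

# Hypothetical stand-in rows; the real values come from mystery_data.csv
sample = pd.DataFrame(
    [[1, "M", 78, 82, "12"], [2, "F", 85, 90, "9.5"], [3, "F", 64, 71, "unknown"]],
    columns=["ID", "Gender", "Math", "Science", "StudyHours"],
)

# An identifier column should contain no repeated values
id_is_unique = sample["ID"].is_unique

# A categorical column should have only a handful of distinct values
gender_levels = sorted(sample["Gender"].unique())

# Score columns should already be numeric
scores_numeric = all(
    pd.api.types.is_numeric_dtype(sample[col]) for col in ["Math", "Science"]
)

print(id_is_unique, gender_levels, scores_numeric)
```

If any of these checks fails on the real data, the column assignment is worth revisiting before continuing.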

In [None]:
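# Convert StudyHours to numeric; entries that cannot be parsed become NaN
df["StudyHours"] = pd.to_numeric(df["StudyHours"], errors="coerce")

# Remove rows where StudyHours is missing or invalid
# (.copy() avoids a SettingWithCopyWarning when adding columns later)
df = df.dropna(subset=["StudyHours"]).copy()

# Create TotalScore column
df["TotalScore"] = df["Math"] + df["Science"]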
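
To see how `errors="coerce"` separates valid from invalid entries, here is a small sketch on a hypothetical series (the values in `raw` are made up for illustration and are not taken from the dataset).

```python
import pandas as pd

# Hypothetical raw StudyHours values, including text and a missing entry
raw = pd.Series(["12", "9.5", "unknown", None, "7"])

# Unparseable or missing entries are converted to NaN instead of raising an error
hours = pd.to_numeric(raw, errors="coerce")

# Boolean mask marking the rows that would be kept
valid = hours.notnull()

print(hours.tolist())
print("rows kept:", int(valid.sum()))
```

Only the rows where `valid` is `True` survive the filter, and the remaining values can safely be treated as floats.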

In [None]:
# Print top 5 students based on total score
df.sort_values(by="TotalScore", ascending=False).head(5)

## Part B: Hypothesis Testing

In [None]:
# Filter students with StudyHours > 10
study_gt_10 = df[df["StudyHours"] > 10]

# Perform a one-sample, one-sided t-test on Math scores
# H0: mean Math score = 70; H1: mean Math score > 70
# (the `alternative` argument requires SciPy >= 1.6)
t_stat, p_value = ttest_1samp(study_gt_10["Math"], 70, alternative="greater")

print(f"T-statistic: {t_stat:.3f}, P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: students who study more than 10 hours have a mean Math score significantly above 70.")
else:
    print("Fail to reject H0: no significant evidence that students who study more than 10 hours have a mean Math score above 70.")
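
The t-test above compares the sample mean with the reference value 70 using the statistic t = (mean - mu0) / (s / sqrt(n)). As a rough sketch, the same statistic can be computed by hand and checked against SciPy; the `scores` list is hypothetical and only illustrates the arithmetic.

```python
import math
import statistics

from scipy.stats import ttest_1samp

# Hypothetical Math scores; not taken from mystery_data.csv
scores = [72, 75, 68, 80, 77, 74]
mu0 = 70  # reference mean under H0

n = len(scores)
mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)

# One-sample t statistic computed by hand
t_manual = (mean - mu0) / (sd / math.sqrt(n))

# SciPy's default test is two-sided
t_scipy, p_two_sided = ttest_1samp(scores, mu0)

# For H1: mean > mu0, the one-sided p-value is half the two-sided one when t > 0
p_one_sided = p_two_sided / 2 if t_scipy > 0 else 1 - p_two_sided / 2

print(f"manual t = {t_manual:.4f}, scipy t = {t_scipy:.4f}")
print(f"two-sided p = {p_two_sided:.4f}, one-sided p = {p_one_sided:.4f}")
```

The distinction matters for the conclusion: a two-sided test only supports "the mean differs from 70", while a claim that scores are higher needs the one-sided p-value.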

## Part B: Visualization - Gender vs Total Score

In [None]:
# Boxplot for Total Score by Gender
sns.boxplot(x="Gender", y="TotalScore", data=df)
plt.title("Total Score Distribution by Gender")
plt.xlabel("Gender")
plt.ylabel("Total Score")
plt.show()

**Insights:**
1. The median total score appears higher for one gender, which suggests a possible performance difference; the boxplot alone does not establish that the difference is statistically significant.
2. The box and whiskers are wider for one gender, which suggests greater variability in that group's total scores.
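
Both observations can be backed with numbers by computing the median and interquartile range per group. This is a minimal sketch on a hypothetical frame; the `sample` values are invented for illustration, and on the real data `df` would take its place.

```python
import pandas as pd

# Hypothetical data with the same two columns used in the boxplot
sample = pd.DataFrame(
    {
        "Gender": ["M", "F", "M", "F", "M", "F"],
        "TotalScore": [150, 172, 139, 160, 165, 181],
    }
)

grouped = sample.groupby("Gender")["TotalScore"]

# Median is the line inside each box; IQR is the height of the box
summary = pd.DataFrame(
    {
        "median": grouped.median(),
        "iqr": grouped.quantile(0.75) - grouped.quantile(0.25),
    }
)

print(summary)
```

A larger `median` corresponds to the higher centre line in the plot, and a larger `iqr` corresponds to the taller box.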