# Exploratory Data Analysis (EDA)
Student Performance Prediction Project  
This notebook explores the dataset to understand patterns between:

- Attendance  
- Marks  
- Engagement Score  
- Final Result (Pass/Fail)

We will check:
- Missing values
- Statistical summary
- Distributions
- Correlations
- Basic insights


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Display settings
pd.set_option("display.max_columns", None)

In [None]:
df = pd.read_csv("../data/student_performance.csv")
df.head()


In [None]:
df.info()


In [None]:
# Count missing values per column
df.isnull().sum()


In [None]:
# Statistical summary of the numeric columns
df.describe()

In [None]:
# Distribution of the numerical columns

plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
sns.histplot(df['attendance'], kde=True)
plt.title("Distribution of Attendance")

plt.subplot(1, 3, 2)
sns.histplot(df['marks'], kde=True)
plt.title("Distribution of Marks")

plt.subplot(1, 3, 3)
sns.histplot(df['engagement_score'], kde=True)
plt.title("Distribution of Engagement Score")

plt.tight_layout()
plt.show()


In [None]:
# Count plot of final results

sns.countplot(data=df, x="final_result")
plt.title("Pass vs Fail Distribution")
plt.show()


In [None]:
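# Correlation heatmap
# numeric_only=True (pandas >= 1.5) skips text columns; a "Pass"/"Fail"
# final_result column would otherwise raise an error in pandas >= 2.0.
# To include final_result in the heatmap, encode it as 0/1 first.

plt.figure(figsize=(8, 5))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="Blues", linewidths=.5)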
plt.title("Feature Correlation Heatmap")
plt.show()


In [None]:
# Attendance vs marks scatter plot

plt.figure(figsize=(6,5))
sns.scatterplot(data=df, x="attendance", y="marks", hue="final_result")
plt.title("Attendance vs Marks")
plt.show()


In [None]:
# Engagement score vs marks scatter plot

plt.figure(figsize=(6,5))
sns.scatterplot(data=df, x="engagement_score", y="marks", hue="final_result")
plt.title("Engagement Score vs Marks")
plt.show()


# Key Insights

### 1. Attendance
- Students with **higher attendance** tend to score higher.
- Most students who failed have attendance below 65%.
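
The attendance threshold can be checked numerically rather than read off the scatter plot. The sketch below uses a small made-up DataFrame in place of `df`, and assumes `final_result` holds the strings `"Pass"` and `"Fail"`; both are assumptions, so adjust them to the real data.

```python
import pandas as pd

# Hypothetical stand-in for df (the notebook loads student_performance.csv).
df = pd.DataFrame({
    "attendance": [92, 58, 74, 61, 88, 49, 70, 68],
    "final_result": ["Pass", "Fail", "Pass", "Fail", "Pass", "Fail", "Pass", "Fail"],
})

# Assumption: failing students are labelled with the string "Fail".
failed = df[df["final_result"] == "Fail"]

# The mean of a boolean Series is the fraction of True values.
share_below_65 = (failed["attendance"] < 65).mean()
print(f"Failed students with attendance < 65%: {share_below_65:.0%}")
```

Running the same two lines on the real `df` gives the share that backs the "most" in the bullet above.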

### 2. Marks
- Marks show a strong separation between Pass and Fail categories.
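
One way to quantify this separation is to summarise marks per class and see whether the two ranges overlap. This is a minimal sketch on made-up values; the `"Pass"`/`"Fail"` labels are assumed.

```python
import pandas as pd

# Hypothetical stand-in for df; replace with the loaded dataset.
df = pd.DataFrame({
    "marks": [78, 35, 64, 41, 85, 29, 55, 47],
    "final_result": ["Pass", "Fail", "Pass", "Fail", "Pass", "Fail", "Pass", "Fail"],
})

# Min, median and max of marks within each result class.
marks_by_result = df.groupby("final_result")["marks"].agg(["min", "median", "max"])
print(marks_by_result)

# The classes overlap if the best failing mark reaches the worst passing mark.
overlap = bool(
    marks_by_result.loc["Fail", "max"] >= marks_by_result.loc["Pass", "min"]
)
print("Ranges overlap:", overlap)
```

If the ranges do not overlap on the real data, marks alone separates the classes, which is worth knowing before training a model.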

### 3. Engagement Score
- Students with low engagement tend to have lower marks and lower pass rates.
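
To put numbers on this, we can split students into engagement bands and compare mean marks and pass rate per band. The sketch below is illustrative only: the data is made up, the two-band split via `pd.qcut` is an arbitrary choice, and the `"Pass"` label is assumed.

```python
import pandas as pd

# Hypothetical stand-in for df; the real engagement scale may differ.
df = pd.DataFrame({
    "engagement_score": [2, 3, 4, 5, 6, 7, 8, 9],
    "marks": [30, 42, 38, 55, 61, 58, 77, 83],
    "final_result": ["Fail", "Fail", "Fail", "Pass", "Pass", "Fail", "Pass", "Pass"],
})

# Work on a copy with helper columns so the original df is left unchanged.
bands = df.assign(
    passed=(df["final_result"] == "Pass").astype(int),
    engagement_band=pd.qcut(df["engagement_score"], q=2, labels=["low", "high"]),
)

# Mean marks and pass rate within each engagement band.
summary = bands.groupby("engagement_band", observed=True).agg(
    mean_marks=("marks", "mean"),
    pass_rate=("passed", "mean"),
)
print(summary)
```

Using more quantiles (for example `q=4`) gives a finer view if the dataset is large enough.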

### 4. Correlation
- Marks have the strongest correlation with Final Result.
- Attendance and Engagement also contribute meaningfully.
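
Ranking features by their correlation with the outcome requires `final_result` to be numeric. The sketch below assumes the column holds `"Pass"`/`"Fail"` strings and encodes them as 1/0; the Pearson correlation with a binary column is then the point-biserial correlation. The data is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df; replace with the loaded dataset.
df = pd.DataFrame({
    "attendance": [92, 58, 74, 61, 88, 49, 70, 68],
    "marks": [78, 35, 64, 41, 85, 29, 55, 47],
    "engagement_score": [8, 3, 6, 4, 9, 2, 7, 5],
    "final_result": ["Pass", "Fail", "Pass", "Fail", "Pass", "Fail", "Pass", "Fail"],
})

# Assumption: labels are exactly "Pass" and "Fail".
encoded = df.assign(final_result=df["final_result"].map({"Fail": 0, "Pass": 1}))

# Correlation of each feature with the encoded outcome, strongest first.
corr_with_result = (
    encoded.corr()["final_result"]
    .drop("final_result")
    .sort_values(ascending=False)
)
print(corr_with_result)
```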

### Overall
The selected features (attendance, marks, engagement_score) appear to be useful predictors for ML models such as Logistic Regression.
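
As a rough illustration of that next step, here is a minimal logistic regression baseline using scikit-learn. It is a sketch, not this project's model: the data is randomly generated, the rule that produces `final_result` is invented, and the split ratio and `max_iter` are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df; value ranges and the pass rule are made up.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "attendance": rng.uniform(40, 100, n),
    "marks": rng.uniform(20, 100, n),
    "engagement_score": rng.uniform(1, 10, n),
})
df["final_result"] = np.where(
    df["marks"] + 0.3 * df["attendance"] > 75, "Pass", "Fail"
)

X = df[["attendance", "marks", "engagement_score"]]
y = (df["final_result"] == "Pass").astype(int)  # 1 = Pass, 0 = Fail

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out set
print(f"Test accuracy: {accuracy:.2f}")
```

With the real dataset, replace the synthetic `df` with the one loaded from the CSV and keep the rest as a starting point.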
