# 🧠 CampusPulse – Task 1: Level 1 - Variable Identification Protocol

This notebook performs Exploratory Data Analysis (EDA) to reverse-engineer the meaning of three anonymized features: `Feature_1`, `Feature_2`, and `Feature_3`.

## 1. Load Dataset

In [None]:
import pandas as pd

# Path to the dataset; the file must be in the same folder as this notebook
file_path = 'Dataset.csv'

# Load the data and preview the first rows
df = pd.read_csv(file_path)
df.head()
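
Before plotting, it can help to check the type, missing values, and value range of each anonymized feature. The following is a minimal sketch, not part of the original analysis: `summarize_features` is a hypothetical helper, and the toy frame stands in for `Dataset.csv` with made-up values.

```python
import pandas as pd


def summarize_features(df: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    """Return dtype, missing count, unique count, and range for each feature."""
    rows = {}
    for name in features:
        col = df[name]
        rows[name] = {
            "dtype": str(col.dtype),
            "n_missing": int(col.isna().sum()),
            "n_unique": int(col.nunique()),  # NaN is not counted as a value
            "min": col.min(),
            "max": col.max(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy data standing in for Dataset.csv (values are invented for illustration)
toy = pd.DataFrame({
    "Feature_1": [1, 2, 2, 4, None],
    "Feature_2": [3, 3, 1, 5, 2],
})
summary = summarize_features(toy, ["Feature_1", "Feature_2"])
print(summary)
```

On the real data, the call would be `summarize_features(df, ['Feature_1', 'Feature_2', 'Feature_3'])`. A small number of unique integer values would hint at a rating scale rather than a continuous measurement.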

## 2. Distribution of Anonymized Features

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

features = ['Feature_1', 'Feature_2', 'Feature_3']

sns.set(style="whitegrid")
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

for i, feature in enumerate(features):
    sns.histplot(df[feature], kde=True, ax=axes[i])
    axes[i].set_title(f'Distribution of {feature}')
    axes[i].set_xlabel(feature)

plt.tight_layout()
plt.show()
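
A histogram with a KDE overlay assumes a roughly continuous variable. If a feature only takes a few whole-number values, a discrete view is usually easier to read. The sketch below shows one way to test for that; `looks_ordinal` and its `max_levels` threshold are hypothetical choices, and the series are invented examples.

```python
import pandas as pd


def looks_ordinal(series: pd.Series, max_levels: int = 10) -> bool:
    """Heuristic: True if a numeric series holds only a few whole-number values."""
    values = series.dropna()
    if values.empty:
        return False
    is_whole = (values % 1 == 0).all()  # every value is an integer in disguise
    return bool(is_whole and values.nunique() <= max_levels)


# Invented examples: a 1-5 style scale and a continuous measurement
toy_scale = pd.Series([1, 2, 2, 3, 5, 4, None])
toy_hours = pd.Series([0.5, 1.25, 3.7, 2.1])

print(looks_ordinal(toy_scale))
print(looks_ordinal(toy_hours))
```

For a feature that passes this check, `sns.histplot(df[feature], discrete=True, ax=...)` without the KDE would be one reasonable alternative to the plot above.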

## 3. Correlation Analysis

In [None]:
# Compute pairwise correlations across all numeric columns
corr_matrix = df.corr(numeric_only=True)

# Keep only the columns for the three anonymized features, sorted by Feature_1
selected_corrs = corr_matrix[['Feature_1', 'Feature_2', 'Feature_3']].sort_values(by='Feature_1', ascending=False)

plt.figure(figsize=(10, 8))
sns.heatmap(selected_corrs, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation with Anonymized Features')
plt.show()
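
The heatmap can be hard to scan when there are many columns, so a ranked list of the strongest correlates per feature may be useful. The sketch below is an assumption-laden alternative, not the notebook's method: it swaps in Spearman rank correlation (often a better fit for ordinal survey scales than Pearson), and `top_correlates` is a hypothetical helper run on made-up data.

```python
import pandas as pd


def top_correlates(df, feature, exclude=(), k=5, method="spearman"):
    """Return the k columns most strongly correlated with `feature` (by |rho|)."""
    corr = df.corr(method=method, numeric_only=True)[feature]
    # Drop the feature itself and any other columns we do not want ranked
    corr = corr.drop(labels=[feature, *exclude], errors="ignore")
    order = corr.abs().sort_values(ascending=False).index
    return corr.loc[order].head(k)


# Invented data for illustration only
toy = pd.DataFrame({
    "Feature_1": [1, 2, 3, 4, 5],
    "G3": [18, 15, 12, 9, 5],
    "goout": [2, 1, 4, 3, 5],
    "Feature_2": [5, 3, 4, 1, 2],
})
top = top_correlates(toy, "Feature_1", exclude=["Feature_2"], k=2)
print(top)
```

Excluding the other anonymized features keeps the ranking focused on columns whose meaning is already known.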

## 4. Visual Exploration

In [None]:
fig, axes = plt.subplots(3, 2, figsize=(14, 14))

# Row 0: Feature_1 against final grade and past failures
sns.scatterplot(x='Feature_1', y='G3', data=df, ax=axes[0, 0])
axes[0, 0].set_title('Feature_1 vs G3 (Grades)')

sns.boxplot(x='Feature_1', y='failures', data=df, ax=axes[0, 1])
axes[0, 1].set_title('Feature_1 vs Failures')

# Row 1: Feature_2 against social activity and weekday alcohol use
sns.violinplot(x='Feature_2', y='goout', data=df, ax=axes[1, 0])
axes[1, 0].set_title('Feature_2 vs Goout (Social Activity)')

sns.violinplot(x='Feature_2', y='Dalc', data=df, ax=axes[1, 1])
axes[1, 1].set_title('Feature_2 vs Weekday Alcohol Use')

# Row 2: Feature_3 against health and final grade
sns.boxplot(x='Feature_3', y='health', data=df, ax=axes[2, 0])
axes[2, 0].set_title('Feature_3 vs Health')

sns.scatterplot(x='Feature_3', y='G3', data=df, ax=axes[2, 1])
axes[2, 1].set_title('Feature_3 vs G3 (Grades)')

plt.tight_layout()
plt.show()
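
The plots above can be complemented with a plain numeric table of the target's mean at each level of a feature, which makes monotonic trends easier to confirm. This is a minimal sketch under the assumption that the feature takes a small number of levels; `level_summary` is a hypothetical helper and the data is invented.

```python
import pandas as pd


def level_summary(df, feature, target):
    """Mean and row count of `target` for each distinct value of `feature`."""
    return (
        df.groupby(feature)[target]
          .agg(["mean", "count"])
          .sort_index()
    )


# Invented data for illustration only
toy = pd.DataFrame({
    "Feature_1": [1, 1, 2, 2, 3],
    "G3": [14, 16, 11, 13, 8],
})
table = level_summary(toy, "Feature_1", "G3")
print(table)
```

The `count` column matters here: a mean computed from only a handful of rows should not carry much weight in the interpretation.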

## 5. Final Feature Interpretations


| Feature     | Inferred Meaning               | Justification                                                                 |
|-------------|--------------------------------|-------------------------------------------------------------------------------|
| `Feature_1` | **Screen Time / Distraction**  | Negatively correlated with grades; aligns with common media use patterns.     |
| `Feature_2` | **Study Time / Academic Focus**| Strong positive correlation with grades; inversely related to social activity.|
| `Feature_3` | **Socializing / Hanging Out**  | Positively tied to `goout` and alcohol use; negatively related to grades.     |
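
The directional claims in the table can be checked against the data rather than read off the plots. The sketch below is one hedged way to do that: `direction` and `check_claims` are hypothetical helpers, the `claims` mapping paraphrases three of the justifications above, Spearman correlation is an assumed choice, and the toy frame contains made-up values (so the results say nothing about the real dataset).

```python
import pandas as pd


def direction(df, feature, target, method="spearman"):
    """Return 'positive', 'negative', or 'none' for the sign of the correlation."""
    rho = df[[feature, target]].corr(method=method).loc[feature, target]
    if pd.isna(rho) or rho == 0:
        return "none"
    return "positive" if rho > 0 else "negative"


def check_claims(df, claims):
    """Map each (feature, target) pair to True if the observed sign matches."""
    return {
        pair: direction(df, *pair) == expected
        for pair, expected in claims.items()
    }


# Expected directions, paraphrased from the interpretation table
claims = {
    ("Feature_1", "G3"): "negative",
    ("Feature_2", "G3"): "positive",
    ("Feature_3", "goout"): "positive",
}

# Invented data for illustration only
toy = pd.DataFrame({
    "Feature_1": [1, 2, 3, 4],
    "Feature_2": [4, 3, 2, 1],
    "Feature_3": [4, 3, 2, 1],
    "G3": [15, 12, 10, 6],
    "goout": [1, 2, 3, 5],
})
results = check_claims(toy, claims)
print(results)
```

Running `check_claims(df, claims)` on the real data would flag any justification whose sign does not hold, which is a cue to revisit that row of the table.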
