# Notebook 02: Schema Mapping

Purpose:
- Map raw dataset columns to NEEL's unified schema
- Identify which columns populate ActivityLog, Outcome, and Context
- Ensure consistent representation across datasets

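In [1]:
import pandas as pd

# Load the three raw datasets; the relative paths resolve against the working directory.
df_study = pd.read_csv("../data/raw/student_study_habits.csv")
df_habits = pd.read_csv("../data/raw/enhanced_student_habits_performance_dataset.csv")
# This file name contains spaces, so keep the quoted path exactly as written.
df_time = pd.read_csv("../data/raw/Time Management and Productivity Insights.csv")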

## Dataset: Student Study Habits

### ActivityLog Mapping

| Raw Column | NEEL Field | Category |
|----------|------------|----------|
| study_hours_per_week | duration_minutes | Academic |
| sleep_hours_per_day | duration_minutes | Health |
| extracurricular_Yes | activity_name | Leisure |
| part_time_job_Yes | activity_name | Work |
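
The two duration columns above use different units and periods (hours per week and hours per day), so they need normalizing before they can share the duration_minutes field. The following is a minimal sketch of one way to do that. It assumes duration_minutes is stored per day, and the names `sample`, `DURATION_COLUMNS`, `to_activity_log`, and `record_id` are hypothetical and not part of the NEEL schema.

```python
import pandas as pd

# Hypothetical two-row sample standing in for df_study; the values are made up.
sample = pd.DataFrame({
    "study_hours_per_week": [14.0, 21.0],
    "sleep_hours_per_day": [7.0, 8.0],
})

# Raw column -> (category from the table above, days covered by the raw value).
DURATION_COLUMNS = {
    "study_hours_per_week": ("Academic", 7),
    "sleep_hours_per_day": ("Health", 1),
}

def to_activity_log(df: pd.DataFrame) -> pd.DataFrame:
    """Melt wide duration columns into one row per (record, activity)."""
    long_df = (
        df[list(DURATION_COLUMNS)]
        .rename_axis("record_id")
        .reset_index()
        .melt(id_vars="record_id", var_name="raw_column", value_name="hours")
    )
    long_df["category"] = long_df["raw_column"].map(lambda c: DURATION_COLUMNS[c][0])
    days = long_df["raw_column"].map(lambda c: DURATION_COLUMNS[c][1])
    # Assumption: minutes are per day, so weekly hours are divided by 7.
    long_df["duration_minutes"] = long_df["hours"] * 60 / days
    return long_df[["record_id", "raw_column", "category", "duration_minutes"]]

activity_log = to_activity_log(sample)
print(activity_log)
```

If the schema stores weekly totals instead, only the divisor in `DURATION_COLUMNS` would need to change.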

### Outcome Mapping

| Raw Column | NEEL Field |
|----------|------------|
| final_grade | exam_score |
| attendance_percentage | productivity_rating |

### Context (Not Directly Mapped)

- parental_education_*
- internet_access_Yes
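
The wildcard in parental_education_* suggests a family of one-hot columns. A small sketch of selecting them by pattern is shown here. The suffixes `Bachelor` and `Master` are invented for illustration, and `CONTEXT_PATTERNS` and `context_columns` are hypothetical names.

```python
from fnmatch import fnmatchcase

# Patterns copied from the context list above.
CONTEXT_PATTERNS = ["parental_education_*", "internet_access_Yes"]

def context_columns(columns):
    """Return the columns that match any context pattern, in their original order."""
    return [
        col for col in columns
        if any(fnmatchcase(col, pattern) for pattern in CONTEXT_PATTERNS)
    ]

# Hypothetical column names; the real one-hot suffixes may differ.
columns = [
    "final_grade",
    "parental_education_Bachelor",
    "parental_education_Master",
    "internet_access_Yes",
    "study_hours_per_week",
]
selected = context_columns(columns)
print(selected)
```

With the real data, the same function could be called as `context_columns(df_study.columns)`.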

## Dataset: Enhanced Student Habits & Performance

### ActivityLog Mapping

| Raw Column | NEEL Field | Category |
|----------|------------|----------|
| study_hours_per_day | duration_minutes | Academic |
| social_media_hours | duration_minutes | Leisure |
| netflix_hours | duration_minutes | Leisure |
| exercise_frequency | duration_minutes | Health |
| sleep_hours | duration_minutes | Health |
| screen_time | duration_minutes | Leisure |
| part_time_job | activity_name | Work |
| extracurricular_participation | activity_name | Leisure |
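
The last two rows map flag columns to activity_name rather than to a duration. One possible reading is that a record yields an activity row only when the flag is set. The sketch below assumes the flags are encoded as "Yes"/"No" (or 1/0), which the notebook does not confirm. The activity labels and the names `FLAG_COLUMNS` and `flags_to_activities` are hypothetical.

```python
import pandas as pd

# Hypothetical sample; the real encoding of these flags is an assumption.
sample = pd.DataFrame({
    "part_time_job": ["Yes", "No", "Yes"],
    "extracurricular_participation": ["No", "No", "Yes"],
})

# Raw flag column -> (illustrative activity_name, category from the table above).
FLAG_COLUMNS = {
    "part_time_job": ("Part-time job", "Work"),
    "extracurricular_participation": ("Extracurricular", "Leisure"),
}

TRUTHY = ["yes", "1", "true"]

def flags_to_activities(df: pd.DataFrame) -> pd.DataFrame:
    """Emit one activity row for every record whose flag is set."""
    rows = []
    for column, (activity_name, category) in FLAG_COLUMNS.items():
        is_set = df[column].astype(str).str.lower().isin(TRUTHY)
        for record_id in df.index[is_set]:
            rows.append({
                "record_id": record_id,
                "activity_name": activity_name,
                "category": category,
            })
    return pd.DataFrame(rows, columns=["record_id", "activity_name", "category"])

activities = flags_to_activities(sample)
print(activities)
```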

### Outcome Mapping

| Raw Column | NEEL Field |
|----------|------------|
| exam_score | exam_score |
| previous_gpa | historical_performance |
| time_management_score | productivity_rating |
| dropout_risk | risk_indicator |
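
The outcome table above amounts to a column rename. A minimal sketch, using a made-up one-row sample and the hypothetical names `OUTCOME_MAP` and `extract_outcomes`:

```python
import pandas as pd

# Raw column -> NEEL field, copied from the outcome table above.
OUTCOME_MAP = {
    "exam_score": "exam_score",
    "previous_gpa": "historical_performance",
    "time_management_score": "productivity_rating",
    "dropout_risk": "risk_indicator",
}

def extract_outcomes(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the outcome columns and rename them to NEEL field names."""
    missing = [col for col in OUTCOME_MAP if col not in df.columns]
    if missing:
        raise KeyError(f"missing outcome columns: {missing}")
    return df[list(OUTCOME_MAP)].rename(columns=OUTCOME_MAP)

# Hypothetical sample row; stress_level is a context column and is dropped.
sample = pd.DataFrame({
    "exam_score": [82.0],
    "previous_gpa": [3.4],
    "time_management_score": [6.5],
    "dropout_risk": ["No"],
    "stress_level": [4.0],
})
outcomes = extract_outcomes(sample)
print(outcomes)
```

Failing loudly on a missing column is a design choice here: it surfaces renamed or absent raw columns at mapping time rather than later during training.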

### Context (Used Only for Reasoning)

- stress_level
- motivation_level
- mental_health_rating
- parental_support_level


## Dataset: Time Management & Productivity

### ActivityLog Mapping

| Raw Column | NEEL Field | Category |
|----------|------------|----------|
| Daily Work Hours | duration_minutes | Work |
| Daily Leisure Hours | duration_minutes | Leisure |
| Daily Exercise Minutes | duration_minutes | Health |
| Daily Sleep Hours | duration_minutes | Health |
| Commute Time (hours) | duration_minutes | Personal |
| Screen Time (hours) | duration_minutes | Leisure |
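
This dataset mixes hour and minute columns, so a single blanket conversion would inflate Daily Exercise Minutes. One way to handle it is a per-column factor, sketched below on the assumption that each column's unit matches its name. `MINUTES_PER_UNIT` and `to_minutes` are hypothetical names and the sample values are made up.

```python
import pandas as pd

# Multiplier that turns each raw column into minutes (assumes units match the names).
MINUTES_PER_UNIT = {
    "Daily Work Hours": 60,
    "Daily Leisure Hours": 60,
    "Daily Exercise Minutes": 1,
    "Daily Sleep Hours": 60,
    "Commute Time (hours)": 60,
    "Screen Time (hours)": 60,
}

def to_minutes(df: pd.DataFrame) -> pd.DataFrame:
    """Return the activity columns converted to minutes; other columns are dropped."""
    factors = pd.Series(MINUTES_PER_UNIT)
    # mul aligns the factors with the column labels.
    return df[factors.index].mul(factors, axis="columns")

# Hypothetical sample row standing in for df_time.
sample = pd.DataFrame({
    "Age": [29],
    "Daily Work Hours": [8.0],
    "Daily Leisure Hours": [2.5],
    "Daily Exercise Minutes": [30],
    "Daily Sleep Hours": [7.0],
    "Commute Time (hours)": [1.0],
    "Screen Time (hours)": [3.0],
})
minutes = to_minutes(sample)
print(minutes)
```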

### Outcome Mapping

| Raw Column | NEEL Field |
|----------|------------|
| Productivity Score | productivity_rating |

### Context

- Age
- User ID


## Cross-Dataset Consistency Notes

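- All three datasets map into the ActivityLog and Outcome entities. Columns recorded in hours, or per week, must be converted to minutes before they populate duration_minutes; exercise_frequency is a count rather than a duration, so it needs an explicit conversion rule.
- Activity categories stay within one shared set across datasets: Academic, Health, Leisure, Work, and Personal
- Contextual variables are excluded from direct ML training
- No dataset introduces a new schema entity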
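
The category claim can be checked mechanically. The sketch below lists the categories each mapping table uses and reports any that fall outside an allowed set. `ALLOWED_CATEGORIES` is an assumption (the union of categories appearing in this notebook), and the function name is hypothetical.

```python
# Assumed allowed set: every category that appears in the mapping tables.
ALLOWED_CATEGORIES = {"Academic", "Health", "Leisure", "Work", "Personal"}

# Categories used by each dataset's ActivityLog mapping table.
CATEGORIES_BY_DATASET = {
    "Student Study Habits": {"Academic", "Health", "Leisure", "Work"},
    "Enhanced Student Habits & Performance": {"Academic", "Health", "Leisure", "Work"},
    "Time Management & Productivity": {"Work", "Leisure", "Health", "Personal"},
}

def unexpected_categories(by_dataset, allowed):
    """Map each dataset to the sorted categories it uses that are not allowed."""
    return {
        name: sorted(categories - allowed)
        for name, categories in by_dataset.items()
        if categories - allowed
    }

problems = unexpected_categories(CATEGORIES_BY_DATASET, ALLOWED_CATEGORIES)
shared = set.intersection(*CATEGORIES_BY_DATASET.values())
print(problems)
print(sorted(shared))
```

An empty `problems` dictionary means no mapping table uses a category outside the allowed set. `shared` holds the categories common to all three datasets.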