## Weekly Behavioral Feature Engineering

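This notebook transforms event-level LMS data (lesson activity and quiz attempts)
into weekly behavioral snapshots, one row per learner per week. Aggregating to
weeks smooths out noise from individual events, keeps each feature easy to
interpret, and produces a compact feature table that stays usable for modeling
even with small learner cohorts.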

In [None]:
# Import required libraries for data analysis
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Load CSV files into Pandas DataFrames

courses = pd.read_csv('/content/drive/MyDrive/Lumera data/lms_courses.csv')
activity = pd.read_csv('/content/drive/MyDrive/Lumera data/lms_lesson_activity.csv')
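# NOTE: ".crdownload" is the suffix Chrome gives to an unfinished download, and this
# path is on the Colab session disk rather than Drive; confirm the file is complete.
users = pd.read_csv('/content/new_learners.csv.crdownload')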
modules = pd.read_csv('/content/drive/MyDrive/Lumera data/lms_modules.csv')
quiz = pd.read_csv('/content/drive/MyDrive/Lumera data/lms_quiz_attempts.csv')

In [None]:
# Keep only rows where role == 'learner' (case-insensitive)
learners_only = users[users['role'].str.lower() == 'learner'].copy()

print(f"Remaining rows: {len(learners_only)}")

Remaining rows: 39
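`learners_only` is not referenced by any later cell, so the weekly features below still include every `user_id` that appears in the activity and quiz logs. A minimal sketch of applying the filter, assuming `users` has a `user_id` column that matches the one in the activity log. The toy frames and the extra whitespace stripping are illustrative assumptions, not part of the original pipeline:

```python
import pandas as pd

# Hypothetical toy data standing in for the users and activity tables
users = pd.DataFrame({
    "user_id": ["U001", "U002", "T001"],
    "role": ["Learner", "learner ", "instructor"],
})
activity = pd.DataFrame({
    "user_id": ["U001", "T001", "U002", "U001"],
    "activity_id": [1, 2, 3, 4],
})

# Normalize role values (case and stray whitespace) before comparing
is_learner = users["role"].str.strip().str.lower() == "learner"
learner_ids = set(users.loc[is_learner, "user_id"])

# Keep only activity rows that belong to learners
activity_learners = activity[activity["user_id"].isin(learner_ids)].copy()
print(activity_learners["user_id"].tolist())  # ['U001', 'U002', 'U001']
```

The same `isin` filter would apply to the quiz table before the weekly aggregation.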


In [None]:
# Create a Monday-to-Sunday week identifier from activity and quiz timestamps
activity["event_time"] = pd.to_datetime(activity["event_time"])
activity["week"] = activity["event_time"].dt.to_period("W").astype(str)
quiz["attempt_date"] = pd.to_datetime(quiz["attempt_date"])
quiz["week"] = quiz["attempt_date"].dt.to_period("W").astype(str)
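To see what the `week` labels look like, here is the same conversion on a few hypothetical timestamps around a week boundary. In pandas, the `"W"` frequency is an alias for `"W-SUN"` (weeks ending on Sunday), which is why the labels in the output further down run from a Monday to a Sunday:

```python
import pandas as pd

# Hypothetical timestamps around a week boundary (2026-02-08 is a Sunday)
ts = pd.to_datetime(pd.Series([
    "2026-02-02 09:00",  # Monday
    "2026-02-08 23:59",  # Sunday, same week
    "2026-02-09 00:00",  # Monday, next week
]))

# "W" == "W-SUN": each period runs Monday through Sunday
weeks = ts.dt.to_period("W").astype(str)
print(weeks.tolist())
# ['2026-02-02/2026-02-08', '2026-02-02/2026-02-08', '2026-02-09/2026-02-15']
```

Because the labels start with ISO dates, sorting them as plain strings also sorts them chronologically.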

In [None]:
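# Aggregate weekly lesson activity per learner
weekly_activity = (
    activity
    .groupby(["user_id", "week"])
    .agg(
        sessions=("activity_id", "count"),   # number of activity events in the week
        avg_progress=("progress ", "mean"),  # source column name has a trailing space
        modules_completed=("status", lambda x: (x == "completed").sum())  # rows with status "completed"
    )
    .reset_index()
)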

Weekly engagement metrics capture how often learners interact with the platform
(`sessions`, the number of activity events), how far they progress (`avg_progress`),
and how many completion events they record (`modules_completed`).
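To make the three metrics concrete, here is the same named aggregation run on a handful of hypothetical events. The status value `"in_progress"` is an assumption for illustration; only `"completed"` appears in the original code:

```python
import pandas as pd

# Hypothetical event-level rows; "progress " keeps the trailing space
# that appears in this notebook's activity columns
activity = pd.DataFrame({
    "activity_id": [1, 2, 3, 4],
    "user_id": ["U001", "U001", "U001", "U002"],
    "week": ["2026-02-02/2026-02-08", "2026-02-02/2026-02-08",
             "2026-02-09/2026-02-15", "2026-02-02/2026-02-08"],
    "status": ["in_progress", "completed", "completed", "in_progress"],
    "progress ": [40.0, 60.0, 100.0, 25.0],
})

weekly_activity = (
    activity
    .groupby(["user_id", "week"])
    .agg(
        sessions=("activity_id", "count"),   # events, not distinct sessions
        avg_progress=("progress ", "mean"),  # mean progress across events
        modules_completed=("status", lambda s: (s == "completed").sum()),
    )
    .reset_index()
)
print(weekly_activity)
```

U001's first week has two events (progress 40 and 60), so it gets `sessions=2`, `avg_progress=50.0` and `modules_completed=1`.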

In [None]:
# Aggregate weekly quiz performance per learner
weekly_quiz = (
    quiz
    .groupby(["user_id", "week"])
    .agg(
        avg_quiz_score=("score", "mean"),
        quiz_attempts=("attempt_id", "count")
    )
    .reset_index()
)

Weekly quiz features (mean score and number of attempts) reflect how well learners
perform on assessments, complementing the activity metrics, which only measure engagement.
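A small hypothetical example of the quiz aggregation. When a learner attempts quizzes several times in one week, `avg_quiz_score` weights every attempt equally and `quiz_attempts` counts them:

```python
import pandas as pd

# Hypothetical quiz attempts: U001 attempts twice in the same week
quiz = pd.DataFrame({
    "attempt_id": [10, 11, 12],
    "user_id": ["U001", "U001", "U002"],
    "week": ["2026-02-02/2026-02-08"] * 3,
    "score": [60.0, 80.0, 45.0],
})

weekly_quiz = (
    quiz
    .groupby(["user_id", "week"])
    .agg(
        avg_quiz_score=("score", "mean"),      # every attempt weighted equally
        quiz_attempts=("attempt_id", "count"),
    )
    .reset_index()
)
print(weekly_quiz)
```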

In [None]:
# Check actual columns in lesson activity data
print(activity.columns.tolist())

['activity_id', 'user_id', 'module_id', 'status', 'progress ', 'event_time', 'week']
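The printed list confirms that the progress column is named `'progress '` with a trailing space, which is why the weekly aggregation refers to `"progress "`. One option, sketched here as an assumption rather than what this notebook does, is to strip whitespace from all column names right after loading:

```python
import pandas as pd

# Hypothetical empty frame reproducing the trailing-space column name
activity = pd.DataFrame(columns=["activity_id", "user_id", "module_id",
                                 "status", "progress ", "event_time"])

# Strip leading/trailing whitespace from every column name
activity.columns = activity.columns.str.strip()
print(activity.columns.tolist())
# ['activity_id', 'user_id', 'module_id', 'status', 'progress', 'event_time']
```

If this is applied, the aggregation must then use `"progress"` instead of `"progress "`.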


In [None]:
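# Merge weekly activity and quiz data on learner and week.
# how="left" keeps every learner-week with lesson activity; weeks with quiz
# attempts but no lesson activity are dropped. fillna(0) then records weeks
# without a quiz as avg_quiz_score = 0 and quiz_attempts = 0.
weekly_features = pd.merge(
    weekly_activity,
    weekly_quiz,
    on=["user_id", "week"],
    how="left"
).fillna(0)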
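A sketch for checking what the left join drops and for keeping "no quiz this week" distinguishable from a score of 0. The toy tables and the `has_quiz` column are hypothetical; `indicator=True` is a standard `pd.merge` option that adds a `_merge` column recording each row's origin:

```python
import pandas as pd

# Hypothetical weekly tables; U002 has a quiz week with no lesson activity
weekly_activity = pd.DataFrame({
    "user_id": ["U001", "U001"],
    "week": ["2026-02-02/2026-02-08", "2026-02-09/2026-02-15"],
    "sessions": [2, 1],
})
weekly_quiz = pd.DataFrame({
    "user_id": ["U001", "U002"],
    "week": ["2026-02-02/2026-02-08", "2026-02-02/2026-02-08"],
    "avg_quiz_score": [73.24, 50.0],
    "quiz_attempts": [1, 1],
})

# An outer join with indicator=True shows rows that only exist on one side
check = pd.merge(weekly_activity, weekly_quiz,
                 on=["user_id", "week"], how="outer", indicator=True)
dropped = check[check["_merge"] == "right_only"]  # lost by how="left"
print(dropped[["user_id", "week"]])

# Left join, but flag quiz weeks explicitly instead of filling scores with 0
merged = pd.merge(weekly_activity, weekly_quiz, on=["user_id", "week"], how="left")
merged["has_quiz"] = merged["quiz_attempts"].notna()
merged["quiz_attempts"] = merged["quiz_attempts"].fillna(0).astype(int)
print(merged)
```

Leaving `avg_quiz_score` as NaN (or imputing it deliberately) is a modeling choice; the original notebook fills it with 0.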

In [None]:
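# Estimate a behavioral confidence score without surveys:
# 40% mean progress and 40% mean quiz score (both rescaled from 0-100 to 0-1),
# plus 20% if the learner completed at least one module that week
weekly_features["confidence_score"] = (
    0.4 * (weekly_features["avg_progress"] / 100)
    + 0.4 * (weekly_features["avg_quiz_score"] / 100)
    + 0.2 * (weekly_features["modules_completed"] > 0).astype(int)
)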

Confidence is estimated from observable behavior rather than self-reported surveys,
so it can be computed the same way for every learner-week. It is a heuristic proxy:
the weights (0.4, 0.4, 0.2) are set by hand in this notebook.
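Written out, the score for learner $u$ in week $w$ is the following, with $p_{u,w}$ = `avg_progress`, $q_{u,w}$ = `avg_quiz_score` and $m_{u,w}$ = `modules_completed`. As a check, it reproduces the first two `confidence_score` values for U001 in the `weekly_features` output:

$$
c_{u,w} = 0.4\,\frac{p_{u,w}}{100} + 0.4\,\frac{q_{u,w}}{100} + 0.2\,\mathbb{1}\left[m_{u,w} > 0\right]
$$

$$
\begin{aligned}
c_{\text{U001},\,\text{2026-02-02}} &= 0.4(0.49115) + 0.4(0.7324) + 0.2(0) = 0.19646 + 0.29296 = 0.48942 \\
c_{\text{U001},\,\text{2026-02-09}} &= 0.4(0.7052) + 0.4(0) + 0.2(1) = 0.28208 + 0.2 = 0.48208
\end{aligned}
$$

In the second week U001 took no quiz, so the filled-in score of 0 removes the entire quiz term. With $p$ and $q$ on a 0 to 100 scale, $c_{u,w}$ lies in $[0, 1]$.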

In [None]:
# Export weekly behavioral features for machine learning and web dashboards
weekly_features.to_csv("weekly_behavioral_features.csv", index=False)

The exported dataset serves as the single source of truth for
machine learning models and web-based analytics dashboards.
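Because downstream models and dashboards read this file directly, a few cheap checks before export can catch problems early. A minimal sketch on a hypothetical three-row stand-in for `weekly_features`: it asserts that each `(user_id, week)` pair is unique and that scores stay in $[0, 1]$ (assuming 0 to 100 inputs), then round-trips through CSV in memory. It also shows where an `Unnamed: 0` column comes from: writing with the default `index=True` and reading back turns the index into that column.

```python
import io
import pandas as pd

# Hypothetical stand-in for weekly_features
weekly_features = pd.DataFrame({
    "user_id": ["U001", "U001", "U002"],
    "week": ["2026-02-02/2026-02-08", "2026-02-09/2026-02-15",
             "2026-02-02/2026-02-08"],
    "confidence_score": [0.48942, 0.48208, 0.41718],
})

# One row per learner-week: the (user_id, week) pair must be unique
dupes = int(weekly_features.duplicated(subset=["user_id", "week"]).sum())
assert dupes == 0, f"{dupes} duplicate (user_id, week) rows"

# Scores should stay in [0, 1] when progress and quiz scores are 0-100
assert weekly_features["confidence_score"].between(0, 1).all()

# Round-trip through an in-memory CSV with index=False
buf = io.StringIO()
weekly_features.to_csv(buf, index=False)
buf.seek(0)
roundtrip = pd.read_csv(buf)
print(roundtrip.columns.tolist())  # ['user_id', 'week', 'confidence_score']

# With the default index=True, the index comes back as "Unnamed: 0"
with_index = pd.read_csv(io.StringIO(weekly_features.to_csv()))
print(with_index.columns[0])  # Unnamed: 0
```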

In [None]:
print(weekly_features.columns.tolist())

['user_id', 'week', 'sessions', 'avg_progress', 'modules_completed', 'avg_quiz_score', 'quiz_attempts', 'confidence_score']


In [None]:
weekly_features

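|     | user_id | week                  | sessions | avg_progress | modules_completed | avg_quiz_score | quiz_attempts | confidence_score |
|-----|---------|-----------------------|----------|--------------|-------------------|----------------|---------------|------------------|
| 0   | U001    | 2026-02-02/2026-02-08 | 2        | 49.1150      | 0                 | 73.240         | 1.0           | 0.48942          |
| 1   | U001    | 2026-02-09/2026-02-15 | 1        | 70.5200      | 1                 | 0.000          | 0.0           | 0.48208          |
| 2   | U001    | 2026-02-16/2026-02-22 | 1        | 31.6000      | 1                 | 0.000          | 0.0           | 0.32640          |
| 3   | U002    | 2026-02-02/2026-02-08 | 2        | 60.5250      | 0                 | 43.770         | 1.0           | 0.41718          |
| 4   | U002    | 2026-02-09/2026-02-15 | 1        | 87.9500      | 0                 | 0.000          | 0.0           | 0.35180          |
| ... | ...     | ...                   | ...      | ...          | ...               | ...            | ...           | ...              |
| 92  | U049    | 2026-02-02/2026-02-08 | 1        | 79.6600      | 0                 | 56.370         | 1.0           | 0.54412          |
| 93  | U049    | 2026-02-09/2026-02-15 | 1        | 61.0800      | 1                 | 42.725         | 4.0           | 0.61522          |
| 94  | U050    | 2026-02-02/2026-02-08 | 2        | 65.8350      | 0                 | 0.000          | 0.0           | 0.26334          |
| 95  | U050    | 2026-02-09/2026-02-15 | 4        | 48.6725      | 2                 | 42.495         | 2.0           | 0.56467          |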
