# Activity 2: Student Scoring System (Rule-Based Intelligence)

---

## Objective

Convert raw student data into interpretable performance scores that are easy to explain and act on.

This notebook:
- Computes APS, WWS, PTMS, CRS, and SRI
- Classifies students into Green / Blue / Yellow / Red categories
- Validates scores using representative student samples

In [1]:
import os 
from pathlib import Path


In [2]:
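# Move up one level to the project root so that `src` imports and relative data paths resolve.
# Run this cell only once per session: each rerun moves up another directory.
os.chdir(Path.cwd().parent)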
os.getcwd()

'd:\\Desktop\\1DSML\\Internship\\HePro code'
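
Because `os.chdir(Path.cwd().parent)` moves up one more directory every time the cell is rerun, a lookup that searches for the project root is safer. The sketch below is one possible approach, not part of the original notebook: `find_project_root` is a hypothetical helper, and it assumes the project root is the nearest directory that contains a `src` folder (as the `from src.scoring...` imports suggest).

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Return the nearest directory at or above `start` that contains `marker`."""
    start = start.resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No directory containing '{marker}' found above {start}")
```

Calling `os.chdir(find_project_root(Path.cwd()))` would then give the same working directory no matter how many times the cell runs.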

In [3]:
import pandas as pd

from src.scoring.academic_score import academic_score
from src.scoring.wellness_score import wellness_score
from src.scoring.productivity_score import productivity_time_management_score
from src.scoring.career_score import career_score
from src.scoring.sri import compute_sri, classify_risk

## A: Load Raw Dataset 

In [16]:
# Forward slashes work on Windows, macOS, and Linux alike.
df = pd.read_csv("data/raw/students.csv")
df.head()

Unnamed: 0,student_id,age,program,semester,gpa,attendance,assignment_completion,engagement_score,stress_level,career_clarity,sleep_hours,mental_wellbeing,productivity_score,distractions,skill_readiness
0,S001,20,B.Tech,7,8.5,44.2,93.3,35.9,6,5,5.6,4,7,2,4
1,S002,20,B.Tech,2,8.7,35.5,90.8,34.2,5,7,6.0,6,7,3,3
2,S003,19,MBA,4,8.4,31.8,83.2,43.8,5,5,6.0,4,7,2,4
3,S004,18,BCA,3,8.5,25.4,82.0,43.4,6,5,5.6,5,6,5,3
4,S005,19,BCA,2,9.4,41.3,93.3,35.1,6,6,5.6,5,7,4,3


## B: Data Overview

In [17]:
df.shape

(200, 15)

In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   student_id             200 non-null    object 
 1   age                    200 non-null    int64  
 2   program                200 non-null    object 
 3   semester               200 non-null    int64  
 4   gpa                    200 non-null    float64
 5   attendance             200 non-null    float64
 6   assignment_completion  200 non-null    float64
 7   engagement_score       200 non-null    float64
 8   stress_level           200 non-null    int64  
 9   career_clarity         200 non-null    int64  
 10  sleep_hours            200 non-null    float64
 11  mental_wellbeing       200 non-null    int64  
 12  productivity_score     200 non-null    int64  
 13  distractions           200 non-null    int64  
 14  skill_readiness        200 non-null    int64  
dtypes: float64(5), int64(8), object(2)

## C: Compute Individual Scores

In [19]:
df['APS'] = academic_score(df)
df['WWS'] = wellness_score(df)
df['PTMS'] = productivity_time_management_score(df)
df['CRS'] = career_score(df)

df[['APS', 'WWS', 'PTMS', 'CRS']].head()

Unnamed: 0,APS,WWS,PTMS,CRS
0,79.33,44.8,74.0,45.0
1,77.84,56.0,70.0,50.0
2,73.32,50.0,74.0,45.0
3,72.18,47.8,56.0,40.0
4,83.25,47.8,66.0,45.0
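
The scoring functions live in `src/scoring/` and are not shown in this notebook. As a rough cross-check, the sketch below applies simple weighted sums that reproduce APS, PTMS, and CRS for the five rows displayed above. The weights are an assumption inferred by hand from those five rows only, not the contents of the `src.scoring` modules. Notably, `engagement_score` is not needed to match the displayed APS values.

```python
import pandas as pd

# First five rows of the raw dataset, copied from the df.head() output above.
head = pd.DataFrame({
    "gpa": [8.5, 8.7, 8.4, 8.5, 9.4],
    "attendance": [44.2, 35.5, 31.8, 25.4, 41.3],
    "assignment_completion": [93.3, 90.8, 83.2, 82.0, 93.3],
    "career_clarity": [5, 7, 5, 5, 6],
    "productivity_score": [7, 7, 7, 6, 7],
    "distractions": [2, 3, 2, 5, 4],
    "skill_readiness": [4, 3, 4, 3, 3],
})

# Assumed weights, fitted by hand to the five displayed rows (not read from src/).
aps = 5.0 * head["gpa"] + 0.2 * head["attendance"] + 0.3 * head["assignment_completion"]
# Distractions are inverted so that more distractions lower the score.
ptms = 6.0 * head["productivity_score"] + 4.0 * (10 - head["distractions"])
crs = 5.0 * head["career_clarity"] + 5.0 * head["skill_readiness"]

check = pd.DataFrame({"APS": aps, "PTMS": ptms, "CRS": crs}).round(2)
print(check)
```

Five rows are too few to confirm the rules for all 200 students, so the actual modules remain the authoritative definition.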


## D: Compute SRI & Risk Category

In [20]:
df['SRI'] = compute_sri(df)
df['risk_category'] = classify_risk(df['SRI'])

df[['SRI', 'risk_category']].head()

Unnamed: 0,SRI,risk_category
0,61.05,Yellow
1,63.85,Yellow
2,60.55,Yellow
3,54.8,Yellow
4,61.38,Yellow
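
`compute_sri` is imported from `src/scoring/sri.py`, so its formula is not visible here. The sketch below tests one candidate: a weighted sum of the four sub-scores with weights 0.30 (APS), 0.25 (WWS), 0.20 (PTMS), and 0.25 (CRS). These weights are an assumption inferred from the five displayed rows, not read from the source module. The cutoffs used by `classify_risk` cannot be recovered from this output, since all five students fall in Yellow.

```python
import pandas as pd

# Sub-scores and SRI for the first five students, copied from the outputs above.
scores = pd.DataFrame({
    "APS": [79.33, 77.84, 73.32, 72.18, 83.25],
    "WWS": [44.8, 56.0, 50.0, 47.8, 47.8],
    "PTMS": [74.0, 70.0, 74.0, 56.0, 66.0],
    "CRS": [45.0, 50.0, 45.0, 40.0, 45.0],
    "SRI": [61.05, 63.85, 60.55, 54.8, 61.38],
})

# Assumed weights (inferred from the five rows, not read from src/scoring/sri.py).
weights = pd.Series({"APS": 0.30, "WWS": 0.25, "PTMS": 0.20, "CRS": 0.25})

# Weighted sum of the four sub-scores; columns are aligned to the weight index.
sri_estimate = scores[weights.index].mul(weights, axis=1).sum(axis=1)

# Largest gap between the estimate and the SRI reported by the notebook.
max_gap = (sri_estimate - scores["SRI"]).abs().max()
print(sri_estimate.round(2).tolist(), round(max_gap, 3))
```

The estimate agrees with the reported SRI to within about 0.01 on these rows, which is the precision expected when starting from sub-scores that were already rounded to two decimals.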


## E: Risk Distribution Check

In [21]:
df['risk_category'].value_counts()

risk_category
Yellow    96
Red       60
Blue      43
Green      1
Name: count, dtype: int64

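The scoring system spreads students across three of the four risk categories:
nearly half are Yellow, and Red and Blue account for almost all of the rest,
which gives mentors a usable basis for prioritization. Only one of the 200
students is classified Green, so the Green cutoff may be worth reviewing.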
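
To read the distribution as shares rather than raw counts, the counts above can be converted to percentages. The sketch below rebuilds the counts by hand so that it runs on its own; in the notebook, `df['risk_category'].value_counts(normalize=True)` would give the same proportions directly.

```python
import pandas as pd

# Category counts copied from the value_counts() output above.
counts = pd.Series({"Yellow": 96, "Red": 60, "Blue": 43, "Green": 1}, name="count")

# Share of the cohort in each category, in percent.
share = (counts / counts.sum() * 100).round(1)
print(share)
```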


## F: Validation with Sample Students 

### High Stress -> Low WWS

In [22]:
df.sort_values("stress_level", ascending=False)[
    ["stress_level", "sleep_hours", "mental_wellbeing", "WWS"]].head()

Unnamed: 0,stress_level,sleep_hours,mental_wellbeing,WWS
50,9,4.4,1,20.2
74,9,4.4,1,20.2
66,9,4.4,2,23.2
59,9,4.4,1,20.2
69,9,4.4,1,20.2


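Students with higher stress and poor sleep show lower wellness scores: the five highest-stress students (stress level 9, 4.4 hours of sleep) score 20.2 to 23.2 on WWS, against 44.8 to 56.0 for the first five students in the dataset.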
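
One way to see why stress pulls WWS down is to fit a simple rule to the rows shown in this notebook. The sketch below uses an assumed formula, `4 * (10 - stress_level) + 3 * sleep_hours + 3 * mental_wellbeing`, which matches all ten displayed rows. It is not taken from `src/scoring/wellness_score.py`. In these rows `sleep_hours` always equals `8 - 0.4 * stress_level`, so the split between the stress and sleep weights is not uniquely determined and other weight pairs would fit equally well.

```python
import pandas as pd

# Rows copied from the outputs above: five high-stress students plus df.head().
rows = pd.DataFrame({
    "stress_level": [9, 9, 9, 9, 9, 6, 5, 5, 6, 6],
    "sleep_hours": [4.4, 4.4, 4.4, 4.4, 4.4, 5.6, 6.0, 6.0, 5.6, 5.6],
    "mental_wellbeing": [1, 1, 2, 1, 1, 4, 6, 4, 5, 5],
    "WWS": [20.2, 20.2, 23.2, 20.2, 20.2, 44.8, 56.0, 50.0, 47.8, 47.8],
})

# Assumed rule: stress is inverted so that higher stress lowers the score.
wws_estimate = (
    4.0 * (10 - rows["stress_level"])
    + 3.0 * rows["sleep_hours"]
    + 3.0 * rows["mental_wellbeing"]
)

# Largest gap between the estimate and the WWS reported by the notebook.
max_gap = (wws_estimate - rows["WWS"]).abs().max()
print(wws_estimate.round(1).tolist(), max_gap)
```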

### Low GPA + High Engagement -> Lower APS

In [23]:
df[(df["gpa"] < 6) & (df["engagement_score"] > 70)][
    ["gpa", "engagement_score", "APS"]].head()

Unnamed: 0,gpa,engagement_score,APS
42,5.4,88.9,61.98
43,5.8,86.0,65.52
44,5.9,88.2,64.09
46,5.2,88.3,61.22
48,5.6,76.8,61.19

Students with a GPA below 6 score between 61 and 66 on APS even with engagement above 76, compared with 72 to 83 for the first five students in the dataset.


### High GPA + Low Career Clarity -> Low CRS

In [24]:
df[(df["gpa"] > 8) & (df["career_clarity"] < 5)][
    ["gpa", "career_clarity", "CRS"]
].head()


Unnamed: 0,gpa,career_clarity,CRS
80,9.1,4,50.0
82,8.8,4,50.0
83,8.3,4,50.0
84,9.2,4,50.0
85,9.5,3,40.0

Students with a GPA above 8 but career clarity of 3 or 4 score only 40 to 50 on CRS, so strong grades alone do not lift the career score.


In [25]:
df.head()

Unnamed: 0,student_id,age,program,semester,gpa,attendance,assignment_completion,engagement_score,stress_level,career_clarity,...,mental_wellbeing,productivity_score,distractions,skill_readiness,APS,WWS,PTMS,CRS,SRI,risk_category
0,S001,20,B.Tech,7,8.5,44.2,93.3,35.9,6,5,...,4,7,2,4,79.33,44.8,74.0,45.0,61.05,Yellow
1,S002,20,B.Tech,2,8.7,35.5,90.8,34.2,5,7,...,6,7,3,3,77.84,56.0,70.0,50.0,63.85,Yellow
2,S003,19,MBA,4,8.4,31.8,83.2,43.8,5,5,...,4,7,2,4,73.32,50.0,74.0,45.0,60.55,Yellow
3,S004,18,BCA,3,8.5,25.4,82.0,43.4,6,5,...,5,6,5,3,72.18,47.8,56.0,40.0,54.8,Yellow
4,S005,19,BCA,2,9.4,41.3,93.3,35.1,6,6,...,5,7,4,3,83.25,47.8,66.0,45.0,61.38,Yellow


In [27]:
df.to_csv("data/processed/students_scored.csv", index=False)
print("students_scored.csv saved successfully.")


students_scored.csv saved successfully.


## Key Observations

- Academic performance alone does not determine overall readiness.
- Students with strong academics but poor wellness or career clarity
  receive lower SRI scores, highlighting mentoring needs.
- Risk categorization enables mentors to prioritize interventions
  effectively and transparently.

These spot checks support the scoring framework as interpretable, consistent,
and suitable for real mentoring workflows.
