# Training the Model
This system has two brains:
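1. **The Neural Brain (XGBoost):** A gradient-boosted tree classifier that plays the "neural" role here. It looks at the model features (`ANXMAT`, `BELONG`, `ESCS`, `TEACHSUP`, `BULLIED`, `ICTRES`, and gender) to calculate a raw probability of failure.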
2. **The Symbolic Brain (Fuzzy Logic):** It applies your specific pedagogical rules (e.g., "Teacher Support mitigates Anxiety") to override or nuance that prediction.

## Install and Import necessary libraries

In [23]:
%pip install xgboost scikit-fuzzy

Collecting scikit-fuzzy
  Downloading scikit_fuzzy-0.5.0-py2.py3-none-any.whl.metadata (2.6 kB)
Downloading scikit_fuzzy-0.5.0-py2.py3-none-any.whl (920 kB)
Installing collected packages: scikit-fuzzy
Successfully installed scikit-fuzzy-0.5.0
Note: you may need to restart the kernel to use updated packages.


In [24]:
import pandas as pd
import xgboost as xgb
import pickle
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

In [17]:
ph_df = pd.read_pickle("ph_pisa_2022_filtered.pkl")
model_features = ['ANXMAT', 'BELONG', 'ESCS', 'TEACHSUP', 'BULLIED', 'ICTRES', 'Gender_Num']
targets = ['PV1MATH', 'PV1READ', 'PV1SCIE']

In [18]:
# Ensure Gender is numeric (1=Female, 2=Male)
if ph_df['ST004D01T'].dtype == 'O': # If it's string
    ph_df['Gender_Num'] = ph_df['ST004D01T'].map({'Female': 1, 'Male': 2})
else:
    ph_df['Gender_Num'] = ph_df['ST004D01T']

## Phase 1: Train the "Neural" Models (XGBoost)

### Define Critical Risk Thresholds (Global Level 2 Baseline)
0 = Critical (< 358), 1 = High (358 to < 420), 2 = Moderate (420 to < 482), 3 = Low (>= 482)

Note: Reading/Science thresholds differ slightly, but for this prototype, we use the Math baseline as the general "Academic Risk" standard.

In [6]:
def get_risk_level(score):
    if score < 358: return 0      # Critical Risk
    elif score < 420: return 1    # High Risk
    elif score < 482: return 2    # Moderate Risk
    else: return 3                # Low Risk
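
Because the cutoffs are applied with strict `<` comparisons, a score sitting exactly on a threshold falls into the higher (safer) band. A minimal sketch of an equivalent table-driven version, using the standard-library `bisect` module, makes that boundary behaviour explicit. The names `CUTOFFS` and `risk_level_bisect` are illustrative and not part of the notebook.

```python
from bisect import bisect_right

# Same cutoffs as get_risk_level above (Math baseline)
CUTOFFS = [358, 420, 482]

def risk_level_bisect(score):
    """Return 0 (Critical) to 3 (Low) by counting how many cutoffs the score has reached."""
    return bisect_right(CUTOFFS, score)

# Scores just below and exactly on each cutoff
for s in (357.9, 358, 419.9, 420, 482):
    print(s, risk_level_bisect(s))
```

A table of cutoffs like this is also easier to swap out if subject-specific thresholds for Reading and Science are added later.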

### Training Loop (One Model per Subject)

In [19]:
models = {}

print(f"\n--- Training 3 Neuro-Symbolic Models ---")
for subject in targets:
    print(f"\nTraining Model for: {subject}...")
    
    # Prepare X and y
    X = ph_df[model_features]
    y = ph_df[subject].apply(get_risk_level)
    
    # Split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Train XGBoost
    # objective='multi:softprob' is CRITICAL. It gives us probabilities, not just labels.
    model = xgb.XGBClassifier(
        objective='multi:softprob',
        num_class=4,
        learning_rate=0.05,
        max_depth=6,
        n_estimators=150,
        eval_metric='mlogloss'
    )
    
    model.fit(X_train, y_train)
    
    # Evaluate
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"  -> Accuracy: {acc*100:.2f}%")
    
    # Save to dictionary
    models[subject] = model


--- Training 3 Neuro-Symbolic Models ---

Training Model for: PV1MATH...
  -> Accuracy: 61.92%

Training Model for: PV1READ...
  -> Accuracy: 63.45%

Training Model for: PV1SCIE...
  -> Accuracy: 61.50%
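
An accuracy near 62% on a four-class problem is hard to judge without knowing how the classes are distributed. One common sanity check is the majority-class baseline: the accuracy obtained by always predicting the most frequent label. The sketch below uses a small made-up label list purely for illustration; `majority_baseline` is a hypothetical helper, and in the notebook you would pass `y_test` instead.

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of a classifier that always predicts the most common label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical labels, NOT taken from the PISA data
toy_labels = [0, 0, 0, 1, 1, 2, 3, 0]
print(f"Baseline accuracy: {majority_baseline(toy_labels):.2f}")
```

If the trained model's accuracy is only slightly above this baseline, the per-class breakdown from `classification_report` (already imported above) is likely to be more informative than the single accuracy figure.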


In [20]:
# Save the Brains to Disk
with open("neuro_symbolic_brains.pkl", "wb") as f:
    pickle.dump(models, f)
print("\n✅ All 3 Neural Models saved to 'neuro_symbolic_brains.pkl'")


✅ All 3 Neural Models saved to 'neuro_symbolic_brains.pkl'


## Phase 2: The "Symbolic" Brain (Fuzzy Logic)

This code implements the Fuzzy Inference System. It handles the "Buffer Effect" rule: High Anxiety is dangerous, but High Teacher Support can lower that risk.

### Define Fuzzy Variables

Inputs: Scale 0 to 10 (We will normalize PISA scores to this range)

Output: Risk Adjustment (0 to 100)

Higher Score = Higher Risk
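
The rescaling to the 0 to 10 range is a clipped linear map. The sketch below mirrors the two formulas used later in `analyze_student`, `(ANXMAT - 1) * 3.3` and `(TEACHSUP + 2) * 2.5`. The helper name `to_fuzzy_scale` is an assumption for illustration, and the offsets and multipliers are simply the notebook's own choices rather than official PISA ranges.

```python
def to_fuzzy_scale(value, offset, factor):
    """Shift and scale a raw index, then clip the result to [0, 10]."""
    return min(10, max(0, (value + offset) * factor))

# Anxiety: (ANXMAT - 1) * 3.3
print(to_fuzzy_scale(4.0, -1, 3.3))   # near the top of the scale
# Support: (TEACHSUP + 2) * 2.5
print(to_fuzzy_scale(-2.0, 2, 2.5))   # bottom of the scale
print(to_fuzzy_scale(3.0, 2, 2.5))    # raw value 12.5 is clipped to 10
```

The clipping matters: any raw value outside the range the formula was designed for is silently pinned to 0 or 10, so it is worth checking the real spread of each index before trusting the fuzzy inputs.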

In [26]:
anxiety_in = ctrl.Antecedent(np.arange(0, 11, 1), 'anxiety')
support_in = ctrl.Antecedent(np.arange(0, 11, 1), 'support')

risk_out = ctrl.Consequent(np.arange(0, 101, 1), 'risk_factor')

### Define the membership functions

In [27]:
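# Anxiety: automf(3) builds three evenly spaced triangles over 0-10:
# low peaks at 0, medium at 5, high at 10 (neighbouring sets cross at 2.5 and 7.5)
anxiety_in.automf(3, names=['low', 'medium', 'high'])

# Support: the same three triangular sets (low, medium, high)
support_in.automf(3, names=['low', 'medium', 'high'])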

# Risk: Low, Moderate, Critical
risk_out['low'] = fuzz.trimf(risk_out.universe, [0, 0, 50])
risk_out['moderate'] = fuzz.trimf(risk_out.universe, [25, 50, 75])
risk_out['critical'] = fuzz.trimf(risk_out.universe, [50, 100, 100])
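
For readers unfamiliar with triangular membership functions, here is a minimal pure-Python sketch of what `fuzz.trimf` computes at a single point, together with the three evenly spaced input sets that `automf(3)` is understood to produce on a 0 to 10 universe (peaks at 0, 5 and 10). The function `tri` and the `INPUT_SETS` table are illustrative stand-ins, not scikit-fuzzy API.

```python
def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a <= b <= c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Assumed shape of the automf(3) sets on the 0-10 universe
INPUT_SETS = {'low': (0, 0, 5), 'medium': (0, 5, 10), 'high': (5, 10, 10)}

def memberships(x):
    """Degree of membership of x in each of the three input sets."""
    return {name: tri(x, *abc) for name, abc in INPUT_SETS.items()}

print(memberships(2.5))   # low and medium both 0.5
print(memberships(9.9))   # mostly 'high'
```

Note that at exactly 5 the value belongs fully to `medium` and not at all to `low` or `high`, which becomes relevant once rules are written only in terms of the outer sets.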

### THE RULE BASE (Pedagogical Axioms)

* Rule 1: The "Danger Zone" (High Anxiety + Low Support = Critical Risk)
* Rule 2: The "Buffer Effect" (High Anxiety + High Support = Moderate Risk). This is the "Neuro-Symbolic" magic: pure stats might say "Fail", but the logic says "Survivor".
* Rule 3: The "Safe Zone" (Low Anxiety = Low Risk, regardless of Support)

In [29]:
rule1 = ctrl.Rule(anxiety_in['high'] & support_in['low'], risk_out['critical'])
rule2 = ctrl.Rule(anxiety_in['high'] & support_in['high'], risk_out['moderate'])
rule3 = ctrl.Rule(anxiety_in['low'], risk_out['low'])
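
To make the rule evaluation concrete, the sketch below re-implements the three rules in plain Python using the usual Mamdani conventions: AND as `min`, clipped output sets, `max` aggregation, and a centroid taken over the integer universe 0 to 100. It assumes the `automf(3)` input sets peak at 0, 5 and 10. It is an approximation for intuition only, since scikit-fuzzy's own centroid routine may give slightly different numbers, and all function names here are hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def infer_risk(anxiety, support):
    """Approximate Mamdani inference for the three rules; None if no rule fires."""
    # Input memberships (assumed automf(3) shapes on 0-10)
    anx_low, anx_high = tri(anxiety, 0, 0, 5), tri(anxiety, 5, 10, 10)
    sup_low, sup_high = tri(support, 0, 0, 5), tri(support, 5, 10, 10)

    # Firing strengths: AND is min
    w_critical = min(anx_high, sup_low)    # Rule 1
    w_moderate = min(anx_high, sup_high)   # Rule 2
    w_low = anx_low                        # Rule 3

    # Clip each output set at its firing strength, aggregate with max,
    # then take the discrete centroid
    num = den = 0.0
    for r in range(0, 101):
        mu = max(
            min(w_low, tri(r, 0, 0, 50)),
            min(w_moderate, tri(r, 25, 50, 75)),
            min(w_critical, tri(r, 50, 100, 100)),
        )
        num += r * mu
        den += mu
    return num / den if den > 0 else None

print(round(infer_risk(9.9, 0.0), 1))    # high anxiety, no support
print(round(infer_risk(9.9, 10.0), 1))   # high anxiety, full support
```

The first call mirrors Maria's profile from the test cases (anxiety 9.9, support 0) and lands in the low 80s, in line with the 83.3 reported there. The second shows Rule 2 pulling the same anxiety level down to the middle of the scale once support is high.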

### Build the System

In [31]:
risk_ctrl = ctrl.ControlSystem([rule1, rule2, rule3])
fuzzy_brain = ctrl.ControlSystemSimulation(risk_ctrl)

print("✅ Fuzzy Logic System Compiled.")

✅ Fuzzy Logic System Compiled.
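
One thing to watch with only three rules: some input combinations activate none of them. With the `automf(3)` sets assumed to peak at 0, 5 and 10, an anxiety score of exactly 5 is neither 'low' nor 'high', so every rule has zero firing strength and there is nothing to defuzzify. In that situation `fuzzy_brain.compute()` is likely to raise an error rather than return a number. The same happens for high anxiety combined with a support score of exactly 5, which is what `TEACHSUP = 0.0` maps to under the notebook's scaling. The hypothetical helper below computes the three firing strengths in plain Python so such gaps can be spotted before the system is wired to user input.

```python
def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def firing_strengths(anxiety, support):
    """Firing strength of each rule, assuming automf(3) sets peaking at 0, 5, 10."""
    anx_low = tri(anxiety, 0, 0, 5)
    anx_high = tri(anxiety, 5, 10, 10)
    sup_low = tri(support, 0, 0, 5)
    sup_high = tri(support, 5, 10, 10)
    return {
        'rule1_danger': min(anx_high, sup_low),
        'rule2_buffer': min(anx_high, sup_high),
        'rule3_safe': anx_low,
    }

def is_covered(anxiety, support):
    """True if at least one rule fires for this input pair."""
    return any(w > 0 for w in firing_strengths(anxiety, support).values())

print(is_covered(5.0, 5.0))   # medium anxiety: no rule fires
print(is_covered(8.0, 5.0))   # high anxiety, exactly medium support: no rule fires
print(is_covered(8.0, 2.0))   # Rule 1 fires
```

Possible remedies include adding rules for the `medium` sets, or catching the exception around `compute()` and falling back to the neural prediction alone.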


## Phase 3: The "Student Advisor" Engine

This is the final function you would hook up to your "BuzzFeed" quiz interface. It takes raw student data, normalizes it, runs the Neural Model, runs the Fuzzy Check, and generates a diagnosis.
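
The neural step boils down to summing the probabilities of the two classes below the Level 2 cutoff (classes 0 and 1) and then picking the subject with the largest sum. A small sketch with made-up numbers follows; `fail_probability` is an illustrative name and none of the values are model output.

```python
def fail_probability(class_probs):
    """Percent chance of landing in class 0 (Critical) or class 1 (High)."""
    if len(class_probs) != 4:
        raise ValueError("expected four class probabilities")
    return (class_probs[0] + class_probs[1]) * 100

# Hypothetical softprob row: [Critical, High, Moderate, Low]
example_probs = [0.10, 0.25, 0.40, 0.25]
print(f"{fail_probability(example_probs):.1f}%")

# Hypothetical per-subject risks; the weakest subject is the one with the highest risk
risks = {'PV1MATH': 35.0, 'PV1READ': 12.5, 'PV1SCIE': 20.0}
weakest = max(risks, key=risks.get)
print(weakest)
```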

In [39]:
def analyze_student(profile):
    print(f"\n--- Analyzing Student Profile: {profile['Name']} ---")
    
    # 1. PREPARE INPUT (one-row DataFrame, columns in the same order used for training)
    input_row = pd.DataFrame([[
        profile['ANXMAT'], profile['BELONG'], profile['ESCS'], 
        profile['TEACHSUP'], profile['BULLIED'], profile['ICTRES'],
        1 if profile['Gender'] == 'Female' else 2
    ]], columns=model_features)
    
    # 2. NEURAL PREDICTIONS (probability of scoring below Level 2, per subject)
    print("🧠 Neural predictions:")
    risks = {}
    for subject, model in models.items():
        probs = model.predict_proba(input_row)[0]
        fail_prob = (probs[0] + probs[1]) * 100 
        risks[subject] = fail_prob
        print(f"  - {subject[3:]} Risk Probability: {fail_prob:.1f}%")

    weakest_subject = max(risks, key=risks.get)
    max_risk = risks[weakest_subject]
    
    # 3. SYMBOLIC REASONING (rescale both indices to 0-10, then run the fuzzy rules)
    f_anxiety = min(10, max(0, (profile['ANXMAT'] - 1) * 3.3))
    f_support = min(10, max(0, (profile['TEACHSUP'] + 2) * 2.5))
    
    fuzzy_brain.input['anxiety'] = f_anxiety
    fuzzy_brain.input['support'] = f_support
    fuzzy_brain.compute()
    symbolic_risk = fuzzy_brain.output['risk_factor']
    
    print(f"🧩 Fuzzy Logic Risk Modifier: {symbolic_risk:.1f}/100")
    
    # 4. DECISION LOGIC (branches are checked in priority order)
    print("📋 FINAL ADVISORY:")
    
    # PRIORITY 1: Check Fuzzy "Red Flags" (Maria's Case)
    # If the Symbolic Brain says risk is high (>60), we alert regardless of wealth/stats.
    if symbolic_risk > 60:
        print(f"  🚩 PSYCHOSOCIAL ALERT: Severe Anxiety detected without support.")
        print("  Recommendation: Immediate counseling intervention required. Do not pressure academically.")

    # PRIORITY 2: Check Buffer Effect (high anxiety offset by high support)
    elif f_anxiety > 7 and f_support > 7:
        print(f"  ✅ BUFFER ACTIVATED: High Anxiety mitigated by Teacher Support.")
        print("  Recommendation: Focus on confidence-building strategies.")

    # PRIORITY 3: Check Statistical Risk (Ben's Case)
    elif max_risk > 70: # 70% cutoff for academic alerts
        print(f"  ⚠️ ACADEMIC ALERT: High probability of failure in {weakest_subject[3:]}.")
        if profile['ICTRES'] < -1:
             print("  Root Cause: Digital Poverty identified (Low ICT Resources).")
        elif profile['BULLIED'] > 0:
             print("  Root Cause: Social exclusion/bullying detected.")
        elif profile['Gender'] == 'Male' and weakest_subject == 'PV1READ':
             print("  Root Cause: Gender Gap Risk (Male student struggling in Reading).")
             
    else:
        print("  ✅ Student is generally stable.")
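
Because `analyze_student` prints its verdict, it is awkward to unit-test. One option, sketched below under the assumption that only the branch order matters, is to factor the priority logic into a function that returns a label. The name `advisory_label` and the returned strings are hypothetical; the thresholds (60, 7, 70) are the ones used above.

```python
def advisory_label(symbolic_risk, f_anxiety, f_support, max_risk):
    """Return the advisory branch that would be taken, in priority order."""
    if symbolic_risk > 60:
        return 'psychosocial_alert'
    if f_anxiety > 7 and f_support > 7:
        return 'buffer_activated'
    if max_risk > 70:
        return 'academic_alert'
    return 'stable'

# Inputs reconstructed from the Maria, Ben and Lara test cases that follow
print(advisory_label(83.3, 9.9, 0.0, 31.8))    # Maria
print(advisory_label(18.0, 1.65, 5.0, 95.9))   # Ben
print(advisory_label(21.0, 3.3, 8.75, 56.3))   # Lara
```

Returning a label also makes the ordering explicit: a student with both a fuzzy red flag and a high statistical risk only ever receives the psychosocial alert.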

## Test Cases

### Case 1: "The Silent Sufferer" (Maria)

**Hypothesis:** High SES (Wealth) usually predicts success, but severe Anxiety + NO Support should trigger a warning.

**Tests:** Fuzzy Rule 1 (Danger Zone)

In [40]:

# --- TEST CASE: The "High Anxiety" Student ---
maria = {
    'Name': 'Maria (Hidden Crisis)',
    'ANXMAT': 4.0,       # MAX Anxiety (Panic)
    'BELONG': -1.0,      # Feels isolated
    'ESCS': 1.5,         # High Income (Wealthy)
    'TEACHSUP': -2.0,    # ZERO Teacher Support
    'BULLIED': 0.0,
    'ICTRES': 1.0,       # Good Tech
    'Gender': 'Female'
}

analyze_student(maria)


--- Analyzing Student Profile: Maria (Hidden Crisis) ---
🧠 Neural predictions:
  - MATH Risk Probability: 31.8%
  - READ Risk Probability: 8.3%
  - SCIE Risk Probability: 19.3%
🧩 Fuzzy Logic Risk Modifier: 83.3/100
📋 FINAL ADVISORY:
  🚩 PSYCHOSOCIAL ALERT: Severe Anxiety detected without support.
  Recommendation: Immediate counseling intervention required. Do not pressure academically.


### Case 2: "The Digital Divide Victim" (Ben)

**Hypothesis:** Capable student (Low Anxiety), but crippled by lack of resources (Poverty + No Tech).

**Tests:** Neural Model's weighting of ESCS and ICTRES.

In [41]:
ben = {
    'Name': 'Ben (Digital Divide)',
    'ANXMAT': 1.5,       # Low Anxiety (Confident)
    'BELONG': 0.0,       # Average
    'ESCS': -2.5,        # Extreme Poverty
    'TEACHSUP': 0.0,     # Average Support
    'BULLIED': 0.0,
    'ICTRES': -3.0,      # NO Tech/Internet (Critical Barrier)
    'Gender': 'Male'
}

analyze_student(ben)


--- Analyzing Student Profile: Ben (Digital Divide) ---
🧠 Neural predictions:
  - MATH Risk Probability: 95.9%
  - READ Risk Probability: 90.2%
  - SCIE Risk Probability: 82.8%
🧩 Fuzzy Logic Risk Modifier: 18.0/100
📋 FINAL ADVISORY:
  ⚠️ ACADEMIC ALERT: High probability of failure in MATH.
  Root Cause: Digital Poverty identified (Low ICT Resources).


### Case 3: "The Resilient Survivor" (Lara)
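**Hypothesis:** Poor, but protected by strong Teacher Support and low Anxiety.

**Tests:** Whether a supportive, low-anxiety profile stays out of the alert branches despite the Poverty flag. With `ANXMAT = 2.0` the scaled anxiety is about 3.3, so Fuzzy Rule 3 (Safe Zone) is the rule that fires here; Rule 2 (Buffer Effect) requires high anxiety and is not triggered by this profile.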

In [42]:
lara = {
    'Name': 'Lara (The Resilient)',
    'ANXMAT': 2.0,       # Moderate/Low Anxiety
    'BELONG': 1.0,       # High Belonging
    'ESCS': -1.5,        # Low Income
    'TEACHSUP': 1.5,     # HIGH Teacher Support
    'BULLIED': -1.0,     # Not bullied
    'ICTRES': -0.5,      # Basic Tech
    'Gender': 'Female'
}

analyze_student(lara)


--- Analyzing Student Profile: Lara (The Resilient) ---
🧠 Neural predictions:
  - MATH Risk Probability: 56.3%
  - READ Risk Probability: 41.9%
  - SCIE Risk Probability: 45.7%
🧩 Fuzzy Logic Risk Modifier: 21.0/100
📋 FINAL ADVISORY:
  ✅ Student is generally stable.


### Case 4: "The Reading Risk" (Rico)
**Hypothesis:** A boy who looks "average" in Math might be failing Reading due to the gender gap.

**Tests:** Multi-Model comparison (Math vs. Reading Risk).

In [43]:
rico = {
    'Name': 'Rico (Gender Gap)',
    'ANXMAT': 2.5,       # Average Anxiety
    'BELONG': -0.5,      # Slightly detached
    'ESCS': -0.5,        # Lower Middle Class
    'TEACHSUP': 0.0,
    'BULLIED': 0.5,      # Slight bullying
    'ICTRES': 0.0,
    'Gender': 'Male'     # Risk Factor for Reading
}

analyze_student(rico)


--- Analyzing Student Profile: Rico (Gender Gap) ---
🧠 Neural predictions:
  - MATH Risk Probability: 63.0%
  - READ Risk Probability: 43.2%
  - SCIE Risk Probability: 58.2%
🧩 Fuzzy Logic Risk Modifier: 24.9/100
📋 FINAL ADVISORY:
  ✅ Student is generally stable.


## Export the Brain

In [44]:
import joblib

# 1. Define the Master Dictionary
system_export = {
    'neural_models': models,          # The 3 XGBoost Brains
    'fuzzy_brain': risk_ctrl,         # The Fuzzy Logic Rules (ControlSystem, not Simulation)
    'features': model_features,       # The list of columns ['ANXMAT', 'BELONG'...]
    'thresholds': {                   # Your Decision Logic Cutoffs
        'critical_risk': 70, 
        'psychosocial_trigger': 60
    }
}

# 2. Save to a single file
joblib.dump(system_export, "NeuroSymbolic_System_v1.pkl")

print("🎉 SYSTEM SAVED: 'NeuroSymbolic_System_v1.pkl'")
print("You can now load this file in any Python app (Streamlit, Flask, etc.) to run the Advisor.")

🎉 SYSTEM SAVED: 'NeuroSymbolic_System_v1.pkl'
You can now load this file in any Python app (Streamlit, Flask, etc.) to run the Advisor.
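
When loading the export elsewhere, two things are worth keeping in mind. The file stores the `ControlSystem`, so the app needs to wrap it in a fresh `ctrl.ControlSystemSimulation` before calling `compute()`. Also, pickle-based files should only be loaded from trusted sources, because unpickling can execute arbitrary code. The sketch below shows a hypothetical `validate_export` check that an app could run right after loading. To stay self-contained it round-trips a stand-in dictionary with the standard-library `pickle` module rather than `joblib` and the real models.

```python
import os
import pickle
import tempfile

REQUIRED_KEYS = {'neural_models', 'fuzzy_brain', 'features', 'thresholds'}

def validate_export(system):
    """Check that a loaded export has the keys the Advisor relies on."""
    missing = REQUIRED_KEYS - set(system)
    if missing:
        raise KeyError(f"export is missing keys: {sorted(missing)}")
    for name in ('critical_risk', 'psychosocial_trigger'):
        if name not in system['thresholds']:
            raise KeyError(f"missing threshold: {name}")
    return True

# Stand-in payload: strings replace the real XGBoost models and ControlSystem
stand_in = {
    'neural_models': {'PV1MATH': 'model-placeholder'},
    'fuzzy_brain': 'control-system-placeholder',
    'features': ['ANXMAT', 'BELONG', 'ESCS', 'TEACHSUP', 'BULLIED', 'ICTRES', 'Gender_Num'],
    'thresholds': {'critical_risk': 70, 'psychosocial_trigger': 60},
}

# Write and re-read the payload in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'stand_in_export.pkl')
    with open(path, 'wb') as f:
        pickle.dump(stand_in, f)
    with open(path, 'rb') as f:
        loaded = pickle.load(f)

print(validate_export(loaded))
```

In a real app the same check would follow `joblib.load("NeuroSymbolic_System_v1.pkl")`, with `xgboost` and `scikit-fuzzy` installed in the loading environment so the stored objects can be reconstructed.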
