# 📊 Consistency Audit: Raw Data vs AI Features

This notebook verifies that the Feature Engineering pipeline (`features.py`)
correctly transforms the raw OULAD data into the tensors consumed by the Autoencoder.

**Audited dimensions:**
1. 🕐 Week-by-week temporal alignment (clicks, scores, procrastination)
2. 📋 Static variables (credits, gender, disability, attempts, IMD, education, age)
3. 🔄 Consistency across splits (Training / Validation / Test)

**Methodology:** Five students are selected at random and, field by field, the raw Moodle value is compared with the processed value in the AI tensor.

## 1. Data Loading
We load both the raw data (interactions, assessments, students) and the processed features (clicks, performance, procrastination, static).

In [1]:
import pandas as pd
import numpy as np

path_raw = "../data/processed/training"
path_feat = "../data/processed/training/features"

# Raw Moodle data
df_inter = pd.read_csv(f"{path_raw}/interactions.csv")
df_assess = pd.read_csv(f"{path_raw}/assessments.csv")
df_stud = pd.read_csv(f"{path_raw}/students.csv")

# Features processed by the pipeline
df_clicks = pd.read_csv(f"{path_feat}/ts_clicks.csv", index_col=0)
df_perf = pd.read_csv(f"{path_feat}/ts_performance.csv", index_col=0)
df_proc = pd.read_csv(f"{path_feat}/ts_procrastination.csv", index_col=0)
df_static = pd.read_csv(f"{path_feat}/static_features.csv", index_col=0)
df_target = pd.read_csv(f"{path_feat}/target.csv", index_col=0)

# Compute 'week' EXACTLY as the pipeline does (date // 7)
df_inter['week'] = df_inter['date'] // 7
df_assess['week_sub'] = pd.to_numeric(df_assess['date_submitted'], errors='coerce') // 7

# Constants identical to features.py
TIPO_CONTENT = ['oucontent', 'resource', 'url', 'page', 'subpage']
WEEK_START, WEEK_END = -2, 35

# Maps identical to features.py (used to validate the static variables)
IMD_MAP = {
    '0-10%': 0, '10-20%': 1, '20-30%': 2, '30-40%': 3, '40-50%': 4,
    '50-60%': 5, '60-70%': 6, '70-80%': 7, '80-90%': 8, '90-100%': 9
}
EDUCATION_MAP = {
    'No Formal quals': 0, 'Lower Than A Level': 1,
    'A Level or Equivalent': 2, 'HE Qualification': 3,
    'Post Graduate Qualification': 4
}
AGE_MAP = {'0-35': 0, '35-55': 1, '55<=': 2}
GENDER_MAP = {'M': 0, 'F': 1}
DISABILITY_MAP = {'N': 0, 'Y': 1}

# Reproducible sample of 5 students
np.random.seed(42)
ALUMNOS_AUDITADOS = np.random.choice(df_clicks.index, 5, replace=False)

print(f"✅ Data loaded: {len(df_clicks)} students in Training")
print(f"   Clicks: {df_clicks.shape[1]} cols | Perf: {df_perf.shape[1]} cols | Proc: {df_proc.shape[1]} cols | Static: {df_static.shape[1]} cols")
print(f"\n🎯 Students selected for the audit:")
for i, sid in enumerate(ALUMNOS_AUDITADOS):
    print(f"   {i+1}. {sid}")

✅ Data loaded: 22785 students in Training
   Clicks: 160 cols | Perf: 240 cols | Proc: 40 cols | Static: 10 cols

🎯 Students selected for the audit:
   1. 134212_EEE_2014J
   2. 695075_BBB_2014J
   3. 412081_BBB_2014J
   4. 682298_FFF_2014J
   5. 609764_BBB_2013J


---
## 2. 🕐 Temporal Audit (Week by Week)

For each student, we compare **week by week** (prev, -2, -1, 0, 1, ..., 35, post):

| Field | What is compared | Success criterion |
|---|---|---|
| **Content clicks** | raw `sum_click` vs the tensor's `content_w_X` | If raw > 0, AI must be > 0 (and vice versa) |
| **TMA score** | raw `score` vs the tensor's `TMA_avg_w_X` | Visual inspection only (the cumulative average may differ from the single score) |
| **Procrastination** | `date - date_submitted` vs `days_early_w_X` | Visual inspection only |
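
The `TMA_avg_w_X` value can differ from the score of a single week. The sketch below shows one possible reading, assuming the pipeline stores the running mean of all TMA scores submitted so far, divided by 100 and carried forward through weeks without a submission. The function name `running_tma_avg` and the toy scores are hypothetical, and `features.py` may compute this differently.

```python
import numpy as np

def running_tma_avg(scores_by_week, week_start=-2, week_end=35):
    """Return {week: running mean of TMA scores / 100}, carried forward.

    Assumption: weeks before the first submission hold 0.0.
    """
    seen = []
    out = {}
    for w in range(week_start, week_end + 1):
        if w in scores_by_week:
            seen.append(scores_by_week[w] / 100)
        # Without a new score the list is unchanged, so the mean carries over.
        out[w] = float(np.mean(seen)) if seen else 0.0
    return out

# Hypothetical student: 90 in week 4, 70 in week 10.
avg = running_tma_avg({4: 90, 10: 70})
print(avg[3], avg[4], avg[5], avg[10])
```

In the audit output for student 134212_EEE_2014J, the score of 90 in `w_4` appears as 0.9000 and stays at 0.9000 in `w_5` and `w_6`, which is consistent with this reading.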

The week is computed as `date // 7`, identical to the pipeline.
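
Floor division matters for days before the course start: day -1 belongs to week -1, not week 0. A minimal sketch of the bucketing and of the column suffixes used in this notebook (`w_prev`, `w_neg2`, `w_0`, ..., `w_35`, `w_post`); the helper name `week_column` is hypothetical.

```python
WEEK_START, WEEK_END = -2, 35

def week_column(date):
    """Map a day offset to the week suffix used in the tensor column names."""
    week = date // 7  # floor division, so negative days round towards -infinity
    if week < WEEK_START:
        return "w_prev"
    if week > WEEK_END:
        return "w_post"
    return f"w_{week}" if week >= 0 else f"w_neg{abs(week)}"

for day in (-15, -14, -1, 0, 6, 7, 251, 252):
    print(day, week_column(day))
```

Days -14 and 251 are the first and last days that fall inside the core range; anything outside is pooled into `w_prev` or `w_post`.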

In [2]:
def auditar_temporal(student_id):
    """Audit the week-by-week temporal alignment for one student."""
    id_s, mod, pres = student_id.split('_')
    id_s_int = int(id_s)

    inter = df_inter[(df_inter['id_student']==id_s_int) & (df_inter['code_module']==mod) & (df_inter['code_presentation']==pres)]
    assess = df_assess[(df_assess['id_student']==id_s_int) & (df_assess['code_module']==mod) & (df_assess['code_presentation']==pres)]

    t_clicks = df_clicks.loc[student_id]
    t_perf = df_perf.loc[student_id]
    t_proc = df_proc.loc[student_id]

    print(f"\n{'='*130}")
    print(f"🔬 STUDENT: {student_id}")
    print(f"{'='*130}")
    print(f"{'WEEK':<8} | {'CLICKS_RAW':<10} | {'CLICKS_AI':<10} | {'SCORE_RAW':<10} | {'SCORE_AI':<10} | {'PROC_RAW':<10} | {'PROC_AI':<10} | {'OK?'}")
    print(f"{'-'*130}")

    errores = 0

    # --- PREV (weeks < -2) ---
    raw_c = inter[(inter['week'] < WEEK_START) & (inter['activity_type'].isin(TIPO_CONTENT))]['sum_click'].sum()
    ia_c = t_clicks['content_w_prev']
    ok = (raw_c == 0 and ia_c == 0) or (raw_c > 0 and ia_c > 0)
    if not ok: errores += 1
    if raw_c > 0 or ia_c > 0:
        print(f"{'prev':<8} | {raw_c:<10.0f} | {ia_c:<10.4f} | {'-':<10} | {t_perf['TMA_avg_w_prev']:<10.4f} | {'-':<10} | {t_proc['days_early_w_prev']:<10.4f} | {'✅' if ok else '❌'}")

    # --- CORE WEEKS (-2 to 35) ---
    for w in range(WEEK_START, WEEK_END + 1):
        col = f'w_{w}' if w >= 0 else f'w_neg{abs(w)}'

        raw_c = inter[(inter['week']==w) & (inter['activity_type'].isin(TIPO_CONTENT))]['sum_click'].sum()
        ia_c = t_clicks[f'content_{col}']
        ok_c = (raw_c == 0 and ia_c == 0) or (raw_c > 0 and ia_c > 0)

        notas_w = assess[(assess['week_sub']==w) & (assess['assessment_type']=='TMA')]
        raw_n = f"{notas_w['score'].values[0]:.0f}" if not notas_w.empty else "-"
        ia_n = t_perf[f'TMA_avg_{col}']

        proc_w = assess[assess['week_sub']==w].dropna(subset=['date','date_submitted'])
        raw_p = f"{(proc_w['date'] - proc_w['date_submitted']).values[0]:.0f}" if not proc_w.empty else "-"
        ia_p = t_proc[f'days_early_{col}']

        if not ok_c: errores += 1

        if raw_c > 0 or raw_n != "-" or ia_c > 0:
            print(f"{col:<8} | {raw_c:<10.0f} | {ia_c:<10.4f} | {raw_n:<10} | {ia_n:<10.4f} | {raw_p:<10} | {ia_p:<10.4f} | {'✅' if ok_c else '❌'}")

    # --- POST (weeks > 35) ---
    raw_c = inter[(inter['week'] > WEEK_END) & (inter['activity_type'].isin(TIPO_CONTENT))]['sum_click'].sum()
    ia_c = t_clicks['content_w_post']
    ok = (raw_c == 0 and ia_c == 0) or (raw_c > 0 and ia_c > 0)
    if not ok: errores += 1
    if raw_c > 0 or ia_c > 0:
        print(f"{'post':<8} | {raw_c:<10.0f} | {ia_c:<10.4f} | {'-':<10} | {t_perf['TMA_avg_w_post']:<10.4f} | {'-':<10} | {t_proc['days_early_w_post']:<10.4f} | {'✅' if ok else '❌'}")

    print(f"{'-'*130}")
    print(f"📊 Temporal alignment errors: {errores}")
    return errores


# --- RUN ON THE 5 STUDENTS ---
total_err_temp = 0
for sid in ALUMNOS_AUDITADOS:
    total_err_temp += auditar_temporal(sid)

print(f"\n{'='*130}")
print(f"🏆 TEMPORAL RESULT: {total_err_temp} errors across {len(ALUMNOS_AUDITADOS)} students → {'✅ OK' if total_err_temp == 0 else '❌ REVIEW'}")


🔬 STUDENT: 134212_EEE_2014J
WEEK     | CLICKS_RAW | CLICKS_AI  | SCORE_RAW  | SCORE_AI   | PROC_RAW   | PROC_AI    | OK?
----------------------------------------------------------------------------------------------------------------------------------
w_neg1   | 21         | 0.3631     | -          | 0.0000     | -          | 0.6019     | ✅
w_0      | 31         | 0.4071     | -          | 0.0000     | -          | 0.6019     | ✅
w_1      | 110        | 0.5533     | -          | 0.0000     | -          | 0.6019     | ✅
w_2      | 93         | 0.5337     | -          | 0.0000     | -          | 0.6019     | ✅
w_3      | 46         | 0.4523     | -          | 0.0000     | -          | 0.6019     | ✅
w_4      | 55         | 0.4729     | 90         | 0.9000     | 1          | 0.6036     | ✅
w_5      | 4          | 0.1891     | -          | 0.9000     | -          | 0.6019     | ✅
w_6      | 41         | 0.4391     | -          | 0.9000     | -          | 0.6019     | ✅
w
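
The click check above only confirms that raw and processed values are both zero or both positive. The rows printed for student 134212_EEE_2014J allow a stricter test. The sketch below assumes the click features are `log1p(raw)` divided by a constant; this is an inference from the printed numbers, not something confirmed by `features.py`.

```python
import numpy as np

# Values copied from the audit rows of student 134212_EEE_2014J (w_neg1 .. w_6).
raw = np.array([21, 31, 110, 93, 46, 55, 4, 41])
ai = np.array([0.3631, 0.4071, 0.5533, 0.5337, 0.4523, 0.4729, 0.1891, 0.4391])

# If ai == log1p(raw) / k for a fixed k, this ratio is the same in every week.
ratios = np.log1p(raw) / ai
print(ratios.round(3))
print("spread:", round(float(ratios.max() - ratios.min()), 4))
```

The ratios all land close to 8.51, with differences attributable to the four-decimal rounding of the printed values. A week whose ratio departs clearly from the rest would point to a misaligned or mis-scaled value that the sign-only check cannot detect.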

---
## 3. 📋 Static Variables Audit

For each student, we compare **all** the variables of the static profile:

| Variable | Moodle field | AI field | Validation |
|---|---|---|---|
| **Credits** | `studied_credits` | `credits` | Scaled: if raw > 0 → AI > 0 |
| **Attempts** | `num_of_prev_attempts` | `num_of_prev_attempts` | Scaled: both 0 or both > 0 |
| **Module length** | `module_presentation_length` | `module_presentation_length` | Scaled: if raw > 0 → AI > 0 |
| **Registration date** | `date_registration` | `date_registration` | No check applied; the raw value can be negative (registration before the course starts) |
| **IMD Band** | `imd_band` (e.g. '50-60%') | `imd_band_numeric` (e.g. 5) | Ordinal mapping 0-9 (checked only as value ≥ 0) |
| **Education** | `highest_education` | `education_level` | Ordinal mapping 0-4 (checked only as value ≥ 0) |
| **Age** | `age_band` | `age_numeric` | Ordinal mapping 0-2 (checked only as value ≥ 0) |
| **Gender** | `gender` (M/F) | `gender_bool` (0/1) | **Exact**: M→0, F→1 |
| **Disability** | `disability` (N/Y) | `disability_bool` (0/1) | **Exact**: N→0, Y→1 |
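
Only gender and disability are validated exactly; the ordinal fields are checked for sign alone. A tighter check could compare the stored value with the expected scaled code. The sketch below assumes the pipeline divides the ordinal code by the largest code in the map; both helper names are hypothetical, and the scaling used by `features.py` should be confirmed before adopting it.

```python
import math

EDUCATION_MAP = {
    'No Formal quals': 0, 'Lower Than A Level': 1,
    'A Level or Equivalent': 2, 'HE Qualification': 3,
    'Post Graduate Qualification': 4
}

def expected_scaled(raw, mapping):
    """Ordinal code divided by the largest code (assumed scaling)."""
    return mapping[raw] / max(mapping.values())

def check_exact(raw, ai_value, mapping, tol=1e-4):
    """Return True/False for mapped categories, None when raw is unmapped."""
    if raw not in mapping:
        return None  # nothing to compare against
    return math.isclose(expected_scaled(raw, mapping), ai_value, abs_tol=tol)

print(check_exact('A Level or Equivalent', 0.5, EDUCATION_MAP))
print(check_exact('A Level or Equivalent', 0.25, EDUCATION_MAP))
print(check_exact('Unknown', 0.0, EDUCATION_MAP))
```

The first call mirrors the audited student 134212_EEE_2014J, whose `education_level` is printed as 0.5000. The same student's `imd_band_numeric` is printed as 0.3000 for '20-30%', which does not equal 2/9, so the IMD field appears to be scaled differently and would need its own expected-value rule.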

In [3]:
def auditar_estaticas(student_id):
    """Audit ALL static variables (exact checks for binary fields, sign checks for scaled ones)."""
    id_s, mod, pres = student_id.split('_')
    s_raw = df_stud[
        (df_stud['id_student']==int(id_s)) &
        (df_stud['code_module']==mod) &
        (df_stud['code_presentation']==pres)
    ].iloc[0]

    ia = df_static.loc[student_id]

    checks = {
        'Credits':        (s_raw['studied_credits'], ia['credits'],
                           lambda r, i: i > 0 or r == 0),
        'Attempts':       (s_raw['num_of_prev_attempts'], ia['num_of_prev_attempts'],
                           lambda r, i: (r == 0 and i == 0) or (r > 0 and i > 0)),
        'Mod. Length':    (s_raw['module_presentation_length'], ia['module_presentation_length'],
                           lambda r, i: i > 0 or r == 0),
        'Reg. Date':      (s_raw['date_registration'], ia['date_registration'],
                           lambda r, i: True),  # displayed only, never fails
        'IMD Band':       (s_raw['imd_band'], ia['imd_band_numeric'],
                           lambda r, i: (str(r) in IMD_MAP and i >= 0) or (str(r) not in IMD_MAP)),
        'Education':      (s_raw['highest_education'], ia['education_level'],
                           lambda r, i: (r in EDUCATION_MAP and i >= 0) or (r not in EDUCATION_MAP)),
        'Age':            (s_raw['age_band'], ia['age_numeric'],
                           lambda r, i: (r in AGE_MAP and i >= 0) or (r not in AGE_MAP)),
        'Gender':         (s_raw['gender'], ia['gender_bool'],
                           lambda r, i: (r == 'M' and i == 0) or (r == 'F' and i == 1)),
        'Disability':     (s_raw['disability'], ia['disability_bool'],
                           lambda r, i: (r == 'N' and i == 0) or (r == 'Y' and i == 1)),
    }

    print(f"\n{'='*70}")
    print(f"📋 STUDENT: {student_id}")
    print(f"{'='*70}")
    print(f"{'VARIABLE':<15} | {'MOODLE (Raw)':<25} | {'AI (Tensor)':<12} | {'OK?'}")
    print(f"{'-'*70}")
    errores = 0
    for label, (raw, ia_val, check_fn) in checks.items():
        ok = check_fn(raw, ia_val)
        if not ok: errores += 1
        print(f"{label:<15} | {str(raw):<25} | {ia_val:<12.4f} | {'✅' if ok else '❌'}")
    print(f"{'-'*70}")
    print(f"Static errors: {errores}")
    return errores


# --- RUN ON THE SAME 5 STUDENTS ---
total_err_stat = 0
for sid in ALUMNOS_AUDITADOS:
    total_err_stat += auditar_estaticas(sid)

print(f"\n{'='*70}")
print(f"🏆 STATIC RESULT: {total_err_stat} errors across {len(ALUMNOS_AUDITADOS)} students → {'✅ OK' if total_err_stat == 0 else '❌ REVIEW'}")


📋 STUDENT: 134212_EEE_2014J
VARIABLE        | MOODLE (Raw)              | AI (Tensor)  | OK?
----------------------------------------------------------------------
Credits         | 60                        | 0.0480       | ✅
Attempts        | 0                         | 0.0000       | ✅
Mod. Length     | 269                       | 1.0000       | ✅
Reg. Date       | -37.0                     | 0.6382       | ✅
IMD Band        | 20-30%                    | 0.3000       | ✅
Education       | A Level or Equivalent     | 0.5000       | ✅
Age             | 0-35                      | 0.0000       | ✅
Gender          | M                         | 0.0000       | ✅
Disability      | N                         | 0.0000       | ✅
----------------------------------------------------------------------
Static errors: 0

📋 STUDENT: 695075_BBB_2014J
VARIABLE        | MOODLE (Raw)              | AI (Tensor)  | OK?
--------------------------------------------------------

---
## 4. 🔄 Consistency across Splits

We verify that the 3 splits (Training, Validation, Test) have:
- The same number of columns (tensor dimensions)
- Consistent credit ranges (the scaler is fitted on Training, so the Val/Test maximum need not be exactly 1.0: it can fall below 1.0 or, unless values are clipped, above it)
- Similar means (a sign that the stratification is correct)
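
The three criteria above can also be expressed as programmatic checks rather than a table to read by eye. A minimal sketch on toy frames; the helper `compare_splits` and all values are hypothetical stand-ins for the `static_features.csv` of each split.

```python
import pandas as pd

def compare_splits(frames):
    """Return (same_columns, means) for a dict of split name -> DataFrame."""
    reference = list(next(iter(frames.values())).columns)
    same_columns = all(list(df.columns) == reference for df in frames.values())
    means = {name: float(df['credits'].mean()) for name, df in frames.items()}
    return same_columns, means

# Toy frames with hypothetical values; the real ones come from each split's CSV.
frames = {
    'training': pd.DataFrame({'credits': [0.0, 0.5, 1.0], 'gender_bool': [0, 1, 0]}),
    'validation': pd.DataFrame({'credits': [0.1, 0.8], 'gender_bool': [1, 0]}),
    'test': pd.DataFrame({'credits': [0.2, 1.2], 'gender_bool': [1, 1]}),
}
same_columns, means = compare_splits(frames)
print(same_columns, means)
```

Comparing column lists, not just counts, also catches splits that have the same width but a different column order. The toy test split deliberately contains a value above 1.0 to show what an unclipped scaler fitted on Training can produce.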

In [4]:
print("📊 SPLIT CONSISTENCY")
print(f"{'SPLIT':<12} | {'STUDENTS':<8} | {'CRED_MAX':<10} | {'CRED_MEAN':<10} | {'CLICKS_COLS':<12} | {'PERF_COLS':<10}")
print("-" * 75)

for split in ['training', 'validation', 'test']:
    p = f"../data/processed/{split}/features"
    try:
        s = pd.read_csv(f"{p}/static_features.csv", index_col=0)
        c = pd.read_csv(f"{p}/ts_clicks.csv", index_col=0)
        pf = pd.read_csv(f"{p}/ts_performance.csv", index_col=0)
        print(f"{split:<12} | {len(s):<8} | {s['credits'].max():<10.4f} | {s['credits'].mean():<10.4f} | {c.shape[1]:<12} | {pf.shape[1]:<10}")
    except Exception as e:
        print(f"{split:<12} | ⚠️ Error: {e}")

📊 SPLIT CONSISTENCY
SPLIT        | STUDENTS | CRED_MAX   | CRED_MEAN  | CLICKS_COLS  | PERF_COLS 
---------------------------------------------------------------------------
training     | 22785    | 1.0000     | 0.0798     | 160          | 240       
validation   | 4889     | 0.8160     | 0.0800     | 160          | 240       
test         | 4919     | 0.5760     | 0.0785     | 160          | 240       


---
## 5. 🏆 Final Verdict

Overall summary of all the checks performed.

In [5]:
total = total_err_temp + total_err_stat
print(f"{'='*70}")
print(f"🏆 FINAL AUDIT VERDICT")
print(f"{'='*70}")
print(f"   Students audited:    {len(ALUMNOS_AUDITADOS)}")
print(f"   Temporal errors:     {total_err_temp}")
print(f"   Static errors:       {total_err_stat}")
print(f"   Total errors:        {total}")
print(f"{'='*70}")
if total == 0:
    print(f"   ✅ PIPELINE INTACT")
    print(f"   ✅ Raw data is transformed correctly into features")
    print(f"   ✅ The Autoencoder will receive features faithful to what Moodle recorded")
else:
    print(f"   ❌ {total} PROBLEMS FOUND - Review before training")

🏆 FINAL AUDIT VERDICT
   Students audited:    5
   Temporal errors:     0
   Static errors:       0
   Total errors:        0
   ✅ PIPELINE INTACT
   ✅ Raw data is transformed correctly into features
   ✅ The Autoencoder will receive features faithful to what Moodle recorded


In [1]:
import pandas as pd
import numpy as np

# ============================================================
# 🔎 FULL X-RAY OF ONE STUDENT
# ============================================================

path_raw = "../data/processed/training"
path_feat = "../data/processed/training/features"

df_inter = pd.read_csv(f"{path_raw}/interactions.csv")
df_assess = pd.read_csv(f"{path_raw}/assessments.csv")
df_stud = pd.read_csv(f"{path_raw}/students.csv")
df_clicks = pd.read_csv(f"{path_feat}/ts_clicks.csv", index_col=0)
df_perf = pd.read_csv(f"{path_feat}/ts_performance.csv", index_col=0)
df_proc = pd.read_csv(f"{path_feat}/ts_procrastination.csv", index_col=0)
df_static = pd.read_csv(f"{path_feat}/static_features.csv", index_col=0)
df_target = pd.read_csv(f"{path_feat}/target.csv", index_col=0)

# 1. FIND A STUDENT WITH FEW INTERACTIONS
# Count interactions per student and pick one with few
df_inter['unique_id'] = df_inter['id_student'].astype(str) + '_' + df_inter['code_module'] + '_' + df_inter['code_presentation']
conteo = df_inter.groupby('unique_id').size().sort_values()
# Pick one with between 3 and 10 records (easy to verify by hand)
candidatos = conteo[(conteo >= 3) & (conteo <= 10)]
alumno_id = candidatos.index[0]

id_s, mod, pres = alumno_id.split('_')
id_s_int = int(id_s)

# Filter ALL the data for this student in this course
inter = df_inter[(df_inter['id_student']==id_s_int) & (df_inter['code_module']==mod) & (df_inter['code_presentation']==pres)]
assess = df_assess[(df_assess['id_student']==id_s_int) & (df_assess['code_module']==mod) & (df_assess['code_presentation']==pres)]
stud = df_stud[(df_stud['id_student']==id_s_int) & (df_stud['code_module']==mod) & (df_stud['code_presentation']==pres)].iloc[0]

print("🔬" * 30)
print(f"  FULL X-RAY: {alumno_id}")
print("🔬" * 30)

# ============================================================
# PART A: RAW MOODLE DATA (what the student did)
# ============================================================
print(f"\n{'='*80}")
print(f"📱 PART A: WHAT MOODLE RECORDED (Raw Data)")
print(f"{'='*80}")

print(f"\n👤 STUDENT PROFILE:")
print(f"   Student:          {id_s_int}")
print(f"   Course:           {mod} ({pres})")
print(f"   Gender:           {stud['gender']}")
print(f"   Age:              {stud['age_band']}")
print(f"   Education:        {stud['highest_education']}")
print(f"   IMD Band:         {stud['imd_band']}")
print(f"   Credits:          {stud['studied_credits']}")
print(f"   Prev. attempts:   {stud['num_of_prev_attempts']}")
print(f"   Disability:       {stud['disability']}")
print(f"   Final result:     {stud['final_result']}")

print(f"\n📊 ALL INTERACTIONS ({len(inter)} records):")
print(f"   {'DAY':<6} | {'WEEK':<7} | {'ACTIVITY TYPE':<20} | {'CLICKS'}")
print(f"   {'-'*55}")
for _, row in inter.sort_values('date').iterrows():
    semana = int(row['date'] // 7)
    print(f"   {int(row['date']):<6} | {semana:<7} | {row['activity_type']:<20} | {int(row['sum_click'])}")

print(f"\n📝 ALL ASSESSMENTS ({len(assess)} records):")
if assess.empty:
    print("   (No assessments)")
else:
    print(f"   {'TYPE':<5} | {'SUBMIT DAY':<12} | {'DUE DAY':<10} | {'WEEK':<7} | {'SCORE':<6} | {'DAYS EARLY'}")
    print(f"   {'-'*65}")
    for _, row in assess.sort_values('date_submitted').iterrows():
        d_sub = row['date_submitted']
        d_lim = row['date']
        semana = int(d_sub // 7) if pd.notna(d_sub) else '?'
        early = int(d_lim - d_sub) if pd.notna(d_sub) and pd.notna(d_lim) else '?'
        print(f"   {row['assessment_type']:<5} | {str(d_sub):<12} | {str(d_lim):<10} | {str(semana):<7} | {row['score']:<6} | {early}")

# ============================================================
# PART B: WHAT THE AI SEES (Processed Tensor)
# ============================================================
print(f"\n\n{'='*80}")
print(f"🤖 PART B: WHAT THE AI SEES (Processed Tensor)")
print(f"{'='*80}")

# --- Static variables ---
ia = df_static.loc[alumno_id]
print(f"\n📋 STATIC VARIABLES:")
print(f"   {'VARIABLE':<20} | {'MOODLE':<25} | {'AI TENSOR':<12} | {'LOGIC'}")
print(f"   {'-'*75}")
print(f"   {'Credits':<20} | {stud['studied_credits']:<25} | {ia['credits']:<12.4f} | {'scaled 0-1'}")
print(f"   {'Attempts':<20} | {stud['num_of_prev_attempts']:<25} | {ia['num_of_prev_attempts']:<12.4f} | {'scaled 0-1'}")
print(f"   {'Mod. Length':<20} | {stud['module_presentation_length']:<25} | {ia['module_presentation_length']:<12.4f} | {'scaled 0-1'}")
print(f"   {'Reg. Date':<20} | {stud['date_registration']:<25} | {ia['date_registration']:<12.4f} | {'scaled 0-1'}")
print(f"   {'IMD Band':<20} | {str(stud['imd_band']):<25} | {ia['imd_band_numeric']:<12.4f} | {'ordinal 0-9 → scaled'}")
print(f"   {'Education':<20} | {str(stud['highest_education']):<25} | {ia['education_level']:<12.4f} | {'ordinal 0-4 → scaled'}")
print(f"   {'Age':<20} | {str(stud['age_band']):<25} | {ia['age_numeric']:<12.4f} | {'ordinal 0-2 → scaled'}")
print(f"   {'Gender':<20} | {stud['gender']:<25} | {ia['gender_bool']:<12.4f} | {'M→0, F→1'}")
print(f"   {'Disability':<20} | {stud['disability']:<25} | {ia['disability_bool']:<12.4f} | {'N→0, Y→1'}")

# --- Target ---
target_val = df_target.loc[alumno_id].values[0]
print(f"\n🎯 TARGET:")
print(f"   Moodle result: {stud['final_result']}")
print(f"   AI class:      {target_val}  (0=Pass, 1=Distinction, 2=Fail, 3=Withdrawn)")

# --- Clicks per week (only weeks with activity) ---
t_clicks = df_clicks.loc[alumno_id]
TIPO_CONTENT = ['oucontent', 'resource', 'url', 'page', 'subpage']

print(f"\n🖱️  CONTENT CLICKS (only weeks with activity):")
print(f"   {'WEEK':<8} | {'RAW (sum)':<12} | {'log1p(raw)':<12} | {'AI TENSOR':<12} | {'MATCH?'}")
print(f"   {'-'*65}")

# assign() returns a new frame, avoiding SettingWithCopyWarning on the filtered slice
inter = inter.assign(week=inter['date'] // 7)
for w in range(-2, 36):
    col = f'w_{w}' if w >= 0 else f'w_neg{abs(w)}'
    raw = inter[(inter['week']==w) & (inter['activity_type'].isin(TIPO_CONTENT))]['sum_click'].sum()
    ia_val = t_clicks[f'content_{col}']
    if raw > 0 or ia_val > 0:
        log_val = np.log1p(raw)
        ok = (raw == 0 and ia_val == 0) or (raw > 0 and ia_val > 0)
        print(f"   {col:<8} | {raw:<12.0f} | {log_val:<12.4f} | {ia_val:<12.4f} | {'✅' if ok else '❌'}")

# --- Scores ---
t_perf = df_perf.loc[alumno_id]
print(f"\n📝 TMA PERFORMANCE (only weeks with a score):")
print(f"   {'WEEK':<8} | {'RAW SCORE':<10} | {'AI (cum. avg)':<15} | {'EXPLANATION'}")
print(f"   {'-'*65}")

# Same reason as above: build the column on a new frame instead of mutating the slice
assess = assess.assign(week_sub=pd.to_numeric(assess['date_submitted'], errors='coerce') // 7)
notas_acum = []
for w in range(-2, 36):
    col = f'w_{w}' if w >= 0 else f'w_neg{abs(w)}'
    notas_w = assess[(assess['week_sub']==w) & (assess['assessment_type']=='TMA')]
    ia_val = t_perf[f'TMA_avg_{col}']
    if not notas_w.empty:
        nota = notas_w['score'].values[0]
        notas_acum.append(nota / 100)
        avg = np.mean(notas_acum)
        print(f"   {col:<8} | {nota:<10.1f} | {ia_val:<15.4f} | avg({[f'{n:.2f}' for n in notas_acum]}) = {avg:.4f}")
    elif ia_val != 0 and len(notas_acum) > 0:
        # Week without a score but with a value (the previous average is carried over)
        pass  # Not printed, to keep the output short

print(f"\n{'='*80}")
print(f"💡 INSTRUCTIONS: Compare the 'MOODLE' column with 'AI TENSOR'.")
print(f"   If everything matches, the pipeline is correct for this student.")
print(f"{'='*80}")

üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨
  RADIOGRAF√çA COMPLETA: 688028_CCC_2014J
üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨üî¨

üì± PARTE A: LO QUE MOODLE REGISTR√ì (Datos Brutos)

üë§ PERFIL DEL ALUMNO:
   Estudiante:       688028
   Curso:            CCC (2014J)
   G√©nero:           F
   Edad:             0-35
   Educaci√≥n:        A Level or Equivalent
   IMD Band:         20-30%
   Cr√©ditos:         120
   Intentos previos: 0
   Discapacidad:     N
   Resultado final:  Withdrawn

üìä TODAS SUS INTERACCIONES (3 registros):
   D√çA    | SEMANA  | TIPO ACTIVIDAD       | CLICS
   -------------------------------------------------------
   4      | 0       | url                  | 1
   4      | 0       | homepage             | 3
   4      | 0       | resource             | 1

📝 ALL ASSESSMENTS (0 records):
   (Sin ev