# MAT-119 Curriculum & Student Performance Analysis (Portfolio-Safe)

**Purpose:** Demonstrate an end-to-end analytics workflow on LMS-style assessment data **without publishing any raw student records**.

This notebook is intentionally designed to be public:
- It does **not** include or load any raw institutional data by default.
- It can run on **synthetic sample data** to show methodology and outputs.
- When running privately, you may point the pipeline to your own local LMS export.

## Outputs
- Clean numeric dataset (local only)
- Predictor correlations with the final score (ranked by absolute Spearman; Pearson shown for reference)
- Focused correlation heatmap
- Chapter-level HW/Quiz aggregates + correlations


## 1) Setup
This project uses a script-based pipeline:
- `scripts/clean_data.py`
- `scripts/correlation_analysis.py`
- `scripts/chapter_aggregation.py`

Below we demonstrate the workflow using **synthetic data**. Replace the synthetic generation step with a local file path if running privately.


In [None]:
import numpy as np
import pandas as pd
from pathlib import Path

# To run on private local data, set a path here (DO NOT commit the file).
# Example: LOCAL_DATA_PATH = Path('data/private_export.csv')
LOCAL_DATA_PATH = None  # None means the synthetic demo data is used


## 2) Create synthetic LMS-style data (public demo)
This mimics a realistic LMS export with:
- pre-test
- attendance
- tutoring
- chapter homework + quiz columns
- final score as a function of fundamentals + engagement
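
The last bullet can be written out explicitly. Reading the coefficients off the generator below (this describes the synthetic demo only, not a model fitted to real students; the symbols $\text{hw}_k$ and $\varepsilon$ are notation introduced here for illustration):

$$
\text{final} = \operatorname{clip}\Big(0.20\,\text{pre} + 0.20\,\text{attend} + 10\cdot\mathbf{1}[\text{tutor} > 0] + 0.15\,\text{hw}_2 + 0.15\,\text{hw}_3 + 0.10\,\text{hw}_4 + 0.10\,\text{hw}_5 + \varepsilon,\ 0,\ 100\Big), \qquad \varepsilon \sim \mathcal{N}(0,\ 6^2)
$$

Here $\text{hw}_k$ is the chapter $k$ homework score, and any use of tutoring adds a flat 10 points regardless of the tutoring value itself. Quiz scores do not enter the final score directly; they are related to it only through the homework they are generated from.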


In [None]:
def make_synthetic_lms(n=200, seed=7):
    rng = np.random.default_rng(seed)
    pre = rng.normal(60, 12, n).clip(0, 100)
    attend = rng.normal(80, 10, n).clip(0, 100)
    # About 35% of students use tutoring (value drawn around 85); everyone else gets 0
    tutor = rng.binomial(1, 0.35, n) * rng.normal(85, 8, n).clip(0, 100)

    # Foundational chapters influence later chapters
    ch12 = (pre * 0.55 + rng.normal(20, 10, n)).clip(0, 100)
    ch13 = (pre * 0.50 + rng.normal(25, 10, n)).clip(0, 100)
    ch14 = (pre * 0.48 + rng.normal(28, 11, n)).clip(0, 100)
    ch15 = (pre * 0.46 + rng.normal(30, 12, n)).clip(0, 100)

    ch2_hw = (0.35*attend + 0.50*(ch12+ch13+ch14+ch15)/4 + rng.normal(10, 8, n)).clip(0, 100)
    ch2_qz = (0.40*ch2_hw + rng.normal(35, 12, n)).clip(0, 100)

    ch3_hw = (0.55*ch2_hw + rng.normal(20, 10, n)).clip(0, 100)
    ch3_qz = (0.50*ch3_hw + rng.normal(25, 11, n)).clip(0, 100)

    ch4_hw = (0.60*ch3_hw + rng.normal(18, 10, n)).clip(0, 100)
    ch4_qz = (0.50*ch4_hw + rng.normal(22, 11, n)).clip(0, 100)

    ch5_hw = (0.60*ch4_hw + rng.normal(18, 10, n)).clip(0, 100)
    ch5_qz = (0.45*ch5_hw + rng.normal(28, 12, n)).clip(0, 100)

    # Final score combines fundamentals + engagement + later chapter mastery
    final = (
        0.20*pre + 0.20*attend + 0.10*(tutor>0).astype(float)*100 +
        0.15*ch2_hw + 0.15*ch3_hw + 0.10*ch4_hw + 0.10*ch5_hw +
        rng.normal(0, 6, n)
    ).clip(0, 100)

    df = pd.DataFrame({
        'Final Score': final,
        'Roll Call Attendance (demo)': attend,
        'Tutoring (demo)': tutor,
        'MAT 119 Pre-Test (demo)': pre,
        'Chapter 1.2 -Real Numbers: HW (demo)': ch12,
        'Chapter 1.3 Operations with Real Numbers HW (demo)': ch13,
        'Chapter 1.4 Simplifying Algebraic Exp.: HW (demo)': ch14,
        'Chapter 1.5 Solving Linear Equations : HW (demo)': ch15,
        'Chapter 2.1 Graphs: HW (demo)': ch2_hw,
        'Chapter 2 Quiz (demo)': ch2_qz,
        'Chapter 3.1 Solving Systems by Graphing :HW (demo)': ch3_hw,
        'Chapter 3 Quiz (demo)': ch3_qz,
        'Chapter 4.1 Solving Linear Inequalities : HW (demo)': ch4_hw,
        'Chapter 4 Quiz (demo)': ch4_qz,
        'Chapter 5.1 Exponents HW (demo)': ch5_hw,
        'Chapter 5 Quiz (demo)': ch5_qz,
    })
    return df

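# Use the private export when a path is set; otherwise fall back to synthetic data.
# Assumes the export is a CSV, as in the example path above.
df = pd.read_csv(LOCAL_DATA_PATH) if LOCAL_DATA_PATH is not None else make_synthetic_lms()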
df.head()

## 3) Run the pipeline (public demo)
The cell below applies the same cleaning and correlation logic used on real LMS exports; chapter aggregation is handled by `scripts/chapter_aggregation.py`.


In [None]:
import sys
sys.path.append(str(Path('..').resolve() / 'scripts'))

from clean_data import CleanConfig, clean_lms_export, build_predictor_view
from correlation_analysis import corr_with_final

cfg = CleanConfig(final_col='Final Score')
clean_numeric = clean_lms_export(df, cfg)
predictors_only = build_predictor_view(clean_numeric, cfg)

spearman = corr_with_final(predictors_only, 'Final Score', 'spearman').sort_values(key=lambda s: s.abs(), ascending=False)
pearson  = corr_with_final(predictors_only, 'Final Score', 'pearson').reindex(spearman.index)

pd.DataFrame({'Spearman': spearman, 'Pearson': pearson}).head(10)
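
The body of `corr_with_final` lives in `scripts/correlation_analysis.py` and is not shown in this notebook. As a rough illustration of the kind of computation involved, here is a minimal sketch under stated assumptions: the function name `corr_with_target_sketch`, its signature, and the toy data are all hypothetical and are not the project's actual code. It relies on the fact that Spearman's rho equals Pearson's r computed on ranks.

```python
import pandas as pd


def corr_with_target_sketch(frame, target, method="spearman"):
    """Correlate every numeric column with `target` (hypothetical helper)."""
    numeric = frame.select_dtypes(include="number")
    if method == "spearman":
        # Spearman's rho is Pearson's r on ranks (ties receive average ranks).
        # Ranks are computed per column, so with missing values this can differ
        # slightly from a pairwise-complete Spearman.
        numeric = numeric.rank()
    elif method != "pearson":
        raise ValueError(f"unsupported method: {method!r}")
    predictors = numeric.drop(columns=[target])
    return predictors.corrwith(numeric[target])


# Toy data (made up): target is a monotone but nonlinear function of x.
demo = pd.DataFrame({"x": range(1, 11)})
demo["neg_x"] = -demo["x"]
demo["target"] = demo["x"] ** 3

rho = corr_with_target_sketch(demo, "target", "spearman")
r = corr_with_target_sketch(demo, "target", "pearson")

# Rank by absolute Spearman and align Pearson to the same order.
ranked = rho.sort_values(key=lambda s: s.abs(), ascending=False)
print(pd.DataFrame({"Spearman": ranked, "Pearson": r.reindex(ranked.index)}))
```

On the toy data the Spearman values have magnitude 1 because the relationships are perfectly monotone, while Pearson is lower for `x` because the relationship is not linear. That gap is one reason to rank by Spearman and keep Pearson only as a reference.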

## 4) Key takeaways (example)
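In a real analysis, translate the ranked correlations into decision-ready insights. The examples below are illustrative; in the synthetic demo these relationships are built into the generator, so they are not findings:
- Early foundational mastery (pre-test, Chapter 1 homework) correlates with downstream chapter performance
- Engagement measures (attendance, tutoring) are candidate intervention levers, though correlation alone does not establish causation
- Chapter-level aggregation is easier to interpret than raw per-assignment features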
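
Chapter-level aggregation could look roughly like the sketch below. The real logic lives in `scripts/chapter_aggregation.py`, which is not shown here, so the function name `aggregate_by_chapter_sketch`, the regular expression, and the output column names are assumptions made for illustration. The pattern is based only on the demo column names used in this notebook, and the sample values are made up.

```python
import re

import pandas as pd

# Assumed naming pattern, taken from the demo columns:
# "Chapter <n>[.<m>] <title> HW|Quiz ..."
CHAPTER_RE = re.compile(r"^Chapter\s+(\d+)(?:\.\d+)?\b.*?\b(HW|Quiz)\b", re.IGNORECASE)


def aggregate_by_chapter_sketch(frame):
    """Average HW and Quiz columns per chapter (hypothetical helper)."""
    groups = {}
    for col in frame.columns:
        match = CHAPTER_RE.search(str(col))
        if match is None:
            continue  # skip columns such as the final score or attendance
        chapter = int(match.group(1))
        kind = "HW" if match.group(2).upper() == "HW" else "Quiz"
        groups.setdefault(f"Ch{chapter} {kind} mean", []).append(col)
    # Row-wise mean over all columns that belong to the same chapter and kind.
    return pd.DataFrame({name: frame[cols].mean(axis=1) for name, cols in groups.items()})


# Made-up sample rows using the demo naming convention.
demo = pd.DataFrame({
    "Final Score": [88.0, 71.0],
    "Chapter 1.2 -Real Numbers: HW (demo)": [80.0, 60.0],
    "Chapter 1.3 Operations with Real Numbers HW (demo)": [90.0, 70.0],
    "Chapter 2 Quiz (demo)": [50.0, 40.0],
})
chapter_means = aggregate_by_chapter_sketch(demo)
print(chapter_means)
```

Collapsing several per-section homework columns into one chapter-level mean leaves fewer, more stable features to correlate with the final score.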


## Running privately on institutional data
1) Place your export under `data/` (for example, `data/private_export.csv`) and do not commit it.
2) Run:

```bash
python scripts/clean_data.py --input data/private_export.csv \
  --out outputs/clean_numeric.parquet \
  --predictors-out outputs/predictors_only.parquet

python scripts/correlation_analysis.py --input outputs/predictors_only.parquet
python scripts/chapter_aggregation.py --input outputs/predictors_only.parquet
```

Commit only aggregated outputs and figures. The row-level files (`outputs/clean_numeric.parquet`, `outputs/predictors_only.parquet`) should stay local.
