# 📘 Analytics-Ready Table: `student_course_analytics`

The `student_course_analytics` table is a **BI-ready unified dataset** built **entirely using the Gold Layer tables** of the OULAD Medallion Architecture. It is designed to support:

* **Overview dashboards** (pass/fail, engagement, performance)
* **Diagnostic analysis** (engagement vs. score, module comparisons)
* **Predictive/at-risk identification**
* **Recommendations based on learning behaviour**

This dataset avoids dependencies on Silver or Bronze layers and uses **only these Gold tables**:

* `dim_student`
* `dim_course`
* `fact_assessment_score`
* `fact_vle_interactions`

All additional metrics, KPIs, and risk flags are **derived** in the final SQL logic.

---

## 🧱 Dataset Grain (Level of Detail)

Each row represents:

> **Student × Module × Presentation**

This grain ensures that analysis can target:

* individual student behaviour
* module-level performance
* presentation-level trends
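
One way to picture this grain is to build the key set the way a union of the two fact tables would. The sketch below uses made-up rows: the student IDs and module/presentation codes are hypothetical sample values, and only the column names come from the Gold tables.

```python
# Hypothetical sample keys: (student_id, code_module, code_presentation).
assessment_keys = [
    (11391, "AAA", "2013J"),
    (11391, "AAA", "2013J"),  # second assessment, same student/module/presentation
    (28400, "BBB", "2014B"),
]
vle_keys = [
    (11391, "AAA", "2013J"),
    (30268, "AAA", "2013J"),  # VLE activity only, no assessment rows
]

# A set union drops duplicates, leaving exactly one key per
# student x module x presentation (the same effect as SQL UNION).
base = sorted(set(assessment_keys) | set(vle_keys))

for student_id, code_module, code_presentation in base:
    print(student_id, code_module, code_presentation)
```

A student who appears in only one of the two fact tables still gets a row, so downstream metrics from the other table can be missing for that row.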

---

# 🧩 Source Tables and Role in Analytics

### 1. **dim_student**

Contains deduplicated demographic attributes used for segmentation:

* gender
* region
* highest_education
* imd_band
* disability

### 2. **dim_course**

Provides context on course structure:

* presentation_length
* module/presentation codes

### 3. **fact_assessment_score**

Used to derive academic performance indicators:

* score distribution
* weighted averages
* number of assessments submitted
* pass/fail behaviour

### 4. **fact_vle_interactions**

Used to derive engagement indicators:

* total clicks
* active days
* average clicks per active day
* overall engagement patterns

---

# 📊 Derived Performance Metrics (Assessment Aggregates)

From `fact_assessment_score`:

* `num_assessments_submitted`
* `avg_score`
* `max_score` / `min_score`
* `weighted_avg_score`
* `passed_any_assessment`

These metrics describe a student's academic progress throughout the course.
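
As a rough illustration of how these aggregates relate to each other, the sketch below computes them for one student × module × presentation in plain Python. The function name `assessment_metrics` and the sample scores and weights are hypothetical; the pass mark of 40 follows the threshold used in the table's SQL. It assumes at least one assessment row with a non-null score.

```python
def assessment_metrics(rows, pass_mark=40):
    """rows: list of (score, weight) pairs for one student x module x presentation."""
    scores = [score for score, _ in rows]
    total_weight = sum(weight for _, weight in rows)
    return {
        "num_assessments_submitted": len(rows),
        "avg_score": sum(scores) / len(scores),
        "max_score": max(scores),
        "min_score": min(scores),
        # Undefined (None) when all weights are zero.
        "weighted_avg_score": (
            sum(s * w for s, w in rows) / total_weight if total_weight else None
        ),
        # 1 if at least one score reaches the pass mark, else 0.
        "passed_any_assessment": int(any(s >= pass_mark for s in scores)),
    }


# Hypothetical scores and weights for a single student.
metrics = assessment_metrics([(35, 10), (60, 20), (80, 70)])
print(metrics)
```

In this sample the unweighted average is pulled down by the low-weight first assessment, while the weighted average is dominated by the heavily weighted final one.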

---

# 🖥️ Derived Engagement Metrics (VLE Aggregates)

From `fact_vle_interactions`:

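* `total_clicks`: sum of `clicks` across all interaction rows
* `active_days`: number of distinct dates with at least one interaction
* `avg_clicks_per_row`: mean `clicks` per interaction row
* `avg_clicks_per_active_day`: `total_clicks` divided by `active_days`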

These help determine how frequently and consistently the student engages with learning materials.
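
The difference between the two averages is easiest to see on a small example. This is a minimal sketch with a hypothetical helper `vle_metrics` and made-up interaction rows; it assumes at least one row per student.

```python
def vle_metrics(rows):
    """rows: list of (date, clicks) pairs for one student x module x presentation."""
    total_clicks = sum(clicks for _, clicks in rows)
    active_days = len({date for date, _ in rows})  # distinct dates only
    return {
        "total_clicks": total_clicks,
        "active_days": active_days,
        "avg_clicks_per_row": total_clicks / len(rows),
        "avg_clicks_per_active_day": total_clicks / active_days,
    }


# Hypothetical rows: two interactions on day 1, one on day 3.
engagement = vle_metrics([(1, 4), (1, 6), (3, 5)])
print(engagement)
```

Because a student can have several interaction rows on the same date, `avg_clicks_per_row` is never larger than `avg_clicks_per_active_day`.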

---

# 🔥 Risk Indicators (Multiple Methods)

To support predictive insights and early-warning diagnostics, the model includes **four risk detection methods**, each aligned with a different analytical perspective.

## 1️⃣ `risk_performance_based`

Focuses on **academic performance** and basic engagement:

* `weighted_avg_score < 40` → At risk
* OR `total_clicks < 5` → At risk

This is a simple early identifier of struggling students.
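
The rule can be sketched as a small function. This is an illustration only: it assumes both inputs are present (no missing values), and the sample inputs are hypothetical.

```python
def risk_performance_based(weighted_avg_score, total_clicks):
    """Flag a student when either the score or the click count is too low."""
    if weighted_avg_score < 40 or total_clicks < 5:
        return "At risk"
    return "Not at risk"


# Hypothetical students: low score, low clicks, and neither.
for score, clicks in [(35, 200), (72, 3), (72, 200)]:
    print(score, clicks, risk_performance_based(score, clicks))
```

Both comparisons are strict, so a student with exactly 40 and exactly 5 clicks is not flagged.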

---

## 2️⃣ `risk_low_engagement_percentile`

Identifies students in the **lowest 30% of engagement** within each module/presentation:

* Compares each student's `total_clicks` with the 30th percentile of `total_clicks` in the same module/presentation
* Ideal for diagnostic dashboards and statistical insights

Marks students significantly deviating from the engagement norm.
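
A minimal sketch of the per-group percentile comparison follows. The helper names and sample rows are hypothetical, and the percentile here uses linear interpolation between neighbouring values; whether that matches the SQL engine's `PERCENTILE` exactly is an assumption of this sketch.

```python
from collections import defaultdict


def percentile(values, p):
    """Percentile with linear interpolation between neighbouring sorted values."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def flag_low_engagement(rows, p=0.3):
    """rows: list of (student_id, code_module, code_presentation, total_clicks)."""
    # Collect click totals per module/presentation.
    groups = defaultdict(list)
    for _, module, presentation, clicks in rows:
        groups[(module, presentation)].append(clicks)
    cutoffs = {key: percentile(vals, p) for key, vals in groups.items()}
    # Flag students strictly below their own group's cutoff.
    return {
        (sid, module, presentation): (
            "At risk" if clicks < cutoffs[(module, presentation)] else "Not at risk"
        )
        for sid, module, presentation, clicks in rows
    }


# Hypothetical click totals for two module/presentation groups.
rows = [
    (1, "AAA", "2013J", 10),
    (2, "AAA", "2013J", 50),
    (3, "AAA", "2013J", 120),
    (4, "AAA", "2013J", 300),
    (5, "BBB", "2014B", 8),
    (6, "BBB", "2014B", 9),
]
flags = flag_low_engagement(rows)
for key, label in flags.items():
    print(key, label)
```

Because the cutoff is computed per group, a click count that is flagged in a highly active module may be unremarkable in a quieter one.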

---

## 3️⃣ `risk_composite_score`

A hybrid scoring model combining:

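* normalized weighted average score (`weighted_avg_score / 100`)
* normalized total clicks (`total_clicks` divided by the maximum `total_clicks` across all rows)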

Formula:

```
risk_score = 0.6 * academic_score + 0.4 * engagement_score
```

Students with **risk_score < 0.35** are flagged as at risk.

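This is a fixed-weight heuristic in the style of a model score: the weights (0.6 and 0.4) and the 0.35 threshold are set by hand, not learned from data. Note that a lower `risk_score` means higher risk.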
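
The formula can be worked through on sample numbers. In this sketch the function names and inputs are hypothetical, `max_total_clicks` stands for the largest `total_clicks` in the table, and it is assumed to be greater than zero.

```python
def composite_score(weighted_avg_score, total_clicks, max_total_clicks):
    """Blend a 0-1 academic score with a 0-1 engagement score."""
    academic_score = weighted_avg_score / 100.0
    engagement_score = total_clicks / max_total_clicks
    return 0.6 * academic_score + 0.4 * engagement_score


def risk_composite_score(weighted_avg_score, total_clicks, max_total_clicks):
    score = composite_score(weighted_avg_score, total_clicks, max_total_clicks)
    return "At risk" if score < 0.35 else "Not at risk"


# Hypothetical students, with the most active student at 4000 clicks.
# 0.6 * 0.45 + 0.4 * 0.05 = 0.29, below the 0.35 threshold
print(risk_composite_score(45, 200, 4000))
# 0.6 * 0.70 + 0.4 * 0.25 = 0.52, above the threshold
print(risk_composite_score(70, 1000, 4000))
```

Since clicks are scaled by the single largest value, one unusually active student lowers everyone else's engagement score; scaling by a high percentile instead would be one possible alternative.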

---

## 4️⃣ `risk_rule_based`

A rule-driven early warning model commonly used in academic analytics:

A student is flagged as at risk if **any** of the following holds:

* average score < 50
* active days < 5
* total clicks < 20
* has not passed any assessment

Simple, interpretable, and actionable for decision-makers.
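
A small sketch of these rules is shown below. Returning the list of triggered rules alongside the label is an addition made here for illustration (the table itself stores only the label); the function name and sample inputs are hypothetical, and all inputs are assumed to be present.

```python
def risk_rule_based(avg_score, active_days, total_clicks, passed_any_assessment):
    """Return the risk label plus the names of the rules that fired."""
    rules = [
        ("avg_score < 50", avg_score < 50),
        ("active_days < 5", active_days < 5),
        ("total_clicks < 20", total_clicks < 20),
        ("no assessment passed", passed_any_assessment == 0),
    ]
    fired = [name for name, hit in rules if hit]
    return ("At risk" if fired else "Not at risk"), fired


# Hypothetical student with decent scores but very few active days.
label, reasons = risk_rule_based(
    avg_score=62, active_days=3, total_clicks=150, passed_any_assessment=1
)
print(label, reasons)
```

Keeping the triggered rules makes it possible to tell a tutor why a student was flagged, not only that they were.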

---

# 🚀 Dashboard Use Cases Enabled by This Table

### 🔹 Overview Page

* Pass vs. fail distribution
* Engagement summaries
* Performance KPIs
* At-risk counts by risk method (methods 1–4)




In [0]:
spark.sql("USE CATALOG analytics_oulad")
spark.sql("USE SCHEMA gold_star")

In [0]:
%sql
-- =====================================================================
-- ANALYTICS-READY TABLE FOR BI DASHBOARDS
-- Sources used: ONLY GOLD LAYER TABLES
--    dim_student
--    dim_course
--    fact_assessment_score
--    fact_vle_interactions
-- =====================================================================

CREATE OR REPLACE TABLE analytics_oulad.gold_star.student_course_analytics AS

-- =====================================================================
-- 1. Base student × module × presentation combinations
-- =====================================================================
WITH base AS (
    SELECT DISTINCT student_id, code_module, code_presentation
    FROM analytics_oulad.gold_star.fact_assessment_score

    UNION

    SELECT DISTINCT student_id, code_module, code_presentation
    FROM analytics_oulad.gold_star.fact_vle_interactions
),

-- =====================================================================
-- 2. Assessment Aggregates (Performance Metrics)
-- =====================================================================
assessment_agg AS (
    SELECT
        student_id,
        code_module,
        code_presentation,

        COUNT(*) AS num_assessments_submitted,
        AVG(score) AS avg_score,
        MAX(score) AS max_score,
        MIN(score) AS min_score,

        -- Weighted average score calculation
        CASE 
            WHEN SUM(weight) = 0 OR SUM(weight) IS NULL 
                THEN NULL 
            ELSE SUM(score * weight) / SUM(weight)
        END AS weighted_avg_score,

        -- Whether student passed ANY assessment (score >= 40)
        MAX(CASE WHEN score >= 40 THEN 1 ELSE 0 END) AS passed_any_assessment

    FROM analytics_oulad.gold_star.fact_assessment_score
    GROUP BY student_id, code_module, code_presentation
),

-- =====================================================================
-- 3. VLE Aggregates (Engagement Metrics)
-- =====================================================================
vle_agg AS (
    SELECT
        student_id,
        code_module,
        code_presentation,

        SUM(clicks) AS total_clicks,
        COUNT(DISTINCT date) AS active_days,
        AVG(clicks) AS avg_clicks_per_row,

        CASE 
            WHEN COUNT(DISTINCT date) = 0 THEN NULL
            ELSE SUM(clicks) / COUNT(DISTINCT date)
        END AS avg_clicks_per_active_day

    FROM analytics_oulad.gold_star.fact_vle_interactions
    GROUP BY student_id, code_module, code_presentation
),

-- =====================================================================
-- 4. Student Dimension (Deduped)
-- =====================================================================
student_dim AS (
    SELECT DISTINCT
        student_id,
        gender,
        region,
        highest_education,
        imd_band,
        disability
    FROM analytics_oulad.gold_star.dim_student
),

-- =====================================================================
-- 5. Course Dimension
-- =====================================================================
course_dim AS (
    SELECT DISTINCT
        code_module,
        code_presentation,
        presentation_length
    FROM analytics_oulad.gold_star.dim_course
)

-- =====================================================================
-- 6. FINAL OUTPUT TABLE
-- =====================================================================
SELECT
    b.student_id,
    b.code_module,
    b.code_presentation,

    -- STUDENT DEMOGRAPHICS
    sd.gender,
    sd.region,
    sd.highest_education,
    sd.imd_band,
    sd.disability,

    -- COURSE ATTRIBUTES
    cd.presentation_length,

    -- PERFORMANCE METRICS
    aa.num_assessments_submitted,
    aa.avg_score,
    aa.max_score,
    aa.min_score,
    aa.weighted_avg_score,
    aa.passed_any_assessment,

    -- ENGAGEMENT METRICS
    va.total_clicks,
    va.active_days,
    va.avg_clicks_per_row,
    va.avg_clicks_per_active_day,


    -- =================================================================
    -- RISK METHOD 1: PERFORMANCE-BASED
    -- =================================================================
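    CASE 
        WHEN aa.weighted_avg_score < 40 THEN 'At risk'
        -- Students with no VLE rows have NULL total_clicks after the LEFT JOIN;
        -- treat them as 0 clicks so they are not silently marked 'Not at risk'
        WHEN COALESCE(va.total_clicks, 0) < 5 THEN 'At risk'
        ELSE 'Not at risk'
    END AS risk_performance_based,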


    -- =================================================================
    -- RISK METHOD 2: LOW ENGAGEMENT PERCENTILE (30th percentile)
    -- =================================================================
    CASE
        WHEN va.total_clicks < 
             PERCENTILE(va.total_clicks, 0.3) 
             OVER (PARTITION BY b.code_module, b.code_presentation)
        THEN 'At risk'
        ELSE 'Not at risk'
    END AS risk_low_engagement_percentile,


    -- =================================================================
    -- RISK METHOD 3: COMPOSITE SCORE MODEL (like ML scoring)
    -- mixture of performance + engagement
    -- =================================================================
    CASE 
        WHEN (
            (0.6 * (aa.weighted_avg_score / 100.0)) +
            (0.4 * (va.total_clicks / NULLIF(MAX(va.total_clicks) OVER (), 0)))
        ) < 0.35
        THEN 'At risk'
        ELSE 'Not at risk'
    END AS risk_composite_score,


    -- =================================================================
    -- RISK METHOD 4: RULE-BASED EARLY WARNING SYSTEM
    -- widely used in educational analytics
    -- =================================================================
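    CASE
        WHEN aa.avg_score < 50 THEN 'At risk'
        -- COALESCE: a student missing from a fact table is NULL after the LEFT JOIN;
        -- treat that as zero activity / no pass instead of skipping the rule
        WHEN COALESCE(va.active_days, 0) < 5 THEN 'At risk'
        WHEN COALESCE(va.total_clicks, 0) < 20 THEN 'At risk'
        WHEN COALESCE(aa.passed_any_assessment, 0) = 0 THEN 'At risk'
        ELSE 'Not at risk'
    END AS risk_rule_based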

FROM base b
LEFT JOIN assessment_agg aa ON aa.student_id = b.student_id
                           AND aa.code_module = b.code_module
                           AND aa.code_presentation = b.code_presentation

LEFT JOIN vle_agg va ON va.student_id = b.student_id
                     AND va.code_module = b.code_module
                     AND va.code_presentation = b.code_presentation

LEFT JOIN student_dim sd ON b.student_id = sd.student_id

LEFT JOIN course_dim cd ON b.code_module = cd.code_module
                        AND b.code_presentation = cd.code_presentation;


In [0]:
print("✅ Created table: analytics_oulad.gold_star.student_course_analytics")