# Gold Layer – Star Schema for OULAD (Open University Learning Analytics Dataset)

This notebook builds the **Gold layer** of our Lakehouse for the **OULAD dataset**, using a
**star schema** design.

We assume the **Bronze (raw)** and **Silver (clean)** layers already exist in the catalog:

- Catalog: `analytics_oulad`
- Schemas:
  - `bronze_raw`     – raw CSV ingested as Delta
  - `silver_clean`   – cleaned, typed base tables
  - `gold_star`      – **(this notebook)** BI-ready star schema

The Gold layer will expose **dimension tables** and **fact tables** that can be consumed by
BI tools (Looker Studio, Power BI, etc.) and by downstream ML pipelines.


## 1. Star Schema Concepts

A **star schema** is a dimensional modeling pattern used in data warehouses and BI systems.
It consists of:

- **Fact tables**
  - Contain numeric, aggregatable measures (e.g., `score`, `clicks`)
  - Have foreign keys that reference dimensions
  - Represent business events at a specific **grain** (level of detail)

- **Dimension tables**
  - Contain descriptive attributes (e.g., `gender`, `region`, `activity_type`)
  - Provide context for slicing and dicing facts
  - Typically smaller and denormalized for fast querying

The structure looks like a ⭐ where the **fact** table is at the center and
the **dimensions** are the points of the star.

In this project we will build:

**Dimensions**
- `gold_star.dim_student`
- `gold_star.dim_course`
- `gold_star.dim_assessment`

**Facts**
- `gold_star.fact_assessment_score`
- `gold_star.fact_vle_interactions`
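
To make the pattern concrete, here is a minimal sketch of the kind of query this star schema is meant to serve: the fact table supplies the measure, and each dimension supplies a grouping attribute. It assumes the tables listed above have already been built with the columns described in this notebook; the aliases and the output names `avg_score` and `n_results` are illustrative only.

```sql
-- Sketch: average assessment score sliced by two dimension attributes.
SELECT
    s.gender,
    a.assessment_type,
    ROUND(AVG(f.score), 1) AS avg_score,   -- measure from the fact table
    COUNT(*)               AS n_results
FROM analytics_oulad.gold_star.fact_assessment_score AS f
JOIN analytics_oulad.gold_star.dim_student    AS s ON f.student_id    = s.student_id
JOIN analytics_oulad.gold_star.dim_assessment AS a ON f.assessment_id = a.assessment_id
GROUP BY s.gender, a.assessment_type
ORDER BY s.gender, a.assessment_type;
```

The join to `dim_student` preserves the fact grain only if `student_id` is unique in that dimension; otherwise fact rows are duplicated and the averages are skewed.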


In [0]:
%sql
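USE CATALOG analytics_oulad;
-- Create the Gold schema on the first run so that USE SCHEMA does not fail.
CREATE SCHEMA IF NOT EXISTS gold_star;
USE SCHEMA gold_star;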

## 2. Grain (Level of Detail) of Fact Tables

Before creating tables, we must define the **grain** for each fact table.

### 2.1 `fact_assessment_score`
Each row represents **one student's result for one assessment**.

Grain:
> *"Student × Assessment"*

Key columns:
- `student_id`
- `assessment_id`

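Measures and row-level attributes:
- `score` (the numeric measure)
- `is_banked` (flag: the result was carried over from a previous presentation)
- `date_submitted` (submission day, used for timeliness analysis)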

Context columns:
- `code_module`
- `code_presentation`
- `assessment_type`
- `weight`
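
A declared grain is only useful if the table actually respects it. The following is a minimal sketch of a grain check, meant to be run once `fact_assessment_score` has been built; it assumes the key columns are named exactly as listed above.

```sql
-- Sketch: list (student_id, assessment_id) pairs that appear more than once.
-- An empty result means the table respects the "Student x Assessment" grain.
SELECT
    student_id,
    assessment_id,
    COUNT(*) AS row_count
FROM analytics_oulad.gold_star.fact_assessment_score
GROUP BY student_id, assessment_id
HAVING COUNT(*) > 1
ORDER BY row_count DESC
LIMIT 20;
```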

---

### 2.2 `fact_vle_interactions`
Each row represents **one student's interactions with one VLE site on one day**.

Grain:
> *"Student × VLE Site × Date"*

Key columns:
- `student_id`
- `id_site`
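- `date` (day of the interaction; in the OULAD source this is a day number relative to the presentation start)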

Measures:
- `clicks` (number of clicks)

Context columns:
- `code_module`
- `code_presentation`
- `activity_type`
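
As a rough sanity check on this grain, the sketch below compares the total row count with the number of distinct key combinations. It assumes `fact_vle_interactions` already exists with the key columns listed above; `date` is wrapped in backticks only as a precaution, since it is also a type name.

```sql
-- Sketch: if the grain holds and no key column is NULL, excess_rows is 0.
SELECT
    COUNT(*)                                                  AS total_rows,
    COUNT(DISTINCT student_id, id_site, `date`)               AS distinct_grain_keys,
    COUNT(*) - COUNT(DISTINCT student_id, id_site, `date`)    AS excess_rows
FROM analytics_oulad.gold_star.fact_vle_interactions;
```

`COUNT(DISTINCT ...)` skips rows in which any key column is NULL, so a non-zero `excess_rows` can point to either duplicate keys or NULL keys. If the upstream data holds several rows for the same key, summing `clicks` with a `GROUP BY` on the three key columns when building the fact would restore the grain.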


## 3. Dimension Tables

### 3.1 `dim_student`
Describes **who** the student is.

Columns (attributes):
- `student_id` (natural key from OULAD)
- `gender`
- `region`
- `highest_education`
- `imd_band` (Index of Multiple Deprivation band)
- `age_band`
- `disability`

### 3.2 `dim_course`
Describes **which course and presentation** the data relates to.

Columns:
- `code_module`
- `code_presentation`
- `presentation_length` (days)

### 3.3 `dim_assessment`
Describes **what assessment** the score belongs to.

Columns:
- `assessment_id`
- `code_module`
- `code_presentation`
- `assessment_type` (TMA = tutor-marked, CMA = computer-marked, or Exam)
- `assessment_date` (due date as a day number relative to course start)
- `weight` (percentage contribution to the final score)

> **Note:** We use **natural keys** as primary keys here: `student_id` for `dim_student`,
> `assessment_id` for `dim_assessment`, and the pair (`code_module`, `code_presentation`) for `dim_course`.
> In a larger production system we might introduce **surrogate keys** (integer identities)
> to decouple the warehouse from source system changes.
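
For illustration, a surrogate key could be added with a Delta identity column. This is a sketch under assumptions, not part of this notebook's pipeline: the table name `dim_student_sk`, the column name `student_sk`, and all column types are hypothetical and would need to match the Silver table.

```sql
-- Sketch only: dim_student_sk and student_sk are hypothetical names,
-- and the column types below are assumed, not taken from the Silver layer.
CREATE OR REPLACE TABLE analytics_oulad.gold_star.dim_student_sk (
    student_sk        BIGINT GENERATED ALWAYS AS IDENTITY,  -- surrogate key assigned by Delta
    student_id        BIGINT,                               -- natural key kept for lookups
    gender            STRING,
    region            STRING,
    highest_education STRING,
    imd_band          STRING,
    age_band          STRING,
    disability        STRING
);

-- The identity column is left out of the column list so that Delta generates it.
INSERT INTO analytics_oulad.gold_star.dim_student_sk
    (student_id, gender, region, highest_education, imd_band, age_band, disability)
SELECT DISTINCT
    student_id, gender, region, highest_education, imd_band, age_band, disability
FROM analytics_oulad.silver_clean.student_info_base;
```

Identity values are unique but not guaranteed to be consecutive, so they should be treated as opaque keys rather than as a row counter.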


In [0]:
%sql
-- Student demographics. DISTINCT drops only rows that are identical on every selected column.
CREATE OR REPLACE TABLE analytics_oulad.gold_star.dim_student AS
SELECT DISTINCT
    student_id,
    gender,
    region,
    highest_education,
    imd_band,
    age_band,
    disability
FROM analytics_oulad.silver_clean.student_info_base;


In [0]:
%sql
SELECT * FROM analytics_oulad.gold_star.dim_student
LIMIT 5;
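
`SELECT DISTINCT` removes only rows that match on every column, so a student whose attributes differ between source rows (for instance a different `age_band` in a later presentation) would still appear more than once in `dim_student`. Whether that happens depends on how `student_info_base` was prepared in Silver, which this notebook does not show. A minimal check, with illustrative output column names:

```sql
-- Sketch: students with more than one row in the dimension,
-- plus a hint at which attributes vary between those rows.
SELECT
    student_id,
    COUNT(*)                          AS row_count,
    COUNT(DISTINCT age_band)          AS age_bands,
    COUNT(DISTINCT region)            AS regions,
    COUNT(DISTINCT highest_education) AS education_levels
FROM analytics_oulad.gold_star.dim_student
GROUP BY student_id
HAVING COUNT(*) > 1
ORDER BY row_count DESC
LIMIT 20;
```

An empty result means `student_id` can safely act as the primary key; any rows returned would fan out joins from the fact tables.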

In [0]:
%sql
-- One row per module presentation, identified by (code_module, code_presentation).
CREATE OR REPLACE TABLE analytics_oulad.gold_star.dim_course AS
SELECT DISTINCT
    code_module,
    code_presentation,
    presentation_length
FROM analytics_oulad.silver_clean.courses_base;

In [0]:
%sql
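-- One row per assessment. No DISTINCT is applied, so uniqueness of assessment_id relies on the Silver table.
CREATE OR REPLACE TABLE analytics_oulad.gold_star.dim_assessment AS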
SELECT
    assessment_id,
    code_module,
    code_presentation,
    assessment_type,
    assessment_date,
    weight
FROM analytics_oulad.silver_clean.assessments_base;
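
Because `dim_assessment` carries the course key (`code_module`, `code_presentation`), it is worth confirming that every assessment points at a course that exists in `dim_course`. A minimal sketch of that check, assuming both dimensions were built by the cells above:

```sql
-- Sketch: assessments whose course key has no match in dim_course.
-- An empty result means every assessment resolves to a known module presentation.
SELECT
    a.assessment_id,
    a.code_module,
    a.code_presentation
FROM analytics_oulad.gold_star.dim_assessment AS a
LEFT ANTI JOIN analytics_oulad.gold_star.dim_course AS c
    ON  a.code_module       = c.code_module
    AND a.code_presentation = c.code_presentation;
```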
