## Gold Layer

In [0]:
import dlt
from pyspark.sql.functions import col, count

In [0]:
silver_path = "abfss://silver@storageforproject.dfs.core.windows.net/netflix_cast"

In [0]:
@dlt.table(comment="Read Delta files from silver container (batch)")
def netflix_cast():
    # Batch read: the Silver Delta table is read in full when this table is computed.
    return spark.read.format("delta").load(silver_path)

If the Silver data is static or updated only occasionally, a batch read makes sense: it is simple and efficient for one-off processing. If the Silver layer is continuously receiving new data (as in a real-time pipeline), a streaming read is the better choice because it processes only the new records incrementally.

In [0]:
@dlt.table(comment="Stream Delta files from silver container")
def netflix_casts():
    # Streaming read: new data in the Silver Delta table is processed incrementally.
    return spark.readStream.format("delta").load(silver_path)

In [0]:
# @dlt.table

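When reading from a Delta table that is defined in the same pipeline, prefer `dlt.read` (or `dlt.read_stream` for a streaming read) over a path-based load. DLT can then track the dependency between the two tables and show it in the pipeline's lineage graph.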

In [0]:
# DLT expectations validate data; they do not transform it.
# EXPECTATION RULES (DATA QUALITY CHECKS)
# Each entry maps a rule name to a SQL constraint. Whether failing rows are
# dropped depends on the decorator that applies the rules (e.g. dlt.expect_all_or_drop).
expectation_rules = {
    "valid_show_id": "show_id IS NOT NULL",            # fails for rows with null show_id
    "valid_release_year": "release_year IS NOT NULL",  # fails for rows with null release_year
    "valid_date_added": "date_added IS NOT NULL"       # fails for rows with null date_added
}
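
To make the "drop" semantics of these rules concrete, here is a minimal sketch in plain Python (no DLT runtime needed). It assumes the rules are later applied with a drop-style decorator such as `dlt.expect_all_or_drop(expectation_rules)`, under which a row is kept only if every constraint holds. The names `keep_condition` and `drop_condition` are illustrative and not part of the notebook.

```python
# Same rule set as the notebook cell above, repeated so the sketch is self-contained.
expectation_rules = {
    "valid_show_id": "show_id IS NOT NULL",
    "valid_release_year": "release_year IS NOT NULL",
    "valid_date_added": "date_added IS NOT NULL",
}

# A row is kept only when ALL constraints hold, so the combined
# "keep" predicate is the AND of every rule (hypothetical helper name).
keep_condition = " AND ".join(f"({rule})" for rule in expectation_rules.values())

# Rows that would be dropped (or routed to a quarantine table) satisfy the negation.
drop_condition = f"NOT ({keep_condition})"

print(keep_condition)
print(drop_condition)
```

The resulting strings are ordinary SQL expressions, so they could be passed to `DataFrame.filter` to inspect which rows would fail before the expectations are enforced in the pipeline.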