# <span style="color:#1f77b4">**Data Analytics 04 - Visualization**</span>

This notebook uses Spark to create raw and aggregated Delta tables in a Unity Catalog volume, then displays the results.



### <span style="color:#1f77b4">**Unity Catalog storage setup**</span>

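Set the catalog, schema, and volume for this project through notebook widgets. If the `CATALOG` widget is left empty, the notebook falls back to the current catalog, or to the first non-`system` catalog when the current one is `system`. Use an existing catalog, or create one in the UI if your metastore requires a managed location.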


In [0]:
# Unity Catalog config for this project
dbutils.widgets.removeAll()
dbutils.widgets.text("CATALOG", "")
dbutils.widgets.text("SCHEMA", "default")
dbutils.widgets.text("VOLUME", "spark_lab")

catalog_widget = dbutils.widgets.get("CATALOG")
if catalog_widget:
    CATALOG = catalog_widget
else:
    # Prefer current catalog, otherwise pick the first non-system catalog
    current = spark.sql("SELECT current_catalog()").first()[0]
    catalogs = [r.catalog for r in spark.sql("SHOW CATALOGS").collect()]
    CATALOG = current if current not in ("system",) else next(c for c in catalogs if c not in ("system",))

SCHEMA = dbutils.widgets.get("SCHEMA")
VOLUME = dbutils.widgets.get("VOLUME")
BASE = f"dbfs:/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}"
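
The fallback order in the cell above can be isolated into a plain function, which makes it easy to check without a cluster. This is a minimal sketch, not part of the notebook: `pick_catalog` and `volume_base` are hypothetical helpers, and the catalog name `main` in the usage lines is invented. Unlike the cell, the sketch raises a descriptive error when no non-`system` catalog exists, where a bare `next(...)` would raise `StopIteration`.

```python
def pick_catalog(widget_value, current, catalogs, excluded=("system",)):
    """Return the catalog to use, following the same fallback order as the cell above."""
    if widget_value:
        # An explicit widget value always wins
        return widget_value
    if current not in excluded:
        return current
    # next() with a default avoids an unhandled StopIteration when nothing qualifies
    fallback = next((c for c in catalogs if c not in excluded), None)
    if fallback is None:
        raise ValueError("No usable catalog found; set the CATALOG widget explicitly.")
    return fallback


def volume_base(catalog, schema, volume):
    """Build the volume root path in the same form as BASE."""
    return f"dbfs:/Volumes/{catalog}/{schema}/{volume}"


# Hypothetical inputs: the current catalog is 'system', so the first other catalog is used
catalog = pick_catalog("", "system", ["system", "main"])
print(volume_base(catalog, "default", "spark_lab"))
```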


In [0]:
# Ensure schema and volume exist
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {CATALOG}.{SCHEMA}")
spark.sql(f"CREATE VOLUME IF NOT EXISTS {CATALOG}.{SCHEMA}.{VOLUME}")


DataFrame[]
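
Because the catalog, schema, and volume names come from free-text widgets, a name containing a hyphen or another special character would break the unquoted SQL above. One possible guard, sketched here under the assumption that Spark SQL backtick quoting applies, is to quote each part before interpolating it. `quote_ident` and `qualified` are hypothetical helpers, and `my-catalog` is an invented name.

```python
def quote_ident(name):
    """Backtick-quote one identifier part, doubling any embedded backtick."""
    if not name:
        raise ValueError("identifier must be non-empty")
    return "`" + name.replace("`", "``") + "`"


def qualified(*parts):
    """Join quoted parts into a dotted name such as `catalog`.`schema`."""
    return ".".join(quote_ident(p) for p in parts)


# Build the statement text only; in the notebook it would be passed to spark.sql(...)
schema_sql = f"CREATE SCHEMA IF NOT EXISTS {qualified('my-catalog', 'default')}"
print(schema_sql)
```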

### <span style="color:#1f77b4">**Storing CSV file to the Unity Catalog volume**</span>

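Copy the product CSV into the Unity Catalog volume for this project. The target directory is cleared first so each run starts from a fresh copy.

- `dbutils.fs.rm` with `recurse=True` removes the target directory and anything left in it from earlier runs.
- `dbutils.fs.mkdirs` creates the target directory.
- `dbutils.fs.cp` copies the CSV file from the source URL into the volume.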


In [0]:
# Sync raw data files into the UC volume
products_dir = f"{BASE}/products"
source_url = "https://raw.githubusercontent.com/Ch3rry-Pi3-Azure/DataBricks-Data-Analytics/refs/heads/main/data/products.csv"

# Start from an empty directory so stale files from earlier runs are not picked up
dbutils.fs.rm(products_dir, recurse=True)
dbutils.fs.mkdirs(products_dir)
dbutils.fs.cp(source_url, f"{products_dir}/products.csv")


True
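
If copying straight from a URL is not available in a given workspace, the same step could be done with the Python standard library. This is only a sketch under one assumption: that the volume is reachable from the cluster as a local path under `/Volumes/...`. `download_to` is a hypothetical helper, not a Databricks API.

```python
import shutil
import urllib.request
from pathlib import Path


def download_to(url, dest_path):
    """Stream the resource at `url` to `dest_path`, creating parent directories."""
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so the whole file is never held in memory
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Usage in the notebook would look like this (assumes the local /Volumes path form):
# download_to(url_of_products_csv, f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/products/products.csv")
```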

### <span style="color:#1f77b4">**Creating Delta tables using Spark**</span>

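Create the Delta tables with the Spark DataFrame API, writing to paths inside the Unity Catalog volume.

- `write.format("delta")` persists a DataFrame in Delta format.
- `mode("overwrite")` replaces any data already stored at the target path.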



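#### <span style="color:#1f77b4">**Loading the raw dataset into the first layer**</span>

Create the raw Delta table from the CSV source.

- `spark.read.csv` reads the file from the Unity Catalog volume.
- `option("header", "true")` takes column names from the first row. No schema is supplied or inferred, so every column is read as a string.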


In [0]:
# Load raw products CSV from the UC volume
raw_products = (spark.read
    .option("header", "true")
    .csv(BASE + "/products/products.csv"))

# Write raw Delta table
raw_path = BASE + "/delta/raw_products"
raw_products.write.format("delta").mode("overwrite").save(raw_path)


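#### <span style="color:#1f77b4">**Generating business aggregates from the raw layer**</span>

Aggregate the raw products DataFrame into per-category price totals.

- `cast("double")` converts `ListPrice` from string to a numeric type.
- `groupBy("Category")` groups the rows by category.
- `sum("ListPrice")` computes the total per category; the result column is renamed to `Total_Price`.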



In [0]:
# Aggregate by category and write Delta table
raw_typed = raw_products.withColumn("ListPrice", raw_products["ListPrice"].cast("double"))
aggregated_products = (raw_typed
    .groupBy("Category")
    .sum("ListPrice")
    .withColumnRenamed("sum(ListPrice)", "Total_Price"))

agg_path = BASE + "/delta/aggregated_products"
aggregated_products.write.format("delta").mode("overwrite").save(agg_path)
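
The cast before the aggregation matters because CSV fields arrive as text. The same cast-then-sum pattern can be mirrored in plain Python to see what the Spark cell computes. This is an illustrative sketch only: the rows below are invented and are not taken from `products.csv`.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows for illustration only; not taken from products.csv
sample_csv = """Category,ListPrice
Pumps,10.50
Pumps,4.25
Chains,20.00
"""

totals = defaultdict(float)
for row in csv.DictReader(io.StringIO(sample_csv)):
    # Every CSV field is a string, so cast to float before summing
    totals[row["Category"]] += float(row["ListPrice"])

for category, total in totals.items():
    print(category, total)
```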


### <span style="color:#1f77b4">**Viewing the final results**</span>

Load the aggregated Delta table and display the results.

- `spark.read.format("delta")` reads the aggregated Delta table from the volume path.
- `display` renders the table.



In [0]:
df = spark.read.format("delta").load(BASE + "/delta/aggregated_products")
display(df)


| Category | Total_Price |
|---|---|
| Headsets | 261.22 |
| Wheels | 3093.0099999999998 |
| Bottom Brackets | 276.72 |
| Touring Frames | 11365.48 |
| Mountain Bikes | 53867.67999999994 |
| Pedals | 448.13 |
| Derailleurs | 212.95 |
| Chains | 20.24 |
| Pumps | 44.98 |
| Hydration Packs | 54.99 |
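
Some totals above, such as `3093.0099999999998`, carry floating-point noise from summing `double` values. For presentation, the totals could be rounded to two decimal places. The sketch below shows the idea in plain Python with a hypothetical helper `to_cents`; inside Spark, `pyspark.sql.functions.round` could play a similar role before the table is written.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(value):
    """Round a float total to two decimal places, going through its repr."""
    return Decimal(repr(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Two totals copied from the output above
print(to_cents(3093.0099999999998))  # 3093.01
print(to_cents(53867.67999999994))   # 53867.68
```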
