# LAB 01: Platform & Workspace Setup

**Duration:** ~35 min | **Day:** 1 | **Difficulty:** Beginner

> Complete each `# TODO` cell below by replacing the `________` placeholders. Most tasks are followed by a validation cell that uses `assert` to check your solution.

## Setup

Run the cell below to initialize your environment.

In [None]:
%run ../../setup/00_setup

## Task 1: Verify Catalog Context

Use `spark.sql()` to run `SELECT current_catalog(), current_schema()` and display the result.

In [None]:
# TODO: Run SQL to check current catalog and schema
# Expected: your catalog (retailhub_...) and default schema

df_context = spark.sql("________")
display(df_context)

In [None]:
# -- Validation --
row = df_context.first()
assert "retailhub" in row[0].lower(), f"Expected catalog name containing 'retailhub', got: {row[0]}"
print(f"Catalog: {row[0]}, Schema: {row[1]}")

## Task 2: List Files in Your Volume

Use `dbutils.fs.ls()` to list the files in your `datasets` Volume.

Volume path format: `/Volumes/{catalog}/{schema}/{volume_name}/`
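
The path format above can be assembled from its three parts. The following is a minimal sketch, not part of the lab setup: `build_volume_path` is a hypothetical helper and `retailhub_demo` is a made-up catalog name used only for illustration.

```python
def build_volume_path(catalog: str, schema: str, volume_name: str) -> str:
    """Return a path of the form /Volumes/{catalog}/{schema}/{volume_name}/."""
    parts = [catalog, schema, volume_name]
    for part in parts:
        # Reject empty components and components that would add extra path levels.
        if not part or "/" in part:
            raise ValueError(f"Invalid path component: {part!r}")
    return "/Volumes/" + "/".join(parts) + "/"


# Hypothetical catalog name; in the lab, use the CATALOG variable from setup.
example_path = build_volume_path("retailhub_demo", "default", "datasets")
print(example_path)
```

Validating each component up front makes a typo such as an empty schema name fail immediately, rather than producing a path that only fails later when it is listed.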

In [None]:
# TODO: List files in your datasets Volume
# Hint: use the CATALOG variable from setup

volume_path = f"/Volumes/{CATALOG}/default/datasets/"
files = dbutils.fs.ls(________)
for f in files:
    print(f"{f.name:40s} {f.size:>10,} bytes")

In [None]:
# -- Validation --
file_names = [f.name for f in files]
assert len(files) > 0, "No files found in Volume! Did you upload them?"
print(f"Found {len(files)} items in Volume. OK!")

## Task 3: Read CSV File

Read the `customers.csv` file from your Volume into a DataFrame.

Requirements:
- Use `spark.read.format("csv")`
- Enable header: `.option("header", True)`
- Enable schema inference: `.option("inferSchema", True)`
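
To build intuition for what these two options change, here is a plain-Python illustration using the standard `csv` module. It is only an analogy, not how Spark implements the options, and the sample rows are invented (the real `customers.csv` may have different columns). It shows that without a header the first line is treated as data, and that without type inference every value stays a string.

```python
import csv
import io

# Made-up sample data, for illustration only.
sample = "customer_id,name,age\n1,Ana,34\n2,Ben,51\n"

# No header handling: the first line is returned as an ordinary data row.
rows_no_header = list(csv.reader(io.StringIO(sample)))

# Header handling: the first line supplies the column names.
rows_with_header = list(csv.DictReader(io.StringIO(sample)))


def infer_value(text: str):
    """Rough type guess: try int, then float, otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


# Apply the rough type guess to every value of every row.
typed_rows = [
    {key: infer_value(value) for key, value in row.items()}
    for row in rows_with_header
]

print(rows_no_header[0])
print(rows_with_header[0])
print(typed_rows[0])
```

In Spark the same idea applies at DataFrame level: without the header option the columns get generated names such as `_c0`, and without schema inference every column is read as a string.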

In [None]:
# TODO: Read customers.csv from your Volume

customers_path = f"/Volumes/{CATALOG}/default/datasets/customers.csv"

df_customers = (
    spark.read
    .format(________)
    .option("header", ________)
    .option("inferSchema", ________)
    .load(customers_path)
)

display(df_customers.limit(5))

In [None]:
# -- Validation --
row_count = df_customers.count()  # count once; every count() call runs a Spark job
assert row_count > 0, "DataFrame is empty!"
assert "customer_id" in df_customers.columns or "id" in df_customers.columns, "Expected customer ID column"
print(f"Loaded {row_count} customers with {len(df_customers.columns)} columns. OK!")

## Task 4: Inspect Schema

Print the schema of the `df_customers` DataFrame using `.printSchema()`.

In [None]:
# TODO: Print the schema

df_customers.________()

## Task 5: Explore dbutils

Use `dbutils.fs.head()` to read the first 200 bytes of the customers file (raw content).
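
As a rough local analogy for previewing the start of a file, the sketch below reads a fixed number of bytes from a temporary file with plain Python. It does not use `dbutils`; `head_local` is a hypothetical helper and the file content is invented.

```python
import os
import tempfile


def head_local(path: str, max_bytes: int) -> str:
    """Return up to max_bytes from the start of a local file, decoded as UTF-8."""
    with open(path, "rb") as fh:
        data = fh.read(max_bytes)
    # errors="replace" avoids a failure if the cut lands inside a multi-byte character.
    return data.decode("utf-8", errors="replace")


# Write a small made-up CSV to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("customer_id,name\n1,Ana\n2,Ben\n")
    tmp_path = tmp.name

preview = head_local(tmp_path, 16)
print(preview)

os.remove(tmp_path)  # clean up the temporary file
```

Reading raw bytes like this is a quick way to check the delimiter and header line before choosing reader options.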

In [None]:
# TODO: Read first 200 bytes of the raw file

raw_content = dbutils.fs.head(customers_path, ________)
print(raw_content)

In [None]:
# -- Validation --
assert len(raw_content) > 0, "No content returned"
assert "," in raw_content, "Expected CSV format with commas"
print("Raw CSV content displayed. OK!")

## Lab Complete!

You have:
- Verified your catalog context
- Listed files in a Unity Catalog Volume
- Read a CSV file into a Spark DataFrame
- Inspected schema and raw file content

> **Next:** LAB 02 - ELT Ingestion & Transformations