# 01 — Setup & First Iceberg Table

In this notebook we will:
1. Initialize a SparkSession with Iceberg support
2. Create a namespace (database)
3. Create our first Iceberg table
4. Insert sample data
5. Run basic queries

## 1. Initialize Spark with Iceberg

In [None]:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("IcebergDemo")
    .master("local[*]")
    # Pull in the Iceberg runtime JAR (fetched on first run; the 3.5_2.12 suffix must match your Spark/Scala build)
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1")
    # Register the Iceberg catalog
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "../warehouse")
    # Iceberg SQL extensions (MERGE INTO, CALL procedures, partition-evolution DDL, etc.)
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

print(f"Spark version: {spark.version}")
print("Iceberg catalog 'demo' is ready!")
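The artifact name passed to `spark.jars.packages` encodes the Spark minor version and the Scala binary version the runtime was built for, followed by the Iceberg release. If it does not match the installed Spark, the session can fail at startup or on first use of the catalog. The following is a minimal sketch of deriving the coordinate from version strings. The helper name `iceberg_runtime_coordinate` is hypothetical, and the default Scala version of 2.12 is an assumption taken from the config above.

```python
def iceberg_runtime_coordinate(spark_version: str, iceberg_version: str,
                               scala_version: str = "2.12") -> str:
    """Build the Maven coordinate for the Iceberg Spark runtime JAR.

    Hypothetical helper for illustration; not part of PySpark or Iceberg.
    """
    # The artifact is published per Spark minor line, so keep only "major.minor".
    major, minor = spark_version.split(".")[:2]
    return (
        f"org.apache.iceberg:iceberg-spark-runtime-"
        f"{major}.{minor}_{scala_version}:{iceberg_version}"
    )


# A Spark 3.5.x install with Iceberg 1.7.1 reproduces the coordinate used above.
print(iceberg_runtime_coordinate("3.5.3", "1.7.1"))
```

In the notebook, `pyspark.__version__` could be passed as `spark_version` so the coordinate follows whatever PySpark is installed.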

## 2. Create a Namespace

A namespace in Iceberg is like a database — it groups related tables.

In [None]:
spark.sql("CREATE NAMESPACE IF NOT EXISTS demo.ecommerce")
spark.sql("SHOW NAMESPACES IN demo").show()

## 3. Create an Iceberg Table

In [None]:
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.ecommerce.orders (
        order_id   INT,
        customer   STRING,
        product    STRING,
        quantity   INT,
        price      DOUBLE,
        order_date DATE
    )
    USING iceberg
""")

print("Table created!")
spark.sql("SHOW TABLES IN demo.ecommerce").show()

## 4. Insert Sample Data

In [None]:
spark.sql("""
    INSERT INTO demo.ecommerce.orders VALUES
        (1,  'Alice',   'Laptop',     1, 999.99,  DATE '2024-01-15'),
        (2,  'Bob',     'Mouse',      2, 29.99,   DATE '2024-01-16'),
        (3,  'Charlie', 'Keyboard',   1, 79.99,   DATE '2024-01-16'),
        (4,  'Alice',   'Monitor',    1, 349.99,  DATE '2024-01-17'),
        (5,  'Diana',   'Headphones', 3, 59.99,   DATE '2024-01-18')
""")

print("5 rows inserted.")

## 5. Query the Table

In [None]:
spark.sql("SELECT * FROM demo.ecommerce.orders ORDER BY order_id").show()

In [None]:
# Quick aggregation
spark.sql("""
    SELECT customer, COUNT(*) AS num_orders, ROUND(SUM(price * quantity), 2) AS total_spent
    FROM demo.ecommerce.orders
    GROUP BY customer
    ORDER BY total_spent DESC
""").show()
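As a sanity check on the aggregation, the expected result can be worked out in plain Python from the five rows inserted in section 4. This sketch mirrors the `GROUP BY` with dictionaries and does not touch Spark; the variable names are illustrative only.

```python
from collections import defaultdict

# The five rows inserted above, reduced to (customer, quantity, price).
orders = [
    ("Alice", 1, 999.99),
    ("Bob", 2, 29.99),
    ("Charlie", 1, 79.99),
    ("Alice", 1, 349.99),
    ("Diana", 3, 59.99),
]

num_orders = defaultdict(int)
total_spent = defaultdict(float)
for customer, quantity, price in orders:
    num_orders[customer] += 1
    total_spent[customer] += price * quantity

# Same ordering as ORDER BY total_spent DESC, with totals rounded to 2 places.
expected = sorted(
    ((c, num_orders[c], round(total_spent[c], 2)) for c in num_orders),
    key=lambda row: row[2],
    reverse=True,
)
for row in expected:
    print(row)
```

Alice should come first with two orders, since the laptop and monitor together outweigh every other customer's total.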

## What Just Happened?

Under the hood, Iceberg created:
- **Data files** (Parquet) in `warehouse/ecommerce/orders/data/`
- **Metadata files** (JSON table metadata plus Avro manifests and manifest lists) in `warehouse/ecommerce/orders/metadata/`

This metadata layer is what enables Iceberg features such as time travel and schema evolution: each commit writes a new metadata file that records a snapshot of the table.
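To look at these files directly, the sketch below walks the table directory and groups files by role. It assumes the warehouse lives at `../warehouse` as configured in section 1, and it guesses each file's role from naming patterns that the Hadoop catalog typically produces (`*.metadata.json`, `snap-*.avro`, other `.avro` files); those patterns are assumptions and may vary between versions. The helper names `classify` and `summarize` are hypothetical.

```python
from collections import Counter
from pathlib import Path


def classify(file_name: str) -> str:
    """Guess the role of a file in an Iceberg table directory from its name.

    Hypothetical helper; the patterns are assumptions about typical naming.
    """
    if file_name.endswith(".parquet"):
        return "data file"
    if file_name.endswith(".metadata.json"):
        return "table metadata"
    if file_name.endswith(".avro"):
        # Manifest lists are usually named snap-<snapshot-id>-...avro.
        return "manifest list" if file_name.startswith("snap-") else "manifest"
    # Checksum files, version hints, and anything unrecognized.
    return "other"


def summarize(table_dir: Path) -> Counter:
    """Count files under table_dir by guessed role (empty if the dir is missing)."""
    if not table_dir.is_dir():
        return Counter()
    return Counter(classify(p.name) for p in table_dir.rglob("*") if p.is_file())


# Assumes the notebook runs from a directory where ../warehouse is the warehouse.
for role, count in sorted(summarize(Path("../warehouse/ecommerce/orders")).items()):
    print(f"{role}: {count}")
```

Running this again after later commits should show the metadata counts growing, since each commit adds new metadata rather than overwriting the old files.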

**Next up:** CRUD operations in notebook 02!