# 02 — CRUD Operations

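Iceberg supports row-level `UPDATE`, `DELETE`, and `MERGE INTO`. Only the data files that contain affected rows are rewritten (copy-on-write, the default) or supplemented with small delete files (merge-on-read). With plain (non-ACID) Hive tables, you'd typically have to overwrite entire partitions instead.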

In this notebook:
1. INSERT more rows
2. UPDATE existing rows
3. DELETE rows
4. MERGE INTO (upsert)

In [None]:
from pyspark.sql import SparkSession

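spark = (
    SparkSession.builder
    .appName("IcebergDemo")
    .master("local[*]")
    # Iceberg runtime for Spark 3.5 / Scala 2.12, resolved from Maven on first run
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1")
    # Register an Iceberg catalog named "demo" that stores tables as plain files (Hadoop catalog)
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    # Warehouse root, relative to the notebook's working directory
    .config("spark.sql.catalog.demo.warehouse", "../warehouse")
    # Iceberg SQL extensions (Iceberg-specific syntax such as CALL procedures)
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)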
print("Spark + Iceberg ready.")
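Because the `demo` catalog uses `type=hadoop`, each table is simply a directory under the warehouse path. Below is a minimal sketch of how an identifier like `demo.ecommerce.orders` resolves to that directory, assuming the default Hadoop-catalog layout `<warehouse>/<namespace>/<table>` with a `metadata/` subdirectory for table metadata. `table_location` is a hypothetical helper, not an Iceberg API.

```python
from pathlib import PurePosixPath

def table_location(identifier: str, catalog: str, warehouse: str) -> PurePosixPath:
    """Hypothetical helper (not an Iceberg API): models the default
    Hadoop-catalog layout <warehouse>/<namespace levels>/<table>."""
    parts = identifier.split(".")
    if len(parts) < 3:
        raise ValueError("expected <catalog>.<namespace>.<table>")
    if parts[0] != catalog:
        raise ValueError(f"{identifier!r} is not in catalog {catalog!r}")
    # Namespace levels become nested directories; the table name is the leaf.
    return PurePosixPath(warehouse, *parts[1:])

loc = table_location("demo.ecommerce.orders", catalog="demo", warehouse="../warehouse")
print(loc)               # -> ../warehouse/ecommerce/orders
print(loc / "metadata")  # -> ../warehouse/ecommerce/orders/metadata
```

Listing that `metadata/` directory after each write cell below is a quick way to watch new metadata files appear as commits land.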

## Current state of the table

In [None]:
spark.sql("SELECT * FROM demo.ecommerce.orders ORDER BY order_id").show()

## 1. INSERT — Add More Orders

In [None]:
spark.sql("""
    INSERT INTO demo.ecommerce.orders VALUES
        -- (order_id, customer, product, quantity, price, order_date), matched by position
        (6, 'Eve',     'Webcam',  1,   89.99, DATE '2024-01-19'),
        (7, 'Bob',     'USB Hub', 2,   24.99, DATE '2024-01-20'),
        (8, 'Charlie', 'Laptop',  1, 1099.99, DATE '2024-02-01')
""")

spark.sql("SELECT * FROM demo.ecommerce.orders ORDER BY order_id").show()
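Conceptually, an Iceberg `INSERT` only adds data files: files written by earlier commits are never modified, and the commit publishes a new snapshot listing old and new files together. Below is a toy model of that append path in plain Python. `Snapshot`, `append`, and the file names are hypothetical, and real Iceberg tracks files through manifest lists and manifests rather than a flat tuple.

```python
from dataclasses import dataclass

# Toy model (hypothetical names), not Iceberg's API.
@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int   # sequential here for readability; real IDs are not
    files: tuple       # the complete set of data files visible in this snapshot

def append(current: Snapshot, new_file: str) -> Snapshot:
    # Files from the current snapshot are carried over as-is; nothing is rewritten.
    return Snapshot(current.snapshot_id + 1, current.files + (new_file,))

before = Snapshot(1, ("data/file-a.parquet",))
after = append(before, "data/file-b.parquet")  # e.g. file(s) from the INSERT of orders 6-8
print(after.files)   # -> ('data/file-a.parquet', 'data/file-b.parquet')
print(before.files)  # -> ('data/file-a.parquet',)
```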

## 2. UPDATE — Fix a Price

Oops — Alice's laptop was supposed to be $899.99, not $999.99.

In [None]:
spark.sql("""
    UPDATE demo.ecommerce.orders
    SET price = 899.99
    WHERE order_id = 1
""")

spark.sql("SELECT * FROM demo.ecommerce.orders WHERE order_id = 1").show()
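By default, Iceberg executes `UPDATE` as copy-on-write. It finds the data files containing matching rows, writes replacement files with the new values, and commits a snapshot that swaps them in. Files without matching rows are left alone. Below is a simplified plain-Python model of that idea. `copy_on_write_update` is hypothetical, the two-file layout is made up, and the order-2 price is a placeholder.

```python
# A "data file" is a list of row dicts; the table maps file name -> rows.
def copy_on_write_update(files, predicate, changes):
    """Illustrative model of a copy-on-write UPDATE (hypothetical, not Iceberg internals)."""
    new_files, rewritten = {}, []
    for name, rows in files.items():
        if any(predicate(r) for r in rows):
            # Rewrite the whole file, changing only the matching rows.
            new_files[name + "-rewritten"] = [
                {**r, **changes} if predicate(r) else r for r in rows
            ]
            rewritten.append(name)
        else:
            new_files[name] = rows  # no matches: file is reused, not rewritten
    return new_files, rewritten

files = {
    "file-a": [{"order_id": 1, "price": 999.99}, {"order_id": 2, "price": 19.99}],
    "file-b": [{"order_id": 6, "price": 89.99}],
}
new_files, rewritten = copy_on_write_update(
    files, lambda r: r["order_id"] == 1, {"price": 899.99}
)
print(rewritten)                          # -> ['file-a']
print(new_files["file-a-rewritten"][0])   # -> {'order_id': 1, 'price': 899.99}
```

In a real table the old file isn't deleted right away. The previous snapshot still references it, which is what makes time travel possible until old snapshots are expired.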

## 3. DELETE — Remove a Cancelled Order

Eve cancelled her webcam order.

In [None]:
spark.sql("""
    DELETE FROM demo.ecommerce.orders
    WHERE order_id = 6
""")

spark.sql("SELECT * FROM demo.ecommerce.orders ORDER BY order_id").show()
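The `DELETE` above uses the default copy-on-write path. On format-version 2 tables, Iceberg can instead use merge-on-read, enabled via the `write.delete.mode` table property (e.g. `ALTER TABLE demo.ecommerce.orders SET TBLPROPERTIES ('write.delete.mode' = 'merge-on-read')`). In that mode it writes a small delete file recording which `(file_path, pos)` rows are gone, and readers skip those rows at scan time. Below is a plain-Python model of position deletes. The functions `write_position_deletes` and `scan` are hypothetical, and the file layout is made up.

```python
# Hypothetical model: data files map name -> rows; a row's position is its list index.
data_files = {
    "file-a": [{"order_id": 5}, {"order_id": 6}, {"order_id": 7}],
}

def write_position_deletes(files, predicate):
    """Record (file_path, pos) pairs for matching rows instead of rewriting files."""
    return {
        (name, pos)
        for name, rows in files.items()
        for pos, row in enumerate(rows)
        if predicate(row)
    }

def scan(files, deletes):
    """Merge-on-read: skip deleted positions while reading."""
    return [
        row
        for name, rows in files.items()
        for pos, row in enumerate(rows)
        if (name, pos) not in deletes
    ]

deletes = write_position_deletes(data_files, lambda r: r["order_id"] == 6)
print(deletes)                                             # -> {('file-a', 1)}
print([r["order_id"] for r in scan(data_files, deletes)])  # -> [5, 7]
```

The trade-off: merge-on-read makes writes cheaper and reads slightly more expensive until compaction rewrites the affected files.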

## 4. MERGE INTO — Upsert New Data

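`MERGE INTO` lets you do an "upsert": update target rows that match the `ON` condition, and insert source rows that don't.

It's the usual way to apply incremental batches (new orders, corrections, change feeds) to an Iceberg table, and the whole statement commits atomically as a single snapshot.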
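The upsert rule fits in a few lines of plain Python. Index the target by key, replace rows whose key appears in the source, and append the rest. The sketch also mirrors a check that Spark applies to Iceberg `MERGE INTO`: if more than one source row matches the same target row, the statement fails instead of picking one arbitrarily. `merge_upsert` is hypothetical and works on in-memory dicts, not Iceberg files. The source rows come from the notebook's `incoming_orders`, and order 2's old price is a placeholder.

```python
from collections import Counter

def merge_upsert(target, source, key="order_id"):
    """Hypothetical model of MERGE ... WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *, on lists of row dicts."""
    by_key = {row[key]: row for row in target}
    counts = Counter(row[key] for row in source)
    # Cardinality check: a target row may be matched by at most one source row.
    dupes = [k for k, n in counts.items() if n > 1 and k in by_key]
    if dupes:
        raise ValueError(f"multiple source rows match target key(s) {dupes}")
    result = dict(by_key)
    inserted = []
    for row in source:
        if row[key] in by_key:
            result[row[key]] = dict(row)   # UPDATE SET *: take every source column
        else:
            inserted.append(dict(row))     # INSERT *: add the unmatched row
    return list(result.values()) + inserted

target = [{"order_id": 1, "price": 899.99}, {"order_id": 2, "price": 19.99}]
source = [
    {"order_id": 2, "price": 24.99},
    {"order_id": 9, "price": 29.99},
    {"order_id": 10, "price": 79.99},
]
merged = merge_upsert(target, source)
print([(r["order_id"], r["price"]) for r in merged])
# -> [(1, 899.99), (2, 24.99), (9, 29.99), (10, 79.99)]
```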

In [None]:
# Create a temporary view with incoming data
spark.sql("""
    CREATE OR REPLACE TEMP VIEW incoming_orders AS
    SELECT * FROM VALUES
        (2,  'Bob',   'Mouse',      2,  24.99,  DATE '2024-01-16'),  -- updated price
        (9,  'Diana', 'Mouse',      1,  29.99,  DATE '2024-02-02'),  -- new order
        (10, 'Frank', 'Keyboard',   2,  79.99,  DATE '2024-02-03')   -- new order
    AS t(order_id, customer, product, quantity, price, order_date)
""")

spark.sql("""
    MERGE INTO demo.ecommerce.orders target
    USING incoming_orders source
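    ON target.order_id = source.order_id
    -- SET * and INSERT * copy every column by name, so the source must
    -- expose the same column names as the target table
    WHEN MATCHED THEN UPDATE SET *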
    WHEN NOT MATCHED THEN INSERT *
""")

spark.sql("SELECT * FROM demo.ecommerce.orders ORDER BY order_id").show()

## Key Takeaway

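| Operation  | Plain (non-ACID) Hive table               | Iceberg                                                     |
|------------|-------------------------------------------|-------------------------------------------------------------|
| INSERT     | Supported                                 | Supported (appends new data files)                          |
| UPDATE     | Not supported; overwrite whole partition  | Row-level: rewrites only affected files, or writes delete files |
| DELETE     | Not supported; overwrite whole partition  | Row-level: rewrites only affected files, or writes delete files |
| MERGE INTO | Not supported                             | Built-in upsert                                             |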

Every write we just ran (INSERT, UPDATE, DELETE, MERGE) committed a **new snapshot**. We'll explore those in the next notebook!
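Each write in this notebook committed exactly one snapshot whose parent is the previous snapshot. The `SELECT`s and the temp view did not change the table. Below is a toy model of that chain. `Snapshot` and `commit` are hypothetical names, the IDs are sequential only for readability, and the operation labels are not Iceberg's own values. In Spark you can list the real history with `SELECT * FROM demo.ecommerce.orders.snapshots`.

```python
from dataclasses import dataclass
from typing import Optional

# Toy model (hypothetical names), not Iceberg's API.
@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    operation: str  # readability label only

def commit(history, operation):
    # Each commit appends one snapshot that points at the current one as parent.
    parent = history[-1].snapshot_id if history else None
    snap_id = 1 if parent is None else parent + 1
    return history + [Snapshot(snap_id, parent, operation)]

history = commit([], "notebook 01 (placeholder)")
for op in ["INSERT", "UPDATE", "DELETE", "MERGE"]:
    history = commit(history, op)

for s in history:
    print(s.snapshot_id, s.parent_id, s.operation)
```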

**Next up:** Time travel in notebook 03!