# Changing the Schema in Apache Iceberg

In [1]:
from pyspark.sql import SparkSession

In [2]:
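# Connect to the standalone Spark master. No Iceberg settings are passed here,
# so this assumes the Iceberg runtime and a catalog named `ice` are already
# configured on the cluster (for example in spark-defaults.conf).
spark = (
    SparkSession.builder
    .appName("Schema Evolution in Iceberg")
    .master("spark://spark:7077")
    .getOrCreate()
)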

In [3]:
spark.sql("""DESCRIBE TABLE ice.demo.customers;""").show()

+--------------------+---------+-------+
|            col_name|data_type|comment|
+--------------------+---------+-------+
|                  id|   bigint|   NULL|
|                name|   string|   NULL|
|               email|   string|   NULL|
|# Partition Infor...|         |       |
|          # col_name|data_type|comment|
|               email|   string|   NULL|
+--------------------+---------+-------+



In [4]:
spark.sql("""ALTER TABLE ice.demo.customers
ADD COLUMN country STRING COMMENT 'ISO 3166-1 code' AFTER email;""")

DataFrame[]

In [5]:
spark.sql("""DESCRIBE TABLE ice.demo.customers;""").show()

+--------------------+---------+---------------+
|            col_name|data_type|        comment|
+--------------------+---------+---------------+
|                  id|   bigint|           NULL|
|                name|   string|           NULL|
|               email|   string|           NULL|
|             country|   string|ISO 3166-1 code|
|# Partition Infor...|         |               |
|          # col_name|data_type|        comment|
|               email|   string|           NULL|
+--------------------+---------+---------------+
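
Adding `country` did not rewrite any data files. Iceberg identifies columns by a field ID stored in the schema, so files written before the `ALTER TABLE` simply lack the new ID and readers fill that column with nulls. Here is a toy model of that lookup in plain Python; the field IDs are hypothetical (the real ones live in the table metadata), and this is a sketch of the idea, not Iceberg's reader code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Field:
    field_id: int  # columns are matched by ID, not by name or position
    name: str
    type: str


def project(stored: Dict[int, Any], schema: List[Field]) -> Dict[str, Optional[Any]]:
    """Read a stored row (keyed by field ID) through a table schema.

    Any field the row does not carry, such as a column added after the
    file was written, comes back as None (SQL NULL).
    """
    return {f.name: stored.get(f.field_id) for f in schema}


# Hypothetical field IDs for ice.demo.customers.
old_schema = [Field(1, "id", "bigint"), Field(2, "name", "string"), Field(3, "email", "string")]
# ADD COLUMN assigns a fresh ID; AFTER email only controls where the field is listed.
new_schema = old_schema + [Field(4, "country", "string")]

# A row as stored in a data file written before the ALTER TABLE.
row = {1: 1, 2: "Alice Smith", 3: "alice@example.com"}

print(project(row, old_schema))
print(project(row, new_schema))  # same values plus 'country': None
```

This is why existing rows show `NULL` for `country` until the `UPDATE` below fills it in.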



In [6]:
spark.sql("""UPDATE ice.demo.customers
SET country = 'US'
WHERE email LIKE '%@example.com';""").show()

++
||
++
++



In [7]:
spark.sql("SELECT * FROM ice.demo.customers").show()

+---+--------------+--------------------+-------+
| id|          name|               email|country|
+---+--------------+--------------------+-------+
| 22|      Zoe King|zoe.king@example.com|     US|
| 21|   Jack Wilson|jack.wilson@examp...|     US|
| 15|Isabella Rossi|isabella.rossi@ex...|     US|
|  1|   Alice Smith|   alice@example.com|     US|
|  2|   Bob Johnson|     bob@example.com|     US|
| 12|  Lucas Martin|lucas.martin@exam...|     US|
| 18|   Henry Scott|henry.scott@examp...|     US|
| 16|  James Nguyen|james.nguyen@exam...|     US|
|  5|    Maya Patel|maya.patel@exampl...|     US|
|  7| Sofia Almeida|sofia.almeida@exa...|     US|
| 19|  Aria Johnson|aria.johnson@exam...|     US|
|  3|   Carol Adams|   carol@example.com|     US|
| 20| Daniela Costa|daniela.costa@exa...|     US|
|  4| Diego Ramirez|diego.ramirez@exa...|     US|
| 17|    Mila Novak|mila.novak@exampl...|     US|
|  8| Noah Williams|noah.williams@exa...|     US|
|  9|  Ava Thompson|ava.thompson@exam...|     US|


Now let's go back to the table as it was before we edited the schema. To pick the right snapshot, it helps to know which write actions Iceberg supports and what each one records in a snapshot's `operation` field. The table spec defines only four values (`append`, `replace`, `overwrite`, `delete`), so several actions share a value:

| Write action           | Meaning                                                                          | Typical Trigger / Example                                              | Recorded `operation` |
|------------------------|----------------------------------------------------------------------------------|------------------------------------------------------------------------|----------------------|
| **append**             | Adds new data files to the table without touching existing ones                  | `INSERT INTO ...`, batch ingest                                        | `append`             |
| **fast append**        | Appends by writing a new manifest instead of merging into existing manifests     | Appends from some engines, Java `newFastAppend()` API                  | `append`             |
| **overwrite**          | Replaces existing data files with new ones                                       | `INSERT OVERWRITE`, Spark `.mode("overwrite")` writes                  | `overwrite`          |
| **replace partitions** | Overwrites only the partitions that receive new data, leaving others intact      | `INSERT OVERWRITE` with `spark.sql.sources.partitionOverwriteMode=dynamic` | `overwrite`      |
| **delete**             | Removes whole data files, or adds position/equality delete files that mask rows  | `DELETE FROM table WHERE ...`                                          | `delete` (copy-on-write deletes that rewrite files record `overwrite`) |
| **update**             | Updates rows (internally: delete + insert of the modified rows)                  | `UPDATE table SET ... WHERE ...`                                       | `overwrite`          |
| **rewrite**            | Rewrites data files without logical changes (compaction, clustering)             | `CALL ice.system.rewrite_data_files(table => 'demo.customers')`        | `replace`            |
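
To see why an `UPDATE` is recorded as `overwrite`, here is a toy copy-on-write model in plain Python (copy-on-write is Iceberg's default for the `write.update.mode` table property). Data files are modeled as immutable tuples of rows: every file containing a matching row is replaced by a rewritten copy, and untouched files are kept. The file names, the sample row with `id` 99, and the `cow_update` helper are all hypothetical; real Iceberg tracks files through manifests.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
DataFile = Tuple[Row, ...]


def cow_update(
    files: Dict[str, DataFile],
    predicate: Callable[[Row], bool],
    changes: Row,
) -> Tuple[Dict[str, DataFile], List[str]]:
    """Toy copy-on-write UPDATE over immutable data files.

    A file with no matching row is carried over unchanged. A file with at least
    one matching row is dropped and replaced by a new file in which the matching
    rows have `changes` applied. Returns the new file set and the replaced paths.
    """
    new_files: Dict[str, DataFile] = {}
    replaced: List[str] = []
    for path, rows in files.items():
        if not any(predicate(r) for r in rows):
            new_files[path] = rows  # untouched file is reused as-is
            continue
        # Rewrite the whole file: updated copies of matching rows, other rows copied over.
        new_files[path + ".rewritten"] = tuple(
            {**r, **changes} if predicate(r) else r for r in rows
        )
        replaced.append(path)
    return new_files, replaced


# Hypothetical layout: one file matching the UPDATE's WHERE clause, one that does not.
files = {
    "f1.parquet": ({"id": 1, "email": "alice@example.com", "country": None},),
    "f2.parquet": ({"id": 99, "email": "someone@other.org", "country": None},),
}
new_files, replaced = cow_update(
    files,
    lambda r: str(r["email"]).endswith("@example.com"),  # LIKE '%@example.com'
    {"country": "US"},
)
print(replaced)  # ['f1.parquet']
print(sorted(new_files))  # ['f1.parquet.rewritten', 'f2.parquet']
```

The commit both removes a data file and adds its replacement, which is what Iceberg records as `overwrite` rather than `append`.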

In [8]:
spark.sql("""SELECT snapshot_id, committed_at, operation
FROM ice.demo.customers.snapshots
ORDER BY committed_at DESC
LIMIT 5;""").show(truncate=False)

+-------------------+-----------------------+---------+
|snapshot_id        |committed_at           |operation|
+-------------------+-----------------------+---------+
|6069821547749819533|2025-10-27 16:47:48.751|overwrite|
|6233251210790417521|2025-10-27 16:46:37.232|append   |
|5355358217514325757|2025-10-27 16:46:30.234|append   |
+-------------------+-----------------------+---------+
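
Instead of copying a snapshot ID by hand, you can also pick one by time. The sketch below applies, in plain Python, the rule Iceberg uses for `TIMESTAMP AS OF`: take the latest snapshot committed at or before the given timestamp. It runs over the three rows printed above, is not a Spark API, and ignores rollbacks and branches; `snapshot_as_of` is a hypothetical helper.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# (snapshot_id, committed_at, operation), copied from the query output above.
SNAPSHOTS: List[Tuple[int, str, str]] = [
    (6069821547749819533, "2025-10-27 16:47:48.751", "overwrite"),
    (6233251210790417521, "2025-10-27 16:46:37.232", "append"),
    (5355358217514325757, "2025-10-27 16:46:30.234", "append"),
]


def snapshot_as_of(snapshots: List[Tuple[int, str, str]], ts: str) -> Optional[int]:
    """Return the ID of the latest snapshot committed at or before `ts`, or None."""
    cutoff = datetime.fromisoformat(ts)
    eligible = [s for s in snapshots if datetime.fromisoformat(s[1]) <= cutoff]
    if not eligible:
        return None  # the table had no snapshot yet at that time
    return max(eligible, key=lambda s: datetime.fromisoformat(s[1]))[0]


# A moment between the last append and the UPDATE.
print(snapshot_as_of(SNAPSHOTS, "2025-10-27 16:47:00"))  # 6233251210790417521
```

In Spark, `SELECT * FROM ice.demo.customers TIMESTAMP AS OF '2025-10-27 16:47:00'` should resolve to the same snapshot, assuming the session time zone matches the one used to display `committed_at`.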



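Note that the `ALTER TABLE` does not appear in this list: schema changes write new table metadata but do not create a snapshot. The `overwrite` at the top is the `UPDATE` that set `country`, so `6233251210790417521`, the last `append`, predates both changes. Querying that snapshot returns the old three-column schema, because each Iceberg snapshot records the ID of the schema that was current when it was committed, and time travel reads the snapshot with that schema.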
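
Under the hood, the table metadata keeps every schema by ID, and each snapshot stores a `schema-id` pointing at the schema that was current when it was committed. A toy model of how a time-travel read resolves its columns through that pointer follows; the schema IDs 0 and 1 are hypothetical, while the snapshot IDs come from the output above.

```python
from typing import Dict, List

# Hypothetical schema IDs; only the snapshot IDs are taken from the notebook.
SCHEMAS: Dict[int, List[str]] = {
    0: ["id", "name", "email"],             # before ALTER TABLE ... ADD COLUMN
    1: ["id", "name", "email", "country"],  # after it
}
SNAPSHOT_SCHEMA_ID: Dict[int, int] = {
    5355358217514325757: 0,  # append
    6233251210790417521: 0,  # append
    6069821547749819533: 1,  # overwrite from the UPDATE
}


def columns_for(snapshot_id: int) -> List[str]:
    """Columns a time-travel read of `snapshot_id` returns: its own schema's columns."""
    return SCHEMAS[SNAPSHOT_SCHEMA_ID[snapshot_id]]


print(columns_for(6233251210790417521))  # ['id', 'name', 'email']
```

The query below shows the same three columns for that snapshot.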

In [10]:
spark.sql("""SELECT *
FROM ice.demo.customers
VERSION AS OF '6233251210790417521';""").show(truncate=False)

+---+--------------+--------------------------+
|id |name          |email                     |
+---+--------------+--------------------------+
|3  |Carol Adams   |carol@example.com         |
|1  |Alice Smith   |alice@example.com         |
|2  |Bob Johnson   |bob@example.com           |
|22 |Zoe King      |zoe.king@example.com      |
|21 |Jack Wilson   |jack.wilson@example.com   |
|15 |Isabella Rossi|isabella.rossi@example.com|
|12 |Lucas Martin  |lucas.martin@example.com  |
|18 |Henry Scott   |henry.scott@example.com   |
|16 |James Nguyen  |james.nguyen@example.com  |
|5  |Maya Patel    |maya.patel@example.com    |
|7  |Sofia Almeida |sofia.almeida@example.com |
|19 |Aria Johnson  |aria.johnson@example.com  |
|20 |Daniela Costa |daniela.costa@example.com |
|4  |Diego Ramirez |diego.ramirez@example.com |
|8  |Noah Williams |noah.williams@example.com |
|17 |Mila Novak    |mila.novak@example.com    |
|9  |Ava Thompson  |ava.thompson@example.com  |
|11 |Olivia Garcia |olivia.garcia@exampl

In [11]:
spark.stop()