# 03 — Time Travel

Every write to an Iceberg table creates an immutable **snapshot**. This means you can:
1. Query data as it existed at any point in time
2. List all snapshots and their metadata
3. Roll back to a previous version

These capabilities are useful for debugging bad writes, auditing past states of a table, and recovering from mistakes.

In [1]:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("IcebergDemo")
    .master("local[*]")
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "../warehouse")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)
print("Spark + Iceberg ready.")

Spark + Iceberg ready.


## 1. List All Snapshots

Iceberg exposes metadata tables that you can query with SQL.

In [2]:
snapshots_df = spark.sql("""
    SELECT snapshot_id, committed_at, operation, summary
    FROM demo.ecommerce.orders.snapshots
    ORDER BY committed_at
""")

snapshots_df.show(truncate=False)

+-------------------+-----------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|snapshot_id        |committed_at           |operation|summary                                                                                                                                                                                                                                                                                                                                                            
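
The `summary` column is a string-to-string map, which is why the full `show(truncate=False)` output is so wide. One way to keep the output readable is to select only a few summary entries as their own columns. The sketch below just builds the SQL text; the helper name `snapshot_summary_query` is hypothetical, and the keys `added-records`, `deleted-records`, and `total-records` are assumed to be present in the snapshot summaries (not every operation writes every key).

```python
def snapshot_summary_query(table, keys=("added-records", "deleted-records", "total-records")):
    """Build a query over <table>.snapshots that exposes selected summary keys as columns."""
    # summary is a map<string,string>; summary['key'] extracts a single entry.
    columns = [f"summary['{key}'] AS {key.replace('-', '_')}" for key in keys]
    select_list = ", ".join(["snapshot_id", "committed_at", "operation", *columns])
    return f"SELECT {select_list} FROM {table}.snapshots ORDER BY committed_at"


query = snapshot_summary_query("demo.ecommerce.orders")
print(query)
# In this notebook the query would be run with: spark.sql(query).show(truncate=False)
```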

In [3]:
# Save the first snapshot ID for time-travel queries below
snapshot_ids = [row.snapshot_id for row in snapshots_df.collect()]
first_snapshot_id = snapshot_ids[0]
print(f"First snapshot ID: {first_snapshot_id}")
print(f"Total snapshots: {len(snapshot_ids)}")

First snapshot ID: 8255495233418216272
Total snapshots: 5



## 2. Query a Previous Snapshot

Use `VERSION AS OF <snapshot_id>` to read data as it was at a specific snapshot.

In [4]:
print(f"Data at the FIRST snapshot (snapshot_id = {first_snapshot_id}):")
print("This was right after the initial INSERT in notebook 01.")
print()

spark.sql(f"""
    SELECT * FROM demo.ecommerce.orders
    VERSION AS OF {first_snapshot_id}
    ORDER BY order_id
""").show()

Data at the FIRST snapshot (snapshot_id = 8255495233418216272):
This was right after the initial INSERT in notebook 01.

+--------+--------+----------+--------+------+----------+
|order_id|customer|   product|quantity| price|order_date|
+--------+--------+----------+--------+------+----------+
|       1|   Alice|    Laptop|       1|999.99|2024-01-15|
|       2|     Bob|     Mouse|       2| 29.99|2024-01-16|
|       3| Charlie|  Keyboard|       1| 79.99|2024-01-16|
|       4|   Alice|   Monitor|       1|349.99|2024-01-17|
|       5|   Diana|Headphones|       3| 59.99|2024-01-18|
+--------+--------+----------+--------+------+----------+



In [5]:
print("Data at the CURRENT (latest) snapshot:")
print()

spark.sql("SELECT * FROM demo.ecommerce.orders ORDER BY order_id").show()

Data at the CURRENT (latest) snapshot:

+--------+--------+----------+--------+-------+----------+
|order_id|customer|   product|quantity|  price|order_date|
+--------+--------+----------+--------+-------+----------+
|       1|   Alice|    Laptop|       1| 899.99|2024-01-15|
|       2|     Bob|     Mouse|       2|  24.99|2024-01-16|
|       3| Charlie|  Keyboard|       1|  79.99|2024-01-16|
|       4|   Alice|   Monitor|       1| 349.99|2024-01-17|
|       5|   Diana|Headphones|       3|  59.99|2024-01-18|
|       7|     Bob|   USB Hub|       2|  24.99|2024-01-20|
|       8| Charlie|    Laptop|       1|1099.99|2024-02-01|
|       9|   Diana|     Mouse|       1|  29.99|2024-02-02|
|      10|   Frank|  Keyboard|       2|  79.99|2024-02-03|
+--------+--------+----------+--------+-------+----------+
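
Besides `VERSION AS OF <snapshot_id>`, Spark SQL also accepts `TIMESTAMP AS OF '<timestamp>'` for Iceberg tables, which reads the snapshot that was current at that time. The helper below is a minimal sketch that only builds the query text for either form; the function name `time_travel_query` and the example timestamp are illustrative assumptions, not part of the original notebook.

```python
def time_travel_query(table, snapshot_id=None, as_of_timestamp=None):
    """Return a SELECT that reads `table` at a snapshot ID or at a point in time."""
    # Exactly one of the two selectors must be given.
    if (snapshot_id is None) == (as_of_timestamp is None):
        raise ValueError("pass exactly one of snapshot_id or as_of_timestamp")
    if snapshot_id is not None:
        return f"SELECT * FROM {table} VERSION AS OF {int(snapshot_id)}"
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{as_of_timestamp}'"


by_id = time_travel_query("demo.ecommerce.orders", snapshot_id=8255495233418216272)
by_time = time_travel_query("demo.ecommerce.orders", as_of_timestamp="2026-02-23 13:55:05")
print(by_id)
print(by_time)
# In this notebook either string could be passed to spark.sql(...).show()
```

The timestamp form is convenient when you know roughly when the data was correct but do not have a snapshot ID at hand. The timestamp string is typically interpreted in the Spark session time zone.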



## 3. View the History Table

The `history` metadata table shows which snapshot was current at each point in time.

In [6]:
spark.sql("""
    SELECT * FROM demo.ecommerce.orders.history
""").show(truncate=False)

+-----------------------+-------------------+-------------------+-------------------+
|made_current_at        |snapshot_id        |parent_id          |is_current_ancestor|
+-----------------------+-------------------+-------------------+-------------------+
|2026-02-23 13:52:53.103|8255495233418216272|NULL               |true               |
|2026-02-23 13:54:56.341|6868300044641218102|8255495233418216272|true               |
|2026-02-23 13:55:02.902|6199043620124285204|6868300044641218102|true               |
|2026-02-23 13:55:11.259|991196052439615954 |6199043620124285204|true               |
|2026-02-23 13:55:16.238|741606390684588596 |991196052439615954 |true               |
+-----------------------+-------------------+-------------------+-------------------+
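
To make "which snapshot was current at each point in time" concrete, the sketch below resolves a timestamp against the history rows shown above in plain Python. It is a toy model of the lookup, not Iceberg's implementation; the function name `snapshot_as_of` is hypothetical, and the rows are assumed to be sorted by `made_current_at`.

```python
from bisect import bisect_right
from datetime import datetime

# (made_current_at, snapshot_id) pairs copied from the history output above.
HISTORY = [
    ("2026-02-23 13:52:53.103", 8255495233418216272),
    ("2026-02-23 13:54:56.341", 6868300044641218102),
    ("2026-02-23 13:55:02.902", 6199043620124285204),
    ("2026-02-23 13:55:11.259", 991196052439615954),
    ("2026-02-23 13:55:16.238", 741606390684588596),
]


def snapshot_as_of(history, timestamp):
    """Return the snapshot that was current at `timestamp` (format: YYYY-MM-DD HH:MM:SS)."""
    made_current = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f") for ts, _ in history]
    when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    # Index of the first entry made current strictly after `when`.
    index = bisect_right(made_current, when)
    if index == 0:
        raise ValueError("no snapshot existed at or before the given timestamp")
    return history[index - 1][1]


resolved = snapshot_as_of(HISTORY, "2026-02-23 13:55:05")
print(resolved)
```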



## 4. Rollback to a Previous Snapshot

Made a mistake? Roll back the table to an earlier snapshot with `rollback_to_snapshot`. The target must be an ancestor of the current snapshot.

This is a metadata-only operation: no data files are rewritten.

In [7]:
print("Before rollback:")
spark.sql("SELECT COUNT(*) AS row_count FROM demo.ecommerce.orders").show()

# Roll back to the first snapshot
spark.sql(f"""
    CALL demo.system.rollback_to_snapshot('ecommerce.orders', {first_snapshot_id})
""")

print(f"After rollback to snapshot {first_snapshot_id}:")
spark.sql("SELECT * FROM demo.ecommerce.orders ORDER BY order_id").show()

Before rollback:
+---------+
|row_count|
+---------+
|        9|
+---------+

After rollback to snapshot 8255495233418216272:
+--------+--------+----------+--------+------+----------+
|order_id|customer|   product|quantity| price|order_date|
+--------+--------+----------+--------+------+----------+
|       1|   Alice|    Laptop|       1|999.99|2024-01-15|
|       2|     Bob|     Mouse|       2| 29.99|2024-01-16|
|       3| Charlie|  Keyboard|       1| 79.99|2024-01-16|
|       4|   Alice|   Monitor|       1|349.99|2024-01-17|
|       5|   Diana|Headphones|       3| 59.99|2024-01-18|
+--------+--------+----------+--------+------+----------+
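
Why rollback needs no data rewrite can be pictured with a toy model: the table keeps every snapshot's file list and a pointer to the current snapshot, and a rollback only moves that pointer. The class below is a simplified illustration with hypothetical names (`ToyTable`, `commit`, `rollback`, `scan`) and made-up file names; it does not reflect Iceberg's actual metadata layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyTable:
    """Toy stand-in for table metadata: snapshots plus a 'current' pointer."""
    snapshots: Dict[int, List[str]] = field(default_factory=dict)  # snapshot_id -> data files
    current: Optional[int] = None

    def commit(self, snapshot_id: int, files: List[str]) -> None:
        # A write records a new immutable file list and advances the pointer.
        self.snapshots[snapshot_id] = list(files)
        self.current = snapshot_id

    def rollback(self, snapshot_id: int) -> None:
        # Rollback only moves the pointer; no file list is touched or deleted.
        if snapshot_id not in self.snapshots:
            raise KeyError(f"unknown snapshot {snapshot_id}")
        self.current = snapshot_id

    def scan(self) -> List[str]:
        # Readers see the files of whichever snapshot is current.
        return self.snapshots[self.current]


table = ToyTable()
table.commit(1, ["orders-a.parquet"])
table.commit(2, ["orders-a.parquet", "orders-b.parquet"])
table.rollback(1)
print(table.scan())
print(sorted(table.snapshots))
```

After the rollback the newer snapshot is still recorded, which is why it can be made current again later.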



## 5. Restore the Latest State

Let's restore the latest snapshot so the next notebooks have the full data set to work with.

`rollback_to_snapshot` only accepts a snapshot that is an ancestor of the current state. After the rollback in step 4, the latest snapshot is no longer an ancestor of the current one, so calling `rollback_to_snapshot` with its ID fails with:

org.apache.iceberg.exceptions.ValidationException: Cannot roll back to snapshot, not an ancestor of the current state: 741606390684588596

To move the table forward again, use `set_current_snapshot`, which does not require the target to be an ancestor of the current state.

In [8]:
# Move the table forward to the latest snapshot.
# rollback_to_snapshot would fail here because the target is not an
# ancestor of the current (rolled-back) state.
last_snapshot_id = snapshot_ids[-1]

spark.sql(f"""
    CALL demo.system.set_current_snapshot('ecommerce.orders', {last_snapshot_id})
""")

print("Restored to latest snapshot.")
spark.sql("SELECT * FROM demo.ecommerce.orders ORDER BY order_id").show()
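
The "not an ancestor of the current state" error can be understood as a walk along the `parent_id` chain from the `history` table. The sketch below reproduces that idea in plain Python using the snapshot IDs from this notebook; `is_ancestor` is a hypothetical helper, not Iceberg's own code.

```python
# snapshot_id -> parent_id, copied from the history output in section 3.
PARENTS = {
    8255495233418216272: None,
    6868300044641218102: 8255495233418216272,
    6199043620124285204: 6868300044641218102,
    991196052439615954: 6199043620124285204,
    741606390684588596: 991196052439615954,
}

FIRST = 8255495233418216272
LAST = 741606390684588596


def is_ancestor(parents, current_id, candidate_id):
    """Return True if candidate_id is current_id or lies on its parent chain."""
    node = current_id
    while node is not None:
        if node == candidate_id:
            return True
        node = parents.get(node)
    return False


# Before the rollback the table is at LAST, and FIRST is on its parent chain.
print(is_ancestor(PARENTS, current_id=LAST, candidate_id=FIRST))
# After rolling back to FIRST, LAST is not reachable from the current state.
print(is_ancestor(PARENTS, current_id=FIRST, candidate_id=LAST))
```

Because the latest snapshot is a descendant rather than an ancestor of the rolled-back state, moving forward needs a procedure without the ancestor requirement, such as `set_current_snapshot`.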


## Key Takeaway

| Feature             | How it works                                                                   |
|---------------------|--------------------------------------------------------------------------------|
| Snapshot history    | Every write creates an immutable snapshot                                      |
| Time-travel queries | `VERSION AS OF <snapshot_id>`                                                  |
| Metadata tables     | `.snapshots`, `.history`, `.files`, etc.                                       |
| Rollback            | `rollback_to_snapshot` (ancestors only): metadata-only, no data files rewritten |

**Next up:** Schema evolution in notebook 04!