
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# Manipulate Delta Tables Lab

This notebook provides a hands-on review of some of the more esoteric features Delta Lake brings to the data lakehouse.

## Learning Objectives
By the end of this lab, you should be able to:
- Review table history
- Query previous table versions and roll back a table to a specific version

## REQUIRED - SELECT CLASSIC COMPUTE

Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:

1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

1. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

  - In the drop-down, select **More**.

  - In the **Attach to an existing compute resource** pop-up, select the first drop-down. You will see a unique cluster name in that drop-down. Please select that cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

1. Find the triangle icon to the right of your compute cluster name and click it.

1. Wait a few minutes for the cluster to start.

1. Once the cluster is running, complete the steps above to select your cluster.

## Classroom Setup

Run the following cell to configure your working environment for this course. It will also set your default catalog to **dbacademy** and the schema to your specific schema name shown below using the `USE` statements.
<br></br>


```
USE CATALOG dbacademy;
USE SCHEMA dbacademy.<your unique schema name>;
```

**NOTE:** The `DA` object is only used in Databricks Academy courses and is not available outside of these courses. It will dynamically reference the information needed to run the course.

In [0]:
%run ./Includes/Classroom-Setup-8L

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


0,1
Course Catalog:,
Your Schema:,


## Create History of Bean Collection

The cell below includes various table operations, resulting in the following schema for the **`beans`** table:

| Field Name | Field type |
| --- | --- |
| name | STRING |
| color | STRING |
| grams | FLOAT |
| delicious | BOOLEAN |

In [0]:
CREATE OR REPLACE TABLE beans 
(name STRING, color STRING, grams FLOAT, delicious BOOLEAN);

INSERT INTO beans VALUES
("black", "black", 500, true),
("lentils", "brown", 1000, true),
("jelly", "rainbow", 42.5, false);

INSERT INTO beans VALUES
('pinto', 'brown', 1.5, true),
('green', 'green', 178.3, true),
('beanbag chair', 'white', 40000, false);

UPDATE beans
SET delicious = true
WHERE name = "jelly";

UPDATE beans
SET grams = 1500
WHERE name = 'pinto';

DELETE FROM beans
WHERE delicious = false;

CREATE OR REPLACE TEMP VIEW new_beans(name, color, grams, delicious) AS VALUES
('black', 'black', 60.5, true),
('lentils', 'green', 500, true),
('kidney', 'red', 387.2, true),
('castor', 'brown', 25, false);

MERGE INTO beans a
USING new_beans b
ON a.name=b.name AND a.color = b.color
WHEN MATCHED THEN
  UPDATE SET grams = a.grams + b.grams
WHEN NOT MATCHED AND b.delicious = true THEN
  INSERT *;

num_affected_rows,num_updated_rows,num_deleted_rows,num_inserted_rows
3,1,0,2
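
The row counts reported by the **`MERGE`** can be reproduced with a small replay in plain Python. This is only a sketch of the matching logic, not of how Delta Lake executes the statement; the function name `merge_beans` and the dictionary layout are illustrative assumptions, and the target rows are the five beans that remain after the **`DELETE`**.

```python
# Toy replay of the MERGE on plain Python data (no Spark involved).
# Target rows: the state of `beans` after the DELETE statement.
beans = [
    {"name": "black", "color": "black", "grams": 500.0, "delicious": True},
    {"name": "lentils", "color": "brown", "grams": 1000.0, "delicious": True},
    {"name": "jelly", "color": "rainbow", "grams": 42.5, "delicious": True},
    {"name": "pinto", "color": "brown", "grams": 1500.0, "delicious": True},
    {"name": "green", "color": "green", "grams": 178.3, "delicious": True},
]
# Source rows: the contents of the `new_beans` temp view.
new_beans = [
    {"name": "black", "color": "black", "grams": 60.5, "delicious": True},
    {"name": "lentils", "color": "green", "grams": 500.0, "delicious": True},
    {"name": "kidney", "color": "red", "grams": 387.2, "delicious": True},
    {"name": "castor", "color": "brown", "grams": 25.0, "delicious": False},
]


def merge_beans(target, source):
    """Hypothetical helper mirroring the MERGE conditions used in this lab."""
    result = [dict(row) for row in target]
    # Match on (name, color), as in the ON clause.
    by_key = {(row["name"], row["color"]): row for row in result}
    updated = inserted = 0
    for src in source:
        match = by_key.get((src["name"], src["color"]))
        if match is not None:
            # WHEN MATCHED: add the new weight to the existing weight.
            match["grams"] += src["grams"]
            updated += 1
        elif src["delicious"]:
            # WHEN NOT MATCHED AND b.delicious = true: insert the row.
            result.append(dict(src))
            inserted += 1
    return result, updated, inserted


merged, num_updated, num_inserted = merge_beans(beans, new_beans)
print(num_updated + num_inserted, num_updated, num_inserted)  # 3 1 2
```

Only the black beans match on both name and color, so they are the single update. The green lentils do not match the brown lentils already in the table, so they are inserted along with the kidney beans, and the castor beans are skipped because they are not delicious.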


## Review the Table History

Delta Lake's transaction log stores information about each transaction that modifies a table's contents or settings.

Review the history of the **`beans`** table below.

In [0]:
DESCRIBE HISTORY beans

version,timestamp,userId,userName,operation,operationParameters,job,notebook,clusterId,readVersion,isolationLevel,isBlindAppend,operationMetrics,userMetadata,engineInfo
8,2025-05-23T06:45:47Z,7363748412033156,labuser10342477_1747973406@vocareum.com,MERGE,"Map(predicate -> [""((name#57349 = name#57337) AND (color#57350 = color#57338))""], matchedPredicates -> [{""actionType"":""update""}], statsOnLoad -> false, notMatchedBySourcePredicates -> [], notMatchedPredicates -> [{""predicate"":""delicious#57340: boolean"",""actionType"":""insert""}])",,List(296149350687239),0523-041054-ysw25i30,6.0,WriteSerializable,False,"Map(numTargetRowsCopied -> 0, numTargetRowsDeleted -> 0, numTargetFilesAdded -> 3, numTargetBytesAdded -> 4010, numTargetBytesRemoved -> 0, numTargetDeletionVectorsAdded -> 1, numTargetRowsMatchedUpdated -> 1, executionTimeMs -> 4109, materializeSourceTimeMs -> 156, numTargetRowsInserted -> 2, conflictDetectionTimeMs -> 142, numTargetRowsMatchedDeleted -> 0, numTargetDeletionVectorsUpdated -> 1, scanTimeMs -> 1664, numTargetRowsUpdated -> 1, numOutputRows -> 3, numTargetDeletionVectorsRemoved -> 1, numTargetRowsNotMatchedBySourceUpdated -> 0, numTargetChangeFilesAdded -> 0, numSourceRows -> 4, numTargetFilesRemoved -> 0, numTargetRowsNotMatchedBySourceDeleted -> 0, rewriteTimeMs -> 2244)",,Databricks-Runtime/15.4.x-scala2.12
7,2025-05-23T06:45:46Z,7363748412033156,labuser10342477_1747973406@vocareum.com,OPTIMIZE,"Map(predicate -> [], auto -> true, clusterBy -> [], zOrderBy -> [], batchId -> 0)",,List(296149350687239),0523-041054-ysw25i30,6.0,SnapshotIsolation,False,"Map(numRemovedFiles -> 2, numRemovedBytes -> 2810, p25FileSize -> 1432, numDeletionVectorsRemoved -> 1, minFileSize -> 1432, numAddedFiles -> 1, maxFileSize -> 1432, p75FileSize -> 1432, p50FileSize -> 1432, numAddedBytes -> 1432)",,Databricks-Runtime/15.4.x-scala2.12
6,2025-05-23T06:45:41Z,7363748412033156,labuser10342477_1747973406@vocareum.com,DELETE,"Map(predicate -> [""NOT delicious#56021""])",,List(296149350687239),0523-041054-ysw25i30,5.0,WriteSerializable,False,"Map(numRemovedFiles -> 0, numRemovedBytes -> 0, numCopiedRows -> 0, numDeletionVectorsAdded -> 1, numDeletionVectorsRemoved -> 1, numAddedChangeFiles -> 0, executionTimeMs -> 1319, numDeletionVectorsUpdated -> 1, numDeletedRows -> 1, scanTimeMs -> 817, numAddedFiles -> 0, numAddedBytes -> 0, rewriteTimeMs -> 502)",,Databricks-Runtime/15.4.x-scala2.12
5,2025-05-23T06:45:38Z,7363748412033156,labuser10342477_1747973406@vocareum.com,OPTIMIZE,"Map(predicate -> [], auto -> true, clusterBy -> [], zOrderBy -> [], batchId -> 0)",,List(296149350687239),0523-041054-ysw25i30,3.0,SnapshotIsolation,False,"Map(numRemovedFiles -> 3, numRemovedBytes -> 4092, p25FileSize -> 1475, numDeletionVectorsRemoved -> 1, conflictDetectionTimeMs -> 95, minFileSize -> 1475, numAddedFiles -> 1, maxFileSize -> 1475, p75FileSize -> 1475, p50FileSize -> 1475, numAddedBytes -> 1475)",,Databricks-Runtime/15.4.x-scala2.12
4,2025-05-23T06:45:37Z,7363748412033156,labuser10342477_1747973406@vocareum.com,UPDATE,"Map(predicate -> [""(name#54633 = pinto)""])",,List(296149350687239),0523-041054-ysw25i30,3.0,WriteSerializable,False,"Map(numRemovedFiles -> 0, numRemovedBytes -> 0, numCopiedRows -> 0, numDeletionVectorsAdded -> 1, numDeletionVectorsRemoved -> 0, numAddedChangeFiles -> 0, executionTimeMs -> 2269, numDeletionVectorsUpdated -> 0, scanTimeMs -> 648, numAddedFiles -> 1, numUpdatedRows -> 1, numAddedBytes -> 1335, rewriteTimeMs -> 1617)",,Databricks-Runtime/15.4.x-scala2.12
3,2025-05-23T06:45:34Z,7363748412033156,labuser10342477_1747973406@vocareum.com,UPDATE,"Map(predicate -> [""(name#53589 = jelly)""])",,List(296149350687239),0523-041054-ysw25i30,2.0,WriteSerializable,False,"Map(numRemovedFiles -> 0, numRemovedBytes -> 0, numCopiedRows -> 0, numDeletionVectorsAdded -> 1, numDeletionVectorsRemoved -> 0, numAddedChangeFiles -> 0, executionTimeMs -> 2304, numDeletionVectorsUpdated -> 0, scanTimeMs -> 984, numAddedFiles -> 1, numUpdatedRows -> 1, numAddedBytes -> 1349, rewriteTimeMs -> 1312)",,Databricks-Runtime/15.4.x-scala2.12
2,2025-05-23T06:45:30Z,7363748412033156,labuser10342477_1747973406@vocareum.com,WRITE,"Map(mode -> Append, statsOnLoad -> false, partitionBy -> [])",,List(296149350687239),0523-041054-ysw25i30,1.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 3, numOutputBytes -> 1379)",,Databricks-Runtime/15.4.x-scala2.12
1,2025-05-23T06:45:28Z,7363748412033156,labuser10342477_1747973406@vocareum.com,WRITE,"Map(mode -> Append, statsOnLoad -> false, partitionBy -> [])",,List(296149350687239),0523-041054-ysw25i30,0.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 3, numOutputBytes -> 1364)",,Databricks-Runtime/15.4.x-scala2.12
0,2025-05-23T06:45:26Z,7363748412033156,labuser10342477_1747973406@vocareum.com,CREATE OR REPLACE TABLE,"Map(partitionBy -> [], clusterBy -> [], description -> null, isManaged -> true, properties -> {""delta.enableDeletionVectors"":""true""}, statsOnLoad -> false)",,List(296149350687239),0523-041054-ysw25i30,,WriteSerializable,True,Map(),,Databricks-Runtime/15.4.x-scala2.12


If all the previous operations completed as described, you should see 9 versions of the table, including two automatic **`OPTIMIZE`** commits (**NOTE**: Delta Lake versioning starts with 0, so the max version number will be 8).

The operations should be as follows:

| version | operation |
| --- | --- |
| 0 | CREATE OR REPLACE TABLE |
| 1 | WRITE |
| 2 | WRITE |
| 3 | UPDATE |
| 4 | UPDATE |
| 5 | OPTIMIZE |
| 6 | DELETE |
| 7 | OPTIMIZE |
| 8 | MERGE |

The **`operationParameters`** column will let you review predicates used for updates, deletes, and merges. The **`operationMetrics`** column indicates how many rows and files are added, removed, or rewritten in each operation.

Spend some time reviewing the Delta Lake history to understand which table version matches with a given transaction.

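**NOTE**: The **`version`** column designates the state of a table once a given transaction completes. The **`readVersion`** column indicates the version of the table an operation executed against. In this simple demo (with no concurrent user transactions), **`version`** is usually **`readVersion`** + 1. The exceptions in the output above are versions 5 and 8, which read versions 3 and 6 respectively, apparently because an automatic **`OPTIMIZE`** overlapped with a neighboring commit.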
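
As a quick check of how **`version`** relates to **`readVersion`**, the sketch below lists the commits whose **`readVersion`** is not exactly one less than their **`version`**. The pairs are copied by hand from the history output shown above, and the function name `commits_not_reading_previous` is a hypothetical helper, not part of any Delta Lake API.

```python
# (version, readVersion) pairs copied from the DESCRIBE HISTORY output above.
# Version 0 has no readVersion because it created the table.
history = [
    (8, 6), (7, 6), (6, 5), (5, 3), (4, 3),
    (3, 2), (2, 1), (1, 0), (0, None),
]


def commits_not_reading_previous(pairs):
    """Return the versions whose readVersion is not version - 1."""
    return sorted(
        version
        for version, read_version in pairs
        if read_version is not None and read_version != version - 1
    )


print(commits_not_reading_previous(history))  # [5, 8]
```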

## Query a Specific Version

After reviewing the table history, you decide you want to view the state of your table after your very first data was inserted.

Run the query below to see this.

In [0]:
SELECT * 
FROM beans VERSION AS OF 1;

name,color,grams,delicious
black,black,500.0,True
lentils,brown,1000.0,True
jelly,rainbow,42.5,False


And now review the current state of your data.

In [0]:
SELECT * 
FROM beans;

name,color,grams,delicious
jelly,rainbow,42.5,True
lentils,brown,1000.0,True
green,green,178.3,True
pinto,brown,1500.0,True
lentils,green,500.0,True
black,black,560.5,True
kidney,red,387.2,True


You want to review the weights of your beans before you deleted any records.

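Fill in the statement below to register a temporary view of the table as it was just before data was deleted, then run the following cell to query the view. In the history above, version 4 (the last **`UPDATE`**) holds this state; version 5 only compacts files and returns the same rows.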

In [0]:
CREATE OR REPLACE TEMP VIEW pre_delete_vw AS
SELECT * FROM beans VERSION AS OF 4;

In [0]:
SELECT * 
FROM pre_delete_vw;

name,color,grams,delicious
green,green,178.3,True
beanbag chair,white,40000.0,False
black,black,500.0,True
lentils,brown,1000.0,True
jelly,rainbow,42.5,True
pinto,brown,1500.0,True


Run the cell below to check that you have captured the correct version.

In [0]:
%python
assert spark.catalog.tableExists("pre_delete_vw"), "Make sure you have registered the temporary view with the provided name `pre_delete_vw`"
assert spark.table("pre_delete_vw").count() == 6, "Make sure you're querying a version of the table with 6 records"
assert spark.table("pre_delete_vw").selectExpr("int(sum(grams))").first()[0] == 43220, "Make sure you query the version of the table after updates were applied"
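
The value 43220 in the last assertion follows from the six weights shown in the view output. A minimal sketch of the arithmetic in plain Python, assuming the weights are copied correctly from the output above (the variable names are illustrative):

```python
# Weights (grams) of the six rows returned by pre_delete_vw.
grams_after_updates = [178.3, 40000.0, 500.0, 1000.0, 42.5, 1500.0]

# int(sum(grams)) truncates the fractional part: 43220.8 becomes 43220.
total_after_updates = int(sum(grams_after_updates))
print(total_after_updates)  # 43220

# Before the pinto UPDATE, pinto still weighed 1.5 grams, so an earlier
# version with six rows gives a different total and fails the check.
grams_before_pinto_update = [178.3, 40000.0, 500.0, 1000.0, 42.5, 1.5]
total_before_pinto_update = int(sum(grams_before_pinto_update))
print(total_before_pinto_update)  # 41722
```

This is why the row count alone is not enough to identify the right version: several versions contain six rows, but only those after the pinto update sum to 43220.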

## Restore a Previous Version

Apparently there was a misunderstanding; the beans your friend gave you that you merged into your collection were not intended for you to keep.

Revert your table to the version before this **`MERGE`** statement completed.

In [0]:
RESTORE TABLE beans TO VERSION AS OF 6;

table_size_after_restore,num_of_files_after_restore,num_removed_files,num_restored_files,removed_files_size,restored_files_size
2810,2,4,2,5442,2810
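
The restore metrics can be cross-checked against the **`operationMetrics`** of versions 7 and 8 from the earlier history output. The sketch below is one plausible reading of those numbers, not a description of documented Delta Lake behavior: it assumes the files live at version 8 are the one file written by the automatic **`OPTIMIZE`** plus the three files added by the **`MERGE`**, and that the restored files are the two that the **`OPTIMIZE`** had removed.

```python
# Figures copied by hand from the operationMetrics shown in this notebook.
optimize_v7 = {"added_files": 1, "added_bytes": 1432,
               "removed_files": 2, "removed_bytes": 2810}
merge_v8 = {"added_files": 3, "added_bytes": 4010, "removed_files": 0}
restore_v9 = {"num_removed_files": 4, "removed_files_size": 5442,
              "num_restored_files": 2, "restored_files_size": 2810}

# Assumed set of live files at version 8: OPTIMIZE output plus MERGE output.
live_files_v8 = optimize_v7["added_files"] + merge_v8["added_files"]
live_bytes_v8 = optimize_v7["added_bytes"] + merge_v8["added_bytes"]
print(live_files_v8, live_bytes_v8)  # 4 5442

# RESTORE drops the version 8 files and brings back the version 6 files.
removed_matches = (
    live_files_v8 == restore_v9["num_removed_files"]
    and live_bytes_v8 == restore_v9["removed_files_size"]
)
restored_matches = (
    optimize_v7["removed_files"] == restore_v9["num_restored_files"]
    and optimize_v7["removed_bytes"] == restore_v9["restored_files_size"]
)
print(removed_matches, restored_matches)  # True True
```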


Review the history of your table. Make note of the fact that restoring to a previous version adds another table version.

In [0]:
DESCRIBE HISTORY beans;

version,timestamp,userId,userName,operation,operationParameters,job,notebook,clusterId,readVersion,isolationLevel,isBlindAppend,operationMetrics,userMetadata,engineInfo
9,2025-05-23T06:52:54Z,7363748412033156,labuser10342477_1747973406@vocareum.com,RESTORE,"Map(version -> 6, timestamp -> null)",,List(296149350687239),0523-041054-ysw25i30,8.0,Serializable,False,"Map(numRestoredFiles -> 2, removedFilesSize -> 5442, numRemovedFiles -> 4, restoredFilesSize -> 2810, numOfFilesAfterRestore -> 2, tableSizeAfterRestore -> 2810)",,Databricks-Runtime/15.4.x-scala2.12
8,2025-05-23T06:45:47Z,7363748412033156,labuser10342477_1747973406@vocareum.com,MERGE,"Map(predicate -> [""((name#57349 = name#57337) AND (color#57350 = color#57338))""], matchedPredicates -> [{""actionType"":""update""}], statsOnLoad -> false, notMatchedBySourcePredicates -> [], notMatchedPredicates -> [{""predicate"":""delicious#57340: boolean"",""actionType"":""insert""}])",,List(296149350687239),0523-041054-ysw25i30,6.0,WriteSerializable,False,"Map(numTargetRowsCopied -> 0, numTargetRowsDeleted -> 0, numTargetFilesAdded -> 3, numTargetBytesAdded -> 4010, numTargetBytesRemoved -> 0, numTargetDeletionVectorsAdded -> 1, numTargetRowsMatchedUpdated -> 1, executionTimeMs -> 4109, materializeSourceTimeMs -> 156, numTargetRowsInserted -> 2, conflictDetectionTimeMs -> 142, numTargetRowsMatchedDeleted -> 0, numTargetDeletionVectorsUpdated -> 1, scanTimeMs -> 1664, numTargetRowsUpdated -> 1, numOutputRows -> 3, numTargetDeletionVectorsRemoved -> 1, numTargetRowsNotMatchedBySourceUpdated -> 0, numTargetChangeFilesAdded -> 0, numSourceRows -> 4, numTargetFilesRemoved -> 0, numTargetRowsNotMatchedBySourceDeleted -> 0, rewriteTimeMs -> 2244)",,Databricks-Runtime/15.4.x-scala2.12
7,2025-05-23T06:45:46Z,7363748412033156,labuser10342477_1747973406@vocareum.com,OPTIMIZE,"Map(predicate -> [], auto -> true, clusterBy -> [], zOrderBy -> [], batchId -> 0)",,List(296149350687239),0523-041054-ysw25i30,6.0,SnapshotIsolation,False,"Map(numRemovedFiles -> 2, numRemovedBytes -> 2810, p25FileSize -> 1432, numDeletionVectorsRemoved -> 1, minFileSize -> 1432, numAddedFiles -> 1, maxFileSize -> 1432, p75FileSize -> 1432, p50FileSize -> 1432, numAddedBytes -> 1432)",,Databricks-Runtime/15.4.x-scala2.12
6,2025-05-23T06:45:41Z,7363748412033156,labuser10342477_1747973406@vocareum.com,DELETE,"Map(predicate -> [""NOT delicious#56021""])",,List(296149350687239),0523-041054-ysw25i30,5.0,WriteSerializable,False,"Map(numRemovedFiles -> 0, numRemovedBytes -> 0, numCopiedRows -> 0, numDeletionVectorsAdded -> 1, numDeletionVectorsRemoved -> 1, numAddedChangeFiles -> 0, executionTimeMs -> 1319, numDeletionVectorsUpdated -> 1, numDeletedRows -> 1, scanTimeMs -> 817, numAddedFiles -> 0, numAddedBytes -> 0, rewriteTimeMs -> 502)",,Databricks-Runtime/15.4.x-scala2.12
5,2025-05-23T06:45:38Z,7363748412033156,labuser10342477_1747973406@vocareum.com,OPTIMIZE,"Map(predicate -> [], auto -> true, clusterBy -> [], zOrderBy -> [], batchId -> 0)",,List(296149350687239),0523-041054-ysw25i30,3.0,SnapshotIsolation,False,"Map(numRemovedFiles -> 3, numRemovedBytes -> 4092, p25FileSize -> 1475, numDeletionVectorsRemoved -> 1, conflictDetectionTimeMs -> 95, minFileSize -> 1475, numAddedFiles -> 1, maxFileSize -> 1475, p75FileSize -> 1475, p50FileSize -> 1475, numAddedBytes -> 1475)",,Databricks-Runtime/15.4.x-scala2.12
4,2025-05-23T06:45:37Z,7363748412033156,labuser10342477_1747973406@vocareum.com,UPDATE,"Map(predicate -> [""(name#54633 = pinto)""])",,List(296149350687239),0523-041054-ysw25i30,3.0,WriteSerializable,False,"Map(numRemovedFiles -> 0, numRemovedBytes -> 0, numCopiedRows -> 0, numDeletionVectorsAdded -> 1, numDeletionVectorsRemoved -> 0, numAddedChangeFiles -> 0, executionTimeMs -> 2269, numDeletionVectorsUpdated -> 0, scanTimeMs -> 648, numAddedFiles -> 1, numUpdatedRows -> 1, numAddedBytes -> 1335, rewriteTimeMs -> 1617)",,Databricks-Runtime/15.4.x-scala2.12
3,2025-05-23T06:45:34Z,7363748412033156,labuser10342477_1747973406@vocareum.com,UPDATE,"Map(predicate -> [""(name#53589 = jelly)""])",,List(296149350687239),0523-041054-ysw25i30,2.0,WriteSerializable,False,"Map(numRemovedFiles -> 0, numRemovedBytes -> 0, numCopiedRows -> 0, numDeletionVectorsAdded -> 1, numDeletionVectorsRemoved -> 0, numAddedChangeFiles -> 0, executionTimeMs -> 2304, numDeletionVectorsUpdated -> 0, scanTimeMs -> 984, numAddedFiles -> 1, numUpdatedRows -> 1, numAddedBytes -> 1349, rewriteTimeMs -> 1312)",,Databricks-Runtime/15.4.x-scala2.12
2,2025-05-23T06:45:30Z,7363748412033156,labuser10342477_1747973406@vocareum.com,WRITE,"Map(mode -> Append, statsOnLoad -> false, partitionBy -> [])",,List(296149350687239),0523-041054-ysw25i30,1.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 3, numOutputBytes -> 1379)",,Databricks-Runtime/15.4.x-scala2.12
1,2025-05-23T06:45:28Z,7363748412033156,labuser10342477_1747973406@vocareum.com,WRITE,"Map(mode -> Append, statsOnLoad -> false, partitionBy -> [])",,List(296149350687239),0523-041054-ysw25i30,0.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 3, numOutputBytes -> 1364)",,Databricks-Runtime/15.4.x-scala2.12
0,2025-05-23T06:45:26Z,7363748412033156,labuser10342477_1747973406@vocareum.com,CREATE OR REPLACE TABLE,"Map(partitionBy -> [], clusterBy -> [], description -> null, isManaged -> true, properties -> {""delta.enableDeletionVectors"":""true""}, statsOnLoad -> false)",,List(296149350687239),0523-041054-ysw25i30,,WriteSerializable,True,Map(),,Databricks-Runtime/15.4.x-scala2.12
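
The history shows that restoring adds version 9 instead of deleting versions 7 and 8. A toy model in plain Python can make this behavior concrete. The class `ToyVersionedTable` is purely illustrative and says nothing about how Delta Lake stores data; it only mimics the idea that every commit, including a restore, appends a new version.

```python
class ToyVersionedTable:
    """Append-only list of snapshots; a toy stand-in, not Delta Lake internals."""

    def __init__(self):
        self._snapshots = []

    def commit(self, rows):
        """Store a new snapshot and return its version number."""
        self._snapshots.append(list(rows))
        return len(self._snapshots) - 1

    def version_as_of(self, version):
        """Return a copy of the rows as they were at the given version."""
        return list(self._snapshots[version])

    def restore(self, version):
        """Restoring commits the old snapshot again as a new version."""
        return self.commit(self._snapshots[version])

    @property
    def latest_version(self):
        return len(self._snapshots) - 1


table = ToyVersionedTable()
table.commit([])                               # version 0: empty table
table.commit(["black", "lentils"])             # version 1
table.commit(["black", "lentils", "kidney"])   # version 2

restored_version = table.restore(1)            # creates version 3
print(restored_version)                                   # 3
print(table.version_as_of(3) == table.version_as_of(1))   # True
print(table.version_as_of(2))  # ['black', 'lentils', 'kidney']
```

In this model the snapshot from version 2 is still readable after the restore, which mirrors the fact that versions 7 and 8 remain listed in the table history above.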


In [0]:
%python
last_tx = spark.conf.get("spark.databricks.delta.lastCommitVersionInSession")
assert spark.sql("DESCRIBE HISTORY beans").select("operation").first()[0] == "RESTORE", "Make sure you reverted your table with the `RESTORE` keyword"
assert spark.table("beans").count() == 5, "Make sure you reverted to the version after deleting records but before merging"

By completing this lab, you should now feel comfortable:
* Completing standard Delta Lake table creation and data manipulation commands
* Reviewing table metadata including table history
* Leveraging Delta Lake versioning for snapshot queries and rollbacks


&copy; 2025 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use">Terms of Use</a> | 
<a href="https://help.databricks.com/">Support</a>