
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

<i18n value="ce4b28fc-fbe2-47d3-976a-776345ac869b"/>


# Delta Lake Versioning, Optimization, and Vacuuming

This notebook provides a hands-on review of some of the more esoteric features Delta Lake brings to the data lakehouse.

## Learning Objectives
By the end of this lab, you should be able to:
- Review table history
- Query previous table versions and roll back a table to a specific version
- Perform file compaction and Z-order indexing
- Preview files marked for permanent deletion and commit these deletes

<i18n value="f75fd28d-aa78-4d58-b9b7-b8ea93a99b1b"/>


## Setup
Run the following script to set up the necessary variables and clear out past runs of this notebook. Note that re-executing this cell will allow you to start the lab over.

In [0]:
%run ../Includes/Classroom-Setup-02.4L

Python interpreter will be restarted.


Resetting the learning environment:
| dropping the schema "munirsheikhcloudseekho_0lj9_da_dewd"...(2 seconds)
| removing the working directory "dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks"...(0 seconds)

Skipping install of existing datasets to "dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02"

Validating the locally installed datasets:
| listing local files...(7 seconds)
| completed (7 seconds total)

Creating & using the schema "munirsheikhcloudseekho_0lj9_da_dewd"...(1 seconds)
Predefined tables in "munirsheikhcloudseekho_0lj9_da_dewd":
| -none-

Predefined paths variables:
| DA.paths.working_dir: dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks
| DA.paths.user_db:     dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/database.db
| DA.paths.datasets:    dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02
| DA.paths.check

<i18n value="ea2fae13-227c-4c03-8617-87e06826526e"/>


## Recreate the History of your Bean Collection

This lab picks up where the last lab left off. The cell below condenses all the operations from the last lab into a single cell (other than the final **`DROP TABLE`** statement).

For quick reference, the schema of the **`beans`** table created is:

| Field Name | Field Type |
| --- | --- |
| name | STRING |
| color | STRING |
| grams | FLOAT |
| delicious | BOOLEAN |

In [0]:
%sql
CREATE TABLE beans 
(name STRING, color STRING, grams FLOAT, delicious BOOLEAN);

INSERT INTO beans VALUES
("black", "black", 500, true),
("lentils", "brown", 1000, true),
("jelly", "rainbow", 42.5, false);

INSERT INTO beans VALUES
('pinto', 'brown', 1.5, true),
('green', 'green', 178.3, true),
('beanbag chair', 'white', 40000, false);

UPDATE beans
SET delicious = true
WHERE name = "jelly";

UPDATE beans
SET grams = 1500
WHERE name = 'pinto';

DELETE FROM beans
WHERE delicious = false;

CREATE OR REPLACE TEMP VIEW new_beans(name, color, grams, delicious) AS VALUES
('black', 'black', 60.5, true),
('lentils', 'green', 500, true),
('kidney', 'red', 387.2, true),
('castor', 'brown', 25, false);

MERGE INTO beans a
USING new_beans b
ON a.name=b.name AND a.color = b.color
WHEN MATCHED THEN
  UPDATE SET grams = a.grams + b.grams
WHEN NOT MATCHED AND b.delicious = true THEN
  INSERT *;

num_affected_rows,num_updated_rows,num_deleted_rows,num_inserted_rows
3,1,0,2
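
To see why the **`MERGE`** reports 1 updated row and 2 inserted rows, the sketch below replays its rules on plain Python dictionaries. This is only a toy model, not how Delta Lake executes a merge: the `merge` helper and the dictionary layout are assumptions made for illustration, and the starting rows are the state of **`beans`** after the **`DELETE`** statement.

```python
# Toy model of the MERGE above using plain Python dicts (no Spark involved).
# Rows are keyed by (name, color), mirroring the ON clause of the statement.
beans = {
    ("black", "black"): {"grams": 500.0, "delicious": True},
    ("lentils", "brown"): {"grams": 1000.0, "delicious": True},
    ("jelly", "rainbow"): {"grams": 42.5, "delicious": True},
    ("pinto", "brown"): {"grams": 1500.0, "delicious": True},
    ("green", "green"): {"grams": 178.3, "delicious": True},
}
new_beans = {
    ("black", "black"): {"grams": 60.5, "delicious": True},
    ("lentils", "green"): {"grams": 500.0, "delicious": True},
    ("kidney", "red"): {"grams": 387.2, "delicious": True},
    ("castor", "brown"): {"grams": 25.0, "delicious": False},
}


def merge(target, source):
    """Apply the lab's MERGE rules to target in place; return (updated, inserted)."""
    updated = inserted = 0
    for key, src in source.items():
        if key in target:
            # WHEN MATCHED: add the incoming weight to the existing weight
            target[key]["grams"] += src["grams"]
            updated += 1
        elif src["delicious"]:
            # WHEN NOT MATCHED AND b.delicious = true: insert the new row
            target[key] = dict(src)
            inserted += 1
    return updated, inserted


num_updated, num_inserted = merge(beans, new_beans)
print(num_updated, num_inserted, len(beans))  # 1 2 7
```

Only the black beans match on both **`name`** and **`color`**; the green lentils do not match the brown lentils, so they are inserted as a new row, and the castor beans are skipped because they are not delicious.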


<i18n value="ec611b15-e52e-4bce-8a74-7d55e72d3189"/>


## Review the Table History

Delta Lake's transaction log stores information about each transaction that modifies a table's contents or settings.

Review the history of the **`beans`** table below.

In [0]:
%sql
-- ANSWER
DESCRIBE HISTORY beans

version,timestamp,userId,userName,operation,operationParameters,job,notebook,clusterId,readVersion,isolationLevel,isBlindAppend,operationMetrics,userMetadata,engineInfo
6,2022-11-13T04:29:01.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,MERGE,"Map(predicate -> ((a.name = b.name) AND (a.color = b.color)), matchedPredicates -> [{""actionType"":""update""}], notMatchedPredicates -> [{""predicate"":""b.delicious"",""actionType"":""insert""}])",,List(4094000743660116),1113-035301-4efipd3u,5.0,WriteSerializable,False,"Map(numTargetRowsCopied -> 2, numTargetRowsDeleted -> 0, numTargetFilesAdded -> 4, executionTimeMs -> 3757, numTargetRowsInserted -> 2, scanTimeMs -> 1810, numTargetRowsUpdated -> 1, numOutputRows -> 5, numTargetChangeFilesAdded -> 0, numSourceRows -> 4, numTargetFilesRemoved -> 1, rewriteTimeMs -> 1491)",,Databricks-Runtime/11.3.x-scala2.12
5,2022-11-13T04:28:55.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,DELETE,"Map(predicate -> [""(NOT spark_catalog.munirsheikhcloudseekho_0lj9_da_dewd.beans.delicious)""])",,List(4094000743660116),1113-035301-4efipd3u,4.0,WriteSerializable,False,"Map(numRemovedFiles -> 1, numCopiedRows -> 2, numAddedChangeFiles -> 0, executionTimeMs -> 1908, numDeletedRows -> 1, scanTimeMs -> 1064, numAddedFiles -> 1, rewriteTimeMs -> 844)",,Databricks-Runtime/11.3.x-scala2.12
4,2022-11-13T04:28:50.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,UPDATE,Map(predicate -> (name#24874 = pinto)),,List(4094000743660116),1113-035301-4efipd3u,3.0,WriteSerializable,False,"Map(numRemovedFiles -> 1, numCopiedRows -> 2, numAddedChangeFiles -> 0, executionTimeMs -> 1709, scanTimeMs -> 618, numAddedFiles -> 1, numUpdatedRows -> 1, rewriteTimeMs -> 1091)",,Databricks-Runtime/11.3.x-scala2.12
3,2022-11-13T04:28:46.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,UPDATE,Map(predicate -> (name#24322 = jelly)),,List(4094000743660116),1113-035301-4efipd3u,2.0,WriteSerializable,False,"Map(numRemovedFiles -> 1, numCopiedRows -> 2, numAddedChangeFiles -> 0, executionTimeMs -> 1858, scanTimeMs -> 1020, numAddedFiles -> 1, numUpdatedRows -> 1, rewriteTimeMs -> 838)",,Databricks-Runtime/11.3.x-scala2.12
2,2022-11-13T04:28:42.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,WRITE,"Map(mode -> Append, partitionBy -> [])",,List(4094000743660116),1113-035301-4efipd3u,1.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 3, numOutputBytes -> 1328)",,Databricks-Runtime/11.3.x-scala2.12
1,2022-11-13T04:28:39.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,WRITE,"Map(mode -> Append, partitionBy -> [])",,List(4094000743660116),1113-035301-4efipd3u,0.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 3, numOutputBytes -> 1313)",,Databricks-Runtime/11.3.x-scala2.12
0,2022-11-13T04:28:35.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,CREATE TABLE,"Map(isManaged -> true, description -> null, partitionBy -> [], properties -> {})",,List(4094000743660116),1113-035301-4efipd3u,,WriteSerializable,True,Map(),,Databricks-Runtime/11.3.x-scala2.12


<i18n value="6c5aaad5-d6ac-4a46-943f-81720d7d1d92"/>


If all the previous operations were completed as described, you should see 7 versions of the table (**NOTE**: Delta Lake versioning starts with 0, so the max version number will be 6).

The operations should be as follows:

| version | operation |
| --- | --- |
| 0 | CREATE TABLE |
| 1 | WRITE |
| 2 | WRITE |
| 3 | UPDATE |
| 4 | UPDATE |
| 5 | DELETE |
| 6 | MERGE |

The **`operationParameters`** column lets you review the predicates used for updates, deletes, and merges. The **`operationMetrics`** column reports how many rows and files each operation added or removed.

Spend some time reviewing the Delta Lake history to understand which table version matches with a given transaction.

**NOTE**: The **`version`** column designates the state of a table once a given transaction completes. The **`readVersion`** column indicates the version of the table an operation executed against. In this simple demo (with no concurrent transactions), **`version`** should always equal **`readVersion`** + 1.
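
The relationship between **`version`** and **`readVersion`** can be checked by hand. The sketch below copies the two columns from the history output above into a plain Python list; the `reads_previous_version` helper is a hypothetical name used only for this illustration.

```python
# (version, readVersion, operation) copied from the DESCRIBE HISTORY output above.
# CREATE TABLE has no readVersion because there was no earlier table state to read.
history = [
    (6, 5, "MERGE"),
    (5, 4, "DELETE"),
    (4, 3, "UPDATE"),
    (3, 2, "UPDATE"),
    (2, 1, "WRITE"),
    (1, 0, "WRITE"),
    (0, None, "CREATE TABLE"),
]


def reads_previous_version(rows):
    """True if every commit that read the table read the immediately preceding version."""
    return all(
        version == read_version + 1
        for version, read_version, _ in rows
        if read_version is not None
    )


print(reads_previous_version(history))  # True
```

With concurrent writers, a commit may read an older version than the one immediately before it, in which case this check would return `False`.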

<i18n value="4cb66440-1d20-4f76-8110-6f872dc59800"/>


## Query a Specific Version

After reviewing the table history, you decide you want to view the state of your table after your very first data was inserted.

Run the query below to see this.

In [0]:
%sql
SELECT * FROM beans VERSION AS OF 1

name,color,grams,delicious
black,black,500.0,True
lentils,brown,1000.0,True
jelly,rainbow,42.5,False
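
Conceptually, a version query works because each commit in the transaction log records which data files were added and which were removed, so the table at version N is whatever remains after replaying commits 0 through N. The sketch below is a simplified model of that idea, not the Delta Lake implementation: the `snapshot` helper and the file names are hypothetical.

```python
# Each commit is a list of (action, file) pairs. File names are hypothetical.
log = [
    [],                                   # version 0: CREATE TABLE, no data files yet
    [("add", "part-a.parquet")],          # version 1: first INSERT
    [("add", "part-b.parquet")],          # version 2: second INSERT
    [("remove", "part-a.parquet"),        # version 3: UPDATE rewrites the affected file
     ("add", "part-c.parquet")],
]


def snapshot(log, version):
    """Return the set of data files that make up the table at `version`."""
    files = set()
    for commit in log[: version + 1]:
        for action, path in commit:
            if action == "add":
                files.add(path)
            else:
                files.discard(path)
    return files


print(sorted(snapshot(log, 1)))  # ['part-a.parquet']
print(sorted(snapshot(log, 3)))  # ['part-b.parquet', 'part-c.parquet']
```

Note that `part-a.parquet` is no longer part of version 3, but the file itself still has to exist on storage for version 1 to remain queryable.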


<i18n value="3043618d-abb4-46db-9b13-bd1c4a02d235"/>


And now review the current state of your data.

In [0]:
%sql
SELECT * FROM beans

name,color,grams,delicious
lentils,green,500.0,True
lentils,brown,1000.0,True
jelly,rainbow,42.5,True
black,black,560.5,True
pinto,brown,1500.0,True
green,green,178.3,True
kidney,red,387.2,True


<i18n value="91947cec-f2ff-4590-9bdb-d996fa93cd04"/>


You want to review the weights of your beans before you deleted any records.

Fill in the statement below to register a temporary view of the version just before data was deleted, then run the following cell to query the view.

In [0]:
%sql
-- ANSWER
CREATE OR REPLACE TEMP VIEW pre_delete_vw AS
  SELECT * FROM beans VERSION AS OF 4;

In [0]:
%sql
SELECT * FROM pre_delete_vw

name,color,grams,delicious
pinto,brown,1500.0,True
green,green,178.3,True
beanbag chair,white,40000.0,False
black,black,500.0,True
lentils,brown,1000.0,True
jelly,rainbow,42.5,True


<i18n value="b10dccdf-cf1e-43fe-bed0-1da2166f0884"/>


Run the cell below to check that you have captured the correct version.

In [0]:
%python
assert spark.table("pre_delete_vw"), "Make sure you have registered the temporary view with the provided name `pre_delete_vw`"
assert spark.table("pre_delete_vw").count() == 6, "Make sure you're querying a version of the table with 6 records"
assert spark.table("pre_delete_vw").selectExpr("int(sum(grams))").first()[0] == 43220, "Make sure you query the version of the table after updates were applied"
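
The sum check matters because the row count alone cannot tell version 3 from version 4: both contain 6 records. The arithmetic below is a small sketch using the **`grams`** values implied by the statements earlier in this lab; it uses Python floats rather than Spark's **`FLOAT`** type, which is an assumption that does not change the truncated totals.

```python
# grams per bean: black, lentils, jelly, pinto, green, beanbag chair
version_3 = [500.0, 1000.0, 42.5, 1.5, 178.3, 40000.0]     # before pinto was updated
version_4 = [500.0, 1000.0, 42.5, 1500.0, 178.3, 40000.0]  # after pinto was set to 1500

# int() truncates the fractional part, like int(sum(grams)) in the check above
print(len(version_3), int(sum(version_3)))  # 6 41722
print(len(version_4), int(sum(version_4)))  # 6 43220
```

Only version 4 produces the expected total of 43220.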

<i18n value="bcedb128-6a39-46a6-b418-c889a2587751"/>


## Restore a Previous Version

Apparently there was a misunderstanding; the beans your friend gave you that you merged into your collection were not intended for you to keep.

Revert your table to the version before this **`MERGE`** statement completed.

In [0]:
%sql
-- ANSWER
RESTORE TABLE beans TO VERSION AS OF 5

table_size_after_restore,num_of_files_after_restore,num_removed_files,num_restored_files,removed_files_size,restored_files_size
2590,2,4,1,5147,1313


<i18n value="b0ca1fc8-da6f-444e-9105-f0d6bc7893d9"/>


Review the history of your table. Make note of the fact that restoring to a previous version adds another table version.

In [0]:
%sql
DESCRIBE HISTORY beans

version,timestamp,userId,userName,operation,operationParameters,job,notebook,clusterId,readVersion,isolationLevel,isBlindAppend,operationMetrics,userMetadata,engineInfo
7,2022-11-13T04:31:56.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,RESTORE,"Map(version -> 5, timestamp -> null)",,List(4094000743660116),1113-035301-4efipd3u,6.0,Serializable,False,"Map(numRestoredFiles -> 1, removedFilesSize -> 5147, numRemovedFiles -> 4, restoredFilesSize -> 1313, numOfFilesAfterRestore -> 2, tableSizeAfterRestore -> 2590)",,Databricks-Runtime/11.3.x-scala2.12
6,2022-11-13T04:29:01.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,MERGE,"Map(predicate -> ((a.name = b.name) AND (a.color = b.color)), matchedPredicates -> [{""actionType"":""update""}], notMatchedPredicates -> [{""predicate"":""b.delicious"",""actionType"":""insert""}])",,List(4094000743660116),1113-035301-4efipd3u,5.0,WriteSerializable,False,"Map(numTargetRowsCopied -> 2, numTargetRowsDeleted -> 0, numTargetFilesAdded -> 4, executionTimeMs -> 3757, numTargetRowsInserted -> 2, scanTimeMs -> 1810, numTargetRowsUpdated -> 1, numOutputRows -> 5, numTargetChangeFilesAdded -> 0, numSourceRows -> 4, numTargetFilesRemoved -> 1, rewriteTimeMs -> 1491)",,Databricks-Runtime/11.3.x-scala2.12
5,2022-11-13T04:28:55.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,DELETE,"Map(predicate -> [""(NOT spark_catalog.munirsheikhcloudseekho_0lj9_da_dewd.beans.delicious)""])",,List(4094000743660116),1113-035301-4efipd3u,4.0,WriteSerializable,False,"Map(numRemovedFiles -> 1, numCopiedRows -> 2, numAddedChangeFiles -> 0, executionTimeMs -> 1908, numDeletedRows -> 1, scanTimeMs -> 1064, numAddedFiles -> 1, rewriteTimeMs -> 844)",,Databricks-Runtime/11.3.x-scala2.12
4,2022-11-13T04:28:50.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,UPDATE,Map(predicate -> (name#24874 = pinto)),,List(4094000743660116),1113-035301-4efipd3u,3.0,WriteSerializable,False,"Map(numRemovedFiles -> 1, numCopiedRows -> 2, numAddedChangeFiles -> 0, executionTimeMs -> 1709, scanTimeMs -> 618, numAddedFiles -> 1, numUpdatedRows -> 1, rewriteTimeMs -> 1091)",,Databricks-Runtime/11.3.x-scala2.12
3,2022-11-13T04:28:46.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,UPDATE,Map(predicate -> (name#24322 = jelly)),,List(4094000743660116),1113-035301-4efipd3u,2.0,WriteSerializable,False,"Map(numRemovedFiles -> 1, numCopiedRows -> 2, numAddedChangeFiles -> 0, executionTimeMs -> 1858, scanTimeMs -> 1020, numAddedFiles -> 1, numUpdatedRows -> 1, rewriteTimeMs -> 838)",,Databricks-Runtime/11.3.x-scala2.12
2,2022-11-13T04:28:42.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,WRITE,"Map(mode -> Append, partitionBy -> [])",,List(4094000743660116),1113-035301-4efipd3u,1.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 3, numOutputBytes -> 1328)",,Databricks-Runtime/11.3.x-scala2.12
1,2022-11-13T04:28:39.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,WRITE,"Map(mode -> Append, partitionBy -> [])",,List(4094000743660116),1113-035301-4efipd3u,0.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 3, numOutputBytes -> 1313)",,Databricks-Runtime/11.3.x-scala2.12
0,2022-11-13T04:28:35.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,CREATE TABLE,"Map(isManaged -> true, description -> null, partitionBy -> [], properties -> {})",,List(4094000743660116),1113-035301-4efipd3u,,WriteSerializable,True,Map(),,Databricks-Runtime/11.3.x-scala2.12


In [0]:
%python
last_tx = spark.conf.get("spark.databricks.delta.lastCommitVersionInSession")
assert spark.sql("DESCRIBE HISTORY beans").select("operation").first()[0] == "RESTORE", "Make sure you reverted your table with the `RESTORE` keyword"
assert spark.table("beans").count() == 5, "Make sure you reverted to the version after deleting records but before merging"
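
A restore does not erase history; it appends a new commit that makes the current set of data files match the target version. The sketch below models this with a toy transaction log. The `snapshot` and `restore` helpers, the file names, and the toy version numbers (0 stands in for the table before the merge, 1 for the table after it) are all assumptions made for illustration.

```python
def snapshot(log, version):
    """Replay commits 0..version and return the resulting set of data files."""
    files = set()
    for commit in log[: version + 1]:
        for action, path in commit:
            if action == "add":
                files.add(path)
            else:
                files.discard(path)
    return files


def restore(log, target_version):
    """Append a commit that brings the table back to `target_version`."""
    current = snapshot(log, len(log) - 1)
    target = snapshot(log, target_version)
    commit = [("remove", path) for path in sorted(current - target)]
    commit += [("add", path) for path in sorted(target - current)]
    log.append(commit)
    return len(log) - 1  # the version number created by the restore


log = [
    # toy version 0: two data files
    [("add", "f1.parquet"), ("add", "f2.parquet")],
    # toy version 1: a merge removes one file and adds four
    [("remove", "f1.parquet")] + [("add", f"m{i}.parquet") for i in range(1, 5)],
]

new_version = restore(log, 0)
removed = [path for action, path in log[new_version] if action == "remove"]
restored = [path for action, path in log[new_version] if action == "add"]

print(new_version)                                   # 2
print(len(removed), len(restored))                   # 4 1
print(snapshot(log, new_version) == snapshot(log, 0))  # True
```

The counts mirror the **`num_removed_files`** (4) and **`num_restored_files`** (1) values reported by the **`RESTORE`** command above, and the earlier toy versions remain in the log.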

<i18n value="e16c9c00-1ac7-444e-9f99-6ceccb7795d3"/>


## File Compaction
Looking at the transaction metrics during your reversion, you are surprised you have so many files for such a small collection of data.

While indexing on a table of this size is unlikely to improve performance, you decide to add a Z-order index on the **`name`** field in anticipation of your bean collection growing exponentially over time.

Use the cell below to perform file compaction and Z-order indexing.

In [0]:
%sql
-- ANSWER
OPTIMIZE beans
ZORDER BY name

path,metrics
dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/database.db/beans,"List(1, 2, List(1380, 1380, 1380.0, 1, 1380), List(1277, 1313, 1295.0, 2, 2590), 0, List(minCubeSize(107374182400), List(0, 0), List(2, 2590), 0, List(2, 2590), 1, null), 1, 2, 0, false, 0, 0, 1668314083863, 1668314091449, 8, 1, null)"
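
As a rough mental model, compaction rewrites the rows from many small files into fewer, larger files, and with a single Z-order column the rewritten rows end up clustered on that column. The sketch below illustrates the idea with plain Python lists. It is not the algorithm Databricks uses: the `compact` helper, the file names, and the way the rows are split across the two input files are hypothetical.

```python
# Rows are (name, color, grams). The split across two files is hypothetical.
small_files = {
    "part-0001.parquet": [("pinto", "brown", 1500.0), ("green", "green", 178.3)],
    "part-0002.parquet": [
        ("black", "black", 500.0),
        ("lentils", "brown", 1000.0),
        ("jelly", "rainbow", 42.5),
    ],
}


def compact(files, sort_index=0, new_name="part-compacted.parquet"):
    """Merge every row into a single file, clustered on the chosen column."""
    rows = [row for part in files.values() for row in part]
    rows.sort(key=lambda row: row[sort_index])
    removed = sorted(files)  # the old files are no longer part of the current version
    return {new_name: rows}, removed


compacted, removed = compact(small_files)
print(len(compacted), len(removed))  # 1 2
print([row[0] for row in compacted["part-compacted.parquet"]])
# ['black', 'green', 'jelly', 'lentils', 'pinto']
```

The old files are only dropped from the current version of the table; they stay on storage so that earlier versions remain queryable.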


<i18n value="f97af267-9d81-4035-803b-2d54e5f037af"/>


Your data should have been compacted to a single file; confirm this manually by running the following cell.

In [0]:
%sql
DESCRIBE DETAIL beans

format,id,name,description,location,createdAt,lastModified,partitionColumns,numFiles,sizeInBytes,properties,minReaderVersion,minWriterVersion
delta,c9bbe766-39a3-47ba-bf6d-694f74741c8f,spark_catalog.munirsheikhcloudseekho_0lj9_da_dewd.beans,,dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/database.db/beans,2022-11-13T04:28:33.946+0000,2022-11-13T04:34:48.000+0000,List(),1,1380,Map(),1,2


<i18n value="4510898e-045e-493b-8882-26d1366219ff"/>


Run the cell below to check that you've successfully optimized and indexed your table.

In [0]:
%python
last_tx = spark.sql("DESCRIBE HISTORY beans").first()
assert last_tx["operation"] == "OPTIMIZE", "Make sure you used the `OPTIMIZE` command to perform file compaction"
assert last_tx["operationParameters"]["zOrderBy"] == '["name"]', "Use `ZORDER BY name` with your optimize command to index your table"

<i18n value="2704d55d-c54a-4e44-baf8-6bf186363870"/>


## Cleaning Up Stale Data Files

You know that while all your data now resides in 1 data file, the data files from previous versions of your table are still being stored alongside this. You wish to remove these files and remove access to previous versions of the table by running **`VACUUM`** on the table.

Executing **`VACUUM`** performs garbage cleanup on the table directory. By default, a retention threshold of 7 days will be enforced.

The cell below modifies some Spark configurations. The first command overrides the retention threshold check to allow us to demonstrate permanent removal of data. 

**NOTE**: Vacuuming a production table with a short retention can lead to data corruption and/or failure of long-running queries. This is for demonstration purposes only and extreme caution should be used when disabling this setting.

The second command sets **`spark.databricks.delta.vacuum.logging.enabled`** to **`true`** to ensure that the **`VACUUM`** operation is recorded in the transaction log.

**NOTE**: Because of slight differences in storage protocols on various clouds, logging **`VACUUM`** commands is not on by default for some clouds as of DBR 9.1.

In [0]:
%sql
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
SET spark.databricks.delta.vacuum.logging.enabled = true;

key,value
spark.databricks.delta.vacuum.logging.enabled,True
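
The retention threshold decides which stale files are old enough to delete. The sketch below is a simplified model of that rule, assuming a file is eligible when it is not part of the current table version and its age exceeds the threshold. The `vacuum_candidates` helper, the file names, and the timestamps are hypothetical.

```python
DAY_MS = 24 * 60 * 60 * 1000
DEFAULT_RETENTION_MS = 7 * DAY_MS  # the 7-day default expressed in milliseconds
print(DEFAULT_RETENTION_MS)  # 604800000


def vacuum_candidates(all_files, live_files, now_ms, retention_ms=DEFAULT_RETENTION_MS):
    """Files outside the current version whose age exceeds the retention threshold."""
    return sorted(
        path
        for path, modified_ms in all_files.items()
        if path not in live_files and now_ms - modified_ms > retention_ms
    )


now_ms = 10 * DAY_MS
all_files = {
    "old-stale.parquet": 1 * DAY_MS,  # 9 days old, not in the current version
    "new-stale.parquet": 9 * DAY_MS,  # 1 day old, not in the current version
    "live.parquet": 1 * DAY_MS,       # 9 days old, but still in the current version
}
live_files = {"live.parquet"}

print(vacuum_candidates(all_files, live_files, now_ms))
# ['old-stale.parquet']
print(vacuum_candidates(all_files, live_files, now_ms, retention_ms=0))
# ['new-stale.parquet', 'old-stale.parquet']
```

With the default threshold, recently replaced files survive, which protects time travel and long-running queries; a threshold of 0 makes every stale file eligible, which is why the retention check has to be disabled explicitly before running **`VACUUM`** with **`RETAIN 0 HOURS`**.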


<i18n value="b4aa9f86-b65a-4b58-a303-01ce01c1dda9"/>


Before permanently deleting data files, review them manually using the **`DRY RUN`** option.

In [0]:
%sql
VACUUM beans RETAIN 0 HOURS DRY RUN

path
dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/database.db/beans/part-00001-f0031154-6434-4877-8f67-8cd813d052f8-c000.snappy.parquet
dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/database.db/beans/part-00000-6700694d-2a2c-43a3-8c96-55200bd394a2-c000.snappy.parquet
dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/database.db/beans/part-00000-74126c31-70b3-45e3-99c6-2b6b51b74bf6-c000.snappy.parquet
dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/database.db/beans/part-00000-4855e36c-fd70-4627-85b0-7918b9ba8c3c-c000.snappy.parquet
dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/database.db/beans/part-00000-5496e21b-26a8-4f92-a73d-87f9b47ba048-c000.snappy.parquet
dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/database.db/beans/part-00000-3899c2eb-b7f7-4d6c-a035-0274ce8c418b-c000.snappy.parquet
dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/database.db/beans/part-00000-9e0e2072-d1b2-4639-88be-fc0de89e2ef6-c000.snappy.parquet
dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/database.db/beans/part-00000-9331e10a-c95b-4267-a970-b1e4c17f326e-c000.snappy.parquet
dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/database.db/beans/part-00002-b835f4e0-9017-491f-8fee-5b1ec4e94405-c000.snappy.parquet


<i18n value="d9ebfa03-c7b2-4eba-8e25-71b41a78965d"/>


All data files not in the current version of the table will be shown in the preview above.

Run the command again without **`DRY RUN`** to permanently delete these files.

**NOTE**: All previous versions of the table will no longer be accessible.

In [0]:
%sql
VACUUM beans RETAIN 0 HOURS

path
dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/database.db/beans


<i18n value="21bb3d2d-5c7b-4e49-ad16-b27eeecbd915"/>


Because **`VACUUM`** can be such a destructive act for important datasets, it's always a good idea to turn the retention duration check back on. Run the cell below to reactivate this setting.

In [0]:
%sql
SET spark.databricks.delta.retentionDurationCheck.enabled = true

key,value
spark.databricks.delta.retentionDurationCheck.enabled,True


<i18n value="fdd81ce0-d88a-4cf4-9fe3-6bfdd2319a9b"/>


Note that the table history indicates the user who completed the **`VACUUM`** operation and the number of files deleted, and logs that the retention check was disabled during this operation.

In [0]:
%sql
DESCRIBE HISTORY beans

version,timestamp,userId,userName,operation,operationParameters,job,notebook,clusterId,readVersion,isolationLevel,isBlindAppend,operationMetrics,userMetadata,engineInfo
10,2022-11-13T04:39:12.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,VACUUM END,Map(status -> COMPLETED),,List(4094000743660116),1113-035301-4efipd3u,9.0,SnapshotIsolation,True,"Map(numDeletedFiles -> 9, numVacuumedDirectories -> 1)",,Databricks-Runtime/11.3.x-scala2.12
9,2022-11-13T04:39:01.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,VACUUM START,"Map(retentionCheckEnabled -> false, specifiedRetentionMillis -> 0, defaultRetentionMillis -> 604800000)",,List(4094000743660116),1113-035301-4efipd3u,8.0,SnapshotIsolation,True,Map(numFilesToDelete -> 9),,Databricks-Runtime/11.3.x-scala2.12
8,2022-11-13T04:34:48.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,OPTIMIZE,"Map(predicate -> [], zOrderBy -> [""name""], batchId -> 0, auto -> false)",,List(4094000743660116),1113-035301-4efipd3u,7.0,SnapshotIsolation,False,"Map(numRemovedFiles -> 2, numRemovedBytes -> 2590, p25FileSize -> 1380, minFileSize -> 1380, numAddedFiles -> 1, maxFileSize -> 1380, p75FileSize -> 1380, p50FileSize -> 1380, numAddedBytes -> 1380)",,Databricks-Runtime/11.3.x-scala2.12
7,2022-11-13T04:31:56.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,RESTORE,"Map(version -> 5, timestamp -> null)",,List(4094000743660116),1113-035301-4efipd3u,6.0,Serializable,False,"Map(numRestoredFiles -> 1, removedFilesSize -> 5147, numRemovedFiles -> 4, restoredFilesSize -> 1313, numOfFilesAfterRestore -> 2, tableSizeAfterRestore -> 2590)",,Databricks-Runtime/11.3.x-scala2.12
6,2022-11-13T04:29:01.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,MERGE,"Map(predicate -> ((a.name = b.name) AND (a.color = b.color)), matchedPredicates -> [{""actionType"":""update""}], notMatchedPredicates -> [{""predicate"":""b.delicious"",""actionType"":""insert""}])",,List(4094000743660116),1113-035301-4efipd3u,5.0,WriteSerializable,False,"Map(numTargetRowsCopied -> 2, numTargetRowsDeleted -> 0, numTargetFilesAdded -> 4, executionTimeMs -> 3757, numTargetRowsInserted -> 2, scanTimeMs -> 1810, numTargetRowsUpdated -> 1, numOutputRows -> 5, numTargetChangeFilesAdded -> 0, numSourceRows -> 4, numTargetFilesRemoved -> 1, rewriteTimeMs -> 1491)",,Databricks-Runtime/11.3.x-scala2.12
5,2022-11-13T04:28:55.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,DELETE,"Map(predicate -> [""(NOT spark_catalog.munirsheikhcloudseekho_0lj9_da_dewd.beans.delicious)""])",,List(4094000743660116),1113-035301-4efipd3u,4.0,WriteSerializable,False,"Map(numRemovedFiles -> 1, numCopiedRows -> 2, numAddedChangeFiles -> 0, executionTimeMs -> 1908, numDeletedRows -> 1, scanTimeMs -> 1064, numAddedFiles -> 1, rewriteTimeMs -> 844)",,Databricks-Runtime/11.3.x-scala2.12
4,2022-11-13T04:28:50.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,UPDATE,Map(predicate -> (name#24874 = pinto)),,List(4094000743660116),1113-035301-4efipd3u,3.0,WriteSerializable,False,"Map(numRemovedFiles -> 1, numCopiedRows -> 2, numAddedChangeFiles -> 0, executionTimeMs -> 1709, scanTimeMs -> 618, numAddedFiles -> 1, numUpdatedRows -> 1, rewriteTimeMs -> 1091)",,Databricks-Runtime/11.3.x-scala2.12
3,2022-11-13T04:28:46.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,UPDATE,Map(predicate -> (name#24322 = jelly)),,List(4094000743660116),1113-035301-4efipd3u,2.0,WriteSerializable,False,"Map(numRemovedFiles -> 1, numCopiedRows -> 2, numAddedChangeFiles -> 0, executionTimeMs -> 1858, scanTimeMs -> 1020, numAddedFiles -> 1, numUpdatedRows -> 1, rewriteTimeMs -> 838)",,Databricks-Runtime/11.3.x-scala2.12
2,2022-11-13T04:28:42.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,WRITE,"Map(mode -> Append, partitionBy -> [])",,List(4094000743660116),1113-035301-4efipd3u,1.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 3, numOutputBytes -> 1328)",,Databricks-Runtime/11.3.x-scala2.12
1,2022-11-13T04:28:39.000+0000,2682279945671776,munirsheikhcloudseekho@gmail.com,WRITE,"Map(mode -> Append, partitionBy -> [])",,List(4094000743660116),1113-035301-4efipd3u,0.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 3, numOutputBytes -> 1313)",,Databricks-Runtime/11.3.x-scala2.12


<i18n value="c28d1de2-ff12-426c-9c97-11fced9145cc"/>


Query your table again to confirm you still have access to the current version.

In [0]:
%sql
SELECT * FROM beans

name,color,grams,delicious
black,black,500.0,True
lentils,brown,1000.0,True
jelly,rainbow,42.5,True
pinto,brown,1500.0,True
green,green,178.3,True


<i18n value="a9d17cf0-7d2e-4537-93ed-35c37801bdae"/>


<img src="https://files.training.databricks.com/images/icon_warn_32.png"> Because Delta Cache stores copies of files queried in the current session on storage volumes deployed to your currently active cluster, you may still be able to temporarily access previous table versions (though systems should **not** be designed to expect this behavior). 

Restarting the cluster will ensure that these cached data files are permanently purged.

You can see an example of this by uncommenting and running the following cell, which may or may not fail depending on the state of the cache.

In [0]:
%sql
-- SELECT * FROM beans@v1

<i18n value="6381dbea-0e05-4dae-9015-cfa9c8bdf40a"/>


By completing this lab, you should now feel comfortable:
* Completing standard Delta Lake table creation and data manipulation commands
* Reviewing table metadata including table history
* Leveraging Delta Lake versioning for snapshot queries and rollbacks
* Compacting small files and indexing tables
* Using **`VACUUM`** to review files marked for deletion and committing these deletes

<i18n value="6fa65337-c805-4e8e-a3ab-13820a60e6fb"/>

 
Run the following cell to delete the tables and files associated with this lesson.

In [0]:
%python
DA.cleanup()

&copy; 2022 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>