
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# Advanced Delta Lake Features

Now that you feel comfortable performing basic data tasks with Delta Lake, we can discuss a few advanced features unique to Delta Lake. We are going to talk about Liquid Clustering, Optimization, and Versioning in Delta Lake.

Note that while some of the keywords used here aren't part of standard ANSI SQL, all Delta Lake operations can be run on Databricks using SQL.

## Learning Objectives
By the end of this lesson, you should be able to:
* Use **`CLUSTER BY`** for liquid clustering
* Use **`OPTIMIZE`** to manually trigger liquid clustering
* Review a history of table transactions
* Query and roll back to a previous table version
* Describe how to enable **`Predictive Optimization`**

**Resources**
* <a href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-optimize.html" target="_blank">Delta Optimize - Databricks Docs</a>
* <a href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-vacuum.html" target="_blank">Delta Vacuum - Databricks Docs</a>

## REQUIRED - SELECT CLASSIC COMPUTE

Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:

1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

1. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

  - In the drop-down, select **More**.

  - In the **Attach to an existing compute resource** pop-up, select the first drop-down. You will see a unique cluster name in that drop-down. Please select that cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

1. Find the triangle icon to the right of your compute cluster name and click it.

1. Wait a few minutes for the cluster to start.

1. Once the cluster is running, complete the steps above to select your cluster.

## Classroom Setup

Run the following cell to configure your working environment for this course. It will also set your default catalog to **dbacademy** and the schema to your specific schema name shown below using the `USE` statements.
<br></br>


```
USE CATALOG dbacademy;
USE SCHEMA dbacademy.<your unique schema name>;
```

**NOTE:** The `DA` object is only used in Databricks Academy courses and is not available outside of these courses. It will dynamically reference the information needed to run the course.

In [0]:
%run ./Includes/Classroom-Setup-7

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


0,1
Course Catalog:,
Your Schema:,


## Liquid Clustering
Delta Lake liquid clustering replaces table partitioning and ZORDER to simplify data layout decisions and optimize query performance. Liquid clustering provides flexibility to redefine clustering keys without rewriting existing data, allowing data layout to evolve alongside analytic needs over time.

Databricks recommends using liquid clustering for all new Delta tables.

We enable liquid clustering on a table by using **`CLUSTER BY`**.

Run **`DESCRIBE events`** and note the names of the columns.

In [0]:
DESCRIBE events;

col_name,data_type,comment
device,string,
ecommerce,struct,
event_name,string,
event_previous_timestamp,bigint,
event_timestamp,bigint,
geo,struct,
items,array,
traffic_source,string,
user_first_touch_timestamp,bigint,
user_id,string,


There are [many reasons](https://docs.databricks.com/en/delta/clustering.html#what-is-liquid-clustering-used-for) to use liquid clustering on a table. We know the **`events`** table will be growing quickly and will require maintenance and tuning, so we are going to enable liquid clustering for this table. We could have enabled liquid clustering at the time the table was created by adding **`CLUSTER BY`** to the **`CREATE TABLE`** statement, like this:

In [0]:
CREATE OR REPLACE TABLE events_liquid 
CLUSTER BY (user_id) AS 
SELECT * 
FROM events;

num_affected_rows,num_inserted_rows


However, we can also add liquid clustering to an existing table using **`ALTER TABLE`**.

In [0]:
ALTER TABLE events
CLUSTER BY (user_id);

When we run **`DESCRIBE events`**, we see the column(s) on which we are currently clustering under **`Clustering Information`**.

In [0]:
DESCRIBE events;

col_name,data_type,comment
device,string,
ecommerce,struct,
event_name,string,
event_previous_timestamp,bigint,
event_timestamp,bigint,
geo,struct,
items,array,
traffic_source,string,
user_first_touch_timestamp,bigint,
user_id,string,
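
The **`DESCRIBE`** output can be long, so a narrower check may be convenient. As a minimal sketch, assuming the **`events`** table from this lesson, **`DESCRIBE DETAIL`** returns a single row of table metadata that includes a `clusteringColumns` field:

```sql
-- Returns one row of table-level metadata for the Delta table.
-- After the ALTER TABLE ... CLUSTER BY (user_id) statement above,
-- the clusteringColumns field should list user_id.
DESCRIBE DETAIL events;
```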


## Choosing Clustering Keys
Databricks recommends choosing clustering keys based on commonly used query filters. Clustering keys can be defined in any order. 

In the **`CLUSTER BY`** above, we chose **`user_id`** as the clustering key, but we may also want to add **`device`**. Note that we can change clustering keys, as needed, by altering the table in the future.

With liquid clustering, we no longer have to worry about how our data is partitioned or deal with the complexities of Z-ordering. We get the benefits of both without the struggle.
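
As a sketch of how keys might be changed later, the statements below reuse the **`ALTER TABLE ... CLUSTER BY`** form shown earlier. Adding **`device`** as a second key is only an illustration, not a recommendation for this dataset, and **`CLUSTER BY NONE`** is included for completeness as the way to turn clustering off.

```sql
-- Cluster on both columns (illustrative choice of keys).
ALTER TABLE events
CLUSTER BY (user_id, device);

-- Return to the single key used in this lesson.
ALTER TABLE events
CLUSTER BY (user_id);

-- Optional: remove clustering keys entirely.
-- Left commented out so the lab table stays clustered.
-- ALTER TABLE events CLUSTER BY NONE;
```

Changing keys does not rewrite existing data by itself; newly written data and later **`OPTIMIZE`** runs use the new keys.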

## Triggering Liquid Clustering
Liquid clustering is incremental, meaning that data is only rewritten as necessary to accommodate data that needs to be clustered. Data files with clustering keys that do not match data to be clustered are not rewritten.

For best performance, Databricks recommends scheduling regular **`OPTIMIZE`** jobs to cluster data. For tables experiencing many updates or inserts, Databricks recommends scheduling an **`OPTIMIZE`** job every one or two hours. Because liquid clustering is incremental, most **`OPTIMIZE`** jobs for clustered tables run quickly.

In [0]:
OPTIMIZE events;

path,metrics
s3://unity-catalogs-us-west-2/metastore/3665583-root/8590962d-5b67-4403-8302-d03ddc277141/tables/c811de46-d215-4fe1-8131-00df8034968b,"List(0, 0, List(null, null, 0.0, 0, 0), List(null, null, 0.0, 0, 0), 0, null, null, 0, 0, 1, 0, false, 0, 0, 1747981643892, 1747981650906, 4, 0, null, List(0, 0), 10, 10, 0, 0, List(14998213, true, false, 0, 0, 0, 0, 1, 14998213, 14998213, null, log, 16777216, 67108864, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, List(253, 154, 0, 0, 0, 1163), 2, 1, 5, sizeAware))"


## Creating a Delta Table with History

In the next cell, we create a table and run a handful of commands that make updates to the table. As you're waiting for this query to run, see if you can identify the total number of transactions being executed.

In [0]:
CREATE TABLE students 
  (id INT, name STRING, value DOUBLE);
  
INSERT INTO students VALUES (1, "Yve", 1.0);
INSERT INTO students VALUES (2, "Omar", 2.5);
INSERT INTO students VALUES (3, "Elia", 3.3);

INSERT INTO students
VALUES 
  (4, "Ted", 4.7),
  (5, "Tiffany", 5.5),
  (6, "Vini", 6.3);
  
UPDATE students 
SET value = value + 1
WHERE name LIKE "T%";

DELETE FROM students 
WHERE value > 6;

CREATE OR REPLACE TEMP VIEW updates(id, name, value, type) AS VALUES
  (2, "Omar", 15.2, "update"),
  (3, "", null, "delete"),
  (7, "Blue", 7.7, "insert"),
  (11, "Diya", 8.8, "update");
  
MERGE INTO students b
USING updates u
ON b.id=u.id
WHEN MATCHED AND u.type = "update"
  THEN UPDATE SET *
WHEN MATCHED AND u.type = "delete"
  THEN DELETE
WHEN NOT MATCHED AND u.type = "insert"
  THEN INSERT *;

num_affected_rows,num_updated_rows,num_deleted_rows,num_inserted_rows
3,1,1,1


## The Delta Log
Each change to a table results in a new entry being written to the Delta Lake transaction log. 

The **`DESCRIBE HISTORY`** command lets us view this log.

In [0]:
DESCRIBE HISTORY students;

version,timestamp,userId,userName,operation,operationParameters,job,notebook,clusterId,readVersion,isolationLevel,isBlindAppend,operationMetrics,userMetadata,engineInfo
9,2025-05-23T06:29:41Z,7363748412033156,labuser10342477_1747973406@vocareum.com,OPTIMIZE,"Map(predicate -> [], auto -> true, clusterBy -> [], zOrderBy -> [], batchId -> 0)",,List(296149350687057),0523-041054-ysw25i30,8.0,SnapshotIsolation,False,"Map(numRemovedFiles -> 3, numRemovedBytes -> 3420, p25FileSize -> 1153, numDeletionVectorsRemoved -> 1, minFileSize -> 1153, numAddedFiles -> 1, maxFileSize -> 1153, p75FileSize -> 1153, p50FileSize -> 1153, numAddedBytes -> 1153)",,Databricks-Runtime/15.4.x-scala2.12
8,2025-05-23T06:29:37Z,7363748412033156,labuser10342477_1747973406@vocareum.com,MERGE,"Map(predicate -> [""(id#47195 = id#46962)""], matchedPredicates -> [{""predicate"":""(type#46965 = update)"",""actionType"":""update""},{""predicate"":""(type#46965 = delete)"",""actionType"":""delete""}], statsOnLoad -> false, notMatchedBySourcePredicates -> [], notMatchedPredicates -> [{""predicate"":""(type#46965 = insert)"",""actionType"":""insert""}])",,List(296149350687057),0523-041054-ysw25i30,7.0,WriteSerializable,False,"Map(numTargetRowsCopied -> 0, numTargetRowsDeleted -> 1, numTargetFilesAdded -> 2, numTargetBytesAdded -> 2228, numTargetBytesRemoved -> 0, numTargetDeletionVectorsAdded -> 1, numTargetRowsMatchedUpdated -> 1, executionTimeMs -> 3642, materializeSourceTimeMs -> 140, numTargetRowsInserted -> 1, numTargetRowsMatchedDeleted -> 1, numTargetDeletionVectorsUpdated -> 1, scanTimeMs -> 1432, numTargetRowsUpdated -> 1, numOutputRows -> 2, numTargetDeletionVectorsRemoved -> 1, numTargetRowsNotMatchedBySourceUpdated -> 0, numTargetChangeFilesAdded -> 0, numSourceRows -> 4, numTargetFilesRemoved -> 0, numTargetRowsNotMatchedBySourceDeleted -> 0, rewriteTimeMs -> 2031)",,Databricks-Runtime/15.4.x-scala2.12
7,2025-05-23T06:29:31Z,7363748412033156,labuser10342477_1747973406@vocareum.com,OPTIMIZE,"Map(predicate -> [], auto -> true, clusterBy -> [], zOrderBy -> [], batchId -> 0)",,List(296149350687057),0523-041054-ysw25i30,5.0,SnapshotIsolation,False,"Map(numRemovedFiles -> 5, numRemovedBytes -> 5602, p25FileSize -> 1192, numDeletionVectorsRemoved -> 1, conflictDetectionTimeMs -> 415, minFileSize -> 1192, numAddedFiles -> 1, maxFileSize -> 1192, p75FileSize -> 1192, p50FileSize -> 1192, numAddedBytes -> 1192)",,Databricks-Runtime/15.4.x-scala2.12
6,2025-05-23T06:29:29Z,7363748412033156,labuser10342477_1747973406@vocareum.com,DELETE,"Map(predicate -> [""(value#45631 > 6.0)""])",,List(296149350687057),0523-041054-ysw25i30,5.0,WriteSerializable,False,"Map(numRemovedFiles -> 1, numRemovedBytes -> 1140, numCopiedRows -> 0, numDeletionVectorsAdded -> 1, numDeletionVectorsRemoved -> 1, numAddedChangeFiles -> 0, executionTimeMs -> 1923, numDeletionVectorsUpdated -> 0, numDeletedRows -> 2, scanTimeMs -> 919, numAddedFiles -> 0, numAddedBytes -> 0, rewriteTimeMs -> 998)",,Databricks-Runtime/15.4.x-scala2.12
5,2025-05-23T06:29:25Z,7363748412033156,labuser10342477_1747973406@vocareum.com,UPDATE,"Map(predicate -> [""StartsWith(name#44611, T)""])",,List(296149350687057),0523-041054-ysw25i30,4.0,WriteSerializable,False,"Map(numRemovedFiles -> 0, numRemovedBytes -> 0, numCopiedRows -> 0, numDeletionVectorsAdded -> 1, numDeletionVectorsRemoved -> 0, numAddedChangeFiles -> 0, executionTimeMs -> 2575, numDeletionVectorsUpdated -> 0, scanTimeMs -> 1191, numAddedFiles -> 1, numUpdatedRows -> 2, numAddedBytes -> 1128, rewriteTimeMs -> 1367)",,Databricks-Runtime/15.4.x-scala2.12
4,2025-05-23T06:29:21Z,7363748412033156,labuser10342477_1747973406@vocareum.com,WRITE,"Map(mode -> Append, statsOnLoad -> false, partitionBy -> [])",,List(296149350687057),0523-041054-ysw25i30,3.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 3, numOutputBytes -> 1140)",,Databricks-Runtime/15.4.x-scala2.12
3,2025-05-23T06:29:20Z,7363748412033156,labuser10342477_1747973406@vocareum.com,WRITE,"Map(mode -> Append, statsOnLoad -> false, partitionBy -> [])",,List(296149350687057),0523-041054-ysw25i30,2.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 1, numOutputBytes -> 1114)",,Databricks-Runtime/15.4.x-scala2.12
2,2025-05-23T06:29:18Z,7363748412033156,labuser10342477_1747973406@vocareum.com,WRITE,"Map(mode -> Append, statsOnLoad -> false, partitionBy -> [])",,List(296149350687057),0523-041054-ysw25i30,1.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 1, numOutputBytes -> 1114)",,Databricks-Runtime/15.4.x-scala2.12
1,2025-05-23T06:29:16Z,7363748412033156,labuser10342477_1747973406@vocareum.com,WRITE,"Map(mode -> Append, statsOnLoad -> false, partitionBy -> [])",,List(296149350687057),0523-041054-ysw25i30,0.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 1, numOutputBytes -> 1106)",,Databricks-Runtime/15.4.x-scala2.12
0,2025-05-23T06:29:14Z,7363748412033156,labuser10342477_1747973406@vocareum.com,CREATE TABLE,"Map(partitionBy -> [], clusterBy -> [], description -> null, isManaged -> true, properties -> {""delta.enableDeletionVectors"":""true""}, statsOnLoad -> false)",,List(296149350687057),0523-041054-ysw25i30,,WriteSerializable,True,Map(),,Databricks-Runtime/15.4.x-scala2.12
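
On a table with a long history, the full log can be unwieldy. A small sketch, assuming the **`students`** table created above: **`DESCRIBE HISTORY`** accepts a **`LIMIT`** clause that returns only the most recent entries.

```sql
-- Show only the three most recent commits (history is listed newest first).
DESCRIBE HISTORY students LIMIT 3;
```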


## Deletion Vectors
Note that the log includes **`OPTIMIZE`** operations, yet we never called **`OPTIMIZE`** on the **`students`** table. If you open the `operationParameters` for one of these **`OPTIMIZE`** entries, you will see that `auto` is `true`, which indicates that the operation was triggered by auto-compaction. Deletion vectors are involved here: when we delete rows from a table, deletion vectors mark those rows as deleted without rewriting the underlying Parquet files. This helps reduce the so-called small file problem, where a table is made up of a large number of small Parquet files. Deletion vectors can still trigger auto-compaction, however, and at that point the underlying files are rewritten.
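
To see whether deletion vectors are enabled on a table, you can inspect its table properties. The sketch below assumes the **`students`** table from this lesson; the property name `delta.enableDeletionVectors` is the one that appears in the `CREATE TABLE` entry of the history output above.

```sql
-- Look up a single table property by key.
SHOW TBLPROPERTIES students ('delta.enableDeletionVectors');
```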



## Delta Lake Time Travel

Delta Lake gives us the opportunity to query tables at any point in the transaction log. These time travel queries can be performed by specifying either the version number or the timestamp.

**NOTE**: In most cases, you'll use a timestamp to recreate data at a time of interest. For our demo we'll use version.

In [0]:
SELECT * 
FROM students VERSION AS OF 3;

id,name,value
3,Elia,3.3
2,Omar,2.5
1,Yve,1.0




What's important to note about time travel is that we're not recreating a previous state of the table by undoing transactions against our current version; rather, we're just querying all those data files that were indicated as valid as of the specified version.
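
The same kind of query can be written against a timestamp. This is a sketch only: the timestamp below is taken from the range covered by the sample history output in this lesson and must be replaced with a value from your own **`DESCRIBE HISTORY`** results. The timestamp has to fall within the table's retained history and cannot be later than the most recent commit.

```sql
-- Query the table as it existed at a point in time.
-- The timestamp is illustrative; substitute one from your own table history.
SELECT *
FROM students TIMESTAMP AS OF '2025-05-23T06:29:30Z';

-- Version-based form from the cell above, shown again for comparison.
SELECT *
FROM students VERSION AS OF 3;
```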

## Rollback Versions

Suppose you're typing up a query to manually delete some records from a table and you accidentally delete all records.

In [0]:
DELETE FROM students;

num_affected_rows
4


From the output above, we can see that 4 rows were removed.

Let's confirm this below.

In [0]:
SELECT * 
FROM students;

id,name,value


Deleting all the records in your table is probably not a desired outcome. Luckily, we can simply roll back this commit.

In [0]:
RESTORE TABLE students TO VERSION AS OF 8;

table_size_after_restore,num_of_files_after_restore,num_removed_files,num_restored_files,removed_files_size,restored_files_size
3420,3,0,3,0,3420


In [0]:
-- Confirm table has been 'Restored'
SELECT * 
FROM students;

id,name,value
1,Yve,1.0
4,Ted,5.7
2,Omar,15.2
7,Blue,7.7


Note that a **`RESTORE`** <a href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-restore.html" target="_blank">command</a> is recorded as a transaction; you won't be able to completely hide the fact that you accidentally deleted all the records in the table, but you will be able to undo the operation and bring your table back to a desired state.

In [0]:
DESCRIBE HISTORY students;

version,timestamp,userId,userName,operation,operationParameters,job,notebook,clusterId,readVersion,isolationLevel,isBlindAppend,operationMetrics,userMetadata,engineInfo
11,2025-05-23T06:33:54Z,7363748412033156,labuser10342477_1747973406@vocareum.com,RESTORE,"Map(version -> 8, timestamp -> null)",,List(296149350687057),0523-041054-ysw25i30,10.0,Serializable,False,"Map(numRestoredFiles -> 3, removedFilesSize -> 0, numRemovedFiles -> 0, restoredFilesSize -> 3420, numOfFilesAfterRestore -> 3, tableSizeAfterRestore -> 3420)",,Databricks-Runtime/15.4.x-scala2.12
10,2025-05-23T06:33:30Z,7363748412033156,labuser10342477_1747973406@vocareum.com,DELETE,"Map(predicate -> [""true""])",,List(296149350687057),0523-041054-ysw25i30,9.0,WriteSerializable,False,"Map(numRemovedFiles -> 1, numRemovedBytes -> 1153, numCopiedRows -> 0, numDeletionVectorsAdded -> 0, numDeletionVectorsRemoved -> 0, numAddedChangeFiles -> 0, executionTimeMs -> 252, numDeletionVectorsUpdated -> 0, numDeletedRows -> 4, scanTimeMs -> 249, numAddedFiles -> 0, numAddedBytes -> 0, rewriteTimeMs -> 0)",,Databricks-Runtime/15.4.x-scala2.12
9,2025-05-23T06:29:41Z,7363748412033156,labuser10342477_1747973406@vocareum.com,OPTIMIZE,"Map(predicate -> [], auto -> true, clusterBy -> [], zOrderBy -> [], batchId -> 0)",,List(296149350687057),0523-041054-ysw25i30,8.0,SnapshotIsolation,False,"Map(numRemovedFiles -> 3, numRemovedBytes -> 3420, p25FileSize -> 1153, numDeletionVectorsRemoved -> 1, minFileSize -> 1153, numAddedFiles -> 1, maxFileSize -> 1153, p75FileSize -> 1153, p50FileSize -> 1153, numAddedBytes -> 1153)",,Databricks-Runtime/15.4.x-scala2.12
8,2025-05-23T06:29:37Z,7363748412033156,labuser10342477_1747973406@vocareum.com,MERGE,"Map(predicate -> [""(id#47195 = id#46962)""], matchedPredicates -> [{""predicate"":""(type#46965 = update)"",""actionType"":""update""},{""predicate"":""(type#46965 = delete)"",""actionType"":""delete""}], statsOnLoad -> false, notMatchedBySourcePredicates -> [], notMatchedPredicates -> [{""predicate"":""(type#46965 = insert)"",""actionType"":""insert""}])",,List(296149350687057),0523-041054-ysw25i30,7.0,WriteSerializable,False,"Map(numTargetRowsCopied -> 0, numTargetRowsDeleted -> 1, numTargetFilesAdded -> 2, numTargetBytesAdded -> 2228, numTargetBytesRemoved -> 0, numTargetDeletionVectorsAdded -> 1, numTargetRowsMatchedUpdated -> 1, executionTimeMs -> 3642, materializeSourceTimeMs -> 140, numTargetRowsInserted -> 1, numTargetRowsMatchedDeleted -> 1, numTargetDeletionVectorsUpdated -> 1, scanTimeMs -> 1432, numTargetRowsUpdated -> 1, numOutputRows -> 2, numTargetDeletionVectorsRemoved -> 1, numTargetRowsNotMatchedBySourceUpdated -> 0, numTargetChangeFilesAdded -> 0, numSourceRows -> 4, numTargetFilesRemoved -> 0, numTargetRowsNotMatchedBySourceDeleted -> 0, rewriteTimeMs -> 2031)",,Databricks-Runtime/15.4.x-scala2.12
7,2025-05-23T06:29:31Z,7363748412033156,labuser10342477_1747973406@vocareum.com,OPTIMIZE,"Map(predicate -> [], auto -> true, clusterBy -> [], zOrderBy -> [], batchId -> 0)",,List(296149350687057),0523-041054-ysw25i30,5.0,SnapshotIsolation,False,"Map(numRemovedFiles -> 5, numRemovedBytes -> 5602, p25FileSize -> 1192, numDeletionVectorsRemoved -> 1, conflictDetectionTimeMs -> 415, minFileSize -> 1192, numAddedFiles -> 1, maxFileSize -> 1192, p75FileSize -> 1192, p50FileSize -> 1192, numAddedBytes -> 1192)",,Databricks-Runtime/15.4.x-scala2.12
6,2025-05-23T06:29:29Z,7363748412033156,labuser10342477_1747973406@vocareum.com,DELETE,"Map(predicate -> [""(value#45631 > 6.0)""])",,List(296149350687057),0523-041054-ysw25i30,5.0,WriteSerializable,False,"Map(numRemovedFiles -> 1, numRemovedBytes -> 1140, numCopiedRows -> 0, numDeletionVectorsAdded -> 1, numDeletionVectorsRemoved -> 1, numAddedChangeFiles -> 0, executionTimeMs -> 1923, numDeletionVectorsUpdated -> 0, numDeletedRows -> 2, scanTimeMs -> 919, numAddedFiles -> 0, numAddedBytes -> 0, rewriteTimeMs -> 998)",,Databricks-Runtime/15.4.x-scala2.12
5,2025-05-23T06:29:25Z,7363748412033156,labuser10342477_1747973406@vocareum.com,UPDATE,"Map(predicate -> [""StartsWith(name#44611, T)""])",,List(296149350687057),0523-041054-ysw25i30,4.0,WriteSerializable,False,"Map(numRemovedFiles -> 0, numRemovedBytes -> 0, numCopiedRows -> 0, numDeletionVectorsAdded -> 1, numDeletionVectorsRemoved -> 0, numAddedChangeFiles -> 0, executionTimeMs -> 2575, numDeletionVectorsUpdated -> 0, scanTimeMs -> 1191, numAddedFiles -> 1, numUpdatedRows -> 2, numAddedBytes -> 1128, rewriteTimeMs -> 1367)",,Databricks-Runtime/15.4.x-scala2.12
4,2025-05-23T06:29:21Z,7363748412033156,labuser10342477_1747973406@vocareum.com,WRITE,"Map(mode -> Append, statsOnLoad -> false, partitionBy -> [])",,List(296149350687057),0523-041054-ysw25i30,3.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 3, numOutputBytes -> 1140)",,Databricks-Runtime/15.4.x-scala2.12
3,2025-05-23T06:29:20Z,7363748412033156,labuser10342477_1747973406@vocareum.com,WRITE,"Map(mode -> Append, statsOnLoad -> false, partitionBy -> [])",,List(296149350687057),0523-041054-ysw25i30,2.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 1, numOutputBytes -> 1114)",,Databricks-Runtime/15.4.x-scala2.12
2,2025-05-23T06:29:18Z,7363748412033156,labuser10342477_1747973406@vocareum.com,WRITE,"Map(mode -> Append, statsOnLoad -> false, partitionBy -> [])",,List(296149350687057),0523-041054-ysw25i30,1.0,WriteSerializable,True,"Map(numFiles -> 1, numOutputRows -> 1, numOutputBytes -> 1114)",,Databricks-Runtime/15.4.x-scala2.12
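
**`RESTORE`** also accepts a timestamp instead of a version number. The following is a hedged sketch rather than a step in this lab: the timestamp is illustrative, chosen to fall after version 9 and before the accidental **`DELETE`** (version 10) in the sample history above, and your own timestamps will differ.

```sql
-- Restore the table to its state at a point in time.
-- Illustrative timestamp; pick one from your own DESCRIBE HISTORY output.
RESTORE TABLE students TO TIMESTAMP AS OF '2025-05-23T06:30:00Z';
```

Like the version-based form, this is recorded as a new **`RESTORE`** entry in the table history.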


## Predictive Optimization
Predictive Optimization is a feature that, once enabled, removes the need to run **`OPTIMIZE`** and **`VACUUM`** manually.

With predictive optimization enabled, Databricks automatically identifies tables that would benefit from maintenance operations and runs them for the user. Maintenance operations are only run as necessary, eliminating both unnecessary runs for maintenance operations and the burden associated with tracking and troubleshooting performance.

Predictive optimization must be enabled at the account level. All lower-level objects inherit the setting, but it can be enabled or disabled on individual catalogs, schemas, and tables as needed.

#### View if Predictive Optimization is Enabled:
To check whether Predictive Optimization is enabled on a catalog, schema or table: 
```
DESCRIBE (CATALOG | SCHEMA | TABLE) EXTENDED name
```
 

**View Catalog**

`DESCRIBE CATALOG EXTENDED dbacademy;`

![Catalog PO Check](./Includes/images/po_enabled_catalog.png)

**View Table**

`DESCRIBE TABLE EXTENDED events;`

![Table PO Check](./Includes/images/po_enabled_table.png)

<br></br>

#### Enabling Predictive Optimization:
- To enable Predictive Optimization, see the [Enable predictive optimization](https://docs.databricks.com/en/optimizations/predictive-optimization.html) documentation. The general syntax is:
```
ALTER CATALOG [catalog_name] {ENABLE | DISABLE} PREDICTIVE OPTIMIZATION;
ALTER {SCHEMA | DATABASE} [schema_name] {ENABLE | DISABLE} PREDICTIVE OPTIMIZATION;
ALTER TABLE [table_name] {ENABLE | DISABLE} PREDICTIVE OPTIMIZATION;
```
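
As a concrete instance of the template above, the sketch below toggles the setting for the **`events`** table used in this lesson. It assumes you have sufficient privileges on the table, which may not be the case in a shared lab environment, so treat it as an illustration rather than a required step.

```sql
-- Opt a single table out of predictive optimization.
ALTER TABLE events DISABLE PREDICTIVE OPTIMIZATION;

-- Opt the table back in with an explicit table-level setting.
ALTER TABLE events ENABLE PREDICTIVE OPTIMIZATION;
```

After either statement, **`DESCRIBE TABLE EXTENDED events`** can be used to check the resulting setting.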


Run the `DESCRIBE CATALOG EXTENDED` statement below. Is Predictive Optimization turned on at the catalog level?

In [0]:
DESCRIBE CATALOG EXTENDED dbacademy;

info_name,info_value
Catalog Name,dbacademy
Comment,
Owner,metastore_admins
Catalog Type,Regular
Created By,9556a37f-7dc0-4b5f-849c-babbde9b34af
Created At,2024-12-03 AD at 09:39:25 UTC
Updated By,9556a37f-7dc0-4b5f-849c-babbde9b34af
Updated At,2025-05-19 AD at 05:52:01 UTC
Predictive Optimization,ENABLE (inherited from METASTORE 3665583-us-west-2)


Run the `DESCRIBE TABLE EXTENDED` statement below. Is Predictive Optimization turned on for the **events** table?

In [0]:
DESCRIBE TABLE EXTENDED events;

col_name,data_type,comment
device,string,
ecommerce,struct,
event_name,string,
event_previous_timestamp,bigint,
event_timestamp,bigint,
geo,struct,
items,array,
traffic_source,string,
user_first_touch_timestamp,bigint,
user_id,string,



&copy; 2025 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use">Terms of Use</a> | 
<a href="https://help.databricks.com/">Support</a>