
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>


# Ingest and Manipulate a Delta Table Lab

This notebook provides a hands-on review of some of the features Delta Lake brings to the data lakehouse.

## REQUIRED - SELECT CLASSIC COMPUTE

Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:


1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

2. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

   - Click **More** in the drop-down.

   - In the **Attach to an existing compute resource** window, use the first drop-down to select your unique cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

2. Find the triangle icon to the right of your compute cluster name and click it.

3. Wait a few minutes for the cluster to start.

4. Once the cluster is running, complete the steps above to select your cluster.

## Classroom Setup

Run the following cell to configure your working environment for this course.

**NOTE:** The `DA` object is only used in Databricks Academy courses and is not available outside of these courses. It will dynamically reference the information needed to run the course.

In [0]:
%run ../Includes/Classroom-Setup-05L

## Begin Lab

1. Set your current Catalog to **dbacademy** and your schema to your specific schema.

    **HINT**:
    - Catalog: `USE CATALOG`
    - Schema: `IDENTIFIER(DA.schema_name)` (or you can hardcode your schema name)

In [0]:
%python

# Set the catalog and schema
spark.sql(f'USE CATALOG {DA.catalog_name}')
spark.sql(f'USE SCHEMA {DA.schema_name}')

In [0]:
<FILL_IN>

In [0]:
%skip
USE CATALOG dbacademy;
USE SCHEMA IDENTIFIER(DA.schema_name);

2. Run a query to view your current Catalog and schema. Verify that the results show the module's Catalog (**dbacademy**) and your specific schema.

In [0]:
<FILL_IN>

In [0]:
%skip
SELECT current_catalog(), current_schema()

3. View the available volumes in your schema and confirm that the **taxi_files** volume is listed.

In [0]:
SELECT current_catalog() AS catalog, current_schema() AS schema;

In [0]:
%python
# List tables in the current lab schema (avoids hardcoding a user-specific schema name)
table_names = [
    t.name
    for t in spark.catalog.listTables(f'{DA.catalog_name}.{DA.schema_name}')
]
print(table_names)

# List volumes using SHOW VOLUMES instead of LIST
volumes = spark.sql(
    f"SHOW VOLUMES IN {DA.catalog_name}.{DA.schema_name}"
)
display(volumes)

In [0]:
<FILL_IN>

In [0]:
%skip
SHOW VOLUMES;

4. List the files in the **taxi_files** volume and check the **name** column to determine the file types stored in the volume. Ignore any additional files that begin with an underscore (_).

**HINT**: Use the following path format to access the volume: */Volumes/catalog_name/schema_name/volume_name/*.
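The path format in the hint and the rule about ignoring underscore-prefixed files can be sketched in plain Python. This is a minimal illustration, assuming the file names have already been collected from the **name** column that `LIST` returns; the helpers `volume_path` and `data_file_types`, the schema `my_schema`, and the sample file names are all hypothetical.

```python
from pathlib import PurePosixPath


def volume_path(catalog: str, schema: str, volume: str) -> str:
    """Build a volume path in the form /Volumes/<catalog>/<schema>/<volume>/."""
    return f"/Volumes/{catalog}/{schema}/{volume}/"


def data_file_types(names: list[str]) -> set[str]:
    """Return the lowercase file extensions, skipping names that start with '_'."""
    return {
        PurePosixPath(name).suffix.lstrip(".").lower()
        for name in names
        if not name.startswith("_")
    }


# Hypothetical listing: two data files plus an underscore-prefixed file to ignore.
sample_names = ["trips_part1.csv", "trips_part2.csv", "_SUCCESS"]
print(volume_path("dbacademy", "my_schema", "taxi_files"))
print(data_file_types(sample_names))
```

In the lab itself, the catalog and schema would come from `DA.catalog_name` and `DA.schema_name` rather than literals.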

In [0]:
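%python
# List the files inside the taxi_files volume (SHOW VOLUMES lists volumes, not files)
spark.sql(f"LIST '/Volumes/{DA.catalog_name}/{DA.schema_name}/taxi_files/'").display()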

In [0]:
<FILL_IN>

In [0]:
%skip
%python
spark.sql(f"LIST '/Volumes/{DA.catalog_name}/{DA.schema_name}/taxi_files'").display()

5. Query the volume path directly and preview the data in the file using the appropriate file format. Make sure to use backticks around the path to your volume.

**HINT**: SELECT * FROM \<file-format\>.\`\<path-to-volume-taxi_files\>\`
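The direct-path syntax from the hint can be expressed as a small string-building helper. This is a sketch only: `direct_file_query` is a hypothetical helper, `my_schema` is a placeholder schema name, and the file format is passed in as a parameter (CSV in the example).

```python
def direct_file_query(file_format: str, path: str, limit: int = 10) -> str:
    """Return a query that reads the files at `path` directly as `file_format`.

    The path is wrapped in backticks, as the direct-path syntax requires,
    and there is no space between the format name and the dot.
    """
    if "`" in path:
        raise ValueError("path must not contain backticks")
    return f"SELECT * FROM {file_format}.`{path}` LIMIT {limit}"


query = direct_file_query("csv", "/Volumes/dbacademy/my_schema/taxi_files")
print(query)
```

In a notebook, the resulting string could be run with `spark.sql(query).display()`, the same call the solution cell uses.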

In [0]:
<FILL_IN>

In [0]:
%skip
%python
spark.sql(f'''SELECT *
FROM csv.`/Volumes/{DA.catalog_name}/{DA.schema_name}/taxi_files`
LIMIT 10
''').display()

6. Create a table in your schema called **taxitrips_bronze** that contains the following columns:

| Field Name | Field Type |
| --- | --- |
| tpep_pickup_datetime | TIMESTAMP |
| tpep_dropoff_datetime | TIMESTAMP |
| trip_distance | DOUBLE |
| fare_amount | DOUBLE |
| pickup_zip | INT |
| dropoff_zip | INT |

**NOTE:** Start with a `DROP TABLE IF EXISTS` statement so that any existing copy of the table is removed and the cell can be rerun without errors.
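The mapping from the field table to DDL is mechanical, as this sketch shows. It assumes the column order in the table above; `TAXI_COLUMNS` and `create_table_statements` are hypothetical names used only for illustration.

```python
# Field names and types from the table above, in column order.
TAXI_COLUMNS = [
    ("tpep_pickup_datetime", "TIMESTAMP"),
    ("tpep_dropoff_datetime", "TIMESTAMP"),
    ("trip_distance", "DOUBLE"),
    ("fare_amount", "DOUBLE"),
    ("pickup_zip", "INT"),
    ("dropoff_zip", "INT"),
]


def create_table_statements(table: str, columns: list[tuple[str, str]]) -> list[str]:
    """Return a DROP TABLE IF EXISTS statement followed by a CREATE TABLE statement."""
    column_defs = ",\n".join(f"  {name} {data_type}" for name, data_type in columns)
    return [
        f"DROP TABLE IF EXISTS {table}",
        f"CREATE TABLE IF NOT EXISTS {table} (\n{column_defs}\n)",
    ]


for statement in create_table_statements("taxitrips_bronze", TAXI_COLUMNS):
    print(statement + ";")
```

The statements are kept as separate strings because each one would be passed to `spark.sql` on its own.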

In [0]:
<FILL_IN>

In [0]:
%skip
DROP TABLE IF EXISTS taxitrips_bronze;

CREATE TABLE IF NOT EXISTS taxitrips_bronze (
  tpep_pickup_datetime TIMESTAMP,
  tpep_dropoff_datetime TIMESTAMP,
  trip_distance DOUBLE,
  fare_amount DOUBLE,
  pickup_zip INT,
  dropoff_zip INT
);

7. Use the [COPY INTO](https://docs.databricks.com/en/sql/language-manual/delta-copy-into.html) statement to load the files in the **taxi_files** volume into the **taxitrips_bronze** table. Include the following options:
    - FROM `path-to-taxi_files`
    - FILEFORMAT = \<file-format\>
    - FORMAT_OPTIONS
      - 'header' = 'true'
      - 'inferSchema' = 'true'

    Confirm that 21,932 rows were inserted.
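The quoting rules for the options listed above can be sketched as a statement builder. This is an illustration only, assuming a CSV source; `copy_into_statement` is a hypothetical helper, `my_schema` is a placeholder, and the helper does not escape quotes, so it is only suitable for trusted values.

```python
def copy_into_statement(
    table: str, source_path: str, file_format: str, format_options: dict[str, str]
) -> str:
    """Build a COPY INTO statement.

    FILEFORMAT takes a bare keyword (for example CSV), while the source path
    and each FORMAT_OPTIONS key and value are single-quoted string literals.
    """
    options = ", ".join(f"'{key}' = '{value}'" for key, value in format_options.items())
    return (
        f"COPY INTO {table}\n"
        f"  FROM '{source_path}'\n"
        f"  FILEFORMAT = {file_format.upper()}\n"
        f"  FORMAT_OPTIONS ({options})"
    )


print(
    copy_into_statement(
        "taxitrips_bronze",
        "/Volumes/dbacademy/my_schema/taxi_files",
        "csv",
        {"header": "true", "inferSchema": "true"},
    )
)
```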

In [0]:
<FILL_IN>

In [0]:
%skip
%python
spark.sql(f'''COPY INTO taxitrips_bronze
  FROM '/Volumes/{DA.catalog_name}/{DA.schema_name}/taxi_files'
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
  ''').display()

8. Count the number of rows in the **taxitrips_bronze** table. Confirm that the table has 21,932 rows.

In [0]:
<FILL_IN>

In [0]:
%skip
SELECT count(*) as totalrows
FROM taxitrips_bronze;

9. View the **taxitrips_bronze** table's history. Confirm that version 0 (the *CREATE TABLE* operation) and version 1 (the *COPY INTO* operation) are listed.

In [0]:
<FILL_IN>

In [0]:
%skip
DESCRIBE HISTORY taxitrips_bronze;

10. Run the following statement to delete all rows where **trip_distance** is less than *1*. Confirm that *5,387* rows were deleted.

In [0]:
DELETE FROM taxitrips_bronze
  WHERE trip_distance < 1;
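A `DELETE` removes only the rows for which the `WHERE` predicate is TRUE. Under SQL three-valued logic, `NULL < 1` evaluates to NULL rather than TRUE, so a row with a NULL **trip_distance** would be kept. The sketch below mimics this on hypothetical in-memory rows (not taken from the lab dataset, which may or may not contain NULL distances); `delete_where_less_than` is an illustrative helper, not a Databricks API.

```python
from typing import Any, Optional


def delete_where_less_than(
    rows: list[dict[str, Any]], column: str, threshold: float
) -> tuple[list[dict[str, Any]], int]:
    """Mimic DELETE ... WHERE column < threshold on a list of row dicts.

    A row is deleted only when the predicate is TRUE. NULL (None here)
    compared with a number is not TRUE, so those rows are kept.
    Returns the remaining rows and the number of deleted rows.
    """

    def predicate_is_true(value: Optional[float]) -> bool:
        return value is not None and value < threshold

    remaining = [row for row in rows if not predicate_is_true(row[column])]
    return remaining, len(rows) - len(remaining)


# Hypothetical sample rows, not taken from the lab dataset.
trips = [
    {"trip_distance": 0.4},
    {"trip_distance": 2.5},
    {"trip_distance": None},
    {"trip_distance": 1.0},
]
remaining, deleted = delete_where_less_than(trips, "trip_distance", 1)
print(deleted)         # 1
print(len(remaining))  # 3
```

Applied to the lab's numbers, 21,932 - 5,387 = 16,545, which is the row count checked in step 12.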

11. View the **taxitrips_bronze** table's history again. In the **operation** column, find the version where the *DELETE* operation occurred.

In [0]:
<FILL_IN>

In [0]:
%skip
DESCRIBE HISTORY taxitrips_bronze;

12. Run a query to count the total number of rows in the current version of the **taxitrips_bronze** table. Confirm that the current table contains *16,545* rows.

**HINT:** By default, a query reads the most recent version of the table.

In [0]:
<FILL_IN>

In [0]:
%skip
SELECT count(*) AS totalrows
FROM taxitrips_bronze;

13. Query version 1 of the table (the version written by COPY INTO, before the *DELETE* operation) to count the rows that were originally loaded. Confirm that it contains *21,932* rows. Version 0 is the empty table produced by CREATE TABLE.

**HINT:** FROM \<table> VERSION AS OF \<n>

In [0]:
<FILL_IN>

In [0]:
%skip
SELECT count(*) AS totalrows
FROM taxitrips_bronze VERSION AS OF 1;

**CHALLENGE**


14. Whoops! You made a mistake and didn't mean to delete the rows from earlier. Use the [RESTORE](https://docs.databricks.com/en/sql/language-manual/delta-restore.html) statement to return the **taxitrips_bronze** table to its state just before the *DELETE* operation.

In [0]:
<FILL_IN>

In [0]:
%skip
RESTORE TABLE taxitrips_bronze TO VERSION AS OF 1;
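The version history built up in steps 9 through 16 can be modeled with a toy class. This is a deliberately simplified sketch, not how Delta Lake stores data: real Delta tables record file-level changes in a transaction log, whereas `ToyVersionedTable` (a hypothetical name) just stores an operation name and a row count per version. It also assumes no other commits, such as automatic maintenance, land between the lab's steps. The key point it shows is that `RESTORE` adds a new version instead of erasing the later ones.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyVersionedTable:
    """Toy model: each commit appends (operation, row_count) as a new version."""

    history: list[tuple[str, int]] = field(default_factory=list)

    def commit(self, operation: str, row_count: int) -> int:
        """Append a new version and return its version number."""
        self.history.append((operation, row_count))
        return len(self.history) - 1

    def count(self, version: Optional[int] = None) -> int:
        """Row count of the latest version, or of `version` (like VERSION AS OF)."""
        if version is None:
            version = len(self.history) - 1
        if not 0 <= version < len(self.history):
            raise ValueError(f"version {version} does not exist")
        return self.history[version][1]

    def restore(self, version: int) -> int:
        """Commit a NEW version whose contents match an older one; history is kept."""
        return self.commit("RESTORE", self.count(version))


# Replay the lab's steps, assuming no other commits happen in between.
table = ToyVersionedTable()
table.commit("CREATE TABLE", 0)          # version 0: empty table
table.commit("COPY INTO", 21_932)        # version 1: files loaded
table.commit("DELETE", 21_932 - 5_387)   # version 2: short trips removed
table.restore(1)                         # version 3: back to version 1's data

for version, (operation, rows) in enumerate(table.history):
    print(version, operation, rows)
print(table.count())   # 21932, the current version
print(table.count(2))  # 16545, like VERSION AS OF 2
```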

15. View the history of the **taxitrips_bronze** table. Confirm that the **operation** column of the most recent version shows *RESTORE*.

In [0]:
<FILL_IN>

In [0]:
%skip
DESCRIBE HISTORY taxitrips_bronze;

16. Count the total number of rows in the current **taxitrips_bronze** table. Confirm that the most recent version of the table contains *21,932* rows.

In [0]:
<FILL_IN>

In [0]:
%skip
SELECT count(*) as totalrows
FROM taxitrips_bronze;

17. Drop the **taxitrips_bronze** table.

In [0]:
<FILL_IN>

In [0]:
%skip
DROP TABLE IF EXISTS taxitrips_bronze;

### Summary
By completing this lab, you should now feel comfortable:
* Completing standard Delta Lake table creation and data manipulation commands
* Reviewing table metadata including table history
* Leveraging Delta Lake versioning for snapshot queries and rollbacks

&copy; 2026 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>