
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>


# Creating and Working with a Delta Table 


## REQUIRED - SELECT CLASSIC COMPUTE

Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:


1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

2. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

   - Click **More** in the drop-down.

   - In the **Attach to an existing compute resource** window, use the first drop-down to select your unique cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

2. Find the triangle icon to the right of your compute cluster name and click it.

3. Wait a few minutes for the cluster to start.

4. Once the cluster is running, complete the steps above to select your cluster.

## Classroom Setup

Run the following cell to configure your working environment for this course.

**NOTE:** The `DA` object is only used in Databricks Academy courses and is not available outside of these courses. It will dynamically reference the information needed to run the course.

In [0]:
%run ../Includes/Classroom-Setup-01

## A. Explore your Catalog

**The Unity Catalog object model**

In Unity Catalog, all metadata is registered in a metastore. The database objects in any Unity Catalog metastore are organized into a three-level hierarchy. You therefore reference tables, views, volumes, models, and functions with a three-level namespace: `catalog.schema.object`.

![unity_catalog_object_model](../Includes/images/unity_catalog_object_model.png)


For more information, see [What is Unity Catalog?](https://docs.databricks.com/en/data-governance/unity-catalog/index.html)
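The following minimal Python sketch shows how a three-level name is assembled from its parts. It assumes backtick quoting, where an embedded backtick is escaped by doubling it. The helper names `quote_identifier` and `three_level_name` and the schema name `labuser_example` are hypothetical. In this course, your real schema name comes from `DA.schema_name`.

```python
def quote_identifier(name: str) -> str:
    # Backtick-quote one name part, doubling any embedded backticks.
    return "`" + name.replace("`", "``") + "`"


def three_level_name(catalog: str, schema: str, obj: str) -> str:
    # Join catalog, schema, and object (table, view, volume, ...) with dots.
    return ".".join(quote_identifier(part) for part in (catalog, schema, obj))


# "labuser_example" is a hypothetical schema name used only for illustration.
print(three_level_name("dbacademy", "labuser_example", "current_employees"))
# prints: `dbacademy`.`labuser_example`.`current_employees`
```

Quoting every part keeps names with special characters, such as hyphens, valid in a query.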

#### 1. Viewing the Catalog and Schema
Complete the following to manually view the course catalog **dbacademy** and your schema:

- Select the Catalog icon ![catalog_icon](../Includes/images/catalog_icon.png) in the left navigation bar. 

- You should see the **dbacademy** catalog.

- Expand the **dbacademy** catalog. Within the catalog, you should see a variety of schemas (databases).

- Find your specific schema (begins with **labuser**). You can locate your schema in the classroom setup notes in the first cell.

- Expand your schema. Notice that your schema only contains a volume named **myfiles**.

- Expand your volume. The volume should contain a single CSV file named **employees.csv**.

#### 2. Setting the Default Catalog and Schema
Execute the following cells to set the current catalog to **dbacademy** and the current schema to your specific schema. With these defaults in place, you can refer to objects in your schema by name alone, without qualifying them as `catalog.schema.object`. The first cell displays your schema name, which is stored in the precreated SQL variable `DA.schema_name`.


In [0]:
%sql
SELECT DA.schema_name -- Precreated SQL variable pointing to your unique schema

Let's modify our default catalog and schema using the `USE CATALOG` and `USE SCHEMA` statements. This eliminates the need to specify the three-level name for objects in your **labuser** schema (i.e., catalog.schema.object).

  - `USE CATALOG` – Sets the current catalog.

  - `USE SCHEMA` – Sets the current schema.

    **NOTE:** Since our dynamic schema name is stored in the SQL variable `DA.schema_name` as a string, we will need to use the `IDENTIFIER` clause to interpret the constant string in our variable as a schema name. The `IDENTIFIER` clause can interpret a constant string as any of the following:
    - Relation (table or view) name
    - Function name
    - Column name
    - Field name
    - Schema name
    - Catalog name

    [IDENTIFIER clause documentation](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-names-identifier-clause?language=SQL)

Run the following cell to set and view your default catalog and schema. Confirm that your default catalog is **dbacademy** and that your schema name begins with **labuser**. The cell uses the `DA.schema_name` variable created by the classroom setup script.

**NOTE:** Alternatively, you can type your schema name directly in the `USE SCHEMA` statement instead of using the `IDENTIFIER` clause.


In [0]:
%sql
USE CATALOG dbacademy;
USE SCHEMA IDENTIFIER(DA.schema_name);

SELECT current_catalog(), current_schema()

#### 3. Describing the Schema
Use the [DESCRIBE SCHEMA EXTENDED](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-describe-schema.html) statement to display the metadata and properties of your schema.

In [0]:
%sql
DESCRIBE SCHEMA EXTENDED IDENTIFIER(DA.schema_name)

#### 4. Show Tables
Use the [SHOW TABLES](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-show-tables.html) statement to display the available tables in your schema. Notice that your schema does not contain any tables yet.

In [0]:
%sql
SHOW TABLES;

#### 5. Show Volumes
Use the [SHOW VOLUMES](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-show-volumes.html) statement to list all of the volumes available in your schema. Notice that you have a single volume available, **myfiles**.


**NOTE**: Volumes are Unity Catalog objects representing a logical volume of storage in a cloud object storage location. Volumes provide capabilities for accessing, storing, governing, and organizing files. While tables provide governance over tabular datasets, volumes add governance over non-tabular datasets. 

You can use volumes to store and access files in any format, including structured, semi-structured, and unstructured data.

[What are Unity Catalog volumes?](https://docs.databricks.com/en/volumes/index.html)

In [0]:
%sql
SHOW VOLUMES;

#### 6. Listing Files in Volumes
Use the LIST statement to list all of the files in the **myfiles** volume. Notice that the volume only has the **employees.csv** file. It contains information about current employees. 

When interacting with data in volumes, you use the path provided by Unity Catalog, which always has the following format: `/Volumes/catalog_name/schema_name/volume_name/`.

For more information on exploring directories and data files managed with Unity Catalog volumes, see the [Explore storage and find data files](https://docs.databricks.com/en/discover/files.html) documentation.
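The following Python sketch builds a path in the `/Volumes/catalog_name/schema_name/volume_name/` format described above. The `volume_path` helper and the schema name `labuser_example` are hypothetical. The sketch also rejects empty or slash-containing segments, so a missing schema name fails loudly instead of producing a malformed path.

```python
def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    # Unity Catalog volume paths start with /Volumes/<catalog>/<schema>/<volume>.
    # Pass nested directories as separate parts rather than as "a/b".
    segments = ["Volumes", catalog, schema, volume, *parts]
    for segment in segments:
        if not segment or "/" in segment:
            raise ValueError(f"invalid path segment: {segment!r}")
    return "/" + "/".join(segments)


# "labuser_example" is a hypothetical schema name.
print(volume_path("dbacademy", "labuser_example", "myfiles"))
# prints: /Volumes/dbacademy/labuser_example/myfiles
print(volume_path("dbacademy", "labuser_example", "myfiles", "employees.csv"))
# prints: /Volumes/dbacademy/labuser_example/myfiles/employees.csv
```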


In [0]:
spark.sql(f"LIST '/Volumes/dbacademy/{DA.schema_name}/myfiles'").display()

## B. Create a Delta Table from a CSV File

All saved tables on Databricks are Delta tables by default. Whether you’re using Apache Spark DataFrames or SQL, you get all the benefits of Delta Lake just by saving your data to the lakehouse with default settings.

1. Query the **employees.csv** file directly using SQL. Notice that the query does not recognize the header. The column names appear as the first row of data, and the columns are named `_c0`, `_c1`, and so on.

**NOTE**: This syntax is specific to Spark SQL and lets you read files directly without first loading them into a table. Specify the file format, followed by a period and the file path enclosed in **backticks**. This method works for various file formats, such as CSV, JSON, and Parquet.

`` SELECT * FROM <file_format>.`/path/to/file`; ``
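The following standard-library Python sketch illustrates why the header shows up as data. When no header option is given, every line, including the header, becomes a data row. The columns get positional names such as `_c0` and `_c1`, and all values stay strings. The `SAMPLE_CSV` contents are a hypothetical stand-in for **employees.csv**, and `read_without_header` is a hypothetical helper. It mimics the behavior only and does not use Spark.

```python
import csv
import io

# Hypothetical stand-in for employees.csv; the real file's rows are not reproduced here.
SAMPLE_CSV = "ID,FirstName,Country,Role\n1111,EmpA,USA,Manager\n2222,EmpB,India,Developer\n"


def read_without_header(text: str) -> list[dict[str, str]]:
    # Every line (the header too) becomes a data row, and columns get
    # positional names _c0, _c1, ... in the style of Spark's CSV reader.
    rows = list(csv.reader(io.StringIO(text)))
    width = max(len(row) for row in rows)
    names = [f"_c{i}" for i in range(width)]
    return [dict(zip(names, row)) for row in rows]


for row in read_without_header(SAMPLE_CSV):
    print(row)
# first row printed: {'_c0': 'ID', '_c1': 'FirstName', '_c2': 'Country', '_c3': 'Role'}
```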

In [0]:
spark.sql(f'''
SELECT *
FROM csv.`/Volumes/dbacademy/{DA.schema_name}/myfiles/` 
''').display()

2. Use the *text* file format to read each line of a file as a single string in a column named `value`. This lets you see the raw contents of the file. Notice that the first row contains the column names and that the fields are separated by commas.
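The following Python sketch illustrates the shape of a text-format read: one row per line, with the whole line stored in a single `value` column. The sample string and the `read_as_text` helper are hypothetical and stand in for the Spark text reader.

```python
# Hypothetical stand-in for employees.csv.
SAMPLE_CSV = "ID,FirstName,Country,Role\n1111,EmpA,USA,Manager\n"


def read_as_text(text: str) -> list[dict[str, str]]:
    # Each line becomes one row with a single string column named "value".
    return [{"value": line} for line in text.splitlines()]


rows = read_as_text(SAMPLE_CSV)
print(rows[0]["value"])             # prints: ID,FirstName,Country,Role
print(rows[1]["value"].split(","))  # prints: ['1111', 'EmpA', 'USA', 'Manager']
```

Splitting `value` on commas recovers the fields by hand. The CSV reader and `read_files` handle that parsing, including quoting rules, for you.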

In [0]:
spark.sql(f'''
SELECT * 
FROM text.`/Volumes/dbacademy/{DA.schema_name}/myfiles/`
''').display()

3. In SQL, you can use the [read_files](https://docs.databricks.com/en/sql/language-manual/functions/read_files.html) table-valued function to read the CSV file into tabular form and apply options that control how the file is read. Here, `header => true` uses the first line as column names, and `inferSchema => true` infers each column's data type.

    Execute the query and confirm that the results show 4 employees with valid column names.

**NOTE**: By default, a rescued data column (`_rescued_data`) is added to capture any data that doesn't match the schema.
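The following standard-library sketch shows what the `header` and `inferSchema` options do conceptually. The first line supplies the column names, and a column becomes an integer column only if all of its non-empty values parse as integers. This inference rule is a deliberate simplification. Spark's actual inference also recognizes doubles, booleans, dates, and other types. `infer_type`, `read_with_header`, and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical stand-in for employees.csv.
SAMPLE_CSV = "ID,FirstName,Country,Role\n1111,EmpA,USA,Manager\n2222,EmpB,India,Developer\n"


def infer_type(values: list[str]) -> str:
    # Simplified rule: "int" if every non-empty value parses as an integer.
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for value in non_empty:
        try:
            int(value)
        except ValueError:
            return "string"
    return "int"


def read_with_header(text: str):
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # header => true: the first line supplies column names
    data = list(reader)
    columns = {name: [row[i] if i < len(row) else "" for row in data]
               for i, name in enumerate(header)}
    schema = {name: infer_type(vals) for name, vals in columns.items()}  # inferSchema => true
    rows = []
    for row in data:
        record = {}
        for i, name in enumerate(header):
            raw = row[i] if i < len(row) else ""
            if raw == "":
                record[name] = None  # empty fields become nulls
            elif schema[name] == "int":
                record[name] = int(raw)
            else:
                record[name] = raw
        rows.append(record)
    return schema, rows


schema, rows = read_with_header(SAMPLE_CSV)
print(schema)   # {'ID': 'int', 'FirstName': 'string', 'Country': 'string', 'Role': 'string'}
print(rows[0])  # {'ID': 1111, 'FirstName': 'EmpA', 'Country': 'USA', 'Role': 'Manager'}
```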

In [0]:
%sql
SELECT *
FROM read_files(
  '/Volumes/dbacademy/' || DA.schema_name || '/myfiles/',
  format => 'csv',
  header => true,
  inferSchema => true
  )

4. Use a CREATE TABLE AS SELECT (CTAS) statement with the query above to create a Delta table named **current_employees** from the **employees.csv** file. The table is created in your schema. Because the query selects only the four employee columns, the rescued data column is not included in the table.

**NOTE:** The CREATE TABLE statement will create a Delta table by default.

In [0]:
%sql
-- Drop the table if it already exists for demonstration purposes
DROP TABLE IF EXISTS current_employees;

-- Create a Delta table using the CSV file

CREATE TABLE current_employees AS
SELECT 
  ID, 
  FirstName, 
  Country, 
  Role 
FROM read_files(
  '/Volumes/dbacademy/' || DA.schema_name || '/myfiles/',
  format => 'csv',
  header => true,
  inferSchema => true
  );


-- Display table
SELECT *
FROM current_employees;

5. Complete the following steps to manually view the table in your schema:

   a. Select the Catalog icon ![catalog_icon](../Includes/images/catalog_icon.png) in the left navigation bar. 

   b. Find the catalog name **dbacademy**.

   c. Select the refresh icon ![refresh_icon](../Includes/images/refresh_icon.png) to refresh the catalog.

   d. Expand the **dbacademy** catalog. Within the catalog you should see a variety of schemas (databases).

   e. Find your schema. You can locate your schema in the classroom setup notes in the first cell. 

   f. Expand your schema. Notice that your schema contains **Volumes** and **Tables**.

   g. Expand **Tables**. Confirm that the **current_employees** Delta table is available.

(optional) To create a Delta table from a CSV file using Python, you can use the code below:

In [0]:
#
# Read the CSV file and create a Spark DataFrame
#
sdf = (spark
       .read
       .format("csv")
       .option('header', 'true')
       .option('inferSchema','true')
       .load(f'/Volumes/dbacademy/{DA.schema_name}/myfiles/')
    )


#
# Create a Delta table from the Spark DataFrame
#

(sdf
 .write
 .mode("overwrite")
 .format("delta")
 .saveAsTable(f"dbacademy.{DA.schema_name}.current_employees_py")
)

Read the Delta table using Python.

In [0]:
(spark
 .read
 .table(f"dbacademy.{DA.schema_name}.current_employees_py")
 .display()
)

6. Show the tables in your schema using Python. You should see two tables: **current_employees** and **current_employees_py**.


In [0]:
spark.catalog.listTables(f"dbacademy.{DA.schema_name}")

7. Query the **current_employees** table to view the data. Confirm that it contains 4 rows of data with a list of current employees.

In [0]:
%sql
SELECT * 
FROM current_employees;

8. Use the DESCRIBE DETAIL statement to view detailed information about the **current_employees** table. View the results. 

    Notice the following:
    - The **format** column indicates that the **current_employees** table has been created as a Delta table.
    - The **location** column displays the cloud location of the table in the following format for AWS: 
        - *s3://\<bucket-name>/\<metastore id>/tables/\<table id>* 

**NOTES:**
- The results of the DESCRIBE DETAIL statement include additional information about the Delta table. For more details, refer to the [Detail schema documentation](https://docs.databricks.com/en/delta/table-details.html#detail-schema).
- For more information on how paths work for data managed by Unity Catalog, see the [How do paths work for data managed by Unity Catalog?](https://docs.databricks.com/en/data-governance/unity-catalog/paths.html) documentation.

In [0]:
%sql
DESCRIBE DETAIL current_employees;

9. Use the [DESCRIBE EXTENDED](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-describe-table.html) statement to display detailed information about the specified columns and additional metadata for the Delta table.

    Notice the following:
    - The top portion of the results displays column metadata of the table.
    - Scroll down the **col_name** column to find the *Type* row. Notice that its **data_type** value is *Managed*. This indicates that Databricks manages the lifecycle and file layout for the table. Tables are created as managed tables by default.



In [0]:
%sql
DESCRIBE EXTENDED current_employees;

10. Execute the [DESCRIBE HISTORY](https://docs.databricks.com/en/sql/language-manual/delta-describe-history.html) statement on the **current_employees** table to retrieve Delta table history.

    Notice the following:
    - The **version** column indicates that the table is on version 0.
    - The **timestamp** column displays when the table was created.
    - The **operation** column shows which operation was performed.
    - The **operationMetrics** column displays information about the number of files, number of output rows, and number of output bytes.
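The following Python toy model sketches the versioning idea behind DESCRIBE HISTORY. Each commit receives the next version number starting at 0, and history is listed newest first. It also shows the naming convention for commit files in a Delta table's `_delta_log` directory: the version zero-padded to 20 digits, followed by `.json`. The classes are hypothetical, and the operation name and metric values are illustrative only. A real commit also records the data files added and removed, plus other metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    version: int
    operation: str
    metrics: dict = field(default_factory=dict)


class ToyTransactionLog:
    """Toy model of a Delta-style log: each commit gets the next version number."""

    def __init__(self) -> None:
        self.commits: list[Commit] = []

    def commit(self, operation: str, **metrics) -> int:
        version = len(self.commits)  # versions start at 0 and increase by 1
        self.commits.append(Commit(version, operation, metrics))
        return version

    def history(self) -> list[tuple[int, str, dict]]:
        # Like DESCRIBE HISTORY, list the newest version first.
        return [(c.version, c.operation, c.metrics) for c in reversed(self.commits)]


def log_file_name(version: int) -> str:
    # Commit files are named after their version, zero-padded to 20 digits.
    return f"{version:020d}.json"


log = ToyTransactionLog()
log.commit("CREATE TABLE AS SELECT", numOutputRows=4)  # illustrative metric
print(log.history())     # [(0, 'CREATE TABLE AS SELECT', {'numOutputRows': 4})]
print(log_file_name(0))  # 00000000000000000000.json
```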

In [0]:
%sql
DESCRIBE HISTORY current_employees;

## C. Insert, Update and Delete Records in the Delta Table

1. View the contents of the **current_employees** table.

    Notice the following in the Delta table:
    - There are 4 columns and 4 rows
    - The **ID** values *1111*, *2222*, *3333*, *4444* are in the table
    - The **Role** of **ID** *1111* is *Manager*

In [0]:
%sql
SELECT * 
FROM current_employees;

2. Perform the following operations on the **current_employees** Delta table:

   a. Insert two new employees named *Alex* and *Sanjay*. 

   b. Update the **Role** of employee **ID** *1111* to *Senior Manager*.

   c. Delete the record of the employee with the **ID** *3333*.
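The following Python sketch models the three operations above on an in-memory list of rows, so you can predict the SQL results before running them. Only the IDs, the inserted employees, and the *Manager* role for *1111* come from this notebook. The other names, countries, and roles are hypothetical placeholders. Each helper returns a new list instead of mutating its input. This loosely mirrors how every Delta operation produces a new table version while earlier versions remain readable.

```python
# Hypothetical starting rows; only the IDs and ID 1111's role come from the notebook.
rows = [
    {"ID": 1111, "FirstName": "EmpA", "Country": "USA", "Role": "Manager"},
    {"ID": 2222, "FirstName": "EmpB", "Country": "India", "Role": "Developer"},
    {"ID": 3333, "FirstName": "EmpC", "Country": "Greece", "Role": "Developer"},
    {"ID": 4444, "FirstName": "EmpD", "Country": "USA", "Role": "Analyst"},
]


def insert(table, new_rows):
    # INSERT INTO: append the new rows.
    return table + list(new_rows)


def update(table, predicate, changes):
    # UPDATE ... SET ... WHERE: apply the changes to matching rows only.
    return [{**row, **changes} if predicate(row) else row for row in table]


def delete(table, predicate):
    # DELETE FROM ... WHERE: keep rows that do not match.
    return [row for row in table if not predicate(row)]


v1 = insert(rows, [
    {"ID": 5555, "FirstName": "Alex", "Country": "USA", "Role": "Instructor"},
    {"ID": 6666, "FirstName": "Sanjay", "Country": "India", "Role": "Instructor"},
])
v2 = update(v1, lambda r: r["ID"] == 1111, {"Role": "Senior Manager"})
v3 = delete(v2, lambda r: r["ID"] == 3333)

print(sorted(r["ID"] for r in v3))  # [1111, 2222, 4444, 5555, 6666]
```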

In [0]:
%sql
-- 1. Insert two employees into the table
INSERT INTO current_employees 
VALUES
    (5555, 'Alex','USA', 'Instructor'),
    (6666, 'Sanjay','India', 'Instructor');

-- 2. Update a record in the table
UPDATE current_employees
  SET Role = 'Senior Manager'
  WHERE ID = 1111;

-- 3. Delete a record in the table
DELETE FROM current_employees
  WHERE ID = 3333;

3. View the data in the **current_employees** table. Notice that the table has been modified from the original version.

In [0]:
%sql
SELECT *
FROM current_employees
ORDER BY ID;

4. Each operation that modifies a Delta Lake table creates a new table version. View the history of the table.

    Notice the following:
    - The table has versions 0 through 4. 
    - **Version 0** is the original table that was created.
    - **Version 1** contains the WRITE operation that inserted two new employees.
    - **Version 2** contains the UPDATE operation that modified the job role.
    - **Version 3** contains the DELETE operation that removed an employee.
    - **Version 4** contains an OPTIMIZE operation on the table. Predictive optimization is an optional Databricks feature that automatically runs maintenance operations such as OPTIMIZE on Unity Catalog managed tables. For more information, view the [Predictive optimization for Delta Lake](https://docs.databricks.com/en/optimizations/predictive-optimization.html) documentation.

**NOTE:** The `OPTIMIZE` operation(s) might be in a different order.

In [0]:
%sql
DESCRIBE HISTORY current_employees;

## D. Use Time Travel to Read Previous Versions of the Delta Table
You can use history information to audit operations, roll back a table, or query a table at a specific point in time using time travel.

1. View the current version of the **current_employees** table. By default, the most recent version of the table will be used.

In [0]:
%sql
SELECT *
FROM current_employees
ORDER BY ID;

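2. Use time travel to view the table as it was before the DELETE operation (version 2 in this example). Notice that the results show 6 employees, including the record that was later deleted. If an OPTIMIZE operation appears at a different position in your table history, adjust the version number accordingly.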

**NOTE**: Time travel takes advantage of the power of the Delta Lake transaction log to access data that is no longer in the table.
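The following Python toy model sketches why time travel works. Each commit in the log lists data files added to and removed from the table. Replaying commits 0 through N therefore yields the set of files that make up version N. The file names are hypothetical placeholders, and the model assumes each UPDATE or DELETE rewrites a whole file. Real Delta Lake also uses checkpoints and can use deletion vectors. Older files stay readable only until they are removed by `VACUUM`.

```python
# Hypothetical log; file names are placeholders, not real Delta data files.
LOG = [
    {"operation": "CREATE TABLE AS SELECT", "add": ["part-0.parquet"]},
    {"operation": "WRITE", "add": ["part-1.parquet"]},
    {"operation": "UPDATE", "remove": ["part-0.parquet"], "add": ["part-2.parquet"]},
    {"operation": "DELETE", "remove": ["part-2.parquet"], "add": ["part-3.parquet"]},
]


def files_as_of(log: list[dict], version: int) -> set[str]:
    # Replay commits 0..version: removed files leave the table, added files join it.
    if not 0 <= version < len(log):
        raise ValueError(f"version {version} does not exist")
    live: set[str] = set()
    for commit in log[: version + 1]:
        live -= set(commit.get("remove", []))
        live |= set(commit.get("add", []))
    return live


print(sorted(files_as_of(LOG, 0)))  # ['part-0.parquet']
print(sorted(files_as_of(LOG, 2)))  # ['part-1.parquet', 'part-2.parquet']
print(sorted(files_as_of(LOG, 3)))  # ['part-1.parquet', 'part-3.parquet']
```

Querying `VERSION AS OF 2` corresponds to reading only the files live at version 2. That is why the deleted record is still visible there.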

In [0]:
%sql
SELECT *
FROM current_employees VERSION AS OF 2
ORDER BY ID;

-- Alternate syntax
-- SELECT *
-- FROM current_employees@v2
-- ORDER BY ID;

3. View the original table using version 0. Notice that the original 4 employees are displayed.

In [0]:
%sql
SELECT *
FROM current_employees VERSION AS OF 0
ORDER BY ID;

## E. Drop the Tables
1. Drop the Delta tables created in this demonstration.

In [0]:
%sql
DROP TABLE IF EXISTS current_employees;
DROP TABLE IF EXISTS current_employees_py;

&copy; 2026 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>