
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>


# 2.3 Lab - Import Data into Databricks
### Duration: ~15 minutes

In this lab, you will ingest a series of JSON files stored in a Databricks volume and create a Delta table from them.

### Objectives
- Demonstrate how to create a table by ingesting multiple raw JSON files stored in a Databricks volume using either `COPY INTO` or the `read_files` function.

## REQUIRED - SELECT A SHARED SQL WAREHOUSE

Before executing cells in this notebook, please select the **SHARED SQL WAREHOUSE** in the lab. Follow these steps:

1. Navigate to the top-right of this notebook and click the drop-down to select compute (it might say **Connect**). Then complete one of the following:

   a. Under **Recent resources**, check to see if you have a **shared_warehouse SQL**. If you do, select it.

   b. If you do not have a **shared_warehouse** under **Recent resources**, complete the following:

    - In the same drop-down, select **More**.

    - Then select the **SQL Warehouse** button.

    - In the drop-down, make sure **shared_warehouse** is selected.

    - Then, at the bottom of the pop-up, select **Start and attach**.

<br></br>
   <img src="../Includes/images/sql_warehouse.png" alt="SQL Warehouse" width="600">

## A. Classroom Setup

Run the following cell to configure your working environment for this notebook.

**NOTE:** The `DA` object is only used in Databricks Academy courses and is not available outside of these courses. It will dynamically reference the information needed to run the course in the lab environment.

### IMPORTANT LAB INFORMATION

Recall that your lab setup is created with the [0 - REQUIRED - Course Setup and Data Discovery]($../0 - REQUIRED - Course Setup and Data Discovery) notebook. If you end your lab session or if your session times out, your environment will be reset, and you will need to rerun the Course Setup notebook.

In [0]:
%run ../Includes/2.3-Classroom-Setup

## B. Lab Scenario

You are an analyst at DB Inc., and you've been tasked with ingesting JSON files into a table. The company has placed several JSON files in the volume directory **/Volumes/dbacademy_retail/v01/retail-pipeline/customers/stream_json/**, and your job is to read these JSON files into a structured table for later analysis.

Follow the steps below.

1. Start by viewing your default catalog and schema. Confirm that your current catalog is **samples** and your current schema is **nyctaxi**.

In [0]:
<FILL-IN>

In [0]:
%skip
SELECT current_catalog(), current_schema()

2. Modify the default catalog and schema to the following:

    - Set **dbacademy** as the default catalog.

    - Set your **labuser** schema as the default schema. When setting the default schema, use the `IDENTIFIER` clause with the `DA.schema_name` variable to set your schema.

    Run the cell and confirm that your default catalog and schema have been modified.


In [0]:
<FILL-IN>


---- Run below to confirm the default catalog and schema were modified
SELECT current_catalog(), current_schema()

In [0]:
%skip
USE CATALOG dbacademy;
USE SCHEMA IDENTIFIER(DA.schema_name);

---- Run below to confirm the default catalog and schema were modified
SELECT current_catalog(), current_schema()

3. Display all of the JSON files in the **/Volumes/dbacademy_retail/v01/retail-pipeline/customers/stream_json/** volume programmatically. Confirm that your volume contains 31 JSON files.

**HINTS:**
  - First, navigate to the volume using the navigation bar on the right to manually view the files.
  - Then, use the [LIST Statement](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-aux-list) to programmatically list all of the JSON files in the volume. You can insert the path of the volume using the UI.

In [0]:
<FILL-IN>

In [0]:
%skip
LIST '/Volumes/dbacademy_retail/v01/retail-pipeline/customers/stream_json/'

4. Query the JSON files to view the raw JSON data. This is a good way to see exactly how the JSON files are structured. Notice that each JSON file contains a list of dictionaries, where each dictionary represents a row of data.

**HINTS:**
  - [Query the data by path](https://docs.databricks.com/aws/en/query#query-data-by-path) to view the raw JSON files using the `text` keyword.
  - The path to the volume is `/Volumes/dbacademy_retail/v01/retail-pipeline/customers/stream_json/`. Make sure to surround the volume path in backticks.
  - Add a `LIMIT` clause to limit the results to 10 rows.
  - You can also download one of the JSON files from the volume and open it in your favorite editor as another approach to view the raw files.


**Small Example of the JSON File:**
```JSON
[
    {
        "name": "Brent Chavez",
        "email": "nelsonjoy@example.com",
        "address": "568 David Brook Apt. 524",
        "city": "Norwalk",
        "state": "CA",
        "zip_code": "45049",
        "operation": "NEW",
        "timestamp": 1632417981,
        "customer_id": 23056
    },
    {
        "name": "James Cruz",
        "email": "perkinsdeborah@example.net",
        "address": "741 Wendy Plains Apt. 143",
        "city": "San Francisco",
        "state": "CA",
        "zip_code": "42872",
        "operation": "NEW",
        "timestamp": 1632421305,
        "customer_id": 23057
    },
    ...
]
```

In [0]:
<FILL-IN>

In [0]:
%skip
SELECT *
FROM text.`/Volumes/dbacademy_retail/v01/retail-pipeline/customers/stream_json/`
LIMIT 10;
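
To relate the raw text back to its source files, the path query above can be grouped by file. The following is a minimal sketch, not part of the official solution. It assumes the `_metadata` file metadata column is available for path-based queries in your workspace; the aliases `file_name` and `line_count` are illustrative names only.

```sql
-- Sketch: count how many text lines were read from each file in the volume.
-- Assumes the _metadata column is exposed for path queries (an assumption, not stated in the lab).
SELECT
  _metadata.file_name AS file_name,   -- name of the source file for each line
  count(*)            AS line_count   -- number of text lines read from that file
FROM text.`/Volumes/dbacademy_retail/v01/retail-pipeline/customers/stream_json/`
GROUP BY _metadata.file_name
ORDER BY file_name;
```

If the assumption holds, the result has one row per file, so the number of rows returned should match the file count you confirmed in step 3.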

5. For a tabular display of the JSON files, query the files using `SELECT * FROM json.` followed by the volume path in backticks. This shows what the JSON files look like as a table when they are read with the default arguments. How the data is split into rows and columns depends on how each JSON file is structured.

    **HINTS:**
      - [Query the data by path](https://docs.databricks.com/aws/en/query#query-data-by-path) to view the JSON files as a table using the `json` keyword.
      - Add a `LIMIT` clause to limit the results to 10 rows.

In [0]:
<FILL-IN>

In [0]:
%skip
SELECT *
FROM json.`/Volumes/dbacademy_retail/v01/retail-pipeline/customers/stream_json/`
LIMIT 10;

6. Now that you have explored the JSON files, your goal is to:
    - Create a table named **customers_lab** in the **dbacademy.labuser** schema.

    - Use the JSON files in **/Volumes/dbacademy_retail/v01/retail-pipeline/customers/stream_json/**.

    - You can use either the `COPY INTO` statement or the `read_files` function.

<br></br>
**HINTS:**
- [COPY INTO](https://docs.databricks.com/aws/en/sql/language-manual/delta-copy-into)
  - With `COPY INTO`, you must create the target table first.
    - You can define the table's columns and data types when you create it.
    - If you create the table without a schema, use the `COPY_OPTIONS ('mergeSchema' = 'true')` option so the schema is merged in when the data is loaded.

- [read_files table-valued function](https://docs.databricks.com/aws/en/sql/language-manual/functions/read_files)


In [0]:
<FILL-IN>

In [0]:
%skip


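-- Two alternative solutions follow; either one creates customers_lab.
-- If you run the whole cell, the COPY INTO solution runs last and produces the final table.

-- Solution 1: read_files
-- CREATE OR REPLACE already overwrites an existing table; the DROP just makes the reset explicit.
DROP TABLE IF EXISTS customers_lab;

CREATE OR REPLACE TABLE customers_lab AS
SELECT *
FROM read_files(
  '/Volumes/dbacademy_retail/v01/retail-pipeline/customers/stream_json/',
  format => 'json'   -- read every file in the directory as JSON
);


-- Solution 2: COPY INTO
DROP TABLE IF EXISTS customers_lab;

-- Create an empty table with no columns; the schema comes from the files.
CREATE TABLE customers_lab;

COPY INTO customers_lab
FROM '/Volumes/dbacademy_retail/v01/retail-pipeline/customers/stream_json/'
FILEFORMAT = JSON
COPY_OPTIONS ('mergeSchema' = 'true');  -- add the columns found in the files to the empty table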
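
The `COPY INTO` hint about creating the table's schema first can be sketched as follows. This is an illustrative variant, not the lab's official solution. The table name `customers_lab_typed` is hypothetical, chosen so the example does not overwrite **customers_lab**, and the column data types are assumptions based on the small JSON example shown earlier.

```sql
-- Sketch: COPY INTO a table whose columns are declared up front.
-- customers_lab_typed is a hypothetical name; the data types are assumptions.
DROP TABLE IF EXISTS customers_lab_typed;

CREATE TABLE customers_lab_typed (
  name        STRING,
  email       STRING,
  address     STRING,
  city        STRING,
  state       STRING,
  zip_code    STRING,   -- kept as text so leading zeros are not lost
  operation   STRING,
  `timestamp` BIGINT,   -- epoch seconds in the sample data
  customer_id BIGINT
);

-- Select and cast the columns explicitly so they line up with the declared schema.
COPY INTO customers_lab_typed
FROM (
  SELECT
    name, email, address, city, state, zip_code, operation,
    CAST(`timestamp` AS BIGINT)  AS `timestamp`,
    CAST(customer_id AS BIGINT)  AS customer_id
  FROM '/Volumes/dbacademy_retail/v01/retail-pipeline/customers/stream_json/'
)
FILEFORMAT = JSON;
```

Declaring the columns trades the convenience of `mergeSchema` for a table whose types are fixed before any data arrives. Drop the hypothetical table afterwards if you do not want it left in your schema.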

7. Run the following query to view your table. Confirm the following:
    - The **customers_lab** table contains 1,467 rows

    - The table contains the columns: **address, city, customer_id, email, name, operation, state, timestamp, zip_code**

In [0]:
SELECT *
FROM customers_lab;
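
The two checks in step 7 can also be run as queries instead of being read from the result grid. A minimal sketch, assuming **customers_lab** was created in your current schema by one of the solutions above; `row_count` is just an illustrative alias:

```sql
-- Sketch: confirm the row count stated in step 7.
SELECT count(*) AS row_count
FROM customers_lab;

-- Sketch: list the table's columns and data types to compare against step 7.
DESCRIBE TABLE customers_lab;
```

Compare `row_count` with the 1,467 rows stated in step 7, and compare the column names returned by `DESCRIBE TABLE` with the list in that step.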

&copy; 2025 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>