---
title: Ingest data into OneLake and analyze with Azure Databricks
description: Learn how to use pipelines to ingest data into OneLake and analyze that data with Azure Databricks.
ms.reviewer: eloldag
ms.author: eloldag
author: eloldag
ms.topic: how-to
ms.custom:
  - build-2023
  - ignite-2023
ms.date: 09/27/2023
---

# Ingest data into OneLake and analyze with Azure Databricks

In this guide, you will:

- Create a pipeline in a workspace and ingest data into your OneLake in Delta format.
- Read and modify a Delta table in OneLake with Azure Databricks.

## Prerequisites

Before you start, you must have:

- A workspace with a Lakehouse item.
- A premium Azure Databricks workspace. Only premium Azure Databricks workspaces support Microsoft Entra credential passthrough. When creating your cluster, enable Azure Data Lake Storage credential passthrough in the **Advanced Options** (a quick verification sketch follows this list).
- A sample dataset.
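Before running the rest of this guide, you can confirm from a notebook cell that your cluster was created with passthrough enabled. This is a minimal sketch, assuming the cluster setting surfaces through the `spark.databricks.passthrough.enabled` Spark conf on your runtime; treat the key name as an assumption if your Databricks runtime differs.

```python
# Minimal check: was this cluster created with Azure Data Lake Storage
# credential passthrough enabled? (Assumes the setting is exposed via
# this Spark conf key; the key may vary by Databricks runtime.)
passthrough = spark.conf.get("spark.databricks.passthrough.enabled", "false")
print(f"Credential passthrough enabled: {passthrough}")
```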

## Ingest data and modify the Delta table

  1. Navigate to your lakehouse in the Power BI service, select **Get data**, and then select **New data pipeline**.

    :::image type="content" source="media\onelake-open-access-quickstart\onelake-new-pipeline.png" alt-text="Screenshot showing how to navigate to new data pipeline option from within the UI.":::

  2. In the **New Pipeline** prompt, enter a name for the new pipeline, and then select **Create**.

  3. For this exercise, select the **NYC Taxi - Green** sample data as the data source, and then select **Next**.

    :::image type="content" source="media\onelake-open-access-quickstart\onelake-nyc-taxi.png" alt-text="Screenshot showing how to select NYC sample semantic model.":::

  4. On the preview screen, select **Next**.

  5. For data destination, select the name of the lakehouse you want to use to store the OneLake Delta table data. You can choose an existing lakehouse or create a new one.

    :::image type="content" source="media\onelake-open-access-quickstart\onelake-dest-lake.png" alt-text="Screenshot showing how to select destination lakehouse.":::

  6. Select where you want to store the output. Choose **Tables** as the Root folder and enter "nycsample" as the table name.

  7. On the **Review + Save** screen, select **Start data transfer immediately**, and then select **Save + Run**.

    :::image type="content" source="media\onelake-open-access-quickstart\onelake-final-pipeline-review.png" alt-text="Screenshot showing how to enter table name.":::

  8. When the job is complete, navigate to your lakehouse and view the Delta table listed under `/Tables`.

  9. Copy the Azure Blob Filesystem (ABFS) path to your Delta table by right-clicking the table name in the Explorer view and selecting **Properties**.

  10. Open your Azure Databricks notebook and read the Delta table on OneLake.

    ```python
    # ABFS path to the Delta table in OneLake, copied in the previous step
    olsPath = "abfss://<replace with workspace name>@onelake.dfs.fabric.microsoft.com/<replace with item name>.Lakehouse/Tables/nycsample"
    df = spark.read.format("delta").option("inferSchema", "true").load(olsPath)
    df.show(5)
    ```
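    Optionally, sanity-check the load before modifying anything. This is a small sketch that assumes the NYC Taxi - Green sample exposes a `vendorID` column (the same column updated in the next step):

    ```python
    # Count rows per vendor to confirm the table loaded as expected.
    # Assumption: the sample data includes a vendorID column.
    df.groupBy("vendorID").count().show()
    ```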
  11. Update the Delta table data by changing a field value. The SQL cell below runs the update in place; a Python alternative follows these steps.

    ```sql
    %sql
    UPDATE delta.`abfss://<replace with workspace name>@onelake.dfs.fabric.microsoft.com/<replace with item name>.Lakehouse/Tables/nycsample` SET vendorID = 99999 WHERE vendorID = 1;
    ```
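If you'd rather stay in Python, the same update can be written with the Delta Lake `DeltaTable` API. This is a minimal sketch, assuming your Databricks runtime bundles Delta Lake and reusing the `olsPath` variable defined earlier:

```python
from delta.tables import DeltaTable

# Bind to the OneLake Delta table by its ABFS path.
deltaTable = DeltaTable.forPath(spark, olsPath)

# Equivalent of the SQL statement above:
# set vendorID to 99999 wherever it is currently 1.
deltaTable.update(
    condition="vendorID = 1",
    set={"vendorID": "99999"},
)
```

Either way, rereading the table with `spark.read.format("delta").load(olsPath)` shows the changed values, and the same update is visible from Fabric because both engines operate on the same Delta files in OneLake.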

## Related content