title | description | ms.reviewer | ms.author | author | ms.topic | ms.custom | ms.date | ||
---|---|---|---|---|---|---|---|---|---|
Ingest data into OneLake and analyze with Azure Databricks |
Learn how to use pipelines to ingest data into OneLake and analyze that data with Azure Databricks. |
eloldag |
eloldag |
eloldag |
how-to |
|
09/27/2023 |
In this guide, you will:
-
Create a pipeline in a workspace and ingest data into your OneLake in Delta format.
-
Read and modify a Delta table in OneLake with Azure Databricks.
Before you start, you must have:
-
A workspace with a Lakehouse item.
-
A premium Azure Databricks workspace. Only premium Azure Databricks workspaces support Microsoft Entra credential passthrough. When creating your cluster, enable Azure Data Lake Storage credential passthrough in the Advanced Options.
-
A sample dataset.
-
Navigate to your lakehouse in the Power BI service and select Get data and then select New data pipeline.
:::image type="content" source="media\onelake-open-access-quickstart\onelake-new-pipeline.png" alt-text="Screenshot showing how to navigate to new data pipeline option from within the UI.":::
-
In the New Pipeline prompt, enter a name for the new pipeline and then select Create.
-
For this exercise, select the NYC Taxi - Green sample data as the data source and then select Next.
:::image type="content" source="media\onelake-open-access-quickstart\onelake-nyc-taxi.png" alt-text="Screenshot showing how to select NYC sample semantic model.":::
-
On the preview screen, select Next.
-
For data destination, select the name of the lakehouse you want to use to store the OneLake Delta table data. You can choose an existing lakehouse or create a new one.
:::image type="content" source="media\onelake-open-access-quickstart\onelake-dest-lake.png" alt-text="Screenshot showing how to select destination lakehouse.":::
-
Select where you want to store the output. Choose Tables as the Root folder and enter "nycsample" as the table name.
-
On the Review + Save screen, select Start data transfer immediately and then select Save + Run.
:::image type="content" source="media\onelake-open-access-quickstart\onelake-final-pipeline-review.png" alt-text="Screenshot showing how to enter table name.":::
-
When the job is complete, navigate to your lakehouse and view the delta table listed under /Tables.
-
Copy the Azure Blob Filesystem (ABFS) path to your delta table to by right-clicking the table name in the Explorer view and selecting Properties.
-
Open your Azure Databricks notebook. Read the Delta table on OneLake.
olsPath = "abfss://<replace with workspace name>@onelake.dfs.fabric.microsoft.com/<replace with item name>.Lakehouse/Tables/nycsample" df=spark.read.format('delta').option("inferSchema","true").load(olsPath) df.show(5)
-
Update the Delta table data by changing a field value.
%sql update delta.`abfss://<replace with workspace name>@onelake.dfs.fabric.microsoft.com/<replace with item name>.Lakehouse/Tables/nycsample` set vendorID = 99999 where vendorID = 1;