Skip to content

Latest commit

 

History

History
76 lines (50 loc) · 3.03 KB

trino-create-delta-lake-tables-synapse.md

File metadata and controls

76 lines (50 loc) · 3.03 KB
title description ms.service ms.topic ms.date
Read Delta Lake tables (Synapse or External Location)
How to read external tables created in Synapse or other systems into a Trino cluster.
hdinsight-aks
how-to
10/19/2023

Read Delta Lake tables (Synapse or external location)

[!INCLUDE feature-in-preview]

This article provides an overview of how to read a Delta Lake table without having any access to the metastore (Synapse or other metastores without public access).

You can perform the following operations on the tables using Trino with HDInsight on AKS.

  • DELETE
  • UPDATE
  • INSERT
  • MERGE

Prerequisites

Create Delta Lake schemas and tables

This section shows how to create a Delta table over a pre-existing location given you already have a Delta Lake catalog configured.

  1. Browse the storage account using the Storage browser in the Azure portal to where the base directory of your table is. If this table originates in Synapse, it's likely under a synapse/workspaces/.../warehouse/ path and will be named after your table and contains a _delta_log directory. Select Copy URL from the three dots located next to the folder.

    You need to convert this http path into an ABFS (Azure Blob File System) path:

    The storage http path is structured like this: https://{{AZURE_STORAGE_ACCOUNT}}.blob.core.windows.net/{{AZURE_STORAGE_CONTAINER}}/synapse/workspaces/my_workspace/warehouse/{{TABLE_NAME}}/

    ABFS paths need to look like this: abfss://{{AZURE_STORAGE_CONTAINER}}@{{AZURE_STORAGE_ACCOUNT}}.dfs.core.windows.net/synapse/workspaces/my_workspace/warehouse/{{TABLE_NAME}}/

    Example: abfss://container@storageaccount.dfs.core.windows.net/synapse/workspaces/workspace_name/warehouse/table_name/

  2. Create a Delta Lake schema in Trino.

    CREATE SCHEMA delta.default;

    Alternatively, you can also create a schema in a specific storage account:

    CREATE SCHEMA delta.default WITH (location = 'abfss://container@storageaccount.dfs.core.windows.net/trino/');
  3. Use the register_table procedure to create the table.

    CALL delta.system.register_table(schema_name => 'default', table_name => 'table_name', table_location => 'abfss://container@storageaccount.dfs.core.windows.net/synapse/workspaces/workspace_name/warehouse/table_name/');
  4. Query the table to verify.

    SELECT * FROM delta.default.table_name

Write Delta Lake tables in Synapse Spark

Use format("delta") to save a dataframe as a Delta table, then you can use the path where you saved the dataframe as delta format to register the table in Trino.

my_dataframe.write.format("delta").save("abfss://container@storageaccount.dfs.core.windows.net/synapse/workspaces/workspace_name/warehouse/table_name")

Next steps

How to configure caching in Trino