title | description | ms.service | ms.topic | ms.date |
---|---|---|---|---|
Read Delta Lake tables (Synapse or External Location) |
How to read external tables created in Synapse or other systems into a Trino cluster. |
hdinsight-aks |
how-to |
10/19/2023 |
[!INCLUDE feature-in-preview]
This article provides an overview of how to read a Delta Lake table without having any access to the metastore (Synapse or other metastores without public access).
You can perform the following operations on the tables using Trino with HDInsight on AKS.
- DELETE
- UPDATE
- INSERT
- MERGE
This section shows how to create a Delta table over a pre-existing location given you already have a Delta Lake catalog configured.
-
Browse the storage account using the
Storage browser
in the Azure portal to where the base directory of your table is. If this table originates in Synapse, it's likely under asynapse/workspaces/.../warehouse/
path and will be named after your table and contains a_delta_log
directory. SelectCopy URL
from the three dots located next to the folder.You need to convert this http path into an ABFS (Azure Blob File System) path:
The storage http path is structured like this:
https://{{AZURE_STORAGE_ACCOUNT}}.blob.core.windows.net/{{AZURE_STORAGE_CONTAINER}}/synapse/workspaces/my_workspace/warehouse/{{TABLE_NAME}}/
ABFS paths need to look like this:
abfss://{{AZURE_STORAGE_CONTAINER}}@{{AZURE_STORAGE_ACCOUNT}}.dfs.core.windows.net/synapse/workspaces/my_workspace/warehouse/{{TABLE_NAME}}/
Example:
abfss://container@storageaccount.dfs.core.windows.net/synapse/workspaces/workspace_name/warehouse/table_name/
-
Create a Delta Lake schema in Trino.
CREATE SCHEMA delta.default;
Alternatively, you can also create a schema in a specific storage account:
CREATE SCHEMA delta.default WITH (location = 'abfss://container@storageaccount.dfs.core.windows.net/trino/');
-
Use the
register_table
procedure to create the table.CALL delta.system.register_table(schema_name => 'default', table_name => 'table_name', table_location => 'abfss://container@storageaccount.dfs.core.windows.net/synapse/workspaces/workspace_name/warehouse/table_name/');
-
Query the table to verify.
SELECT * FROM delta.default.table_name
Use format("delta")
to save a dataframe as a Delta table, then you can use the path where you saved the dataframe as delta format to register the table in Trino.
my_dataframe.write.format("delta").save("abfss://container@storageaccount.dfs.core.windows.net/synapse/workspaces/workspace_name/warehouse/table_name")