# Query Delta Lake folders

Serverless Synapse SQL pool enables you to read Delta Lake folders from Azure Storage (Azure Data Lake Storage or Azure Blob Storage).

![Delta Lake folder](img/covid-delta-lake-studio.png)

## Read Delta Lake folder

The easiest way to see the content of your Delta Lake folder is to pass the Delta Lake folder URL to the `OPENROWSET` function and specify `format = 'delta'`. If the folder is publicly available, or if your Azure AD identity can access it, you should be able to see its content with a query like the one in the following example:

```sql
-- Read the Delta Lake folder directly by its full URL
select top 10 *
from openrowset(
    bulk 'https://sqlondemandstorage.blob.core.windows.net/delta-lake/covid/',
    format = 'delta') as rows;
```

| date_rep | day | month | year | cases | deaths | countries_and_territories | geo_id | country_territory_code | pop_data_2018 | continent_exp | load_date | iso_country |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2020-12-14 | 14 | 12 | 2020 | 746 | 6 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-13 | 13 | 12 | 2020 | 298 | 9 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-12 | 12 | 12 | 2020 | 113 | 11 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-11 | 11 | 12 | 2020 | 63 | 10 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-10 | 10 | 12 | 2020 | 202 | 16 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-09 | 9 | 12 | 2020 | 135 | 13 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-08 | 8 | 12 | 2020 | 200 | 6 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-07 | 7 | 12 | 2020 | 210 | 26 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-06 | 6 | 12 | 2020 | 234 | 10 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-05 | 5 | 12 | 2020 | 235 | 18 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |


## Data source usage

The previous example uses the full path to the Delta Lake folder. As an alternative, you can create an external data source whose location points to the root folder of the storage. You can then pass that data source and the path relative to it to the `OPENROWSET` function.

First, create an `EXTERNAL DATA SOURCE` in a database:

```sql
-- The location points to the root folder that contains the Delta Lake folders
create external data source DeltaLakeStorage
with ( location = 'https://sqlondemandstorage.blob.core.windows.net/delta-lake/' );
```

Make sure that you create the `EXTERNAL DATA SOURCE` in a database other than `master`. If the storage is protected, you might need to create a credential and associate it with the data source.
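A minimal sketch of both steps follows. It assumes SAS-token access to the storage. The names `DeltaLakeDemo`, `DeltaLakeSasCredential`, and `SecuredDeltaLakeStorage` are hypothetical, and the `<...>` placeholders must be replaced with your own values. A database master key must exist before you can create a database scoped credential.

```sql
-- Hypothetical names: DeltaLakeDemo, DeltaLakeSasCredential, SecuredDeltaLakeStorage.
-- Replace every <...> placeholder with your own value.
create database DeltaLakeDemo;
GO  -- batch separator understood by client tools such as SSMS and Azure Data Studio
use DeltaLakeDemo;  -- any user database other than master
GO

-- A master key is required before a database scoped credential can be created
create master key encryption by password = '<strong password>';

-- SAS-based credential; the secret is the SAS token without the leading '?'
create database scoped credential DeltaLakeSasCredential
with identity = 'SHARED ACCESS SIGNATURE',
     secret = '<SAS token>';

-- Data source that uses the credential when it accesses the storage
create external data source SecuredDeltaLakeStorage
with ( location = 'https://<storage-account>.blob.core.windows.net/<container>/',
       credential = DeltaLakeSasCredential );
```

Queries can then reference `data_source = 'SecuredDeltaLakeStorage'` in the same way as the unprotected data source.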

Once the data source is configured, you can use it in the `OPENROWSET` function:

```sql
-- 'covid' is resolved relative to the location of the DeltaLakeStorage data source
select top 10 *
from openrowset(
        bulk 'covid',
        data_source = 'DeltaLakeStorage',
        format = 'delta'
    ) as rows;
```

| date_rep | day | month | year | cases | deaths | countries_and_territories | geo_id | country_territory_code | pop_data_2018 | continent_exp | load_date | iso_country |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2020-12-14 | 14 | 12 | 2020 | 746 | 6 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-13 | 13 | 12 | 2020 | 298 | 9 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-12 | 12 | 12 | 2020 | 113 | 11 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-11 | 11 | 12 | 2020 | 63 | 10 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-10 | 10 | 12 | 2020 | 202 | 16 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-09 | 9 | 12 | 2020 | 135 | 13 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-08 | 8 | 12 | 2020 | 200 | 6 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-07 | 7 | 12 | 2020 | 210 | 26 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-06 | 6 | 12 | 2020 | 234 | 10 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |
| 2020-12-05 | 5 | 12 | 2020 | 235 | 18 | Afghanistan | AF | AFG |  | Asia | 2021-05-11 00:07:13 | AF |


## Explicitly specify schema

The `OPENROWSET` function enables you to explicitly specify which columns to read from the files, and what their types are, by using the `WITH` clause:

```sql
-- Only the columns listed in the WITH clause are returned, with the declared types
select top 10 *
from openrowset(
        bulk 'covid',
        data_source = 'DeltaLakeStorage',
        format = 'delta'
    )
    with ( date_rep date,
           cases int,
           geo_id varchar(6)
           ) as rows;
```

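| date_rep | cases | geo_id |
| --- | --- | --- |
| 2020-12-14 | 746 | AF |
| 2020-12-13 | 298 | AF |
| 2020-12-12 | 113 | AF |
| 2020-12-11 | 63 | AF |
| 2020-12-10 | 202 | AF |
| 2020-12-09 | 135 | AF |
| 2020-12-08 | 200 | AF |
| 2020-12-07 | 210 | AF |
| 2020-12-06 | 234 | AF |
| 2020-12-05 | 235 | AF |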

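Parquet string columns are read as `varchar` with a UTF-8 collation. When you declare string columns in the `WITH` clause yourself, you can also make the collation explicit. The sketch below assumes `Latin1_General_100_BIN2_UTF8`, which is one UTF-8 collation. The `varchar(256)` length for `countries_and_territories` is an illustrative assumption, so adjust both to your data.

```sql
-- Assumed collation: Latin1_General_100_BIN2_UTF8 (a binary UTF-8 collation).
-- Assumed length: varchar(256) for countries_and_territories is illustrative only.
select top 10 *
from openrowset(
        bulk 'covid',
        data_source = 'DeltaLakeStorage',
        format = 'delta'
    )
    with ( date_rep date,
           cases int,
           countries_and_territories varchar(256) collate Latin1_General_100_BIN2_UTF8,
           geo_id varchar(6) collate Latin1_General_100_BIN2_UTF8
           ) as rows;
```

A binary (`BIN2`) collation compares strings by code point, so comparisons such as `geo_id = 'AF'` are case-sensitive.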

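Delta Lake stores its data in Parquet files. If you don't specify the `WITH` clause, column types are inferred from the files and mapped to SQL types by default. The following table describes how Parquet types are mapped to SQL native types.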

| Parquet type | Parquet logical type (annotation) | SQL data type |
| --- | --- | --- |
| BOOLEAN |  | bit |
| BINARY / BYTE\_ARRAY |  | varbinary |
| DOUBLE |  | float |
| FLOAT |  | real |
| INT32 |  | int |
| INT64 |  | bigint |
| INT96 |  | datetime2 |
| FIXED\_LEN\_BYTE\_ARRAY |  | binary |
| BINARY | UTF8 | varchar (UTF-8 collation) |
| BINARY | STRING | varchar (UTF-8 collation) |
| BINARY | ENUM | varchar (UTF-8 collation) |
| BINARY | UUID | uniqueidentifier |
| BINARY | DECIMAL | decimal |
| BINARY | JSON | varchar(max) (UTF-8 collation) |
| BINARY | BSON | varbinary(max) |
| FIXED\_LEN\_BYTE\_ARRAY | DECIMAL | decimal |
| BYTE\_ARRAY | INTERVAL | varchar(max), serialized into standardized format |
| INT32 | INT(8, true) | smallint |
| INT32 | INT(16, true) | smallint |
| INT32 | INT(32, true) | int |
| INT32 | INT(8, false) | tinyint |
| INT32 | INT(16, false) | int |
| INT32 | INT(32, false) | bigint |
| INT32 | DATE | date |
| INT32 | DECIMAL | decimal |
| INT32 | TIME (MILLIS) | time |
| INT64 | INT(64, true) | bigint |
| INT64 | INT(64, false) | decimal(20,0) |
| INT64 | DECIMAL | decimal |
| INT64 | TIME (MICROS / NANOS) | time |
| INT64 | TIMESTAMP (MILLIS / MICROS / NANOS) | datetime2 |
| [Complex type](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists) | LIST | varchar(max), serialized into JSON |
| [Complex type](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps) | MAP | varchar(max), serialized into JSON |
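To check which SQL types are inferred for a particular Delta Lake folder, you can describe the result set of a query instead of running it. The sketch below assumes the `DeltaLakeStorage` data source created earlier. In the output, the `name` and `system_type_name` columns show each column and its inferred SQL type.

```sql
-- Describe (without returning rows) the columns inferred for the Delta Lake folder.
-- Single quotes inside the query text are doubled because it is passed as a string.
exec sp_describe_first_result_set N'
    select top 10 *
    from openrowset(
            bulk ''covid'',
            data_source = ''DeltaLakeStorage'',
            format = ''delta''
        ) as rows';
```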