# Query PARQUET files

Serverless Synapse SQL pool enables you to read PARQUET files from Azure storage (Azure Data Lake or Blob Storage).

## Read parquet file

The easiest way to see the content of your `PARQUET` file is to provide the file URL to the `OPENROWSET` function and specify the parquet `FORMAT`. If the file is publicly available, or if your Azure AD identity can access it, you should be able to see its content using a query like the following:

In [None]:
select top 10 *
from openrowset(
    bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',
    format = 'parquet') as rows

## Data source usage

The previous example uses the full path to the file. As an alternative, you can create an external data source whose location points to the root folder of the storage, and then use that data source together with the relative path to the file in the `OPENROWSET` function.

First, create an `EXTERNAL DATA SOURCE` in a database:

In [None]:
create external data source covid
with ( location = 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases' );

Make sure that you create the `EXTERNAL DATA SOURCE` in a database other than `master`. If the data source is protected with a credential, you might need to create a credential and associate it with the data source.
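
The credential case described above could look roughly like the following sketch. It assumes the storage is protected with a shared access signature (SAS); the names `sas_credential` and `protected_source`, and every value in angle brackets, are hypothetical placeholders that you would replace with your own. It also assumes the database has no master key yet, since one is expected to exist before a database scoped credential is created.

```sql
-- Run in a user database, not in master.
-- Assumption: no master key exists yet in this database.
create master key encryption by password = '<strong password>';

-- Hypothetical credential holding a SAS token for the protected storage.
create database scoped credential sas_credential
with identity = 'SHARED ACCESS SIGNATURE',
     secret = '<sas token>';

-- Hypothetical data source that uses the credential above.
create external data source protected_source
with (
    location = 'https://<storage account>.blob.core.windows.net/<container>/<folder>',
    credential = sas_credential
);
```

The public `covid` data source created earlier does not need a credential, so this step applies only to storage that is not publicly readable.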

Once you have properly configured the data source, you can use it in the `OPENROWSET` function:

In [None]:
select top 10 *
from openrowset(
        bulk 'latest/ecdc_cases.parquet',
        data_source = 'covid',
        format = 'parquet'
    ) as rows

## Explicitly specify schema

`OPENROWSET` enables you to explicitly specify the types of the columns that you want to read from the file by using the `WITH` clause:

In [None]:
select top 10 *
from openrowset(
        bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',
        format = 'parquet'
    ) with ( date_rep date, cases int, geo_id varchar(6) ) as rows
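
Once the columns are typed, they can be used like any other SQL columns. As an illustration only, the sketch below reuses the same file and the same three columns from the `WITH` clause to total the reported cases per `geo_id`; the alias `total_cases` is an assumption introduced for this example.

```sql
-- Aggregate over the explicitly typed columns.
select top 10
    geo_id,
    sum(cases) as total_cases,          -- hypothetical alias
    max(date_rep) as last_report_date   -- hypothetical alias
from openrowset(
        bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',
        format = 'parquet'
    ) with ( date_rep date, cases int, geo_id varchar(6) ) as rows
group by geo_id
order by total_cases desc
```

Listing only the columns you need in the `WITH` clause keeps the query focused on them, which is usually a sensible choice for wide Parquet files.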

By default, PARQUET data types are mapped to SQL types. The following table describes how Parquet types map to native SQL types.

| Parquet type | Parquet logical type (annotation) | SQL data type |
| --- | --- | --- |
| BOOLEAN |  | bit |
| BINARY / BYTE\_ARRAY |  | varbinary |
| DOUBLE |  | float |
| FLOAT |  | real |
| INT32 |  | int |
| INT64 |  | bigint |
| INT96 |  | datetime2 |
| FIXED\_LEN\_BYTE\_ARRAY |  | binary |
| BINARY | UTF8 | varchar \*(UTF8 collation) |
| BINARY | STRING | varchar \*(UTF8 collation) |
| BINARY | ENUM | varchar \*(UTF8 collation) |
| BINARY | UUID | uniqueidentifier |
| BINARY | DECIMAL | decimal |
| BINARY | JSON | varchar(max) \*(UTF8 collation) |
| BINARY | BSON | varbinary(max) |
| FIXED\_LEN\_BYTE\_ARRAY | DECIMAL | decimal |
| BYTE\_ARRAY | INTERVAL | varchar(max), serialized into standardized format |
| INT32 | INT(8, true) | smallint |
| INT32 | INT(16, true) | smallint |
| INT32 | INT(32, true) | int |
| INT32 | INT(8, false) | tinyint |
| INT32 | INT(16, false) | int |
| INT32 | INT(32, false) | bigint |
| INT32 | DATE | date |
| INT32 | DECIMAL | decimal |
| INT32 | TIME (MILLIS) | time |
| INT64 | INT(64, true) | bigint |
| INT64 | INT(64, false) | decimal(20,0) |
| INT64 | DECIMAL | decimal |
| INT64 | TIME (MICROS / NANOS) | time |
| INT64 | TIMESTAMP (MILLIS / MICROS / NANOS) | datetime2 |
| [Complex type](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists) | LIST | varchar(max), serialized into JSON |
| [Complex type](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps) | MAP | varchar(max), serialized into JSON |
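
To check which SQL types a particular file actually receives under the default mapping, one option is to describe the result set of a query over it. The sketch below assumes that the `sp_describe_first_result_set` system procedure is available in your pool and applies it to the public file used earlier; note that single quotes inside the query text are doubled.

```sql
-- Describe the columns and inferred SQL types without returning any rows.
exec sp_describe_first_result_set N'
    select top 0 *
    from openrowset(
            bulk ''https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet'',
            format = ''parquet''
        ) as rows';
```

If an inferred type is wider than you need, you can override it by listing the column with a narrower type in the `WITH` clause.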