AWS Athena Information Schema Error with Large Table Lists in dlt Pipeline

### dlt version

1.8.0

### Describe the problem

We encountered an issue in our dlt pipeline when querying AWS Athena's INFORMATION_SCHEMA.COLUMNS for multiple tables. The error we faced was:

Error listing table columns for catalog awsdatacatalog: Detected Iceberg type table without metadata location. Please make sure an Iceberg-enabled compute engine such as Athena or EMR Spark is used to create the table, or the table is created by using the Iceberg open source AWS library iceberg-aws.

This error happens due to the query executed against INFORMATION_SCHEMA:


```
sql:
SELECT table_name, column_name, data_type, is_nullable
    FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = '{my_table_schema_name}' AND table_name IN ({127 elements}) 
ORDER BY table_name, ordinal_position
```

The query works without issues when the number of elements in the IN clause is less than or equal to 100. However, with larger lists (e.g., 127 elements), the query takes around 20 minutes to complete, leading to pipeline failure.

To mitigate this, we adjusted the INFO_TABLES_QUERY_THRESHOLD parameter by forking the dlt library and setting the value from 1000 to 40, thus avoiding the fallback mechanism defined in the library's code:

```
python
if (
    len(name_lookup) > self.INFO_TABLES_QUERY_THRESHOLD
    or len(",".join(folded_table_names)) > self.capabilities.max_query_length / 2
):
    logger.info(
        "Fallback to query all columns from INFORMATION_SCHEMA due to limited query length"
        " or table threshold"
    )
    folded_table_names = []
```

This workaround resolved the error.

### Expected behavior

The query against INFORMATION_SCHEMA.COLUMNS should operate efficiently regardless of the number of table names passed in the IN clause. The pipeline should successfully retrieve metadata without fallback or excessive execution delays.

### Steps to reproduce

1. Use AWS Athena as the SQL backend for the dlt pipeline.
2. Attempt a query similar to the following:
```
sql
SELECT table_name, column_name, data_type, is_nullable
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = '{my_table_schema_name}' AND table_name IN ({127 elements}) 
ORDER BY table_name, ordinal_position

```
3. Observe the error or extremely delayed query execution if the number of elements in the IN clause exceeds 100.
4. Note that reducing the INFO_TABLES_QUERY_THRESHOLD parameter mitigates the issue.

### Operating system

Linux

### Runtime environment

Kubernetes

### Python version

3.9

### dlt data source

_No response_

### dlt destination

_No response_

### Other deployment details

_No response_

### Additional information

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AWS Athena Information Schema Error with Large Table Lists in dlt Pipeline #2529

dlt version

Describe the problem

Expected behavior

Steps to reproduce

Operating system

Runtime environment

Python version

dlt data source

dlt destination

Other deployment details

Additional information

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

AWS Athena Information Schema Error with Large Table Lists in dlt Pipeline #2529

Description

dlt version

Describe the problem

Expected behavior

Steps to reproduce

Operating system

Runtime environment

Python version

dlt data source

dlt destination

Other deployment details

Additional information

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions