Support specifying dtypes in delete_from_iceberg for empty/null columns #3077

@tlinkin

Describe the bug

When wr.athena.delete_from_iceberg_table(...) is called with a DataFrame that contains one or more columns with entirely null values, AWS Wrangler cannot infer the Athena data type and raises an UndetectedType exception. This happens because delete_from_iceberg_table first writes the DataFrame to a temporary table, and columns that contain only nulls have the pandas dtype object, which carries no further type information.

How to Reproduce

Traceback (most recent call last):
  ...
  File "/home/ray/anaconda3/lib/python3.12/site-packages/awswrangler/_data_types.py", line 663, in athena_types_from_pandas
    raise exceptions.UndetectedType(
awswrangler.exceptions.UndetectedType: Impossible to infer the equivalent Athena data type for the accounting_document column. It is completely empty (only null values) ...
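
A minimal sketch of a DataFrame that would hit this code path may help. The data, the database and table names, the merge column, and the S3 path are all placeholders rather than values from the original report, and the call itself is left commented out because it needs AWS credentials and an existing Iceberg table.

```python
import pandas as pd

# Hypothetical data: "accounting_document" mirrors the column named in the
# traceback; "id" is an assumed merge key.
df = pd.DataFrame({"id": [1, 2, 3], "accounting_document": [None, None, None]})

# Columns where every value is null give the type inference nothing to work with.
all_null_columns = [col for col in df.columns if df[col].isna().all()]
print(all_null_columns)  # ['accounting_document']

# The failing call would then look roughly like this (placeholder names):
#
# import awswrangler as wr
# wr.athena.delete_from_iceberg_table(
#     df=df,
#     database="my_database",
#     table="my_iceberg_table",
#     merge_cols=["id"],
#     temp_path="s3://my-bucket/tmp/",
# )
```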

Expected behavior

Allow an explicit dtype argument (or a similar schema specification) to be passed to delete_from_iceberg_table so that AWS Wrangler can set column types for the temporary table even when columns are completely null. This mirrors other Wrangler functions (e.g. wr.s3.to_parquet) that accept a dtype argument.

Your project

No response

Screenshots

No response

OS

Ray

Python version

3.12

AWS SDK for pandas version

3.10.1

Additional context

Why is this needed?

  • Currently, when a DataFrame has columns that are completely null, delete_from_iceberg_table fails because it cannot infer their data type automatically.
  • Being able to specify dtype would allow users to inform Wrangler about the correct column type, preventing the UndetectedType error.
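
Until such an argument exists, one possible stopgap is to give the all-null columns an explicit pandas dtype before calling the function. The sketch below is an assumption, not a confirmed fix: cast_all_null_columns is a hypothetical helper, and it relies on Wrangler mapping the pandas "string" extension dtype to an Athena string without inspecting values, which should be verified against the installed version.

```python
import pandas as pd


def cast_all_null_columns(df: pd.DataFrame, dtypes: dict) -> pd.DataFrame:
    """Return a copy of df where the fully null columns listed in dtypes are cast.

    Hypothetical helper; columns that contain any non-null value are left alone.
    """
    to_cast = {
        col: dtype
        for col, dtype in dtypes.items()
        if col in df.columns and df[col].isna().all()
    }
    return df.astype(to_cast)


# Hypothetical data mirroring the column named in the traceback.
df = pd.DataFrame({"id": [1, 2], "accounting_document": [None, None]})
typed_df = cast_all_null_columns(df, {"accounting_document": "string"})
print(typed_df.dtypes)
```

The typed DataFrame would then be passed to delete_from_iceberg_table in place of the original. An explicit dtype argument on the function itself would still be preferable, since it would let the Athena type be stated directly instead of going through a pandas dtype.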

Labels

bug: Something isn't working