Skip to content

Create CTAS focused function in wr.athena, and corresponding function supporting the cleanup of those tables #1165

@martin-kociemba

Description

@martin-kociemba

Is your idea related to a problem? Please describe.
In order to speed up the performance of a data flow that I am building, I am creating persisted tables with chosen partitions and buckets to optimize the join on a defined set of columns. AWS Data Wrangler does not support this use case to my understanding, so in the current implementation I am defining the CTAS statement manually and run it through pyathena.

After running the flow, I iterate through the list of created tables to drop them both from Athena/Glue and S3, since they are only created to speed up the flow in the spirit of temp tables.

Describe the solution you'd like
It would be great to have a function focused on creating a table as CTAS, that is focused on creating the table, without sending back the data as awswrangler.athena.read_sql_query does it. Parameters could be very similar to the parameters of awswrangler.athena.read_sql_query (e.g. sql, database, ctas_*) without those that are specific on how the data is send back to Python, since this would not be necessary. As a return argument, it should return the database and name of the table.

In addition, it would be extremely helpful if Data Wrangler could support the cleanup once the dataflow is finished. In that case the implementation is less obvious, since my impression was that usually Data Wrangler does not instantiate. One possible solution could be having the function in the prior paragraph return a kind of object that could be handed over as a parameter in a cleanup function, ideally as a list to manage the cleanup of all temp tables at once.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions