Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compatibility with Athena Engine v3 #15

Closed
Jrmyy opened this issue Nov 9, 2022 · 7 comments
Closed

Compatibility with Athena Engine v3 #15

Jrmyy opened this issue Nov 9, 2022 · 7 comments
Labels
documentation Improvements or additions to documentation

Comments

@Jrmyy
Copy link
Member

Jrmyy commented Nov 9, 2022

馃憢馃徎 Hello dbt-athena squad

For now the adapter uses the version 2 of Athena Engine, according to README. On 2022.10.13, Athena release the V3 of their Engine, reducing the gap between Athena and Trino features.

I don't know what we want to do about this :

  1. Does the adapter only support the engine v2 ?
  2. Since we are responsible of the SQL we are creating and we can configure the workgroup, is the engine version really limited in the adapter ?

FYI there are the breaking changes

@jessedobbelaere
Copy link
Member

jessedobbelaere commented Nov 9, 2022

I also assume that the adapter itself is not tightly coupled to Athena engine v2 specifics. The work_group param allows you to switch between a v2 or v3 workgroup indeed.

Personally, I don't have experience running on athena engine v3 yet, as I experienced some Athena errors such as HIVE_METASTORE_ERROR: Database cannot be a link for this operation when called on a table. when running a create table on lake formation governed tables, or a random java.lang.NullPointerException in Athena. I also saw dbt-athena users having errors or performance issues in the #dbt-athena slack thread. But I'll log AWS support tickets and take it for a spin in a month again and evaluate 馃憣

@nicor88
Copy link
Member

nicor88 commented Nov 9, 2022

I agree with you @jessedobbelaere , the adapter shouldn't have any issue on using v3, as it's tight to the workgroup.

Let's refer to this to understand if there is work to do. After some testing I noticed those:

  • When working with Iceberg tables, v3 is 20-30% faster
  • When working with parquet v3 is 20-30% slower
  • Queries need to be re-written to take in consideration the data type breaking changes.

The biggest breaking changes should be in the model level, not in the adapters internal.

Also, we have this repo https://github.com/dbt-athena/dbt-athena-tester to use as reference to run the same set of models when developing. @Jrmyy and @jessedobbelaere feel free to have a look and add relevant models if necessary to test v2 vs v3

@nicor88
Copy link
Member

nicor88 commented Nov 20, 2022

@Jrmyy I managed to use v3 with the adapter, I time to time need to apply explicit casting to the timestamp.
Here few:

  • when a timestamp field is overflowing I need to run cast(my_timestamp as timestamp(3))
  • I cannot use current_timestamp anymore, but rather cast(REPLACE(cast(current_timestamp as varchar), ' UTC', '') as timestamp(3)) as now maybe we create a macro?

I think that to tackle this issue, we could just add a section in the readme on how to solve common cases, to make extra smoother, a sort of enrichment of the athena docs.

@Jrmyy
Copy link
Member Author

Jrmyy commented Nov 20, 2022

Yes, I think we can tackle this using README.md, since now we will support both engine versions but with different features (CTAS & merge strategies for v3, temp parquet table for v2 + some data types stuff).

@nicor88 nicor88 added the documentation Improvements or additions to documentation label Nov 25, 2022
@Jrmyy
Copy link
Member Author

Jrmyy commented Dec 5, 2022

We finally decided to go only for a support of v3 engine concerning Iceberg tables, (i.e. if you use parquet tables, you can still use the v2 engine). (#64)
What drove us to this decision is :

  • V3 Engine is based on Trino and all the new features are going to be pushed on v3 engine on the Athena side.
  • V3 Engine is more performant on Iceberg tables

The consequences are, for Iceberg, you will need :

  • To have a workgroup configured with V3 Engine
  • Use unique table locations (uuid, table_unique or schema_table_unique according to documentation).

@Jrmyy Jrmyy pinned this issue Dec 5, 2022
@Jrmyy Jrmyy unpinned this issue Dec 5, 2022
@Jrmyy
Copy link
Member Author

Jrmyy commented Jan 20, 2023

Should we close this since the documentation makes it clearer now what you can and can't do with different athena adapter versions and different table types ?

@nicor88
Copy link
Member

nicor88 commented Jan 20, 2023

Yes please.

@Jrmyy Jrmyy closed this as completed Jan 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

3 participants