dbt-athena-community support #203

Open. Wants to merge 13 commits into base: main.

@brabster commented May 2, 2023

Description & motivation

resolves #274

PR based on #133

  • requires dbt-athena-community > 1.4.1; not yet compatible with earlier versions
  • removed the JSON-format source and tests for Athena, since Athena needs JSONL (one JSON record per line)
  • PR raised for information and to check that the tests pass
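Athena's JSON SerDes read one JSON record per line, so a file holding a single JSON array (or pretty-printed objects) is not read as one row per object. A minimal sketch of converting such a file to JSONL, assuming the input is a top-level array of objects; the helper name is hypothetical and not part of the PR:

```python
import json


def json_array_to_jsonl(text: str) -> str:
    """Convert a JSON array of objects into JSON Lines (one object per line)."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Without indent, json.dumps escapes newlines inside strings,
    # so each record is guaranteed to occupy exactly one line.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)
```

For example, an array of two person objects becomes two lines, each a complete JSON object that Athena can map to one row.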

Checklist

  • [/] I have verified that these changes work locally
  • [/] I have updated the README.md (if applicable)
  • [/] I have added an integration test for my fix/feature (if applicable)

@brabster requested a review from jeremyyeo as a code owner May 2, 2023 12:40

@brabster (Author) commented May 3, 2023

Local run for info:

$ ./run_test.sh athena
Setting up virtual environment
Changing working directory: integration_tests
Starting integration tests
15:59:49  Running with dbt=1.4.6
15:59:50  Installing ../
15:59:50    Installed from <local @ ../>
15:59:50  Installing dbt-labs/dbt_utils
15:59:50    Installed from version 0.8.0
15:59:50    Updated version available: 1.1.0
15:59:50  
15:59:50  Updates available for packages: ['dbt-labs/dbt_utils']                 
Update your versions in packages.yml, then run dbt deps
15:59:52  Running with dbt=1.4.6
15:59:52  Found 0 models, 2 tests, 0 snapshots, 0 analyses, 560 macros, 0 operations, 1 seed file, 5 sources, 0 exposures, 0 metrics
15:59:52  
15:59:54  Concurrency: 1 threads (target='athena')
15:59:54  
15:59:54  1 of 1 START seed file dbt_external_tables_integration_tests_athena.people ..... [RUN]
16:00:04  1 of 1 OK loaded seed file dbt_external_tables_integration_tests_athena.people . [CREATE 200 in 10.79s]
16:00:04  
16:00:04  Finished running 1 seed in 0 hours 0 minutes and 12.15 seconds (12.15s).
16:00:04  
16:00:04  Completed successfully
16:00:04  
16:00:04  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
16:00:07  Running with dbt=1.4.6
16:00:07  No prep necessary, skipping
16:00:09  Running with dbt=1.4.6
16:00:09  Unable to do partial parsing because config vars, config profile, or config target have changed
16:00:11  1 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_unpartitioned
16:00:12  1 of 4 (1) drop table if exists `awsdatacatalog`.`dbt_external_tables_integration_tests_ath...  
16:00:13  1 of 4 (1) OK -1
16:00:13  1 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...  
16:00:15  1 of 4 (2) OK -1
16:00:15  2 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_partitioned
16:00:17  2 of 4 (1) drop table if exists `awsdatacatalog`.`dbt_external_tables_integration_tests_ath...  
16:00:18  2 of 4 (1) OK -1
16:00:18  2 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...  
16:00:20  2 of 4 (2) OK -1
16:00:20  2 of 4 (3) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:00:23  2 of 4 (3) OK -1
16:00:23  3 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned
16:00:26  3 of 4 (1) drop table if exists `awsdatacatalog`.`dbt_external_tables_integration_tests_ath...  
16:00:27  3 of 4 (1) OK -1
16:00:27  3 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...  
16:00:29  3 of 4 (2) OK -1
16:00:29  3 of 4 (3) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:01:06  3 of 4 (3) OK -1
16:01:06  3 of 4 (4) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:01:42  3 of 4 (4) OK -1
16:01:42  3 of 4 (5) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:02:16  3 of 4 (5) OK -1
16:02:16  4 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned_hive_compatible
16:02:17  4 of 4 (1) drop table if exists `awsdatacatalog`.`dbt_external_tables_integration_tests_ath...  
16:02:19  4 of 4 (1) OK -1
16:02:19  4 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...  
16:02:20  4 of 4 (2) OK -1
16:02:20  4 of 4 (3) msck repair table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena...  
16:02:25  4 of 4 (3) OK -1
16:02:27  Running with dbt=1.4.6
16:02:27  Unable to do partial parsing because config vars, config profile, or config target have changed
16:02:29  1 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_unpartitioned
16:02:30  1 of 4 SKIP
16:02:30  2 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_partitioned
16:02:33  2 of 4 (1) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:02:36  2 of 4 (1) OK -1
16:02:36  3 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned
16:02:39  3 of 4 (1) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:03:15  3 of 4 (1) OK -1
16:03:15  3 of 4 (2) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:03:51  3 of 4 (2) OK -1
16:03:51  3 of 4 (3) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:04:25  3 of 4 (3) OK -1
16:04:25  4 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned_hive_compatible
16:04:26  4 of 4 (1) msck repair table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena...  
16:04:31  4 of 4 (1) OK -1
16:04:33  Running with dbt=1.4.6
16:04:33  Found 0 models, 2 tests, 0 snapshots, 0 analyses, 560 macros, 0 operations, 1 seed file, 5 sources, 0 exposures, 0 metrics
16:04:33  
16:04:34  Concurrency: 1 threads (target='athena')
16:04:34  
16:04:34  1 of 2 START test dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_  [RUN]
16:04:38  1 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_  [PASS in 3.84s]
16:04:38  2 of 2 START test dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_  [RUN]
16:04:41  2 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_  [PASS in 2.77s]
16:04:41  
16:04:41  Finished running 2 tests in 0 hours 0 minutes and 7.58 seconds (7.58s).
16:04:41  
16:04:41  Completed successfully
16:04:41  
16:04:41  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
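Reading the log above, each external source seems to follow a fixed statement plan. On the first pass every source is dropped and recreated; partitioned sources then get one or more alter table statements, while the Hive-compatible layout gets a single msck repair table. On the second pass the unpartitioned source is skipped and only the partition statements run, presumably because the package's ext_full_refresh variable is no longer set. The sketch below is a hypothetical model of that plan, inferred from the log rather than taken from the PR's macros; the class, function and parameter names are invented, and it assumes one alter table statement per partition spec (the real implementation may batch several partitions per statement).

```python
from dataclasses import dataclass, field


@dataclass
class ExternalSource:
    """Hypothetical description of one external source (illustration only)."""
    name: str
    # Assumed: one partition spec per alter table statement.
    partitions: list[str] = field(default_factory=list)
    # True when the S3 folders follow the Hive key=value layout.
    hive_compatible: bool = False


def build_plan(src: ExternalSource, exists: bool, full_refresh: bool) -> list[str]:
    """Return the DDL statements to run for one source, mirroring the logged pattern."""
    plan: list[str] = []
    # Recreate the table on a full refresh, or when it does not exist yet.
    if full_refresh or not exists:
        plan.append(f"drop table if exists {src.name}")
        plan.append(f"create external table {src.name} (...)")
    # Refresh partitions: one msck repair for Hive-style layouts,
    # otherwise an explicit alter table per partition spec.
    if src.hive_compatible:
        plan.append(f"msck repair table {src.name}")
    else:
        plan.extend(f"alter table {src.name} add partition ({p})" for p in src.partitions)
    return plan
```

With three partition specs, the multipartitioned source yields five statements on a full refresh and three afterwards, and the unpartitioned source yields an empty plan (logged as SKIP) on the second pass, which matches the counts in the log.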

@brabster (Author) commented May 3, 2023

Re: #133 (comment) about needing the quote_comment: key to work around invalid comment characters. That fix doesn't seem to work, at least on Athena engine v3: the whole query gets commented out.

16:09:31  1 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...  
16:10:52  Encountered an error while running operation: Runtime Error
  Runtime Error
    [ErrorCategory:USER_ERROR, ErrorCode:DDL_FAILED], Detail:FAILED: ParseException line 1:489 cannot recognize input near '<EOF>' '<EOF>' '<EOF>'
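The thread doesn't pin down why the statement is swallowed. One plausible mechanism, offered only as an assumption, is a -- line comment prepended to a statement that is sent on a single line: everything after the -- up to the end of that line, here the whole DDL, becomes comment, and the parser is left with nothing but end of input. The toy stripper below is hypothetical and deliberately naive (it ignores string literals, unlike a real SQL lexer); it only illustrates the difference a newline makes.

```python
def strip_line_comments(sql: str) -> str:
    """Drop everything from '--' to the end of each line (toy model, ignores string literals)."""
    kept = []
    for line in sql.split("\n"):
        idx = line.find("--")
        kept.append(line if idx == -1 else line[:idx])
    return "\n".join(kept).strip()


ddl = "create external table t (id int) location 's3://bucket/path/'"
comment = "-- dbt query comment"

# Comment and statement on one line: the statement disappears entirely.
print(repr(strip_line_comments(f"{comment} {ddl}")))  # ''
# Comment terminated by a newline: the statement survives unchanged.
print(repr(strip_line_comments(f"{comment}\n{ddl}")))
```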

@brabster (Author) commented May 4, 2023

Note: dbt-athena/dbt-athena#161 effectively added a large subset of the external tables functionality to dbt-athena itself. It might be worth refactoring to use that and cut down on the duplicated logic here.

@aidan-o-boyle-kroo commented

@brabster what's needed to get this PR approved? I'm happy to contribute.

@brabster (Author) commented

@aidan-o-boyle-kroo hi there! I've just pulled this: it still works on dbt-athena-community 1.4.6, and against the latest 1.6.1 too.

$ ATHENA_TEST_DBNAME=AwsDataCatalog AWS_REGION=eu-west-2 ATHENA_TEST_BUCKET=my-redacted_bucket ATHENA_TEST_WORKGROUP=primary ./run_test.sh athena
Setting up virtual environment for dbt-athena
Changing working directory: integration_tests
Starting integration tests
19:25:28  Running with dbt=1.6.3
19:25:29  Installing ../
19:25:29  Installed from <local @ ../>
19:25:29  Installing dbt-labs/dbt_utils
19:25:29  Installed from version 0.8.0
19:25:29  Updated version available: 1.1.1
19:25:29  
19:25:29  Updates available for packages: ['dbt-labs/dbt_utils']                 
Update your versions in packages.yml, then run dbt deps
19:25:32  Running with dbt=1.6.3
19:25:32  Registered adapter: athena=1.6.1
19:25:32  Unable to do partial parsing because config vars, config profile, or config target have changed
19:25:34  Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:25:34  
19:25:37  Concurrency: 1 threads (target='athena')
19:25:37  
19:25:37  1 of 1 START seed file dbt_external_tables_integration_tests_athena.people ..... [RUN]
19:25:48  1 of 1 OK loaded seed file dbt_external_tables_integration_tests_athena.people . [CREATE 200 in 11.03s]
19:25:48  
19:25:48  Finished running 1 seed in 0 hours 0 minutes and 14.18 seconds (14.18s).
19:25:48  
19:25:48  Completed successfully
19:25:48  
19:25:48  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
19:25:51  Running with dbt=1.6.3
19:25:51  Registered adapter: athena=1.6.1
19:25:51  Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:25:51  No prep necessary, skipping
19:25:54  Running with dbt=1.6.3
19:25:54  Registered adapter: athena=1.6.1
19:25:54  Unable to do partial parsing because config vars, config profile, or config target have changed
19:25:57  Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:25:57  1 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_unpartitioned
19:25:58  1 of 4 (1) drop table if exists `AwsDataCatalog`.`dbt_external_tables_integration_tests_ath...  
19:25:59  1 of 4 (1) OK -1
19:25:59  1 of 4 (2) create external table `AwsDataCatalog`.`dbt_external_tables_integration_tests_at...  
19:26:01  1 of 4 (2) OK -1
19:26:01  2 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_partitioned
19:26:02  2 of 4 (1) drop table if exists `AwsDataCatalog`.`dbt_external_tables_integration_tests_ath...  
19:26:03  2 of 4 (1) OK -1
19:26:03  2 of 4 (2) create external table `AwsDataCatalog`.`dbt_external_tables_integration_tests_at...  
19:26:05  2 of 4 (2) OK -1
19:26:05  2 of 4 (3) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:26:08  2 of 4 (3) OK -1
19:26:08  3 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned
19:26:10  3 of 4 (1) drop table if exists `AwsDataCatalog`.`dbt_external_tables_integration_tests_ath...  
19:26:11  3 of 4 (1) OK -1
19:26:11  3 of 4 (2) create external table `AwsDataCatalog`.`dbt_external_tables_integration_tests_at...  
19:26:12  3 of 4 (2) OK -1
19:26:12  3 of 4 (3) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:26:49  3 of 4 (3) OK -1
19:26:49  3 of 4 (4) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:27:25  3 of 4 (4) OK -1
19:27:25  3 of 4 (5) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:27:59  3 of 4 (5) OK -1
19:27:59  4 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned_hive_compatible
19:27:59  4 of 4 (1) drop table if exists `AwsDataCatalog`.`dbt_external_tables_integration_tests_ath...  
19:28:00  4 of 4 (1) OK -1
19:28:00  4 of 4 (2) create external table `AwsDataCatalog`.`dbt_external_tables_integration_tests_at...  
19:28:01  4 of 4 (2) OK -1
19:28:01  4 of 4 (3) msck repair table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena...  
19:28:04  4 of 4 (3) OK -1
19:28:07  Running with dbt=1.6.3
19:28:07  Registered adapter: athena=1.6.1
19:28:07  Unable to do partial parsing because config vars, config profile, or config target have changed
19:28:09  Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:28:09  1 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_unpartitioned
19:28:10  1 of 4 SKIP
19:28:10  2 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_partitioned
19:28:12  2 of 4 (1) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:28:16  2 of 4 (1) OK -1
19:28:16  3 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned
19:28:17  3 of 4 (1) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:28:53  3 of 4 (1) OK -1
19:28:53  3 of 4 (2) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:29:29  3 of 4 (2) OK -1
19:29:29  3 of 4 (3) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:30:03  3 of 4 (3) OK -1
19:30:03  4 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned_hive_compatible
19:30:04  4 of 4 (1) msck repair table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena...  
19:30:06  4 of 4 (1) OK -1
19:30:09  Running with dbt=1.6.3
19:30:09  Registered adapter: athena=1.6.1
19:30:09  Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:30:09  
19:30:10  Concurrency: 1 threads (target='athena')
19:30:10  
19:30:11  1 of 2 START test dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_  [RUN]
19:30:14  1 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_  [PASS in 3.83s]
19:30:14  2 of 2 START test dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_  [RUN]
19:30:17  2 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_  [PASS in 2.77s]
19:30:17  
19:30:17  Finished running 2 tests in 0 hours 0 minutes and 7.89 seconds (7.89s).
19:30:17  
19:30:17  Completed successfully
19:30:17  
19:30:17  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

I'd love to get it merged, so I'll remove the draft label. My main concerns would be:

  • stability, though it hasn't broken yet, and it's probably better to have support than no support
  • lack of automated testing (the CircleCI failure is due to unset variables pointing to AWS): this would need an AWS account to be set up and paid for, and I'm unsure what the policy is on that
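A cheap first step toward automated testing would be to fail fast with a clear message when the AWS-related variables are unset, rather than letting the run break part-way. A minimal sketch, assuming the four variable names from the local run command (ATHENA_TEST_DBNAME, AWS_REGION, ATHENA_TEST_BUCKET, ATHENA_TEST_WORKGROUP) are the complete set the Athena target needs; the helper functions are hypothetical and not part of the PR:

```python
import os
from collections.abc import Mapping

# Names taken from the local run command; assumed to be the complete set.
REQUIRED_ATHENA_VARS = (
    "ATHENA_TEST_DBNAME",
    "AWS_REGION",
    "ATHENA_TEST_BUCKET",
    "ATHENA_TEST_WORKGROUP",
)


def missing_vars(env: Mapping[str, str], required=REQUIRED_ATHENA_VARS) -> list[str]:
    """Return the required variables that are unset or empty in env, in declared order."""
    return [name for name in required if not env.get(name)]


def check_environment() -> None:
    """Abort with a readable message before any Athena statement is sent."""
    missing = missing_vars(os.environ)
    if missing:
        raise SystemExit("Athena integration tests need: " + ", ".join(missing))
```

A CI job could call check_environment() before run_test.sh, so a fork without AWS credentials fails with an explicit list of what is missing.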

I'm depending on my fork for multiple projects now, so I guess you could kick the tyres that way and check it works for you.

@brabster changed the title DRAFT: dbt-athena-community support dbt-athena-community support Sep 26, 2023
@brabster marked this pull request as ready for review September 26, 2023 19:41
@aidan-o-boyle-kroo

@brabster (Author) commented Oct 1, 2023

We could, but I'm not sure how effective a test that would be, and I'm not sure what the maintainers need in order to merge the PR. @jeremyyeo, can you advise on what we'd need to do to get this PR merged? 🙇‍♂️

@nicor88 mentioned this pull request Jan 18, 2024

This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.

@Avinash-1394 commented

I'd also like to contribute whatever it takes to get this merged. This would be really helpful for our team.

@nicor88 commented Apr 27, 2024

@dataders, who should we add as a reviewer to merge this one? 🙏🏻
Quite a few folks from the community have mentioned dbt-external-tables on a few occasions.

@brabster (Author) commented

I've just set it up again with the latest dbt-athena-community against my personal AWS account. Everything still appears to work: the integration tests run and pass. I've added an example of minimal IAM permissions and defaulted a config value to help with any future test-automation setup. I also checked that the implementation does its own drop-if-exists logic, so it doesn't appear to inherit any inappropriate housekeeping behaviour from the adapter.
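The PR's own minimal IAM example isn't reproduced in this thread. As a purely hypothetical starting point, the sketch below assembles a policy document covering the kinds of actions the logged statements imply: running Athena queries in one workgroup, creating, altering and dropping tables and partitions in the Glue Data Catalog (AwsDataCatalog), and reading and writing the test bucket. The action list, ARNs and placeholder values are assumptions and have not been checked for being either minimal or sufficient.

```python
import json


def athena_test_policy(region: str, account_id: str, bucket: str, workgroup: str) -> dict:
    """Build a hypothetical IAM policy document for the Athena integration tests."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RunAthenaQueries",
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:StopQueryExecution",
                    "athena:GetWorkGroup",
                ],
                "Resource": f"arn:aws:athena:{region}:{account_id}:workgroup/{workgroup}",
            },
            {
                "Sid": "ManageGlueCatalog",
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase", "glue:GetDatabases", "glue:CreateDatabase",
                    "glue:GetTable", "glue:GetTables", "glue:CreateTable",
                    "glue:UpdateTable", "glue:DeleteTable",
                    "glue:GetPartitions", "glue:BatchCreatePartition",
                ],
                # Catalog, database and table ARNs all share this prefix.
                "Resource": f"arn:aws:glue:{region}:{account_id}:*",
            },
            {
                "Sid": "ReadWriteTestBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation", "s3:GetObject",
                           "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# Placeholder account ID and bucket name; substitute your own.
print(json.dumps(athena_test_policy("eu-west-2", "123456789012", "my-bucket", "primary"), indent=2))
```

Scoping the Glue statement to the test database and tables, rather than the account-wide wildcard used here, would be a natural next step toward a genuinely minimal policy.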

(venv) @brabster ➜ /workspaces/dbt-external-tables/integration_tests (dbt-athena-community-support) $ dbt test --target athena
16:59:28  Running with dbt=1.7.13
16:59:29  Registered adapter: athena=1.7.2
16:59:29  Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 683 macros, 0 groups, 0 semantic models
16:59:29  
16:59:30  Concurrency: 1 threads (target='athena')
16:59:30  
16:59:30  1 of 2 START test dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_  [RUN]
16:59:32  1 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_  [PASS in 2.59s]
16:59:32  2 of 2 START test dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_  [RUN]
16:59:35  2 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_  [PASS in 2.46s]
16:59:35  
16:59:35  Finished running 2 tests in 0 hours 0 minutes and 5.65 seconds (5.65s).
16:59:35  
16:59:35  Completed successfully
16:59:35  
16:59:35  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
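Judging by their generated names, the two passing tests are dbt_utils equality tests comparing each external source with ref('people') on the columns id, first_name, last_name and email. Conceptually such a test fails if any row appears on one side but not the other. The sketch below models that check with set differences in both directions; it is an illustration, not dbt_utils' SQL, and like SQL's EXCEPT it ignores duplicate rows.

```python
from collections.abc import Iterable

Row = tuple  # (id, first_name, last_name, email)


def rows_differ(source_rows: Iterable[Row], model_rows: Iterable[Row]) -> set[Row]:
    """Return rows present on exactly one side (EXCEPT in both directions)."""
    a, b = set(source_rows), set(model_rows)
    # Symmetric difference: (a - b) | (b - a).
    return a ^ b
```

An empty result corresponds to PASS; row order doesn't matter, which is why partitioned and unpartitioned sources can both be compared against the same seed.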

Labels: none yet
Projects: none yet
Development: successfully merging this pull request may close these issues:
[feature] Support AWS Athena

5 participants