Skip to content

Conversation

ghukill
Copy link
Contributor

@ghukill ghukill commented Mar 21, 2025

Purpose and background context

This PR adds an AthenaClient class for querying Athena.

It is likely we will create similar scaffolding in other projects (e.g. CURD project) for Athena querying. My hope is that we allow each project to move forward, where this AIP validation project is somewhat time sensitive, but keeping in mind we could/should work to normalize our Athena querying approach across projects. Particularly in these "early" days of Glue + Athena work.

As such, I think this represents a first pass, but does not need to be the final pass.

How can a reviewer manually see the effects of these changes?

Run Integration Tests

Experimenting with more substantial integration tests in this project -- which are not run as part of CI/CD pipeline -- that confirm the Athena querying is functional.

To run:

1- set Dev admin AWS credentials in terminal

2- set the the following env vars:

WORKSPACE=dev
SENTRY_DSN=None

AWS_ATHENA_DATABASE=cdps-storage-dev-222053980223-aip-database
AWS_ATHENA_WORK_GROUP=default-222053980223-us-east-1

3- run integration tests

make test-integration

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:

Querying Athena for S3 Inventory is an important part of the
AIP validation.  This standardizes how Athena is queried, using
the library PyAthena as a DB API driver for Athena, and
SQLAlchemy for a widely used DB connection wrapper.  This plays
nicely with libraries like pandas.

How this addresses that need:
* Adds new AthenaClient, that will get composed with the
soon-to-be AIPValidator class

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IR-182
@coveralls
Copy link

coveralls commented Mar 21, 2025

Pull Request Test Coverage Report for Build 14009428231

Details

  • 32 of 33 (96.97%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+2.0%) to 93.258%

Changes Missing Coverage Covered Lines Changed/Added Lines %
lambdas/utils/aws/athena.py 28 29 96.55%
Totals Coverage Status
Change from base Build 13998438953: 2.0%
Covered Lines: 83
Relevant Lines: 89

💛 - Coveralls

@ghukill ghukill marked this pull request as ready for review March 21, 2025 20:31
@ghukill ghukill requested a review from a team as a code owner March 21, 2025 20:31
Copy link

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really awesome! Thank you for providing a clear example of an integration test. Looking forward to seeing more of these in our testing modules!

One non-blocking comment: While we still haven't officially decided on a template for our Config module, in dspace-submission-composer (DSC), we did shift away from using __getattr__ in favor of explicitly defining property methods for each env variable. Just a thought/reminder!

@ghukill
Copy link
Contributor Author

ghukill commented Mar 24, 2025

One non-blocking comment: While we still haven't officially decided on a template for our Config module, in dspace-submission-composer (DSC), we did shift away from using __getattr__ in favor of explicitly defining property methods for each env variable. Just a thought/reminder!

Agreed. I felt a little funny with this "old style", but it didn't feel problematic in this context. But I'm making a note of this and will try and include that adjustment in a future PR. Thanks for raising!

@ghukill ghukill merged commit b2b2c15 into main Mar 24, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants