Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: "PostgresMetadataExtractor doesn't discover Redshift late binding views" #356

Conversation

nathanlawrence-asana
Copy link
Contributor

@nathanlawrence-asana nathanlawrence-asana commented Sep 2, 2020

Summary of Changes

Splits PostgresMetadataExtractor into BasePostgresMetadataExtractor, PostgresMetadataExtractor and RedshiftMetadataExtractor (the latter two inherit from BasePostgresMetadataExtractor).

This allows us to safely modify the SQL_QUERY constant for Redshift-based deployments to pull late binding views into Amundsen.

See amundsen-io/amundsen#672

Tests

I'm happy to write tests once folks are happy with the general structure of the PR (this is my first time contributing :) ).

My understanding is that I should be adding a new test for the RedshiftMetadataExtractor similar to what we have currently for the PostgresMetadataExtractor.

Documentation

I'm happy to update documentation once folks are happy with the general structure of the PR (this is my first time contributing :) ).

My understanding is that the main place that needs to be updated is adding a new section to README.md

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
  • PR passes make test

@nathanlawrence-asana nathanlawrence-asana changed the title Fix "PostgresMetadataExtractor doesn't discover Redshift late binding views" fix "PostgresMetadataExtractor doesn't discover Redshift late binding views" Sep 2, 2020
@nathanlawrence-asana nathanlawrence-asana force-pushed the nathanlawrence-redshift_metadata_extractor branch from 1641197 to 043f190 Compare September 2, 2020 18:06
@feng-tao
Copy link
Member

feng-tao commented Sep 2, 2020

some checks are failing

@feng-tao feng-tao added the keep fresh Disables stalebot from closing an issue label Sep 3, 2020
@nathanlawrence-asana nathanlawrence-asana changed the title fix "PostgresMetadataExtractor doesn't discover Redshift late binding views" fix: "PostgresMetadataExtractor doesn't discover Redshift late binding views" Sep 3, 2020
@nathanlawrence-asana nathanlawrence-asana force-pushed the nathanlawrence-redshift_metadata_extractor branch 2 times, most recently from a937506 to ed5657b Compare September 3, 2020 17:39
Copy link
Member

@feng-tao feng-tao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the contribution. Doe the refactor / new feature cover by the existing unit test?

@nathanlawrence-asana
Copy link
Contributor Author

thanks for the contribution. Doe the refactor / new feature cover by the existing unit test?

It's not, and I'm not sure what the best way to test this is. The underlying issue is that the SQL we're running to pull tables and views in Redshift misses a category of views.

From looking at the unit tests currently in Amundsen, none of them actually run any SQL (they just mock the results from SQL Alchemy). I could write some tests for the new RedshiftMetadataExtractor that fit that pattern, but they wouldn't really be testing the underlying logic of the change.

P.S. I'm also on the Amundsen Slack (Nathan Lawrence) if you'd like to chat synchronously sometime tomorrow. I work 9-5pm pacific and I'm mostly free tomorrow.

@nathanlawrence-asana
Copy link
Contributor Author

I'll also add that I've tested this on a local installation and verified that the changes work as expected.

@feng-tao
Copy link
Member

feng-tao commented Sep 3, 2020

yeah, I think it is fine, if it is a SQL change. I will take a look at the pr in detail. thanks for the clarification.

@feng-tao
Copy link
Member

feng-tao commented Sep 4, 2020

the change lgtm (I don't know much about the actual SQL, but based on your local testing, I assume it should work as expected), if you could add some basic unit tests, we are good to go!

Signed-off-by: Nathan Lawrence <nathanlawrence@asana.com>
@nathanlawrence-asana nathanlawrence-asana force-pushed the nathanlawrence-redshift_metadata_extractor branch from ed5657b to 3bc4f74 Compare September 4, 2020 18:43
@nathanlawrence-asana
Copy link
Contributor Author

Thanks. Updated to copy over some of the Postgres tests and ensure that they work with the similar Redshift extractor.

In the process, I made some updates to the Postgres tests. The scope that we were passing the extractor was wrong, and the tests were really just using the default config parameters for most tests. This only showed up when I copied the tests over because the database key check started to fail (checking against the passed redshift vs. the default postgres).

@feng-tao
Copy link
Member

feng-tao commented Sep 7, 2020

thanks @nathanlawrence-asana

@feng-tao feng-tao merged commit 4113cfd into amundsen-io:master Sep 7, 2020
@nathanlawrence-asana nathanlawrence-asana deleted the nathanlawrence-redshift_metadata_extractor branch September 14, 2020 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keep fresh Disables stalebot from closing an issue
Projects
None yet
2 participants