feat(ingest/postgres): support extracting metadata from all databases in single recipe #7581
@@ -101,13 +105,14 @@ class PostgresConfig(BasicSQLAlchemyConfig):
        default=False, description="Include table lineage for views"
    )

    def get_identifier(self: BasicSQLAlchemyConfig, schema: str, table: str) -> str:
Not required as we override get_identifier within PostgresSource
Had some questions about this
...ta-ingestion/tests/integration/postgres/postgres_to_file_with_db_estimate_row_count copy.yml
Last question: what does this do to existing users of the postgres source? If they weren't specifying database previously, what was the existing behavior? Will the URNs change as a result of this change?
The logic to construct the URN remains the same in this PR. Also, AFAIK, running postgres source without ...
Mostly concerned about whether we close inspector connections. Otherwise a few style comments, but LGTM!
logger.debug(f"sql_alchemy_url={url}")
engine = create_engine(url, **self.config.options)
with engine.connect() as conn:
    if self.config.database and self.config.database != "":
This is redundant: "" is falsy.
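A minimal sketch of the point above, using two hypothetical helper functions (not part of the PR) that mirror the condition in the diff. Because an empty string is falsy in Python, the extra comparison against "" never changes the outcome:

```python
from typing import Optional


def has_database_verbose(database: Optional[str]) -> bool:
    # Mirrors the condition in the diff: truthiness check plus an explicit "" check.
    return bool(database and database != "")


def has_database(database: Optional[str]) -> bool:
    # None and "" are both falsy, so the truthiness check alone is enough.
    return bool(database)


# Compare the two helpers on the three interesting inputs.
results = {
    repr(value): (has_database_verbose(value), has_database(value))
    for value in (None, "", "postgres")
}
```

Both helpers agree for None, the empty string, and a non-empty name, so the condition can be reduced to `if self.config.database:`.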
inspector = inspect(
    create_engine(url, **self.config.options).connect()
)
yield inspector
Do we want to keep conn open as we run ingestion on each inspector? If we store databases in memory then we can move the iteration outside the with block to close conn. I don't think it really matters either way though.
More importantly, are these inspectors getting closed?
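One way to address both questions, sketched with stand-in classes rather than SQLAlchemy so the lifetime logic can be seen in isolation. `FakeConnection`, `connect`, `list_databases`, and the database names are all hypothetical and not taken from the PR. The idea is to read the database list into memory, close the listing connection, and then open each per-database connection inside a `with` block within the generator:

```python
from typing import Iterator, List


class FakeConnection:
    """Hypothetical stand-in for a database connection; only tracks open/closed state."""

    def __init__(self, database: str) -> None:
        self.database = database
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "FakeConnection":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()


# Records every connection opened, so the lifetime can be checked afterwards.
opened: List[FakeConnection] = []


def connect(database: str) -> FakeConnection:
    """Hypothetical helper standing in for create_engine(url).connect()."""
    conn = FakeConnection(database)
    opened.append(conn)
    return conn


def list_databases() -> List[str]:
    # Open a short-lived connection, read the database names into memory,
    # and close it before any per-database work starts.
    with connect("postgres"):
        return ["db_one", "db_two"]  # made-up names for illustration


def get_connections() -> Iterator[FakeConnection]:
    for database in list_databases():
        # The with block closes each connection when the consumer asks for
        # the next item (or when the generator itself is closed).
        with connect(database) as conn:
            yield conn


# Each connection is open while the consumer works with it.
seen = []
for conn in get_connections():
    seen.append((conn.database, conn.closed))
```

With this shape, the listing connection is closed before ingestion begins, and each per-database connection is closed as soon as the consumer moves on. If the consumer stops early, calling `close()` on the generator triggers the same cleanup. In the real source, the yielded object would presumably be an inspector built from the connection, which is an assumption about how this would map onto the PR's code.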
url = self.config.get_sql_alchemy_url()
engine = create_engine(url, **self.config.options)
def _get_view_lineage_elements(
    self, inspector: Inspector
Just curious, how come we're using Inspectors over a straight conn (or whatever the result of .connect() is)?
Using the connection directly does indeed make sense, especially for _get_view_lineage_elements, which does not require an inspector. The Inspector interface is used in sql_common to get table metadata and the like, which is probably why it needs to be used here as well.
… in single recipe (#7581) Co-authored-by: Tamas Nemeth <treff7es@gmail.com>