Implementation of metadata-based freshness #1060

Merged: mikealfare merged 8 commits into main from source-freshness/metadata-based on Feb 14, 2024

@mikealfare mikealfare commented Dec 18, 2023

resolves #938

Problem

The current implementation of source freshness has to query the data, and it requires the user to provide a datetime field. This is slower and more expensive than it should be, and it doesn't scale across multiple models. Some models also do not have an appropriate datetime field.

Solution

Use source metadata where available.
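
As a rough illustration of the idea, the sketch below computes freshness from a table's last-modified timestamp instead of querying a datetime column. It assumes a BigQuery-style client whose `get_table` returns an object with a timezone-aware `modified` attribute (as `google.cloud.bigquery.Table` does). The `FreshnessResult` class, its field names, and the `metadata_freshness` function are hypothetical names for this example, not the code in this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class FreshnessResult:
    """Illustrative container; the field names are assumptions."""
    max_loaded_at: datetime
    snapshotted_at: datetime
    age_seconds: float


def metadata_freshness(
    client: Any, table_id: str, now: Optional[datetime] = None
) -> FreshnessResult:
    """Derive freshness from table metadata rather than from a row query."""
    # A metadata lookup: no scan of the table's rows is needed.
    table = client.get_table(table_id)
    if table.modified is None:
        # Metadata is not always available, so the caller needs a fallback.
        raise ValueError(f"no last-modified metadata for {table_id}")
    snapshotted_at = now or datetime.now(timezone.utc)
    age = (snapshotted_at - table.modified).total_seconds()
    return FreshnessResult(table.modified, snapshotted_at, age)
```

With the real client, `client` would be a `google.cloud.bigquery.Client` and `table_id` a string such as `"project.dataset.table"`. No user-supplied datetime field is involved, which is the point of the change.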

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@mikealfare mikealfare self-assigned this Dec 18, 2023
@cla-bot cla-bot bot added the cla:yes label Dec 18, 2023
@mikealfare mikealfare marked this pull request as ready for review December 19, 2023 02:22
@mikealfare mikealfare requested a review from a team as a code owner December 19, 2023 02:22
Contributor Author

I added this test because we're seeing odd behavior with the get_table method (a BQ Client method). If we run get_table during source freshness (line 726 in impl.py) on a table that does not exist, it hangs instead of raising the expected NotFound error, even when a retry is provided. However, when we make the same call in this test, it behaves as expected.
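
One way a test could make such a hang visible is to run the call under a deadline, so a call that never returns fails fast instead of stalling the suite. The helper below is a minimal sketch of that idea and is not the test added in this PR; `call_with_deadline` and `CallTimedOut` are hypothetical names.

```python
import threading
from typing import Any, Callable, Dict


class CallTimedOut(Exception):
    """Raised when the wrapped call does not finish before the deadline."""


def call_with_deadline(fn: Callable[[], Any], seconds: float) -> Any:
    """Run fn in a daemon thread; return its value or re-raise its error."""
    outcome: Dict[str, Any] = {}

    def runner() -> None:
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # captured here, re-raised in the caller
            outcome["error"] = exc

    # A daemon thread will not block interpreter exit if fn never returns.
    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(seconds)
    if worker.is_alive():
        raise CallTimedOut(f"call did not return within {seconds} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

Against the real client, this could be used as `call_with_deadline(lambda: client.get_table(table_ref), 30)`, with the test expecting `google.api_core.exceptions.NotFound` for a missing table and treating `CallTimedOut` as the hang described above. The thread itself is not cancelled; the helper only stops waiting for it.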

Contributor

Do we understand this well enough to raise a bug with BigQuery? https://github.com/googleapis/python-bigquery/issues

Contributor

There's this issue: googleapis/python-bigquery#1674

@mikealfare mikealfare changed the base branch from main to 1.7.latest January 17, 2024 23:45
@mikealfare mikealfare changed the base branch from 1.7.latest to main January 17, 2024 23:49
@mikealfare

Removing backport 1.7.latest as this has actually been merged into 1.7.latest already.

@mikealfare mikealfare modified the milestone: 1.8.0 Feb 13, 2024
@mikealfare

This PR reflects the changes that were merged directly into 1.7.latest via #1072.

@mikealfare mikealfare merged commit 34eadae into main Feb 14, 2024
15 checks passed
@mikealfare mikealfare deleted the source-freshness/metadata-based branch February 14, 2024 17:55
Successfully merging this pull request may close these issues.

[ADAP-912] Support Metadata Freshness
2 participants