[CT-3149] [Implementation] allow source freshness to be evaluated from warehouse metadata tables, instead of running select max(loaded_at_field) ... SQL queries #8704

Closed
1 task done
Tracked by #8316
graciegoheen opened this issue Sep 25, 2023 · 2 comments · Fixed by #8795
Assignees
Labels
enhancement New feature or request Impact: Adapters user docs [docs.getdbt.com] Needs better documentation

Comments

@graciegoheen
Contributor

graciegoheen commented Sep 25, 2023

Housekeeping

  • I am a maintainer of dbt-core

Short description

From #7012

We should allow source freshness metadata to be collected from warehouse metadata tables when possible.

In this case, the loaded_at_field becomes optional for source freshness configuration:

  • if you provide loaded_at_field, we will use the SQL query select max(loaded_at_field) ... from ...
  • if you don’t provide loaded_at_field, we will get the source freshness from the warehouse metadata table
  • if you don’t provide loaded_at_field and your adapter does not support metadata-based freshness, we will raise a warning
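
The three cases above can be sketched as a small decision function. This is only an illustration of the proposed rule, not dbt-core's implementation: `FreshnessStrategy`, `choose_freshness_strategy`, and the `adapter_supports_metadata` parameter are hypothetical names introduced for this example.

```python
import warnings
from enum import Enum
from typing import Optional


class FreshnessStrategy(Enum):
    """Hypothetical labels for the three outcomes described in the list above."""

    LOADED_AT_QUERY = "loaded_at_query"  # run select max(loaded_at_field) ...
    METADATA = "metadata"                # read the warehouse metadata table
    UNAVAILABLE = "unavailable"          # nothing can be computed; warn


def choose_freshness_strategy(
    loaded_at_field: Optional[str],
    adapter_supports_metadata: bool,
) -> FreshnessStrategy:
    """Pick how freshness would be determined for one source (assumed logic)."""
    if loaded_at_field:
        # An explicit column always wins: use the SQL query.
        return FreshnessStrategy.LOADED_AT_QUERY
    if adapter_supports_metadata:
        # No column given, but the adapter can read metadata tables.
        return FreshnessStrategy.METADATA
    # No column and no metadata support: warn instead of failing.
    warnings.warn(
        "loaded_at_field is not set and this adapter does not support "
        "metadata-based freshness"
    )
    return FreshnessStrategy.UNAVAILABLE
```

Returning a value and warning in the last branch, rather than raising, mirrors the intent that omitting loaded_at_field on an unsupported adapter should not break a run.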

A significant benefit is that this approach doesn't rely on a specific column being consistently available, so it works more generically across sources.

Acceptance criteria

  • Add a new static boolean property to the base adapter called METADATA_FRESHNESS_SUPPORT. It will default to False.
  • When an adapter implementation overrides this property and sets it to True, and the user has requested that freshness for a source be determined via metadata, dbt-core will look for a macro called get_relation_last_modified(information_schema, relations). The new macro will accept the name of the information schema and a List[BaseRelation]. It will issue and return the result of a query with the following output columns: database, schema, identifier, last_modified, and snapshotted_at. By accepting a list, we will allow core to batch metadata-freshness requests in the future.
  • In the YML definition of a source, freshness via metadata is requested by omitting loaded_at_field from the source's freshness configuration.
  • If the user omits loaded_at_field while using an adapter without metadata support, we will raise a warning.
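
As a rough sketch of the contract described in these criteria, the adapter flag and the shape of one result row could look like the following. The class names, the `LastModifiedRow` dataclass, and `ages_in_seconds` are hypothetical and exist only for this example; only METADATA_FRESHNESS_SUPPORT and the five column names come from the criteria above. The sketch also assumes that a source's age is the gap between snapshotted_at and last_modified.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Tuple


class BaseAdapterSketch:
    # Stand-in for the base adapter: metadata freshness is off by default.
    METADATA_FRESHNESS_SUPPORT: bool = False


class MetadataAdapterSketch(BaseAdapterSketch):
    # An adapter that opts in by overriding the static property.
    METADATA_FRESHNESS_SUPPORT = True


@dataclass
class LastModifiedRow:
    """One row with the output columns listed for get_relation_last_modified."""

    database: str
    schema: str
    identifier: str
    last_modified: datetime
    snapshotted_at: datetime


def ages_in_seconds(rows: List[LastModifiedRow]) -> Dict[Tuple[str, str, str], float]:
    """Map each relation to its age, assumed to be snapshotted_at - last_modified."""
    return {
        (row.database, row.schema, row.identifier): (
            row.snapshotted_at - row.last_modified
        ).total_seconds()
        for row in rows
    }
```

For a relation last modified at 11:00 UTC and snapshotted at 12:00 UTC on the same day, `ages_in_seconds` reports 3600.0 seconds. Taking a list of rows rather than a single row matches the stated goal of letting core batch metadata-freshness requests later.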

Impact to Other Teams

Impacts adapters

Will backports be required?

No

Context

No response

@graciegoheen graciegoheen added the user docs [docs.getdbt.com] Needs better documentation label Sep 25, 2023
@github-actions github-actions bot changed the title from "[Implementation] allow source freshness to be evaluated from warehouse metadata tables, instead of running select max(loaded_at_field) ... SQL queries" to "[CT-3149] [Implementation] allow source freshness to be evaluated from warehouse metadata tables, instead of running select max(loaded_at_field) ... SQL queries" Sep 25, 2023
@graciegoheen graciegoheen added Impact: Adapters enhancement New feature or request labels Sep 25, 2023
@graciegoheen
Contributor Author

graciegoheen commented Oct 9, 2023

Current behavior: source freshness is “turned off” by default, even when you’ve added an explicit freshness: config. The only way to “turn it on” is to explicitly configure a loaded_at_field:.

New behavior: when you add a freshness: config to a source, we will calculate freshness for that source.

I am in favor of moving forward with this implementation. This is a good change. Our current behavior is nonintuitive.

There are two ways this could affect customers’ freshness checks:

  • we calculate freshness for sources (via metadata tables), where they actually didn’t want to calculate freshness -> this doesn’t concern me because this is additive
  • we return an error because they’re using a warehouse that doesn’t support metadata tables -> what if we returned a warning instead of an error so this was non-breaking?

I believe this is niche behavior with a clear workaround (setting freshness: null for sources you don’t want to get freshness for).
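
To make the behavior change concrete, the old and new rules for whether dbt attempts a freshness check can be written as two predicates. This is a simplified model for illustration, not dbt-core code: both function names are hypothetical, a source's freshness: config is represented as a dict, and freshness: null is represented as None.

```python
from typing import Optional


def freshness_attempted_old(
    freshness: Optional[dict], loaded_at_field: Optional[str]
) -> bool:
    """Old rule: a freshness config is ignored unless loaded_at_field is also set."""
    return freshness is not None and loaded_at_field is not None


def freshness_attempted_new(
    freshness: Optional[dict], loaded_at_field: Optional[str]
) -> bool:
    """New rule: any non-null freshness config means freshness is attempted."""
    # loaded_at_field no longer gates the check; it only selects how it runs.
    return freshness is not None


# Example config values (assumed shape) for a source with a freshness block.
warn_after_12h = {"warn_after": {"count": 12, "period": "hour"}}
```

With `warn_after_12h` and no loaded_at_field, the old predicate is False and the new one is True, which is the additive case described above. Passing None for the freshness config (the freshness: null workaround) makes both predicates False.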

Because we had previously documented this behavior:

  • we need to call this out as a behavior change in our 1.7 migration guide
  • we need to provide guidance for folks who were previously relying on this behavior - they will have to go in and explicitly mark which sources they don’t want to calculate freshness for by setting freshness: null

@peterallenwebb
Contributor

For the benefit of reviewers and future historians, I've revised the issue description to reflect the previous discussion.

2 participants