Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-3515] [Feature] Source freshness: allow loaded_at_field set to null at table level to overwrite default value set at the source level #9320

Closed
2 tasks done
Tracked by #9425
kokorin opened this issue Dec 29, 2023 · 6 comments · Fixed by #10065
Assignees
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors
Milestone

Comments

@kokorin
Copy link

kokorin commented Dec 29, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

While defining source freshness we use 2 properties: freshness and loaded_at_field.
Value resolution is different (from hierarchy perspective) for these 2 properties:

When freshness is set to None at table level (using freshness: or freshness: null) it overwrites freshness set at source level, but loaded_at_field set to None at table level doesn't overwrite value set at source level.

Expected Behavior

loaded_at_field set to None at table level overwrites value set at source level

Steps To Reproduce

version: 2
sources:
  - name: source_name
    database: db_name
    schema: schema_name
    freshness:
      warn_after:
        count: 1
        period: day
      error_after:
        count: 4
        period: day
    loaded_at_field: loaded_at
    tables:
      - name: loaded_at_is_overwritten
        loaded_at_field: xxx
      - name: loaded_at_is_NOT_overwritten
        loaded_at_field: null
      - name: loaded_at_is_NOT_overwritten2
        loaded_at_field: 
      - name: loaded_at_is_NOT_overwritten3
        loaded_at_field: ~
      - name: freshness_disabled_OK
        freshness: null

Relevant log output

No response

Environment

- OS: Windows/Linux
- Python: 3.11
- dbt-core: 1.7.4
- dbt-snowflake: 1.7.1

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@kokorin kokorin added bug Something isn't working triage labels Dec 29, 2023
@github-actions github-actions bot changed the title [Bug] Source freshness: loaded_at_field set to null at table level doesn't overwrite value set at source level [CT-3515] [Bug] Source freshness: loaded_at_field set to null at table level doesn't overwrite value set at source level Dec 29, 2023
@kokorin
Copy link
Author

kokorin commented Dec 29, 2023

Here is DBT project (using sqlite) which shows the error:
yaml_null_error.zip

Steps to reproduce:

  1. unpack
  2. pip install -r requirements.txt
  3. dbt seed
  4. dbt source freshness

Sources yaml (present in the project):

version: 2
sources:
  - name: sources_db
    database: test
    schema: main
    freshness:
      warn_after:
        count: 1
        period: day
      error_after:
        count: 4
        period: day
    loaded_at_field: absent
    tables:
      - name: freshness_fails
        identifier: example
        loaded_at_field: null
        description: |
          loaded_at_field is set to null and thus DBT should use warehouse metadata (at least try)
          Notice error message:
          16:07:56    Runtime Error in source freshness_fails (models\sources.yml)
            Database Error
              no such column: absent
      - name: freshness_skipped
        identifier: example
        freshness: null

@dbeatty10 dbeatty10 self-assigned this Jan 2, 2024
@dbeatty10
Copy link
Contributor

Thanks for reaching out @kokorin !

The crux is that it's tricky to within the dbt implementation details to determine the difference between an explicit null / none and an implicit one. Our current implementation treats them both as the same thing, and I don't think we're inclined to change it at this point.

Either way, it doesn't look like dbt is behaving erroneously here, so I'm going to re-categorize this as a feature request.

Have you tried either of the following instead?

Option 1

Two sources. One for freshness from metadata and the other based off column names.

  # source tables with source freshness from metadata
  - name: sources_with_metadata_based_freshness
    database: db_name
    schema: schema_name
    freshness:
      warn_after:
        count: 1
        period: day
      error_after:
        count: 4
        period: day
    tables:
      - name: table_1
      - name: table_2

  # source tables with source freshness based off column names
  - name: sources_with_column_based_freshness
    database: db_name
    schema: schema_name
    freshness:
      warn_after:
        count: 1
        period: day
      error_after:
        count: 4
        period: day
    loaded_at_field: loaded_at  # default at the source level
    tables:
      - name: table_3
      - name: table_4
      - name: table_5
        loaded_at_field: xxx  # override of source-level default

Option 2

One source. loaded_at_field is defined under each table name instead of having a default at the source level.

  - name: sources_with_metadata_freshness
    database: db_name
    schema: schema_name
    freshness:
      warn_after:
        count: 1
        period: day
      error_after:
        count: 4
        period: day

    # no source-level default for loaded_at_field

    tables:

      # source tables with source freshness from metadata
      - name: table_1
      - name: table_2

      # source tables with standardized loaded_at_field
      - name: table_3
        loaded_at_field: loaded_at
      - name: table_4
        loaded_at_field: loaded_at

      # other source tables with custom loaded_at_field
      - name: table_5
        loaded_at_field: xxx

@dbeatty10 dbeatty10 added enhancement New feature or request awaiting_response and removed bug Something isn't working triage labels Jan 2, 2024
@dbeatty10 dbeatty10 removed their assignment Jan 2, 2024
@dbeatty10 dbeatty10 changed the title [CT-3515] [Bug] Source freshness: loaded_at_field set to null at table level doesn't overwrite value set at source level [CT-3515] [Feature] Source freshness: allow loaded_at_field set to null at table level to overwrite default value set at the source level Jan 2, 2024
@kokorin
Copy link
Author

kokorin commented Jan 3, 2024

@dbeatty10 thank you for the reply and for suggestions. I thought about the same options.

Either way, it doesn't look like dbt is behaving erroneously here, so I'm going to re-categorize this as a feature request.

Sorry, but I can't agree. I still consider this as bug because behavior is different for freshness and loaded_at_field properties.
Freshness set to explicit null at table level overwrites freshness at source level, while loaded_at_field does not.
The problem is that those 2 properties are often used together and should behave similar, otherwise it's not clear for end users.

I can work on pull request if you consider it possible.

@dbeatty10
Copy link
Contributor

... behavior is different for freshness and loaded_at_field properties.
Freshness set to explicit null at table level overwrites freshness at source level, while loaded_at_field does not.

Thanks for pointing this out @kokorin 👍 Agreed that it's less clear for end users if freshness and loaded_at_field don't behave similarly to each other.

I'm going to label this as "refinement" for us to 1) reproduce that they are behaving differently and 2) determine how they should behave going forward.

@dbeatty10 dbeatty10 added the Refinement Maintainer input needed label Jan 3, 2024
@kokorin
Copy link
Author

kokorin commented Jan 3, 2024

@dbeatty10 please notice that I have attached zip archive to this ticket with minimal project reproducing the issue

@dbeatty10
Copy link
Contributor

I can work on pull request if you consider it possible.

That would be awesome if you work on a pull request for this @kokorin 🏆

@graciegoheen and I discussed briefly offline, and we're planning to keep this labeled as a feature request and not backport it to dbt-core 1.7.

Acceptance criteria

  • To align with freshness, when loaded_at_field is set with an explicit null at the table level, it should overwrite loaded_at_field which is set at the source level
  • When loaded_at_field is null, then it uses metadata-based source freshness checks

@dbeatty10 dbeatty10 added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors and removed Refinement Maintainer input needed labels Jan 3, 2024
@dbeatty10 dbeatty10 removed their assignment Jan 3, 2024
@martynydbt martynydbt modified the milestones: v1.8, v1.9 Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants