Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CumulativeTypeParams & sub-daily granularities to semantic manifest #10350

Merged
merged 11 commits into from
Jul 1, 2024

Conversation

courtneyholcomb
Copy link
Contributor

@courtneyholcomb courtneyholcomb commented Jun 20, 2024

resolves #10360

CumulativeTypeParams

DSI has recently moved fields related to cumulative metrics (specifically, window and grain_to_date) from type_params to type_params.cumulative_type_params. This creates better organization in the metric spec, and also adds a new field called type_params.cumulative_type_params.period_agg which is used to re-aggregate cumulative metrics for non-default granularities. This PR adds core support for the new fields.

Note that the old fields type_params.window and type_params.grain_to_date will still work, so this is not a breaking change. There is a transformation that runs in MetricFlow to copy the old fields into the new fields if the new fields are not set, and sets the default period_agg value to PeriodAggregation.FIRST if it's not already set in the manifest.

Soon, setting the old fields will trigger a validation warning prompting users to move those values to the new fields. The plan is to eventually deprecate those fields.

Sub-Daily Granularities

Since the new sub-daily granularity options are included in the DSI version upgrade, this PR also adds support for those.

Notes

  • The related schema docs update is here, and I'll need to merge that to get CI passing.
  • I noticed a couple of schema changes were generated that did not appear related to my changes. I'm assuming someone else's changes never got updated in the schema.
  • This is my first time making a schema update here, so please LMK if I missed anything or handled something incorrectly!

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

@cla-bot cla-bot bot added the cla:yes label Jun 20, 2024
@dbt-labs dbt-labs deleted a comment from github-actions bot Jun 20, 2024
@courtneyholcomb courtneyholcomb added the artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking label Jun 20, 2024
@dbt-labs dbt-labs deleted a comment from github-actions bot Jun 25, 2024
@courtneyholcomb courtneyholcomb added the semantic Issues related to the semantic layer label Jun 25, 2024
@dbt-labs dbt-labs deleted a comment from codecov bot Jun 25, 2024
Copy link

codecov bot commented Jun 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.77%. Comparing base (ca163c3) to head (96b79a8).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10350      +/-   ##
==========================================
+ Coverage   88.75%   88.77%   +0.01%     
==========================================
  Files         180      180              
  Lines       22486    22517      +31     
==========================================
+ Hits        19958    19989      +31     
  Misses       2528     2528              
Flag Coverage Δ
integration 86.03% <100.00%> (+0.01%) ⬆️
unit 62.16% <39.39%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@courtneyholcomb courtneyholcomb added the user docs [docs.getdbt.com] Needs better documentation label Jun 25, 2024
@Jstein77
Copy link

Pulling out the yaml spec here:

    - name: cumulative_orders
      label: Rolling total of orders (all time)
      type: cumulative
      type_params:
        measure: num_orders
        cumulative_type_params:
          period_agg: last

@courtneyholcomb courtneyholcomb changed the title Add CumulativeTypeParams to semantic manifest Add CumulativeTypeParams & sub-daily granularities to semantic manifest Jun 26, 2024
@courtneyholcomb courtneyholcomb changed the title Add CumulativeTypeParams & sub-daily granularities to semantic manifest Add CumulativeTypeParams to semantic manifest Jun 26, 2024
@courtneyholcomb courtneyholcomb changed the title Add CumulativeTypeParams to semantic manifest Add CumulativeTypeParams & sub-daily granularities to semantic manifest Jun 26, 2024
}
],
"default": null
"type": "array",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this change is - I don't think it's related to my changes, might be that someone else did not update the schema in their commit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I'm not sure either. I went on an archeological mission, and it looks like it was added in #10096. That PR adds primary_key to the model resource. However, it appears it was added as a list of strings, never optional. To me, it looks like a mistake, and I think this removal of null is okay 🤷 @dave-connors-3 do you have more context?

@@ -17930,26 +18188,7 @@
"type": "string"
},
"resource_type": {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change also seems unrelated to my PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a fix. Basically, the resource_type of a saved query isn't actually an enum, it's should always be saved_query

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what's happening is we aren't running the scripts/collect-artifact-schema.py as often as we should. I'll bring this up to the rest of the core team. Maybe we can have a CI check for it.

@courtneyholcomb courtneyholcomb marked this pull request as ready for review June 26, 2024 21:22
@courtneyholcomb courtneyholcomb requested review from a team as code owners June 26, 2024 21:22
@courtneyholcomb courtneyholcomb requested review from vadim82 and removed request for a team June 26, 2024 21:22
@github-actions github-actions bot added the community This PR is from a community member label Jun 26, 2024
Copy link
Contributor

@QMalcolm QMalcolm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thank you for doing this work ❤️ I'm not sure on everything happening in manifest/v12.json but it looks reasonable.

There is one thing we probably should change though. In the DSI counterpart (dbt-labs/dbt-semantic-interfaces#288) we discussed the migration plan. I think from that we are missing 2.b where in we populate the new paths from the old paths (iff the new paths aren't specified and the old paths are)

"default": "1.8.0a1"
"default": "1.9.0a1"
Copy link
Contributor

@QMalcolm QMalcolm Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MichelleArk / @ChenyuLInx This looks correct to me, but I don't know if there'd be any fall out from doing so. This seem okay?

}
],
"default": null
"type": "array",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I'm not sure either. I went on an archeological mission, and it looks like it was added in #10096. That PR adds primary_key to the model resource. However, it appears it was added as a list of strings, never optional. To me, it looks like a mistake, and I think this removal of null is okay 🤷 @dave-connors-3 do you have more context?

@@ -17930,26 +18188,7 @@
"type": "string"
},
"resource_type": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a fix. Basically, the resource_type of a saved query isn't actually an enum, it's should always be saved_query

@@ -17930,26 +18188,7 @@
"type": "string"
},
"resource_type": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what's happening is we aren't running the scripts/collect-artifact-schema.py as often as we should. I'll bring this up to the rest of the core team. Maybe we can have a CI check for it.

Comment on lines 357 to 360
),
cumulative_type_params=self._get_optional_cumulative_type_params(
type_params.cumulative_type_params
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either here or in _get_optional_cumulative_type_params we want to fall back on the deprecated values type_params.window and type_params.grain_to_date to populate the new values type_params.cumulative_type_params.window and type_params.cululative_type_params.grain_to_date respectively, if the new paths haven't been specified but the old paths have.

@courtneyholcomb
Copy link
Contributor Author

There is one thing we probably should change though. In the DSI counterpart (dbt-labs/dbt-semantic-interfaces#288) we dbt-labs/dbt-semantic-interfaces#288 (review). I think from that we are missing 2.b where in we populate the new paths from the old paths (iff the new paths aren't specified and the old paths are)

@QMalcolm ah ok! We are running the transformation in MF that should handle this. I didn't realize we would need to handle that in core, too. What's the reason for needing it in both places?

@QMalcolm
Copy link
Contributor

QMalcolm commented Jun 27, 2024

@QMalcolm ah ok! We are running the transformation in MF that should handle this. I didn't realize we would need to handle that in core, too. What's the reason for needing it in both places?

In a vacuum, if all we care about is the functionality working, then that would be correct. However, that isn't our only goal. Our additional goal is to move in the direction of deprecating and removing the old paths. In order for DSI to remove the old paths, we first need to reasonably guarantee that nearly all semantic manifests being consumed have the new paths set when applicable. There are four possible states for the new paths and old paths:

  1. Old paths not set, new paths not set
  2. Old paths not set, new paths set
  3. Old paths set, new paths not set
  4. Old paths set, new paths set

Currently, any core produced semantic manifest will produce (3). In order for DSI to ever remove the old paths (and the related transformations), we have to guarantee that semantic manifests being consumed by the MetricFlow / SL don't have state (3). So, although we don't technically need to do it now, we need to do it eventually. However, the longer we wait to do it, the longer DSI and Metricflow have to handle both states. Thus, doing it now means the tech debt we incur can be shorter lived and arguably more contained.

@QMalcolm QMalcolm removed the community This PR is from a community member label Jun 27, 2024
@courtneyholcomb
Copy link
Contributor Author

@QMalcolm Updated, and added tests for the old paths too.

@mirnawong1
Copy link
Contributor

Copy link
Contributor

@QMalcolm QMalcolm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ❤️ Thanks again!

Comment on lines +423 to +435
def test_cumulative_metric(self, project):
# initial parse
runner = dbtRunner()
result = runner.invoke(["parse"])
assert result.success
assert isinstance(result.result, Manifest)

manifest = get_manifest(project.project_root)
metric_ids = set(manifest.metrics.keys())
expected_metric_ids_to_cumulative_type_params = {
"metric.test.weekly_visits": CumulativeTypeParams(
window=MetricTimeWindow(count=7, granularity=TimeGranularity.DAY),
period_agg=PeriodAggregation.AVERAGE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could probably be converted to unit test. I don't think it's necessary to do that as part of this PR though. I've been working on being able to unit test the reader clases in schema_yaml_readers.py, and it should be possible, but I think there is one kink to still work out first.

@QMalcolm
Copy link
Contributor

QMalcolm commented Jul 1, 2024

Hmmm so the artifact_minor_upgrade label skips Check Artifact Changes CI but not Artifact Schema Check. I should double check what that means

@courtneyholcomb
Copy link
Contributor Author

Hmmm so the artifact_minor_upgrade label skips Check Artifact Changes CI but not Artifact Schema Check. I should double check what that means

@QMalcolm I thought that was what I need the schema docs change for? Linked in the PR description but also here! dbt-labs/schemas.getdbt.com#42

@QMalcolm
Copy link
Contributor

QMalcolm commented Jul 1, 2024

@courtneyholcomb I looked into the failing check. Basically, if it fails, we know that this change impacts the cloud artifacts team. I've added the label impact: CA which hopefully notifies them somehow. I'm also figuring out the appropriate way to notify them in slack.

@QMalcolm
Copy link
Contributor

QMalcolm commented Jul 1, 2024

@courtneyholcomb Additionally, in talking to the core team about the check. It is specifically not blocking, and we are good to proceed.

@courtneyholcomb
Copy link
Contributor Author

@courtneyholcomb I looked into the failing check. Basically, if it fails, we know that this change impacts the cloud artifacts team. I've added the label impact: CA which hopefully notifies them somehow. I'm also figuring out the appropriate way to notify them in slack.

@QMalcolm awesome thank you! Merging now

@courtneyholcomb
Copy link
Contributor Author

@courtneyholcomb I looked into the failing check. Basically, if it fails, we know that this change impacts the cloud artifacts team. I've added the label impact: CA which hopefully notifies them somehow. I'm also figuring out the appropriate way to notify them in slack.

@QMalcolm awesome thank you! Merging now

@QMalcolm just kidding, I don't have permission to merge 🙃 Would you be able to merge for me?

@QMalcolm QMalcolm merged commit a94027a into main Jul 1, 2024
66 of 68 checks passed
@QMalcolm QMalcolm deleted the court/cumulative-type-params branch July 1, 2024 21:54
courtneyholcomb added a commit to dbt-labs/dbt-semantic-interfaces that referenced this pull request Jul 1, 2024
### Description
Enable validations for `cumulative_type_params` and bump to a new patch
version.
This version will be added to dbt-core after [this
PR](dbt-labs/dbt-core#10350) merges to ensure
these fields are validated once they are enabled.

### Checklist

- [x] I have read [the contributing
guide](https://github.com/dbt-labs/dbt-semantic-interfaces/blob/main/CONTRIBUTING.md)
and understand what's expected of me
- [x] I have signed the
[CLA](https://docs.getdbt.com/docs/contributor-license-agreements)
- [x] This PR includes tests, or tests are not required/relevant for
this PR
- [ ] I have run `changie new` to [create a changelog
entry](https://github.com/dbt-labs/dbt-semantic-interfaces/blob/main/CONTRIBUTING.md#adding-a-changelog-entry)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking cla:yes Impact: CA semantic Issues related to the semantic layer user docs [docs.getdbt.com] Needs better documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Support cumulative_type_params in semantic manifest
4 participants