DBT as the source of truth for cube definitions #3611
Comments
useful feature
@ethanve Hey Ethan! Thanks for kicking off this discussion! From the design spec, it seems dimensions will be defined per metric. At that point, we'd need some sort of projection to cubes. How do you envision this conversion? Create a cube per metric, or merge all metrics with all dimensions into a single cube for the same model?
@ethanve We're starting to work on it. To follow up on the previous question: I guess we're going to merge all metrics within a model by default and then optionally provide metric cubes. So, to refer to the extended example of the original issue:

# models/marts/product/schema.yml
version: 2
models:
  - name: dim_customers
    ...
metrics:
  - name: new_customers
    label: New Customers
    model: dim_customers
    description: "The number of paid customers who are using the product"
    type: count
    sql: user_id # superfluous here, but shown as an example
    timestamp: signup_date
    time_grains: [day, week, month]
    dimensions:
      - plan
      - country
    filters:
      - field: is_paying
        value: true
    meta: {}
  - name: churned_users
    label: Churned Users
    model: dim_customers
    description: "The number of churned users"
    type: count_distinct
    sql: user_id # superfluous here, but shown as an example
    timestamp: churned_at
    time_grains: [day, week, month]
    dimensions:
      - plan
      - country
    meta: {}

in Cube would be converted to:

cube(`dim_customers`, {
  sql: `select * from dbt.dim_customers`,
  measures: {
    new_customers: {
      sql: `user_id`,
      type: `count`,
      description: `The number of paid customers who are using the product`,
      filters: [{
        sql: `${CUBE}.is_paying = true`
      }],
      meta: {}
    },
    churned_users: {
      sql: `user_id`,
      type: `countDistinct`,
      description: `The number of churned users`,
      meta: {}
    }
  },
  dimensions: {
    plan: {
      sql: `plan`,
      type: `string`
    },
    country: {
      sql: `country`,
      type: `string`
    },
    signup_date: {
      sql: `signup_date`,
      type: `time`
    },
    churned_at: {
      sql: `churned_at`,
      type: `time`
    }
  }
});

Optional metric cube would look like:

cube(`dim_customers_new_customers`, {
  sql: `select * from dbt.dim_customers`,
  measures: {
    new_customers: {
      sql: `user_id`,
      type: `count`,
      description: `The number of paid customers who are using the product`,
      filters: [{
        sql: `${CUBE}.is_paying = true`
      }],
      meta: {}
    }
  },
  dimensions: {
    plan: {
      sql: `plan`,
      type: `string`
    },
    country: {
      sql: `country`,
      type: `string`
    },
    signup_date: {
      sql: `signup_date`,
      type: `time`
    }
  }
});

In order to reference the dbt project:

const { DbtRepository } = require('@cubejs-backend/dbt');
module.exports = {
  repositoryFactory: ({ securityContext }) => new DbtRepository(`${process.env.DBT_PROJECT_PATH}/models/marts/product/schema.yml`, { metricCubes: true }),
};

To mix with other definitions, you can use the following in cube.js:

const { DbtRepository } = require('@cubejs-backend/dbt');
const { FileRepository } = require('@cubejs-backend/server-core');
module.exports = {
  repositoryFactory: ({ securityContext }) => [new DbtRepository(`${process.env.DBT_PROJECT_PATH}/models/marts/product/schema.yml`), new FileRepository()]
};

Additional info such as pre-aggregation definitions can then be provided as mix-ins:

// actual model definition is in dbt schema and we define only pre-aggregations here
cube(`dim_customers`, {
  preAggregations: {
    main: {
      measures: [new_customers],
      dimensions: [plan, country],
      timeDimension: signup_date,
      granularity: `day`
    }
  }
});

Would love to hear opinions on this design!
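The mapping proposed in the comment above (merge all metrics of a model into one cube, optionally emit one cube per metric) can be sketched in plain JavaScript. This is only an illustration under stated assumptions: `metricsToCubes`, `toMeasure`, `toDimensions` and `MEASURE_TYPES` are hypothetical names and not part of any Cube package, the input is assumed to be the dbt metrics list already parsed into objects, filters are assumed to be simple equality checks, and the `dbt.` schema prefix is taken from the example.

```javascript
// Hypothetical sketch, not a Cube API: turns parsed dbt metric definitions
// into plain objects shaped like the cube definitions shown in this thread.

// dbt metric type -> Cube measure type (only the types used in the example;
// anything else is passed through unchanged).
const MEASURE_TYPES = { count: 'count', count_distinct: 'countDistinct' };

function toMeasure(metric) {
  const measure = {
    sql: metric.sql,
    type: MEASURE_TYPES[metric.type] || metric.type,
    description: metric.description,
    meta: metric.meta || {},
  };
  if (metric.filters && metric.filters.length > 0) {
    // Assumption: every filter is an equality check on a column of the model.
    measure.filters = metric.filters.map((f) => ({
      sql: '${CUBE}.' + f.field + ' = ' + f.value,
    }));
  }
  return measure;
}

function toDimensions(metrics) {
  // Union of the dimensions of all given metrics, plus each metric's
  // timestamp column as a time dimension.
  const dimensions = {};
  for (const metric of metrics) {
    for (const name of metric.dimensions || []) {
      dimensions[name] = { sql: name, type: 'string' };
    }
    if (metric.timestamp) {
      dimensions[metric.timestamp] = { sql: metric.timestamp, type: 'time' };
    }
  }
  return dimensions;
}

function metricsToCubes(metrics, { metricCubes = false } = {}) {
  // Group metrics by the model that powers them.
  const byModel = new Map();
  for (const metric of metrics) {
    if (!byModel.has(metric.model)) byModel.set(metric.model, []);
    byModel.get(metric.model).push(metric);
  }

  const cubes = {};
  for (const [model, group] of byModel) {
    const sql = `select * from dbt.${model}`; // schema name assumed from the example
    const measures = {};
    for (const metric of group) measures[metric.name] = toMeasure(metric);
    // Default: one merged cube per model.
    cubes[model] = { sql, measures, dimensions: toDimensions(group) };

    if (metricCubes) {
      // Optional: one extra cube per metric, named <model>_<metric>.
      for (const metric of group) {
        cubes[`${model}_${metric.name}`] = {
          sql,
          measures: { [metric.name]: toMeasure(metric) },
          dimensions: toDimensions([metric]),
        };
      }
    }
  }
  return cubes;
}

// The two metrics from the schema.yml example, as parsed objects.
const metrics = [
  {
    name: 'new_customers', model: 'dim_customers', type: 'count', sql: 'user_id',
    description: 'The number of paid customers who are using the product',
    timestamp: 'signup_date', dimensions: ['plan', 'country'],
    filters: [{ field: 'is_paying', value: true }], meta: {},
  },
  {
    name: 'churned_users', model: 'dim_customers', type: 'count_distinct', sql: 'user_id',
    description: 'The number of churned users',
    timestamp: 'churned_at', dimensions: ['plan', 'country'], meta: {},
  },
];

const cubes = metricsToCubes(metrics, { metricCubes: true });
console.log(JSON.stringify(cubes, null, 2));
```

With `metricCubes` left at its default of `false`, only the merged `dim_customers` cube is produced, which matches the default behaviour described in the comment.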
@paveltiunov the above API looks really clean! I'm not enough of a dbt expert to know exactly how to make this mapping, but the design looks strong. The one thing I'm unsure about is the ability to define metrics per model. It seems that dbt doesn't support this for metrics yet, but it is an open question in their docs: https://next.docs.getdbt.com/docs/building-a-dbt-project/metrics
@ethanve It looks like one of the fields required by dbt to create a metric is the model "that powers this metric". I know the metrics are experimental in dbt at this point, but it looks like they are requiring a metric to point to a particular model. I'm guessing the configuration inheritance you mentioned is more about metrics inheriting how they should be persisted in the data warehouse, should they be incrementally loaded or fully refreshed on each run, etc.
@paveltiunov apologies for the late reply. I would initially test out merged cubes per model. That seems the most logical.
We've merged the very first version of the dbt integration. We ended up with a schema extension approach, which can be used as follows:

import Dbt from '@cubejs-backend/dbt-schema-extension';

asyncModule(async () => {
  // toExtend can be used to select cubes for extending. Other cubes will be evaluated unconditionally.
  const { MyNewProjectOrdersFiltered } = await Dbt.loadMetricCubesFromDbtProject('path/to/dbt/project', { toExtend: ['MyNewProjectOrdersFiltered'] });

  cube('OrdersFiltered', {
    extends: MyNewProjectOrdersFiltered
  });
});

With dbt cloud:

import Dbt from '@cubejs-backend/dbt-schema-extension';

asyncModule(async () => {
  await Dbt.loadMetricCubesFromDbtCloud(/* jobId */ 12345, /* authToken */ 'abcde.abc');
});

Looking forward to community feedback regarding the API here! Work to be done is filters support and
This is absolutely fantastic and very timely. Will be checking this out over the next few days!
@paveltiunov does it mean that each model/cube needs its Dbt metrics separately?
@khozzy You can configure it using
as I added a dbt.js with the content " asyncModule(async () => { cube('OrdersFiltered', { |
Did you ever find a solution to the issue: Unsupported db type: undefined
Note: dbt hasn't fully fleshed out this feature yet, so the outcome is unclear; this issue is here to track it.
Is your feature request related to a problem? Please describe.
My organization heavily uses dbt to transform our raw data coming from third-party sources. Now that dbt has added metric definitions to their roadmap (dbt-labs/dbt-core#4071), it would be ideal to have dbt be the source of truth for all of my transformed data and the data catalog.
Describe the solution you'd like
dbt as a source for cube definitions (measures and dimensions). Realistically, this would leverage dbt's meta tag to add on additional metadata as well.