Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBT as the source of truth for cube definitions #3611

Closed
ethanve opened this issue Nov 2, 2021 · 13 comments
Closed

DBT as the source of truth for cube definitions #3611

ethanve opened this issue Nov 2, 2021 · 13 comments
Labels
enhancement New feature proposal Roadmap: 2021 Q4 Cube.js team roadmap for Q4 of 2021.

Comments

@ethanve
Copy link

ethanve commented Nov 2, 2021

Note: dbt hasn't fully fleshed out this feature, so the outcome is unclear and this issue to track.

Is your feature request related to a problem? Please describe.
My organization heavily uses dbt to transform our raw data coming from third-party sources. Now that dbt has added metric definitions to their roadmap (dbt-labs/dbt-core#4071) it would be ideal to have dbt the source of truth for all of my transformed data and the data catalog.

Describe the solution you'd like
dbt as a source for cube definitions (measures and dimensions). Realistically, this would leverage dbt's meta tag to add on additional metadata as well.

@ifokeev
Copy link

ifokeev commented Nov 2, 2021

useful feature

@paveltiunov
Copy link
Member

@ethanve Hey Ethan! Thanks for kicking off this discussion! From design spec, it seems dimensions will be defined per metric. At that point, we'd need some sort of projection to cubes. How do you envision this conversion? Create cube per metric or merge all metrics with all dimensions to a single cube for the same model?

@paveltiunov paveltiunov added the enhancement New feature proposal label Nov 3, 2021
@paveltiunov paveltiunov added the Roadmap: 2021 Q4 Cube.js team roadmap for Q4 of 2021. label Nov 15, 2021
@paveltiunov
Copy link
Member

paveltiunov commented Nov 15, 2021

@ethanve We're starting to work on it. To follow up on the previous question I guess we're going to merge all metrics within a model by default and then optionally provide metric cubes.

So to refer to the extended example of the original issue:

# models/marts/product/schema.yml

version: 2

models:
 - name: dim_customers
   ...

metrics:
  - name: new_customers
    label: New Customers
    model: dim_customers
    description: "The number of paid customers who are using the product"

    type: count
    sql: user_id # superflous here, but shown as an example

    timestamp: signup_date
    time_grains: [day, week, month]

    dimensions:
      - plan
      - country
    
    filters:
      - field: is_paying
        value: true

    meta: {}

  - name: churned_users
    label: Churned Users
    model: dim_customers
    description: "The number of churned users"

    type: count_distinct
    sql: user_id # superflous here, but shown as an example

    timestamp: churned_at
    time_grains: [day, week, month]

    dimensions:
      - plan
      - country

    meta: {}

in Cube would be converted to:

cube(`dim_customers`, {
  sql: `select * from dbt.dim_customers`,
  
  measures: {
    new_customers: {
      sql: `user_id`,
      type: `count`,
      description: `The number of paid customers who are using the product`,
      filters: [{
        sql: `${CUBE}.is_paying = true`
      }],
      meta: {}
    },
    churned_users: {
      sql: `user_id`,
      type: `countDistinct`,
      description: `The number of churned users`,
      meta: {}
    }
  },
  
  dimensions: {
    plan: {
      sql: `plan`,
      type: `string`
    },
    
    country: {
      sql: `country`,
      type: `string`
    },

    signup_date: {
      sql: `signup_date`,
      type: `time`
    },
    
    churned_at: {
      sql: `churned_at`,
      type: `time`
    }
  }
});

Optional metric cube would look like:

cube(`dim_customers_new_customers`, {
  sql: `select * from dbt.dim_customers`,
  
  measures: {
    new_customers: {
      sql: `user_id`,
      type: `count`,
      description: `The number of paid customers who are using the product`,
      filters: [{
        sql: `${CUBE}.is_paying = true`
      }],
      meta: {}
    }
  },
  
  dimensions: {
    plan: {
      sql: `plan`,
      type: `string`
    },
    
    country: {
      sql: `country`,
      type: `string`
    },

    signup_date: {
      sql: `signup_date`,
      type: `time`
    }
  }
});

In order to reference dbt project DbtRepository can be introduced as follows in cube.js:

const { DbtRepository } = require('@cubejs-backend/dbt');

module.exports = {
  repositoryFactory: ({ securityContext }) => new DbtRepository(`${proces.env.DBT_PROJECT_PATH}/models/marts/product/schema.yml`, { metricCubes: true }),
};

To mix with other definitions you can use in cube.js:

const { DbtRepository } = require('@cubejs-backend/dbt');
const { FileRepository } = require('@cubejs-backend/server-core');

module.exports = {
  repositoryFactory: ({ securityContext }) => [new DbtRepository(`${proces.env.DBT_PROJECT_PATH}/models/marts/product/schema.yml`), new FileRepository()]
};

Additional info such as pre-aggregation definitions can be provided as mix-ins then:

// actual model definition is in dbt schema and we define only pre-aggregations here
cube(`dim_customers`, {
  preAggregations: {
    main: {
      measures: [new_customers],
      dimensions: [plan, country],
      timeDimensions: signup_date,
      granularity: `day`
    }
  }
});

Would love to hear opinions on this design!

@ethanve
Copy link
Author

ethanve commented Nov 16, 2021

@paveltiunov the above API looks really clean! I'm not an expert in dbt to know exactly how to make this mapping but the above design looks strong. The one thing that I'm unsure about is the ability to define metrics per model. It seems that dbt doesn't support this yet for metrics but it is an open question in their docs:

https://next.docs.getdbt.com/docs/building-a-dbt-project/metrics

Should metrics inherit configurations from the models on which they are defined? Should it be possible to define metrics directly on models/columns, like tests?

@air-rick
Copy link

@ethanve It looks like one of the fields required by dbt to create a metric is the model "that powers this metric". I know the metrics are experimental in dbt at this point, but it looks like they are requiring a metric to point to a particular model.

I'm guessing the configuration inheritance you mentioned is more about metrics inheriting how they should be persisted in the data warehouse, should they be incrementally loaded or fully refreshed on each run, etc.

@paveltiunov
Copy link
Member

@ethanve @air-rick Would love to hear if you're leaning towards using metric cubes or merged cubes per model?

@ethanve
Copy link
Author

ethanve commented Dec 9, 2021

@paveltiunov apologies for the late reply. I would initially test out merged cubes per model. That seems the most logical

@paveltiunov
Copy link
Member

We've merged the very first version of dbt integration. We ended up with a schema extension approach which can be used as follows:

import Dbt from '@cubejs-backend/dbt-schema-extension';

asyncModule(async () => {
  const { MyNewProjectOrdersFiltered } = await Dbt.loadMetricCubesFromDbtProject('path/to/dbt/project, { toExtend: ['MyNewProjectOrdersFiltered'] }); // toExtend can be used to select cubes for extending. Other cubes will be evaluated unconditionally.

  cube('OrdersFiltered', {
    extends: MyNewProjectOrdersFiltered
  });
});

With dbt cloud:

import Dbt from '@cubejs-backend/dbt-schema-extension';

asyncModule(async () => {
  await Dbt.loadMetricCubesFromDbtCloud(/* jobId */ 12345, /* authToken */ 'abcde.abc');
});

Looking forward to community feedback regarding API here! Work to be done is filters support and schemaVersion utils to support dynamic schema updates.

@geastham
Copy link

geastham commented Mar 8, 2022

This is absolutely fantastic and very timely. Will be checking this out over the next few days!

@khozzy
Copy link

khozzy commented May 9, 2022

@paveltiunov does it mean that each model/cube needs its Dbt metrics separately?

@paveltiunov
Copy link
Member

@khozzy You can configure it using cubePerMetric option:

import Dbt from '@cubejs-backend/dbt-schema-extension';

asyncModule(async () => {
  await Dbt.loadMetricCubesFromDbtProject('path/to/dbt/project, cubePerMetric: true });
});

@xushuaitiger
Copy link

as I added a dbt.js with the content "
iimport Dbt from '@cubejs-backend/dbt-schema-extension';

asyncModule(async () => {
const { MyNewProjectOrdersFiltered } = await Dbt.loadMetricCubesFromDbtProject('path/to/dbt/project, { toExtend: ['MyNewProjectOrdersFiltered'] }); // toExtend can be used to select cubes for extending. Other cubes will be evaluated unconditionally.

cube('OrdersFiltered', {
extends: MyNewProjectOrdersFiltered
});
});" to /schema and run cubejs, an error accurred "Error: Unsupported db type: undefined ".what is the problem?

@shikoli-turnkeyafrica
Copy link

as I added a dbt.js with the content " iimport Dbt from '@cubejs-backend/dbt-schema-extension';

asyncModule(async () => { const { MyNewProjectOrdersFiltered } = await Dbt.loadMetricCubesFromDbtProject('path/to/dbt/project, { toExtend: ['MyNewProjectOrdersFiltered'] }); // toExtend can be used to select cubes for extending. Other cubes will be evaluated unconditionally.

cube('OrdersFiltered', { extends: MyNewProjectOrdersFiltered }); });" to /schema and run cubejs, an error accurred "Error: Unsupported db type: undefined ".what is the problem?

Did you ever find a solution to the issue: Unsupported db type: undefined

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature proposal Roadmap: 2021 Q4 Cube.js team roadmap for Q4 of 2021.
Projects
None yet
Development

No branches or pull requests

8 participants