Set configs in schema.yml files #2401

Closed
drewbanin opened this issue May 5, 2020 · 9 comments · Fixed by #3616
Labels
1.0.0 Issues related to the 1.0.0 release of dbt discussion enhancement New feature or request

@drewbanin
Contributor

drewbanin commented May 5, 2020

Describe the feature

Let's bring the schema.yml spec and node configs together! Right now, some configs can only be set for a node in the node's .sql file with config() or in the dbt_project.yml file. These configs include:

  • materialized
  • enabled
  • tags
  • database-specific configs (bind, sort, dist, etc)
  • etc

These configs cannot be set in the schema.yml file. At the same time, there are some configs that can only be set in a schema.yml file:

  • description
  • columns (tags, descriptions, tests, etc)
  • tags (for sources)

Instead, it would be ideal if these configs could be supplied in either place, with sensible and well-defined precedence rules.

Example usage:

-- models/my_model.sql
/*
  Configure the description and columns for a model inline in a config.
  Note: I'd probably advise against doing this, but it should be possible
*/

{{ config(
  description="This is my model description",
  columns=[
    {"name": "id", "description": "The primary key"}
  ]
) }}

select ...

# models/schema.yml
version: 2
models:
 - name: my_model
   config:
      tags: ["nightly"]
      materialized: view
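
To make "well-defined precedence rules" concrete, here is a minimal Python sketch of one possible rule: layers are merged in order and later layers win. The ordering used (dbt_project.yml, then schema.yml, then the in-file config()) and the function name `merge_configs` are assumptions for illustration; this issue does not settle on an order.

```python
from typing import Any, Dict


def merge_configs(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge config layers; later layers override earlier ones key by key."""
    merged: Dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Assumed precedence for this sketch, lowest first.
project_yml = {"materialized": "table", "tags": ["daily"]}  # dbt_project.yml
schema_yml = {"materialized": "view", "tags": ["nightly"]}  # schema.yml config block
in_file = {"description": "This is my model description"}   # config() in the .sql file

resolved = merge_configs(project_yml, schema_yml, in_file)
print(resolved)
```

This sketch overwrites whole values. A real rule might instead combine list-valued keys such as tags, which is one of the decisions a precedence spec would need to spell out.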

Some configs, like tags or freshness in a source, are set at the root level of the source. Today, this looks like:

version: 2
sources:
 - name: my_source
   tags: [my_tag]

We should continue to support this syntax, but it would also be good to allow a config-level key which scopes these configurations, e.g.:

version: 2
sources:
 - name: my_source
   config:
      tags: [my_tag]

For parsing simplicity, we should not allow the same config to be set at both the root level and the config level.
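
The "one level or the other, never both" rule could be enforced with a check along these lines. This is a sketch only: `DUAL_LEVEL_KEYS` and `check_source_entry` are hypothetical names, and the key set is taken from the tags and freshness examples in this issue, not from dbt's parser.

```python
from typing import Any, Dict

# Hypothetical set of keys accepted at either level (taken from the examples above).
DUAL_LEVEL_KEYS = {"tags", "freshness"}


def check_source_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return the effective dual-level settings, or raise if a key appears at both levels."""
    config = entry.get("config", {})
    duplicated = sorted(k for k in DUAL_LEVEL_KEYS if k in entry and k in config)
    if duplicated:
        raise ValueError(
            f"source '{entry.get('name')}' sets {duplicated} "
            "at both the root level and under config"
        )
    # Collect root-level values first, then config-level ones (they cannot collide here).
    effective = {k: entry[k] for k in DUAL_LEVEL_KEYS if k in entry}
    effective.update({k: v for k, v in config.items() if k in DUAL_LEVEL_KEYS})
    return effective


root_style = {"name": "my_source", "tags": ["my_tag"]}
config_style = {"name": "my_source", "config": {"tags": ["my_tag"]}}
both = {"name": "my_source", "tags": ["a"], "config": {"tags": ["b"]}}

print(check_source_entry(root_style))
print(check_source_entry(config_style))
```

Passing `both` raises a `ValueError`, so no merge rule between the two levels is ever needed.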

Additional context

I fear that this might be a really big departure from how parsing works today. Some fields, like description, are patched into model objects. Would this feature require us to drastically change how we parse nodes in dbt?

@drewbanin drewbanin added the enhancement New feature or request label May 5, 2020
@drewbanin
Contributor Author

@beckjake really curious to hear your rough thinking on how tractable a change like this would be

@beckjake
Contributor

beckjake commented May 5, 2020

Off the top of my head, there's nothing about this that's hard in the sense of creating difficult data ordering dependencies or whatever. Since configurations are more unified now, allowing them inside a config block in the patch object should be more straightforward. There is probably some extra work around combining config with patches and special-case handling of existing fields like tags.

We will experience some awkwardness around the database/schema/alias fields. Maybe this is the time to move them (and tags?) finally off the node object and onto node.config, and just set them up as properties on the node that forward to the config? I bet that change would be very breaking and a huge mess! Currently those fields can get out of sync, which I think is even more awkward, and I'd have to look at the generate_*_name macros to see how those all pan out. I guess we'd have to defer those to after patch application, which doesn't seem too much worse.
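
A rough Python sketch of the "properties that forward to the config" idea follows. The classes `Node` and `NodeConfig` here are simplified stand-ins written for illustration; they are not dbt's actual classes, and name generation via the generate_*_name macros is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeConfig:
    """Single place where these values are stored."""
    database: Optional[str] = None
    schema: Optional[str] = None
    alias: Optional[str] = None
    tags: List[str] = field(default_factory=list)


@dataclass
class Node:
    name: str
    config: NodeConfig = field(default_factory=NodeConfig)

    # Read-only properties that forward to the config, so the node
    # never holds a second copy that could drift out of sync.
    @property
    def database(self) -> Optional[str]:
        return self.config.database

    @property
    def schema(self) -> Optional[str]:
        return self.config.schema

    @property
    def alias(self) -> Optional[str]:
        return self.config.alias

    @property
    def tags(self) -> List[str]:
        return self.config.tags


node = Node(name="my_model")
# Simulate a schema.yml patch being applied to the config after parsing.
node.config.schema = "analytics"
node.config.tags.append("nightly")
print(node.schema, node.tags)
```

Because the node reads through to the config, a patch applied late is visible immediately, which is the out-of-sync problem described above.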

So: lots of work, but doable.

@jtcohen6
Contributor

What do we think about adding this to 1.0?

@drewbanin drewbanin added the 1.0.0 Issues related to the 1.0.0 release of dbt label Jun 15, 2020
@darrenhaken
Contributor

I would love this feature to be added.

I asked a question about meta being hierarchical in the DBT channel.
I'd like to be able to define metadata at a higher level (such as mart) and have it propagate.

An example would be owner like in your examples.

@drewbanin signposted this issue for me to comment on.

@DVAlexHiggs

DVAlexHiggs commented Jul 6, 2020

Not finding any information about 1.0.0 @jtcohen6
Is this the next release and you're skipping 0.18? Very exciting if so!

I'm hoping this feature will be in the next release, since the lack of it will be massively breaking for a lot of people, organisations and packages, dbtvault included.

Is there anything I can do to contribute? If so, where can I start?

@jtcohen6
Contributor

jtcohen6 commented Jul 6, 2020

The next minor release (Marian Anderson) will be 0.18.0. It will include several issues currently tagged with the Marian Anderson milestone, and we will cut it from the current default branch (dev/marian-anderson).

Over the course of the next few minor versions, we're chipping away at a stable 1.0 version of dbt (aspirational target: EOY). As part of that effort, we're tagging and compiling a list of issues that we definitely want to include in our first major release.

These issues require foundational changes to the current dbt codebase. Following @beckjake's comment above, reconciling node and resource-YML configs will require a significant lift in some potentially tricky spots. I'm appreciative of your interest in this change, and for offering to help; I just don't think we're in a place right now to help an external contributor work through something this knotty.

@jtcohen6 jtcohen6 added this to the Oh-Twenty-One milestone Apr 13, 2021
This was referenced May 17, 2021
@gshank
Contributor

gshank commented Jul 23, 2021

I've implemented configs in schema files for the objects that currently have config setting by FQN in dbt_project.yml. This includes models, seeds, snapshots, and analyses. In addition I've added a 'meta' field to NodeConfigs, so that people can use the hierarchical config setting in the dbt project file, but have not changed existing meta attributes (which occur in many places now) to use the new meta instead, because the behavior is substantially different. If we want to change that behavior, it will be done as a separate ticket.
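
For readers unfamiliar with hierarchical config by FQN, a small Python sketch of the idea: walk the nested project config along a node's fully qualified name and let deeper levels override shallower ones. The tree layout, the `+meta` key, and `resolve_meta` are assumptions made for this illustration and do not reproduce the code in #3616.

```python
from typing import Any, Dict, List


def resolve_meta(project_tree: Dict[str, Any], fqn: List[str]) -> Dict[str, Any]:
    """Collect '+meta' at each level along the FQN; deeper levels override."""
    meta: Dict[str, Any] = {}
    level: Any = project_tree
    meta.update(level.get("+meta", {}))
    for part in fqn:
        level = level.get(part)
        if not isinstance(level, dict):
            break  # no more project-level config below this point
        meta.update(level.get("+meta", {}))
    return meta


# Hypothetical project config: everything under marts gets an owner,
# and the finance subfolder overrides it.
project_models = {
    "my_project": {
        "marts": {
            "+meta": {"owner": "analytics"},
            "finance": {"+meta": {"owner": "finance-team", "pii": False}},
        }
    }
}

print(resolve_meta(project_models, ["my_project", "marts", "orders"]))
print(resolve_meta(project_models, ["my_project", "marts", "finance", "revenue"]))
```

The first call picks up the mart-level owner, and the second picks up the finance override, which is the kind of propagation asked about earlier in the thread.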

We will address moving existing "config" in schema files into the various Config objects separately. Each of these config attributes has different considerations. The current config object is used without restriction in the dbt_project file, but some of these schema file configs may not be appropriate for dbt_project config. There are also more code complications in that we would be changing the location of information and would need to make decisions on compatibility, artifacts, etc.

I did not make any changes to macro config, because there is no existing macro config object and no way to specify macro config in dbt_project because we don't have an FQN. So the code behavior is different enough that it should be addressed separately.

I did not make any changes to source config, because it uses a different codepath and currently only has one config attribute ('enabled'), so we might want to add additional configurability separately.

@gshank gshank mentioned this issue Jul 23, 2021
4 tasks
gshank added a commit that referenced this issue Aug 10, 2021
gshank added a commit that referenced this issue Aug 11, 2021
gshank added a commit that referenced this issue Aug 12, 2021
gshank added a commit that referenced this issue Aug 12, 2021
gshank added a commit that referenced this issue Aug 17, 2021
IS-Josh pushed a commit to IS-Josh/dbt that referenced this issue Sep 4, 2021
TeddyCr pushed a commit to TeddyCr/dbt that referenced this issue Sep 9, 2021
@avaitla

avaitla commented Sep 11, 2021

Does this mean you could add a unique test for a column in the model's .sql file rather than having to put it into the .yaml file? Do you have a quick example / doc to show how that might look?

We currently have to write a schema file like so:

schema.yml

version: 2

models:
  - name: stg_items
    columns: 
      - name: item_id
        tests:
          - unique

But I have a feeling that being able to put this in the .sql file for the model itself would be far simpler (see the pseudocode below):

items.sql

{{
    config(tests={"unique": "item_id"})
}}

See the comment here: https://docs.getdbt.com/reference/model-configs

[Screenshot attached: Screen Shot 2021-09-11 at 6 09 45 PM]

Do you happen to know where the docs are that inform what is possible in the sql config macro itself?

@jtcohen6
Contributor

@avaitla I'm working on the docs for this feature: dbt-labs/docs.getdbt.com#766

The short answer is no, or at least not yet, based on the implementation in #3616. It still won't be possible to define certain properties—description, tests, and columns are the big ones—within a model's .sql file. Eventually, we may make that possible, though tests would be the most difficult to implement: they create entirely separate nodes in the graph/manifest.
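
To illustrate why tests are the hard case, here is a toy Python sketch in which each column test in a schema file becomes its own node that depends on the model. The node IDs and dictionary shape are invented for this example and do not match dbt's real manifest format.

```python
from typing import Any, Dict, List


def build_nodes(schema_models: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Create one node per model plus one separate node per column test."""
    nodes: Dict[str, Dict[str, Any]] = {}
    for model in schema_models:
        model_id = f"model.{model['name']}"
        nodes[model_id] = {"resource_type": "model", "depends_on": []}
        for column in model.get("columns", []):
            # Tests are assumed to be plain strings such as "unique" in this sketch.
            for test in column.get("tests", []):
                test_id = f"test.{test}_{model['name']}_{column['name']}"
                nodes[test_id] = {"resource_type": "test", "depends_on": [model_id]}
    return nodes


schema_models = [
    {
        "name": "stg_items",
        "columns": [{"name": "item_id", "tests": ["unique"]}],
    }
]

nodes = build_nodes(schema_models)
for node_id, node in nodes.items():
    print(node_id, node)
```

A test declared inside a model's config() would have to produce that second node while the model itself is still being parsed, which is a different problem from setting a value on an existing node.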

iknox-fa pushed a commit that referenced this issue Feb 8, 2022
automatic commit by git-black, original commits:
  d5461cc
iknox-fa pushed a commit that referenced this issue Feb 8, 2022
automatic commit by git-black, original commits:
  d5461cc
  fd658ac

7 participants