arbitrary configuration #146

Closed
jthandy opened this issue Sep 15, 2016 · 6 comments
@jthandy
Member

jthandy commented Sep 15, 2016

configuration in dbt is incredibly powerful: it is what allows models to change their behavior without changing their code. currently all configuration is done in parameters that we have specified, but that actually limits the user in the power of configuration. we should allow for arbitrary configuration values.

there are multiple locations where config is currently specified. arbitrary config should be able to be specified at each of these locations:

  • in dbt_project.yml, any key provided that isn't a model or a config value we've defined should be saved as arbitrary config and scoped at the appropriate level in the tree.
  • in the model config itself, any key provided that isn't a config value we've defined should be saved as arbitrary config for that model.
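A minimal sketch of the "anything we haven't defined is arbitrary config" rule from the two bullets above. The set of built-in keys and the `split_config` helper are assumptions made for illustration, not dbt's actual internals:

```python
# Hypothetical set of built-in config keys; the real list lives inside dbt.
KNOWN_CONFIG_KEYS = {"enabled", "materialized", "sort", "dist"}


def split_config(raw):
    """Partition a flat config mapping into (known, arbitrary) dicts."""
    known = {k: v for k, v in raw.items() if k in KNOWN_CONFIG_KEYS}
    arbitrary = {k: v for k, v in raw.items() if k not in KNOWN_CONFIG_KEYS}
    return known, arbitrary


# "exclude_ip" is not a built-in key, so it is kept as arbitrary config.
known, arbitrary = split_config(
    {"materialized": "table", "exclude_ip": "192.168.1.1"}
)
```

Keeping the two groups separate would let dbt keep validating its own keys while passing everything else through to the model untouched.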

additionally, we should add one new way of specifying configuration. there should be a new folder in a dbt project at ./config (configurable within dbt_project.yml).

this folder should contain yml files that have arbitrary key/value stores in them, defined by the user. these key/value pairs can be called in models like {{key}}, or like {{nested.key}}, arbitrarily deeply (matching the nesting present in the yml file) within models. these config values should be accessible by all models.
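One way to picture how a reference like {{nested.key}} could be resolved against the parsed yml is a dotted-path lookup over nested dicts. This is only a sketch: the `lookup` helper is hypothetical, and real template rendering would go through the templating engine rather than a hand-rolled function.

```python
def lookup(config, dotted_key):
    """Resolve 'a.b.c' against nested dicts; raise KeyError if any part is missing."""
    value = config
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            raise KeyError(dotted_key)
        value = value[part]
    return value


# Stand-in for the parsed contents of a yml file in ./config
config = {"key": "top", "nested": {"key": "inner"}}

top_level = lookup(config, "key")
nested = lookup(config, "nested.key")
```

Raising on a missing key, rather than returning an empty value, is a design choice assumed here so that a typo in a model fails loudly at compile time.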

there will likely end up being numerous uses for this, but a primary one will be to allow for configuration of base model schema and table names within dependencies. projects that depend on them can override the default config to point base models to the appropriate schema and table names.

the order of configuration key/value resolution in the case of collisions should be an extension of what it already is today:

  • dependency config (all types, following the order of the project config defined below)
  • ./config folder
  • dbt_project.yml
  • in-model configs

in each instance, sources lower in this list should overwrite the ones above them (so in-model configs win over everything else). this is a feature, not a bug.
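The resolution order above can be sketched as a fold over the four sources, applied from lowest to highest precedence. Whether nested values should be merged deeply or replaced wholesale is not specified in this issue; the deep merge below is an assumption, and `deep_merge` and `resolve` are hypothetical names.

```python
def deep_merge(base, override):
    """Return a new dict where values in `override` win over `base`, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(layers):
    """Fold layers ordered from lowest to highest precedence into one config."""
    result = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


layers = [
    {"base_table": "atomic.events", "exclude_ip": None},  # dependency config
    {"exclude_ip": "192.168.1.1"},                        # ./config folder
    {"base_table": "snowplow.events"},                    # dbt_project.yml
    {},                                                   # in-model configs
]
resolved = resolve(layers)
```

Here the project file overrides the dependency's default `base_table`, which is the override use case described earlier for base models in dependencies.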

as a part of this issue, we should change the syntax in dbt_project.yml to be model-config, not models. we're not declaring models, we're declaring config for them.

@jthandy
Member Author

jthandy commented Sep 16, 2016

@drewbanin this issue is lit. i spent a lot more time thinking about it and think it is insanely useful, beyond what we specified. I think there are a lot of ways that you could add value with model configuration even within the model itself--it allows the publisher of a model to leave certain choices up to the user of the model, giving both sides quite a lot of control over how these common models behave in different environments.

here's one example: it's totally legitimate to define MRR with or without discounts applied. some people measure it one way, some measure it another. with a single config flag, a sql model could switch between the two definitions using a simple if/then statement.
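As a rough illustration of the MRR example, here is the same if/then expressed in Python rather than inside a model. The flag name `mrr_include_discounts`, the `subscriptions` table, and its columns are all invented for this sketch:

```python
def mrr_sql(config):
    """Build an MRR query whose definition depends on a single config flag."""
    # Hypothetical flag: when true, discounts are subtracted from the amount.
    if config.get("mrr_include_discounts", False):
        amount = "amount - discount"
    else:
        amount = "amount"
    return (
        f"select customer_id, sum({amount}) as mrr "
        "from subscriptions group by customer_id"
    )


with_discounts = mrr_sql({"mrr_include_discounts": True})
without_discounts = mrr_sql({})
```

The publisher of the model ships one query, and each project picks its definition of MRR through config instead of forking the SQL.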

this is amazing, and will unlock a TON of options.

@drewbanin
Contributor

this issue is lit

me

@jthandy
Member Author

jthandy commented Sep 16, 2016

hahahahahah

@drewbanin drewbanin self-assigned this Sep 16, 2016
@drewbanin
Contributor

drewbanin commented Sep 16, 2016

I'm super into this idea and I think it's something we should queue up soon.

Do you think these new ./config/*.yml files should share the same structure as the current dbt_project.yml "models" config?

eg.

# ./config/snowplow.yml
"Snowplow Dependency Name":
  events:
    exclude_ip: 192.168.1.1

Also, what's the recommended way of structuring these config files? Do you cram a bunch of dependency configs into one file? Or do you have one file per dependency? Is the name of the config file significant?

What happens if two different config files both define the same configuration? Is that a compilation error?
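If collisions between config files were treated as an error (one possible answer, not something decided in this thread), detecting them could look roughly like the sketch below. The file names and the `flatten` and `find_collisions` helpers are hypothetical.

```python
def flatten(config, prefix=""):
    """Flatten nested dicts into {'a.b.c': value} form."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def find_collisions(files):
    """Map each dotted key defined in more than one file to the files defining it."""
    seen = {}
    for name, config in files.items():
        for path in flatten(config):
            seen.setdefault(path, []).append(name)
    return {path: names for path, names in seen.items() if len(names) > 1}


# Stand-ins for two parsed files from ./config
files = {
    "snowplow.yml": {"events": {"exclude_ip": "192.168.1.1"}},
    "other.yml": {"events": {"exclude_ip": "10.0.0.1", "base_table": "atomic.events"}},
}
collisions = find_collisions(files)
```

A compiler could raise on a non-empty result, or fall back to a documented file ordering; either choice works with the same detection step.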

Suppose your project depends on project A which in turn depends on project B. Can you specify config options for project B in your config file? I don't know how likely this is, but what if two different dependencies both require project B? Can you configure each sub-dependency differently?

@jthandy you don't have to answer all of these questions, but this is the kind of feature which has a simple version and a "correct" version... I think the former is acceptable for right this very moment. We should just be careful to structure this in such a way that we can make it the "correct" version when the time comes without starting over from scratch.

@jthandy
Member Author

jthandy commented Sep 16, 2016

Yeah. I agree. This is non-trivial, and I don't pretend to have great answers to these questions. They're all the right ones. I like your suggestion about configuration files mapping to specific model contexts; that definitely increases the utility. I had imagined they would all be globally accessible.

I think the biggest issue with dependencies that we haven't figured out is scope/namespacing, which is a lot of what you are bringing up. How do we address objects in an arbitrarily nested tree of dependencies? Right now we have a very naive answer to that question, and I think that all of your questions stem from that.

I agree with your instinct to continue punting on this hard question and implement a useful version of this function that breaks around the edges. Let's see what those failure modes look like, feel the utility of the core feature, and then sit down and have a think about what the right way is. You know, in like several months :)

@drewbanin drewbanin added this to the 0.5.0 Release milestone Sep 20, 2016
@drewbanin
Contributor

let's simplify this for V1 and just pop configs right into the dbt_project.yml file. No need for the new directory at this point.

Ex:

models:
  Snowplow:
    events:
      base_table: atomic.events
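With config living directly in the models tree, resolving the config for one model amounts to walking down its path and letting deeper scopes override shallower ones. A sketch under one simplifying assumption: scalar values are treated as config, and nested dicts are treated as sub-scopes. The `config_for` helper and the top-level default are hypothetical.

```python
def config_for(models_tree, path):
    """Collect config along `path`, letting deeper scopes override shallower ones."""
    config = {}
    node = models_tree
    for step in (None, *path):
        if step is not None:
            node = node.get(step)
            if not isinstance(node, dict):
                break  # no deeper scope is declared for this path
        # Assumption: scalar values are config, nested dicts are sub-scopes.
        config.update({k: v for k, v in node.items() if not isinstance(v, dict)})
    return config


# Stand-in for the parsed "models" section of dbt_project.yml
models = {
    "base_table": "public.events",  # hypothetical project-wide default
    "Snowplow": {"events": {"base_table": "atomic.events"}},
}

snowplow_events = config_for(models, ["Snowplow", "events"])
other_model = config_for(models, ["Other"])
```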

yu-iskw pushed a commit to yu-iskw/dbt that referenced this issue Aug 17, 2021
An optional parameter 'condition' can be passed to the 
'expression_is_true' macro to assert the expression for all records 
which meet a condition. Closes dbt-labs#146