[CT-1470] Whither UserConfig? #6207

Closed
Tracked by #6706
jtcohen6 opened this issue Nov 3, 2022 · 12 comments
Labels
enhancement New feature or request Refinement Maintainer input needed

Comments

@jtcohen6
Contributor

jtcohen6 commented Nov 3, 2022

https://docs.getdbt.com/docs/core/connect-data-platform/profiles.yml#user-config

The idea in the past has been: This is tied to my specific "machine," and should hold true / set defaults for all of my dbt activity. The original options were things like: don't use colors in the terminal, don't send anonymous usage tracking, don't use partial parsing. Those aren't tied to any one dbt project in particular, and they might be different for different users — hence, why they weren't stored in the project codebase and checked into version control.

However, these configs don't really have anything to do with connection profiles / credentials. We're all about splitting up our monolithic files. And profiles.yml really deserves a bigger refresh next year (#1958 and then some).

Questions:

  • Where should these "user configs" go instead? Should they be settable in dbt_project.yml? In a new root-level file that lives alongside the project? Should that file be .gitignored (a la dbt.env), or version-controlled (dbt_config.yml)?
  • Is the distinction between "global configs" and "project configs" really meaningful? The addition of --log-path and --target-path as CLI flag / env var overrides already blurs that boundary. That said, I don't think --model-paths should really be a thing :)

Wherever we move the UserConfig, we'll need to support it in its current location for a period of time, with a clear deprecation warning.

@jtcohen6 jtcohen6 added tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality Team:Language Refinement Maintainer input needed labels Nov 3, 2022
@github-actions github-actions bot changed the title Whither UserConfig? [CT-1470] Whither UserConfig? Nov 3, 2022
@jtcohen6
Contributor Author

jtcohen6 commented Nov 17, 2022

How would we feel about a precedence order like:

  1. CLI flag
  2. Env var
  3. UserConfig, in a dedicated file, always in the root directory, optionally version controlled (to set a default for everyone running this project) or ignored

Idea being:

  • All "global" configs can be set in dbt_project.yml, too, to set a default for everyone running this particular project. Will be overridden by the same config set at the user/machine level, env level, or invocation level
  • No ifs-ands-or-buts-ing about where UserConfig is located, so we can safely load it first & foremost. I don't even want to have to think about --profiles-dir or --project-dir.
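To make the proposed lookup order concrete, here is a minimal sketch. It is an illustration under stated assumptions, not dbt-core's flag handling: `resolve_setting` is a hypothetical helper, the `DBT_<NAME>` env var naming is assumed, and plain dicts stand in for the parsed UserConfig file and dbt_project.yml.

```python
from typing import Any, Mapping


def resolve_setting(
    name: str,
    cli_flags: Mapping[str, Any],
    env: Mapping[str, str],
    user_config: Mapping[str, Any],
    project_config: Mapping[str, Any],
    default: Any = None,
) -> Any:
    """Return the first value found, from highest to lowest precedence."""
    # 1. CLI flag: an unset flag is represented here as a missing key or None.
    if cli_flags.get(name) is not None:
        return cli_flags[name]
    # 2. Env var (assumed naming: use_colors -> DBT_USE_COLORS).
    env_key = "DBT_" + name.upper()
    if env_key in env:
        return env[env_key]  # still a raw string; type coercion is left out
    # 3. UserConfig file (user/machine level).
    if name in user_config:
        return user_config[name]
    # 4. Project-wide default from dbt_project.yml.
    if name in project_config:
        return project_config[name]
    return default
```

For example, `resolve_setting("use_colors", {}, {}, {"use_colors": True}, {"use_colors": False})` picks the UserConfig value, because the project-level default is only consulted when nothing more specific is set.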

@jtcohen6
Contributor Author

jtcohen6 commented Feb 16, 2023

This comment is my attempt to summarize my latest thinking on the next steps here. Given the last few months of Fun With Flags, I think a solid answer here could provide us with much cleaner codepaths & initialization flows going forward. As always, we should aim to provide one minor version of back-compat for existing behavior, with deprecation warnings, if we can manage it.

Big idea: "Runtime configuration" can be provided by users in one of three ways:

  1. CLI flags
  2. Environment variables (read from system and/or defined in dbt.env file)
  3. Long-lived configuration file (dbt_config.yml) in root or home directory

Commentary:

  • The precedence order is (1) CLI flag > (2) env var > (3) config file
  • New: Users should have the ability to define env vars in a dbt.env file
    • This file must be located in the root directory where dbt is running.
    • This file can define any & all env vars relevant to dbt. Not limited to just the env var versions of "global configs"
    • If the file is found, dbt should use these values and only these values first to resolve all env vars. These should take precedence over anything read from the system. (See: [CT-1765] [Feature] Provide env vars as data during runtime #6545.)
    • This .env file should use standard "text" syntax, one statement per line in the style of DBT_ENV_VAR=value
    • It can be optionally gitignored—or version controlled, to ensure a consistent "env" for all runs
    • It's likely that dbt Cloud would write this file on users' behalf, and block users from defining it themselves.
  • New: dedicated config file. This is a way to define "global" dbt configurations. It could be saved in version control to ensure consistent default configuration. It should be yaml for starters, but there's no reason we couldn't someday also support ini, toml, json, ...
    • We move it out of profiles.yml, into its own dedicated file: dbt_config.yml.
    • This file does not support any Jinja rendering. Literal values only.
    • The file can be located in one of two places:
      • (a) root directory where dbt is running - preferred if available
      • (b) "home" directory for this system - Path.home() - default runtime config for any/all projects running on this user's machine
    • Note that these are the same file locations & precedence order as profiles.yml, minus the additional option of specifying a --profiles-dir flag. We cannot locate "UserConfig" on the basis of a flag, because we need to read user configs before we can resolve flags.
    • If we can't find dbt_config.yml, that's fine, keep going.
    • If we can locate dbt_config.yml, and there's an error while reading it, it's always an error. We immediately raise the error and stop.
    • dbt Cloud users could define this file themselves, and store it in their project's root directory.
      • How would this work in a unified deployment of multiple projects? Let me think more about it :)
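The file handling proposed above could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, skipping `#` comment lines is an assumption the proposal does not spell out, and reading the YAML itself is left out.

```python
from pathlib import Path
from typing import Dict, Mapping, Optional


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse one KEY=value statement per line."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"invalid line in env file: {raw!r}")
        values[key.strip()] = value.strip()
    return values


def effective_env(root: Path, system_env: Mapping[str, str]) -> Dict[str, str]:
    """Values from <root>/dbt.env take precedence over the system environment."""
    merged = dict(system_env)
    env_file = root / "dbt.env"
    if env_file.is_file():
        merged.update(parse_env_file(env_file.read_text()))
    return merged


def find_config_file(root: Path, home: Optional[Path] = None) -> Optional[Path]:
    """Prefer <root>/dbt_config.yml, then <home>/dbt_config.yml."""
    home = Path.home() if home is None else home
    for directory in (root, home):
        candidate = directory / "dbt_config.yml"
        if candidate.is_file():
            return candidate
    # A missing file is fine; the caller keeps going with defaults.
    return None
```

Note that neither lookup depends on a CLI flag, which is what lets both files be read before flags are resolved. An error while reading a file that does exist is deliberately not caught here, matching the "always an error" rule.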

@jtcohen6 jtcohen6 self-assigned this Feb 16, 2023
@colin-rogers-dbt
Contributor

I wonder if we can simplify the code/configuration by just having folks move to .env files, and adding more support for these. What does a .yml config give that .env doesn't?

If we do want to go the route of a dbt_config file, it would probably be handy to set per-stage / per-environment configurations, either declaratively by supporting prod.dbt_config.yml and local.dbt_config.yml, or by allowing the stages to be specified within the file.

@jtcohen6
Contributor Author

Thanks @colin-rogers-dbt @gshank @dbeatty10 for talking through this live during BLG!

> If the file is found, dbt should use these values and only these values to resolve all env vars. It should never read them directly from the system.

Inside dbt.env, we could make this flexible with some kind of PASSENV, e.g.

PASSENV=DBT_* SPECIFIC_ENV_VAR

Or for everything:

PASSENV=*
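A rough sketch of how such a PASSENV allow-list might be applied. The semantics are assumed, since the comment only names the idea: patterns are whitespace-separated shell-style globs, and `apply_passenv` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Dict, Mapping


def apply_passenv(
    passenv: str,
    file_env: Mapping[str, str],
    system_env: Mapping[str, str],
) -> Dict[str, str]:
    """Let through only the system env vars whose names match a PASSENV pattern."""
    patterns = passenv.split()
    passed = {
        key: value
        for key, value in system_env.items()
        if any(fnmatchcase(key, pattern) for pattern in patterns)
    }
    # Values defined directly in dbt.env still win over passed-through ones.
    passed.update(file_env)
    return passed
```

With `passenv="DBT_* SPECIFIC_ENV_VAR"`, a system variable such as `HOME` is dropped, while `DBT_TARGET` and `SPECIFIC_ENV_VAR` pass through. An empty PASSENV lets nothing through from the system.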

> If we do want to go the route of a dbt_config file, it would probably be handy to set per stage / environment configurations.

I do want to think more about this. My initial instinct is, per-environment configuration should be solved for with environment variables (.env).

Good idea: Let's call this something like dbt_defaults.yml!


> What does a .yml config give that .env doesn't?

It does allow us to:

  • Allow users to write yaml (the standard configuration language of dbt)
  • Perform additional validation on those configs
  • Support structured/nested configs (à la --warn-error-options), rather than needing to add an additional flag for every possible permutation
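As an illustration of the validation point, a nested config loaded from yaml can be checked as a whole before dbt does anything else. The sketch below assumes an `include` / `exclude` shape for the nested option, and `validate_warn_error_options` is a hypothetical name rather than a dbt-core function.

```python
from typing import Any, Dict, List, Union


def _is_str_list(value: Any) -> bool:
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def validate_warn_error_options(raw: Any) -> Dict[str, Union[str, List[str]]]:
    """Validate a nested mapping such as {"include": "all", "exclude": [...]}."""
    if not isinstance(raw, dict):
        raise TypeError("warn_error_options must be a mapping")
    # Reject misspelled keys early instead of silently ignoring them.
    unknown = sorted(set(raw) - {"include", "exclude"})
    if unknown:
        raise ValueError(f"unknown keys: {unknown}")
    include = raw.get("include", [])
    exclude = raw.get("exclude", [])
    if include != "all" and not _is_str_list(include):
        raise ValueError('include must be "all" or a list of names')
    if not _is_str_list(exclude):
        raise ValueError("exclude must be a list of names")
    return {"include": include, "exclude": exclude}
```

The same information squeezed into a flat .env value would need its own ad hoc string encoding, which is harder to validate and to report errors on.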

@dbeatty10
Contributor

Factor III in a twelve-factor app is "Config: Store config in the environment".

One way to load environment-specific environment variables is using set -a; source .env; set +a like @RobbertDM described here.

Using direnv is another alternative to loading environment variables.

A pro of direnv is that it loads from .envrc (or optionally .env) upon entering the project directory and unloads the environment variables when leaving that directory.

The cons are that it isn't a Python program, and installation instructions are system-specific. Also, anytime there is a change to .envrc, it needs to be approved by running direnv allow.

@dbeatty10
Contributor

@jtcohen6 this is eye-catching:

> If the file is found, dbt should use these values and only these values to resolve all env vars. It should never read them directly from the system

What would be the consequences, pros, and cons of never reading env variables directly from the system when there is a dbt.env file present?

@jtcohen6
Contributor Author

jtcohen6 commented Jul 5, 2023

@dbeatty10 Hah, it felt a bit controversial when I wrote it too! I'm not all the way sold on it myself.

It would be a way of "exporting" the environment into a file, and providing predictability/portability: wherever dbt runs, it uses the environment variables defined in that file, rather than the system / environment it's running in. This was a need we'd identified while working on runtimes for dbt-core in dbt Cloud: #6545

It might be sufficient (and less controversial) for dbt.env to simply take precedence over any env vars defined in the system, rather than completely preventing access.

@jtcohen6 jtcohen6 added this to the v1.7 milestone Jul 5, 2023
@dbeatty10
Contributor

> It might be sufficient (and less controversial) for dbt.env to simply take precedence over any env vars defined in the system, rather than completely preventing access.

➕ This makes sense to me

@colin-rogers-dbt
Contributor

Recent work on dynamic profiles has exposed another problem with profiles.yml and with mixing runtime/project configuration: since it's a very flat spec, it does not support nested/conditional configuration. The particular use case/problem is the configuration of auth methods, where certain configurations only make sense with certain authentication methods (see: dbt-bigquery -> if auth_method = 'service-account-json' then the user must set key_file_json; otherwise it's just ignored). This is non-intuitive to the user, and it requires a lot of validation code in the adapter, which introduces a lot of variance in behavior (i.e. sometimes we ignore unused parameters and sometimes we don't).

All of this is to say that we might want to look at a more structured configuration format like toml or even json-schema alongside a simple .env file.
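To illustrate the kind of conditional validation described above, here is a minimal sketch. It reuses the `auth_method` and `key_file_json` names from this comment; the `REQUIRED_BY_METHOD` table, the `oauth` entry, and `validate_credentials` are hypothetical and do not reflect the actual dbt-bigquery code.

```python
from typing import Any, List, Mapping

# Hypothetical table: the extra keys that each auth method requires.
REQUIRED_BY_METHOD = {
    "service-account-json": {"key_file_json"},
    "oauth": set(),
}


def validate_credentials(creds: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config is consistent."""
    method = creds.get("auth_method")
    if method not in REQUIRED_BY_METHOD:
        return [f"unknown auth_method: {method!r}"]
    required = REQUIRED_BY_METHOD[method]
    # Every key that is specific to some auth method.
    method_specific = set().union(*REQUIRED_BY_METHOD.values())
    present = set(creds)
    problems = [f"{method} requires {key}" for key in sorted(required - present)]
    # Report keys that belong to another method instead of silently ignoring them.
    problems += [
        f"{key} is not used by {method}"
        for key in sorted((method_specific - required) & present)
    ]
    return problems
```

Keeping the rules in one declarative table is one way to make the behavior uniform: a missing key and an unused key are both always reported, rather than depending on which code path happens to check them.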

@jtcohen6
Contributor Author

jtcohen6 commented Jul 7, 2023

@colin-rogers-dbt That makes sense to me — though it sounds more like something we'd want to address in a (long-overdue) rework of profiles.yml, rather than a consideration for UserConfig specifically. The goal of UserConfig is really just to offer an in-file & yaml-formatted equivalent to the "global" configs that users can set via env var or by passing CLI flags. Some flags are mutually interdependent (e.g. --defer requires that --state or --defer-state is also set), but we need to have those conditional checks living inside dbt-core anyway. Let me know if I'm missing something
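For reference, the interdependency mentioned above amounts to a check like the following. This is a sketch with a hypothetical function name, not the actual dbt-core implementation.

```python
from typing import Optional


def check_defer_flags(
    defer: bool,
    state: Optional[str],
    defer_state: Optional[str],
) -> None:
    """Raise if --defer is set without a state directory to defer to."""
    if defer and not (state or defer_state):
        raise ValueError("--defer requires --state or --defer-state")
```

The check runs on the resolved values, so it applies regardless of whether each value came from a CLI flag, an env var, or a config file.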

@colin-rogers-dbt
Contributor

@jtcohen6 sorry, I could have written this more clearly: I meant it in the more general sense that "flat" or unstructured configs come with significant downsides, not that we should solve for profiles.yml here.

> we need to have those conditional checks living inside dbt-core anyway.

Tots agree, not saying this config would replace those checks as they are currently written (we have to support them now) but that it might encourage us to structure the flags differently in the future to encourage better ergonomics and it would allow us to surface the existing complexity in a more user friendly way.

An extreme case is how Spark manages its configuration: it is pretty error-prone, as it's very easy to set conflicting flags, many of which will silently change the application's behavior in unpredictable ways.

@MichelleArk MichelleArk added enhancement New feature or request and removed tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality labels Jul 31, 2023
@jtcohen6 jtcohen6 removed this from the v1.7 milestone Nov 7, 2023
@jtcohen6
Contributor Author

Closing this in favor of #9183 :)

@jtcohen6 jtcohen6 closed this as not planned Nov 30, 2023