Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2787] [Bug] dbt is too aggressive about preventing contracted changes during CI #8028

Closed
2 tasks done
Tracked by #7979
joellabes opened this issue Jul 5, 2023 · 10 comments
Closed
2 tasks done
Tracked by #7979

Comments

@joellabes
Copy link
Contributor

joellabes commented Jul 5, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Slack discussion

When using dbt alongside state:modified, any breaking changes to the contract are rejected as an error.

To summarise the thread:

  • I don't like that the selector syntax I choose – something which until now I have considered a read-only/filtering operation – unlocks special new errors
  • @jelstongreen doesn't like that changes to a contracted model aren’t allowed even if the deal contract has been altered to accurately reflect the new state
  • @jaypeedevlin doesn't like that this assumes state:modified is only used in CI contexts

Expected Behavior

I think ContractBreakingChangeError should instead be a warning which can be escalated in CI contexts with --warn-error or --warn-error-options. This would solve all three of the above issues:

  • It's no longer a special new error, it's useful additional context that can be provided by added statefulness
  • Contracted models could be changed as long as they were updated to reflect the new rules, for folks who are using contracts for their own benefit so don't want/need versioning
  • OK this does still mostly assume it's in CI, but at least it doesn't block non-CI flows.

If we feel strongly that this is almost always an error but should be allowed to be ignored, I guess I would also settle for the inverse: some way to downgrade the errors to warnings.

Steps To Reproduce

  1. Make a model with a contract
  2. Run dbt compile
  3. Remove a column from the model and its contract
  4. Run dbt run -s state:modified --state path/to/artifacts

Relevant log output

Breaking Change to Contract Error in model model_y (model_y)
  While comparing to previous project state, dbt detected a breaking change to an enforced contract.
  
  Columns were removed: 
   - column_z
  
  Consider making an additive (non-breaking) change instead, if possible.

Environment

- OS:
- Python:
- dbt:

Which database adapter are you using with dbt?

No response

Additional Context

No response

@joellabes joellabes added bug Something isn't working triage labels Jul 5, 2023
@github-actions github-actions bot changed the title [Bug] dbt is too aggressive about preventing contracted changes during CI [CT-2787] [Bug] dbt is too aggressive about preventing contracted changes during CI Jul 5, 2023
@jaypeedevlin
Copy link
Contributor

Agree with Joel's suggestions, but I think in I think in @jelstongreen's use case (ie the contract is updated also) we might consider no warning at all, so we compare against the current contract not the old?

@jelstongreen
Copy link

Thanks @joellabes for raising the issues and both for comments. I would agree with @jaypeedevlin in that if the contract has been updated then verification should be against the new contract rather than the old one when using state - I can't think of a use case where you would want to compare your updated models against an outdated contract. In this situation I would say that the contract has not been broken, just altered.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 5, 2023

@joellabes @jelstongreen @jaypeedevlin Good conversation! Thanks for opening the issue to track it.

Today, dbt performs two contract checks:

  1. Do the columns produced by the model SQL (definition) match up with the columns specified in the YAML file (user-specified contract)? This check happens every time the model builds.
  2. Has there been a "breaking" change in this contract, compared to its previous specification? Meaning: a column has been renamed/retyped/removed, or an enforced constraint has been removed.

The former check always happens using the newest version of the contract. In the scenario we're describing, the first checks passes. It's the second check that we're interested in here.


To JPD's point about state:modified in CI contexts: I don't see the claim here as, "This should only raise an error at CI time." This is more the fact that dbt requires previous state (manifest) in order to make the comparison between current and previous contracts, and detect that the contract has changed. So if you're in dev, you're using state:modified in your dev workflow, you make a breaking change (remove/retype/rename a column), and then get an error saying as much — I see that as a good thing! I'd like to figure out how we could provide that feedback sooner & more frequently, as part of a standard state-informed dev workflow.

I do hear this point from Joel:

I don't like that the selector syntax I choose – something which until now I have considered a read-only/filtering operation – unlocks special new errors

I agree it's odd to have the check happen during selection syntax. We could do it instead during "deferral", or as soon as you pass a --state manifest, although those risk being even more implicit.


To JEG's point: I see how you want a way to override dbt's error ("I know what I'm doing") without strictly requiring a new model version. When we'd imagined this workflow a few months ago, @MichelleArk and I were imagining that this should happen rarely, but not never—perhaps some production issue requires immediate action—and so you'd "merge on red" (failing CI).

I'm hesitant to downgrade this from error to warning, because it risks getting lost among other warnings, makes it harder to surface, and CI checks tend to yield binary results: ✅ or ❌ . I can see the argument in favor of keeping this as an error by default, but giving users a way to downgrade this to a warning; we just don't lack an existing pattern to do that, à la --warn-error-options.

One idea: You could bump the model's version, without creating multiple versions. That's effectively what you're doing anyway, by updating the contract with a breaking change, and saying that there won't be any old version sticking around for a migration window. (I'm going to try this out right now and report back.)

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 5, 2023

-- models/my_model.sql
select 1 as id
# models/my_model.yml
models:
  - name: my_model
    config:
      contract: {enforced: true}
    columns:
      - name: id
        data_type: integer
      - name: this_column_will_be_removed
        data_type: boolean

The first contract check (1) succeeds when I run my_model. (I saved a manifest from this "previous run" and stuck it in a local folder named state/.) It succeeds again after I delete this_column_will_be_removed from both the SQL definition and YAML contract.

What starts failing is the second check (2), which detects a breaking change in a stateful comparison:

$ dbt ls -s state:modified --state state/
...
Breaking Change to Contract Error in model my_model (models/my_model.sql)
  While comparing to previous project state, dbt detected a breaking change to an enforced contract.

  Columns were removed:
   - this_column_will_be_removed

  Consider making an additive (non-breaking) change instead, if possible.
  Otherwise, create a new model version: https://docs.getdbt.com/docs/collaborate/govern/model-versions

I can get that check passing by "bumping" the version of the model (even if it was previously not versioned).

models:
  - name: my_model
    config:
      contract: {enforced: true}
    columns:
      - name: id
        data_type: integer

    versions:
      - v: 2

Things that make this work seamlessly:

  • The latest_version is always whatever the max of defined versions is. When there's only one version, it's the latest.
  • You can keep defining the latest_version in the standard (unsuffixed) file name — you only need to get into the X_v1.sql and X_v2.sql business when you have multiple
  • A custom alias config, or generate_alias_name macro that lands the latest_version in the standard (unsuffixed) table location, following our guidance here

(Note: This should also work if the model was previously versioned, and we just bump from v: 2 to v: 3. I just found a bug with this, which I've opened as a separate ticket: #8030)

@jtcohen6 jtcohen6 added discussion and removed bug Something isn't working triage labels Jul 5, 2023
@joellabes
Copy link
Contributor Author

One idea: You could bump the model's version, without creating multiple versions. That's effectively what you're doing anyway, by updating the contract with a breaking change, and saying that there won't be any old version sticking around for a migration window.

I quite like this! It's a "leave me alone I know what I'm doing" flag, but it's in the right place - you can add it before going through the CI flow instead of getting an error and then temporarily editing the dbt build commands to ignore the issue and then remembering to put it back afterwards.

@gnilrets
Copy link

I don't quite like the idea of requiring a version change in order to work around this. Our team is not yet ready to deal with versions. We wanted to to get contracts going first, get comfortable with those, and then later start incorporating versions. It's just another layer of complexity to have to deal with.

I do like the idea of being able to optionally just warn on contract violations. However, I'm still stuck on how I'd be able to work with that in CI. If I were doing my own CI (instead of SlimCI), I might think about setting the warn flag if the commit message contained some string indicating I didn't want to enforce the contract (or I wanted to change it). Feels hacky to me, but it might work. With dbt-cloud and SlimCI I don't see how I'd be able to do that without changing the CI job to temporarily do a warn-only. However, that would apply to any other pull requests that might be initiated at the time, which wouldn't work well for larger teams.

@rchui
Copy link

rchui commented Jul 27, 2023

My team is also not ready to use explicit versioning for our model changes but we want to be able to enforce model contracts based on our defined schema. The contracts are also useful internally as part of the development cycle because it enforces that the model we are writing matches our expected output shape. When we make a change to the schema we are explicitly stating that the contract is changing and it is unintuitive that we would compare the new model against the old contract. During rapid iteration phases of a model having to bump the version every single time that a column is renamed or a data type is changed is highly disruptive and makes the contracts feature more of a harm than a good.

I understand the purpose of explicit versioning when providing guarantees for external use. It is great when you want to enforce an external contract and provide guarantees but that is distinct from shape enforcement.

In my view, an error like this should be raised only if an explicit version number has been specified for the model and the contract has changed but the version number has not. This is entangling shape-enforcement with version enforcement which feel like two distinct features.

Edit: Added more context

@jtcohen6
Copy link
Contributor

jtcohen6 commented Aug 4, 2023

@gnilrets @rchui Thanks both for sharing your perspectives, and making some really compelling points!

Here's what I'm thinking:

  • A breaking change to a model with an enforced contract should raise a warning (but CI will still succeed)
  • Unless that model is versioned, in which case dbt understands that its API is locked in
  • If it is versioned, you can add a new version (or just bump the existing model's version) to facilitate the breaking change
  • If it's just a warning, you can still optionally promote that warning to an error (without messing with model versions) via the --warn-error / --warn-error-options config

Things I like about this:

  • More optionality overall
  • Positions contracts & versions as complementary, but less tightly coupled

Update: Opened as implementation ticket: #8384

@graciegoheen
Copy link
Contributor

I'm closing this as a duplicate, since we are tracking the implementation on #8384

Thanks for all of your input :)

@graciegoheen graciegoheen closed this as not planned Won't fix, can't repro, duplicate, stale Sep 8, 2023
@graciegoheen graciegoheen removed discussion Refinement Maintainer input needed labels Sep 8, 2023
@nicchristeller
Copy link

Unfortunately I'm only catching up to this discussion after implementation, but I wonder if there's a better approach here.

With the current option Turn "Breaking Change to Contract" into warning, unless model is versioned, there's "special case" handling causing functionality to change when a model starts being versioned. This could be unexpected if you haven't followed this discussion, or have been using contracts for a some time without model versions. It's also not as simple to specify that I really do want this contract to be enforced for this model with only one version.

Something like having 3 options for contract.enforced: false, warn, true seems more explicit and obvious to those with less familiarity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants