Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 extends does not extend/merge values properly from source #1440

Closed
1 task done
SutuSebastian opened this issue Jan 5, 2024 · 13 comments
Closed
1 task done

🐛 extends does not extend/merge values properly from source #1440

SutuSebastian opened this issue Jan 5, 2024 · 13 comments

Comments

@SutuSebastian
Copy link

Environment information

CLI:
  Version:                      1.4.1
  Color support:                true

Platform:
  CPU Architecture:             aarch64
  OS:                           macos

Environment:
  BIOME_LOG_DIR:                unset
  NO_COLOR:                     unset
  TERM:                         "xterm-256color"
  JS_RUNTIME_VERSION:           "v20.10.0"
  JS_RUNTIME_NAME:              "node"
  NODE_PACKAGE_MANAGER:         "bun/1.0.21"

Biome Configuration:
  Status:                       Loaded successfully
  Formatter disabled:           false
  Linter disabled:              false
  Organize imports disabled:    false
  VCS disabled:                 true

Workspace:
  Open Documents:               0

What happened?

Have the following setup:

config.json

{
  "formatter": {
    "lineWidth": 120,
    "indentStyle": "space"
  }
}

biome.json

{
  "extends": "./config.json",
  "formatter": {
    "indentStyle": "tab"
  }
}

Result (does not merge properly, it seems like it overrides the entire formatter property instead of merging into with precedence)

{
  "formatter": {
    "indentStyle": "tab"
  }
}

Expected result

Values from config.json to be merged into biome.json with biome.json properties (deep merge maybe) to take precedence:

Result

{
  "formatter": {
    "lineWidth": 120,
    "indentStyle": "tab"
  }
}

Code of Conduct

  • I agree to follow Biome's Code of Conduct
@arendjr
Copy link
Contributor

arendjr commented Jan 5, 2024

@ematipico Would you think this bug is fixable before Biome 2? I was never sure whether the behavior was intentional or not, and it could be someone relies on it by now...

@SutuSebastian
Copy link
Author

@ematipico Would you think this bug is fixable before Biome 2? I was never sure whether the behavior was intentional or not, and it could be someone relies on it by now...

If that "someone" does indeed rely on this misbehaviour, its time to finally fix it 😄

@arendjr
Copy link
Contributor

arendjr commented Jan 5, 2024

I think I've found the bug. Preparing a PR.

@arendjr
Copy link
Contributor

arendjr commented Jan 5, 2024

Okay, it is going to be a bit of a complex fix, since the root of the issue runs all the way back into Biome's deserializer. In short:

The implementation for Deserializable on FormatterConfiguration starts with FormatterConfiguration::default() and then overwrites the fields that are actually present in the JSON. Because FormatterConfiguration::default() sets lineWidth to 80, that value ends up in the deserialized form. Then when Biome merges the extends configs, it ends up overwriting all the fields in the formatter object. Each field is overwritten with either the value that was really set in the JSON, or the default value, because after deserialization the merge algorithm doesn't see the difference anymore.

Similar problems also exist with LinterConfiguration and I assume others.

It appears an attempt has already been made to work around this problem, since there are explicit merge_with_if_not_default() methods. The reason that workaround fails is because the implementations of merge_with_if_not_default() don't seem to check the values field-by-field, but are still implemented with an all-or-nothing approach. I think I could "fix the workaround" by changing those implementations to check field-by-field, but that still leaves another issue: If a config.json sets a value to a non-default one, any biome.json that extends config.json cannot revert it to a default one, because merge_with_if_not_default() would refuse to merge it.

So I think the correct fix would actually be to fix the deserializers so they start from a "null" state instead of the default, allowing callers to distinguish between fields that are present in the JSON and those that aren't. Then the defaults should only be merged in at the end, after merging any extended configs. The scope of this will be pretty wide though, and I can't yet oversee all the consequences.

It might also mean I can fully clean up merge_with_if_not_default(), although I'm not sure yet if that has some other usage as well that I'm not yet aware of.

Finally, the problems go even deeper for fields with complex values, such as the sets used for ignore and include options, as well as the rules definitions. I don't think those have meaningful merge functionality implemented at all yet. But maybe I should tackle those in a follow-up PR.

I'll attempt to first see how far I would get with changing the deserializers, but please let me know if anyone thinks that approach is likely to fail. (cc @ematipico, @Conaclos)

@arendjr
Copy link
Contributor

arendjr commented Jan 5, 2024

I also notice the overrides functionality seems to work by defining structs such as OverrideFormatterConfiguration, which appear to be almost verbatim copies of the types they override, but with different defaults defined. It makes me wonder if I can unify the approaches... but even more it makes me wonder why an approach more similar to what I'm trying to do hasn't been chosen there. Either I'm going to massively simplify things... or I'm going to hit a huge wall I don't see yet...

@faultyserver
Copy link
Contributor

Either I'm going to massively simplify things... or I'm going to hit a huge wall I don't see yet...

I have a feeling it will be both! I think Ema had called out sometime previously that the difficulty with unifying Override*Configuration with the normal ones (and the *Settings structs as well, which are a third place where this information ends up living) is that they would all end up having Option<> types for the values (since configuration files and overrides don't have to specify every field), which would mean the current usages of .unwrap_or_default would change semantics (i.e., from calling IndentWidth::default() to Option::default() which just yields None).

I think a possible approach here would be to do exactly what you've mentioned and have everything on the configuration side use None as the default values, use that all the way through while resolving. From reading the file initially, to handling overloads, to handling language-specific values (i.e., javascript.formatter inherits some fields from formatter). And then only at the very final step when returning the options to the caller would it apply the defaults to fill in the None values.

The language-specific settings also have an interesting property (and a decision to make) about how they interact with overrides. For example:

{
  "formatter": { "indentWidth": 2 },
  "javascript": { "formatter": { "indentWidth": 3 } },
  "overrides": [{
    "files": ["test.js"],
    "formatter": { "indentWidth": 4 }
  }]
}

When I want to format test.js, what should indentWidth be? It definitely won't be 2, we know that much, because formatter.indentWidth has been overridden. But should it be 4? Because that's what the overrides.0.formatter.indentWidth says...but we specifically said javascript.formatter.indentWidth is 3 globally, which feels like it should take precedence.

So imo, formatting for test.js should stay as 3. But I think that's somewhat debatable and either 3 or 4 could feel correct, so I think we just have to decide which we want.

The order of resolution for the value would be (from most important to least important):

overrides.n.javascript.formatter.indentWidth
javascript.formatter.indentWidth
overrides.n.formatter.indentWidth
formatter.indentWidth

And I think the way to achieve that is again exactly the approach you've described, which is to use None as the value for all options unless they are explicitly specified. Then the configuration resolves overrides for every "category" individually (formatter, javascript.formatter), and then only at the end merges in the generic properties (formatter) into the language specific ones (javascript.formatter).

I guess more generally, that means the order ends up being:

load global configuration
apply matching override if there's a matching pattern
merge generic properties into language-specific configuration

@arendjr
Copy link
Contributor

arendjr commented Jan 5, 2024

So imo, formatting for test.js should stay as 3.

I agree, that seems the most logical to me as well.

@Conaclos
Copy link
Member

Conaclos commented Jan 6, 2024

@arendjr

Before my refactoring of biome_deserialize, deserialization followed two steps:

  1. create an instance of the type to deserialize,
  2. Call a deserialize-like method directly on the instance and override present fields.

The caller was responsible for creating an instance. In fact, every caller created a ::default instance.

I refactored biome_deserialize to shift the responsibility of the caller to the deserialized type.
However, I kept the original idea of implementing deseriliazers by creating a ::default instance and then override present fields.

Thus, I think this should be easy to change it since only the deserializers have to be updated :)
Instead, of creating a ::default instance, we could create our own instance that initializes every field with None.

Alternatively, we could remove all custom impls of the Default trait (using the default impl of Default) and we could provide accessors for each field that could return the field if set or a default value. I could personally choose this alternative, however this requires more changes. We could do it later and start with the first approach.

@Conaclos
Copy link
Member

Conaclos commented Jan 6, 2024

So imo, formatting for test.js should stay as 3.

I agree, however I could prefer reasoning in terms of config propagation.

In your example, the test.js override don't override javascript.formatter. Thus, it inherits of it.
Thus, the original config is equivalent to:

{
  "formatter": { "indentWidth": 2 },
  "javascript": { "formatter": { "indentWidth": 3 } },
  "overrides": [{
    "files": ["test.js"],
    "formatter": { "indentWidth": 4 }
    "javascript": { "formatter": { "indentWidth": 3 } },
  }]
}

@ematipico
Copy link
Member

Yeah that's the curse of the defaults.

@faultyserver explained how my solution could be. I wished there was a more generic way to handle this

@Conaclos
Copy link
Member

Conaclos commented Jan 7, 2024

We could also introduce a Overridable trait that allows overriding a configuration with another (the actual merge_with_configuration). Most of the implementations of this trait should be straightforward because we extensively use Option in the configuration space. We could thus provide a derive macro.

The implementation of Overridable for Option<_> would be:

Overridden config overriding config result
x None x
_ x x

For "atomic" types (numbers, booleans, strings, Vec, ...):

Overridden config overriding config result
_ x x

@arendjr
Copy link
Contributor

arendjr commented Jan 7, 2024

My first attempt at a fix: #1460

@ematipico
Copy link
Member

Fixed by #1460

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants