Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-634] Node configs: tech debt #5236

Open
jtcohen6 opened this issue May 12, 2022 · 1 comment
Open

[CT-634] Node configs: tech debt #5236

jtcohen6 opened this issue May 12, 2022 · 1 comment
Assignees
Labels
design_needed tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality

Comments

@jtcohen6
Copy link
Contributor

Context

Problem

We need to better organize our tightly bundled NodeConfig. Many of these configs are only relevant to "executable" nodes (models, snapshots, seeds, and tests IFF store_failures is enabled). Some of them are only relevant to incremental models in particular. Analyses have no business with database/schema/alias.

Existing proposals

Incremental models. Many many configs are really just relevant to the incremental model materialization. As discussed in #5089: It feels a little icky that configs like unique_key and on_schema_change appear in the config for every node, when they're really just relevant to incremental models in particular. Could we isolate these to appear on incremental models only? This could be:

  • an IncrementalModelConfig subclass, which would supply unique_key, on_schema_change, incremental_strategy, and (new) incremental_predicates if and only if materialized == 'incremental'
  • a dictionary named incremental that contains strategy + predicates, and the old names can be passthroughs to that dictionary for backwards compatibility

Metadata interface. Every time we add a new attribute to NodeConfig, it changes the manifest.json artifact schema. As discussed in #4617: Given that users can supply whatever configs they want, to any node type that accepts config, should we even consider this a contracted metadata interface? I think the external contract for node.config should really be Dict[str, Any], and we should always be in the habit of "promoting" configs that are relevant and reliable to downstream metadata consumers as top-level node keys. Similar to what we did with node_info on logging events, which "promotes" config.materialized into node_info.materialized.

Adapter-specific configs (sort of discussed in #4622): I'm actually not sure if AdapterConfig works at all today. My sense is that their keys don't show up in the manifest unless set, their types aren't really validated until used (at runtime), and their default values don't get passed through. The code here is kinda jank, to put it nicely.

@jtcohen6 jtcohen6 added tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality Team:Language labels May 12, 2022
@github-actions github-actions bot changed the title Node configs: tech debt [CT-634] Node configs: tech debt May 12, 2022
@jtcohen6 jtcohen6 mentioned this issue May 12, 2022
6 tasks
@leahwicz
Copy link
Contributor

@emmyoop @jtcohen6 could you work together to flesh this out some more? It can be a spike if we need some digging in and designing on what we want to do here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design_needed tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality
Projects
None yet
Development

No branches or pull requests

4 participants