Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blueprint and load_defs_from_yaml #21980

Merged
merged 2 commits into from
May 24, 2024
Merged

Blueprint and load_defs_from_yaml #21980

merged 2 commits into from
May 24, 2024

Conversation

sryza
Copy link
Contributor

@sryza sryza commented May 20, 2024

Summary & Motivation

How I Tested These Changes

Args:
path (Path | str): The path to the YAML file or directory of YAML files containing the
blueprints for Dagster definitions.
per_file_blueprint_type (type[Blueprint]): The type of blueprint that each of the YAML
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense that this is Type[Blueprint] at least right now. In the future it might be ergonomic to allow for for e.g. List[Blueprint] (which would just have a special cased build_defs impl, I guess).

Copy link
Member

@benpankow benpankow May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little curious at the decision to specify the blueprint type in Python code - how would we suggest users implement some sort of "union type" where a file could define one of a few types of assets?

I could imagine a few ways to do this:

  • Separate load_defs_from_yaml calls targeting each set of files. This means users either would (i) separate their blueprint types into subfolders, (ii) distinguish them with something in the filename or (iii) have to introspect the files to figure out the type
  • A wrapper Blueprint type with a Selector/union type at the top level. This means you need to have some type which imports/explicitly unions all possible blueprints (pydantic & our type system's handling of unions is a bit wonky)

I think it's a reasonable API decision but I do find the other option - letting the user encode the blueprint type in the YAML file - also pretty nice:

_blueprint:
  module: my_module.blueprints
  class: MyFooBlueprint

a_str: foo
an_int: 55

This is less elegant, but avoids user code having to know about all possible blueprints, or defining a big wrapper type. This would be useful for things like integration-provided blueprints:

_blueprint:
  module: dagster_embedded_elt.sling
  class: SlingSyncBlueprint

...
_blueprint:
  module: dagster_dbt
  class: DbtModelBlueprint

...

That way a user can just call

defs = load_defs_from_yaml(...)

With a dir full of various YAML files that point at different integrations, as opposed to multiple calls which have to filter out each match/non-match.

Copy link
Member

@benpankow benpankow May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a world in which specifying blueprint-in-yaml could nicely match syntactically with linking a YAML schema for editor type checking:

# dagster: blueprint=dagster_embedded_elt.sling.SlingSyncBlueprint
# yaml-language-server: $schema=./.schemas/dagster_embedded_elt.sling.SlingSyncBlueprint

and we'd have some utility dagster blueprint generate-schemas to populate that .schemas folder automatically

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoids user code having to know about all possible blueprints

The way I have been thinking about it, this is a feature more than a bug. I.e. a big goal of blueprints is to help platform owners create guardrails for their stakeholders. By defining the set of available blueprints up front, you can be confident that only those will show up in the loaded definitions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense that this is Type[Blueprint] at least right now. In the future it might be ergonomic to allow for for e.g. List[Blueprint] (which would just have a special cased build_defs impl, I guess).

Agreed. I actually started implementing this in this PR, but it added enough complexity that I figured it would be better to consider it separately. Same with unions.

@sryza sryza force-pushed the dagster-yaml-utils branch 2 times, most recently from 1f129f7 to 5988d5e Compare May 20, 2024 23:41
@sryza sryza force-pushed the dagster-yaml-utils branch 2 times, most recently from cb39a4e to e2fb58c Compare May 21, 2024 16:25
Base automatically changed from dagster-yaml-utils to master May 21, 2024 19:29
Copy link
Member

@benpankow benpankow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming q, impl is good 🚢

pass


class BlueprintDefinitions(NamedTuple):
Copy link
Member

@benpankow benpankow May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitions feels like a first-class noun in the Dagster system and at first glance I would expect BlueprintDefinitions to implement it. There's an unfortunate collision with the lowercase d "definitions", which this seems to actually be describing, but this threw me for a bit of a loop.

How would you feel about e.g. BlueprintDefs or even PartialDefinitions, DefinitionsFragment (leaving open the door for non-Blueprint use-case)?

Is there a world in which we'd have some sort of definitions ABC that would let you .merge a BlueprintDefinitions with Definitions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is BlueprintDefinitions replacing the concept of merging/joining Defintions? (eg are we returning to the world in which Definitions is singular)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted this through in person, but for posterity:

  • I hope to eliminate BlueprintDefinitions and replace it with Definitions
  • Using Definitions here will require making it possible to construct a Definitions object without binding all the resource keys to resource definitions, which is a behavior change from the current Definitions object. If we do make that change, we will probably want to at least wait until 1.8. Adding a new class allows us to make progress on blueprints without waiting. More discussion here: Definitions.merge and don't immediately validate definitions #21746.

@sryza sryza force-pushed the blueprints-and-load branch 2 times, most recently from 500afd1 to 30ff526 Compare May 21, 2024 23:41
@sryza
Copy link
Contributor Author

sryza commented May 22, 2024

@benpankow actually I updated this to handle unions. It just required switching from cls.parse_obj(...) to parse_obj_as(cls, ...). I also added a couple tests.

branch-name: blueprints-and-load
@sryza sryza merged commit 066fa22 into master May 24, 2024
1 check failed
@sryza sryza deleted the blueprints-and-load branch May 24, 2024 18:19
danielgafni pushed a commit to danielgafni/dagster that referenced this pull request Jun 18, 2024
## Summary & Motivation

## How I Tested These Changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants