-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Blueprint and load_defs_from_yaml #21980
Conversation
c8e1d09
to
ee483ca
Compare
Args: | ||
path (Path | str): The path to the YAML file or directory of YAML files containing the | ||
blueprints for Dagster definitions. | ||
per_file_blueprint_type (type[Blueprint]): The type of blueprint that each of the YAML |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense that this is Type[Blueprint]
at least right now. In the future it might be ergonomic to allow for for e.g. List[Blueprint]
(which would just have a special cased build_defs
impl, I guess).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little curious at the decision to specify the blueprint type in Python code - how would we suggest users implement some sort of "union type" where a file could define one of a few types of assets?
I could imagine a few ways to do this:
- Separate
load_defs_from_yaml
calls targeting each set of files. This means users either would (i) separate their blueprint types into subfolders, (ii) distinguish them with something in the filename or (iii) have to introspect the files to figure out the type - A wrapper
Blueprint
type with aSelector
/union type at the top level. This means you need to have some type which imports/explicitly unions all possible blueprints (pydantic & our type system's handling of unions is a bit wonky)
I think it's a reasonable API decision but I do find the other option - letting the user encode the blueprint type in the YAML file - also pretty nice:
_blueprint:
module: my_module.blueprints
class: MyFooBlueprint
a_str: foo
an_int: 55
This is less elegant, but avoids user code having to know about all possible blueprints, or defining a big wrapper type. This would be useful for things like integration-provided blueprints:
_blueprint:
module: dagster_embedded_elt.sling
class: SlingSyncBlueprint
...
_blueprint:
module: dagster_dbt
class: DbtModelBlueprint
...
That way a user can just call
defs = load_defs_from_yaml(...)
With a dir full of various YAML files that point at different integrations, as opposed to multiple calls which have to filter out each match/non-match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also a world in which specifying blueprint-in-yaml could nicely match syntactically with linking a YAML schema for editor type checking:
# dagster: blueprint=dagster_embedded_elt.sling.SlingSyncBlueprint
# yaml-language-server: $schema=./.schemas/dagster_embedded_elt.sling.SlingSyncBlueprint
and we'd have some utility dagster blueprint generate-schemas
to populate that .schemas
folder automatically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoids user code having to know about all possible blueprints
The way I have been thinking about it, this is a feature more than a bug. I.e. a big goal of blueprints is to help platform owners create guardrails for their stakeholders. By defining the set of available blueprints up front, you can be confident that only those will show up in the loaded definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense that this is Type[Blueprint] at least right now. In the future it might be ergonomic to allow for for e.g. List[Blueprint] (which would just have a special cased build_defs impl, I guess).
Agreed. I actually started implementing this in this PR, but it added enough complexity that I figured it would be better to consider it separately. Same with unions.
1f129f7
to
5988d5e
Compare
ee483ca
to
2157b6b
Compare
cb39a4e
to
e2fb58c
Compare
2157b6b
to
33ba47f
Compare
33ba47f
to
b93b26c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming q, impl is good 🚢
pass | ||
|
||
|
||
class BlueprintDefinitions(NamedTuple): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitions
feels like a first-class noun in the Dagster system and at first glance I would expect BlueprintDefinitions
to implement it. There's an unfortunate collision with the lowercase d "definitions", which this seems to actually be describing, but this threw me for a bit of a loop.
How would you feel about e.g. BlueprintDefs
or even PartialDefinitions
, DefinitionsFragment
(leaving open the door for non-Blueprint use-case)?
Is there a world in which we'd have some sort of definitions ABC that would let you .merge
a BlueprintDefinitions
with Definitions
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is BlueprintDefinitions
replacing the concept of merging/joining Defintions
? (eg are we returning to the world in which Definitions
is singular)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chatted this through in person, but for posterity:
- I hope to eliminate
BlueprintDefinitions
and replace it withDefinitions
- Using
Definitions
here will require making it possible to construct aDefinitions
object without binding all the resource keys to resource definitions, which is a behavior change from the currentDefinitions
object. If we do make that change, we will probably want to at least wait until 1.8. Adding a new class allows us to make progress on blueprints without waiting. More discussion here:Definitions.merge
and don't immediately validate definitions #21746.
500afd1
to
30ff526
Compare
@benpankow actually I updated this to handle unions. It just required switching from |
branch-name: blueprints-and-load
30ff526
to
fa62b62
Compare
fa62b62
to
9c4d0c4
Compare
## Summary & Motivation ## How I Tested These Changes
Summary & Motivation
How I Tested These Changes