[draft][pythonic config][docs] Introduce intro to Config doc utilizing Pythonic config (#12349)

## Summary

Adds a new experimental guide that introduces the new
Pythonic config APIs.

Also see #12260 for the corresponding resources guide.

## Test Plan


https://dagster-git-benpankow-update-config-schema-docs-acca2b-elementl.vercel.app/guides/dagster/pythonic-config
benpankow authored and clairelin135 committed Feb 23, 2023
1 parent e46226f commit c910a6d
Showing 7 changed files with 461 additions and 10 deletions.
4 changes: 4 additions & 0 deletions docs/content/_navigation.json
@@ -741,6 +741,10 @@
"title": "Asset versioning and caching",
"path": "/guides/dagster/asset-versioning-and-caching"
},
{
"title": "Pythonic config",
"path": "/guides/dagster/pythonic-config"
},
{
"title": "Pythonic resources",
"path": "/guides/dagster/pythonic-resources"
1 change: 1 addition & 0 deletions docs/content/guides.mdx
@@ -44,6 +44,7 @@ Learn to apply [Dagster concepts](/concepts) to your work, explore experimental

## Experimental features

- [Pythonic config](/guides/dagster/pythonic-config) - Get started using the new Pythonic config APIs, a more ergonomic way to define op and asset config
- [Pythonic resources](/guides/dagster/pythonic-resources) - Get started using the new Pythonic resource APIs, a more ergonomic way to define and use resources
- [Using Custom Run Coordinators to perform run attribution](/guides/dagster/run-attribution) - A look at using a Custom Run Coordinator to perform run attribution
- [Asset versioning and caching](/guides/dagster/asset-versioning-and-caching) - Memoize assets using Dagster's logical versioning system
250 changes: 250 additions & 0 deletions docs/content/guides/dagster/pythonic-config.mdx
@@ -0,0 +1,250 @@
---
title: Pythonic run configuration | Dagster
description: Pythonic run configuration is an experimental new feature that makes Dagster config easier to use.
---

# Pythonic run configuration

<Note>
This feature is considered <strong>experimental</strong>.
</Note>

This guide introduces providing parameters to assets and jobs using the experimental Pythonic configuration APIs, which make defining and passing this [configuration](/concepts/configuration/config-schema) more lightweight.

It's often useful to provide user-chosen values to Dagster jobs or software-defined assets at runtime. For example, you might want to choose what dataset an op runs against, or provide a connection URL for a database resource. Dagster exposes this functionality through a configuration API.

---

## Using Pythonic configuration

### Defining Pythonic configuration on assets and ops

Configurable parameters accepted by an op or asset are specified by defining a config model subclass of <PyObject object="Config"/> and a `config` parameter to the corresponding op or asset function. Under the hood, these config models utilize [Pydantic](https://docs.pydantic.dev/), a popular Python library for data validation and serialization.

During execution, the specified config is accessed within the body of the op or asset using the `config` parameter, which is reserved specifically for this purpose.

<TabGroup persistentKey="assetsorops">
<TabItem name="Using software-defined-assets">

#### Using software-defined assets

Here, we define a subclass of <PyObject object="Config"/> holding a single string value representing the name of a user. We can access the config through the `config` parameter in the asset body.

```python file=/guides/dagster/pythonic_config/pythonic_config.py startafter=start_basic_asset_config endbefore=end_basic_asset_config dedent=4
from dagster import asset
from dagster._config.structured_config import Config

class MyAssetConfig(Config):
    person_name: str

@asset
def greeting(config: MyAssetConfig) -> str:
    return f"hello {config.person_name}"
```

</TabItem>
<TabItem name="Using ops and jobs">

#### Using ops

Here, we define a subclass of <PyObject object="Config"/> holding a single string value representing the name of a user. We can access the config through the `config` parameter in the op body.

```python file=/guides/dagster/pythonic_config/pythonic_config.py startafter=start_basic_op_config endbefore=end_basic_op_config dedent=4
from dagster import op
from dagster._config.structured_config import Config

class MyOpConfig(Config):
    person_name: str

@op
def print_greeting(config: MyOpConfig):
    print(f"hello {config.person_name}")
```

</TabItem>
</TabGroup>

### Defining and accessing Pythonic configuration for a resource

Configurable parameters for a resource are defined by specifying attributes on a resource class, which subclasses <PyObject object="ConfigurableResource"/>. The resource below defines a configurable connection URL, which can be accessed in any method defined on the resource.

```python file=/guides/dagster/pythonic_config/pythonic_config.py startafter=start_basic_resource_config endbefore=end_basic_resource_config dedent=4
from dagster import op
from dagster._config.structured_config import ConfigurableResource

class MyDatabaseResource(ConfigurableResource):
    connection_url: str

    def query(self, query: str):
        return get_engine(self.connection_url).execute(query)
```

For more information on using resources, see [the Pythonic resources guide](/guides/dagster/pythonic-resources).

### Specifying runtime configuration in code

To execute a job or materialize an asset that specifies config, you'll need to provide values for its parameters. When specifying config from the Python API, we can use the `run_config` argument for <PyObject object="JobDefinition" method="execute_in_process"/> or <PyObject object="materialize"/>. This takes a `RunConfig` object, within which we can supply config on a per-op or per-asset basis. The config is specified as a dictionary, with the keys corresponding to the op/asset names and the values corresponding to the config values.

```python file=/guides/dagster/pythonic_config/pythonic_config.py startafter=start_execute_with_config endbefore=end_execute_with_config dedent=4
from dagster import job, materialize, op

from dagster._core.definitions.run_config import RunConfig

@job
def greeting_job():
    print_greeting()

job_result = greeting_job.execute_in_process(
    run_config=RunConfig({"print_greeting": MyOpConfig(person_name="Alice")})
)

asset_result = materialize(
    [greeting],
    run_config=RunConfig({"greeting": MyAssetConfig(person_name="Alice")}),
)
```
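For comparison, dictionary-style run config nests each op's or asset's values under `ops`, then the op or asset name, then `config`. The sketch below builds that shape using only the standard library. `GreetingParams` and `to_run_config_dict` are hypothetical names introduced for illustration, not Dagster APIs, and the sketch only illustrates the structure; it makes no claim about how `RunConfig` is implemented.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class GreetingParams:
    """Hypothetical stand-in for a config class such as MyOpConfig."""

    person_name: str


def to_run_config_dict(per_op: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: nest each entry under ops -> <name> -> config."""
    return {
        "ops": {name: {"config": asdict(cfg)} for name, cfg in per_op.items()}
    }


run_config_dict = to_run_config_dict({"print_greeting": GreetingParams("Alice")})
```

The keys of the input mapping play the same role as the op and asset names passed to `RunConfig` above.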

---

## Defining complex config schemas

In some cases, you may want to define a more complex config schema. For example, you may want to define a config schema that takes in a list of files or complex data. Below we'll walk through some common patterns for defining more complex config schemas.

### Attaching metadata to config fields

Config fields can be annotated with metadata, which can be used to provide additional information about the field, using the Pydantic <PyObject object="Field"/> class.

For example, we can annotate a config field with a description, which will be displayed in the documentation for the config field. We can also add a value range to a field, which will be validated when config is specified.

```python file=/guides/dagster/pythonic_config/pythonic_config.py startafter=start_metadata_config endbefore=end_metadata_config dedent=4
from dagster._config.structured_config import Config
from pydantic import Field

class MyMetadataConfig(Config):
    # Here, the ellipsis `...` indicates that the field is required and has no default value.
    person_name: str = Field(..., description="The name of the person to greet")
    age: int = Field(
        ..., gt=0, lt=100, description="The age of the person to greet"
    )

# Raises a validation error: age must be less than 100.
MyMetadataConfig(person_name="Alice", age=200)
```
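Because config models are Pydantic models under the hood, an out-of-range value should surface as a Pydantic `ValidationError`. The following is a minimal sketch of catching that error. It assumes a plain `pydantic.BaseModel` as a stand-in for <PyObject object="Config"/>, and `GreetingParams` and `try_build` are hypothetical names used only for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


# Assumption: a plain Pydantic model stands in for a Dagster Config subclass.
class GreetingParams(BaseModel):
    person_name: str = Field(..., description="The name of the person to greet")
    age: int = Field(..., gt=0, lt=100, description="The age of the person to greet")


def try_build(person_name: str, age: int):
    """Return (params, None) on success or (None, error) on a validation failure."""
    try:
        return GreetingParams(person_name=person_name, age=age), None
    except ValidationError as err:
        return None, err


valid, valid_err = try_build("Alice", 30)       # within the allowed range
invalid, invalid_err = try_build("Alice", 200)  # violates lt=100
```

Catching the error at construction time lets calling code report a bad value before any run is launched.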

### Basic data structures

Many basic Python data structures can be used in your config schemas, including lists and mappings.

For example, we can define a config schema that takes in a list of user names and a mapping of user names to user scores.

```python file=/guides/dagster/pythonic_config/pythonic_config.py startafter=start_basic_data_structures_config endbefore=end_basic_data_structures_config dedent=4
from dagster._config.structured_config import Config
from typing import List, Dict
from dagster import materialize, asset

from dagster._core.definitions.run_config import RunConfig

class MyDataStructuresConfig(Config):
    user_names: List[str]
    user_scores: Dict[str, int]

@asset
def scoreboard(config: MyDataStructuresConfig):
    ...

result = materialize(
    [scoreboard],
    run_config=RunConfig(
        {
            "scoreboard": MyDataStructuresConfig(
                user_names=["Alice", "Bob"],
                user_scores={"Alice": 10, "Bob": 20},
            )
        }
    ),
)
```

### Nested schemas

Schemas can be nested in one another, or in basic Python data structures.

Here, we define a schema which contains a mapping of user names to complex user data objects.

```python file=/guides/dagster/pythonic_config/pythonic_config.py startafter=start_nested_schema_config endbefore=end_nested_schema_config dedent=4
from dagster import asset, materialize
from dagster._config.structured_config import Config
from dagster._core.definitions.run_config import RunConfig
from typing import Dict

class UserData(Config):
    age: int
    email: str
    profile_picture_url: str

class MyNestedConfig(Config):
    user_data: Dict[str, UserData]

@asset
def average_age(config: MyNestedConfig):
    ...

result = materialize(
    [average_age],
    run_config=RunConfig(
        {
            "average_age": MyNestedConfig(
                user_data={
                    "Alice": UserData(age=10, email="alice@gmail.com", profile_picture_url=...),
                    "Bob": UserData(age=20, email="bob@gmail.com", profile_picture_url=...),
                }
            )
        }
    ),
)
```
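The asset body above is elided. As a rough sketch of how nested values might be read, the example below walks a mapping of nested models and averages one field. It assumes plain `pydantic.BaseModel` classes as stand-ins for <PyObject object="Config"/>, and `UserRecord`, `NestedParams`, and `average_user_age` are hypothetical names, not part of the guide's code.

```python
from typing import Dict

from pydantic import BaseModel


# Assumption: plain Pydantic models stand in for Dagster Config subclasses.
class UserRecord(BaseModel):
    age: int
    email: str


class NestedParams(BaseModel):
    user_data: Dict[str, UserRecord]


def average_user_age(params: NestedParams) -> float:
    """Average the age field across every nested user record."""
    ages = [user.age for user in params.user_data.values()]
    return sum(ages) / len(ages)


# Nested dictionaries are validated and converted into UserRecord instances.
params = NestedParams(
    user_data={
        "Alice": {"age": 10, "email": "alice@example.com"},
        "Bob": {"age": 20, "email": "bob@example.com"},
    }
)
mean_age = average_user_age(params)
```

Each nested value is accessed with ordinary attribute access, the same way top-level fields are.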

### Union types

Union types are supported using Pydantic [discriminated unions](https://docs.pydantic.dev/usage/types/#discriminated-unions-aka-tagged-unions). Each union type must be a subclass of <PyObject object="Config"/>. The `discriminator` argument to <PyObject object="Field"/> specifies the field that will be used to determine which union type to use.

Here, we define a config schema which takes in a `pet` field, which can be either a `Cat` or a `Dog`, as indicated by the `pet_type` field.

```python file=/guides/dagster/pythonic_config/pythonic_config.py startafter=start_union_schema_config endbefore=end_union_schema_config dedent=4
from dagster import asset, materialize
from dagster._config.structured_config import Config
from dagster._core.definitions.run_config import RunConfig
from pydantic import Field
from typing import Union
from typing_extensions import Literal

class Cat(Config):
    pet_type: Literal["cat"] = "cat"
    meows: int

class Dog(Config):
    pet_type: Literal["dog"] = "dog"
    barks: float

class ConfigWithUnion(Config):
    # Here, the ellipsis `...` indicates that the field is required and has no default value.
    pet: Union[Cat, Dog] = Field(..., discriminator="pet_type")

@asset
def pet_stats(config: ConfigWithUnion):
    if isinstance(config.pet, Cat):
        return f"Cat meows {config.pet.meows} times"
    else:
        return f"Dog barks {config.pet.barks} times"

result = materialize(
    [pet_stats],
    run_config=RunConfig(
        {
            "pet_stats": ConfigWithUnion(
                pet=Cat(meows=10),
            )
        }
    ),
)
```
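To illustrate how the discriminator selects a union member, the sketch below passes the `pet` value as a plain dictionary and lets Pydantic pick the class from `pet_type`. It assumes plain `pydantic.BaseModel` classes as stand-ins for <PyObject object="Config"/>, imports `Literal` from `typing` (Python 3.8+) rather than `typing_extensions`, and uses the hypothetical name `PetParams`.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field


# Assumption: plain Pydantic models stand in for Dagster Config subclasses.
class Cat(BaseModel):
    pet_type: Literal["cat"] = "cat"
    meows: int


class Dog(BaseModel):
    pet_type: Literal["dog"] = "dog"
    barks: float


class PetParams(BaseModel):
    # The discriminator names the field whose value decides which class to build.
    pet: Union[Cat, Dog] = Field(..., discriminator="pet_type")


# The "dog" tag routes this dictionary to the Dog model.
parsed = PetParams(pet={"pet_type": "dog", "barks": 2.5})
```

Since the tag is checked first, validation errors refer only to the selected class instead of listing failures for every member of the union.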
1 change: 1 addition & 0 deletions docs/content/guides/dagster/pythonic-resources.mdx
@@ -171,6 +171,7 @@ Then, configuration for the resource can be provided at runtime in the launchpad

```python file=/guides/dagster/pythonic_resources/pythonic_resources.py startafter=start_new_resource_runtime_launch endbefore=end_new_resource_runtime_launch dedent=4
from dagster import sensor, define_asset_job, RunRequest
from dagster._core.definitions.run_config import RunConfig

update_data_job = define_asset_job(
    name="update_data_job", selection=[data_from_database]
1 change: 1 addition & 0 deletions docs/content/guides/experimental-features.mdx
@@ -4,6 +4,7 @@ title: Experimental Feature Guides | Dagster Docs

# Experimental feature guides

- [Pythonic config](/guides/dagster/pythonic-config) - Get started using the new Pythonic config APIs, a more ergonomic way to define op and asset config
- [Pythonic resources](/guides/dagster/pythonic-resources) - Get started using the new Pythonic resource APIs, a more ergonomic way to define and use resources
- [Using Custom Run Coordinators to perform run attribution](/guides/dagster/run-attribution) - A look at using a Custom Run Coordinator to perform run attribution
- [Asset versioning and caching](/guides/dagster/asset-versioning-and-caching) - Memoize assets using Dagster's logical versioning system