
[docs] - Partitioned multi asset sensor examples #9722

Merged

Conversation

clairelin135 (Contributor):

Adds examples to multi_asset_sensor documentation to detail several use cases for monitoring asset partitions:

  • Triggering a partitioned run after a corresponding upstream partition is materialized
  • Updating a weekly asset partition when upstream daily partitions are materialized or replaced
  • Materializing a daily asset partition if both upstream daily partitions are materialized

vercel bot commented Sep 16, 2022: 3 Ignored Deployments (dagit-storybook, dagster, dagster-oss-cloud-consolidated), last updated Oct 11, 2022 at 6:46PM (UTC)

clairelin135 (Contributor Author) commented Sep 16, 2022:

Side note: these code examples all cover the OR case (e.g. trigger if ANY upstream partition is replaced). We don't handle the AND case as well; for example, materializing a partition only if both upstreams have been replaced.

I think the majority of cases will be the OR case, though I wouldn't be surprised if users also wanted support for the AND case. It's a challenge because the current cursor approach prevents us from evaluating materializations out of order, so we may need to consider other options depending on how high-priority this use case is.
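The OR/AND distinction above can be illustrated with a plain-Python sketch (no Dagster; the function names and inputs here are hypothetical): the OR case triggers on the union of newly materialized upstream partitions, while the AND case triggers only on their intersection.

```python
# Illustrative sketch of OR vs. AND trigger semantics for a multi-asset
# sensor. Input: the partitions newly materialized per monitored upstream.
def or_case_partitions(new_partitions_by_asset):
    # OR: any upstream materializing a partition triggers that partition
    triggered = set()
    for partitions in new_partitions_by_asset.values():
        triggered |= set(partitions)
    return sorted(triggered)

def and_case_partitions(new_partitions_by_asset):
    # AND: every upstream must have materialized the partition
    partition_sets = [set(p) for p in new_partitions_by_asset.values()]
    return sorted(set.intersection(*partition_sets)) if partition_sets else []
```

The real difficulty discussed in this thread is not computing these sets but advancing the per-asset cursor safely in the AND case.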

@clairelin135 clairelin135 marked this pull request as ready for review September 16, 2022 22:18

Returns a dictionary mapping the `AssetKey` for each monitored asset to the most recent materialization record. If there is no materialization event, the mapped value will be `None`
| Method | Description |
Reviewer (Contributor):

This is duplicative of the API reference, and I think we should just point people there to avoid the concept page getting massive. Thoughts?

Reviewer (Contributor):

I agree with this - I feel like this page is already really unwieldy.

clairelin135 (Contributor Author):

got it--I will remove this

```python
if downstream_partitions:  # Check that a downstream daily partition exists
    # Upstream daily partition can only map to at most one downstream daily partition
    yield downstream_daily_job.run_request_for_partition(
        downstream_partitions[0], run_key=None
```
Reviewer (Contributor):
We can omit run_key now

Reviewer (Contributor):

I think we now prefer returning a list of RunRequests rather than yielding them one by one, because it makes it clear that runs aren't requested at yield time but rather once the function has returned.
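A minimal sketch of the return-a-list pattern described above, with hypothetical stand-ins for the Dagster objects (in a real sensor this would be the body of a `@multi_asset_sensor`-decorated function, and `run_request_for_partition` would be the job's method):

```python
# Sketch of the "return a list of RunRequests" pattern. Nothing is launched
# while the list is being built; runs are requested only once the sensor
# function returns the full list.
def sensor_body(downstream_partitions, run_request_for_partition):
    run_requests = []
    if downstream_partitions:
        # The upstream partition maps to at most one downstream partition
        run_requests.append(run_request_for_partition(downstream_partitions[0]))
    return run_requests
```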

```python
    to_asset_key=AssetKey("downstream_daily_asset"),
)
if downstream_partitions:  # Check that a downstream daily partition exists
    # Upstream daily partition can only map to at most one downstream daily partition
```
Reviewer (Contributor):
It's a little unclear to me what this comment means. Does this apply to all partitioned asset sensors, or just to this particular situation?

clairelin135 (Contributor Author):
This applies to just this particular situation. In any case, I am removing this code example in favor of only showing the example that monitors multiple upstream assets.

```python
)


@multi_asset_sensor(
```
Reviewer (Contributor):
Did you consider including multiple assets here? It would make the code more complex, but I think there's value in showing the general case. It's easy for users to adapt a multi-asset example into a single-asset example, but harder to go the other way around.

clairelin135 (Contributor Author):
That makes sense. I've updated this code example to cover the multi-asset case instead.

@erinkcochran87 erinkcochran87 changed the title Partitioned multi asset sensor examples [docs] - Partitioned multi asset sensor examples Sep 22, 2022
@erinkcochran87 erinkcochran87 added the area: docs Related to documentation in general label Sep 22, 2022



If a partition mapping is not defined, Dagster will use the default partition mapping, which is the <PyObject object="TimeWindowPartitionMapping"/> for time window partitions definitions and the <PyObject object="IdentityPartitionMapping"/> for other partitions definitions. The <PyObject object="TimeWindowPartitionMapping"/> will map an upstream partition to the downstream partitions that overlap with it.

#### Additional examples
Reviewer (Contributor):
Thoughts on making this a callout instead of a section with a heading? I almost missed it when looking at the preview, and a callout could make it stand out.

Ex:

Looking for more? Check out the Examples section!

- [Updating a weekly asset partition when upstream daily partitions are materialized]()
- [Materializing a daily asset partition when both upstream daily partitions are materialized]()

@@ -721,6 +761,89 @@ def uses_db_connection():

If a resource you want to initialize has dependencies on other resources, those can be included in the dictionary passed to <PyObject object="build_resources"/>. For more in-depth usage, check out the [Initializing Resources Outside of Execution](/concepts/resources#initializing-resources-outside-of-execution) section.

### Monitoring asset partitions with sensors

Reviewer (Contributor):
Could you add a list of links to the subsections here? It makes it easy to see what's in this bit and jump around. Ex:

- [Updating a weekly asset partition when upstream daily partitions are materialized]()
- [Materializing a daily asset partition if both upstream daily partitions are materialized]()

### Monitoring asset partitions with sensors

#### Updating a weekly asset partition when upstream daily partitions are materialized
Reviewer (Contributor):
This is probably a nit, but if there's a way to shorten this heading (and the other one in this section) I think it's worth a shot. It's getting squashed in the page nav and is pretty long for a heading in general.


#### Materializing a daily asset partition if both upstream daily partitions are materialized

The following example monitors two upstream daily-partitioned assets, kicking off a run in the downstream daily-partitioned asset if any upstream daily partition is replaced and the other upstream daily partition has an existing materialization.
Reviewer (Contributor):
I recommend breaking this sentence up into two, at a minimum. It's currently a bit hard to read and absorb due to its length.

…laire/docs/partitioned-multi-asset-sensor-examples
clairelin135 (Contributor Author):
@sryza @erinkcochran87 I added updates to this page per your feedback, including:

  • Featuring the multi-asset monitoring example at the forefront (materializing a daily asset when upstream partitions from 2 assets are materialized). Also deleted the original daily -> daily partition asset example for brevity.
  • Formatting changes and text rewording
  • Miscellaneous code changes

I think this is ready for you to take another look!

```python
    asset_keys=[AssetKey("upstream_daily_1"), AssetKey("upstream_daily_2")],
    job=downstream_daily_job,
)
def trigger_daily_asset_if_all_upstream_partitions_materialized(context):
```
Reviewer (Contributor):
This is a pretty complex implementation for such a common use case. I wonder if there are additional utility methods we could supply to make it simpler?

clairelin135 (Contributor Author):

Hm... I updated this example to take your feedback into account.

One small optimization I can think of is to allow `all_partitions_materialized` to accept a list of asset keys, or use all monitored asset keys by default. But aside from that, it's hard to think of more, because we are constrained to advancing the cursor for each monitored asset key sequentially.

Reviewer (Contributor):

What if we had a method that returned a dictionary that mapped each partition key to a list or set of asset keys that had materializations with that partition? (We would probably want to exclude partitions with no materializations or something). Then the user could just iterate through that list?

clairelin135 (Contributor Author):

Yeah, we could do that. If we did, we'd have to advance all of the cursors in that call (unless we plan on passing the materialization objects to the user within the call).

Reviewer (Contributor):
> (unless we plan on passing the materialization objects to the user within the call)

Is there a downside to doing that? I think in general it's better to separate the methods that return information from the methods that advance the cursor.

clairelin135 (Contributor Author):

I think the only downside is that the returned object gets a little more complicated: it would be `Mapping[str, Mapping[AssetKey, List[EventLogEntry]]]`.

clairelin135 (Contributor Author):

Agree that we want to avoid advancing the cursor in this call, though.

Reviewer (Contributor):

Could it just be `Mapping[str, Tuple[AssetKey, EventLogEntry]]`? I.e., only return the latest entry for that asset-partition combo?

clairelin135 (Contributor Author):

Yeah, we can do that.

One nuance with doing something like this is that the materializations will be returned out of order (relative to the order in which they occurred). So if we provided this method, users would have to use `context.advance_all_cursors` instead of `context.advance_cursor`.

For example, if asset A materialized partition A and then partition B, and asset B materialized partition B and then partition A, this method would return:

```
{
    partition A: [(asset A, partition A materialization), (asset B, partition A materialization)],
    partition B: [(asset A, partition B materialization), (asset B, partition B materialization)],
}
```

And so calling `advance_cursor` (instead of `advance_all_cursors`) for partition B's records after partition A's would backtrack the cursor for asset B, whose partition B materialization is older than its partition A materialization.
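The backtracking hazard in this thread can be modeled in plain Python (no Dagster; the names and storage ids below are made up). Each asset's cursor is effectively a single "latest handled storage id", and records grouped by partition are not in storage-id order, so advancing the cursor group by group can move it backwards:

```python
# Plain-Python model of the per-asset cursor hazard described above.
def cursor_history_by_partition_order(records_by_partition):
    """records_by_partition: {partition_key: {asset: storage_id}} holding the
    latest record per (partition, asset). Returns, per asset, the sequence of
    storage ids its cursor would be set to if advanced once per partition
    group (the advance_cursor-per-group approach)."""
    history = {}
    for partition_key in sorted(records_by_partition):
        for asset, storage_id in records_by_partition[partition_key].items():
            history.setdefault(asset, []).append(storage_id)
    return history
```

If asset B materialized partition B (storage id 3) and then partition A (storage id 4), handling partition A's group first sets B's cursor to 4, and partition B's group then drags it back to 3; advancing all cursors at once to the newest records avoids the backtrack.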

sryza (Contributor) commented Sep 27, 2022:

This is looking in the right direction! I like how it is with 2 examples instead of 3.

clairelin135 (Contributor Author):

@sryza The code example of kicking off a downstream partition when either of the corresponding upstream partitions are materialized has been updated, following the implementation added in #9856

clairelin135 (Contributor Author):

@sryza This PR is now updated with the `latest_materialization_records_by_partition_and_asset` method merged in #9856

I think it's ready for you to take another look!

```python
    partition,
    materializations_by_asset,
) in context.latest_materialization_records_by_partition_and_asset().items():
    for asset_key, materialization in materializations_by_asset.items():
```
Reviewer (Contributor):

could this inner for loop be replaced with:

```python
if materializations_by_asset.keys() == context.asset_keys:
    run_requests.append(downstream_daily_job.run_request_for_partition(partition))
    for asset_key, record in materializations_by_asset.items():
        context.advance_cursor({asset_key: record})
```

clairelin135 (Contributor Author):

This sensor is currently designed to update a downstream asset when a partitioned materialization occurs and the same partition in the other upstream assets is also materialized.

If we replaced the inner loop, then this sensor would only produce a run request when new materializations occur for the same partition for every upstream asset (and not when one of those partitions is individually replaced).

Reviewer (Contributor):

I see. Is there a reason we prefer that behavior over behavior that only rematerializes if all parents are rematerialized? I could see the case for either, but smaller simpler example code seems like a good tiebreaker.

clairelin135 (Contributor Author):

I think the "and" case is actually more complex. Say:

  • The downstream asset monitors upstreams A and B
  • A materializes partitions 1 and 2, in that order
  • B materializes partitions 3, 2, 1, in that order

The cursor can't actually be updated unless the set of partitions in the first N materializations is the same for upstreams A and B. So in this case, until we get a partition 3 materialization in asset A, we can't advance the cursor for B (and thus can't advance the cursor for A)
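The constraint described above can be sketched in plain Python (illustrative only, not Dagster code): with one sequential cursor per asset, the AND case can only advance through the longest prefix whose set of partitions matches across both upstreams. For the example above (A materializes 1, 2; B materializes 3, 2, 1) that prefix is empty, so neither cursor can move.

```python
# Illustrative check of how far sequential per-asset cursors could safely
# advance in the AND case: the first N materializations of both assets must
# cover the same set of partitions before those N events can be marked handled.
def advanceable_prefix(order_a, order_b):
    best = 0
    for n in range(1, min(len(order_a), len(order_b)) + 1):
        if set(order_a[:n]) == set(order_b[:n]):
            best = n
    return best
```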

Reviewer (Contributor):

Oh man, this is gnarly. Thinking through all this stuff makes me think about how easy it would be for a user to mess this up.

If we had infinite resources, I think we'd ideally store a cursor component for every (asset key, partition) tuple instead of just for every asset key, right? Would that make programming against this significantly simpler? If so, I wonder if there's some data structure that we could use to make this possible.

sryza (Contributor) commented Oct 3, 2022:

For example, what if the cursor was a `Mapping[AssetKey, Sequence[StorageIdPartitionKeyRange]]`? Where each `StorageIdPartitionKeyRange` had `start_storage_id`, `end_storage_id`, `start_partition_key`, and `end_partition_key` fields.

`{AssetKey("asset1"): [StorageIdPartitionKeyRange(1, 5, "2020-01-01", "2020-05-05")]}` would mean that our sensor has handled every event that both has a storage id between 1 and 5 and has a partition key between "2020-01-01" and "2020-05-05".
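A sketch of this proposed cursor component (hypothetical, a discussion idea only, not an existing Dagster API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StorageIdPartitionKeyRange:
    """Hypothetical cursor component: marks as handled every event whose
    storage id AND partition key both fall within this range's bounds."""
    start_storage_id: int
    end_storage_id: int
    start_partition_key: str
    end_partition_key: str

    def covers(self, storage_id: int, partition_key: str) -> bool:
        # ISO-formatted date keys compare correctly as strings
        return (
            self.start_storage_id <= storage_id <= self.end_storage_id
            and self.start_partition_key <= partition_key <= self.end_partition_key
        )
```

A cursor like `{"asset1": [StorageIdPartitionKeyRange(1, 5, "2020-01-01", "2020-05-05")]}` would then support out-of-order handling, since handled (storage id, partition) regions no longer have to form a single prefix.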

clairelin135 (Contributor Author):

@erinkcochran87 @sryza The implementation for multi assets is now merged! I've updated the docs to contain examples of the new functionality and broken the asset sensor content out into its own page to keep the concepts page from growing too unwieldy.

Let me know how this looks! https://dagster-hkdzmk6y2-elementl.vercel.app/concepts/partitions-schedules-sensors/asset-sensors

sryza (Contributor) left a comment:
Left a few small comments, but otherwise, this looks great to me.

clairelin135 (Contributor Author):

@sryza oddly, I'm not seeing any new comments from you

sryza (Contributor) commented Oct 11, 2022:

@clairelin135 they're on the Vercel preview

@clairelin135 clairelin135 merged commit f91dc3b into master Oct 11, 2022
@clairelin135 clairelin135 deleted the claire/docs/partitioned-multi-asset-sensor-examples branch October 11, 2022 19:11
Labels: area: docs (Related to documentation in general)