fetch backfills logs in GQL #22038

Merged: jamiedemaria merged 22 commits into master from jamie/fetch-backfill-logs on Jun 5, 2024
Conversation

jamiedemaria (Contributor) commented May 22, 2024

Summary & Motivation

Fetches backfill logs using the captured log manager.

Adds functionality to the captured log manager to read a sequence of files in a directory as if they were a single file, by sorting the file names and then reading through the files N lines at a time. The majority of the logic is in the base CapturedLogManager class, but each implementing class will need to implement a method to get the log keys for a given directory. I implemented this method for the LocalComputeLogManager to enable testing; follow-up PRs can implement it for the remaining ComputeLogManagers. The internal PR https://github.com/dagster-io/internal/pull/9879 implements the method for the cloud compute log manager.
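For illustration, here is a minimal sketch of that read-N-lines-at-a-time scheme over a sorted file sequence. It assumes a local directory of .err chunk files and a "<file_index>:<line_offset>" cursor encoding; the function name, cursor format, and storage layout are assumptions for the sketch, not the PR's actual API.

import os
from typing import Optional, Sequence, Tuple

def read_lines_for_prefix(
    log_dir: str, cursor: Optional[str], num_lines: int
) -> Tuple[Sequence[str], str, bool]:
    """Read a directory of chunked log files as one logical stream, num_lines at a time."""
    files = sorted(f for f in os.listdir(log_dir) if f.endswith(".err"))
    # the cursor encodes a position in the concatenated stream: "<file_index>:<line_offset>"
    if cursor:
        file_idx, line_offset = (int(part) for part in cursor.split(":"))
    else:
        file_idx, line_offset = 0, 0

    lines = []
    while file_idx < len(files) and len(lines) < num_lines:
        with open(os.path.join(log_dir, files[file_idx])) as f:
            remaining = f.readlines()[line_offset:]
        take = remaining[: num_lines - len(lines)]
        lines.extend(take)
        if len(take) == len(remaining):
            file_idx, line_offset = file_idx + 1, 0  # this chunk file is exhausted; move on
        else:
            line_offset += len(take)  # stopped mid-file because the page filled up

    has_more = file_idx < len(files)  # more to read right now, not "forever"
    return lines, f"{file_idx}:{line_offset}", has_more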

Fetching logs via GQL is gated on an instance method that we can override to control the rollout.

How I Tested These Changes

New unit tests for the pagination scheme, plus a unit test that runs a backfill and then fetches the logs.

jamiedemaria (Contributor, author) commented May 22, 2024

This stack of pull requests is managed by Graphite.

github-actions bot commented May 23, 2024

Deploy preview for dagit-core-storybook ready!

https://dagit-core-storybook-g7avhckmg-elementl.vercel.app
https://jamie-fetch-backfill-logs.core-storybook.dagster-docs.io

Built with commit d8bdf25. This pull request is being automatically deployed with vercel-action.

jamiedemaria marked this pull request as ready for review May 23, 2024 21:08

alangenfeld (Member) left a comment

On the right path. Take care of the stuff you're planning and get it under test, and this should be good to go.

jamiedemaria changed the title from "[prototype] fetch backfills logs in GQL" to "fetch backfills logs in GQL" May 29, 2024

# This should be emitted as a log message to indicate that no more logs will be emitted for that process.
# This lets the log reader know that it can stop polling for more logs.
LOG_STREAM_COMPLETED_SIGIL = "LOGS COMPLETED"

jamiedemaria (author) commented:

Having to remember to emit this log whenever a process completes, so that we know the logs are over, seems brittle...

alangenfeld (Member) commented:

Do we have a backfill completed / backfill failed log message? Maybe we should just key on that?

Or come up with some other scheme, i.e. just see that the backfill is in a terminal state and expect no more things to be added.

The latter would probably mean a two-layer interpretation, where the compute log methods return whether or not they know there is more, and then at the layer above you combine that information with knowing whether the thing producing logs is done to determine the final hasMore.

jamiedemaria (author) replied:

There is a log message at the end of the backfill we could look for. I was thinking that if we wanted to use this for non-backfill logs, we'd want some kind of shared indicator that the logs are done.

jamiedemaria (author) added:

Ended up changing the meaning of has_more for the captured log manager: it returns True only if there are more logs it could read right then. The caller can then combine that has_more value with what it knows about the state of the process to determine whether there are actually more logs. So in our case, if the log manager says has_more=False and the backfill is in the COMPLETED state, we know there are no more logs.
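As a rough sketch of that two-layer combination (the status values and function signature here are illustrative, not the PR's actual code):

# Illustrative only: combine the log manager's "more to read right now" signal
# with the backfill's state to compute the final hasMore.
TERMINAL_BACKFILL_STATES = {"COMPLETED", "FAILED", "CANCELED"}

def resolve_has_more(log_manager_has_more: bool, backfill_status: str) -> bool:
    if log_manager_has_more:
        # more log lines are available to read immediately
        return True
    # nothing to read right now, but a live backfill may still emit more
    return backfill_status not in TERMINAL_BACKFILL_STATES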

alangenfeld (Member) replied:

Maybe rename it to has_more_now in the lower layer, for clarity.

# NOTE: This method was introduced to support backfill logs, which are always stored as .err files.
# Thus the implementations only look for .err files when determining the log_keys. If other file extensions
# need to be supported, an io_type parameter will need to be added to this method
raise NotImplementedError("Must implement get_log_keys_for_log_key_prefix")

jamiedemaria (author) commented:

Having this raise an error rather than be an abstract method so I don't have to implement it for all CapturedLogManagers yet. Follow-up PRs will add those implementations, and then I can switch to abstractmethod.
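For reference, a filesystem-backed implementation might look roughly like this (the directory layout and log-key shape are assumptions for the sketch, not the actual LocalComputeLogManager code):

import os
from typing import Sequence

def get_log_keys_for_log_key_prefix(
    base_dir: str, log_key_prefix: Sequence[str]
) -> Sequence[Sequence[str]]:
    """Build one log key per .err file found under the prefix directory."""
    directory = os.path.join(base_dir, *log_key_prefix)
    return [
        [*log_key_prefix, os.path.splitext(filename)[0]]
        for filename in sorted(os.listdir(directory))
        if filename.endswith(".err")
    ]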

alangenfeld (Member) left a comment

I think this is close, but let's do one more iteration pass.


prha (Member) left a comment

Nothing blocking here, but some nits about which methods belong on the captured log manager vs. pushing that functionality into the caller.


return log_lines

def read_json_log_lines_for_log_key_prefix(

prha (Member) commented:

nit: this feels like a very narrow API for the captured log manager. Seems like this could be a utility function that wraps the captured log manager.

jamiedemaria (author) replied:

Are you thinking something like this?

# some utils file

def read_json_log_lines_for_log_key_prefix(captured_log_manager, log_key_prefix, cursor, num_lines):
    ...

jamiedemaria (author) added:

Alex and I were talking about how there's some general utility in the captured log manager being able to read logs from a sequence of files as if they were one long file, but I agree that in its current state it feels very narrow. Some ideas:

  • make the method on the captured log manager just return log lines, and have a utility function that converts them to JSON
  • make the whole thing a util function; if we want to use this functionality somewhere else in the future, we can refactor to make it generic

jamiedemaria (author) added:

I guess since we don't have a clear immediate use for this functionality, having it be a util function could be more incremental.

prha (Member) replied:

> make the method on captured log manager just return log lines, have a utility function that converts to json

Yeah, it feels cleaner to me to just have it return log lines. The fact that it's structured JSON is just an implementation detail.
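A sketch of that split, with the JSON decoding pulled into a wrapper (the function name is a placeholder, not the PR's final API):

import json
from typing import Mapping, Sequence

def parse_json_log_lines(raw_lines: Sequence[str]) -> Sequence[Mapping]:
    """Wrapper layer: treat the JSON structure of the lines as an implementation detail."""
    records = []
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # tolerate a partially written trailing line instead of failing the whole page
            continue
    return records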

jamiedemaria (author) replied:

That sounds good to me. One thing I was also running up against with this method was using log_key_prefix in the name. I want to make sure it's distinguished from a method that reads log lines from a single file, but log_key_prefix isn't very explanatory of what's actually going on. Is there a standard term for reading a series of files as one file that we could use? I'm going to mull it over, do some searching around, and see if I can come up with anything better.

directory defined by the log key prefix and creating a log_key for each file in the directory.
"""
# NOTE: This method was introduced to support backfill logs, which are always stored as .err files.
# Thus the implementations only look for .err files when determining the log_keys. If other file extensions

prha (Member) commented:

Is there any reason why the initial implementation shouldn't also look for .out files?

jamiedemaria (author) replied:

Not really, mostly just reducing the surface area to test against, but it wouldn't be hard to add. I have some follow-up PRs planned for this stack, so I'll add expanding to .out files as part of that so it doesn't block the backfill work from landing.

bengotow (Collaborator) left a comment

This looks great! Will let others weigh in on the Python, but I will start updating the UI to match this implementation tonight.

Comment on lines 586 to 595
and self.status
in [
GrapheneBulkActionStatus.COMPLETED,
GrapheneBulkActionStatus.FAILED,
GrapheneBulkActionStatus.CANCELED,
]
):
has_more = False
else:
has_more = True

bengotow (Collaborator) commented:

I think we should remove the self.status check from this logic -- right now, if the action is in progress, it looks like we'll return has_more=True continuously.

I think this is a good idea (we do want it to make another request and ask for more logs), but the UI will probably want to fetch the next page "immediately" if the cursor is behind, and "in 5 seconds" if it's up to the latest cursor and the backfill is in progress. Because of the different delays, it might be better to put this part of the logic on the front end.
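A sketch of that polling cadence from the caller's side (the client methods, page shape, and 5-second delay here are illustrative assumptions):

import time

POLL_INTERVAL_SECONDS = 5  # assumed delay; the real UI value may differ

def poll_backfill_logs(client, backfill_id: str) -> None:
    """Fetch immediately while the cursor is behind; wait while caught up on a live backfill."""
    cursor = None
    while True:
        page = client.fetch_backfill_logs(backfill_id, cursor=cursor)  # hypothetical API
        for line in page.lines:
            print(line, end="")
        cursor = page.cursor
        if page.has_more:
            continue  # cursor is behind: fetch the next page immediately
        if client.get_backfill_status(backfill_id) in {"COMPLETED", "FAILED", "CANCELED"}:
            break  # terminal state and caught up: no more logs will arrive
        time.sleep(POLL_INTERVAL_SECONDS)  # caught up but still running: poll slowly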

jamiedemaria (author) replied:

Ok, sounds good to me. Made the update.

jamiedemaria dismissed alangenfeld's stale review June 5, 2024 16:37 and re-requested review

Base automatically changed from jamie/store-backfill-logs to master June 5, 2024 18:26
jamiedemaria merged commit 18c20aa into master Jun 5, 2024 (2 checks passed)
jamiedemaria deleted the jamie/fetch-backfill-logs branch June 5, 2024 19:00
benpankow pushed a commit that referenced this pull request Jun 5, 2024
salazarm pushed a commit that referenced this pull request Jun 10, 2024
danielgafni pushed a commit to danielgafni/dagster that referenced this pull request Jun 18, 2024