fetch backfills logs in GQL #22038
Conversation
This stack of pull requests is managed by Graphite.
Deploy preview for dagit-core-storybook ready! ✅ Built with commit d8bdf25.
on the right path - take care of the stuff you are planning, get it under test, and it should be good to go
```python
# This should be emitted as a log message to indicate that no more logs will be
# emitted for that process; this lets the log reader know that it can stop
# polling for more logs
LOG_STREAM_COMPLETED_SIGIL = "LOGS COMPLETED"
```
having to remember to emit this log whenever a process completes so that we know the logs are over seems brittle...
do we have a backfill completed / backfill failed log message? maybe we should just key on that?
or come up with some other scheme, i.e. just see that the backfill is in a terminal state and expect no more things to be added
the latter would probably mean a two-layer interpretation, where the compute log methods return whether or not they know there are more logs, and then at the layer above you combine that information with knowing whether the thing producing logs is done to determine the final hasMore
there is a log message at the end of the backfill we could look for. I was thinking that if we wanted to use this for non-backfill logs, we'd want some kind of shared indicator that the logs are done
ended up changing the meaning of `has_more` for the captured log manager. It returns true only if there are more logs it could read right then. The caller can then compare that `has_more` value with what it knows about the state of the process to determine if there are actually more logs. So for our case, if the log manager says `has_more=False` and the backfill is in the COMPLETED state, then we know there are no more logs
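This two-layer interpretation can be sketched roughly as follows. Everything here is illustrative: the function name, the boolean-plus-status signature, and the status strings are assumptions for the sketch, not the actual dagster API.

```python
# Hypothetical sketch: the log manager only reports whether more lines can be
# read *right now*; the caller combines that with what it knows about the
# process producing the logs to decide whether the stream is truly finished.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}

def logs_exhausted(log_manager_has_more: bool, backfill_status: str) -> bool:
    # More lines are available to read immediately, so we are not done.
    if log_manager_has_more:
        return False
    # No lines available now AND the producer is done: truly exhausted.
    return backfill_status in TERMINAL_STATES
```

So a caller would keep polling while `logs_exhausted(...)` is false, which matches the `has_more=False` + COMPLETED case described above.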
maybe rename to `has_more_now` in the lower layer for clarity
```python
# NOTE: This method was introduced to support backfill logs, which are always
# stored as .err files. Thus the implementations only look for .err files when
# determining the log_keys. If other file extensions need to be supported, an
# io_type parameter will need to be added to this method
raise NotImplementedError("Must implement get_log_keys_for_log_key_prefix")
```
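For context, a minimal local implementation of this method might look like the sketch below. This is hypothetical: the real `LocalComputeLogManager` method in dagster has a different signature and directory layout; the sketch only shows the idea of building one log key per `.err` file under a prefix directory.

```python
import os

def get_log_keys_for_log_key_prefix(base_dir, log_key_prefix):
    """Sketch: list the directory named by the prefix and build a log key for
    each .err file, since backfill logs are always stored as .err files."""
    directory = os.path.join(base_dir, *log_key_prefix)
    log_keys = []
    for filename in sorted(os.listdir(directory)):
        name, ext = os.path.splitext(filename)
        if ext == ".err":
            # A log key is the prefix plus the file's stem.
            log_keys.append([*log_key_prefix, name])
    return log_keys
```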
having this raise an error rather than be an abstract method so I don't have to implement it for all CapturedLogManagers yet. Follow-up PRs will add those implementations and then I can switch to abstractmethod
I think this is close, but let's do one more iteration pass
Nothing blocking here, but some nits about what methods are on the captured log manager vs pushing that functionality into the caller.
```python
        return log_lines

    def read_json_log_lines_for_log_key_prefix(
```
nit: this feels like a very narrow API for the captured log manager. Seems like this could be a utility function that wraps the captured log manager.
are you thinking something like?

```python
# some utils file
def read_json_log_lines_for_log_key_prefix(captured_log_manager, log_key_prefix, cursor, num_lines):
    ...
```
alex and I were talking about how there's some general utility in the captured log manager being able to read logs from a sequence of files as if it's one long file, but I agree that in its current state it feels very narrow. Some ideas:
- make the method on captured log manager just return log lines, have a utility function that converts to json
- make the whole thing a util function; if we want to use this functionality somewhere else in the future we can refactor to make it generic
I guess since we don't have a clear immediate use for this functionality, having it be a util function could be more incremental
> make the method on captured log manager just return log lines, have a utility function that converts to json

yeah, it feels cleaner to me to just have it return log lines. the fact that it's structured JSON is just an implementation detail.
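Under that split, the JSON decoding could live in a small helper like this. The function name and the skip-malformed-lines policy are assumptions for the sketch, not the dagster code:

```python
import json

def parse_json_log_lines(raw_lines):
    """Decode raw log lines as JSON, skipping blanks and malformed lines.
    The log manager keeps returning plain lines; JSON stays an
    implementation detail layered on top by the caller."""
    parsed = []
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue
        try:
            parsed.append(json.loads(line))
        except json.JSONDecodeError:
            # Tolerate non-JSON lines rather than failing the whole page.
            continue
    return parsed
```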
that sounds good to me. one thing I was also running up against with this method was using `log_key_prefix` in the name. I want to make sure it gets distinguished from a method that reads log lines from a single file, but `log_key_prefix` isn't very explanatory of what's actually going on. idk if there's a standard term for reading a series of files as one file that we could use? i'm going to mull it over and do some searching around and see if I can come up with anything better
```python
    directory defined by the log key prefix and creating a log_key for each file in the directory.
    """
    # NOTE: This method was introduced to support backfill logs, which are always stored as .err files.
    # Thus the implementations only look for .err files when determining the log_keys. If other file extensions
```
is there any reason why the initial implementation shouldn't also look for `.out` files?
not really. mostly just reducing surface area to test against, but it wouldn't be hard to add. I have some follow-up PRs planned for this stack, so I'll add expanding to .out files as part of that so it doesn't block the backfill work from landing
This looks great! Will let others weigh in on the Python, but I will start updating the UI to match this implementation tonight.
```python
            and self.status
            in [
                GrapheneBulkActionStatus.COMPLETED,
                GrapheneBulkActionStatus.FAILED,
                GrapheneBulkActionStatus.CANCELED,
            ]
        ):
            has_more = False
        else:
            has_more = True
```
I think we should remove the self.status check from this logic -- right now, if the action is in progress, it looks like we'll return has_more=True continuously.
I think this is a good idea (we do want it to make another request and ask for more logs), but the UI will probably want to fetch the next page "immediately" if the cursor is behind, and "in 5 seconds" if it's up to the latest cursor and the backfill is in progress. Because of the different delays, it might be better to put this part of the logic on the front-end.
ok sounds good to me - made the update
Summary & Motivation
Fetches backfill logs using the captured log manager.
Adds functionality to the captured log manager to read a sequence of files in a directory as if they are a single file, by sorting the file names and then reading through the files N lines at a time. The majority of the logic is in the base CapturedLogManager class, but each implementor class will need to implement a method to get the log keys for a given directory. I implemented this method for the LocalComputeLogManager to enable testing; follow-up PRs can implement it for the remaining ComputeLogManagers. Internal PR https://github.com/dagster-io/internal/pull/9879 implements the method for the cloud compute log manager.
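The multi-file pagination described in the summary can be sketched as a toy model. Everything here is hypothetical (the real dagster cursor format and file access differ); it only shows the idea of reading N lines across sorted files, spilling into the next file when one is exhausted:

```python
def read_lines_across_files(files, cursor, num_lines):
    """Read up to num_lines across a sorted sequence of files as if they were
    one long file. `files` maps file name -> list of lines; the cursor is
    (file_index, line_offset) into the sorted file names."""
    names = sorted(files)
    file_idx, offset = cursor
    out = []
    while len(out) < num_lines and file_idx < len(names):
        lines = files[names[file_idx]]
        chunk = lines[offset : offset + (num_lines - len(out))]
        out.extend(chunk)
        offset += len(chunk)
        if offset >= len(lines):
            # Current file exhausted; spill into the next one.
            file_idx += 1
            offset = 0
    return out, (file_idx, offset)
```

Each call returns a page plus the cursor to pass back on the next call, which is the shape the GQL layer paginates over.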
Fetching logs via GQL is gated on an instance method we can override to control rollout.
How I Tested These Changes
New unit tests for the pagination scheme, plus a unit test where we run a backfill and then fetch the logs.