fetch backfills logs in GQL #22038

Merged: jamiedemaria merged 22 commits into master from jamie/fetch-backfill-logs on Jun 5, 2024
Conversation

jamiedemaria (Contributor) commented May 22, 2024

Summary & Motivation

Fetches backfill logs using the captured log manager.

Adds functionality to the captured log manager to read a sequence of files in a directory as if they were a single file, by sorting the file names and then reading through the files N lines at a time. The majority of the logic is in the base CapturedLogManager class, but each implementing class will need to implement a method to get the log keys for a given directory. I implemented this method for the LocalComputeLogManager to enable testing; follow-up PRs can implement it for the remaining ComputeLogManagers. The internal PR https://github.com/dagster-io/internal/pull/9879 implements the method for the cloud compute log manager.
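For illustration, here is a minimal sketch of that read-N-lines-at-a-time scheme over a sorted file sequence. It assumes a local directory of .err chunk files and a "<file_index>:<line_offset>" cursor encoding; the function name, cursor format, and storage layout are assumptions for the sketch, not the PR's actual API.

import os
from typing import Optional, Sequence, Tuple

def read_lines_for_prefix(
    log_dir: str, cursor: Optional[str], num_lines: int
) -> Tuple[Sequence[str], str, bool]:
    """Read a directory of chunked log files as one logical stream, num_lines at a time."""
    files = sorted(f for f in os.listdir(log_dir) if f.endswith(".err"))
    # the cursor encodes a position in the concatenated stream: "<file_index>:<line_offset>"
    if cursor:
        file_idx, line_offset = (int(part) for part in cursor.split(":"))
    else:
        file_idx, line_offset = 0, 0

    lines = []
    while file_idx < len(files) and len(lines) < num_lines:
        with open(os.path.join(log_dir, files[file_idx])) as f:
            remaining = f.readlines()[line_offset:]
        take = remaining[: num_lines - len(lines)]
        lines.extend(take)
        if len(take) == len(remaining):
            file_idx, line_offset = file_idx + 1, 0  # this chunk file is exhausted; move on
        else:
            line_offset += len(take)  # stopped mid-file because the page filled up

    has_more = file_idx < len(files)  # more to read right now, not "forever"
    return lines, f"{file_idx}:{line_offset}", has_more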

Fetching logs via GQL is gated on an instance method that we can override to control the rollout.

How I Tested These Changes

New unit tests for the pagination scheme, plus a unit test that runs a backfill and then fetches the logs.

jamiedemaria (Contributor, author) commented May 22, 2024

This stack of pull requests is managed by Graphite.

github-actions bot commented May 23, 2024

Deploy preview for dagit-core-storybook ready!

https://dagit-core-storybook-g7avhckmg-elementl.vercel.app
https://jamie-fetch-backfill-logs.core-storybook.dagster-docs.io

Built with commit d8bdf25. This pull request is being automatically deployed with vercel-action.

jamiedemaria marked this pull request as ready for review May 23, 2024 21:08

alangenfeld (Member) left a comment

On the right path. Take care of the stuff you're planning and get it under test, and this should be good to go.

jamiedemaria changed the title from "[prototype] fetch backfills logs in GQL" to "fetch backfills logs in GQL" May 29, 2024

# This should be emitted as a log message to indicate that no more logs will be emitted for that process.
# This lets the log reader know that it can stop polling for more logs.
LOG_STREAM_COMPLETED_SIGIL = "LOGS COMPLETED"

jamiedemaria (author) commented:

Having to remember to emit this log whenever a process completes, so that we know the logs are over, seems brittle...

alangenfeld (Member) commented:

Do we have a backfill completed / backfill failed log message? Maybe we should just key on that?

Or come up with some other scheme, i.e. just see that the backfill is in a terminal state and expect no more things to be added.

The latter would probably mean a two-layer interpretation, where the compute log methods return whether or not they know there is more, and then at the layer above you combine that information with knowing whether the thing producing logs is done to determine the final hasMore.

jamiedemaria (author) replied:

There is a log message at the end of the backfill we could look for. I was thinking that if we wanted to use this for non-backfill logs, we'd want some kind of shared indicator that the logs are done.

jamiedemaria (author) added:

Ended up changing the meaning of has_more for the captured log manager: it returns True only if there are more logs it could read right then. The caller can then combine that has_more value with what it knows about the state of the process to determine whether there are actually more logs. So in our case, if the log manager says has_more=False and the backfill is in the COMPLETED state, we know there are no more logs.
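As a rough sketch of that two-layer combination (the status values and function signature here are illustrative, not the PR's actual code):

# Illustrative only: combine the log manager's "more to read right now" signal
# with the backfill's state to compute the final hasMore.
TERMINAL_BACKFILL_STATES = {"COMPLETED", "FAILED", "CANCELED"}

def resolve_has_more(log_manager_has_more: bool, backfill_status: str) -> bool:
    if log_manager_has_more:
        # more log lines are available to read immediately
        return True
    # nothing to read right now, but a live backfill may still emit more
    return backfill_status not in TERMINAL_BACKFILL_STATES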

alangenfeld (Member) replied:

Maybe rename it to has_more_now in the lower layer, for clarity.

# NOTE: This method was introduced to support backfill logs, which are always stored as .err files.
# Thus the implementations only look for .err files when determining the log_keys. If other file extensions
# need to be supported, an io_type parameter will need to be added to this method
raise NotImplementedError("Must implement get_log_keys_for_log_key_prefix")

jamiedemaria (author) commented:

Having this raise an error rather than be an abstract method so I don't have to implement it for all CapturedLogManagers yet. Follow-up PRs will add those implementations, and then I can switch to abstractmethod.
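For reference, a filesystem-backed implementation might look roughly like this (the directory layout and log-key shape are assumptions for the sketch, not the actual LocalComputeLogManager code):

import os
from typing import Sequence

def get_log_keys_for_log_key_prefix(
    base_dir: str, log_key_prefix: Sequence[str]
) -> Sequence[Sequence[str]]:
    """Build one log key per .err file found under the prefix directory."""
    directory = os.path.join(base_dir, *log_key_prefix)
    return [
        [*log_key_prefix, os.path.splitext(filename)[0]]
        for filename in sorted(os.listdir(directory))
        if filename.endswith(".err")
    ]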

alangenfeld (Member) left a comment

I think this is close, but let's do one more iteration pass.


prha (Member) left a comment

Nothing blocking here, but some nits about which methods belong on the captured log manager vs. pushing that functionality into the caller.


return log_lines

def read_json_log_lines_for_log_key_prefix(

prha (Member) commented:

nit: this feels like a very narrow API for the captured log manager. Seems like this could be a utility function that wraps the captured log manager.

jamiedemaria (author) replied:

Are you thinking something like this?

# some utils file

def read_json_log_lines_for_log_key_prefix(captured_log_manager, log_key_prefix, cursor, num_lines):
    ...

jamiedemaria (author) added:

Alex and I were talking about how there's some general utility in the captured log manager being able to read logs from a sequence of files as if they were one long file, but I agree that in its current state it feels very narrow. Some ideas:

  • make the method on the captured log manager just return log lines, and have a utility function that converts them to JSON
  • make the whole thing a util function; if we want to use this functionality somewhere else in the future, we can refactor to make it generic

jamiedemaria (author) added:

I guess since we don't have a clear immediate use for this functionality, having it be a util function could be more incremental.

prha (Member) replied:

> make the method on captured log manager just return log lines, have a utility function that converts to json

Yeah, it feels cleaner to me to just have it return log lines. The fact that it's structured JSON is just an implementation detail.
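A sketch of that split, with the JSON decoding pulled into a wrapper (the function name is a placeholder, not the PR's final API):

import json
from typing import Mapping, Sequence

def parse_json_log_lines(raw_lines: Sequence[str]) -> Sequence[Mapping]:
    """Wrapper layer: treat the JSON structure of the lines as an implementation detail."""
    records = []
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # tolerate a partially written trailing line instead of failing the whole page
            continue
    return records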

jamiedemaria (author) replied:

That sounds good to me. One thing I was also running up against with this method was using log_key_prefix in the name. I want to make sure it's distinguished from a method that reads log lines from a single file, but log_key_prefix isn't very explanatory of what's actually going on. Is there a standard term for reading a series of files as one file that we could use? I'm going to mull it over, do some searching around, and see if I can come up with anything better.

directory defined by the log key prefix and creating a log_key for each file in the directory.
"""
# NOTE: This method was introduced to support backfill logs, which are always stored as .err files.
# Thus the implementations only look for .err files when determining the log_keys. If other file extensions

prha (Member) commented:

Is there any reason why the initial implementation shouldn't also look for .out files?

jamiedemaria (author) replied:

Not really, mostly just reducing the surface area to test against, but it wouldn't be hard to add. I have some follow-up PRs planned for this stack, so I'll add expanding to .out files as part of that so it doesn't block the backfill work from landing.

bengotow (Collaborator) left a comment

This looks great! Will let others weigh in on the Python, but I will start updating the UI to match this implementation tonight.

Comment on lines 586 to 595
and self.status
in [
GrapheneBulkActionStatus.COMPLETED,
GrapheneBulkActionStatus.FAILED,
GrapheneBulkActionStatus.CANCELED,
]
):
has_more = False
else:
has_more = True

bengotow (Collaborator) commented:

I think we should remove the self.status check from this logic -- right now, if the action is in progress, it looks like we'll return has_more=True continuously.

I think this is a good idea (we do want it to make another request and ask for more logs), but the UI will probably want to fetch the next page "immediately" if the cursor is behind, and "in 5 seconds" if it's up to the latest cursor and the backfill is in progress. Because of the different delays, it might be better to put this part of the logic on the front end.
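A sketch of that polling cadence from the caller's side (the client methods, page shape, and 5-second delay here are illustrative assumptions):

import time

POLL_INTERVAL_SECONDS = 5  # assumed delay; the real UI value may differ

def poll_backfill_logs(client, backfill_id: str) -> None:
    """Fetch immediately while the cursor is behind; wait while caught up on a live backfill."""
    cursor = None
    while True:
        page = client.fetch_backfill_logs(backfill_id, cursor=cursor)  # hypothetical API
        for line in page.lines:
            print(line, end="")
        cursor = page.cursor
        if page.has_more:
            continue  # cursor is behind: fetch the next page immediately
        if client.get_backfill_status(backfill_id) in {"COMPLETED", "FAILED", "CANCELED"}:
            break  # terminal state and caught up: no more logs will arrive
        time.sleep(POLL_INTERVAL_SECONDS)  # caught up but still running: poll slowly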

jamiedemaria (author) replied:

Ok, sounds good to me. Made the update.

jamiedemaria dismissed alangenfeld's stale review June 5, 2024 16:37 and re-requested review

Base automatically changed from jamie/store-backfill-logs to master June 5, 2024 18:26
jamiedemaria merged commit 18c20aa into master Jun 5, 2024 (2 checks passed)
jamiedemaria deleted the jamie/fetch-backfill-logs branch June 5, 2024 19:00
benpankow pushed a commit that referenced this pull request Jun 5, 2024
salazarm pushed a commit that referenced this pull request Jun 10, 2024
danielgafni pushed a commit to danielgafni/dagster that referenced this pull request Jun 18, 2024