Improve tty error logging when buildkit vertex is unknown #2188
Conversation
Creates a generic "system" group in the tty output which captures buildkit events that report a non-dagger vertex name. This currently happens when using core.#Dockerfile actions, since Dagger delegates the LLB generation to buildkit through its frontend and we don't get meaningful events that we can correlate on Dagger's side.

Signed-off-by: Marcos Lilljedahl <marcosnils@gmail.com>
```go
// setup logger here
_, isDockerfileTask := t.(*task.DockerfileTask)
if isDockerfileTask {
	fn(log.With().Str("task", "system").Logger())
}
```
I don't get this part -- if I understand correctly, for Dockerfile tasks we're logging things twice: once with `task = ""` and once again with `task = "system"`, then in the tty/plain writers we rewrite `""` -> `"system"`.

I don't understand why this is needed?
At this point we're not logging with `task = ""`, since we're inside the cueflow taskFunc and we have the correct Dagger task names. We're logging twice here because otherwise the task action name won't be printed to the tty. So, for Dockerfile actions, at this point we need to log twice so that both the Dagger Dockerfile action name and the `system` logger get recorded by the tty output.
Sorry, I'm slow today, still not following :)
> We're logging twice here because otherwise, the task action name won't be printed to the tty
The second log call doesn't contain the task name, right? I don't see why it would change what's printed.
Omitting the debug logs for simplicity, this is going to produce:

```
{"task": "actions.foo", "message": "computing"}  # already had this
{"task": "system", "message": "computing"}       # new entry with this PR
{"task": "actions.foo", "message": "completed"}  # already had this
{"task": "system", "message": "completed"}       # new entry with this PR
```
I don't understand the goal of the `system` entry, since we can't rely on computing/completed etc. (you can have a bunch of Dockerfiles running in parallel).
> Sorry, I'm slow today, still not following :)
np haha! It's likely that I'm missing something here as well.
> The second log call doesn't contain the task name, right? I don't see why it would change what's printed.

Correct, but we need the second log so the `system` group state is updated and handled by the tty output here: https://github.com/dagger/dagger/blob/main/cmd/dagger/logger/tty.go#L262
> I don't understand the goal of the system entry since we can't rely on computing/completed etc (as you can have a bunch of Dockerfile running in parallel)

You're seeing this in the `plain` output, but the `tty` output needs this state transition so it can correctly display whether something failed or not. In the case where you have multiple DAG nodes with Dockerfile actions, `system` will have the state of the last execution.
Got it!
I think there are some edge cases, as the "last" execution is fuzzy: you can have N Dockerfiles running in parallel, so the state changes will not be linear (#1 computing / #2 computing / #2 completed / #1 completed), and the tty will "flip" between computing/completed for the same docker build.
However, I don't know if this is really a big problem in practice and since this PR is a giant improvement already over what we have, I vote we merge it as is and see how it goes :)
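The "flip" described above can be sketched in a few lines of Go. This is a toy model, not Dagger's code: since every buildkit event with an unrecognized vertex is folded into the one shared "system" group, the group simply holds the state of whichever event arrived last. The event stream is hypothetical.

```go
package main

import "fmt"

// foldSystemState collapses an ordered stream of per-build state
// events into the single "system" group's state: last write wins.
func foldSystemState(events []string) string {
	state := ""
	for _, s := range events {
		state = s
	}
	return state
}

func main() {
	// Interleaved parallel builds: #2 finishes while #1 is still
	// computing, so the group flips completed -> computing.
	events := []string{
		"computing", // build #1 starts
		"computing", // build #2 starts
		"completed", // build #2 finishes
		"computing", // build #1 still going: tty flips back
	}
	fmt.Println(foldSystemState(events)) // the group ends up "computing"
}
```

This is why the non-linear interleaving makes the tty display fuzzy, even though the final state is well-defined per event stream.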
Let's not merge this yet. I want to explore something to see if it can be further improved before merging
@aluzzardi added a new commit here that improves the way we're handling state transitions. One thing that I've noticed and discussed with @helderco is that if you have a Dagger plan which contains parallel tasks, when one fails the others get cancelled. Here's an example:

-- test.cue --

Here's the output that I get when running it. In the case of

Below is an example with the

-- Dockerfile.fail --
-- Dockerfile.long --
-- test.cue --

Before the 2nd commit in this PR the output was:

^ no output was generated here, since "cancelled" was overriding the "failed" state for the "system" group.

After the 2nd commit, the output now is:

^ the error is properly shown upon cancellation.
Changes `task.State` from a string to an iota enum to better handle state transitions. It also adds a new method on `State` so the logic for whether a state can be transitioned lives in a single place, since it doesn't make sense to transition from failed to completed, or from cancelled to computing, and so on.

Signed-off-by: Marcos Lilljedahl <marcosnils@gmail.com>
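As a rough sketch of what an iota-based `State` with centralized transition checking can look like: the state names and the transition rule below are illustrative assumptions, not the PR's exact implementation.

```go
package main

import "fmt"

// State models a task's execution state as an enum instead of a raw
// string, so transition rules can live in one method.
type State int

const (
	StateComputing State = iota
	StateCancelled
	StateFailed
	StateCompleted
)

// CanTransition reports whether moving from s to next is allowed.
// Here the terminal states (cancelled, failed, completed) are final,
// which is what keeps a later "cancelled" event from clobbering a
// "failed" state for the "system" group.
func (s State) CanTransition(next State) bool {
	return s == StateComputing && next != StateComputing
}

func main() {
	fmt.Println(StateComputing.CanTransition(StateFailed))  // true: a running task may fail
	fmt.Println(StateFailed.CanTransition(StateCancelled))  // false: failed is terminal
}
```

With a rule like this, callers check `CanTransition` before updating the group state, so the error shown at cancellation time is preserved.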
LGTM! Test failure is unrelated -- restarted CI
Creates a generic "system" group in the tty output similarly to the plain one which captures
buildkit events that report a non-dagger vertex name. This happens
currently when using core.#Dockerfile actions since Dagger delegates the
LLB generation to buildkit through its frontend and we don't get
meaningful events that we can correlate from Dagger's side.
Given the following dockerfile:
Old output:
New output:
cc @aluzzardi @helderco
Fixes #2044
Related #613