
Improve tty error logging when buildkit vertex is unknown #2188

Merged: 2 commits merged into dagger:main on Apr 19, 2022

Conversation

@marcosnils (Contributor) commented Apr 13, 2022

Creates a generic "system" group in the tty output, similar to the plain
output, that captures buildkit events which report a non-dagger vertex
name. This currently happens when using core.#Dockerfile actions, since
Dagger delegates LLB generation to buildkit through its frontend and we
don't get meaningful events that we can correlate on Dagger's side.
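As an illustration (a minimal sketch with assumed type and field names, not the actual Dagger logger code), the grouping boils down to falling back to a fixed group name whenever an event cannot be correlated to a Dagger task:

package logger

const systemGroup = "system"

// event is a simplified stand-in for a parsed buildkit/Dagger log entry
// (hypothetical type used only for this sketch).
type event struct {
	Task    string // Dagger task path, e.g. "actions.hello._build"; empty for unknown buildkit vertices
	Message string
}

// groupFor picks the tty group an event is rendered under: events that can't
// be correlated to a Dagger task fall back to the generic "system" group.
func groupFor(e event) string {
	if e.Task == "" {
		return systemGroup
	}
	return e.Task
}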

Given the following Dockerfile:

FROM node:10-alpine
RUN apk update
RUN some_error

Old output:

dagger do hello
[✔] client.filesystem.".".read                                                                         0.0s
[✗] actions.hello                                                                                      6.9s
                                                                                                       0.8s
5:18PM FTL failed to execute plan: task failed: actions.hello._build: process "/bin/sh -c some_error" did not complete successfully: exit code: 127

New output:

dagger do hello
[✔] client.filesystem.".".read                                                                         0.1s
[✗] actions.hello                                                                                      1.8s
#5 0.083 fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/main/x86_64/APKINDEX.tar.gz
#5 0.605 fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/community/x86_64/APKINDEX.tar.gz
#5 0.858 v3.11.13-8-gaf7d80ff31 [http://dl-cdn.alpinelinux.org/alpine/v3.11/main]
#5 0.858 v3.11.11-124-gf2729ece5a [http://dl-cdn.alpinelinux.org/alpine/v3.11/community]
#5 0.858 OK: 11280 distinct packages available
#6 0.052 /bin/sh: some_error: not found
5:16PM FTL failed to execute plan: task failed: actions.hello._build: process "/bin/sh -c some_error" did not complete successfully: exit code: 127

cc @aluzzardi @helderco

Fixes #2044
Related #613

@marcosnils marked this pull request as ready for review on April 14, 2022 20:19
Creates a generic "system" group in the tty output which captures
buildkit events that report a non-dagger vertex name. This happens
currently when using core.#Dockerfile actions since Dagger delegates the
LLB generation to buildkit through its frontend and we don't get
meaningful events that we can correlate from Dagger's side.

Signed-off-by: Marcos Lilljedahl <marcosnils@gmail.com>
@marcosnils changed the title from "wip: initial support to improve logging" to "Improve tty error logging when buildkit vertex is unknown" on Apr 14, 2022
// setup logger here
_, isDockerfileTask := t.(*task.DockerfileTask)
if isDockerfileTask {
	// Dockerfile tasks also log under the generic "system" group so that
	// buildkit frontend events are captured by the tty output.
	fn(log.With().Str("task", "system").Logger())
}
Member:
I don't get this part -- if I understand correctly, for Dockerfile tasks, we're logging things twice -- once with task = "" and once again with task = "system", then on the tty/plain writers we rewrite "" -> "system".

I don't understand why this is needed?

Contributor Author:

At this point we're not logging with task = "" since we're inside the cueflow taskFunc and we have the correct Dagger task names.

We're logging twice here because otherwise the task action name won't be printed to the tty. So, for Dockerfile actions, at this point we need to log twice so both the Dagger Dockerfile action name and the system logger get recorded by the tty output.
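To make the two entries concrete, here is a tiny standalone sketch using zerolog directly (the task values are examples only; this is not the actual Dagger code):

package main

import (
	"os"

	"github.com/rs/zerolog"
)

func main() {
	base := zerolog.New(os.Stderr)

	// One entry tagged with the Dagger action name, so the action itself
	// shows up (and changes state) in the tty output...
	base.With().Str("task", "actions.hello._build").Logger().Info().Msg("computing")

	// ...and one tagged "system", so the shared group that collects raw
	// buildkit frontend output gets updated as well.
	base.With().Str("task", "system").Logger().Info().Msg("computing")
}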

Member:

Sorry, I'm slow today, still not following :)

> We're logging twice here because otherwise the task action name won't be printed to the tty

The second log call doesn't contain the task name, right? I don't see why it would change what's printed.

Omitting the debug logs for simplicity, this is going to produce:

{"task": "actions.foo", "message": "computing"}  # already had this
{"task": "system", "message": "computing"} # new entry with this PR
{"task": "actions.foo", "message": "completed"} # already had this
{"task": "system", "message": "completed"} # new entry with this PR

I don't understand the goal of the system entry since we can't rely on computing/completed etc (as you can have a bunch of Dockerfiles running in parallel).

Contributor Author:

> Sorry, I'm slow today, still not following :)

np haha! It's likely that I'm missing something here as well.

> The second log call doesn't contain the task name, right? I don't see why it would change what's printed.

Correct, but we need the second log so the system group state is updated and handled by the tty output here: https://github.com/dagger/dagger/blob/main/cmd/dagger/logger/tty.go#L262

> I don't understand the goal of the system entry since we can't rely on computing/completed etc (as you can have a bunch of Dockerfiles running in parallel).

You're seeing this in the plain output, but the tty output needs this state transition so it can correctly display whether something failed or not. If you have multiple DAG nodes with Dockerfile actions, system will have the state of the last execution.

Member:

Got it!

I think there are some edge cases as the "last" execution is fuzzy: you can have N Dockerfiles running in parallel, so the state changes will not be linear (#1 computing / #2 computing / #2 completed / #1 completed) and the tty will "flip" between computing/completed for the same docker build.

However, I don't know if this is really a big problem in practice, and since this PR is already a giant improvement over what we have, I vote we merge it as is and see how it goes :)
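For illustration, here is a minimal sketch of the "last event wins" behaviour being discussed (type and function names are assumptions, not the actual tty.go code):

package main

import "fmt"

// State mirrors the per-group status shown in the tty output.
type State string

const (
	Computing State = "computing"
	Completed State = "completed"
	Failed    State = "failed"
)

// groups indexes output groups ("actions.foo", "system", ...) by the "task"
// field of each log event.
var groups = map[string]State{}

// update applies a log event to its group; the last event simply wins, so
// parallel Dockerfile builds sharing the "system" group can flip its state
// back and forth (the follow-up commit below adds a transition check for this).
func update(task string, s State) {
	groups[task] = s
}

func main() {
	update("system", Failed)      // Dockerfile #1 fails...
	update("system", Computing)   // ...while Dockerfile #2 is still running
	fmt.Println(groups["system"]) // prints "computing": the failure is hidden
}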

Contributor Author:

Let's not merge this yet. I want to explore something to see if it can be further improved before merging.

@marcosnils (Contributor Author) commented Apr 18, 2022

@aluzzardi added a new commit here that improves the way we're handling state transitions for the new system output.

One thing I've noticed and discussed with @helderco is that if you have a Dagger plan which contains parallel tasks and one of them fails, the others get cancelled. Here's an example:

--- test.cue ---

package main

import (
	"dagger.io/dagger"
	"universe.dagger.io/bash"
	"universe.dagger.io/alpine"
)

dagger.#Plan & {
	actions: {
		_image: alpine.#Build & {
			packages: {
				bash: _
				npm:  _
			}
		}

		// Hello World Dagger
		hello: {
			hello1: bash.#Run & {
				input: _image.output
				script: contents: #"fail"#
			}

			hello2: bash.#Run & {
				input: _image.output
				script: contents: #"sleep 5;"#
			}
		}

	}
}

Here's the output that I get when running dagger do hello:

[screenshot of tty output]

In the case of Dockerfile actions which share the same system group, my second commit in this PR prevents the cancelled action state from overriding the failed one, so errors are shown correctly.

Below is an example with the Dockerfile scenario:

-- Dockerfile.fail --

FROM node:10-alpine

RUN fails

-- Dockerfile.long --

FROM node:10-alpine

RUN sleep 5

-- test.cue --

package main

import (
	"dagger.io/dagger"
	"universe.dagger.io/docker"
)

dagger.#Plan & {
	client: filesystem: ".": read: contents: dagger.#FS
	actions: {

		// Hello World Dagger
		hello: {
			hello1: docker.#Dockerfile & {
				source: client.filesystem.".".read.contents
				dockerfile: path: "./Dockerfile.fail"
			}
			hello2: docker.#Dockerfile & {
				source: client.filesystem.".".read.contents
				dockerfile: path: "./Dockerfile.long"
			}

		}

	}
}

Before the 2nd commit in this PR, the output was:

[screenshot of tty output]

^ no error output is shown here since "cancelled" overrides the "failed" state for the "system" group

After the 2nd commit, the output is:

[screenshot of tty output]

^ error is properly shown upon cancellation.

Changes `task.State` from a string to an iota-based enum to better handle
state transitions. It also adds a new method on `State` so the logic for
whether a state can transition to another lives in a single place, since
it doesn't make sense to transition from failed to completed, or from
cancelled to computing, and so on.

Signed-off-by: Marcos Lilljedahl <marcosnils@gmail.com>
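As a rough illustration of that commit (the names and exact transition rules below are assumptions, not the actual Dagger code), an iota-based state with a single transition check might look like this:

package task

// State is an iota-based enum instead of a plain string, so transitions can
// be compared and guarded in one place.
type State int

const (
	StateComputing State = iota
	StateCompleted
	StateCancelled
	StateFailed
)

// CanTransition reports whether moving from s to next makes sense. A failed
// state is final, so a shared "system" group that already failed cannot be
// downgraded to cancelled or completed by a sibling Dockerfile build.
func (s State) CanTransition(next State) bool {
	switch s {
	case StateFailed:
		return false
	case StateCancelled, StateCompleted:
		return next == StateFailed
	default: // StateComputing
		return true
	}
}

A writer would then only apply an incoming state when CanTransition returns true, which is what keeps the earlier failure visible in the screenshots above.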
@aluzzardi (Member):

LGTM!

Test failure is unrelated -- restarted CI

@marcosnils merged commit 27d8784 into dagger:main on Apr 19, 2022
@marcosnils deleted the feat/log_dockerfile_actions branch on April 19, 2022 00:50
Successfully merging this pull request may close these issues:

🐞 dagger do doesn't output stderr