
Improve tty error logging when buildkit vertex is unknown #2188

Merged: 2 commits merged into dagger:main on Apr 19, 2022

Conversation

@marcosnils (Contributor) commented Apr 13, 2022

Creates a generic "system" group in the tty output, similar to the plain
output, that captures buildkit events which report a non-dagger vertex
name. This currently happens when using core.#Dockerfile actions, since
Dagger delegates LLB generation to buildkit through its frontend and we
don't get meaningful events that we can correlate on Dagger's side.
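As an illustration (a minimal sketch with assumed type and field names, not the actual Dagger logger code), the grouping boils down to falling back to a fixed group name whenever an event cannot be correlated to a Dagger task:

package logger

const systemGroup = "system"

// event is a simplified stand-in for a parsed buildkit/Dagger log entry
// (hypothetical type used only for this sketch).
type event struct {
	Task    string // Dagger task path, e.g. "actions.hello._build"; empty for unknown buildkit vertices
	Message string
}

// groupFor picks the tty group an event is rendered under: events that can't
// be correlated to a Dagger task fall back to the generic "system" group.
func groupFor(e event) string {
	if e.Task == "" {
		return systemGroup
	}
	return e.Task
}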

Given the following Dockerfile:

FROM node:10-alpine
RUN apk update
RUN some_error

Old output:

dagger do hello
[✔] client.filesystem.".".read                                                                         0.0s
[✗] actions.hello                                                                                      6.9s
                                                                                                       0.8s
5:18PM FTL failed to execute plan: task failed: actions.hello._build: process "/bin/sh -c some_error" did not complete successfully: exit code: 127

New output:

dagger do hello
[✔] client.filesystem.".".read                                                                         0.1s
[✗] actions.hello                                                                                      1.8s
#5 0.083 fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/main/x86_64/APKINDEX.tar.gz
#5 0.605 fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/community/x86_64/APKINDEX.tar.gz
#5 0.858 v3.11.13-8-gaf7d80ff31 [http://dl-cdn.alpinelinux.org/alpine/v3.11/main]
#5 0.858 v3.11.11-124-gf2729ece5a [http://dl-cdn.alpinelinux.org/alpine/v3.11/community]
#5 0.858 OK: 11280 distinct packages available
#6 0.052 /bin/sh: some_error: not found
5:16PM FTL failed to execute plan: task failed: actions.hello._build: process "/bin/sh -c some_error" did not complete successfully: exit code: 127

cc @aluzzardi @helderco

Fixes #2044
Related #613

@marcosnils marked this pull request as ready for review on April 14, 2022 20:19
Creates a generic "system" group in the tty output which captures
buildkit events that report a non-dagger vertex name. This happens
currently when using core.#Dockerfile actions since Dagger delegates the
LLB generation to buildkit through its frontend and we don't get
meaningful events that we can correlate from Dagger's side.

Signed-off-by: Marcos Lilljedahl <marcosnils@gmail.com>
@marcosnils changed the title from "wip: initial support to improve logging" to "Improve tty error logging when buildkit vertex is unknown" on Apr 14, 2022
// setup logger here
_, isDockerfileTask := t.(*task.DockerfileTask)
if isDockerfileTask {
	// Dockerfile tasks also log under the generic "system" group so that
	// buildkit frontend events are captured by the tty output.
	fn(log.With().Str("task", "system").Logger())
}
Member:
I don't get this part -- if I understand correctly, for Dockerfile tasks, we're logging things twice -- once with task = "" and once again with task = "system", then on the tty/plain writers we rewrite "" -> "system".

I don't understand why this is needed?

Contributor Author:

At this point we're not logging with task = "" since we're inside the cueflow taskFunc and we have the correct Dagger task names.

We're logging twice here because otherwise the task action name won't be printed to the tty. So, for Dockerfile actions, at this point we need to log twice so both the Dagger Dockerfile action name and the system logger get recorded by the tty output.
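To make the two entries concrete, here is a tiny standalone sketch using zerolog directly (the task values are examples only; this is not the actual Dagger code):

package main

import (
	"os"

	"github.com/rs/zerolog"
)

func main() {
	base := zerolog.New(os.Stderr)

	// One entry tagged with the Dagger action name, so the action itself
	// shows up (and changes state) in the tty output...
	base.With().Str("task", "actions.hello._build").Logger().Info().Msg("computing")

	// ...and one tagged "system", so the shared group that collects raw
	// buildkit frontend output gets updated as well.
	base.With().Str("task", "system").Logger().Info().Msg("computing")
}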

Member:

Sorry, I'm slow today, still not following :)

> We're logging twice here because otherwise the task action name won't be printed to the tty

The second log call doesn't contain the task name, right? I don't see why it would change what's printed.

Omitting the debug logs for simplicity, this is going to produce:

{"task": "actions.foo", "message": "computing"}  # already had this
{"task": "system", "message": "computing"} # new entry with this PR
{"task": "actions.foo", "message": "completed"} # already had this
{"task": "system", "message": "completed"} # new entry with this PR

I don't understand the goal of the system entry since we can't rely on computing/completed etc (as you can have a bunch of Dockerfiles running in parallel).

Contributor Author:

> Sorry, I'm slow today, still not following :)

np haha! It's likely that I'm missing something here as well.

> The second log call doesn't contain the task name, right? I don't see why it would change what's printed.

Correct, but we need the second log so the system group state is updated and handled by the tty output here: https://github.com/dagger/dagger/blob/main/cmd/dagger/logger/tty.go#L262

> I don't understand the goal of the system entry since we can't rely on computing/completed etc (as you can have a bunch of Dockerfiles running in parallel).

You're seeing this in the plain output, but the tty output needs this state transition so it can correctly display whether something failed or not. If you have multiple DAG nodes with Dockerfile actions, system will have the state of the last execution.

Member:

Got it!

I think there are some edge cases as the "last" execution is fuzzy: you can have N Dockerfiles running in parallel, so the state changes will not be linear (#1 computing / #2 computing / #2 completed / #1 completed) and the tty will "flip" between computing/completed for the same docker build.

However, I don't know if this is really a big problem in practice, and since this PR is already a giant improvement over what we have, I vote we merge it as is and see how it goes :)
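For illustration, here is a minimal sketch of the "last event wins" behaviour being discussed (type and function names are assumptions, not the actual tty.go code):

package main

import "fmt"

// State mirrors the per-group status shown in the tty output.
type State string

const (
	Computing State = "computing"
	Completed State = "completed"
	Failed    State = "failed"
)

// groups indexes output groups ("actions.foo", "system", ...) by the "task"
// field of each log event.
var groups = map[string]State{}

// update applies a log event to its group; the last event simply wins, so
// parallel Dockerfile builds sharing the "system" group can flip its state
// back and forth (the follow-up commit below adds a transition check for this).
func update(task string, s State) {
	groups[task] = s
}

func main() {
	update("system", Failed)      // Dockerfile #1 fails...
	update("system", Computing)   // ...while Dockerfile #2 is still running
	fmt.Println(groups["system"]) // prints "computing": the failure is hidden
}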

Contributor Author:

Let's not merge this yet. I want to explore something to see if it can be further improved before merging.

@marcosnils (Contributor Author) commented Apr 18, 2022

@aluzzardi added a new commit here that improves the way we're handling state transitions for the new system output.

One thing I've noticed and discussed with @helderco is that if you have a Dagger plan which contains parallel tasks and one of them fails, the others get cancelled. Here's an example:

--- test.cue ---

package main

import (
	"dagger.io/dagger"
	"universe.dagger.io/bash"
	"universe.dagger.io/alpine"
)

dagger.#Plan & {
	actions: {
		_image: alpine.#Build & {
			packages: {
				bash: _
				npm:  _
			}
		}

		// Hello World Dagger
		hello: {
			hello1: bash.#Run & {
				input: _image.output
				script: contents: #"fail"#
			}

			hello2: bash.#Run & {
				input: _image.output
				script: contents: #"sleep 5;"#
			}
		}

	}
}

Here's the output that I get when running dagger do hello:

[screenshot of tty output]

In the case of Dockerfile actions which share the same system group, my second commit in this PR prevents the cancelled action state from overriding the failed one, so errors are shown correctly.

Below is an example with the Dockerfile scenario:

-- Dockerfile.fail --

FROM node:10-alpine

RUN fails

-- Dockerfile.long --

FROM node:10-alpine

RUN sleep 5

-- test.cue --

package main

import (
	"dagger.io/dagger"
	"universe.dagger.io/docker"
)

dagger.#Plan & {
	client: filesystem: ".": read: contents: dagger.#FS
	actions: {

		// Hello World Dagger
		hello: {
			hello1: docker.#Dockerfile & {
				source: client.filesystem.".".read.contents
				dockerfile: path: "./Dockerfile.fail"
			}
			hello2: docker.#Dockerfile & {
				source: client.filesystem.".".read.contents
				dockerfile: path: "./Dockerfile.long"
			}

		}

	}
}

Before the 2nd commit in this PR, the output was:

[screenshot of tty output]

^ no error output is shown here since "cancelled" overrides the "failed" state for the "system" group

After the 2nd commit, the output is:

[screenshot of tty output]

^ error is properly shown upon cancellation.

Changes `task.State` from a string to an iota-based enum to better handle
state transitions. It also adds a new method on `State` so the logic for
whether a state can transition to another lives in a single place, since
it doesn't make sense to transition from failed to completed, or from
cancelled to computing, and so on.

Signed-off-by: Marcos Lilljedahl <marcosnils@gmail.com>
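As a rough illustration of that commit (the names and exact transition rules below are assumptions, not the actual Dagger code), an iota-based state with a single transition check might look like this:

package task

// State is an iota-based enum instead of a plain string, so transitions can
// be compared and guarded in one place.
type State int

const (
	StateComputing State = iota
	StateCompleted
	StateCancelled
	StateFailed
)

// CanTransition reports whether moving from s to next makes sense. A failed
// state is final, so a shared "system" group that already failed cannot be
// downgraded to cancelled or completed by a sibling Dockerfile build.
func (s State) CanTransition(next State) bool {
	switch s {
	case StateFailed:
		return false
	case StateCancelled, StateCompleted:
		return next == StateFailed
	default: // StateComputing
		return true
	}
}

A writer would then only apply an incoming state when CanTransition returns true, which is what keeps the earlier failure visible in the screenshots above.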
@aluzzardi (Member):

LGTM!

Test failure is unrelated -- restarted CI

@marcosnils merged commit 27d8784 into dagger:main on Apr 19, 2022
@marcosnils deleted the feat/log_dockerfile_actions branch on April 19, 2022 00:50
Successfully merging this pull request may close these issues:

🐞 dagger do doesn't output stderr