Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Truncate BUILDKITE_MESSAGE to 64 KiB #1307

Merged
merged 2 commits into from
Sep 24, 2020
Merged

Truncate BUILDKITE_MESSAGE to 64 KiB #1307

merged 2 commits into from
Sep 24, 2020

Conversation

pda
Copy link
Member

@pda pda commented Sep 24, 2020

Truncate the BUILDKITE_MESSAGE commit message passed to the bootstrap process to 64 KiB so that large commit messages won't stop execution.

Follow-up to #1302, which enabled large commit messages to be pushed from the agent to Buildkite via buildkite-agent meta-data set buildkite:git:commit, but which uncovered this other bottleneck for message size.

The 64 KiB figure was chosen somewhat arbitrarily; but it is…

  • half the per-env limit of modern linux
  • quarter the shared limit of macOS
  • a tiny fraction of the modern Windows limit
  • double the limit of Windows circa 2010 (if this becomes a problem, we can reduce it for that platform)

Here's some ENV limits I've researched/discovered (see notes below):

// macOS 10.15:    256 KiB shared by environment & argv
// Linux 4.19:     128 KiB per k=v env
// Windows 10:  16,384 KiB shared
// POSIX:           (4 KiB minimum shared)

A truncation marker is added to the end of the message like this:

A long commit message

Lots of text here
... even more text here
... even more text here
... even mo[value truncated 131071 -> 65517 bytes]

And also logged to the agent:

2020-09-24 16:17:18 WARN   localhost BUILDKITE_MESSAGE value truncated 131071 -> 65517 bytes

I considered finding and re-embedding any [skip ci] / [ci skip] markers, but then realised that would be silly; they prevent the build getting this far.


Investigation into auto-discovering environment limit:

#include <stdio.h>
#include <unistd.h>
#include <limits.h>
int main(int argc, char ** argv) {
#ifdef ARG_MAX
  printf("            ARG_MAX : %d\n", ARG_MAX);
#else
  printf("            ARG_MAX : (undefined)\n");
#endif
  printf("sysconf(_SC_ARG_MAX): %ld\n", sysconf(_SC_ARG_MAX));
}
$ uname -a
Darwin paulbookpro.local 19.6.0 Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64 x86_64

$ gcc -o testlimit testlimit.c && ./testlimit
            ARG_MAX : 262144
sysconf(_SC_ARG_MAX): 262144
$ uname -a
Linux localhost 4.4.0-173-generic #203-Ubuntu SMP Wed Jan 15 02:55:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ gcc -o testlimit testlimit.c && ./testlimit
            ARG_MAX : (undefined)
sysconf(_SC_ARG_MAX): 2097152

But this doesn't work for Windows.

A program to brute-force bisect to discover largest permitted env size:

package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

const (
	min  = 1024
	max  = 640 * 1024 // 640K should be enough for anybody
	name = "X"
)

func main() {
	if len(os.Args) == 3 && os.Args[1] == "assert-size" {
		assertSize(os.Args[2])
	} else {
		fmt.Println(findEnvLimit())
	}
}

func findEnvLimit() int {
	good := min
	bad := max
	for bad-good > 1 {
		mid := good + (bad-good)/2
		if err := attempt(mid); err == nil {
			good = mid
			debugf("size=%d ok\n", mid)
		} else {
			bad = mid
			debugf("size=%d %v\n", mid, err)
		}
	}
	return good
}

func attempt(size int) error {
	return (&exec.Cmd{
		Path:   os.Args[0], // run self
		Args:   []string{os.Args[0], "assert-size", strconv.FormatInt(int64(size), 10)},
		Env:    []string{name + "=" + strings.Repeat(".", size)},
		Stdout: os.Stdout,
		Stderr: os.Stderr,
	}).Run()
}

func assertSize(arg string) {
	wantLen, err := strconv.ParseInt(arg, 10, 64)
	if err != nil {
		panic(err)
	}
	value, ok := os.LookupEnv(name)
	if !ok {
		fmt.Fprintln(os.Stderr, name, "not in environment")
		os.Exit(1)
	}
	if len(value) != int(wantLen) {
		fmt.Fprintf(os.Stderr, "expected %d bytes, got %d\n", wantLen, len(value))
		os.Exit(1)
	}
}

func debugf(format string, a ...interface{}) {
	fmt.Fprintf(os.Stderr, format, a...)
}

Linux:

root@23f42323a677:/envlimit# go build -o envlimit-linux && ./envlimit-linux
size=328192 fork/exec ./envlimit-linux: argument list too long
size=164608 fork/exec ./envlimit-linux: argument list too long
size=82816 ok
size=123712 ok
size=144160 fork/exec ./envlimit-linux: argument list too long
size=133936 fork/exec ./envlimit-linux: argument list too long
size=128824 ok
size=131380 fork/exec ./envlimit-linux: argument list too long
size=130102 ok
size=130741 ok
size=131060 ok
size=131220 fork/exec ./envlimit-linux: argument list too long
size=131140 fork/exec ./envlimit-linux: argument list too long
size=131100 fork/exec ./envlimit-linux: argument list too long
size=131080 fork/exec ./envlimit-linux: argument list too long
size=131070 fork/exec ./envlimit-linux: argument list too long
size=131065 ok
size=131067 ok
size=131068 ok
size=131069 ok
131069

macOS:

size=328192 fork/exec ./envlimit: argument list too long
size=164608 ok
size=246400 ok
size=287296 fork/exec ./envlimit: argument list too long
size=266848 fork/exec ./envlimit: argument list too long
size=256624 ok
size=261736 ok
size=264292 fork/exec ./envlimit: argument list too long
size=263014 fork/exec ./envlimit: argument list too long
size=262375 fork/exec ./envlimit: argument list too long
size=262055 ok
size=262215 fork/exec ./envlimit: argument list too long
size=262135 fork/exec ./envlimit: argument list too long
size=262095 fork/exec ./envlimit: argument list too long
size=262075 fork/exec ./envlimit: argument list too long
size=262065 fork/exec ./envlimit: argument list too long
size=262060 ok
size=262062 ok
size=262063 ok
size=262064 fork/exec ./envlimit: argument list too long
262063

Windows: 16,777,190 (16 MiB!)

@pda
Copy link
Member Author

pda commented Sep 24, 2020

Should help #1269

@pda pda requested a review from sj26 September 24, 2020 07:19
Otherwise a large commit message can cause complete failure:

    fork/exec ./buildkite-agent: argument list too long

Environment size is included in "argument list" limit, which varies by
system:

    macOS 10.15:    256 KiB shared by environment & argv
    Linux 4.19:     128 KiB per k=v env
    Windows 10:  16,384 KiB shared
    POSIX:            4 KiB minimum shared
Copy link
Contributor

@yob yob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great research and write up, I learnt a bunch!

Is this only truncating BUILDKITE_MESSAGE, and if so... is there any value in truncating all env values we pass to bootstrap? Most are very unlikely to exceed 64kb (in some cases it's impossible), but the code might be slightly simpler and it might just catch another edge case.

@pda
Copy link
Member Author

pda commented Sep 24, 2020

is there any value in truncating all env values we pass to bootstrap

I wondered that…

Most are very unlikely to exceed 64kb

That was half the reason I didn't.
The other half was that, for the general case, I wasn't sure if truncating was better than exploding.

Also also, we'd probably want a conditional to prevent the [value truncated 131071 -> 65517 bytes] thing being appended to truncated vars that maybe don't want random text in them… and that felt like extra complexity / code paths to handle a scenario I wasn't sure existed :)

@yob
Copy link
Contributor

yob commented Sep 24, 2020

makes sense 👍

Copy link
Contributor

@yob yob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a commit with a message of ~200Kb, and tried to build it.

With an agent built from master, it failed as you described.

With an agent built from this branch, it ran successfully. Nice 👍

It's interesting that BUILDKITE_MESSAGE looks un-truncated through the UI. I can see the truncation in the agent log output, but that won't be visible to end users.

2020-09-24 20:56:27 DEBUG tak-1 [JobRunner] Created env file: /tmp/job-env-69dd93df-a91a-484a-a1b5-4a02dc184458300191490
2020-09-24 20:56:27 WARN tak-1 BUILDKITE_MESSAGE value truncated 204999 -> 65517 bytes
2020-09-24 20:56:27 INFO tak-1 Starting job 69dd93df-a91a-484a-a1b5-4a02dc184458

This is such an edge case I don't think that's a huge problem, but I thought it worth mentioning.

@pda
Copy link
Member Author

pda commented Sep 24, 2020

Yeah — it makes sense, but could be surprising. It remains non-truncated everywhere except when it's being passed to the bootstrap.

It's a shame that the pre-bootstrap phase of the agent running a job can't (AFAIK) easily send log output to the UI. I briefly considered passing the truncation info via yet another env var to the bootstrap, like we do for e.g. BUILDKITE_IGNORED:

agent/agent/job_runner.go

Lines 416 to 455 in e99cae5

// Certain env can only be set by agent configuration.
// We show the user a warning in the bootstrap if they use any of these at a job level.
var protectedEnv = []string{
`BUILDKITE_AGENT_ENDPOINT`,
`BUILDKITE_AGENT_ACCESS_TOKEN`,
`BUILDKITE_AGENT_DEBUG`,
`BUILDKITE_AGENT_PID`,
`BUILDKITE_BIN_PATH`,
`BUILDKITE_CONFIG_PATH`,
`BUILDKITE_BUILD_PATH`,
`BUILDKITE_GIT_MIRRORS_PATH`,
`BUILDKITE_HOOKS_PATH`,
`BUILDKITE_PLUGINS_PATH`,
`BUILDKITE_SSH_KEYSCAN`,
`BUILDKITE_GIT_SUBMODULES`,
`BUILDKITE_COMMAND_EVAL`,
`BUILDKITE_PLUGINS_ENABLED`,
`BUILDKITE_LOCAL_HOOKS_ENABLED`,
`BUILDKITE_GIT_CLONE_FLAGS`,
`BUILDKITE_GIT_FETCH_FLAGS`,
`BUILDKITE_GIT_CLONE_MIRROR_FLAGS`,
`BUILDKITE_GIT_MIRRORS_LOCK_TIMEOUT`,
`BUILDKITE_GIT_CLEAN_FLAGS`,
`BUILDKITE_SHELL`,
}
var ignoredEnv []string
// Check if the user has defined any protected env
for _, p := range protectedEnv {
if _, exists := r.job.Env[p]; exists {
ignoredEnv = append(ignoredEnv, p)
}
}
// Set BUILDKITE_IGNORED_ENV so the bootstrap can show warnings
if len(ignoredEnv) > 0 {
env["BUILDKITE_IGNORED_ENV"] = strings.Join(ignoredEnv, ",")
}

// The job runner sets BUILDKITE_IGNORED_ENV with any keys that were ignored
// or overwritten. This shows a warning to the user so they don't get confused
// when their environment changes don't seem to do anything
if ignored, exists := b.shell.Env.Get("BUILDKITE_IGNORED_ENV"); exists {
b.shell.Headerf("Detected protected environment variables")
b.shell.Commentf("Your pipeline environment has protected environment variables set. " +
"These can only be set via hooks, plugins or the agent configuration.")
for _, env := range strings.Split(ignored, ",") {
b.shell.Warningf("Ignored %s", env)
}
b.shell.Printf("^^^ +++")
}

But I really don't want to add more environment vars, especially having just witnessed the shared env limits of some platforms (macOS being the most worrying with 256 KiB shared by all environment & argv).

We could also truncate it server-side on the way in, before storing it. That would also relieve us of having to store enormous values. But I figured it'd be nice to instead truncate to a per-system-discovered env limit in the agent itself. That was before I found how hard it is to determine the maximum reasonable size for an environment variable.

I think even if we eventually do that (server-side truncation), this patch should remain in the agent.

@pda
Copy link
Member Author

pda commented Sep 24, 2020

BTW in case you didn't notice it, the “Buildkite environment variables” collapsed part of your build log does include a dump of the post-truncation env, including this line:

BUILDKITE_MESSAGE=aaaa…aaaa[value truncated 204999 -> 65517 bytes]

But, that's pretty hidden, doesn't directly indicate where it was truncated, and probably only appears if you're running the agent in debug mode? But 🤷‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants