Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SAVE ARTIFACT of large files on failure under FINALLY block is truncated (or missing under later versions of earthly) #2452

Open
darabos opened this issue Nov 25, 2022 · 11 comments · Fixed by #2458
Labels
type:bug Something isn't working

Comments

@darabos
Copy link

darabos commented Nov 25, 2022

I want to save a test report even on failure, so I'm trying to use --try from #587. If the artifact is not tiny, I see it getting cropped somehow! Check it out:

VERSION --try 0.6
FROM mambaorg/micromamba:jammy

works:
  RUN head -c 500k < /dev/zero > hello
  RUN ls -l hello
  TRY
    RUN true
  FINALLY
    SAVE ARTIFACT hello AS LOCAL hello
  END

buggy:
  RUN head -c 500k < /dev/zero > hello
  RUN ls -l hello
  TRY
    RUN false
  FINALLY
    SAVE ARTIFACT hello AS LOCAL hello
  END

Output:

$ rm -f hello; earthly --no-cache +works; ls -l hello
              +works | --> RUN head -c 500k < /dev/zero > hello
              +works | --> RUN ls -l hello
              +works | -rw-r--r-- 1 mambauser mambauser 512000 Nov 25 20:49 hello
              +works | --> RUN true
              +works | --> SAVE ARTIFACT hello +works/hello AS LOCAL hello
              output | --> exporting outputs
========================== 🌍 Earthly Build  ✅ SUCCESS ==========================
-rw-r--r-- 1 darabos darabos 512000 Apr 16  2020 hello
$ rm -f hello; earthly --no-cache +buggy; ls -l hello
              +buggy | --> RUN head -c 500k < /dev/zero > hello
              +buggy | --> RUN ls -l hello
              +buggy | -rw-r--r-- 1 mambauser mambauser 512000 Nov 25 20:45 hello
              +buggy | --> RUN false
              +buggy | ERROR Earthfile line 17:4
              +buggy |       The command
              +buggy |           RUN false
              +buggy |       did not complete successfully. Exit code 1
============================ ❌ FAILURE [2. Build 🔧] ============================
-rw-r--r-- 1 darabos darabos 53248 Nov 25 21:45 hello

It still manages to create hello on failure, but it's just 53,248 bytes! Where's the rest of my file?

For sizes up to 65,535 bytes it works. For sizes larger than that I get (SIZE % 65536) bytes.

Looks like an all-zero file is kind of a special case. With head -c 500k < /dev/random the file doesn't get created if it's longer than 65,535 bytes. And this is what I see in my real use case with an 8 MB zip file.

@darabos
Copy link
Author

darabos commented Nov 25, 2022

Looks like this is explained by debugger/common/common.go#L69:

	err := binary.Write(w, binary.LittleEndian, uint16(n))

The number of bytes is cast to uint16. I'll try to change it to send multiple packets.

alexcb added a commit that referenced this issue Nov 28, 2022
Fixes #2452

Signed-off-by: Alex Couture-Beil <alex@earthly.dev>
alexcb added a commit that referenced this issue Nov 28, 2022
Fixes #2452

Signed-off-by: Alex Couture-Beil <alex@earthly.dev>
alexcb added a commit that referenced this issue Nov 28, 2022
Fixes #2452

Signed-off-by: Alex Couture-Beil <alex@earthly.dev>
@alexcb alexcb closed this as completed in 95e729c Nov 28, 2022
@darabos
Copy link
Author

darabos commented Jan 25, 2023

It's a bit improved in 0.7-rc1, but I still don't get the whole file. Using the example above:

$ rm -f hello; earthly --no-cache +buggy; ls -l hello
              +buggy | --> RUN head -c 500k < /dev/zero > hello
              +buggy | --> RUN ls -l hello
              +buggy | -rw-r--r-- 1 mambauser mambauser 512000 Jan 25 11:06 hello
              +buggy | --> RUN false
              +buggy | ERROR Earthfile line 17:4
              +buggy |       The command
              +buggy |           RUN false
              +buggy |       did not complete successfully. Exit code 1
============================ ❌ FAILURE [2. Build 🔧] ============================
-rw-r--r-- 1 darabos darabos 327675 Jan 25 12:06 hello

That's 327,675 bytes out of 512,000, or 64% of the way.

@darabos
Copy link
Author

darabos commented Jan 26, 2023

Oh no! It's not deterministic! Retrying it a bit more I get 131,070 bytes, then 196,605, then 512,000 (yay!), 262,140, and then no file at all. After that I get no file at all until I delete the earthly-buildkit container. Then we're back to 196,605, 393,210, etc.

This behavior was discovered by my coworker @lacca0. (Thanks!)

Looks quite random. So I'm not sure it can be the fault of the wire protocol. Maybe the sender or received gets killed halfway?

Do you think it's something on our end? Or are you getting this behavior too? We're using the v0.7.0-rc1 binary and the Earthfile above. Though I changed 0.6 to 0.7.

@nelsam
Copy link
Contributor

nelsam commented Jan 26, 2023

I'm seeing the same behavior. 256k one time, 448k another, sometimes it just doesn't show up at all. 🤔

@nelsam nelsam reopened this Jan 26, 2023
@nelsam
Copy link
Contributor

nelsam commented Jan 26, 2023

I've tracked down part of the problem: in this defer statement, we are re-checking the same err value from os.Create - so that if statement is unreachable (the condition will always be false). When I fixed it, I started seeing ErrDataUnderflow, which causes earthly to remove the file.

@nelsam
Copy link
Contributor

nelsam commented Jan 26, 2023

cb41246 means that we won't ignore the underflow error any more, but all that means is that the file will be deleted if it didn't get fully written. It doesn't actually solve this issue.

@darabos
Copy link
Author

darabos commented Feb 2, 2023

Thanks for investigating this!

In the meantime I tried a workaround and it seems to work with 0.7-rc2.

test-with-try:
  TRY
    RUN my-test-code
  FINALLY
    SAVE ARTIFACT test-results.zip AS LOCAL test-results.zip
  END

test-with-workaround-part-1:
  RUN my-test-code || touch failed
  SAVE ARTIFACT test-results.zip

test-with-workaround-part-2:
  LOCALLY
  COPY +test-with-workaround-part-1/test-results.zip ./

test-with-workaround-part-3:
  BUILD +test-with-workaround-part-2
  FROM +test-with-workaround-part-1
  RUN [ ! -f failed ]

When I run earthly +test-with-workaround-part-3 it runs the tests (part 1), saves the results locally (part 2), and fails if the test failed (part 3). Is there anything wrong with this? Could TRY/FINALLY be syntactic sugar for doing the same? It could be simpler than rescuing the file through the socket with the debugger.

@alexcb alexcb added the type:bug Something isn't working label Mar 6, 2023
@alexcb alexcb changed the title Artifact saving in FINALLY is buggy SAVE ARTIFACT of large files on failure under FINALLY block is truncated (or missing under later versions of earthly) Mar 23, 2023
@chroder
Copy link

chroder commented Jul 27, 2023

My workaround has been to wrap the SAVE ARTIFACT in WAIT:

VERSION 0.7
FROM alpine:3.17

test:
  RUN head -c 500k < /dev/zero > hello
  RUN ls -l hello
  RUN false || touch failed
  WAIT
    SAVE ARTIFACT hello AS LOCAL hello
  END
  RUN [ ! -f failed ]

@glehmann
Copy link

glehmann commented Jan 19, 2024

@darabos workaround is working for me. Thanks!

@chroder workaround is not working for me, and in fact I'm not sure how it could, as the artifacts are not saved locally in case of failure anywhere in the target. Am I missing something?

@alexcb
Copy link
Contributor

alexcb commented Jan 19, 2024

@chroder workaround is not working for me, and in fact I'm not sure how it could, as the artifacts are not saved locally in case of failure anywhere in the target. Am I missing something?

The key to this work-around is the || touch failed which prevents any RUN commands from failing.

This is not bullet-proof since the entire runc container executing the RUN command could get killed (e.g. out of memory).

Another approach is to do RUN false; echo $? > exit_status, then later on check the contents of exit_status to see if it failed (i.e. non-zero exit code).

@glehmann
Copy link

yes, I'm personally using the echo $? > exit-code version 👍

I thought the outputs were generated only if no step has failed. In fact the documentation says:

If AS LOCAL ... is also specified, it additionally marks the artifact to be copied to the host at the location specified by , once the build is deemed as successful.

but indeed

VERSION 0.7

test:
	FROM alpine
	RUN echo lalala > foo
	WAIT
	    SAVE ARTIFACT foo AS LOCAL ./foo
	END
	RUN sleep 20
	RUN false

copies the file foo locally before running the sleep command, and even if the RUN false make the build fail.

I have to retry @chroder workaround on my real build — it should have worked

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Something isn't working
Projects
Status: Icebox
5 participants