馃 dagger call --source=.:default test all is timing out locally - inconsistent behaviour #7339

Open
gerhard opened this issue May 9, 2024 · 0 comments
Labels
kind/bug Something isn't working

gerhard commented May 9, 2024

What is the issue?

This is an uber issue that we have been tackling in various places, including private Discord threads. It is meant to centralise & summarise the learnings. Expect heavy editing.

The command that we are running:

dagger call --debug --source=".:default" test all 2>&1 \
| ts \
| tee engine.test.all.txt

# ts is part of moreutils 👉 https://manpages.debian.org/testing/moreutils/ts.1.en.html

The root of the problem is inconsistent behaviour when running Engine tests locally:

  • Sometimes they complete within 15 minutes, sometimes they time out after 30 minutes;
  • Sometimes they succeed, and sometimes they fail. When they fail, it is usually at the 30-minute mark, because of the timeout.
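
To put numbers on this inconsistency, one option is to repeat the same command several times and record the exit code and wall-clock duration of each run. The sketch below is only an illustration: `time_runs` and `LOG_DIR` are hypothetical names, not part of Dagger, and each run's output goes to its own log file so that failed runs can be inspected afterwards.

```shell
# time_runs <count> <command> [args...]
# Hypothetical helper: runs the command <count> times and prints one line
# per run in the form "run=<n> exit=<code> seconds=<elapsed>".
# Output of each run is written to ${LOG_DIR:-.}/run-<n>.log.
# The body runs in a subshell so its variables do not leak to the caller.
time_runs() (
  count="$1"
  shift
  i=1
  while [ "$i" -le "$count" ]; do
    start=$(date +%s)
    "$@" >"${LOG_DIR:-.}/run-${i}.log" 2>&1
    rc=$?
    end=$(date +%s)
    echo "run=${i} exit=${rc} seconds=$((end - start))"
    i=$((i + 1))
  done
)
```

For example, `time_runs 5 dagger call --debug --source=".:default" test all` would produce five lines that can be compared across machines.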

So far, the following minimum system resources are known to make the tests pass more reliably (more is better):

  • 16 CPUs
  • 32GB RAM
  • NVMe disk

Note

The above config is the one on which @aluzzardi is able to get the tests to pass locally.
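
A quick way to compare a Linux host against the CPU and RAM minimums listed above is sketched below. The helper names are hypothetical, and it assumes `nproc` (coreutils) and `/proc/meminfo` are available, so it does not apply to macOS as written. Reading "32GB" as 32 GiB, and rounding `MemTotal` up to a whole GiB (the kernel reports slightly less than the installed RAM), are also assumptions. The disk type is not checked.

```shell
# cpu_count: number of CPUs available to this process (Linux, coreutils).
cpu_count() {
  nproc
}

# mem_gib: MemTotal from /proc/meminfo (reported in kB), rounded up to
# whole GiB so that a machine with 32GB installed reports 32.
mem_gib() {
  awk '/^MemTotal:/ { printf "%d\n", int(($2 + 1048575) / 1048576) }' /proc/meminfo
}

# meets_minimums <cpus> <mem_gib>
# Succeeds only if both values reach the minimums from this issue
# (16 CPUs, 32GB RAM).
meets_minimums() {
  [ "$1" -ge 16 ] && [ "$2" -ge 32 ]
}
```

On a Linux host this could be used as `meets_minimums "$(cpu_count)" "$(mem_gib)" || echo "below the known-good minimums"`.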

We have a bunch more details here, including system metrics: #7223 (comment)

Some of us - cc @samalba - had success with the following change:

diff --git a/ci/config.go b/ci/config.go
index 15636081a..a8155f729 100644
--- a/ci/config.go
+++ b/ci/config.go
@@ -47,6 +47,12 @@ insecure-entitlements = ["security.insecure"]
 [{{ $key }}]
 {{ index $.ConfigEntries $key }}
 {{ end -}}
+
+[worker.oci]
+gc = false
+gckeepstorage = "90%"
+[[worker.oci.gcpolicy]]
+keepBytes = "90%"
 `
 
 func generateEntrypoint(kvs []string) (*File, error) {

I applied the same change locally & couldn't see any difference.

The host that I ran this on is a clean Linux 6.8.0 (Pop!_OS 22.04) install with:

  • 16 CPUs + 32GB RAM + NVMe disk
  • Docker 26.1.2
  • Dagger 0.11.4

I was able to see the same inconsistent behaviour on this fresh local machine, as well as on a bunch of other local machines:

  1. M1 Max (10 CPUs) + 32GB + NVMe disk
  2. Ryzen 7 5800X (16 CPUs) + 64GB + NVMe disk
  3. Xeon W-2150B (20 CPUs) + 64GB + NVMe disk

The investigation continues...

@gerhard gerhard added the kind/bug Something isn't working label May 9, 2024
@gerhard gerhard changed the title from "🐞 path: ETOOBIG File name length exceeds the maximum supported 255 characters" to "dagger call --source=.:default test all is timing out locally - inconsistent behaviour" May 14, 2024