build: Harden flaky Aeron tests in CI #32242
@@ -215,8 +215,6 @@ jobs:
-Dakka.test.tags.exclude=gh-exclude,gh-exclude-aeron,timing \
-Dakka.test.multi-in-test=false \
-Dakka.cluster.assert=on \
-Daeron.dir=/opt/volumes/media-driver \
-Daeron.term.buffer.length=33554432 \
clean ${{ matrix.command }}
This job does not run in Kubernetes. It might have the same problem with a too-small /dev/shm. Let me try...
Plenty of space, no problem.
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   62G   22G  74% /
tmpfs           7.9G  172K  7.9G   1% /dev/shm
tmpfs           3.2G  1.1M  3.2G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sdb15      105M  6.1M   99M   6% /boot/efi
/dev/sda1        63G  4.1G   56G   7% /mnt
tmpfs           1.6G   12K  1.6G   1% /run/user/1001
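The same check can be scripted instead of read off `df` output. The following is only a minimal sketch: the function name `shm_report` and the `required_bytes` parameter are hypothetical and not part of this PR; it relies on Python's standard `shutil.disk_usage`.

```python
import shutil


def shm_report(path="/dev/shm", required_bytes=0):
    """Return (total, free, ok) for the filesystem that holds `path`.

    `ok` is True when the free space is at least `required_bytes`.
    Both the function name and the threshold parameter are illustrative.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.total, usage.free, usage.free >= required_bytes
```

Calling `shm_report("/dev/shm", 1024**3)` on a runner would report whether at least 1 GiB is free there; the path has to exist on the machine where it runs.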
.github/workflows/multi-node.yml
Outdated
@@ -147,7 +144,8 @@ jobs:
gcloud components install gke-gcloud-auth-plugin
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-c
./kubernetes/create-cluster-gke.sh "akka-artery-aeron-cluster-${GITHUB_RUN_ID}"
gcloud container clusters get-credentials akka-artery-aeron-cluster-test --zone us-central1-c --project akka-team
# ./kubernetes/create-cluster-gke.sh "akka-artery-aeron-cluster-test"
Is this intentional, not calling the script to create the cluster?
leftover from my testing, thanks
* increase /dev/shm and use that (by default)
* use default term buffer size
* increase cpu requests, shouldn't matter but corresponds to what we want to use, 2 pods per node
LGTM
* more memory request
* separate Aeron run in another workflow to make such test failures more clear
There was an error: "insufficient usable storage for new log of ". I have increased the memory request. I don't know if usage accumulates when running all tests? It's supposed to delete the files on shutdown.
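For a rough sense of how much storage one Aeron log needs, here is a back-of-the-envelope sketch. It assumes Aeron's log buffer layout of three term partitions plus a metadata section; the 4096-byte metadata length and the helper names are assumptions for illustration, not values taken from this PR.

```python
PARTITION_COUNT = 3  # an Aeron log buffer holds three term partitions


def log_length(term_length, metadata_length=4096):
    """Approximate bytes needed for one log (metadata size is an assumption)."""
    return PARTITION_COUNT * term_length + metadata_length


def logs_that_fit(storage_bytes, term_length, metadata_length=4096):
    """How many whole logs of this term length fit in the given storage."""
    return storage_bytes // log_length(term_length, metadata_length)
```

With the previously configured term buffer length of 33554432 bytes (32 MiB), one log needs roughly 96 MiB, so a 64 MB /dev/shm cannot hold even one; the term length chosen and the size of the directory's filesystem have to be considered together.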
I separated the Aeron run into its own workflow. I hope it shows up so I can trigger a manual run once this is merged?
A successful run: https://github.com/akka/akka/actions/runs/7015560435
This looks very promising. I have tried it in a GKE cluster and verified with `df -h`: /dev/shm was 64 MB and is now 1G. No more "Scheduled sending of heartbeat was delayed".
This wasn't possible when we tried last time (#30601).