build: Harden flaky Aeron tests in CI #32242

Merged
merged 2 commits into from
Nov 28, 2023

patriknw
Member

  • increase /dev/shm and use it (the default location)
  • use the default term buffer size
  • increase CPU requests; this shouldn't matter, but it corresponds to what we want to use: 2 pods per node
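
The manifest change itself is not shown in this conversation. As a minimal sketch of one common way to enlarge /dev/shm in a pod, the function below prints a pod-spec fragment that mounts a memory-backed `emptyDir` at `/dev/shm`. The volume name `dshm`, the container name, and the `shm_volume_fragment` helper are hypothetical and not taken from this PR; how `sizeLimit` is enforced may depend on the Kubernetes version.

```shell
#!/bin/sh
# Print a pod-spec fragment that backs /dev/shm with a memory-based emptyDir.
# $1 is a Kubernetes quantity such as 1Gi (defaults to 1Gi).
# Assumption: the names "dshm" and "multi-node-test" are placeholders only.
shm_volume_fragment() {
  size_limit=${1:-1Gi}
  cat <<EOF
spec:
  containers:
    - name: multi-node-test
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory
        sizeLimit: ${size_limit}
EOF
}

# Emit the fragment with the 1G size mentioned in this PR.
shm_volume_fragment 1Gi
```

A memory-backed volume counts against the container's memory limit, which may be why a larger memory request goes together with a larger /dev/shm.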

This looks very promising. I tried it in a GKE cluster and verified the result with df -h: /dev/shm was 64 MB and is now 1G.
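
The manual df -h check could also be turned into a guard that runs before the tests start. The sketch below is one possible way to do that; the helper names `fs_size_mib` and `require_min_mib` and the 1024 MiB threshold are assumptions for illustration, not part of this PR.

```shell
#!/bin/sh
# Print the total size, in whole MiB, of the filesystem holding the path in $1.
# `df -Pk` is the POSIX portable format: one header line, then one line per
# filesystem with the size in 1024-byte blocks in column 2.
fs_size_mib() {
  df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print int($2 / 1024) }'
}

# Succeed only if the filesystem holding $1 is at least $2 MiB large.
# A missing path yields an empty size and therefore a failure.
require_min_mib() {
  size=$(fs_size_mib "$1")
  [ -n "$size" ] && [ "$size" -ge "$2" ]
}

# Example: check for the 1G size mentioned above (threshold is an assumption).
if require_min_mib /dev/shm 1024; then
  echo "/dev/shm is large enough"
else
  echo "/dev/shm is too small or missing"
fi
```

Failing fast here would surface a too-small /dev/shm as a clear setup error rather than as delayed heartbeats later in the run.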

No more "Scheduled sending of heartbeat was delayed".

This wasn't possible when we last tried, in #30601.

@@ -215,8 +215,6 @@ jobs:
-Dakka.test.tags.exclude=gh-exclude,gh-exclude-aeron,timing \
-Dakka.test.multi-in-test=false \
-Dakka.cluster.assert=on \
-Daeron.dir=/opt/volumes/media-driver \
-Daeron.term.buffer.length=33554432 \
clean ${{ matrix.command }}
Member Author

This job does not run in Kubernetes. It might have the same problem with a too-small /dev/shm. Let me try...

Member Author

Plenty of space, no problem.

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   62G   22G  74% /
tmpfs           7.9G  172K  7.9G   1% /dev/shm
tmpfs           3.2G  1.1M  3.2G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sdb15      105M  6.1M   99M   6% /boot/efi
/dev/sda1        63G  4.1G   56G   7% /mnt
tmpfs           1.6G   12K  1.6G   1% /run/user/1001

@@ -147,7 +144,8 @@ jobs:
gcloud components install gke-gcloud-auth-plugin
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-c
./kubernetes/create-cluster-gke.sh "akka-artery-aeron-cluster-${GITHUB_RUN_ID}"
gcloud container clusters get-credentials akka-artery-aeron-cluster-test --zone us-central1-c --project akka-team
# ./kubernetes/create-cluster-gke.sh "akka-artery-aeron-cluster-test"
Member

Is this intentional, not calling the script to create the cluster?

Member Author

leftover from my testing, thanks

* increase /dev/shm and use that (by default)
* use default term buffer size
* increase cpu requests, shouldn't matter but corresponds
  to what we want to use, 2 pods per node
Contributor

@pvlugter pvlugter left a comment

LGTM

* more memory request
* separate Aeron run in another workflow to make
  such test failures more clear
@patriknw
Member Author

There was an error: "insufficient usable storage for new log of ". I have increased it. I don't know whether usage accumulates when running all tests. It's supposed to delete the files on shutdown.
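
To get a feel for why storage can run out, here is a rough back-of-the-envelope estimate of how many Aeron log files fit in /dev/shm. It assumes each log consists of three term buffers plus a 4096-byte metadata section, and that the default term length is 16 MiB; both are assumptions about Aeron's layout rather than facts stated in this PR, and files may be sparse, so real usage can differ. The 32 MiB figure is the `aeron.term.buffer.length=33554432` value from the diff above.

```shell
#!/bin/sh
# Rough size in bytes of one Aeron log file for the term length given in $1.
# Assumption: three term partitions plus a 4096-byte metadata section.
log_bytes() {
  echo $(( 3 * $1 + 4096 ))
}

# How many such logs fit into a filesystem of $1 bytes, for term length $2.
logs_that_fit() {
  echo $(( $1 / $(log_bytes "$2") ))
}

one_gib=$(( 1024 * 1024 * 1024 ))
sixty_four_mib=$(( 64 * 1024 * 1024 ))

echo "32 MiB terms in 1 GiB:  $(logs_that_fit "$one_gib" 33554432)"        # 10
echo "16 MiB terms in 1 GiB:  $(logs_that_fit "$one_gib" 16777216)"        # 21
echo "16 MiB terms in 64 MiB: $(logs_that_fit "$sixty_four_mib" 16777216)" # 1
```

Under these assumptions, a 64 MB /dev/shm holds a single log, so any test that creates more than one publication or image at a time would exhaust it quickly.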

@patriknw
Member Author

I moved the Aeron run into a separate workflow. I hope it shows up so that I can trigger a manual run once this is merged?

@patriknw patriknw merged commit 95d7210 into main Nov 28, 2023
5 checks passed
@patriknw patriknw deleted the wip-dev-shm-patriknw branch November 28, 2023 07:31
@patriknw patriknw added this to the 2.9.1 milestone Nov 28, 2023
@patriknw
Member Author

a successful run in https://github.com/akka/akka/actions/runs/7015560435

He-Pin pushed a commit to He-Pin/akka that referenced this pull request Jan 7, 2024
* increase /dev/shm and use that (by default)
* use default term buffer size
* increase cpu requests, shouldn't matter but corresponds
  to what we want to use, 2 pods per node
* more memory request
* separate Aeron run in another workflow to make
  such test failures more clear