Increase run-complete-tests parallelism to 4 which is the combinations count#35828
Increase run-complete-tests parallelism to 4 which is the combinations count#35828hussein-awala wants to merge 2 commits intoapache:mainfrom
run-complete-tests parallelism to 4 which is the combinations count#35828Conversation
|
Few comments: Let's see. Tried that before but it was introducing a lot of random failures (likely 4 k8s simply use too much memory). Likely we could think about switching a bit memory utilisation on our self-hosted runners (we allocate a lot of memory for tmpfs to do things more stable/faster - so whole docker installation runs using docker backed by tmpfs - not by "regular" filesystems on our self-hosted runners. Currently 85% of memory (of 64 GB) is used as tmpfs and only remaining 15% is used for regular memory of the system. Maybe some optimisations of images and size could allow us to decrease the % memory used for filesystem and then running 4 k8s will be fine ? It's difficult to play with however because that setting is per image and we will have to switch and test (ideally during weekend/hours when no other committers do stuf). https://github.com/apache/airflow-ci-infra/blob/main/github-runner-ami/packer/files/mounts_setup.sh One more comment - doing it this way is good for experimenting, but not good for merging. Hardcoding parallelism should not happen - because when it runs on self-hosted runners it's fine to run 3 but for public runners even 2 is too much - so we have special parallelism for k8s tests that has a bit weird formula to fine-tune behaviour of automated parallelism calculation So eventually if we want to change parallelism - we will have to change default calculation here, or have a conditional PARALLELISM setting in CI depending on type of worker (I'd very much prefer the default calculation as conditionals in GH aations CI tend to look strange and be prone to errors. |
Yes, it's just to test, then we can discuss the implementation if we find some benefits. |
And yeah. I would very much love to optimse these tests - they are now on critical path after all the other improvements :) |
|
Yeah... same case - a lot of "connection reset by peer" - which likely means killed by lack of memory. |
Currently, the test runs with 3 parallel processes, but there are 4 tests. The last test starts when the 3 other tests finish, which leads to resources and time wasting. Running the 4 tests in parallel will help to reduce the overall time.