Increase amount of memory available for tests in self-hosted runners ?? #63

Closed
potiuk wants to merge 1 commit into main from increase-memory-available-for-tests

@potiuk potiuk commented Nov 27, 2023

Our self-hosted runners allocate a lot of memory to /var/lib/docker, which is tmpfs-mounted storage for the Docker engine.

This immensely speeds up any Docker-related operations, such as building and creating images and running and deploying Kubernetes instances.

The memory allocated was 85% of capacity, ~52 GB. However, it seems that with our setup, when we run up to two full K8S clusters, we peak at ~40 GB. We can safely allocate more memory for tests and other operations, which might speed up the tests.

Update: After reading a bit, I think it will not gain us much, because it looks like tmpfs only uses as much memory as its contents actually occupy; it does not reserve the whole amount available.
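
The point in the update can be checked on a runner: tmpfs allocates pages on demand, so the configured size is an upper bound and only the "used" figure reflects memory actually consumed. The following is a minimal sketch, not the project's tooling; the helper name `tmpfs_usage` is hypothetical, and the fallback to `/` is an assumption made only so the example runs on machines without Docker.

```python
import os
import shutil

GIB = 1024 ** 3


def tmpfs_usage(path="/var/lib/docker"):
    """Return (cap_gib, used_gib, used_fraction) for the filesystem holding `path`.

    For a tmpfs mount, `total` is the configured size limit and `used`
    is the memory its contents currently occupy.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    fraction = usage.used / usage.total if usage.total else 0.0
    return usage.total / GIB, usage.used / GIB, fraction


if __name__ == "__main__":
    # Assumption: fall back to "/" when the Docker directory does not exist.
    path = "/var/lib/docker" if os.path.exists("/var/lib/docker") else "/"
    cap, used, frac = tmpfs_usage(path)
    print(f"{path}: cap {cap:.1f} GiB, used {used:.1f} GiB ({frac:.0%})")
```

If the used figure stays well below the cap during a build, lowering the cap would not by itself free memory for tests.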

potiuk commented Nov 27, 2023

I've run a few builds with debug-ci-resources turned on and this one was the biggest tmpfs usage I could find:

[image: screenshot of the tmpfs usage]

I think we can safely give our tests ~10% memory.

Actually, I am not even sure it will change anything. I believe that when a tmpfs capacity is set, it is just the maximum: tmpfs uses only as much memory as it actually needs and does not reserve the rest.

So I am not even sure if we should merge it. I just wanted to start a discussion on whether we want to do some optimisations there (and show @hussein-awala the nice resource debugging feature we get by assigning the `debug-ci-resources` label).

BTW, we are discussing with Datadog the possibility of them donating a CI monitoring solution that we could plug in instead of the "do-it-yourself" monitoring we have :)

@hussein-awala hussein-awala left a comment

IMHO it should be OK to reduce it to 60% (31.2 GB), but let's start with 10% extra memory for tests; then we can check whether we need more.
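
The percentages in this thread can be worked through explicitly. The sketch below is illustrative only: it derives total memory from the description's "85% of capacity ~ 52 GB", and it assumes that "10% extra for tests" means lowering the tmpfs cap from 85% to 75%, which the thread does not state outright. The function names are hypothetical.

```python
def estimate_total_gb(allocated_gb, allocated_fraction):
    """Back out total memory from a known allocation and its share."""
    return allocated_gb / allocated_fraction


def tmpfs_cap_gb(total_gb, fraction):
    """Size of the tmpfs cap for a given share of total memory."""
    return total_gb * fraction


CURRENT_CAP_GB = 52.0    # from the description: "85% of capacity ~ 52 GB"
CURRENT_FRACTION = 0.85
PEAK_USAGE_GB = 40.0     # from the description: peak with two K8S clusters

total = estimate_total_gb(CURRENT_CAP_GB, CURRENT_FRACTION)

# 0.75 is an assumed reading of "10% extra memory for tests".
for fraction in (0.85, 0.75, 0.60):
    cap = tmpfs_cap_gb(total, fraction)
    headroom = cap - PEAK_USAGE_GB
    print(f"{fraction:.0%} of RAM: cap {cap:.1f} GB, headroom over peak {headroom:+.1f} GB")
```

Under these assumptions, total memory is roughly 61 GB, so a 60% cap would be about 36.7 GB; the 31.2 GB figure matches 60% of the current 52 GB allocation instead. Either reading falls below the observed ~40 GB peak, while a 75% cap (about 45.9 GB) stays above it.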

potiuk commented Nov 27, 2023

> IMHO it should be OK to reduce it to 60% (31.2 GB), but let's start with 10% extra memory for tests; then we can check whether we need more.

See above: K8S tests take up far more of the Docker filesystem (this is also the reason why we cannot run more than 3 of them in parallel). I am also not sure that it changes anything (it's more of an exercise to show what we have). I believe tmpfs only takes as much memory as it actually uses, so I am not sure we gain much by decreasing the limits :)

BTW, this made me think that we could possibly optimize the k8s tests themselves. We are using kind and deploying the clusters in their "default" configuration, which is, I guess, not really a "development" setting but something between development and production (defaults are usually suitable for light production use as well). But our tests are not very demanding: apart from setting up the cluster and deploying Airflow, we merely run a few pods and several DAGs. So maybe we could tune the settings of our kind clusters to take up less of /var/lib/docker. Nothing immediate, but if we do, we could possibly run more of them in parallel.

I've done a similar exercise in the past to decrease the memory used by Postgres/MySQL (and I was able to save 50% of the memory they used by using low settings that were enough for our tests).

@potiuk potiuk changed the title from "Increase amount of memory available for tests in self-hosted runners" to "Increase amount of memory available for tests in self-hosted runners ??" Nov 27, 2023
@potiuk potiuk marked this pull request as draft November 27, 2023 11:29

potiuk commented Nov 27, 2023

Updated the description and marked it as a Draft for now. Just wanted to spark a possible discussion :)

hussein-awala commented

> See above: K8S tests take up far more of the Docker filesystem (this is also the reason why we cannot run more than 3 of them in parallel). I am also not sure that it changes anything (it's more of an exercise to show what we have). I believe tmpfs only takes as much memory as it actually uses, so I am not sure we gain much by decreasing the limits :)

I'm checking the CI log to analyze the memory usage over time.
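
One way to look at usage over time, sketched here under stated assumptions, is to sample the mount periodically and keep the peak. This is not how the `debug-ci-resources` feature or the CI logs work; `track_peak_usage` and its `probe` parameter are hypothetical names, and the probe is injectable only so the logic can be exercised without a real tmpfs mount.

```python
import shutil
import time


def track_peak_usage(path, samples, interval_s=1.0, probe=None):
    """Sample used bytes `samples` times and return the highest reading.

    `probe` maps a path to used bytes; by default it reads the
    filesystem that holds `path` via shutil.disk_usage.
    """
    if probe is None:
        probe = lambda p: shutil.disk_usage(p).used
    peak = 0
    for i in range(samples):
        peak = max(peak, probe(path))
        if i < samples - 1:
            time.sleep(interval_s)  # wait between samples, not after the last
    return peak
```

Running it alongside a K8S test job, for example `track_peak_usage("/var/lib/docker", samples=600, interval_s=1.0)`, would give a peak to compare against the configured cap.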

> This actually made me think that possibly we could see if we could have the k8s tests themselves optimized.

It would be great! I'm not too familiar with kind (I usually use Minikube for dev clusters), but I will try to help as much as I can to improve these tests.

@potiuk potiuk closed this Dec 10, 2023