Worker: find a better way to manage ephemeral storage #919

We are learning that when a worker uses too much ephemeral storage (disk space), the pod gets killed: instantly, without mercy, and at the cost of lost runs.

The problem here, of course, is autoinstall and node_modules: node_modules keeps growing as a worker autoinstalls adaptors.

What solutions do we have here?

- We can increase the ephemeral storage limit to reduce how often this occurs.
- We can kill workers every 24 hours to reduce the rate of this happening. A purge. This also helps with the memory leak, btw.
- Workers could manage their own node_modules installation, aiming to keep fewer than 20 modules installed and periodically removing the least-used adaptors. This will result in more installations but should keep disk usage under control.
- Can we do something like: a worker only claims runs for certain adaptor versions? But this is hard to track: what do we do if, e.g., no worker wants to install kobotoolbox@0.2.0? So I don't think this is viable.
- We could use a shared volume in Kubernetes. This means workers start up faster (no need to autoinstall common). The downside is that the one shared volume might need to store every adaptor version ever released, all at once, so it would need some management or a regular purge. Then again, a 1TB volume would presumably last us a very long time. Does the risk of the installed modules getting corrupted (which can happen locally with the CLI) increase? Yes, and if the shared installation DOES get corrupted, all workers will break.
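The option of capping installed modules and removing the least-used adaptors is essentially an LRU cache. Below is a minimal sketch of that policy, assuming a hypothetical `AdaptorLru` class whose `install`/`uninstall` callbacks would wrap the worker's real autoinstall and removal logic (neither is an existing worker API). It also assumes the worker can tell when a run starts and finishes using an adaptor, so it never deletes one that is still in use.

```typescript
// Hypothetical LRU policy for autoinstalled adaptors (not the worker's actual code).
// install/uninstall are injected so the policy can be tested without touching disk.
type AdaptorAction = (specifier: string) => void;

class AdaptorLru {
  // A Set iterates in insertion order, so the first entry is the least recently used.
  private readonly installedSet = new Set<string>();
  // specifier -> number of in-flight runs currently using it
  private readonly active = new Map<string, number>();

  constructor(
    private readonly maxInstalled: number, // e.g. 20, per the issue
    private readonly install: AdaptorAction,
    private readonly uninstall: AdaptorAction,
  ) {}

  // Call when a run needs an adaptor version.
  acquire(specifier: string): void {
    if (this.installedSet.has(specifier)) {
      this.installedSet.delete(specifier); // re-insert below to mark as most recently used
    } else {
      this.install(specifier);
    }
    this.installedSet.add(specifier);
    this.active.set(specifier, (this.active.get(specifier) ?? 0) + 1);
    this.evict();
  }

  // Call when the run that acquired the adaptor has finished.
  release(specifier: string): void {
    const count = this.active.get(specifier) ?? 0;
    if (count <= 1) this.active.delete(specifier);
    else this.active.set(specifier, count - 1);
    this.evict();
  }

  installed(): string[] {
    return [...this.installedSet];
  }

  // Remove least recently used adaptors until under the cap,
  // skipping any adaptor that an in-flight run is still using.
  private evict(): void {
    for (const specifier of [...this.installedSet]) {
      if (this.installedSet.size <= this.maxInstalled) return;
      if (this.active.has(specifier)) continue;
      this.installedSet.delete(specifier);
      this.uninstall(specifier);
    }
  }
}
```

Because eviction skips in-use adaptors, the cap is soft: a worker running many different adaptors at once can briefly exceed it, and cleanup catches up as runs release. That trades a strict limit for never breaking a live run.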
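For the 24-hour purge, killing a worker outright would lose in-flight runs, which is the very thing we are trying to avoid. A gentler variant is to let an old worker stop claiming, drain, and then exit so Kubernetes replaces it. The sketch below assumes a hypothetical `RetirementPolicy` that the claim loop consults; it is an illustration, not existing worker code.

```typescript
// Hypothetical retirement policy: stop claiming after maxAgeMs, exit once drained.
const DAY_MS = 24 * 60 * 60 * 1000;

class RetirementPolicy {
  constructor(
    private readonly maxAgeMs: number = DAY_MS,
    private readonly startedAt: number = Date.now(),
  ) {}

  // Consult before each claim: a worker past its max age takes no new runs.
  canClaim(now: number = Date.now()): boolean {
    return now - this.startedAt < this.maxAgeMs;
  }

  // Safe to exit once retired and no runs are in flight.
  shouldExit(activeRuns: number, now: number = Date.now()): boolean {
    return !this.canClaim(now) && activeRuns === 0;
  }
}
```

In a Node worker, the loop would call `process.exit(0)` once `shouldExit` returns true. Whether that actually frees disk depends on where node_modules lives: a restarted container gets a fresh writable layer, but an `emptyDir` volume survives container restarts and is only cleared when the pod itself is deleted. Staggering `maxAgeMs` slightly per worker would also avoid every worker retiring at the same moment.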
