This is a Python module to coordinate Flux bursting. 🧋️
- ⭐️ Documentation ⭐️
- 📦️ Pypi Package 📦️
Current and desired plugins are:
- flux-burst-local to "burst" on a local HPC system
- flux-burst-gke to burst to Google Kubernetes Engine
- flux-burst-eks to burst to Amazon EKS
- flux-burst-compute-engine to burst to Google Cloud Compute Engine
- How should the plugins (or client) manage checking when to create / destroy clusters?
- Can we have a better strategy for namespacing different bursts (e.g., beyond burst-0, burst-1, ..., burst-N)?
- We need a reasonable default for what a plugin should do if something fails (e.g., setup/config)
- How should each plugin decide what size cluster to make? Right now the size is simply the max job size, and we assume all jobs need the same node type.
- We will eventually want to use namespaces in a meaningful way (e.g., users)
- We will eventually want a specific burst for a job to be customizable in more detail, e.g., the namespace or another attribute that comes from a jobspec (right now these are global to the plugin)
- Who controls cleanup? It could be done by either the global flux-burst controller or a plugin, and either automatically or manually.
- All plugins should have support to read in YAML parameters (some spec for bursting)
- All plugins should be able to match a resource request to, for example, instance types.
- Should the plugin "local queue" (self.jobs) be assumed to be associated with one burst, where the size is the max job size?
- Should we derive names based on provided name + size so clusters are unique by name and size?
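
Two of the questions above, sizing a cluster from the max job size and deriving a name from a provided name plus size, can be sketched roughly as follows. This is only an illustration under stated assumptions: `Job`, `burst_size`, and `burst_name` are hypothetical names invented for this example and are not part of the flux-burst API, and the sketch assumes every job needs the same node type.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical stand-in for a queued job; not a flux-burst class.
    jobid: str
    nnodes: int


def burst_size(jobs):
    """Return the node count needed to fit the largest queued job (0 if none)."""
    if not jobs:
        return 0
    return max(job.nnodes for job in jobs)


def burst_name(prefix, size):
    """Derive a cluster name from a base name and a size, e.g. 'burst-4'."""
    return f"{prefix}-{size}"


# A plugin's local queue (analogous to self.jobs) with three jobs.
jobs = [Job("a", 2), Job("b", 4), Job("c", 1)]
size = burst_size(jobs)
print(burst_name("burst", size), size)
```

With a scheme like this, two bursts that share a base name but differ in size would get distinct cluster names, while two bursts of the same size would still collide, so it does not settle the namespacing question on its own.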
We use the all-contributors tool to generate a contributors graphic below.
Vanessasaurus 💻
HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.
See LICENSE, COPYRIGHT, and NOTICE for details.
SPDX-License-Identifier: (MIT)
LLNL-CODE-842614