Prototypical implementation of "Bellamy" for runtime prediction / resource allocation. Please consider reaching out if you have questions or encounter problems.
- Python 3.8.0+
- PyTorch 1.8.0
- PyTorch Ignite 0.4.2
- PyTorch Geometric 1.7.0
- Ray Tune 1.1.0
- Optuna 2.3.0
All necessary packages are installed into a conda environment. Install the dependencies with:

conda env [create / update] -f environment.yml

To activate the environment, execute:

[source / conda] activate bellamyV1
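For reference, a minimal environment.yml consistent with the versions listed above might look like the following sketch; the actual file shipped with this repository is authoritative and may pin further channels and transitive dependencies:

```yaml
# Hypothetical sketch of environment.yml, derived from the version list above.
# The repository's actual file is authoritative.
name: bellamyV1
channels:
  - pytorch
  - conda-forge
dependencies:
  - python>=3.8.0
  - pytorch=1.8.0
  - pip
  - pip:
      - pytorch-ignite==0.4.2
      - torch-geometric==1.7.0
      - ray[tune]==1.1.0
      - optuna==2.3.0
```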
During model pretraining, we use Ray Tune and Optuna to carry out a search for optimized hyperparameters. As of now, we draw 12 trials using Optuna from a manageable search space. All details are listed in the config.yml file and are in most cases passed directly to the respective functions / methods / classes in Ray Tune. For instance, it can be inferred from the config.yml file that we use the ASHAScheduler implemented in Ray Tune to early-terminate bad trials, and Optuna's default sampler, the TPESampler, to provide trial suggestions.
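For illustration only (this is not the project's training code), a Ray Tune 1.x setup combining the ASHAScheduler with an Optuna-backed searcher might look like the sketch below; the objective function, metric name, and search space are placeholders, and exact signatures vary between Ray Tune versions:

```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.suggest.optuna import OptunaSearch  # Ray Tune 1.x import path


def train_model(config):
    # Placeholder objective: the real code would train the model
    # and report its validation loss after each epoch.
    for step in range(50):
        val_loss = (config["lr"] - 0.01) ** 2 + 0.1 / (step + 1)
        tune.report(val_loss=val_loss)


analysis = tune.run(
    train_model,
    config={
        "lr": tune.loguniform(1e-4, 1e-1),    # placeholder search space
        "hidden_dim": tune.choice([8, 16, 32]),
    },
    metric="val_loss",
    mode="min",
    num_samples=12,                           # 12 trials, as described above
    search_alg=OptunaSearch(),                # Optuna's default TPESampler
    scheduler=ASHAScheduler(grace_period=5),  # early-terminates bad trials
)
print("Best config:", analysis.best_config)
```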
First, we need to make sure that a pretrained model is available. In a production environment, this could be ensured by a periodically executed job. For instance, run the following command to pretrain a model with the configuration specified in config.yml:
python examples/example-pretraining.py -tt c3o -a grep
This will train a model on the C3O dataset for the grep algorithm. The best checkpoint and the fitted pipeline are eventually persisted.
After obtaining a pretrained model, we can proceed to use it for predictions. For instance, running
python examples/example-prediction.py -tt c3o -a grep -r "examples/example-request.yml"
will use an exemplary request containing a minimum scale-out, a maximum scale-out, and the necessary information for the essential context properties. The pretrained model is loaded, fine-tuned on data found for this specific configuration (if available), and eventually used to predict runtimes. Finally, we report the scale-out with the smallest predicted runtime, which can then be used to allocate resources accordingly.
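The concrete schema of examples/example-request.yml is defined by the repository; purely as a hypothetical illustration of the ingredients named above (scale-out bounds plus context properties), such a request could look like:

```yaml
# Hypothetical request sketch; field names are illustrative, not the
# actual schema of examples/example-request.yml.
min_scale_out: 2
max_scale_out: 24
context:
  job: grep
  machine_type: c4.2xlarge
  dataset_size_gb: 100
```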
Of course, the code can be extended to incorporate other selection criteria.
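Conceptually, the selection step is an argmin over predicted runtimes within the requested scale-out range. A minimal sketch, with a hypothetical predict_runtime function standing in for the fine-tuned model:

```python
# Minimal sketch of the scale-out selection step. `predict_runtime` is a
# hypothetical stand-in for the fine-tuned model.
def predict_runtime(scale_out: int) -> float:
    # Toy model: runtime shrinks with scale-out, with diminishing returns.
    return 1000.0 / scale_out + 5.0 * scale_out


def best_scale_out(min_scale_out: int, max_scale_out: int) -> int:
    candidates = range(min_scale_out, max_scale_out + 1)
    # Pick the scale-out with the smallest predicted runtime; other criteria
    # (e.g. cost) could be incorporated here instead.
    return min(candidates, key=predict_runtime)


print(best_scale_out(2, 24))
```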