SnapperML is a framework for machine learning. It has many functionalities including scalability through docker instances and reproducibility.

SnapperML

SnapperML is a framework for experiment tracking and machine learning operationalization that combines existing, well-supported technologies, including Docker, MLflow, and Ray, among others.

The framework provides an opinionated workflow to design and execute experiments either in a local environment or in the cloud. SnapperML includes:

  • An automatic tracking system
  • First-class support for distributed training and hyperparameter optimization
  • A command-line interface (CLI) for packaging and running projects inside containers

How to install?

The project has some core dependencies:

  • mlflow
  • optuna>=1.1.0
  • ray>=0.8.2
  • docker>=4.1.0

The Python package can be installed with pip:

pip install snapper-ml
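After installing, one way to confirm that the core dependencies listed above meet their minimum versions is a small check like the following. This is only a sketch: the helper names (`parse_version`, `meets_minimum`, `check_dependencies`) are hypothetical and not part of SnapperML, and the version parsing is deliberately simplistic (numeric dotted parts only).

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:  # Python 3.7: requires the importlib_metadata backport
    import importlib_metadata as metadata

# Minimum versions copied from the dependency list above (None = no minimum stated).
MINIMUMS = {"mlflow": None, "optuna": "1.1.0", "ray": "0.8.2", "docker": "4.1.0"}


def parse_version(text):
    """Turn '1.10.0' into (1, 10, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    if minimum is None:
        return True
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.1' and '1.1.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_dependencies().items():
        print(f"{name}: {status}")
```

Running the script prints one status line per dependency. Pre-release suffixes such as `rc1` are ignored by this parser, so treat the result as a rough check.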

Please note that ray is not available for newer Python versions (3.9). To work around this, install an older Python version and make it the default interpreter. On a Linux system (Fedora), for example:

# install python 3.7
sudo dnf install python3.7
# configure the system to use python 3.7
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 2
# select the 3.7 version here
sudo alternatives --config python3
# make pip available to Python 3.7
python3 -m ensurepip --default-pip
# install SnapperML with the Python 3.7 pip
python3 -m pip install snapper-ml

WARNING: changing the default interpreter to Python 3.7 may cause some native system tools to stop working properly.
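After switching interpreters, a quick way to confirm that the active Python is one that ray supports is a check like the one below. It is a minimal sketch: the function name is hypothetical, and the cutoff assumes, based on the note above, that Python 3.9 and anything newer is unsupported.

```python
import sys

# Assumption taken from the note above: ray is unavailable from Python 3.9 onwards.
FIRST_UNSUPPORTED = (3, 9)


def interpreter_is_supported(version_info=None):
    """Return True if the given (or current) interpreter is older than 3.9."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return major_minor < FIRST_UNSUPPORTED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if interpreter_is_supported():
        print(f"Python {current} should work with ray")
    else:
        print(f"Python {current} is too new; install an older version such as 3.7")
```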

Architecture

The framework's core is divided into four modules that interact with the user through a command-line interface (CLI) and a Python library. The library aims to minimize the code changes required to instrument scripts for execution by the Job Runner, and it provides the abstractions needed to interact with the Tracking and Hyperparameter Optimization engines. The CLI, in turn, is in charge of executing scripts in either a local or a remote environment.

Architecture Overview
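To illustrate how a library can keep instrumentation changes small, the sketch below wraps a plain function in a decorator that records its parameters and returned metrics. This is not SnapperML's implementation: `tracked_job`, `RUN_LOG`, and the toy `train` function are hypothetical names, and an in-memory list stands in for a real tracking backend.

```python
import functools

# Hypothetical stand-in for a tracking backend: one dict per executed run.
RUN_LOG = []


def tracked_job(func):
    """Record the keyword parameters and returned metrics of each call."""

    @functools.wraps(func)
    def wrapper(**params):
        metrics = func(**params)
        RUN_LOG.append(
            {"job": func.__name__, "params": dict(params), "metrics": dict(metrics)}
        )
        return metrics

    return wrapper


@tracked_job
def train(learning_rate, epochs=3):
    # Toy computation in place of real model training.
    score = 1.0 - learning_rate / epochs
    return {"score": score}


if __name__ == "__main__":
    train(learning_rate=0.3, epochs=3)
    print(RUN_LOG[-1])
```

The only change to the user's script is the decorator line; the function body stays as it was, which is the kind of low-friction instrumentation the paragraph above describes.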

Documentation

The documentation is available here

Example

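# train_svm.py

import numpy as np
from sklearn.svm import SVC

from snapper_ml import job

# NOTE: load_data() is not defined in this snippet; it is assumed to be a
# user-provided function returning (X_train, X_val, y_train, y_val).


@job
def main(C, kernel, gamma='scale'):
    # Fix the seed so that runs are reproducible
    np.random.seed(1234)
    X_train, X_val, y_train, y_val = load_data()
    model = SVC(C=C, gamma=gamma, kernel=kernel)
    model.fit(X_train, y_train)
    # The returned metrics are what the experiment config refers to by name
    accuracy = model.score(X_val, y_val)
    return {'val_accuracy': accuracy}


if __name__ == '__main__':
    main()

# train_svm.yaml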

name: "SVM"
kind: 'group'
num_trials: 12
sampler: TPE

param_space:
  C: loguniform(0.01, 1000)
  gamma: choice(['scale', 'auto'])

metric:
  name: val_accuracy
  direction: maximize

ray_config:
  num_cpus: 4

run:
  - train_svm.py

snapper-ml run --config_file=train_svm.yaml
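To make the `param_space` entries above concrete, the sketch below shows what log-uniform and categorical sampling mean, using only the Python standard library. It is an illustration under assumptions, not SnapperML's code: it uses plain random search rather than the TPE sampler named in the config, and `loguniform`, `choice`, `sample_params`, `random_search`, and the toy objective are hypothetical helpers.

```python
import math
import random


def loguniform(low, high, rng):
    """Sample a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def choice(options, rng):
    """Pick one of the options with equal probability."""
    return rng.choice(options)


def sample_params(rng):
    """Draw one configuration from the same space as the YAML above."""
    return {
        "C": loguniform(0.01, 1000, rng),
        "gamma": choice(["scale", "auto"], rng),
    }


def random_search(objective, num_trials, seed=1234):
    """Maximize the objective by random search (not TPE); return (params, value)."""
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        params = sample_params(rng)
        value = objective(**params)
        if best is None or value > best[1]:
            best = (params, value)
    return best


def toy_objective(C, gamma):
    # Toy stand-in for validation accuracy: peaks when C is near 10.
    bonus = 0.1 if gamma == "scale" else 0.0
    return bonus - abs(math.log10(C) - 1.0)


if __name__ == "__main__":
    params, value = random_search(toy_objective, num_trials=12)
    print(params, value)
```

Log-uniform sampling spreads trials evenly across orders of magnitude, which is why it suits a parameter like `C` that ranges from 0.01 to 1000.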
