
MATRYCS/dummy-mlops-pipeline


Example of an MLOps pipeline with Apache Airflow and MLFlow

This repository contains a minimalistic example of the MLOps pipeline used by MATRYCS project partners. Its purpose is to help partners integrate MLOps practices into their existing ML pipelines for (re)training and deploying models. The following services are used:

  • Apache Airflow — orchestration of the retraining and deployment pipelines (DAGs)
  • MLFlow — experiment tracking and model registry
  • BentoML — packaging and serving of the trained models

Quickstart (local deployment)

  1. Make sure that Docker and docker-compose are installed.

  2. Clone this repository and the complementary MATRYCS/ml_model_tracking_framework repository.

  3. Start the two docker-compose stacks of services:

    • Start the MLFlow service from ml_model_tracking_framework according to its README.md file.
    • Build the images for this project with make images (first time only), then run docker-compose up to bring up the services.
  4. Access the services (e.g. the Airflow web UI and the MLFlow tracking UI) in a browser.

The Pipeline

The demo shows how the services above fit together into a single (re)training and deployment workflow that partners can adapt to their own models.

(Figure: Apache Airflow screenshot of the two DAGs)

As presented in the figure above, the demo consists of two separate pipelines. The first pipeline/DAG (retrain-dummy-model) is responsible for retraining the machine learning model; its source code is located in dags/retrain_dummy_model.py. The retraining DAG expects a GitHub repository containing the latest working machine learning model: it prepares the repository's dependencies and retrains the model by running its train.py script. The train.py script is in turn responsible for logging the run and pushing the binaries of the built model(s) to the MLFlow service. Example model code is located in the MATRYCS/dummy-regression-model demo repository.
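The retraining steps can be sketched in plain Python, independently of Airflow. This is a minimal sketch, not the actual DAG code: the repository URL and the requirements.txt file name are assumptions for illustration, and dags/retrain_dummy_model.py may differ.

```python
import subprocess
import sys
import tempfile

# Assumed URL of the demo model repository mentioned in the README.
MODEL_REPO = "https://github.com/MATRYCS/dummy-regression-model"


def retrain_steps(repo_url: str, workdir: str) -> list:
    """Return, in order, the commands the retraining DAG conceptually runs:
    clone the model repo, install its dependencies, execute train.py
    (which itself logs and pushes the model to MLFlow)."""
    return [
        ["git", "clone", repo_url, workdir],
        [sys.executable, "-m", "pip", "install", "-r", f"{workdir}/requirements.txt"],
        [sys.executable, f"{workdir}/train.py"],
    ]


def retrain(repo_url: str = MODEL_REPO) -> None:
    """Run the retraining steps in a throwaway working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        for cmd in retrain_steps(repo_url, workdir):
            subprocess.run(cmd, check=True)
```

In the real DAG each of these steps would typically be a separate Airflow task, so that failures are visible per stage in the Airflow UI.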

(Figure: list of recorded models in MLFlow)

(Figure: registered model in MLFlow)

The second pipeline/DAG (deploy-dummy-model) is responsible for model deployment; its source code is located in dags/deploy_dummy_model.py. The deployment DAG expects the MLFlow service to contain logged versions of a particular model, one of which is marked as Production (see figures above). The final stage of the DAG wraps the selected model into a "bento", which can be forwarded to the BentoML serving infrastructure.
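The selection step of the deployment DAG can be illustrated with a minimal sketch. The stage-filtering logic below is plain Python; the model name "dummy-model" and the MLFlow registry query shown in the comment are assumptions, not verified against this repository.

```python
def pick_production_version(versions):
    """Pick the model version marked Production in the MLFlow registry.

    versions: list of (version_number, stage) pairs, e.g. as extracted
    from the MLFlow model registry. Returns the highest Production
    version number, or None if no version is in the Production stage.
    """
    prod = [v for v, stage in versions if stage == "Production"]
    return max(prod) if prod else None


# With MLFlow installed, the (version, stage) pairs could come from a
# registry query along these lines (an assumption for illustration):
#
#   from mlflow.tracking import MlflowClient
#   client = MlflowClient()
#   versions = [(int(m.version), m.current_stage)
#               for m in client.search_model_versions("name='dummy-model'")]
```

The chosen version would then be handed to the final DAG stage, which packages it as a bento for BentoML.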
