This repository contains all the materials from the workshop on putting Machine Learning models into production that we taught in September 2019 at IronHack.
This is a practical workshop whose goal is to teach the following:
- How to set up MLflow, a tool for ML experiment tracking and model deployment, from zero to hero.
- How to track ML experiments with MLflow.
- How to put models into production with MLflow.
- How to deploy models to production in AWS SageMaker with just a couple of lines of code.
- How to set up Apache Airflow, a powerful tool to design, schedule and monitor workflows.
- How to create workflows that take advantage of deployed models.
To follow the tutorials in a standard setup, this repository includes a Linux VM
with the repository itself and conda preinstalled. Please download VirtualBox and import
As this is a large file, you can download it from
VM login credentials are:
- username: ubuntu
- password: ubuntu
Friday 27/09/2019 from 17 to 20h
- Introduction to Machine Learning in Production
- Introduction to MLflow and full MLflow setup
- Introduction to Dataset and Business Case (Renfe AVE ticket price forecasting)
- MLflow training API
Saturday from 10 to 20h
- MLflow deployment API
- Python Virtual Environments distribution
- AWS model deployment with SageMaker
- Introduction to Apache Airflow
- Airflow orchestration
Unsupervised learning - clustering of high-speed train tickets using the following algorithms:
- Dimensionality reduction with UMAP
- HDBSCAN clustering
- Putting the model into production using MLflow so that the REST API returns a cluster ID for new tickets
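
As a rough illustration of the last item, the sketch below builds (but does not send) the kind of HTTP request a client could post to a model served by MLflow. It rests on several assumptions that are not part of the workshop materials: the server address `http://localhost:5000`, the feature names `price` and `duration_hours`, and the helper name `build_invocation_request` are all hypothetical. The `/invocations` endpoint and the `pandas-split` content type follow the MLflow 1.x scoring server; newer MLflow versions expect the payload wrapped as `{"dataframe_split": {...}}` with a plain `application/json` content type.

```python
import json
import urllib.request


def build_invocation_request(base_url, columns, rows):
    """Build (but do not send) a POST request for an MLflow model server."""
    # Pandas "split" orientation: column names plus a list of row values.
    payload = {"columns": list(columns), "data": [list(row) for row in rows]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/invocations",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: MLflow 1.x content type; adjust for newer versions.
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )


# Hypothetical server address, feature names and values, for illustration only.
request = build_invocation_request(
    "http://localhost:5000",
    columns=["price", "duration_hours"],
    rows=[[45.3, 2.5]],
)

# To call a running model server, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     cluster_ids = json.loads(response.read())
```

With a clustering model behind the endpoint, the response would be expected to contain one cluster ID per row sent, although the exact response shape depends on the model and the MLflow version.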
Supervised learning - high-speed train ticket price forecasting using the following algorithms:
- AWS SageMaker's XGBoost implementation (both cloud training and model deployment)
- scikit-learn Random Forest (local training and cloud deployment in AWS SageMaker)
- Putting models into production on virtually any Linux machine or server
- Putting models into production in the cloud with AWS SageMaker
- Orchestration of (batch) clustering and price forecasting for new data using Apache Airflow