ML-IN-PRODUCTION-MADRID

This repository contains all the materials from the workshop on putting Machine Learning models into production that we taught in September 2019 at IronHack.

Approach

This is a practical workshop with the goals of learning the following concepts:

  • How to set up MLflow, a tool for ML experiment tracking and model deployment, from zero to hero.
  • How to track ML experiments with MLflow.
  • How to put models into production with MLflow.
  • How to deploy models to production in AWS SageMaker with just a couple of lines of code.
  • How to set up Apache Airflow, a powerful tool to design, schedule and monitor workflows.
  • How to create workflows that take advantage of deployed models.

To follow the tutorials in a standard setup, this repository includes a Linux VM with the repository itself and conda preinstalled. Please install VirtualBox and import vm/ubuntu.ova. As this is a large file, you can download it from here:

VM login credentials are:

  • username: ubuntu
  • password: ubuntu

If you prefer to follow the examples in this repo on your own setup, we highly recommend an Ubuntu 18.04 machine with conda installed.

Calendar

  • Friday 27/09/2019 from 17h to 20h

    • Introduction to Machine Learning in Production
    • Introduction to MLflow and full MLflow setup
    • Introduction to the Dataset and Business Case (Renfe AVE ticket price forecasting)
    • MLflow training API
  • Saturday from 10h to 20h

    • MLflow deployment API
    • Distribution of Python virtual environments
    • AWS model deployment with SageMaker
    • Introduction to Apache Airflow
    • Airflow orchestration

Business Case

All examples use our dataset of high-speed train tickets in Spain. You can download the dataset from Kaggle or using this link. The following use cases are covered here:

Unsupervised learning - clustering of high-speed train tickets using the following techniques:

  • Dimensionality reduction with UMAP
  • HDBSCAN clustering
  • Putting the model into production with MLflow so that the REST API returns a cluster ID for new tickets
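
As a rough illustration of the last point, the sketch below builds the kind of HTTP request a client could send to a model served with MLflow in order to get cluster IDs back. It is not taken from the workshop notebooks: the feature names and values are hypothetical, the URL assumes a server started locally on MLflow's default port 5000, and the `dataframe_split` payload is the format accepted by MLflow 2.x. The MLflow 1.x releases available at the time of the workshop expected a pandas split-orient JSON body with the content type `application/json; format=pandas-split` instead.

```python
import json
import urllib.request


def build_scoring_request(rows, columns, url="http://127.0.0.1:5000/invocations"):
    """Build (but do not send) a POST request for an MLflow model server.

    Assumption: the server accepts the MLflow 2.x "dataframe_split" JSON format.
    """
    payload = {"dataframe_split": {"columns": columns, "data": rows}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical feature names and values, for illustration only.
columns = ["feature_a", "feature_b"]
rows = [[0.1, 1.2], [3.4, 0.5]]
request = build_scoring_request(rows, columns)

# Sending the request needs a running model server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     cluster_ids = json.loads(response.read())
```

Building the request separately from sending it makes it easy to inspect the payload before pointing the client at a real endpoint.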

Supervised learning - price forecasting for high-speed train tickets using the following algorithms:

  • The XGBoost implementation provided by AWS SageMaker (both cloud training and model deployment)
  • scikit-learn Random Forest (local training and cloud deployment in AWS SageMaker)

Model deployment:

  • Putting models into production on virtually any Linux machine or server
  • Putting models into production in the cloud with AWS SageMaker

Scheduling:

  • Orchestration of (batch) clustering and price forecasting for new data using Apache Airflow