Project Batcomputer is a working example of DevOps applied to machine learning and the field of AI.
Some motivations behind this project:
- Understand the challenges in operationalization of ML models
- Attempt to make a reality of "DevOps for AI"
- Integration of "closed box" processes (e.g. Azure ML Services) with a real DevOps approach
Why "Project Batcomputer"?
The main model trained and used as the foundation of the project is based on crime data, and predictions of crime outcomes (convictions etc.). The Batman Batcomputer seemed like a fun way to make using such a prediction model more interesting.
Some of the main technology themes:
- Continuous integration & deployment with Azure Pipelines
- Wrapper app that allows the model to be run as a RESTful web API
- Use of Azure ML Service and Python SDK
- Training Python notebooks that carry out the machine learning using Scikit-Learn
- Infrastructure as code deployments into Azure
- Use of containers and Kubernetes
This shows a high level view of the core functional aspects of the project:
It was a design goal of the project not to present a thin wrapper around the scoring function, i.e. `model.predict(f)`, where a raw array of feature numbers is the expected input to the API. That approach is primitive and not in line with modern RESTful API design.
This project took the approach of serializing not just the trained/fitted model as an output, but also two metadata files:

- `lookup.pkl` provides a means for the data scientist working on training the model to pass the encoded labels and features that the model is expecting at scoring time. It takes the form of a pickled dictionary with string labels mapping to encoded label numbers.
- `flags.pkl` correspondingly provides meaningful names for the results/scores.

Using these two files, the model API wrapper has enough information to present a more RESTful and developer friendly API.
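As an illustration, a training notebook could emit the two metadata files like this. Note the keys, values and file shapes here are assumptions for the sketch, not the project's exact schema:

```python
import pickle

# Hypothetical label encodings captured during training
# (illustrative values, not the project's real data)
lookup = {
    "force": {"Thames Valley Police": 27, "Metropolitan Police": 3},
    "crime": {"Bicycle theft": 19, "Burglary": 4},
}

# Maps the model's output classes to human-readable result names
flags = {0: "no-conviction", 1: "conviction"}

with open("lookup.pkl", "wb") as f:
    pickle.dump(lookup, f)
with open("flags.pkl", "wb") as f:
    pickle.dump(flags, f)
```

At scoring time the wrapper API simply unpickles these files to recover the dictionaries, with no need to hard code any knowledge of the model's features.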
For example, rather than:

```
POST /api/predict
{ [27, 19, 10] }
```
The Batcomputer API looks like:

```
POST /api/predict
{
  "force": "Thames Valley Police",
  "crime": "Bicycle theft",
  "month": 10
}
```
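A minimal sketch of how the wrapper might translate such a named JSON payload into the raw feature array the model expects, using the dictionary loaded from `lookup.pkl`. The field names and encoded values here are illustrative assumptions:

```python
# Hypothetical lookup table, as would be loaded from lookup.pkl
lookup = {
    "force": {"Thames Valley Police": 27},
    "crime": {"Bicycle theft": 19},
}

def to_features(payload: dict) -> list:
    """Convert a named JSON payload into the encoded feature vector."""
    return [
        lookup["force"][payload["force"]],
        lookup["crime"][payload["crime"]],
        payload["month"],  # already numeric, passed through as-is
    ]

request = {"force": "Thames Valley Police", "crime": "Bicycle theft", "month": 10}
print(to_features(request))  # → [27, 19, 10]
```

The encoded array is then what gets handed to the model's scoring function, keeping the raw feature encoding an internal detail of the API.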
The project doesn't represent a single codebase; there are multiple sets of artifacts, configuration files and source code held here. The top level folders are as follows:
- `/aml` - Azure ML Service orchestration scripts (Python)
- `/assets` - Art and stuff
- `/azure` - Azure ARM templates & scripts
- `/batclient` - Frontend web client of Batcomputer to demo the model API
- `/data` - Source training data
- `/docs` - Documentation & guides
- `/kubernetes` - Helm charts to deploy the wrapper API into Kubernetes
- `/model-api` - Source for Python model wrapper API
- `/pipelines` - Azure DevOps pipelines
- `/tests` - Postman collection for API testing
- `/training` - Training Python notebooks
If you wish to set up this project in your own Azure subscription and/or Azure DevOps account, this guide provides the steps to do so:
This slide deck goes into more detail about some of the motivations and shape of the project, and also provides a more visual guide to how it hangs together.
As there are a significant number of components, interactions & products involved in this project, an attempt has been made to break things into four main sections, and to make those sections as standalone as possible:
- DevOps CI/CD automation & pipelines
- Model training & machine learning
- Wrapping the model in an API service
- Batcomputer Client
The core of this project is focused on DevOps, and the practices of 'Continuous Integration' (CI), 'Continuous Deployment' (CD) & 'Infrastructure as Code' (IaC). In essence these are automated "pipelines": sets of tasks that drive the process of training the models, building the API and deploying it into Azure.
Azure DevOps Pipelines (or simply Azure Pipelines) gives us the means to create and run these pipelines. For runtime and hosting of the API, containers are used and two services were selected: Azure Container Instances for simple standalone deployments, and Azure Kubernetes Service for demonstrating deployment into Kubernetes. The main technologies and services involved are:
- Azure DevOps Pipelines
- Azure Container Registry
- Azure Resource Manager (ARM) Templates
- Helm
- Azure Container Instances (ACI)
- Azure Kubernetes Service (AKS)
Azure Pipelines (part of Azure DevOps) is used to provide CI/CD automation. These pipelines drive the whole process: data prep, training and building of the API, plus deployment.
ARM Template(s) for standing up the wrapper API app using Azure Container Instances
A Helm chart will deploy the wrapper model API app and configure a Kubernetes Ingress to route traffic to it.
The primary focus of this project is on the operationalization aspects of machine learning, rather than the "science" of machine learning and the act of training. In fact, from the perspective of the model-api app and the CI/CD deployment flows, the quality of the model and how it was trained & created is irrelevant.
Two ML use cases are provided: one for Batcomputer (based on the crime data described above) and one for the well known "would you survive the Titanic?" problem used in many ML training examples.
The primary model is Batcomputer; the Titanic model is provided to demonstrate how the approach extends with little effort to other models and is not "hard coded".
The scripts for training can either be run locally, or run within Azure ML Service as an experiment.
⚡ Important!
The provided code has been written by someone learning ML and trying it for the first time. It was not developed by a data scientist or someone with a background in AI. It does not represent any sort of best practice or exemplary way of training a classifier model with Scikit/Python or analyzing the data. However it is functional, and the resulting models serve the purposes of this project adequately.
If your main interest is in the ML and training side of things, I suggest you look elsewhere; there are thousands of excellent resources available on this topic.
Azure Machine Learning service (Azure ML or AML) provides SDKs and services to prep data, train, and deploy machine learning models. Azure ML has a complete end-to-end workflow, including operationalization and deployment; however it skips many of the best practices of DevOps, is in effect "a closed box", and does not take account of CI/CD.
For this project Azure ML is only used for training the model and traceability of experiment runs; the operationalization part is handled outside of Azure ML, using Azure DevOps Pipelines and infrastructure as code. It's this approach that is unique to this project.
Azure ML is driven via a series of 'orchestration scripts', and it's these scripts that are called and executed by Azure DevOps Pipelines. These orchestration scripts provide much of the glue, integrating the Azure ML process with a true DevOps CI/CD approach.
The model API wrapper is a Python Flask app, designed to wrap and "serve" the model over an HTTP REST based API. It is standalone, lightweight and designed to run in a container. It consumes the `model.pkl`, `lookup.pkl` and `flags.pkl` files described above, and loads them at runtime.
For more details see the full docs below
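The real wrapper is a Flask app serving HTTP; the stdlib-only sketch below shows just the load-and-score flow it performs, with a stub class standing in for the trained scikit-learn model and in-memory pickles standing in for the three files. All names and values here are illustrative assumptions:

```python
import pickle

class StubModel:
    """Stands in for the trained scikit-learn classifier (assumption)."""
    def predict_proba(self, rows):
        return [[0.3, 0.7] for _ in rows]  # fixed fake probabilities

# In the real app these three pickles are read from disk at startup
model_pkl = pickle.dumps(StubModel())
lookup_pkl = pickle.dumps({"force": {"Thames Valley Police": 27},
                           "crime": {"Bicycle theft": 19}})
flags_pkl = pickle.dumps({0: "no-conviction", 1: "conviction"})

model = pickle.loads(model_pkl)
lookup = pickle.loads(lookup_pkl)
flags = pickle.loads(flags_pkl)

def predict(payload: dict) -> dict:
    """Encode the named payload, score it, and name the results via flags."""
    features = [lookup["force"][payload["force"]],
                lookup["crime"][payload["crime"]],
                payload["month"]]
    probs = model.predict_proba([features])[0]
    return {flags[i]: p for i, p in enumerate(probs)}

print(predict({"force": "Thames Valley Police",
               "crime": "Bicycle theft",
               "month": 10}))  # → {'no-conviction': 0.3, 'conviction': 0.7}
```

In the Flask app this `predict` logic would sit behind the `POST /api/predict` route, with the three pickles deserialized once when the container starts.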