This project is a showcase demonstrating how the recently added container support for AWS Lambda, introduced at re:Invent 2020, can be a game changer and open up new possibilities.
At re:Invent 2020, AWS announced an update to AWS Lambda long awaited by many developers and data scientists, because it changes the way we can build functions. It comes with bonus features that make this release very welcome in the serverless world: it is now possible to package a Lambda function as an OCI-compliant container image. The best part is that a custom Dockerfile can either extend a Lambda base image, provided by AWS for every supported runtime and published on Docker Hub and Amazon ECR Public, or start from a fresh Alpine or Debian image, customizing Linux dependencies, packages, and everything else we usually do with a dockerized app container. This has serious implications for anyone willing to use AWS Lambda to serve machine learning models. Moreover, container images for Lambda can be up to 10 GB in size. This means a lot to us: all the libraries required by a machine learning stack, and even the weights of the model, can now be packaged together and published to a Docker registry.
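As a minimal sketch of that packaging approach (the file names `app.py`, `requirements.txt`, and `model.joblib` are illustrative, not taken from this repo), a Dockerfile extending the AWS-provided Python base image might look like:

```dockerfile
# Start from the AWS-provided Lambda base image for the Python runtime
FROM public.ecr.aws/lambda/python:3.9

# Install the ML stack (scikit-learn, etc.) listed in requirements.txt
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the handler code and the serialized model weights into the image
COPY app.py model.joblib ${LAMBDA_TASK_ROOT}/

# Point the Lambda runtime at the handler function (module.function)
CMD ["app.handler"]
```

The 10 GB image limit is what makes the `COPY model.joblib` step viable: model weights ship inside the image instead of being fetched at cold start.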
More info about container support for AWS Lambda can be found:
- in the AWS Blog launch post
- in this article
The goal of this repository is to show how a machine learning model can be packaged and deployed to AWS Lambda with minimal effort. We chose to focus on a common problem in Customer Experience: customer churn. We used a dataset publicly available on Kaggle to train our machine learning model with scikit-learn on tabular data. For our use case, we leveraged an already implemented DecisionTree, as shown in a couple of examples on Kaggle.
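That training approach can be sketched as follows (using a synthetic stand-in for the Kaggle churn dataset, so the data and hyperparameters here are illustrative, not the repo's actual values):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

# Synthetic tabular data standing in for the Kaggle churn dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train a decision tree, as in the notebook, and evaluate with F1
model = DecisionTreeClassifier(max_depth=5, random_state=42)
model.fit(X_train, y_train)
score = f1_score(y_test, model.predict(X_test))
print(f"F1 score: {score:.3f}")
```

In the real pipeline the trained model would then be serialized (e.g. with joblib) so it can be copied into the container image.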
A Jupyter Notebook with the implemented training model and its evaluation (using the F1 score) can be found in the notebooks folder. A raw Python version of this trained model, without feature encoding (since encoded features don't correlate with attrition), is available in src/training. A sample request to the deployed model can be run from api/inference.http after the code has been deployed.
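The inference side can be sketched as a Lambda handler (a simplified illustration: the feature names and the stubbed prediction are hypothetical, and the real handler would load the trained scikit-learn model once at cold start and call `model.predict`):

```python
import json

# Hypothetical feature order expected by the serialized model
FEATURES = ["tenure", "monthly_charges", "total_charges"]

def handler(event, context):
    """Parse the JSON request body and return a churn prediction."""
    body = json.loads(event.get("body") or "{}")
    missing = [f for f in FEATURES if f not in body]
    if missing:
        return {"statusCode": 400,
                "body": json.dumps({"error": f"missing fields: {missing}"})}
    features = [body[f] for f in FEATURES]
    # Stub prediction; the real handler would use model.predict([features])[0]
    prediction = int(sum(features) > 0)
    return {"statusCode": 200,
            "body": json.dumps({"churn": prediction})}
```

A request like the one in api/inference.http would then POST a JSON body with those fields and receive `{"churn": 0}` or `{"churn": 1}` back.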
Getting started with this repo is as easy as cloning it and deploying:
```bash
git clone https://github.com/aletheia/mlops-lambda-classifier.git
cd mlops-lambda-classifier
```
Then choose your preferred deployment method: a bash script or the AWS CDK.
Assuming the AWS CLI is configured with your credentials:
```bash
cd script
./create-function.sh
```
To update the function code after changing the lambda inference code:

```bash
./update-function
```
Here we use the container image support in the AWS CDK, available since v1.76. Assuming the AWS CLI is configured with your credentials, the deployment is pretty straightforward, just like any standard AWS CDK deploy:
```bash
cdk deploy
```