Take a deep learning model and host it on AWS, paying only for the inference time that you actually use. The solution uses API Gateway to handle requests from the internet, which are passed to AWS Lambda; the Lambda function in turn loads its code and the deep learning model from a Docker image hosted on AWS Elastic Container Registry (ECR).
The model weights and the inference code are taken from this repository. I hope you find this repository useful for serverlessly hosting your own models. Also, see the accompanying blog post.
I will assume that you have an AWS account and that you have the permissions to use ECR, AWS Lambda and API Gateway.
First, download the model weights and place them in the project folder. Then build the Docker image:
docker build -t YOUR-IMAGE-NAME .
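The build assumes a Dockerfile based on the AWS Lambda Python base image. A minimal sketch might look like the following; the filenames (`app.py`, `model_weights.pth`, `requirements.txt`) and the handler name are assumptions, not the repository's actual layout:

```dockerfile
# AWS-provided base image for Python Lambda functions
FROM public.ecr.aws/lambda/python:3.9

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the inference code and the model weights into the image
COPY app.py model_weights.pth ./

# Tell the Lambda runtime which handler to invoke (module.function)
CMD ["app.handler"]
```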
Next, create a repository on ECR and note down your AWS region and your ECR prefix (the first part of your repository's URI). Assuming you have the right credentials configured for the AWS CLI, you can run the following command to allow Docker to push to your ECR:
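The repository can also be created from the command line instead of the console; the repository name here is a placeholder and should match your image name:

```
aws ecr create-repository --repository-name YOUR-IMAGE-NAME --region YOUR-AWS-REGION
```

The command's output includes the full `repositoryUri`, whose first part is the ECR prefix used below.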
aws ecr get-login-password --region YOUR-AWS-REGION | docker login --username AWS --password-stdin YOUR-ECR-PREFIX.dkr.ecr.YOUR-AWS-REGION.amazonaws.com
Then it's time to actually push to the ECR.
docker tag YOUR-IMAGE-NAME:latest YOUR-ECR-PREFIX.dkr.ecr.YOUR-AWS-REGION.amazonaws.com/YOUR-IMAGE-NAME:latest

docker push YOUR-ECR-PREFIX.dkr.ecr.YOUR-AWS-REGION.amazonaws.com/YOUR-IMAGE-NAME:latest
If you hit a snag during any of these steps, also refer to the documentation.
Using the Docker container from AWS Lambda is easy: during function creation in the AWS Management Console, simply select Container Image as the source. After creating the function, don't forget to increase the timeout to at least 35 seconds, since the model takes a long time to load during cold starts. During warm starts, latency is on the order of 4-5 seconds for this particular model.
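The inference code baked into the image is expected to expose a Lambda handler compatible with the proxy integration set up below. A minimal sketch, with a placeholder in place of the repository's actual model call:

```python
import base64
import json


def run_inference(image_bytes):
    # Placeholder for the actual model call from the repository's
    # inference code; returns a short textual description of the image.
    return "a short description of the image"


def handler(event, context):
    # With Lambda proxy integration and image/* registered as a binary
    # media type, API Gateway delivers the image body base64-encoded.
    body = event.get("body", "")
    if event.get("isBase64Encoded"):
        image_bytes = base64.b64decode(body)
    else:
        image_bytes = body.encode()

    description = run_inference(image_bytes)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"description": description}),
    }
```

Note that the response follows the proxy-integration contract (`statusCode`, `headers`, `body`); without that shape, API Gateway returns a 502 error.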
In order to host the model as an actual API that is accessible from the internet, you can use API Gateway.
Simply create a method (such as a POST method) and select your Lambda function as the integration endpoint. Also check the box "Use Lambda Proxy integration" in your method's integration request menu. Finally, go to your API's settings, find "Binary Media Types", and add image/*. If you now deploy the API, you should be able to POST an image and receive a short textual description of its content.
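As a quick end-to-end check, the deployed endpoint can be called with a short script. The URL below is a placeholder for your API Gateway invoke URL (stage and resource path included), and `describe_image` is a hypothetical helper name:

```python
import urllib.request

# Placeholder: replace with your API Gateway invoke URL.
API_URL = "https://YOUR-API-ID.execute-api.YOUR-AWS-REGION.amazonaws.com/YOUR-STAGE"


def describe_image(path, url=API_URL):
    """POST raw image bytes and return the model's textual description."""
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        url,
        data=data,
        method="POST",
        # Must match a registered binary media type (image/*)
        headers={"Content-Type": "image/jpeg"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

The equivalent request with curl: `curl -X POST -H "Content-Type: image/jpeg" --data-binary @your-image.jpg YOUR-INVOKE-URL`.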