Take a deep learning model and host it on AWS, paying only for the inference time that you actually use. The solution uses API Gateway to handle requests from the internet, which are passed to AWS Lambda; the Lambda function in turn loads its code and the deep learning model from a Docker image hosted on AWS Elastic Container Registry (ECR).
The model weights and the inference code are taken from this repository. I hope you find this repository useful for serverlessly hosting your own models. Also, see the accompanying blog post.
I will assume that you have an AWS account and that you have the permissions to use ECR, AWS Lambda and API Gateway.
First, download the model weights and place them in the project folder. Then build the Docker image:
docker build -t YOUR-IMAGE-NAME .
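The build assumes a Dockerfile based on the AWS Lambda Python base image. A minimal sketch might look like the following; the filenames (`app.py`, `model_weights.pth`, `requirements.txt`) and the handler name are assumptions, not the repository's actual layout:

```dockerfile
# AWS-provided base image for Python Lambda functions
FROM public.ecr.aws/lambda/python:3.9

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the inference code and the model weights into the image
COPY app.py model_weights.pth ./

# Tell the Lambda runtime which handler to invoke (module.function)
CMD ["app.handler"]
```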
Next, create a repository on ECR and note down your AWS region and your ECR prefix (the first part of your repository's URI). Assuming you have the right credentials configured for the AWS CLI, you can run the following command to allow Docker to push to your ECR:
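The repository can also be created from the command line instead of the console; the repository name here is a placeholder and should match your image name:

```
aws ecr create-repository --repository-name YOUR-IMAGE-NAME --region YOUR-AWS-REGION
```

The command's output includes the full `repositoryUri`, whose first part is the ECR prefix used below.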
aws ecr get-login-password --region YOUR-AWS-REGION | docker login --username AWS --password-stdin YOUR-ECR-PREFIX.dkr.ecr.YOUR-AWS-REGION.amazonaws.com
Then it's time to actually push to the ECR.
docker tag YOUR-IMAGE-NAME:latest YOUR-ECR-PREFIX.dkr.ecr.YOUR-AWS-REGION.amazonaws.com/YOUR-IMAGE-NAME:latest

docker push YOUR-ECR-PREFIX.dkr.ecr.YOUR-AWS-REGION.amazonaws.com/YOUR-IMAGE-NAME:latest
If you hit a snag during any of these steps, also refer to the documentation.
Using the Docker container from AWS Lambda is easy: during function creation in the AWS Management Console, simply select Container Image as the source. After creating the function, don't forget to increase the timeout to at least 35 seconds, since the model takes a long time to load during cold starts. During warm starts, latency is on the order of 4-5 seconds for this particular model.
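The inference code baked into the image is expected to expose a Lambda handler compatible with the proxy integration set up below. A minimal sketch, with a placeholder in place of the repository's actual model call:

```python
import base64
import json


def run_inference(image_bytes):
    # Placeholder for the actual model call from the repository's
    # inference code; returns a short textual description of the image.
    return "a short description of the image"


def handler(event, context):
    # With Lambda proxy integration and image/* registered as a binary
    # media type, API Gateway delivers the image body base64-encoded.
    body = event.get("body", "")
    if event.get("isBase64Encoded"):
        image_bytes = base64.b64decode(body)
    else:
        image_bytes = body.encode()

    description = run_inference(image_bytes)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"description": description}),
    }
```

Note that the response follows the proxy-integration contract (`statusCode`, `headers`, `body`); without that shape, API Gateway returns a 502 error.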
In order to host the model as an actual API that is accessible from the internet, you can use API Gateway.
Simply create a method (such as a POST method) and select your Lambda function as the integration endpoint. Also check the box "Use Lambda Proxy integration" in your method's integration request menu. Finally, go to your API's settings, find "Binary Media Types", and add image/*. If you now deploy the API, you should be able to POST an image and receive a short textual description of its content.
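As a quick end-to-end check, the deployed endpoint can be called with a short script. The URL below is a placeholder for your API Gateway invoke URL (stage and resource path included), and `describe_image` is a hypothetical helper name:

```python
import urllib.request

# Placeholder: replace with your API Gateway invoke URL.
API_URL = "https://YOUR-API-ID.execute-api.YOUR-AWS-REGION.amazonaws.com/YOUR-STAGE"


def describe_image(path, url=API_URL):
    """POST raw image bytes and return the model's textual description."""
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        url,
        data=data,
        method="POST",
        # Must match a registered binary media type (image/*)
        headers={"Content-Type": "image/jpeg"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

The equivalent request with curl: `curl -X POST -H "Content-Type: image/jpeg" --data-binary @your-image.jpg YOUR-INVOKE-URL`.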