# Deploying HuggingFace BERT on AWS Lambda

The main challenge in deploying HuggingFace BERT-based models to AWS Lambda is space. Per the [Lambda documentation](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html) at the time of writing, the following limits apply:

- Maximum deployment package size (including dependencies): 50MB zipped; 250MB unzipped
- `/tmp` directory storage: 512MB
- Maximum RAM allocation: 3,008MB

This is tricky because a typical PyTorch + Transformers installation easily exceeds 250MB (and can in fact exceed 512MB too), and a trained BERT model may itself add another couple of hundred MB of data.

The solution therefore requires us to get a bit creative with our use of storage, and this will come at the cost of latency. BERT-based models are typically resource-intensive anyway, so this example is relevant only to specific use cases and is not normally a preferred deployment pattern.
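
As a rough illustration of the storage trick, the sketch below unpacks a zip of dependencies into a writable directory on first use and then makes it importable. It is a minimal sketch under stated assumptions, not the actual handler code from this example: the helper name `ensure_packages` and any paths passed to it are hypothetical.

```python
import os
import sys
import zipfile


def ensure_packages(zip_path, target_dir):
    """Extract zip_path into target_dir (once) and put target_dir on sys.path.

    Hypothetical helper: intended to be called at the top of a Lambda handler
    module, before importing the heavy libraries stored in the zip.
    """
    if not os.path.isdir(target_dir):
        # Extract to a staging folder first, then rename, so that a partially
        # extracted folder is never mistaken for a complete one.
        staging = target_dir + ".partial"
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(staging)
        os.rename(staging, target_dir)
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
    return target_dir
```

Only the first invocation in a fresh execution environment pays the extraction cost; later invocations in the same environment find the folder already present. That first-call cost is the latency penalty mentioned above.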

## Libraries and configuration

As usual, we'll first load and connect to our SDKs:

In [None]:
# For easier dev of local modules:
%load_ext autoreload
%autoreload 2

# Python Built-Ins:
import json
import os

# External Dependencies:
import boto3
import sagemaker


In [None]:
BUCKET_NAME = "2020-05-gym-bert"
%store BUCKET_NAME

SQUAD_V2 = False  # Whether to use V2 (including unanswerable questions)
%store SQUAD_V2

In [None]:
role = sagemaker.get_execution_role()
botosess = boto3.session.Session()
region = botosess.region_name
s3 = botosess.resource("s3")
bucket = s3.Bucket(BUCKET_NAME)
smclient = botosess.client("sagemaker")

## Scratch - IGNOREME

You don't actually need to download and inspect your model tarballs: they're already in S3 from the SageMaker training job.

In [None]:
!mkdir -p models

In [None]:
bucket.Object(
    "bert-calssification-distributed-2020-05-05-15-58-03-622/output/output.tar.gz"
).download_file(
    "models/bert-cls.tar.gz"
)
bucket.Object(
    "distilbert-calssification-distributed-2020-05-05-16-24-55-728/output/output.tar.gz"
).download_file(
    "models/distilbert-cls.tar.gz"
)

In [None]:
!mkdir -p models/bert-cls
!tar -C models/bert-cls -zxvf models/bert-cls.tar.gz
!mkdir -p models/distilbert-cls
!tar -C models/distilbert-cls -zxvf models/distilbert-cls.tar.gz

## Install AWS SAM (via Brew)

In this example we'll create our Lambda function with an API Gateway deployment, via a CloudFormation template. AWS SAM CLI will simplify defining the API deployment, and allow us to build the Lambda function in a nice, reproducible Docker environment.

This script is designed to be run on a SageMaker notebook instance. If you're on a local machine with SAM already installed, you can skip it.

In [None]:
# TODO: Maybe factor the .sh into the notebook with %%sh when it's stable
# (At the moment it's convenient to call it either via terminal or notebook though)

# Install AWS SAM
!./install-sam.sh

# The script should add SAM to PATH anyway, but this kernel is a parent process so we'll have to replicate.
# PATH entries must be separated by os.pathsep (":" on Linux):
os.environ["PATH"] += os.pathsep + "/home/linuxbrew/.linuxbrew/bin"

In [None]:
# Check SAM's installed and visible:
!sam --version

# FIXME: It isn't! Grr... This works though:
!source ~/.profile && sam --version
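
One possible way to make `sam` visible to the kernel without sourcing `~/.profile` is sketched below. It assumes the Homebrew binaries live in `/home/linuxbrew/.linuxbrew/bin` (the path used above); the helper name `ensure_on_path` is hypothetical. Shell commands launched with `!` should inherit the kernel's environment, but this has not been verified on a notebook instance.

```python
import os
import shutil


def ensure_on_path(directory):
    """Append directory to this process's PATH unless it is already listed."""
    entries = [e for e in os.environ.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        entries.append(directory)
    os.environ["PATH"] = os.pathsep.join(entries)
    return os.environ["PATH"]


# Assumed Homebrew location, taken from the cell above:
ensure_on_path("/home/linuxbrew/.linuxbrew/bin")

# Full path to the sam executable, or None if it still cannot be found:
print(shutil.which("sam"))
```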

## Install function dependencies and create Lambda package

Because we need to optimize the way our dependencies are loaded into the Lambda, the standard SAM build method of specifying libraries in `requirements.txt` won't cut it.

We'll install our requirements in a (conda) virtual environment and copy them into the bundle.

**To add extra dependencies, modify [configure-venv.sh](configure-venv.sh)**

In [None]:
# Optionally run this to clear existing env, since the below script re-uses existing:
!conda env remove -n lambda_bert -y

In [None]:
# Create an empty virtual env, install dependencies, then extract it into lambda/packages
!./configure-venv.sh

## Validate Lambda package

Check that both our unzipped Lambda bundle and the contents we'll extract to `/tmp` are within the size limits:

In [None]:
%%sh
du -sh lambda/build  # Must be under 250MB
rm -rf lambda/packages-tmp-sizecheck
unzip -q -d lambda/packages-tmp-sizecheck lambda/build/packages-tmpdir.zip
du -sh lambda/packages-tmp-sizecheck  # Must be under 512MB
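
The same check can be sketched in Python if a pass/fail result is more convenient than reading `du` output. This is a minimal sketch: the helper names are hypothetical, the limits are the ones quoted at the top of this notebook, and it sums file sizes rather than disk blocks, so the figures may differ slightly from `du`.

```python
import os

# Limits quoted at the top of this notebook. Treating "MB" as 1024 * 1024
# bytes is an assumption.
UNZIPPED_LIMIT_MB = 250
TMP_LIMIT_MB = 512


def dir_size_mb(path):
    """Total size in MB of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total / (1024 * 1024)


def check_within(path, limit_mb):
    """Return (size_mb, ok) where ok is True if path is under limit_mb."""
    size_mb = dir_size_mb(path)
    return size_mb, size_mb < limit_mb
```

For example, `check_within("lambda/build", UNZIPPED_LIMIT_MB)` and `check_within("lambda/packages-tmp-sizecheck", TMP_LIMIT_MB)` would mirror the two `du` calls above.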

## Deploy

Now our raw Lambda source code (from [lambda/src](lambda/src)) and the libraries we need (from conda env `lambda_bert`) have been packaged together (in [lambda/build](lambda/build)).

We're ready to build and deploy our SAM-based serverless application stack:

In [None]:
# Deploy the Lambda + API Gateway:
STAGING_BUCKET = "2020-05-gym-bert-sam-staging"
STACK_NAME = "test"

# FIXME: Figure how to get brew on the kernel's path properly so source ~/.profile isn't needed
!source ~/.profile && ./deploy.sh {STAGING_BUCKET} {STACK_NAME}

## Test

TODO

For now, just send a GET request to `/invoke` on the `APIEndpoint` URL output above by the stack creation, e.g. in your browser.

If everything is "working", the first call will probably time out (presumably while the cold start unpacks the dependencies) and the second call will return a generic howdy response.
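
The manual check could also be scripted along these lines. This is only a sketch: `invoke_with_retry` is a hypothetical helper, the URL must be replaced with your stack's actual `APIEndpoint` output plus `/invoke`, and the retry count and timeout are arbitrary starting values.

```python
import time
import urllib.request


def invoke_with_retry(url, attempts=3, timeout=30, wait=2.0,
                      opener=urllib.request.urlopen):
    """GET url, retrying when the request times out or returns an HTTP error.

    The opener parameter exists only so the function can be exercised
    without a network connection.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            with opener(url, timeout=timeout) as resp:
                return resp.status, resp.read().decode("utf-8")
        except OSError as err:
            # URLError, HTTPError and socket timeouts are all OSError subclasses
            last_error = err
            if attempt < attempts - 1:
                time.sleep(wait)
    raise RuntimeError(
        f"No successful response after {attempts} attempts"
    ) from last_error
```

Calling `invoke_with_retry("<your APIEndpoint>/invoke")` should then return the status code and response body once the function has warmed up.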