# Deploy LLaVA-v1.5-7B model on Amazon SageMaker

***This notebook works best with the `conda_python3` kernel on an `ml.t3.large` notebook instance.***

---

In this notebook we download the [LLaVA-v1.5-7B](https://huggingface.co/anymodality/llava-v1.5-7b) model and deploy it on SageMaker. We use the `huggingface-pytorch-inference` container and deploy the model on an `ml.g5.xlarge` instance.

The downloaded model files are archived into a `model.tar.gz` file that is uploaded to the default SageMaker S3 bucket. The `inference.py` file is overwritten with a [`llava_inference.py`](./llava_inference.py) file that has code to run inference on an image stored in S3.

## Step 1: Setup

Install the required Python packages and import the relevant modules.

In [None]:
!pip install -r requirements.txt

In [None]:
import os
import shutil
import logging
import sagemaker
import globals as g
import requests as req
from typing import Dict
from pathlib import Path
from sagemaker.s3 import S3Uploader
from sagemaker import get_execution_role
from huggingface_hub import snapshot_download
from sagemaker.huggingface.model import HuggingFaceModel

In [None]:
# global constants
!pygmentize globals.py

In [None]:
logging.basicConfig(format='[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)

In [None]:
model_dir: str = g.HF_MODEL_ID.split("/")[-1]
model_tar_gz_path: str = os.path.join(os.path.dirname(os.getcwd()), f"model_{model_dir}.tar.gz")
logger.info(f"HF_MODEL_ID={g.HF_MODEL_ID}, model_dir={model_dir}, model_tar_gz_path={model_tar_gz_path}")
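
The cell above derives two names from the model ID: the local model directory and the path of the archive, which is placed one level above the current working directory. A minimal sketch of that derivation follows. The `HF_MODEL_ID` value is assumed from the model link in the introduction, and `work_dir` is a hypothetical stand-in for `os.getcwd()`.

```python
import os
from typing import Tuple

# Assumed value, taken from the model link in the introduction;
# the notebook itself reads it from globals.py.
HF_MODEL_ID = "anymodality/llava-v1.5-7b"


def derive_paths(hf_model_id: str, work_dir: str) -> Tuple[str, str]:
    """Return (model_dir, model_tar_gz_path) the way the cell above computes them."""
    # The last path component of the model ID becomes the directory name.
    model_dir = hf_model_id.split("/")[-1]
    # The archive lives in the parent of the working directory.
    tar_path = os.path.join(os.path.dirname(work_dir), f"model_{model_dir}.tar.gz")
    return model_dir, tar_path


# work_dir is a hypothetical location used only for illustration.
example_model_dir, example_tar_path = derive_paths(HF_MODEL_ID, "/home/ec2-user/SageMaker/repo")
print(example_model_dir)
print(example_tar_path)
```

Keeping the archive outside the repository directory means it is not picked up by accident when the working directory itself is archived or committed.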

## Step 2: Prepare the `model.tar.gz`

1. Download the model files from HuggingFace.

1. Replace the model's `inference.py` with [`llava_inference.py`](./llava_inference.py).

1. Archive the model directory into a `.tar.gz` file.

Download the model files. **This takes about 5 minutes**.

In [None]:
%%time
model_path: str = os.path.join(os.path.dirname(os.getcwd()), model_dir)
Path(model_path).mkdir(exist_ok=True)
# Download model from Hugging Face into model_dir
snapshot_download(g.HF_MODEL_ID, local_dir=model_path, local_dir_use_symlinks=False)

In [None]:
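# Replace the inference script in the model's code directory with our own.
inf_dest: str = os.path.join(model_path, 'code', 'inference.py')
# Create the code directory first in case the download did not include one.
os.makedirs(os.path.dirname(inf_dest), exist_ok=True)
shutil.copyfile("llava_inference.py", inf_dest)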
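
The Hugging Face inference container looks for `code/inference.py` inside the model archive and calls any handler functions it defines, such as `model_fn` and `predict_fn`. The sketch below only illustrates that shape and is not the content of `llava_inference.py`. The payload keys `image` and `question`, the helper `split_s3_uri`, and the placeholder return values are all assumptions made for illustration.

```python
from typing import Dict, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not a valid S3 object URI: {uri!r}")
    return parsed.netloc, key


def model_fn(model_dir: str) -> Dict:
    # Placeholder: a real script would load the model weights from model_dir.
    return {"model_dir": model_dir}


def predict_fn(data: Dict, model: Dict) -> Dict:
    # "image" and "question" are assumed payload keys, not the notebook's contract.
    bucket, key = split_s3_uri(data["image"])
    # Placeholder: a real script would download the image and run the model here.
    return {"bucket": bucket, "key": key, "question": data.get("question", "")}


handler_result = predict_fn(
    {"image": "s3://example-bucket/images/cat.jpg", "question": "What is shown?"},
    model_fn("/opt/ml/model"),
)
print(handler_result)
```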

Create a .tar.gz file. **This step takes about 10 minutes**.

In [None]:
%%time
# Create SageMaker model.tar.gz artifact
!cd {model_path};tar -cf {model_tar_gz_path} --use-compress-program=pigz *;cd -
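
The shell command above changes into the model directory before archiving so that the model files and the `code` directory sit at the root of the archive rather than under a parent folder. Where `pigz` is not available, a slower pure-Python equivalent could look like the sketch below. It uses the standard `tarfile` module with single-threaded gzip, and the function name and demo files are hypothetical.

```python
import os
import tarfile
import tempfile
from typing import List


def make_model_archive(model_path: str, tar_gz_path: str) -> List[str]:
    """Archive the contents of model_path at the archive root and return the member names."""
    with tarfile.open(tar_gz_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_path)):
            # The shell glob `*` skips hidden entries, so do the same here.
            if name.startswith("."):
                continue
            # arcname drops the parent directory, like `cd model_path; tar ... *`.
            tar.add(os.path.join(model_path, name), arcname=name)
    with tarfile.open(tar_gz_path, "r:gz") as tar:
        return tar.getnames()


# Demo on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    demo_model_path = os.path.join(tmp, "model")
    os.makedirs(os.path.join(demo_model_path, "code"))
    os.makedirs(os.path.join(demo_model_path, ".cache"))
    for rel in ("config.json", os.path.join("code", "inference.py")):
        with open(os.path.join(demo_model_path, rel), "w") as fh:
            fh.write("placeholder")
    archive_names = make_model_archive(demo_model_path, os.path.join(tmp, "model.tar.gz"))

print(archive_names)
```

For a multi-gigabyte model the `pigz` command remains the faster choice because it compresses on several cores.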

Upload the `model.tar.gz` file to S3. **This step takes about 3 minutes**.

In [None]:
%%time
# Upload model.tar.gz to S3; upload() returns the S3 URI of the uploaded file.
s3_uri: str = S3Uploader.upload(local_path=model_tar_gz_path, desired_s3_uri=g.S3_MODEL_URI)
logger.info(f"model uploaded to: {s3_uri}")
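
The upload keeps the local file name and places the archive under the `g.S3_MODEL_URI` prefix, which is why Step 3 rebuilds the full object URI from the prefix and the archive name. A minimal sketch of that composition follows. The bucket and prefix are made-up examples, and `s3_object_uri` is a hypothetical helper, not part of the SageMaker SDK.

```python
import posixpath


def s3_object_uri(prefix_uri: str, local_path: str) -> str:
    """Compose the S3 URI of a file uploaded under prefix_uri with its local file name."""
    filename = posixpath.basename(local_path)
    # Strip any trailing slash so the result never contains a double slash.
    return f"{prefix_uri.rstrip('/')}/{filename}"


# Hypothetical bucket and prefix, for illustration only.
example_uri = s3_object_uri("s3://example-bucket/llava/", "/tmp/model_llava-v1.5-7b.tar.gz")
print(example_uri)
```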

## Step 3: Deploy the model on SageMaker

Here we deploy the model on SageMaker. We use the [HuggingFaceModel](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/sagemaker.huggingface.html) class from the SageMaker SDK. **This step takes about 10 minutes**.

In [None]:
%%time

# set the env vars for the model
config: Dict = dict(HF_TASK=g.HF_TASK)

model_data: str = os.path.join(g.S3_MODEL_URI, f"model_{os.path.basename(g.HF_MODEL_ID)}.tar.gz")
instance_type: str = "ml.g5.xlarge"
instance_count: int = 1
logger.info(f"going to deploy {g.HF_MODEL_ID} model, model_data={model_data}, instance_type={instance_type}, instance_count={instance_count}")

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_data,
    role=get_execution_role(),
    transformers_version=g.TRANSFORMERS_VERSION,
    pytorch_version=g.PYTORCH_VERSION,
    py_version=g.PYTHON_VERSION,
    model_server_workers=1,
    env=config
)

# deploy the endpoint
predictor = huggingface_model.deploy(initial_instance_count=instance_count,
                                     instance_type=instance_type)
logger.info("finished deploying model")

The [HuggingFaceModel](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/sagemaker.huggingface.html) class encapsulates several defaults. Let's examine the attributes of the model object to review the settings it was deployed with.

In [None]:
logger.info(f"model info -> {vars(huggingface_model)}")

Save the name of the deployed endpoint so that the other notebooks can create a [`Predictor`](https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html) and use this model.

In [None]:
_ = Path(g.ENDPOINT_FILENAME).write_text(predictor.endpoint_name)
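
A consuming notebook can read the saved name back and pass it to a `Predictor`. The sketch below shows only the file round trip with the standard library. The file name `endpoint.txt` and the endpoint name are hypothetical stand-ins for `g.ENDPOINT_FILENAME` and the real endpoint.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for g.ENDPOINT_FILENAME.
ENDPOINT_FILENAME = "endpoint.txt"


def load_endpoint_name(path: Path) -> str:
    """Read the endpoint name, tolerating a trailing newline or spaces."""
    return path.read_text().strip()


with tempfile.TemporaryDirectory() as tmp:
    endpoint_file = Path(tmp) / ENDPOINT_FILENAME
    # Simulate what the deployment notebook wrote.
    endpoint_file.write_text("llava-endpoint-example\n")
    endpoint_name = load_endpoint_name(endpoint_file)

print(endpoint_name)
# In a real notebook the name would then be used as, for example:
#   from sagemaker.predictor import Predictor
#   predictor = Predictor(endpoint_name=endpoint_name)
```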