## Create Model and HF Model 

There are several ways to create a Hugging Face LLM model in SageMaker:

- Option 1. SageMaker `Model` + image URI + model data
- Option 2. SageMaker JumpStart
- Option 3. `HuggingFaceModel` (the model can be loaded after the endpoint is created)
- Option 4. The new Hugging Face LLM DLC, retrieved with `get_huggingface_llm_image_uri`

References:

- [SageMaker RAG docs and notebook](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_jumpstart_knn.html)
- [pgvector and SageMaker](https://github.com/aws-samples/aurora-postgresql-pgvector/blob/main/apgpgvector-similiarity-search/genai-pgvector-similarity-search.ipynb)
- [Hugging Face Hub model](https://www.philschmid.de/sagemaker-falcon-180b)
- [Hugging Face LLM Inference Container](https://www.philschmid.de/sagemaker-huggingface-llm)
- [SageMaker Hugging Face Inference Toolkit](https://github.com/aws/sagemaker-huggingface-inference-toolkit)

In [1]:
from sagemaker import Session
from sagemaker.model import Model 
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.huggingface import get_huggingface_llm_image_uri
from sagemaker import image_uris
from sagemaker import model_uris
import json 

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


### Parameters 

In [2]:
# SageMaker model and endpoint names allow only alphanumerics and hyphens (no underscores)
endpoint_name = "embedding-demo"
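
SageMaker rejects model and endpoint names that contain characters other than letters, digits, and hyphens, and the error only shows up when the resource is created. The following is a minimal sketch of a local check, assuming the naming pattern documented for `ModelName` and `EndpointName` (alphanumerics separated by hyphens, at most 63 characters). The helper name `is_valid_sagemaker_name` is hypothetical and not part of the SageMaker SDK.

```python
import re

# Assumed pattern, taken from the SageMaker API reference for ModelName / EndpointName.
_SAGEMAKER_NAME = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")
_MAX_NAME_LENGTH = 63


def is_valid_sagemaker_name(name: str) -> bool:
    """Return True if `name` looks like a valid SageMaker model/endpoint name."""
    return len(name) <= _MAX_NAME_LENGTH and _SAGEMAKER_NAME.fullmatch(name) is not None


print(is_valid_sagemaker_name("embedding-demo"))  # hyphen is allowed
print(is_valid_sagemaker_name("embedding_demo"))  # underscore is not allowed
```

Running the check before `Model(...)` or `deploy(...)` surfaces a naming problem without waiting for an API call to fail.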

In [3]:
session = Session()



In [4]:
model_version = "*"
model_id = "huggingface-text2text-flan-t5-xxl"
instance_type = "ml.g5.12xlarge"

### Option 1. Create Model with Image and Model Data from Jumpstart 

In [5]:
# retrieve the inference image URI for the JumpStart model
image_uri = image_uris.retrieve(
  framework=None,
  region=None,
  image_scope="inference",
  model_id=model_id,
  model_version=model_version,
  instance_type=instance_type
)

In [6]:
print(image_uri)

763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04
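
The image URI encodes the ECR account, region, repository, and a tag that lists the framework versions. As an illustration, the sketch below splits such a URI into its parts so the versions can be read at a glance. The function name `parse_ecr_image_uri` is hypothetical, and the regular expression is an assumption about the usual `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` layout.

```python
import re

# Assumed layout: <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
_ECR_URI = re.compile(
    r"(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com(?:\.cn)?"
    r"/(?P<repository>[^:]+):(?P<tag>.+)"
)


def parse_ecr_image_uri(uri: str) -> dict:
    """Split an ECR image URI into account, region, repository and tag."""
    match = _ECR_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not an ECR image URI: {uri!r}")
    return match.groupdict()


parts = parse_ecr_image_uri(
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04"
)
print(parts["repository"])
print(parts["tag"])
```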


In [9]:
# retrieve the S3 URI of the prepackaged model artifact from SageMaker JumpStart
model_data = model_uris.retrieve(
  region=None,
  model_id=model_id, 
  model_version=model_version,
  model_scope="inference"
)


In [10]:
print(model_data)

s3://jumpstart-cache-prod-us-east-1/huggingface-infer/prepack/v1.1.2/infer-prepack-huggingface-text2text-flan-t5-xxl.tar.gz
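
The model data is an S3 URI pointing at a prepackaged `tar.gz` archive. To inspect or copy the artifact with other tools, the bucket and key are usually needed separately. A small sketch using only the standard library follows; `split_s3_uri` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple:
    """Return (bucket, key) for an s3://bucket/key URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri(
    "s3://jumpstart-cache-prod-us-east-1/huggingface-infer/prepack/v1.1.2/"
    "infer-prepack-huggingface-text2text-flan-t5-xxl.tar.gz"
)
print(bucket)
print(key)
```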


In [13]:
model = Model(
  image_uri=image_uri,
  model_data=model_data,
  role=None,
  name=endpoint_name,
  env={"SAGEMAKER_MODEL_SERVER_WORKERS": "1", "TS_DEFAULT_WORKERS_PER_MODEL": "1"}
)



In [14]:
print(model)

<sagemaker.model.Model object at 0x7fd18864c8b0>


### Option 2. Create Model with Image and Model Data From HF Hub 

In [12]:
hub = {
  'HF_MODEL_ID': 'sentence-transformers/all-MiniLM-L6-v2',
  'HF_TASK': 'feature-extraction'
}

In [17]:
hf_model = HuggingFaceModel(
  role=None,
  # model data can be loaded after the endpoint is created
  # model_data=model_data,
  transformers_version="4.26",
  pytorch_version="1.13",
  py_version="py39",
  env=hub
)



In [15]:
print(hf_model.model_data)

None


In [16]:
print(hf_model.image_uri)

None
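
Both attributes print `None` here because nothing has been resolved or uploaded yet; the model is pulled from the Hub through the `HF_MODEL_ID` and `HF_TASK` environment variables. For context, the sketch below shows what a request to such an endpoint could look like and how the result might be reduced to one vector per sentence. It assumes the `{"inputs": ...}` request shape used by the Hugging Face inference containers and assumes that the `feature-extraction` task returns per-token vectors shaped `[1][num_tokens][dim]` for a single input. The helper names and the `fake_response` values are hypothetical.

```python
import json


def build_payload(texts):
    """Serialize the request body in the assumed {"inputs": ...} shape."""
    return json.dumps({"inputs": texts})


def mean_pool(token_embeddings):
    """Average per-token vectors into one fixed-size sentence vector."""
    if not token_embeddings:
        raise ValueError("no token embeddings to pool")
    dim = len(token_embeddings[0])
    count = len(token_embeddings)
    return [sum(token[i] for token in token_embeddings) / count for i in range(dim)]


# Hypothetical response for one input sentence with two tokens and dim=2.
fake_response = [[[1.0, 2.0], [3.0, 4.0]]]
sentence_vector = mean_pool(fake_response[0])
print(build_payload("hello world"))
print(sentence_vector)  # [2.0, 3.0]
```

This plain mean ignores padding, which is fine for a single input; batched inputs would need the attention mask to exclude padded tokens.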


### Option 3. Retrieve the new Hugging Face LLM DLC

In [5]:
# get the image URI from AWS ECR
# the same Hugging Face LLM DLC serves different models
llm_image_uri = get_huggingface_llm_image_uri(
  session=session,
  version="0.9.3",
  backend="huggingface"
)

In [6]:
print(llm_image_uri)

763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04


In [7]:
hf_model_id = "OpenAssistant/pythia-12b-sft-v8-7k-steps" # model id from huggingface.co/models
use_quantization = False # whether to use quantization or not
instance_type = "ml.g5.12xlarge" # instance type to use for deployment
number_of_gpu = 4 # number of GPUs to use for inference and tensor parallelism
health_check_timeout = 300 # increase the health check timeout to 5 minutes to allow for downloading the model
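
`number_of_gpu` has to fit the chosen instance: `ml.g5.12xlarge` has 4 GPUs, so tensor parallelism across 4 devices works there but would not on a single-GPU size. The following is a rough sketch of a sanity check, assuming a small hand-maintained lookup table. `GPUS_PER_INSTANCE` and `check_gpu_count` are hypothetical names, and the table covers only a few `ml.g5` sizes.

```python
# GPU counts for a few ml.g5 sizes (NVIDIA A10G); extend the table as needed.
GPUS_PER_INSTANCE = {
    "ml.g5.2xlarge": 1,
    "ml.g5.12xlarge": 4,
    "ml.g5.48xlarge": 8,
}


def check_gpu_count(instance_type: str, number_of_gpu: int) -> int:
    """Raise if more GPUs are requested than the instance provides."""
    if instance_type not in GPUS_PER_INSTANCE:
        raise KeyError(f"unknown instance type: {instance_type}")
    available = GPUS_PER_INSTANCE[instance_type]
    if not 1 <= number_of_gpu <= available:
        raise ValueError(
            f"{instance_type} has {available} GPU(s), requested {number_of_gpu}"
        )
    return available


print(check_gpu_count("ml.g5.12xlarge", 4))  # 4
```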

In [8]:
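# the LLM container expects a quantization method name (for example "bitsandbytes"),
# not a boolean, so set HF_MODEL_QUANTIZE only when quantization is enabled
llm_env = {
    'HF_MODEL_ID': hf_model_id,
    'SM_NUM_GPUS': json.dumps(number_of_gpu)
}
if use_quantization:
    llm_env['HF_MODEL_QUANTIZE'] = 'bitsandbytes'

llm_model = HuggingFaceModel(
    role=None,
    image_uri=llm_image_uri,
    env=llm_env
)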



In [9]:
print(llm_model)

<sagemaker.huggingface.model.HuggingFaceModel object at 0x7f8aa2e68a30>


In [11]:
# model data will be loaded after the endpoint is created
print(llm_model.model_data)

None
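
None of the cells above call `deploy`, so `health_check_timeout` is never used. As a sketch of the missing step, the wrapper below passes it to `Model.deploy` through the `container_startup_health_check_timeout` argument, which gives the container time to download the model before the health check fails. `deploy_llm` and the endpoint name `llm-demo` are hypothetical, and `role=None` as used above would likely have to be replaced with an execution role before a real deployment.

```python
def deploy_llm(model, endpoint_name, instance_type, health_check_timeout=300):
    """Deploy `model` to a real-time endpoint and return the predictor.

    `model` is expected to be a sagemaker Model / HuggingFaceModel instance.
    """
    return model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        endpoint_name=endpoint_name,
        # seconds to wait for the container to pass its startup health check
        container_startup_health_check_timeout=health_check_timeout,
    )


# Example call (creates billable resources, so it is left commented out):
# predictor = deploy_llm(llm_model, "llm-demo", instance_type, health_check_timeout)
```

Keeping the call in one function makes it easy to reuse for any of the model objects created above, since they all share the same `deploy` interface.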
