## Example 2 - Using Llama-Index with Amazon SageMaker

![](./images/sage-riding-llama.png)

Before we can use Llama-Index with Amazon SageMaker, we have to deploy two models: one for the embeddings and one for generating the text. In my case, I will be using a model from the HuggingFace Hub for the embeddings ([intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)), and a model from SageMaker JumpStart for the generative LLM (`huggingface-llm-mixtral-8x7b-instruct`, which corresponds to [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)). Consistency between model families is not required.

In [1]:
%pip install boto3 sagemaker -qU

Note: you may need to restart the kernel to use updated packages.


In [None]:
#### SAGEMAKER SETUP ####
import sagemaker
import boto3

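try:
    # Works when running inside SageMaker (Studio, notebook instances)
    role = sagemaker.get_execution_role()
except ValueError:
    # Outside SageMaker, look the role up by name; replace 'MyExecutionRole' with your own role name
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='MyExecutionRole')['Role']['Arn']
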
session = sagemaker.Session()
default_bucket = session.default_bucket()

sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /Users/dggallit/Library/Application Support/sagemaker/config.yaml


Couldn't call 'get_role' to get Role ARN from role name AMZN-MAC-User to get Role path.


In [None]:
#### EMBEDDING MODEL DEPLOYMENT ####
from sagemaker.huggingface import HuggingFaceModel

# Hub Model configuration. <https://huggingface.co/models>
hub = {
  'HF_MODEL_ID':'intfloat/multilingual-e5-large-instruct', # model_id from hf.co/models
  'HF_TASK':'feature-extraction', # NLP task you want to use for predictions
  'SM_NUM_GPUS': '1',
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub, # configuration for loading model from Hub
    role=role, # iam role with permissions to create an Endpoint
    py_version='py310',
    transformers_version="4.37.0", # transformers version used
    pytorch_version="2.1.0", # pytorch version used
)

embedding_predictor = huggingface_model.deploy(
    endpoint_name=sagemaker.utils.name_from_base("intfloat-multilingual-e5-large-instruct"),
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge"
)

--------!

In [None]:
#### GENERATIVE MODEL DEPLOYMENT ####
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-mixtral-8x7b-instruct", role=role)
predictor = model.deploy(endpoint_name=sagemaker.utils.name_from_base("llm-mixtral-8x7b-instruct"))

----------!

In [None]:
# Setting up SageMaker Endpoints as embedding model and LLM
from llama_index.core import Settings
from llama_index.llms.sagemaker_endpoint import SageMakerLLM
from llama_index.embeddings.sagemaker_endpoint import SageMakerEmbedding

# Model that will be used to generate the embeddings
Settings.embed_model = SageMakerEmbedding(endpoint_name=embedding_predictor.endpoint_name, embed_batch_size=2)
# Model that will be used to generate the answer given the most relevant chunks
Settings.llm = SageMakerLLM(endpoint_name=predictor.endpoint_name)
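
The `embed_batch_size=2` argument above limits how many texts are sent to the embedding endpoint in one request. The snippet below is only an illustrative sketch of that grouping, assuming the batch size simply slices the list of texts in order; the `batched` helper and the sample chunk names are hypothetical and are not part of Llama-Index.

```python
from typing import Iterator, List


def batched(texts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts` holding at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


# Hypothetical chunks standing in for the text nodes produced at indexing time
chunks = ["chunk-a", "chunk-b", "chunk-c", "chunk-d", "chunk-e"]
batches = list(batched(chunks, batch_size=2))
print(batches)  # [['chunk-a', 'chunk-b'], ['chunk-c', 'chunk-d'], ['chunk-e']]
```

A small batch keeps each request payload small, at the cost of more round trips to the endpoint.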

In [None]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Step 1 - Load Data
documents = SimpleDirectoryReader("data").load_data()
# Step 2 - Index Data
index = VectorStoreIndex.from_documents(
    documents=documents,
    transformations=[
        # The embedding model accepts at most 512 tokens, so keep chunks below that limit
        SentenceSplitter(chunk_size=400, chunk_overlap=50),
    ],
    show_progress=True,
)
# Step 3 - Query
query_engine = index.as_query_engine()
response = query_engine.query("Chi è Pinocchio?")  # Italian for "Who is Pinocchio?"
print(response)

Secondo le informazioni fornite nel contesto, Pinocchio è un ragazzaccio, un vagabondo e un vero rompicollo, come descritto dalle parole di un suo compagno di scuola. Tuttavia, quando il burattino protagonista cerca di difendere Pinocchio, dicendo che è un gran buon figliuolo, pieno di voglia di studiare, obbediente e affezionato al suo babbo, il suo naso si allunga, rivelando che in realtà sta mentendo. Quindi Pinocchio sembra essere un ragazzo disubbidiente e svogliato, che invece di andare a scuola, va in giro con i compagni a fare lo sbarazzino.
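
To get a feel for what `chunk_size=400` and `chunk_overlap=50` mean in the indexing step above, here is a deliberately simplified sketch of a sliding window over a token list. It is an assumption-laden illustration, not how `SentenceSplitter` is implemented: the real splitter uses a tokenizer and tries to respect sentence boundaries, while the hypothetical `split_with_overlap` helper below cuts at fixed positions.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Return windows of at most `chunk_size` tokens, where each window
    starts with the last `chunk_overlap` tokens of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# 1000 placeholder tokens standing in for a tokenized document
tokens = [f"t{i}" for i in range(1000)]
chunks = split_with_overlap(tokens, chunk_size=400, chunk_overlap=50)
print([len(c) for c in chunks])  # [400, 400, 300]
```

Every chunk stays under the 512-token limit of the embedding model, and the 50 shared tokens give neighbouring chunks some common context.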


# Clean-up

In [None]:
# Delete both endpoints so they stop incurring charges
embedding_predictor.delete_predictor()
predictor.delete_predictor()
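
If the first deletion raises, the second endpoint would be left running. One possible way around that, sketched below, is to attempt every deletion and collect the failures. The `delete_all` helper is hypothetical, and `FakePredictor` is only a stand-in so the sketch runs without AWS; with real endpoints you would pass `[embedding_predictor, predictor]` instead.

```python
class FakePredictor:
    """Stand-in for a SageMaker predictor so the sketch runs without AWS."""

    def __init__(self, endpoint_name, fail=False):
        self.endpoint_name = endpoint_name
        self.fail = fail
        self.deleted = False

    def delete_predictor(self):
        if self.fail:
            raise RuntimeError(f"could not delete {self.endpoint_name}")
        self.deleted = True


def delete_all(predictors):
    """Try to delete every predictor; return {endpoint_name: error} for failures."""
    failed = {}
    for predictor in predictors:
        try:
            predictor.delete_predictor()
        except Exception as exc:  # keep going so later endpoints are still deleted
            failed[predictor.endpoint_name] = str(exc)
    return failed


embedding = FakePredictor("embedding-endpoint", fail=True)
llm = FakePredictor("llm-endpoint")
failed = delete_all([embedding, llm])
print(failed)  # {'embedding-endpoint': 'could not delete embedding-endpoint'}
```

Any endpoint listed in `failed` still exists and should be removed manually.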