**Resource:** https://github.com/aws-samples/Meta-Llama-on-AWS/blob/main/RAG-recipes/llama3-rag-langchain-smjs.ipynb

* Embedding Model Used: *HuggingFace BGE Large EN Embedding model*
* LLM Used: *Meta Llama 3 8B Instruct LLM (deployed through SageMaker JumpStart)*

In [83]:
%%writefile requirements.txt
langchain==0.1.14
pypdf==4.1.0
faiss-cpu==1.8.0
boto3==1.34.58
sqlalchemy==2.0.29

Overwriting requirements.txt


In [84]:
import sqlalchemy
print(sqlalchemy.__version__)

2.0.30


In [85]:
pip install nvidia-ml-py3==7.352.0

Note: you may need to restart the kernel to use updated packages.


In [86]:
pip install sqlparse==0.5.0

Note: you may need to restart the kernel to use updated packages.


In [87]:
pip install scikit-learn==1.3.0

Note: you may need to restart the kernel to use updated packages.


In [88]:
pip install omegaconf==2.2.3

Note: you may need to restart the kernel to use updated packages.


In [89]:
pip install gluonts==0.15.1

Note: you may need to restart the kernel to use updated packages.


In [90]:
pip install langchain==0.1.14

Note: you may need to restart the kernel to use updated packages.


In [91]:
pip install boto3==1.34.58

Note: you may need to restart the kernel to use updated packages.


In [92]:
!pip install -U -r requirements.txt

Collecting sqlalchemy==2.0.29 (from -r requirements.txt (line 5))
  Using cached SQLAlchemy-2.0.29-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.6 kB)
Using cached SQLAlchemy-2.0.29-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Installing collected packages: sqlalchemy
  Attempting uninstall: sqlalchemy
    Found existing installation: SQLAlchemy 2.0.32
    Uninstalling SQLAlchemy-2.0.32:
      Successfully uninstalled SQLAlchemy-2.0.32
Successfully installed sqlalchemy-2.0.29


In [93]:
import langchain
print(langchain.__version__)

0.1.14


In [94]:
try:
    import sagemaker
except ImportError:
    # Install the SageMaker SDK on first use, then import it
    !pip install sagemaker
    import sagemaker

In [95]:
# Import the JumpStartModel class from the SageMaker JumpStart library
from sagemaker.jumpstart.model import JumpStartModel

In [96]:
# Specify the model ID for the Meta Llama 3 8B Instruct LLM
model_id = "meta-textgeneration-llama-3-8b-instruct"
accept_eula = True
model = JumpStartModel(model_id=model_id, model_version="2.7.0", instance_type="ml.g5.2xlarge")

In [15]:
predictor = model.deploy(accept_eula=accept_eula, instance_type="ml.g5.2xlarge")

-----------------!

In [None]:
#predictor_name = predictor.endpoint_name
#print(predictor_name)

In [17]:
# Specify the model ID for the HuggingFace BGE Large EN Embedding model
model_id = "huggingface-sentencesimilarity-bge-large-en-v1-5"
text_embedding_model = JumpStartModel(model_id=model_id,model_version="1.1.1")

In [18]:
embedding_predictor = text_embedding_model.deploy(instance_type="ml.g5.2xlarge")

----------!

In [None]:
#embedding_name = embedding_predictor.endpoint_name
#print(embedding_name)

In [97]:
import json
import sagemaker

from langchain_core.prompts import PromptTemplate
from langchain_community.llms import SagemakerEndpoint
from langchain_community.embeddings import SagemakerEndpointEmbeddings
from langchain_community.llms.sagemaker_endpoint import LLMContentHandler
from langchain_community.embeddings.sagemaker_endpoint import EmbeddingsContentHandler

In [98]:
sess = sagemaker.session.Session()
# Use the public attribute rather than the private `_region_name`
region = sess.boto_region_name

In [99]:
llm_endpoint_name = "meta-textgeneration-llama-3-8b-instruct-2024-10-19-10-49-32-778"
embedding_endpoint_name = "hf-sentencesimilarity-bge-large-en-v1-5-2024-10-19-11-02-46-711"

In [100]:
# Test the LLM endpoint with a simple prompt
import boto3
runtime_client = boto3.client('sagemaker-runtime', region_name=region)

input_prompt = {
    "inputs": "Where is the capital of Australia?"
}

response = runtime_client.invoke_endpoint(
    EndpointName=llm_endpoint_name,
    ContentType='application/json',
    Body=json.dumps(input_prompt)
)

In [23]:
# read and print the output
output = json.loads(response['Body'].read().decode())
print("LLM Response:", output)

LLM Response: {'generated_text': ' The capital of Australia is Canberra. It is located in the Australian Capital Territory (ACT) and is situated about 150 miles (240 kilometers) inland'}


In [101]:
from typing import Dict

class Llama38BContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        # Default generation parameters; values passed via model_kwargs override them
        parameters = {
            # Maximum number of tokens the model may generate
            "max_new_tokens": 1000,
            # Nucleus sampling: lower values are more deterministic, higher values more diverse
            "top_p": 0.9,
            # Sampling temperature: higher values increase randomness
            "temperature": 0.6,
            # Stop generating at the Llama 3 end-of-turn token
            "stop": ["<|eot_id|>"],
            **model_kwargs,
        }
        payload = {"inputs": prompt, "parameters": parameters}
        input_str = json.dumps(payload)
        print(input_str)  # debug: show the request body sent to the endpoint
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        # `output` is the streaming response body, so it must be read before decoding
        response_json = json.loads(output.read().decode("utf-8"))
        print(response_json)  # debug: show the raw endpoint response
        content = response_json["generated_text"].strip()
        return content

In [102]:
# Instantiate the content handler for Llama3-8B
llama_content_handler = Llama38BContentHandler()

# Setup for using the Llama3-8B model with SageMaker Endpoint
llm = SagemakerEndpoint(
     endpoint_name=llm_endpoint_name,
     region_name=region,
     model_kwargs={"max_new_tokens": 1024, "top_p": 0.9, "temperature": 0.7},
     content_handler=llama_content_handler
 )
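
The two transforms a content handler performs can be exercised offline, without calling the endpoint. The following is a minimal sketch, assuming one wants caller-supplied `model_kwargs` to override the default generation parameters. `DEFAULT_PARAMETERS`, `build_payload`, and `parse_output` are hypothetical names, and `io.BytesIO` stands in for the streaming response body.

```python
import io
import json

# Default generation parameters (same values as the handler's defaults)
DEFAULT_PARAMETERS = {
    "max_new_tokens": 1000,
    "top_p": 0.9,
    "temperature": 0.6,
    "stop": ["<|eot_id|>"],
}

def build_payload(prompt: str, model_kwargs: dict) -> bytes:
    """Serialize a request body; caller-supplied kwargs override the defaults."""
    parameters = {**DEFAULT_PARAMETERS, **model_kwargs}
    return json.dumps({"inputs": prompt, "parameters": parameters}).encode("utf-8")

def parse_output(stream) -> str:
    """Read a file-like response body and return the stripped generated text."""
    response_json = json.loads(stream.read().decode("utf-8"))
    return response_json["generated_text"].strip()

# Hypothetical request and a fake response, for illustration only
body = build_payload("Where is the capital of Australia?", {"temperature": 0.7})
fake_response = io.BytesIO(json.dumps({"generated_text": " Canberra. "}).encode("utf-8"))

print(json.loads(body)["parameters"])
print(parse_output(fake_response))
```

Checking the serialized body this way makes it easy to see which generation parameters actually reach the model.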

In [103]:
from typing import List

class BGEContentHandlerV15(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, text_inputs: List[str], model_kwargs: dict) -> bytes:
        """
        Transforms the input into bytes that can be consumed by SageMaker endpoint.
        Args:
            text_inputs (list[str]): A list of input text strings to be processed.
            model_kwargs (Dict): Additional keyword arguments to be passed to the endpoint.
               Possible keys and their descriptions:
               - mode (str): Inference method. Valid modes are 'embedding', 'nn_corpus', and 'nn_train_data'.
               - corpus (str): Corpus for Nearest Neighbor. Required when mode is 'nn_corpus'.
               - top_k (int): Top K for Nearest Neighbor. Required when mode is 'nn_corpus'.
               - queries (list[str]): Queries for Nearest Neighbor. Required when mode is 'nn_corpus' or 'nn_train_data'.
        Returns:
            The transformed bytes input.
        """
        input_str = json.dumps(
            {
                "text_inputs": text_inputs,
                **model_kwargs
            }
        )
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> List[List[float]]:
        """
        Transforms the bytes output from the endpoint into a list of embeddings.
        Args:
            output: The bytes output from SageMaker endpoint.
        Returns:
            The transformed output - list of embeddings
        Note:
            The length of the outer list is the number of input strings.
            The length of the inner lists is the embedding dimension.
        """
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json["embedding"]

In [104]:
bge_content_handler = BGEContentHandlerV15()
sagemaker_embeddings = SagemakerEndpointEmbeddings(
    endpoint_name=embedding_endpoint_name,
    region_name=region,
    model_kwargs={"mode": "embedding"},
    content_handler=bge_content_handler,
)

In [None]:
#import os
#contents = os.listdir()
#pdf_files = [item for item in contents if item.endswith('.pdf')]

#print("Contents of the current directory:")
#for item in pdf_files:
    #print(item)

In [105]:
!pip install openpyxl



In [106]:
import os

# Set the directory path to the correct folder
directory_path = 'user-default-efs'

# List the contents of the directory
contents = os.listdir(directory_path)

# Filter the list to get only Excel files
xlsx_files = [item for item in contents if item.endswith('.xlsx')]

# Print the Excel files
print("Excel files in the specified directory:")
for item in xlsx_files:
    print(item)

Excel files in the specified directory:
recipe_final1_split_1.xlsx


In [108]:
import numpy as np
import pandas as pd
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document  # Import the Document class

# Initialize an empty list to hold the documents
documents = []

# Load and process each Excel file
for file in xlsx_files:
    file_path = os.path.join(directory_path, file)
    
    # Load the Excel file using pandas
    xlsx_data = pd.read_excel(file_path, sheet_name=None)  # Load all sheets
    
    for sheet_name, sheet_data in xlsx_data.items():
        # Convert each sheet's content to a string representation
        document_content = sheet_data.to_string(index=False)  # Convert sheet data to string
        
        # Create a Document object
        document = Document(
            page_content=document_content,
            metadata={"file": file, "sheet": sheet_name}
        )
        
        documents.append(document)

# Set a chunk size for splitting documents
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
)

# Split the documents
docs = text_splitter.split_documents(documents)

# Example: print the chunk at index 50 (the 51st chunk)
print(docs[50])

page_content='6 fresh strawberries, hulled, 5 fresh basil leaves, 2 limes, sliced, 1 1/2 cups good quality bourbon, Ginger ale, 6 fresh strawberries, hulled, 5 fresh basil leaves, 2 limes, sliced, 1 1/2 cups good quality bourbon, Ginger ale' metadata={'file': 'recipe_final1_split_1.xlsx', 'sheet': 'Sheet1'}


In [109]:
def avg_doc_length(documents):
    """Return the mean page_content length in characters (integer division)."""
    return sum(len(doc.page_content) for doc in documents) // len(documents)

print(f'Average length among {len(documents)} documents loaded is {avg_doc_length(documents)} characters.')
print(f'After the split we have {len(docs)} documents as opposed to the original {len(documents)}.')
print(f'Average length among {len(docs)} documents (after split) is {avg_doc_length(docs)} characters.')

Average length among 1 documents loaded is 52370471 characters.
After the split we have 20457 documents as opposed to the original 1.
Average length among 20457 documents (after split) is 346 characters.
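
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a plain sliding window. This is a simplified sketch, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter prefers to break on separators such as newlines and spaces, which is one reason chunks can end up well below `chunk_size`. `chunk_text` and the sample string are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

# Hypothetical 2500-character input
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
# The tail of one chunk is repeated at the head of the next
print(chunks[0][-100:] == chunks[1][:100])
```

The overlap keeps text that straddles a chunk boundary retrievable from either neighbouring chunk.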


In [110]:
sample_embedding = np.array(sagemaker_embeddings.embed_query(docs[0].page_content))
print("Sample embedding of a document chunk: ", sample_embedding)
print("Size of the embedding: ", sample_embedding.shape)

Sample embedding of a document chunk:  [-0.04397764 -0.01705203 -0.02316795 ...  0.02471263  0.00195924
 -0.01479686]
Size of the embedding:  (1024,)


In [111]:
# using FAISS for building a vector store
from langchain_community.vectorstores import FAISS
from langchain.indexes.vectorstore import VectorStoreIndexWrapper

vectorstore_faiss = FAISS.from_documents(
    docs,
    sagemaker_embeddings,
)
wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss)
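
Conceptually, a flat vector index answers a query by comparing the query embedding with every stored embedding and returning the closest ones. A brute-force sketch in plain Python follows, assuming squared Euclidean distance as the metric; the metric and index type FAISS actually uses depend on how the store is configured. `squared_l2`, `top_k`, and the toy vectors are illustrative only.

```python
import heapq

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(query_vec, doc_vecs, k=3):
    """Return (index, distance) pairs for the k vectors closest to query_vec."""
    scored = ((squared_l2(query_vec, v), i) for i, v in enumerate(doc_vecs))
    return [(i, d) for d, i in heapq.nsmallest(k, scored)]

# Toy 2-dimensional "embeddings"; real BGE embeddings have 1024 dimensions
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
nearest = top_k([1.0, 0.0], toy_vectors, k=2)
print(nearest)
```

A dedicated index library does the same job far more efficiently, which matters with tens of thousands of chunks.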

In [114]:
prompt_template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant.
<|eot_id|><|start_header_id|>user<|end_header_id|>
{query}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["query"]
)
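
The same chat layout can also be assembled with a small helper, which makes the role of each special token easier to see. This is a sketch that mirrors the template in this cell; `build_llama3_prompt` is a hypothetical function, not part of LangChain or SageMaker.

```python
def build_llama3_prompt(user_message: str,
                        system_message: str = "You are a helpful assistant.") -> str:
    """Wrap a system and a user message in the Llama 3 instruct chat layout."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
        f"{system_message}\n"
        "<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
        f"{user_message}\n"
        # The prompt ends with an open assistant header so the model writes the reply
        "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
    )

example_prompt = build_llama3_prompt("How to cook a dish with pork tenderloin?")
print(example_prompt)
```

Each turn is closed with `<|eot_id|>`, which is also the stop sequence passed to the endpoint.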

In [115]:
# querying the FAISS vector store for documents relevant to the user's question.
query = "How to cook a dish with pork tenderloin?"

In [116]:
answer = wrapper_store_faiss.query(question=PROMPT.format(query=query), llm=llm)
print(answer)

{"inputs": "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nPreheat oven to 350 degrees F.\\nIn a large bowl mix the vinegar, soy sauce, and sesame oil together and pour over the tenderloin, turning every 10 minutes, for 30 minutes. Pat dry. Season with salt and pepper.\\nHeat 2 tablespoons of olive oil in a medium saute pan over high heat. Sear the tenderloin until evenly colored and transfer to the oven. Cook until the pork reaches the internal temperature of 145 degrees F. Remove from the pan and allow to rest 15 minutes. Thinly slice.\\nWhile tenderloin is in the marinade, in a small pot over high heat add the remaining 2 tablespoons olive oil and saute the onions, fennel, garlic and thyme until just colored. Reduce the heat to low and deglaze with the sake. Cover with a lid and cook for 30 minutes, stirring periodically until the onions are translucent and soft. Re

In [117]:
query_2 = "I have brussel sprouts and carrots, can you give me a healthy dairy-free recipe to make."

In [118]:
answer = wrapper_store_faiss.query(question=PROMPT.format(query=query_2), llm=llm)
print(answer)

{"inputs": "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nWatch how to make this recipe.\\nTo make the Pickled Cauliflower and Carrots:\\nPlace the cauliflower, carrots, onion, and oregano in a shallow heat-proof glass or nonreactive bowl. Bring the sugar, salt, vinegar, and water to a boil in a small pot, stirring until the sugar and salt are dissolved. Pour the hot brine over the vegetables, stir to combine and cover with plastic wrap. Let the vegetables stand, stirring occasionally, 30 minutes. Serve at room temperature or place in the refrigerator and chill for 1 hour or up to 2 days.\\nTo make the Grilled Peppers:\\nPreheat a grill, broiler or oven on high heat. Grill, broil or roast the peppers until the skin is black and blistered. Place in a small bowl and cover with plastic wrap. Let stand about 15 minutes, or until cool enough to handle, but still warm. Rub 

In [119]:
from langchain.chains import RetrievalQA

prompt_template = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

This is a conversation between an AI assistant and a Human.

<|eot_id|><|start_header_id|>user<|end_header_id|>

Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
#### Context ####
{context}
#### End of Context ####

Question: {question}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore_faiss.as_retriever(
        # how many relevant documents should be retrieved
        search_type="similarity", search_kwargs={"k": 3}
    ),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)
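
The "stuff" chain type places all retrieved chunks into a single prompt. A rough sketch of that idea follows, assuming the chunks are joined with a blank line before being substituted into `{context}`. `QA_TEMPLATE` is a shortened stand-in for the template above, and `stuff_prompt` is a hypothetical helper.

```python
# Shortened stand-in for the prompt template (chat tokens omitted for brevity)
QA_TEMPLATE = (
    "Use the following pieces of context to provide a concise answer "
    "to the question at the end.\n"
    "#### Context ####\n{context}\n#### End of Context ####\n\n"
    "Question: {question}\n"
)

def stuff_prompt(template: str, question: str, chunks: list[str],
                 separator: str = "\n\n") -> str:
    """Join all retrieved chunks into one context string and fill the template."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)

# Hypothetical retrieved chunks
retrieved = ["Chunk about pork tenderloin.", "Chunk about roasted carrots."]
filled = stuff_prompt(QA_TEMPLATE, "What can I cook tonight?", retrieved)
print(filled)
```

Because every chunk goes into one prompt, `k` and the chunk size together determine how much of the model's context window the retrieved text consumes.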

In [120]:
query = "I have a dairy allergy. Can you provide a recipe that's dairy-free?"
result = qa.invoke({"query": query})
# Print the full answer; indexing with [0] would print only its first character
print(result['result'])

# Print the source documents
print(f"\n{result['source_documents']}")

{"inputs": "\n<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nThis is a conversation between an AI assistant and a Human.\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nUse the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n#### Context ####\nWatch how to make this recipe.\\nIn a non-reactive bowl, whisk together the eggs and milk. In a separate bowl mix\n\nCheesecake: Lightly butter 4 small microwave-safe mugs. Microwave 8 ounces cream cheese in a large microwave-safe bowl until soft, about 30 seconds. Whisk in 1 large egg, 1/4 cup confectioners' sugar (make sure your brand is gluten-free), 2 tablespoons cornstarch, 1 teaspoon each lemon juice and vanilla, and a pinch of salt. Pour 1/4 cup batter into each mug; stir 1 teaspoon dried blueberries into each. Microwave, one at a time, until puffed, about 2 minutes. (Continue microwaving

In [121]:
query = "I have some blueberries at home. Can you provide me a recipe?"
result = qa.invoke({"query": query})
# Print the full answer; indexing with [0] would print only its first character
print(result['result'])

# Print the source documents
print(result['source_documents'])

{"inputs": "\n<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nThis is a conversation between an AI assistant and a Human.\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nUse the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n#### Context ####\nWatch how to make this recipe.\\nStir the sugar and water in a large saucepan over medium heat until the sugar dissolves, about 5 minutes. Cool completely.\\nPuree the peaches and orange peel in a blender with 1/2 cup of the sugar syrup until smooth. Strain through a fine-meshed strainer and into a bowl. Cover and refrigerate. In a clean blender puree the strawberries with 1/3 cup of the sugar syrup until smooth. Strain through a clean fine-meshed strainer and into another bowl. Discard the seeds. Puree the blueberries in a clean blender with 1/3 cup of the sugar syrup until smooth. Strain\n\nPu