**Resource:** https://github.com/aws-samples/Meta-Llama-on-AWS/blob/main/RAG-recipes/llama3-rag-langchain-smjs.ipynb

* Embedding Model Used: *HuggingFace BGE Large EN Embedding model*
* LLM Used: *HuggingFace Llama 3 8b Instruct LLM model*

In [1]:
%%writefile requirements.txt
langchain==0.1.14
pypdf==4.1.0
faiss-cpu==1.8.0
boto3==1.34.58
sqlalchemy==2.0.29

Overwriting requirements.txt


In [2]:
import sqlalchemy
print(sqlalchemy.__version__)

2.0.29


In [3]:
pip install nvidia-ml-py3==7.352.0

Note: you may need to restart the kernel to use updated packages.


In [4]:
pip install sqlparse==0.5.0

Note: you may need to restart the kernel to use updated packages.


In [5]:
pip install scikit-learn==1.3.0

Note: you may need to restart the kernel to use updated packages.


In [6]:
pip install omegaconf==2.2.3

Note: you may need to restart the kernel to use updated packages.


In [7]:
pip install gluonts==0.15.1

Note: you may need to restart the kernel to use updated packages.


In [8]:
pip install langchain==0.1.14

Note: you may need to restart the kernel to use updated packages.


In [9]:
pip install boto3==1.34.58

Note: you may need to restart the kernel to use updated packages.


In [10]:
!pip install -U -r requirements.txt

Collecting sqlalchemy==2.0.29 (from -r requirements.txt (line 5))
  Using cached SQLAlchemy-2.0.29-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.6 kB)
Using cached SQLAlchemy-2.0.29-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Installing collected packages: sqlalchemy
  Attempting uninstall: sqlalchemy
    Found existing installation: SQLAlchemy 2.0.32
    Uninstalling SQLAlchemy-2.0.32:
      Successfully uninstalled SQLAlchemy-2.0.32
Successfully installed sqlalchemy-2.0.29


In [11]:
import langchain
print(langchain.__version__)

0.1.14


In [12]:
try:
    import sagemaker
except ImportError:
    !pip install sagemaker

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


In [13]:
# Import the JumpStartModel class from the SageMaker JumpStart library
from sagemaker.jumpstart.model import JumpStartModel

In [14]:
# Specify the model ID for the HuggingFace Llama 3 8b Instruct LLM model
model_id = "meta-textgeneration-llama-3-8b-instruct"
accept_eula = True
model = JumpStartModel(model_id=model_id, model_version="2.7.0", instance_type= "ml.g5.2xlarge")

In [15]:
predictor = model.deploy(accept_eula=accept_eula, instance_type="ml.g5.2xlarge")

------------------!

In [16]:
# Specify the model ID for the HuggingFace BGE Large EN Embedding model
model_id = "huggingface-sentencesimilarity-bge-large-en-v1-5"
text_embedding_model = JumpStartModel(model_id=model_id,model_version="1.1.1")

In [17]:
embedding_predictor = text_embedding_model.deploy(instance_type="ml.g5.2xlarge")

---------!

In [18]:
import json
import sagemaker

from langchain_core.prompts import PromptTemplate
from langchain_community.llms import SagemakerEndpoint
from langchain_community.embeddings import SagemakerEndpointEmbeddings
from langchain_community.llms.sagemaker_endpoint import LLMContentHandler
from langchain_community.embeddings.sagemaker_endpoint import EmbeddingsContentHandler

In [19]:
sess = sagemaker.session.Session()
region = sess._region_name

In [20]:
llm_endpoint_name = "meta-textgeneration-llama-3-8b-instruct-2024-10-22-12-08-33-266"
embedding_endpoint_name = "hf-sentencesimilarity-bge-large-en-v1-5-2024-10-22-12-18-46-896"

In [21]:
from typing import Dict

class Llama38BContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        payload = {
            "inputs": prompt,
            # default parameters
            "parameters": {
                # Controls the maximum number of tokens the model can generate
                "max_new_tokens": 700, #started at 1000
                # lower value makes the model more deterministic / higher value allows for more diverse responses
                "top_p": 0.9,
                # randomness
                "temperature": 0.4, #started at 0.6
                "stop": ["<|eot_id|>"],
            },
        }
        input_str = json.dumps(
            payload,
        )
        print(input_str)
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        print(response_json)
        content = response_json["generated_text"].strip()
        return content

In [22]:
# Instantiate the content handler for Llama3-8B
llama_content_handler = Llama38BContentHandler()

# Setup for using the Llama3-8B model with SageMaker Endpoint
llm = SagemakerEndpoint(
     endpoint_name=llm_endpoint_name,
     region_name=region,
     model_kwargs={"max_new_tokens": 700, "top_p": 0.9, "temperature": 0.4},
     content_handler=llama_content_handler
 )

In [23]:
from typing import List

class BGEContentHandlerV15(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, text_inputs: List[str], model_kwargs: dict) -> bytes:
        """
        Transforms the input into bytes that can be consumed by SageMaker endpoint.
        Args:
            text_inputs (list[str]): A list of input text strings to be processed.
            model_kwargs (Dict): Additional keyword arguments to be passed to the endpoint.
               Possible keys and their descriptions:
               - mode (str): Inference method. Valid modes are 'embedding', 'nn_corpus', and 'nn_train_data'.
               - corpus (str): Corpus for Nearest Neighbor. Required when mode is 'nn_corpus'.
               - top_k (int): Top K for Nearest Neighbor. Required when mode is 'nn_corpus'.
               - queries (list[str]): Queries for Nearest Neighbor. Required when mode is 'nn_corpus' or 'nn_train_data'.
        Returns:
            The transformed bytes input.
        """
        input_str = json.dumps(
            {
                "text_inputs": text_inputs,
                **model_kwargs
            }
        )
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> List[List[float]]:
        """
        Transforms the bytes output from the endpoint into a list of embeddings.
        Args:
            output: The bytes output from SageMaker endpoint.
        Returns:
            The transformed output - list of embeddings
        Note:
            The length of the outer list is the number of input strings.
            The length of the inner lists is the embedding dimension.
        """
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json["embedding"]

In [24]:
bge_content_handler = BGEContentHandlerV15()
sagemaker_embeddings = SagemakerEndpointEmbeddings(
    endpoint_name=embedding_endpoint_name,
    region_name=region,
    model_kwargs={"mode": "embedding"},
    content_handler=bge_content_handler,
)

In [45]:
import os
contents = os.listdir()
pdf_files = [item for item in contents if item.endswith('.pdf')]

print("Contents of the current directory:")
for item in pdf_files:
    print(item)

Contents of the current directory:
recipe-data-sagemaker.pdf
recipe-data-sagemaker2.pdf


In [26]:
#!pip install openpyxl

In [27]:
#import os

# Set the directory path to the correct folder
#directory_path = 'user-default-efs'

#contents = os.listdir()
# Filter the list to get only PDF files
#xlsx_files = [item for item in contents if item.endswith('.xlsx')]

# Print the PDF files
#print("Excel files in the specified directory:")
#for item in xlsx_files:
    #print(item)

In [60]:
import json
from typing import List, Dict
from langchain import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.schema import HumanMessage
from langchain.document_loaders import PyPDFLoader
import os
from pathlib import Path

# Define the optimized prompt template with specific formatting for recipes
prompt_template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant specialized in providing recipes. When answering questions about recipes, always structure your response as follows:
1. Recipe Name
2. Ingredients (as a numbered list)
3. Instructions (as a numbered list)
4. Any additional notes or tips
Ensure that ingredients and instructions are always presented as numbered lists for clarity and ease of use.
Use the following context to answer the user's query:
{context}
<|eot_id|><|start_header_id|>user<|end_header_id|>
{query}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Provide a complete recipe following the structure outlined above.
"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "query"]
)

class DocumentProcessor:
    def __init__(self, chunk_size=1000, chunk_overlap=100):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=self.chunk_size,
            chunk_overlap=self.chunk_overlap,
            length_function=len,
            is_separator_regex=False,
        )

    def load_pdf_documents(self, pdf_directory: str = "."):
        """
        Load and process PDF files from a directory
        """
        documents = []
        # Get list of PDF files in the directory
        contents = os.listdir(pdf_directory)
        pdf_files = [item for item in contents if item.endswith('.pdf')]
        
        for pdf_file in pdf_files:
            try:
                file_path = os.path.join(pdf_directory, pdf_file)
                loader = PyPDFLoader(file_path)
                pdf_documents = loader.load()
                
                # Add metadata about source file
                for doc in pdf_documents:
                    doc.metadata['source'] = pdf_file
                
                documents.extend(pdf_documents)
                print(f"Successfully processed: {pdf_file}")
                
            except Exception as e:
                print(f"Error processing {pdf_file}: {str(e)}")
        
        return documents

    def split_documents(self, documents):
        """
        Split documents into chunks
        """
        return self.text_splitter.split_documents(documents)

def process_documents(pdf_directory: str):
    """
    Process documents with specified chunk parameters
    """
    processor = DocumentProcessor(chunk_size=1000, chunk_overlap=100)
    documents = processor.load_pdf_documents(pdf_directory)
    docs = processor.split_documents(documents)
    
    # Create Vector Store
    vectorstore_faiss = FAISS.from_documents(
        docs,
        sagemaker_embeddings,
    )
    return VectorStoreIndexWrapper(vectorstore=vectorstore_faiss)

def load_allergens(allergens_file: str) -> Dict:
    """
    Load allergens from JSON file
    """
    try:
        with open(allergens_file, 'r') as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"Warning: Allergens file {allergens_file} not found. Creating empty allergens dict.")
        return {}

def filter_documents_by_allergen(docs, allergen):
    """
    Filter out documents containing specified allergen
    """
    if allergen:
        return [doc for doc in docs if allergen.lower() not in doc.page_content.lower()]
    return docs

def query_with_allergen_filter(wrapper_store_faiss, llm, query: str, allergen: str = None):
    """
    Query documents with allergen filtering
    """
    try:
        retrieved_docs = wrapper_store_faiss.vectorstore.similarity_search(query)
        filtered_docs = filter_documents_by_allergen(retrieved_docs, allergen)
        context = "\n".join([doc.page_content for doc in filtered_docs])
        formatted_prompt = PROMPT.format(context=context, query=query)
        response = llm.invoke([HumanMessage(content=formatted_prompt)])
        
        if isinstance(response, str):
            return response
        elif hasattr(response, 'content'):
            return response.content
        else:
            return str(response)
    except Exception as e:
        print(f"Error in query processing: {str(e)}")
        return f"Error processing query: {str(e)}"

def process_query(wrapper_store_faiss, llm, user_query: str):
    """
    Process user query and handle allergen extraction
    """
    allergen = None
    if "allergic to" in user_query.lower():
        allergen = user_query.lower().split("allergic to")[-1].split(",")[0].strip()
        query = f"Please provide a recipe that doesn't include {allergen}. {user_query}"
    else:
        query = user_query
    
    return query_with_allergen_filter(wrapper_store_faiss, llm, query, allergen)

def initialize_recipe_system(pdf_directory: str, allergens_file: str):
    """
    Initialize the recipe system with PDF documents and allergen information
    """
    try:
        wrapper_store_faiss = process_documents(pdf_directory)
        allergens = load_allergens(allergens_file)
        return wrapper_store_faiss, allergens
    except Exception as e:
        print(f"Error initializing recipe system: {str(e)}")
        raise

In [61]:
# allergy = Nut & Treenut / cousine = Thai
# Example Usage
query = "I am allergic to nuts and tree nuts. Can you suggest a Thai curry recipe that is completely nut-free?"
result = process_query(wrapper_store_faiss, llm, query)
print(result)

{"inputs": "Human: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant specialized in providing recipes. When answering questions about recipes, always structure your response as follows:\n1. Recipe Name\n2. Ingredients (as a numbered list)\n3. Instructions (as a numbered list)\n4. Any additional notes or tips\nEnsure that ingredients and instructions are always presented as numbered lists for clarity and ease of use.\nUse the following context to answer the user's query:\nturnovers\nbriefly.\nSprinkle\nwith \nconfectioners'\nsugar,\nif\ndesired,\nand \nserve\nimmediately.\nMama's\nCollards\n2\nlarge\nbunches\ncollards,\nstemmed, \nwashed\nvery\nwell\nand\ntorn\ninto\npieces, \n1\nto\n2\ncups\nwater,\n2\nto\n3\nchicken \nthighs,\n1/4\ncup\ncorn\noil,\nSalt\nand \npepper\nto\ntaste\nBoil\nchicken\nabout\n20\nminutes.\nAdd \ncollards\nand\nseasoning.\nLower\nheat, \ncook\nabout\n2\nhours\nor\nuntil\ngreens\nare \ntender.\nAloo\nPie\nwith\nApple-Mango\

In [46]:
# allergy = Gluten (Wheat) / cousine = Italian
# Example Usage
query = "I have celiac disease. Please suggest an Italian polenta-based recipe that's naturally gluten-free."
result = process_query(wrapper_store_faiss, llm, query)
print(result)

{"inputs": "Human: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant specialized in providing recipes. When answering questions about recipes, always structure your response as follows:\n1. Recipe Name\n2. Ingredients (as a numbered list)\n3. Instructions (as a numbered list)\n4. Any additional notes or tips\nEnsure that ingredients and instructions are always presented as numbered lists for clarity and ease of use.\nUse the following context to answer the user's query:\nglass.\nDust\nwith\ncinnamon. \nPhotograph\nby\nLisa\nShin\nHot\nPotato\nBlintzes\n12\nmedium\npotatoes,\npeeled,\ncut\ninto \nlarge\nchunks,\n4\ncarrots,\npeeled,\ncut \ninto\nlarge\nchunks,\n3\ntablespoons\nextra \nvirgin\nolive\noil,\n2\ncups\nyellow\nonion, \nchopped,\n1\ncup\nscallions,\nchopped,\n3 \ncloves\ngarlic,\nminced,\n1\ncan\ngreen \nchile\nstrips,\nchopped,\n1/2\ncup\nbutter, \n1/2\ncup\nsour\ncream,\nSalt\nand\npepper, \nMilk,\nhalf-and-half,\nor\nmocha\nmix,\nas \

In [49]:
# allergy = Milk & Egg / cousine = Dessert
# Example Usage
query = "Looking for an ice cream alternative - I'm allergic to milk and eggs. Can you suggest a frozen dessert recipe?"
result = process_query(wrapper_store_faiss, llm, query)
print(result)

{"inputs": "Human: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant specialized in providing recipes. When answering questions about recipes, always structure your response as follows:\n1. Recipe Name\n2. Ingredients (as a numbered list)\n3. Instructions (as a numbered list)\n4. Any additional notes or tips\nEnsure that ingredients and instructions are always presented as numbered lists for clarity and ease of use.\nUse the following context to answer the user's query:\nflavored\ncarbonated\nbeverage,\nchilled, \n1.75\nquart\nvanilla\nor\nvanilla\nbean\nice \ncream\nscoop,\nspoon\nhalf\nof\nthe\nice\ncream \nover\nfruit\nmixture;\nstir\nuntil\nfrothy. \nTop\nwith\nremaining\nscoops\nof\nice \ncream\nforming\n\"ice\ncream\nsnowballs\" \natop\nthe\nfroth.\nAmorphic\nSugar\nCenterpiece\nwith \nFlowers\n5\ncups\nor\n1\nkilo\nsugar,\n1\n1/4\ncup\nor \n400\ngrams\ncorn\nsyrup,\n1\ncup\nplus\n2 \ntablespoons\nor\n250\ngrams\nwater,\nFood \ncolor\npaste\

In [50]:
# allergy = Soy / cousine = Vegan
# Example Usage
query = "I am allergic to soy and eat vegan. Can you share a recipe for a protein-rich main dish using legumes instead of soy products?"
result = process_query(wrapper_store_faiss, llm, query)
print(result)

{"inputs": "Human: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant specialized in providing recipes. When answering questions about recipes, always structure your response as follows:\n1. Recipe Name\n2. Ingredients (as a numbered list)\n3. Instructions (as a numbered list)\n4. Any additional notes or tips\nEnsure that ingredients and instructions are always presented as numbered lists for clarity and ease of use.\nUse the following context to answer the user's query:\ntablespoon\nsalt,\n1\ntablespoon\nsesame \noil,\n1\npackage\ngyoza\nwrappers,\nBlack \npepper,\n3/4\npound\nflank\nsteak,\n3 \nquarts\nwater,\n1/2\nteaspoon\nsalt,\n1 \nteaspoon\nsoy\nsauce,\n1\nenvelope \nwonton\nsoup\nbase,\n1\nscallion\nsliced \nthinly\non\nthe\ndiagonal,\n1\nbeaten\negg, \ncooked\nin\none\nlayer\nand\njulienned,\n1 \nsheet\nnori\n(seaweed)\ncut\ninto\n2-inch \nlong\njulienne\nwater\nfor\n1\nminute\nuntil\nwilted,\nthen \nshock\nin\nice\nwater.\nRoughly\nchop\n

In [51]:
# allergy = Fish & Crustacean / cousine = Japanese
# Example Usage
query = "I have seafood allergies (fish and shellfish). Can you provide a recipe for vegetable tempura with a dipping sauce that doesn't contain dashi or fish sauce?"
result = process_query(wrapper_store_faiss, llm, query)
print(result)

{"inputs": "Human: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant specialized in providing recipes. When answering questions about recipes, always structure your response as follows:\n1. Recipe Name\n2. Ingredients (as a numbered list)\n3. Instructions (as a numbered list)\n4. Any additional notes or tips\nEnsure that ingredients and instructions are always presented as numbered lists for clarity and ease of use.\nUse the following context to answer the user's query:\ntablespoon\nsalt,\n1\ntablespoon\nsesame \noil,\n1\npackage\ngyoza\nwrappers,\nBlack \npepper,\n3/4\npound\nflank\nsteak,\n3 \nquarts\nwater,\n1/2\nteaspoon\nsalt,\n1 \nteaspoon\nsoy\nsauce,\n1\nenvelope \nwonton\nsoup\nbase,\n1\nscallion\nsliced \nthinly\non\nthe\ndiagonal,\n1\nbeaten\negg, \ncooked\nin\none\nlayer\nand\njulienned,\n1 \nsheet\nnori\n(seaweed)\ncut\ninto\n2-inch \nlong\njulienne\nwater\nfor\n1\nminute\nuntil\nwilted,\nthen \nshock\nin\nice\nwater.\nRoughly\nchop\n

## Metric

* ## Blue

In [56]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /home/sagemaker-
[nltk_data]     user/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [57]:
import random
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
import nltk
from typing import List, Dict

# Download required NLTK resources
def setup_nltk():
    """
    Download required NLTK resources
    """
    try:
        nltk.download('punkt')
        nltk.download('punkt_tab')
        print("Successfully downloaded NLTK resources")
    except Exception as e:
        print(f"Error downloading NLTK resources: {str(e)}")
        raise

def generate_random_recipes(wrapper_store_faiss, llm, num_recipes: int = 10) -> List[Dict]:
    """
    Generate random recipes using different queries with allergy considerations
    """
    # List of common allergies
    allergies = [
        "nuts",
        "gluten",
        "dairy",
        "eggs",
        "soy",
        "fish",
        "shellfish",
        "wheat"
    ]
    
    # List of cuisines and dishes
    cuisines = [
        "Italian",
        "Japanese",
        "Thai",
        "Mexican",
        "Indian",
        "French",
        "Chinese",
        "Greek",
        "Spanish",
        "Vietnamese",
        "Mediterranean",
        "Korean"
    ]
    
    # Dish types
    dish_types = [
        "main course",
        "dessert",
        "appetizer",
        "soup",
        "salad",
        "curry",
        "stir-fry",
        "noodle dish",
        "rice dish"
    ]
    
    # Query templates with allergy considerations
    query_templates = [
        "I am allergic to {}. Can you give me a {} {} recipe that's safe for my allergy?",
        "Looking for a {} {} recipe - I have a {} allergy.",
        "Can you suggest a {} {} dish without any {}?",
        "I need a {} recipe for {} that doesn't contain any {} or its derivatives."
    ]
    
    generated_recipes = []
    print("Generating random recipes with allergy considerations...")
    
    for i in range(num_recipes):
        try:
            # Randomly decide whether to include allergy consideration
            has_allergy = random.choice([True, True, False])  # 2/3 chance of having allergy
            
            if has_allergy:
                # Generate query with allergy
                allergy = random.choice(allergies)
                cuisine = random.choice(cuisines)
                dish_type = random.choice(dish_types)
                template = random.choice(query_templates)
                query = template.format(allergy, cuisine, dish_type)
            else:
                # Generate regular query
                cuisine = random.choice(cuisines)
                dish_type = random.choice(dish_types)
                query = f"Can you give me a {cuisine} {dish_type} recipe?"
            
            # Generate recipe
            result = process_query(wrapper_store_faiss, llm, query)
            generated_recipes.append({
                'query': query,
                'recipe': result,
                'has_allergy': has_allergy
            })
            print(f"Generated recipe {i+1}/10: {'(Allergy-aware) ' if has_allergy else ''}{query}")
            
        except Exception as e:
            print(f"Error generating recipe {i+1}: {str(e)}")
    
    return generated_recipes

def calculate_average_bleu(generated_recipes: List[Dict]) -> Dict:
    """
    Calculate average BLEU scores for generated recipes
    """
    smoothing = SmoothingFunction().method1
    bleu_scores = {
        'bleu-1': [],
        'bleu-2': [],
        'bleu-3': [],
        'bleu-4': [],
        'allergy_aware_bleu': []  # Separate scoring for allergy-aware recipes
    }
    
    # Weight schemes for different BLEU scores
    weights = {
        'bleu-1': (1.0, 0.0, 0.0, 0.0),
        'bleu-2': (0.5, 0.5, 0.0, 0.0),
        'bleu-3': (0.33, 0.33, 0.33, 0.0),
        'bleu-4': (0.25, 0.25, 0.25, 0.25)
    }
    
    print("\nCalculating BLEU scores...")
    for i, recipe_data in enumerate(generated_recipes):
        # Use other recipes as references
        references = [other['recipe'] for j, other in enumerate(generated_recipes) if j != i]
        
        # Tokenize references and candidate
        reference_tokens = [nltk.word_tokenize(ref.lower()) for ref in references]
        candidate_tokens = nltk.word_tokenize(recipe_data['recipe'].lower())
        
        # Calculate BLEU scores with different weights
        for bleu_type, weight in weights.items():
            try:
                score = sentence_bleu(reference_tokens,
                                    candidate_tokens,
                                    weights=weight,
                                    smoothing_function=smoothing)
                bleu_scores[bleu_type].append(score)
                
                # Add to allergy_aware_bleu if recipe has allergy consideration
                if recipe_data['has_allergy'] and bleu_type == 'bleu-4':
                    bleu_scores['allergy_aware_bleu'].append(score)
                    
            except Exception as e:
                print(f"Error calculating {bleu_type} for recipe {i+1}: {str(e)}")
    
    # Calculate averages
    avg_scores = {
        metric: sum(scores) / len(scores) if scores else 0
        for metric, scores in bleu_scores.items()
    }
    
    return avg_scores

def run_random_recipe_evaluation(wrapper_store_faiss, llm):
    """
    Run complete evaluation process
    """
    try:
        # First ensure NLTK resources are downloaded
        setup_nltk()
        
        # Generate 10 random recipes
        recipes = generate_random_recipes(wrapper_store_faiss, llm, num_recipes=10)
        
        # Calculate BLEU scores
        bleu_results = calculate_average_bleu(recipes)
        
        # Print results
        print("\nAverage BLEU Scores for 10 Random Recipes:")
        print("-" * 50)
        for metric, score in bleu_results.items():
            print(f"{metric}: {score:.4f}")
        
        # Print detailed recipe analysis
        print("\nGenerated Recipes Analysis:")
        print("-" * 50)
        allergy_recipes = sum(1 for r in recipes if r['has_allergy'])
        print(f"Total Recipes: 10")
        print(f"Allergy-Aware Recipes: {allergy_recipes}")
        print(f"Regular Recipes: {10 - allergy_recipes}")
        
        return bleu_results, recipes
        
    except Exception as e:
        print(f"Error in evaluation process: {str(e)}")
        return None, None

# Run the evaluation
if __name__ == "__main__":
    try:
        print("Starting random recipe BLEU score evaluation...")
        bleu_results, generated_recipes = run_random_recipe_evaluation(wrapper_store_faiss, llm)
        
    except Exception as e:
        print(f"Error running evaluation: {str(e)}")

Starting random recipe BLEU score evaluation...
Successfully downloaded NLTK resources
Generating random recipes with allergy considerations...


[nltk_data] Downloading package punkt to /home/sagemaker-
[nltk_data]     user/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /home/sagemaker-
[nltk_data]     user/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


{"inputs": "Human: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant specialized in providing recipes. When answering questions about recipes, always structure your response as follows:\n1. Recipe Name\n2. Ingredients (as a numbered list)\n3. Instructions (as a numbered list)\n4. Any additional notes or tips\nEnsure that ingredients and instructions are always presented as numbered lists for clarity and ease of use.\nUse the following context to answer the user's query:\ncups\ncake\nflour\n(not\nself-rising), \nsifted,\n1\ntablespoon\nbaking\npowder, \n1/4\nteaspoon\nsalt,\n1\n1/2\ncups\nsugar, \n1/2\ncup\nmilk,\n1/2\ncup\nwater,\n1 \nteaspoon\npure\nvanilla\nextract,\n1/4 \nteaspoon\npure\nalmond\nextract,\n3\nlarge \negg\nwhites,\nLady\nBaltimore\nFrosting, \nrecipe\nfollows,\n6\ndried\nfigs,\nfinely \nchopped,\n1/2\ncup\nraisins,\nchopped,\n1/2 \ncup\npecans\nor\nwalnuts,\nchopped\n1/2\nteaspoon\nlight\ncorn\nsyrup \n2/3\ncup\nwater \n2\nlarge\

* ## ROUGE

In [63]:
!pip install rouge-score

Collecting rouge-score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... [?25ldone
Building wheels for collected packages: rouge-score
  Building wheel for rouge-score (setup.py) ... [?25ldone
[?25h  Created wheel for rouge-score: filename=rouge_score-0.1.2-py3-none-any.whl size=24935 sha256=480a1aa97c75ac1d5c89beeae0f57e49d1c5a33454f52aed091c2f9710051fc6
  Stored in directory: /home/sagemaker-user/.cache/pip/wheels/1e/19/43/8a442dc83660ca25e163e1bd1f89919284ab0d0c1475475148
Successfully built rouge-score
Installing collected packages: rouge-score
Successfully installed rouge-score-0.1.2


In [64]:
from rouge_score import rouge_scorer
from typing import List, Dict
import numpy as np

In [65]:
def calculate_rouge_scores(generated_recipes: List[Dict]) -> Dict:
    """
    Calculate ROUGE scores for generated recipes
    """
    # Initialize ROUGE scorer with different ROUGE variants
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    
    # Initialize dictionaries to store scores
    rouge_scores = {
        'rouge1': {'precision': [], 'recall': [], 'fmeasure': []},
        'rouge2': {'precision': [], 'recall': [], 'fmeasure': []},
        'rougeL': {'precision': [], 'recall': [], 'fmeasure': []}
    }
    
    print("\nCalculating ROUGE scores...")
    for i, recipe_data in enumerate(generated_recipes):
        # Use other recipes as references
        references = [other['recipe'] for j, other in enumerate(generated_recipes) if j != i]
        
        # Calculate scores against each reference and take the maximum
        for reference in references:
            try:
                scores = scorer.score(reference, recipe_data['recipe'])
                
                # Store scores for each ROUGE variant
                for rouge_type, score in scores.items():
                    rouge_scores[rouge_type]['precision'].append(score.precision)
                    rouge_scores[rouge_type]['recall'].append(score.recall)
                    rouge_scores[rouge_type]['fmeasure'].append(score.fmeasure)
                    
            except Exception as e:
                print(f"Error calculating ROUGE scores for recipe {i+1}: {str(e)}")
    
    # Calculate averages
    avg_scores = {}
    for rouge_type in rouge_scores:
        avg_scores[rouge_type] = {
            'precision': np.mean(rouge_scores[rouge_type]['precision']),
            'recall': np.mean(rouge_scores[rouge_type]['recall']),
            'fmeasure': np.mean(rouge_scores[rouge_type]['fmeasure'])
        }
    
    return avg_scores

def run_rouge_evaluation(wrapper_store_faiss, llm):
    """
    Run complete ROUGE evaluation process
    """
    try:
        # Generate 10 random recipes using the existing function
        recipes = generate_random_recipes(wrapper_store_faiss, llm, num_recipes=10)
        
        # Calculate ROUGE scores
        rouge_results = calculate_rouge_scores(recipes)
        
        # Print results
        print("\nROUGE Scores for 10 Random Recipes:")
        print("-" * 50)
        for rouge_type, metrics in rouge_results.items():
            print(f"\n{rouge_type.upper()}:")
            for metric_name, value in metrics.items():
                print(f"  {metric_name}: {value:.4f}")
        
        # Print recipe analysis
        print("\nGenerated Recipes Analysis:")
        print("-" * 50)
        allergy_recipes = sum(1 for r in recipes if r['has_allergy'])
        print(f"Total Recipes: 10")
        print(f"Allergy-Aware Recipes: {allergy_recipes}")
        print(f"Regular Recipes: {10 - allergy_recipes}")
        
        return rouge_results, recipes
        
    except Exception as e:
        print(f"Error in ROUGE evaluation process: {str(e)}")
        return None, None

# Combined evaluation function for both BLEU and ROUGE
def run_combined_evaluation(wrapper_store_faiss, llm):
    """
    Run both BLEU and ROUGE evaluations
    """
    try:
        print("Starting combined BLEU and ROUGE evaluation...")
        
        # Generate recipes once and use for both evaluations
        recipes = generate_random_recipes(wrapper_store_faiss, llm, num_recipes=10)
        
        # Calculate BLEU scores
        bleu_results = calculate_average_bleu(recipes)
        
        # Calculate ROUGE scores
        rouge_results = calculate_rouge_scores(recipes)
        
        # Print combined results
        print("\nCombined Evaluation Results:")
        print("=" * 50)
        
        print("\nBLEU Scores:")
        print("-" * 30)
        for metric, score in bleu_results.items():
            print(f"{metric}: {score:.4f}")
            
        print("\nROUGE Scores:")
        print("-" * 30)
        for rouge_type, metrics in rouge_results.items():
            print(f"\n{rouge_type.upper()}:")
            for metric_name, value in metrics.items():
                print(f"  {metric_name}: {value:.4f}")
        
        return {
            'bleu': bleu_results,
            'rouge': rouge_results,
            'recipes': recipes
        }
        
    except Exception as e:
        print(f"Error in combined evaluation: {str(e)}")
        return None

# Example usage
if __name__ == "__main__":
    try:
        # Install required package if not already installed
        import subprocess
        import sys
        
        def install_rouge():
            subprocess.check_call([sys.executable, "-m", "pip", "install", "rouge-score"])
            print("Successfully installed rouge-score package")
        
        try:
            import rouge_score
        except ImportError:
            print("Installing rouge-score package...")
            install_rouge()
        
        # Run combined evaluation
        results = run_combined_evaluation(wrapper_store_faiss, llm)
        
    except Exception as e:
        print(f"Error running evaluation: {str(e)}")

Starting combined BLEU and ROUGE evaluation...
Generating random recipes with allergy considerations...
{"inputs": "Human: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant specialized in providing recipes. When answering questions about recipes, always structure your response as follows:\n1. Recipe Name\n2. Ingredients (as a numbered list)\n3. Instructions (as a numbered list)\n4. Any additional notes or tips\nEnsure that ingredients and instructions are always presented as numbered lists for clarity and ease of use.\nUse the following context to answer the user's query:\ncrumbs,\n3\nhandfuls,\n1/3\ncup\ngrated \nParmigiano-Reggiano,\n2\nhandfuls,\n1 \negg,\nbeaten,\n1/4\nteaspoon\nfreshly \ngrated\nnutmeg,\neyeball\nit,\nCoarse\nsalt \nand\nfreshly\nground\npepper,\n2\npackages \nfrozen\nchopped\nspinach,\n10\nounces \neach,\n3\ntablespoons\nextra-virgin\nolive \noil,\n1\nfor\nonion,\n2\ntablespoons\nto\nfry \ncakes,\n1\nsmall\nonion,\nchopped,\

* ## F1

In [67]:
from sklearn.metrics import precision_recall_fscore_support
import numpy as np
from typing import List, Dict
import json

def calculate_f1_scores(wrapper_store_faiss, llm, num_recipes: int = 10):
    """
    Calculate F1 scores for recipe generation using existing allergens file
    """
    try:
        # Load allergens from json file
        with open('allergens.json', 'r') as f:
            allergens = json.load(f)
        
        # Store results
        results = {
            'recipes': [],
            'predictions': [],
            'expected': []
        }
        
        print("Generating recipes and calculating F1 scores...")
        
        # Generate 10 random recipes using existing setup
        for i in range(num_recipes):
            try:
                # Randomly decide whether to include allergy
                has_allergy = random.choice([True, True, False])  # 2/3 chance of having allergy
                
                if has_allergy:
                    # Select random allergen from loaded allergens
                    allergen = random.choice(list(allergens.keys()))
                    query = f"I am allergic to {allergen}. Please provide a safe recipe."
                else:
                    allergen = None
                    query = "Please provide a recipe."
                
                # Generate recipe
                recipe = process_query(wrapper_store_faiss, llm, query)
                
                # Store results
                results['recipes'].append({
                    'query': query,
                    'recipe': recipe,
                    'allergen': allergen
                })
                
                # For F1 calculation
                results['expected'].append(1)  # We expect the recipe to match the query
                results['predictions'].append(1 if allergen is None or allergen.lower() not in recipe.lower() else 0)
                
                print(f"Processed recipe {i+1}/10: {'(Allergy-aware) ' if allergen else ''}{query[:50]}...")
                
            except Exception as e:
                print(f"Error processing recipe {i+1}: {str(e)}")
        
        # Calculate F1 scores
        precision, recall, f1, _ = precision_recall_fscore_support(
            results['expected'],
            results['predictions'],
            average='binary',
            zero_division=0
        )
        
        # Calculate accuracy
        accuracy = sum(1 for e, p in zip(results['expected'], results['predictions']) if e == p) / len(results['expected'])
        
        # Print results
        print("\nF1 Score Evaluation Results:")
        print("=" * 50)
        print(f"Total Recipes Evaluated: {num_recipes}")
        print(f"Allergen-Aware Recipes: {sum(1 for r in results['recipes'] if r['allergen'])}")
        print(f"Regular Recipes: {sum(1 for r in results['recipes'] if not r['allergen'])}")
        print("\nMetrics:")
        print(f"Precision: {precision:.4f}")
        print(f"Recall: {recall:.4f}")
        print(f"F1 Score: {f1:.4f}")
        print(f"Accuracy: {accuracy:.4f}")
        
        # Print detailed recipe analysis
        print("\nDetailed Recipe Analysis:")
        print("-" * 50)
        for i, recipe in enumerate(results['recipes']):
            print(f"\nRecipe {i+1}:")
            print(f"Query: {recipe['query']}")
            print(f"Allergen: {recipe['allergen'] if recipe['allergen'] else 'None'}")
            print(f"Prediction matched expected: {'Yes' if results['predictions'][i] == results['expected'][i] else 'No'}")
        
        return {
            'precision': precision,
            'recall': recall,
            'f1_score': f1,
            'accuracy': accuracy,
            'results': results
        }
        
    except FileNotFoundError:
        print("Error: allergen.json file not found. Please ensure it exists in the current directory.")
        return None
    except Exception as e:
        print(f"Error in F1 calculation: {str(e)}")
        return None

# Run the evaluation
if __name__ == "__main__":
    try:
        print("Starting F1 score evaluation...")
        f1_results = calculate_f1_scores(wrapper_store_faiss, llm)
        
    except Exception as e:
        print(f"Error running evaluation: {str(e)}")

Starting F1 score evaluation...
Generating recipes and calculating F1 scores...
{"inputs": "Human: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant specialized in providing recipes. When answering questions about recipes, always structure your response as follows:\n1. Recipe Name\n2. Ingredients (as a numbered list)\n3. Instructions (as a numbered list)\n4. Any additional notes or tips\nEnsure that ingredients and instructions are always presented as numbered lists for clarity and ease of use.\nUse the following context to answer the user's query:\ntablespoon\nsalt,\n1\ntablespoon\nsesame \noil,\n1\npackage\ngyoza\nwrappers,\nBlack \npepper,\n3/4\npound\nflank\nsteak,\n3 \nquarts\nwater,\n1/2\nteaspoon\nsalt,\n1 \nteaspoon\nsoy\nsauce,\n1\nenvelope \nwonton\nsoup\nbase,\n1\nscallion\nsliced \nthinly\non\nthe\ndiagonal,\n1\nbeaten\negg, \ncooked\nin\none\nlayer\nand\njulienned,\n1 \nsheet\nnori\n(seaweed)\ncut\ninto\n2-inch \nlong\njulienne\nwater

* ## Embedding Quality:

Cosine similarity: Compare the embeddings of similar recipes to ensure they are close in the vector space

In [69]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from typing import List, Dict
import random

def calculate_embeddings(text: str, sagemaker_embeddings) -> np.ndarray:
    """
    Calculate embeddings for a given text using SageMaker embeddings
    """
    try:
        embedding = sagemaker_embeddings.embed_query(text)
        return np.array(embedding).reshape(1, -1)
    except Exception as e:
        print(f"Error calculating embeddings: {str(e)}")
        return None

def calculate_cosine_similarities(wrapper_store_faiss, llm, sagemaker_embeddings, num_recipes: int = 10):
    """
    Calculate cosine similarities for generated recipes
    """
    try:
        # Load allergens
        with open('allergens.json', 'r') as f:
            allergens = json.load(f)
            
        results = {
            'recipes': [],
            'similarities': [],
            'query_recipe_similarities': [],  # Similarity between query and recipe
            'pairwise_similarities': []       # Similarities between different recipes
        }
        
        print("Generating recipes and calculating cosine similarities...")
        
        # Generate recipes and calculate embeddings
        for i in range(num_recipes):
            try:
                # Randomly decide whether to include allergy
                has_allergy = random.choice([True, True, False])
                
                if has_allergy:
                    allergen = random.choice(list(allergens.keys()))
                    query = f"I am allergic to {allergen}. Please provide a safe recipe."
                else:
                    allergen = None
                    query = "Please provide a recipe."
                
                # Generate recipe
                recipe = process_query(wrapper_store_faiss, llm, query)
                
                # Calculate embeddings
                query_embedding = calculate_embeddings(query, sagemaker_embeddings)
                recipe_embedding = calculate_embeddings(recipe, sagemaker_embeddings)
                
                if query_embedding is not None and recipe_embedding is not None:
                    # Calculate cosine similarity between query and recipe
                    query_recipe_similarity = cosine_similarity(query_embedding, recipe_embedding)[0][0]
                    
                    results['recipes'].append({
                        'query': query,
                        'recipe': recipe,
                        'allergen': allergen,
                        'embedding': recipe_embedding
                    })
                    results['query_recipe_similarities'].append(query_recipe_similarity)
                
                print(f"Processed recipe {i+1}/10: {'(Allergy-aware) ' if allergen else ''}{query[:50]}...")
                
            except Exception as e:
                print(f"Error processing recipe {i+1}: {str(e)}")
        
        # Calculate pairwise similarities between recipes
        num_processed = len(results['recipes'])
        pairwise_matrix = np.zeros((num_processed, num_processed))
        
        for i in range(num_processed):
            for j in range(i + 1, num_processed):
                similarity = cosine_similarity(
                    results['recipes'][i]['embedding'],
                    results['recipes'][j]['embedding']
                )[0][0]
                pairwise_matrix[i][j] = similarity
                pairwise_matrix[j][i] = similarity
        
        # Calculate average similarities
        avg_query_recipe_similarity = np.mean(results['query_recipe_similarities'])
        avg_pairwise_similarity = np.mean(pairwise_matrix[np.triu_indices(num_processed, k=1)])
        
        # Print results
        print("\nCosine Similarity Results:")
        print("=" * 50)
        print(f"Total Recipes Evaluated: {num_processed}")
        print(f"Allergen-Aware Recipes: {sum(1 for r in results['recipes'] if r['allergen'])}")
        print(f"Regular Recipes: {sum(1 for r in results['recipes'] if not r['allergen'])}")
        
        print("\nSimilarity Metrics:")
        print(f"Average Query-Recipe Similarity: {avg_query_recipe_similarity:.4f}")
        print(f"Average Pairwise Recipe Similarity: {avg_pairwise_similarity:.4f}")
        
        # Print detailed recipe analysis
        print("\nDetailed Recipe Analysis:")
        print("-" * 50)
        for i, recipe in enumerate(results['recipes']):
            print(f"\nRecipe {i+1}:")
            print(f"Query: {recipe['query']}")
            print(f"Allergen: {recipe['allergen'] if recipe['allergen'] else 'None'}")
            print(f"Query-Recipe Similarity: {results['query_recipe_similarities'][i]:.4f}")
            if i < num_processed - 1:
                next_recipe_similarity = pairwise_matrix[i][i+1]
                print(f"Similarity to Next Recipe: {next_recipe_similarity:.4f}")
        
        # Create similarity heatmap data
        similarity_data = {
            'pairwise_matrix': pairwise_matrix.tolist(),
            'query_recipe_similarities': results['query_recipe_similarities'],
            'avg_query_recipe_similarity': avg_query_recipe_similarity,
            'avg_pairwise_similarity': avg_pairwise_similarity
        }
        
        return {
            'metrics': {
                'avg_query_recipe_similarity': avg_query_recipe_similarity,
                'avg_pairwise_similarity': avg_pairwise_similarity,
            },
            'detailed_results': results,
            'similarity_matrix': similarity_data
        }
        
    except FileNotFoundError:
        print("Error: allergen.json file not found. Please ensure it exists in the current directory.")
        return None
    except Exception as e:
        print(f"Error in cosine similarity calculation: {str(e)}")
        return None

# Run the evaluation
if __name__ == "__main__":
    try:
        print("Starting cosine similarity evaluation...")
        similarity_results = calculate_cosine_similarities(wrapper_store_faiss, llm, sagemaker_embeddings)
        
    except Exception as e:
        print(f"Error running evaluation: {str(e)}")

Starting cosine similarity evaluation...
Generating recipes and calculating cosine similarities...
{"inputs": "Human: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant specialized in providing recipes. When answering questions about recipes, always structure your response as follows:\n1. Recipe Name\n2. Ingredients (as a numbered list)\n3. Instructions (as a numbered list)\n4. Any additional notes or tips\nEnsure that ingredients and instructions are always presented as numbered lists for clarity and ease of use.\nUse the following context to answer the user's query:\ncups\ncake\nflour\n(not\nself-rising), \nsifted,\n1\ntablespoon\nbaking\npowder, \n1/4\nteaspoon\nsalt,\n1\n1/2\ncups\nsugar, \n1/2\ncup\nmilk,\n1/2\ncup\nwater,\n1 \nteaspoon\npure\nvanilla\nextract,\n1/4 \nteaspoon\npure\nalmond\nextract,\n3\nlarge \negg\nwhites,\nLady\nBaltimore\nFrosting, \nrecipe\nfollows,\n6\ndried\nfigs,\nfinely \nchopped,\n1/2\ncup\nraisins,\nchopped,\n1/2 \n

* Allergen Compliance

Allergen avoidance rate: Calculate the percentage of recipes that successfully avoid specified allergens when requested.

* Response Diversity:

Unique recipe rate: Measure the percentage of unique recipes generated across multiple similar queries to ensure variety.




* Retrieval Efficiency:

Retrieval time: Measure the time taken to retrieve relevant documents from the FAISS index.

* End-to-End Performance:

Query-to-recipe time: Measure the total time from receiving a query to generating a complete recipe