# Integrating the trained conversational model with RAG and TTS

- **Authors:** Riyaadh Gani and Damilola Ogunleye
- **Project:** Food Recognition & Recipe LLM  
- **Purpose:** Creating a vector database of recipe data and combining it with RAG for the model

---

## Overview

Data location: https://drive.google.com/drive/folders/1kyBOrcHf6-pKnBNcq70CZ66Ul5qSpX0D?usp=drive_link

This notebook runs inference with our conversational model through our RAG pipeline.

**Output:** A functional recipe-support model based on RecipeNLG data

In [1]:
%pip install pandas numpy faiss-cpu sentence_transformers transformers torch peft==0.11.1 tqdm openai python-dotenv

Collecting faiss-cpu
  Downloading faiss_cpu-1.13.1-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.6 kB)
Collecting peft==0.11.1
  Downloading peft-0.11.1-py3-none-any.whl.metadata (13 kB)
Downloading peft-0.11.1-py3-none-any.whl (251 kB)
Downloading faiss_cpu-1.13.1-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (23.7 MB)
Installing collected packages: faiss-cpu, peft
  Attempting uninstall: peft
    Found existing installation: peft 0.18.0
    Uninstalling peft-0.18.0:
      Successfully uninstalled peft-0.18.0
Successfully installed faiss-cpu-1.13.1 peft-0.11.1


In [2]:
import pandas as pd
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
from dotenv import load_dotenv
from google.colab import drive, userdata
from openai import OpenAI
from IPython.display import Audio, display
import torch
import tqdm
from pathlib import Path
import os
from datetime import datetime

## Load the Model
Memory management is tricky, so the order matters: load the model first, move it to the GPU to free up CPU RAM, and only then load the data and the index.

In [3]:
# Use colab resources if available
usingColab = True
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

if usingColab:
    drive.mount('/content/drive', force_remount=True, timeout_ms=240000)
    print("Google Colab connected to Google Drive")

    # Base project directory in Google Drive
    PROJECT_DIR = Path("/content/drive/MyDrive/LLM_Models/cooking-assistant-project")

    # change working directory
    os.chdir(PROJECT_DIR)

    # Verify structure
    print("\nDirectory structure:")
    for path in [PROJECT_DIR / "rag" / "datasets",
                PROJECT_DIR / "models" / "base",
                PROJECT_DIR / "models" / "gpt2-conversational-v1",
                PROJECT_DIR / "rag" / "vectordb"]:
        print(f"  {'✓' if path.exists() else '✗'} {path}")

Mounted at /content/drive
Google Colab connected to Google Drive

Directory structure:
  ✓ /content/drive/MyDrive/LLM_Models/cooking-assistant-project/rag/datasets
  ✓ /content/drive/MyDrive/LLM_Models/cooking-assistant-project/models/base
  ✓ /content/drive/MyDrive/LLM_Models/cooking-assistant-project/models/gpt2-conversational-v1
  ✓ /content/drive/MyDrive/LLM_Models/cooking-assistant-project/rag/vectordb


In [4]:
# Load base GPT-2 model
model_path = PROJECT_DIR / "models" / "base" / "gpt2-medium"
tokenizer = AutoTokenizer.from_pretrained(model_path)
base_model = AutoModelForCausalLM.from_pretrained(
    model_path,
    dtype=torch.float16,  # Half precision
    low_cpu_mem_usage=True
)
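
As a rough, back-of-the-envelope sketch of why half precision helps with memory here: weight memory is approximately the parameter count times the bytes per parameter. The parameter count below (about 355 million for GPT-2 medium) is an approximation, and the helper name `weight_memory_gb` is hypothetical. Activations, the adapter, and framework overhead are not included.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Assumption: GPT-2 medium has roughly 355 million parameters.
GPT2_MEDIUM_PARAMS = 355_000_000

fp32_gb = weight_memory_gb(GPT2_MEDIUM_PARAMS, 4)  # float32: 4 bytes per parameter
fp16_gb = weight_memory_gb(GPT2_MEDIUM_PARAMS, 2)  # float16: 2 bytes per parameter

print(f"float32 weights: ~{fp32_gb:.2f} GB")
print(f"float16 weights: ~{fp16_gb:.2f} GB")
```

Under these assumptions, half precision halves the weight footprint, which leaves more room for the recipe data and the index.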

The fine-tuned model is stored as a PEFT adapter, so both the base model and the adapter have to be loaded to use it.

In [5]:
adapter_path = PROJECT_DIR / "models" / "gpt2-conversational-v1" / "final"
print(f"Loading adapter from: {adapter_path}")
conversational_model = PeftModel.from_pretrained(base_model, adapter_path)

Loading adapter from: /content/drive/MyDrive/LLM_Models/cooking-assistant-project/models/gpt2-conversational-v1/final


In [6]:
# Set pad token (reuse the EOS token if the tokenizer has no pad token)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # Pad on the left for generation

print("✓ Tokenizer loaded")
print(f"  Vocab size: {len(tokenizer):,}")
print(f"  Special tokens: {tokenizer.special_tokens_map}")

✓ Tokenizer loaded
  Vocab size: 50,257
  Special tokens: {'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}
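
The reason for left padding can be illustrated without a tokenizer: a decoder-only model continues from the last position, so padding has to go before the prompt rather than after it. The sketch below is illustrative only. `pad_batch` and the short token-ID lists are hypothetical, and this is not how the Hugging Face tokenizer is implemented internally.

```python
from typing import List, Tuple

def pad_batch(seqs: List[List[int]], pad_id: int,
              side: str = "left") -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-ID sequences to equal length; return (input_ids, attention_mask)."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    width = max(len(s) for s in seqs)
    input_ids, attention_mask = [], []
    for s in seqs:
        pad = [pad_id] * (width - len(s))
        mask_pad = [0] * len(pad)  # 0 marks padding positions the model should ignore
        if side == "left":
            input_ids.append(pad + s)
            attention_mask.append(mask_pad + [1] * len(s))
        else:
            input_ids.append(s + pad)
            attention_mask.append([1] * len(s) + mask_pad)
    return input_ids, attention_mask

# 50256 is GPT-2's <|endoftext|> ID, reused as the pad token as in the cell above.
ids, mask = pad_batch([[11, 12, 13], [21]], pad_id=50256, side="left")
print(ids)
print(mask)
```

With left padding, every row ends on a real token, so generation continues from the prompt rather than from a run of pad tokens.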


In [7]:
# Move to GPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
conversational_model = conversational_model.to(device)
conversational_model.eval()

print(f"Model loaded on {device}")

Model loaded on cuda


In [8]:
# Load the recipe data
df = pd.read_csv('./rag/datasets/clean_recipes_10000.csv')
print(f"Loaded {len(df)} recipes")

Loaded 10000 recipes


In [9]:
# Load the FAISS index
index = faiss.read_index('./rag/vectordb/recipe_index_10000.faiss')
print(f"Loaded index with {index.ntotal} vectors")

Loaded index with 10000 vectors


In [10]:
# Load embedding model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
print("Loaded embedding model")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loaded embedding model


Define the functions for the RAG implementation.

In [11]:
def retrieve_recipes(query, k=3):
    """Retrieve the top-k most similar recipes for a query."""
    q_emb = embedding_model.encode([query]).astype('float32')
    faiss.normalize_L2(q_emb)  # In-place L2 normalisation of the query embedding
    scores, indices = index.search(q_emb, k)

    results = []
    for idx, score in zip(indices[0], scores[0]):
        if idx < 0:  # FAISS returns -1 when fewer than k results are found
            continue
        results.append({
            'response': df.iloc[idx]['response'],
            'similarity': float(score)
        })
    return results
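
To make the retrieval step concrete: after L2 normalisation, the inner product of two vectors equals their cosine similarity, which is presumably why the query is normalised before `index.search`. This assumes the index is an inner-product index built from normalised embeddings, which this notebook does not show. Below is a dependency-free sketch with made-up 3-dimensional vectors; `normalize` and `top_k` are hypothetical helpers, not FAISS functions.

```python
import math
from typing import List, Tuple

def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        return list(v)
    return [x / norm for x in v]

def top_k(query: List[float], docs: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, cosine similarity) for the k documents most similar to the query."""
    q = normalize(query)
    scored = []
    for i, d in enumerate(docs):
        d_unit = normalize(d)
        # Inner product of unit vectors == cosine similarity
        scored.append((i, sum(a * b for a, b in zip(q, d_unit))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "embeddings", invented for illustration only
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
results = top_k([2.0, 0.1, 0.0], docs, k=2)
for idx, score in results:
    print(idx, round(score, 3))
```

A flat index does the same comparison against every stored vector, just far more efficiently than this loop.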

In [12]:
def rag_answer(query, context="", k=2, max_new_tokens=256):
    """Generate answer using RAG"""

    # Retrieve
    retrieved = retrieve_recipes(query, k=k)

    # Build context
    context += "\n Similar recipes:\n"
    for i, rec in enumerate(retrieved, 1):
        context += f"{i}. {rec['response']}\n"

    # print("Context: ", context)


    # Create prompt
    prompt = f"""The following is a conversation between a user and a helpful cooking assistant. Use the added context to support the user query in a conversational manner.

{context}

User: {query}
Assistant:"""

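    # Tokenize and generate; the prompt is capped so that prompt + max_new_tokens
    # stays within GPT-2's 1024-token context window
    inputs = tokenizer(
        prompt,
        return_tensors='pt',
        max_length=1024 - max_new_tokens,
        truncation=True,
        padding=True
    ).to(device)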

    with torch.no_grad():
        outputs = conversational_model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,      # Sampling temperature: lower values are more deterministic, higher values more varied
            do_sample=True,       # Enables probabilistic sampling
            top_p=0.9,            # Nucleus sampling: draw only from the smallest token set whose cumulative probability reaches 0.9
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, not the prompt
    prompt_len = inputs["input_ids"].shape[1]
    answer = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()

    # Drop any follow-up turn the model generated on the user's behalf
    if "User:" in answer:
        answer = answer.split("User:")[0].strip()

    return answer
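
One detail worth spelling out is the token budget: GPT-2's context window is 1024 tokens, so the prompt and the generated tokens together have to fit inside it. The sketch below shows one possible way to budget for that by dropping the oldest context lines first. It uses whitespace-separated words as a crude stand-in for tokens, and `fit_context` is a hypothetical helper, not part of the notebook.

```python
from typing import List

CONTEXT_WINDOW = 1024  # GPT-2's maximum sequence length

def fit_context(context_lines: List[str], query: str, max_new_tokens: int = 256,
                window: int = CONTEXT_WINDOW) -> List[str]:
    """Keep the most recent context lines that fit next to the query and the reply budget.

    Assumption: one whitespace-separated word counts as one token. A real
    implementation would count with the model's tokenizer instead.
    """
    budget = window - max_new_tokens - len(query.split())
    kept: List[str] = []
    # Walk from newest to oldest so the oldest lines are dropped first.
    for line in reversed(context_lines):
        cost = len(line.split())
        if cost > budget:
            break
        kept.append(line)
        budget -= cost
    return list(reversed(kept))

# Synthetic history: two very long lines followed by a short one
history = ["User: " + "a " * 500, "Assistant: " + "b " * 500, "User: what next?"]
kept = fit_context(history, "Is this healthy?")
print(len(kept), "of", len(history), "lines kept")
```

Dropping from the oldest end keeps the most recent turns and the final `Assistant:` cue intact, which plain right-side truncation would not guarantee.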

# Load Text-To-Speech

In [13]:
# Load the api-key
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

# initialise OpenAI Client
client = OpenAI(api_key=OPENAI_API_KEY)

In [20]:
# Define the speaking style
RAMSAY_STYLE = (
    "Speak in the energetic, sharp, no-nonsense style of a fiery celebrity chef. "
    "Be passionate, intense, and brutally honest, but keep it friendly and fun. "
    "Maintain a playful, over-the-top chef persona."
)

In [16]:
# Use OpenAI to summarise the recipe text into a single keyword
def keyword_summary(text):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Extract a single keyword that best describes the main dish or concept from the following recipe text."},
            {"role": "user", "content": text}
        ],
        max_tokens=5,
        temperature=0
    )
    keyword = response.choices[0].message.content.strip()
    # Clean keyword for filename
    keyword = "".join(c for c in keyword if c.isalnum() or c in ("_", "-"))
    return keyword
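
The filename clean-up can be exercised on its own, without calling the API. The sketch below mirrors the character filter above and adds a fallback for the case where nothing survives the filter. `safe_keyword`, `audio_filename`, and the `"recipe"` default are hypothetical additions, not part of the notebook.

```python
from datetime import datetime

def safe_keyword(raw: str, default: str = "recipe") -> str:
    """Keep only alphanumerics, underscores and hyphens; fall back to `default` if empty."""
    cleaned = "".join(c for c in raw.strip() if c.isalnum() or c in ("_", "-"))
    return cleaned or default

def audio_filename(raw_keyword: str, now: datetime) -> str:
    """Build a timestamped MP3 filename in the keyword_timestamp pattern."""
    return f"{safe_keyword(raw_keyword)}_{now.strftime('%Y%m%d_%H%M%S')}.mp3"

stamp = datetime(2025, 1, 2, 3, 4, 5)  # fixed timestamp so the example is reproducible
print(audio_filename("Pasta sauce!", stamp))
print(audio_filename("???", stamp))
```

Without a fallback, a response made up entirely of punctuation would produce a filename that starts with an underscore.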

In [23]:
def text_to_speech(text):
    """Convert text to speech and return the path of the saved MP3 file."""

    keyword = keyword_summary(text)
    now = datetime.now().strftime("%Y%m%d_%H%M%S")
    Path("./audio").mkdir(exist_ok=True)  # Make sure the output directory exists
    speech_file_path = f"./audio/{keyword}_{now}.mp3"

    with client.audio.speech.with_streaming_response.create(
        model="gpt-4o-mini-tts",
        voice="onyx",
        input=text,
        instructions=RAMSAY_STYLE,
    ) as response:
        response.stream_to_file(speech_file_path)

    return speech_file_path

# Test the pipeline

## Single Turn

In [None]:
# Take user input and save as query
# query = input("How can I help you today?: ")
# print(f"\nQuery: {query}\n")
# answer = rag_answer(query, context="", k=1)
# print(f"\nAnswer: {answer}")

## Multi-Turn Conversation


In [26]:
# Loop until the user quits or for a maximum of 3 turns
MAX_TURNS = 3
counter = 1
convo_history = []

while counter <= MAX_TURNS:
    # Get user input
    prompt_text = "How can I help you today?: " if counter == 1 else "Further questions: "
    query = input(prompt_text)
    if query.strip().lower() == "quit":
        break
    print(f"\nQuery {counter}: {query}\n")

    # Rebuild the context from the full history each turn so earlier turns are not duplicated
    context = ""
    if convo_history:
        context = "\n Previous conversation:\n" + "\n".join(convo_history)

    answer = rag_answer(query, context=context, k=1)

    # Convert to TTS
    audio_file = text_to_speech(answer)
    display(Audio(audio_file))

    # Print string
    print(f"\nAnswer: {answer}")

    convo_history.append(f"User: {query}")
    convo_history.append(f"Assistant: {answer}")

    counter += 1


How can I help you today?: I have minced meat, onions and tomatoes. What can i make for dinner?

Query 1: I have minced meat, onions and tomatoes. What can i make for dinner?




Answer: Yes, there are several ways to make it even more creamy and delicious. Here are some suggestions:


1. Creamy pasta sauce:
1. Place a heavy bottomed pot on a stove top. Add 1 cup of cream cheese and stir until smooth. Add remaining cream cheese and stir until smooth
2. Creamy sauce:
1. Add 1 cup of heavy cream and stir until smooth. Add remaining cream cheese and stir until smooth
3. Stir in remaining garlic and stir until combined
4. Add remaining pasta and stir until coated
5.
Further questions: Is this a healthy meal?

Query 2: Is this a healthy meal?




Answer: It depends on your dietary needs. I recommend cooking with a low carb diet to get the most nutrients. Here are some tips to help you make it more healthy:

- Eat more fruit: Some fruits and vegetables are high in fiber, vitamins, and minerals.
- Avoid processed foods: Some processed foods can cause digestive problems and may contribute to weight gain.

- Limit alcohol consumption: Alcohol is known to increase blood pressure and heart disease risk.

- Limit red meat: Red meat can cause inflammation and insulin resistance, which can lead to diabetes.

- Limit processed foods: Many processed foods contain added sugars, fat, and sodium. These can contribute to obesity, heart disease, and cancer.

- Limit red wine: Many red wine producers are using preservatives and other additives to add color, flavor, and texture. These additives can contribute to weight gain, and can also contribute to cancer and obesity.

- Limit processed eggs: Eggs can contain protein and calcium, which can c


Answer: I hope this helped. I'm glad you found this helpful and can help you in any way. I will be more than happy to answer any questions you may have.

I'm sorry I can't answer all of your questions, but I'm here to help! I hope you enjoyed the recipe!

Please feel free to reach out to me if you have any questions or suggestions. I'm here to help, and I can help you with any issues you may have.

Thanks again, and please don't hesitate to reach out if you have any questions or suggestions.

I hope you enjoyed the recipe!

Thank you for your time.

I look forward to helping you in any way I can!

Love,

Nancy

[1] I hope this helped. I'm glad you found this helpful and can help you in any way. I will be more than happy to answer any questions you may have.

[2] I'm sorry I can't answer all of your questions, but I'm here to help! I hope you enjoyed the recipe!

[3] Juice from a fresh lime, orange, or other citrus fruit.

[4] You can use any fruit
Further questions: quit
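
One option for keeping the prompt short in longer conversations, sketched here as an assumption rather than something the notebook implements, is to pass only the last few exchanges as context. `ConversationHistory` and `max_exchanges` are hypothetical names, and the exchanges below are made up for illustration.

```python
from collections import deque

class ConversationHistory:
    """Keep the most recent exchanges and render them in the notebook's context format."""

    def __init__(self, max_exchanges: int = 2):
        # Each exchange is one (user, assistant) pair; older pairs are discarded automatically.
        self._exchanges = deque(maxlen=max_exchanges)

    def add(self, user: str, assistant: str) -> None:
        """Record one completed exchange."""
        self._exchanges.append((user, assistant))

    def as_context(self) -> str:
        """Return an empty string on the first turn, else the formatted history."""
        if not self._exchanges:
            return ""
        lines = []
        for user, assistant in self._exchanges:
            lines.append(f"User: {user}")
            lines.append(f"Assistant: {assistant}")
        return "\n Previous conversation:\n" + "\n".join(lines)

# Made-up exchanges for illustration
history = ConversationHistory(max_exchanges=2)
history.add("What can I make with mince?", "Try a bolognese.")
history.add("Is it healthy?", "It depends on portion size.")
history.add("Any sides?", "A green salad works well.")
print(history.as_context())
```

The string returned by `as_context()` could be passed as the `context` argument of `rag_answer`. The trade-off is that the assistant loses anything said before the retained window.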
