## **LangChain RAG/Multilingual**

In this notebook, we're building a Retrieval-Augmented Generation (RAG) system using LangChain to power ALMA — an AI assistant specialized in health and wellness in **Spanish and English**. ALMA combines a GPT-4 language model with a Pinecone vector store to retrieve relevant knowledge and respond with context-aware, emotionally intelligent answers. The system also includes tools like a health calculator and a second vector index for matching helpful YouTube video clips. This setup creates a dynamic, human-like assistant that can both inform and guide users through natural conversation.



##### Imports

In [2]:
# Standard library
import os, time, tempfile, asyncio, re, json
from datetime import datetime

# Third-party
import edge_tts
import pygame, speech_recognition as sr
from dotenv import load_dotenv, find_dotenv
from pydub import AudioSegment
from pinecone import Pinecone

# LangChain
from langsmith import Client
from langchain.agents import Tool, initialize_agent, AgentType
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.exceptions import OutputParserException
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_core.tracers.context import tracing_v2_enabled

pygame 2.6.1 (SDL 2.28.4, Python 3.13.1)
Hello from the pygame community. https://www.pygame.org/contribute.html


#### Load API keys
We load the environment variables that hold the API keys and add ffmpeg to the PATH.

In [3]:
# Load environment 
load_dotenv(find_dotenv())
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Configure ffmpeg 
FFMPEG_PATH = os.getenv("FFMPEG_PATH", r"C:\ffmpeg\...\bin")
os.environ["PATH"] += os.pathsep + FFMPEG_PATH

# LangSmith config
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "ALMA-Assistant"
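
Before connecting to any service, it can help to confirm that the keys were actually found. The following is a minimal sketch, not part of the original notebook: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, and the list of required variables is assumed from the two keys read in the cell above.

```python
import os

# Assumed from the cell above: the two keys ALMA reads from the environment
REQUIRED_VARS = ("PINECONE_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Report missing keys up front instead of failing later inside a client call
missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing a plain dict as `env` makes the check easy to try without touching the real environment.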

### LangChain + Pinecone brain setup for ALMA
We connect to Pinecone and select "alma-index", which holds the general health knowledge extracted from the transcripts. Then we initialize OpenAI's embedding model, which turns text into vectors for similarity search. The vectorstore wraps that index, and the retriever lets ALMA fetch the 4 most relevant context chunks for each question. A second store, video_vectorstore, connects to "alma-video-index", which contains timestamped YouTube clips, so ALMA can suggest helpful videos. Finally, llm is ALMA's core intelligence, using GPT-4 to generate warm, personalized answers.

In [4]:
# LangChain Setup 
pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index("alma-index")
embeddings = OpenAIEmbeddings(api_key=OPENAI_API_KEY)

vectorstore = PineconeVectorStore(index_name="alma-index", embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

video_vectorstore = PineconeVectorStore(index_name="alma-video-index", embedding=embeddings)
video_retriever = video_vectorstore.as_retriever(search_kwargs={"k": 2})

llm = ChatOpenAI(model="gpt-4", api_key=OPENAI_API_KEY)
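
To make "top 4 most relevant chunks" concrete, here is a toy sketch of what a top-k similarity search does conceptually. It is an illustration only: the three-dimensional vectors and chunk texts are made up, `cosine_similarity` and `top_k` are hypothetical helpers, and cosine similarity is an assumption, since the metric configured on the Pinecone indexes is not shown in this notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed_chunks, k=4):
    """Score every (text, vector) pair against the query and keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up embeddings; real ones come from the embedding model and are much longer
chunks = [
    ("sleep hygiene tips", [0.9, 0.1, 0.0]),
    ("protein intake basics", [0.1, 0.9, 0.0]),
    ("managing stress", [0.0, 0.2, 0.9]),
]
query = [1.0, 0.0, 0.0]  # pretend this is the embedded user question

for score, text in top_k(query, chunks, k=2):
    print(f"{score:.2f}  {text}")
```

The retriever in the cell above does the equivalent work server-side, so only the best-matching chunks travel back to the notebook.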


#### Set up the tool
We give ALMA a calculator brain for handling quick health computations like BMI, TDEE, and macros. Inputs are key=value pairs, with weight in kilograms and height in metres.

In [5]:
def alma_calculator(input_text: str, language: str = "en") -> str:
    def parse_kv(pairs):
        # Extract key=value pairs, e.g. "weight=70 height=1.75" (re is imported at the top)
        return dict(re.findall(r'(\w+)=([\w.]+)', pairs))

    input_text = input_text.lower().strip()

    try:
        if input_text.startswith("bmi"):
            args = parse_kv(input_text)
            weight = float(args["weight"])
            height = float(args["height"])
            bmi = weight / (height ** 2)
            return (
                f"Your BMI is approximately {bmi:.2f}." if language == "en"
                else f"Tu IMC es aproximadamente {bmi:.2f}."
            )

        elif input_text.startswith("tdee"):
            args = parse_kv(input_text)
            weight = float(args["weight"])          # kilograms
            height = float(args["height"]) * 100    # metres -> centimetres
            age = int(args["age"])
            gender = args["gender"]
            activity = args["activity"]

            # Mifflin-St Jeor equation for basal metabolic rate (BMR)
            if gender == "male":
                bmr = 10 * weight + 6.25 * height - 5 * age + 5
            else:
                bmr = 10 * weight + 6.25 * height - 5 * age - 161

            activity_levels = {
                "sedentary": 1.2,
                "light": 1.375,
                "moderate": 1.55,
                "active": 1.725,
                "very active": 1.9,
            }

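            # parse_kv cannot capture spaces, so accept "very_active" for "very active";
            # unknown levels fall back to the moderate multiplier
            multiplier = activity_levels.get(activity.replace("_", " "), 1.55)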
            tdee = bmr * multiplier
            return (
                f"Your estimated TDEE is about {tdee:.0f} calories per day based on a {activity} lifestyle."
                if language == "en"
                else f"Tu TDEE estimado es de aproximadamente {tdee:.0f} calorías por día con un estilo de vida {activity}."
            )

        elif input_text.startswith("macros"):
            args = parse_kv(input_text)
            calories = float(args["calories"])
            carbs_pct = float(args["carbs"])
            protein_pct = float(args["protein"])
            fat_pct = float(args["fat"])

            carbs_g = (calories * carbs_pct / 100) / 4
            protein_g = (calories * protein_pct / 100) / 4
            fat_g = (calories * fat_pct / 100) / 9

            if language == "en":
                return (f"Macro breakdown:\n"
                        f"→ Carbs: {carbs_g:.1f}g\n"
                        f"→ Protein: {protein_g:.1f}g\n"
                        f"→ Fat: {fat_g:.1f}g")
            else:
                return (f"Distribución de macronutrientes:\n"
                        f"→ Carbohidratos: {carbs_g:.1f}g\n"
                        f"→ Proteínas: {protein_g:.1f}g\n"
                        f"→ Grasas: {fat_g:.1f}g")

        else:
            return (
                "Sorry, I didn’t recognize that calculation type. Try starting with 'bmi', 'tdee', or 'macros'."
                if language == "en"
                else "Lo siento, no reconocí ese tipo de cálculo. Intenta empezar con 'bmi', 'tdee' o 'macros'."
            )

    except Exception as e:
        return (
            f"⚠️ There was an error processing your request: {e}"
            if language == "en"
            else f"⚠️ Hubo un error al procesar tu solicitud: {e}"
        )
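
As a sanity check on the arithmetic, the three formulas can be worked through by hand for the example inputs listed in the tool description (70 kg, 1.75 m, 30 years, female, moderate activity, 2000 kcal split 50/25/25). This is a standalone sketch: the helper names are hypothetical and simply restate the formulas used in the cell above.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared."""
    return weight_kg / (height_m ** 2)


def bmr_mifflin_st_jeor(weight_kg, height_m, age, gender):
    """Basal metabolic rate; height is converted to centimetres as in the tool."""
    base = 10 * weight_kg + 6.25 * (height_m * 100) - 5 * age
    return base + 5 if gender == "male" else base - 161


def macros_in_grams(calories, carbs_pct, protein_pct, fat_pct):
    """Carbs and protein count 4 kcal per gram, fat counts 9 kcal per gram."""
    carbs = calories * carbs_pct / 100 / 4
    protein = calories * protein_pct / 100 / 4
    fat = calories * fat_pct / 100 / 9
    return carbs, protein, fat


print(f"BMI:  {bmi(70, 1.75):.2f}")                       # 70 / 3.0625

bmr = bmr_mifflin_st_jeor(70, 1.75, 30, "female")         # 700 + 1093.75 - 150 - 161
print(f"BMR:  {bmr:.2f}")
print(f"TDEE: {bmr * 1.55:.0f}")                          # moderate activity multiplier

carbs, protein, fat = macros_in_grams(2000, 50, 25, 25)
print(f"Macros: {carbs:.1f}g carbs, {protein:.1f}g protein, {fat:.1f}g fat")
```

Comparing these numbers with what `alma_calculator` returns for the same inputs is a quick way to catch unit mistakes, such as passing height in centimetres.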


#### Let's bring ALMA to life
In this part we create ALMA's personality using GPT-4 and the right prompt. We set up the LangChain tool (the calculator) and a LangChain agent that decides whether to use that calculator or respond on its own. Then we give ALMA a voice, so that after a welcome message the user can choose whether to talk or type. ALMA can also suggest a video based on the question, but only when the video is relevant, so its similarity score must clear a threshold.
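
The video gate can be sketched in isolation. The snippet below is an illustration under assumptions, not the notebook's function: `pick_video` is a hypothetical helper, the results are assumed to arrive best-first as (metadata, score) pairs, and a higher score is assumed to mean a closer match. The 0.85 default mirrors the threshold used in the cell below.

```python
def pick_video(results, shown_ids, min_similarity=0.85):
    """Return the top video's metadata, or None if it should not be suggested.

    results: list of (metadata dict, similarity score), best match first.
    shown_ids: set of video ids already offered in this session.
    """
    if not results:
        return None
    metadata, score = results[0]
    if score < min_similarity:
        return None  # not similar enough to be worth suggesting
    if metadata.get("video_id") in shown_ids:
        return None  # avoid repeating a suggestion
    return metadata


already_shown = set()
candidate = ({"video_id": "abc", "video_title": "Sleep basics"}, 0.75)
print(pick_video([candidate], already_shown))  # the score is under the threshold
```

Keeping the threshold and the "already shown" check in one place makes it easier to tune how often ALMA offers videos.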

After this comes the main loop, the engine of ALMA's interaction. Every time you ask a question:
- ALMA listens (or waits for typed input)
- If it's a calculator-type question → uses the Agent (with alma_calculator)
- Otherwise → retrieves context documents, builds a personalized prompt, and sends it to GPT-4
- If a highly relevant video matches the query, GPT-4 decides whether to suggest it (based on its content and tags)
- ALMA prints the answer (and speaks it, if in voice mode)
- Everything is logged to chat_history
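
Three small pieces of that loop can be tried on their own: routing a question to the calculator, folding recent turns into the retrieval query, and trimming retrieved chunks to a character budget. The sketch below uses hypothetical helper names and is a simplified stand-in for the logic in the cell that follows. The keyword list, the 3-turn window, and the 3000-character budget are taken from that cell; the digit-operator-digit pattern is an assumption of this sketch.

```python
import re

CALC_KEYWORDS = ("bmi", "calculate", "how many", "how much", "percent")


def is_calculator_question(text):
    """True for calculator keywords or arithmetic such as '70 / 1.75'."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in CALC_KEYWORDS):
        return True
    return re.search(r"\d\s*[-+*/]\s*\d", lowered) is not None


def build_retrieval_query(chat_history, user_input, window=3):
    """Prefix the new input with the last few (question, answer) turns."""
    if not chat_history:
        return user_input
    turns = "\n".join(f"User: {q}\nALMA: {a}" for q, a in chat_history[-window:])
    return f"Conversation so far:\n{turns}\nUser: {user_input}"


def build_context(chunks, max_chars=3000):
    """Join chunks in order, stopping before the character budget is exceeded."""
    parts, total = [], 0
    for chunk in chunks:
        text = chunk.strip()
        if total + len(text) > max_chars:
            break
        parts.append(text)
        total += len(text)
    return "\n\n".join(parts)


print(is_calculator_question("What is my BMI?"))
print(is_calculator_question("How can I improve my well-being?"))
print(build_retrieval_query([("I sleep badly", "Let's look at your routine.")], "Why?"))
```

Matching on digits around an operator, rather than on the operator alone, keeps hyphenated words from being mistaken for arithmetic.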

In [7]:
# Prompt Template 
prompt_en = ChatPromptTemplate.from_template("""
You are ALMA — a warm, intelligent, and caring AI assistant specialized in health, nutrition, sleep, mental wellness, and healthy aging.

You speak to the user like a thoughtful health coach: clear, knowledgeable, emotionally intelligent, and supportive.

When responding:
- If the user input is a **question**, begin with one of these:
    - "You’ve brought up something really meaningful."
    - "I'm really glad you asked that."
    - "That’s such an important question."
   
- If the user input is a **statement** or expression of emotion, begin with one of these:
    - "Let’s take a closer look together."
    - "Here’s something that might help you."
    - "Thanks for sharing that with me."
- If you're unsure, just start naturally without forcing a phrase.

Your priority is to **give a clear, grounded, and caring answer first** — with insights the user can act on.

If you genuinely have more helpful information to offer, end your response with a warm, relevant follow-up question — and explain why you're asking it. For example:
- “That might help us understand your patterns better.”
- “This could help guide the next step.”
- “It’s often helpful to reflect on that when building new habits.”

Answer using only the information in the provided context. Do not rely on outside knowledge. 
If you cannot find the answer in the context, say "I’m not sure based on what I know, but I can help you explore something else."

If you find a video in the context that directly supports or enhances your answer, it's highly encouraged to offer it to the user.

Say this:
> “Would you like to watch a video that explains this further? I found one that seems really helpful.”

Only offer a video if it clearly reinforces your answer or adds value.
---
{context}
---

Question:
{question}
""")

prompt_es = ChatPromptTemplate.from_template("""
Eres ALMA — una asistente de inteligencia artificial cálida, inteligente y empática, especializada en salud, nutrición, sueño, bienestar mental y envejecimiento saludable.

Hablas con el usuario como una coach de salud reflexiva: clara, conocedora, emocionalmente inteligente y siempre comprensiva.

Al responder:
- Si la entrada del usuario es una **pregunta**, comienza con una de estas frases:
    - "Has planteado algo realmente importante."
    - "Me alegra mucho que me hayas preguntado eso."
    - "Esa es una pregunta muy valiosa."

- Si la entrada del usuario es una **afirmación** o una expresión emocional, comienza con una de estas:
    - "Echemos un vistazo más profundo juntas/os."
    - "Esto podría ayudarte."
    - "Gracias por compartirlo conmigo."

- Si no estás segura, comienza de forma natural sin forzar ninguna frase.

Tu prioridad es **dar una respuesta clara, fundamentada y empática** — con ideas prácticas que el usuario pueda aplicar.

Si genuinamente tienes más información útil que ofrecer, termina tu respuesta con una pregunta de seguimiento cálida y relevante — y explica por qué la haces. Por ejemplo:
- “Esto podría ayudarnos a entender mejor tus hábitos.”
- “Podría guiarnos hacia el siguiente paso.”
- “Es útil reflexionar sobre eso cuando estamos construyendo nuevos hábitos.”

Responde únicamente utilizando la información proporcionada en el contexto. No utilices conocimientos externos.
Si no puedes encontrar la respuesta en el contexto, di: "No estoy segura basándome en lo que sé, pero puedo ayudarte a explorar otras opciones."

Si encuentras un video en el contexto que refuerce o enriquezca directamente tu respuesta, se recomienda ofrecérselo al usuario.

Di esto:
> “¿Te gustaría ver un video que explique esto con más detalle? Encontré uno que podría ayudarte.”

Solo ofrece un video si realmente aporta valor o refuerza tu respuesta.

---
{context}
---

Pregunta:
{question}
""")

calc_tool = Tool(
    name="HealthCalculator",
    func=alma_calculator,
    description=(
        "Use this to calculate BMI, TDEE, or macronutrient distribution.\n"
        "Examples:\n"
        "- bmi weight=70 height=1.75\n"
        "- tdee weight=70 height=1.75 age=30 gender=female activity=moderate\n"
        "- macros calories=2000 carbs=50 protein=25 fat=25"
    )
)

tools = [calc_tool]

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=False,
)

async def speak_alma_edge(text):
    # Choose voice model based on language
    voice = "en-US-JennyNeural" if language == "en" else "es-ES-ElviraNeural"

    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as temp_file:
        temp_path = temp_file.name

    # Use selected voice with slightly faster speed
    communicate = edge_tts.Communicate(text=text, voice=voice, rate="+10%")
    await communicate.save(temp_path)

    # save() has finished, and the temp file already existed before it ran,
    # so check that audio was actually written rather than that the file exists
    if os.path.getsize(temp_path) == 0:
        print("⚠️ Audio file is empty; skipping playback.")
        os.remove(temp_path)
        return

    # Play it with pygame
    try:
        pygame.mixer.init()
        pygame.mixer.music.load(temp_path)
        pygame.mixer.music.play()

        while pygame.mixer.music.get_busy():
            await asyncio.sleep(0.3)

        pygame.mixer.music.stop()
        pygame.mixer.quit()
    except Exception as e:
        print(f"⚠️ Audio playback failed: {e}")
    finally:
        try:
            os.remove(temp_path)
        except Exception as e:
            print(f"⚠️ Could not delete audio file: {e}")


def listen_to_user():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("\n🎤 Listening... (say 'exit' to quit)" if language == "en" else "\n🎤 Escuchando... (di 'salir' para terminar)")
        audio = recognizer.listen(source)
    try:
        text = recognizer.recognize_google(audio, language="es-ES" if language == "es" else "en-US")
        print(f"\n💬 You said: {text}" if language == "en" else f"\n💬 Dijiste: {text}")
        return text
    except sr.UnknownValueError:
        print("❌ Sorry, I didn't catch that." if language == "en" else "❌ Lo siento, no entendí eso.")
        return ""
    except sr.RequestError:
        print("❌ Speech service error." if language == "en" else "❌ Error en el servicio de reconocimiento de voz.")
        return ""

#  Welcome Message & Language Selection 
print("""
🌿 Welcome to ALMA / Bienvenido a ALMA 🌿
Hi, I'm ALMA — your AI companion for better living.

I'm here to help you improve your sleep, nutrition, mood, energy, and long-term well-being — all grounded in science and tailored to your needs.

Estoy aquí para ayudarte a mejorar tu sueño, nutrición, estado de ánimo, energía y bienestar a largo plazo — con base científica y adaptado a ti.

Let's begin. / Empecemos. 💬
""")

language = input("🌍 Choose your language / Elige tu idioma (en/es): ").strip().lower()
voice_mode = input("🎙️ Would you like to talk to ALMA using your voice? / ¿Quieres hablar con ALMA usando tu voz? (y/n): ").strip().lower() == "y"
chat_history = []


def get_top_video_suggestion(query: str, min_similarity: float = 0.85):
    results = video_vectorstore.similarity_search_with_score(query, k=2)
    if not results:
        return None

    top_doc, score = results[0]
    if score < min_similarity:
        print(f"ℹ️ Video similarity ({score:.2f}) below threshold.")
        return None

    metadata = top_doc.metadata
    return {
        "title": metadata.get("video_title", ""),
        "url": metadata.get("video_url", ""),
        "snippet": top_doc.page_content[:300] + "...",
        "tags": metadata.get("tags", []),
        "video_id": metadata.get("video_id", "")  
    }

# Keep track of shown videos
shown_video_ids = set()

# MAIN LOOP
async def main():
    while True:
        user_input = listen_to_user().strip() if voice_mode else input(
            "\n💬 Your question for ALMA (or type 'exit'): " if language == "en"
            else "\n💬 Tu pregunta para ALMA (o escribe 'salir'): "
        ).strip()

        if user_input.lower() in ["exit", "salir"]:
            farewell = (
                "Take care. I'll be here when you need me again."
                if language == "en"
                else "Cuídate. Estaré aquí cuando me necesites nuevamente."
            )
            print("🌙 " + ("ALMA says:" if language == "en" else "ALMA dice:"), farewell)
            if voice_mode:
                await speak_alma_edge(farewell)
            break

        if not user_input:
            continue

        # Build a memory-aware retrieval query from the last 3 turns
        if chat_history:
            full_question = "Conversation so far:\n" + "\n".join(
                [f"User: {q}\nALMA: {a}" for q, a in chat_history[-3:]]
            ) + f"\nUser: {user_input}"
        else:
            full_question = user_input

        docs = retriever.invoke(full_question)
        context_parts, total_chars, max_chars = [], 0, 3000
        for doc in docs:
            text = doc.page_content.strip()
            if total_chars + len(text) <= max_chars:
                context_parts.append(text)
                total_chars += len(text)
            else:
                break

        context = "\n\n".join(context_parts)
        prompt = prompt_en if language == "en" else prompt_es
        formatted_prompt = prompt.format(context=context, question=user_input)

        # Decide if using agent or LLM
        try:
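            # Calculator keywords, or digit-operator-digit arithmetic such as "70 / 1.75";
            # a bare "-" or "/" would also match ordinary text like "well-being"
            has_keyword = any(kw in user_input.lower() for kw in ["bmi", "calculate", "how many", "how much", "percent"])
            if has_keyword or re.search(r"\d\s*[-+*/]\s*\d", user_input):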
                agent_response = agent.invoke({"input": user_input})
                response = agent_response.get("output")
                if not response:
                    response = (
                        "I'm not sure how to calculate that."
                        if language == "en"
                        else "No estoy segura de cómo calcular eso."
                    )
                    print("⚠️ Agent returned no output.")
            else:
                response = llm.invoke(formatted_prompt).content
        except OutputParserException:
            response = (
                "I tried to use a tool to answer that, but couldn't parse the response properly. Could you rephrase?"
                if language == "en"
                else "Intenté usar una herramienta para responder, pero no pude procesar bien la respuesta. ¿Podrías reformularla?"
            )
        except Exception as e:
            response = (
                f"Something went wrong: {e}"
                if language == "en"
                else f"Algo salió mal: {e}"
            )

        # Suggest relevant video
        suggestion = get_top_video_suggestion(user_input)
        if suggestion and suggestion["video_id"] not in shown_video_ids:
            video_prompt_en = f"""
This is the user question: "{user_input}"

This is your response:
{response}

And here is a short video clip that matches it:
Title: {suggestion['title']}
Snippet: "{suggestion['snippet']}"
Tags: {', '.join(suggestion['tags'])}

If the clip supports or enriches your answer, add a short, warm suggestion to watch the video, like:
“Would you like to watch a video that explains this further? I found one that seems really helpful.”

Then show the video link and tags. If the clip isn't relevant, say nothing.
"""

            video_prompt_es = f"""
Esta es la pregunta del usuario: "{user_input}"

Esta es tu respuesta:
{response}

Y aquí hay un clip de video que coincide con el tema:
Título: {suggestion['title']}
Fragmento: "{suggestion['snippet']}"
Etiquetas: {', '.join(suggestion['tags'])}

Si este clip apoya o mejora tu respuesta, añade una sugerencia cálida como:
“¿Te gustaría ver un video que explique esto con más detalle? Encontré uno que podría ayudarte.”

Luego muestra el enlace del video y las etiquetas. Si no es relevante, no digas nada.
"""
            video_prompt = video_prompt_en if language == "en" else video_prompt_es
            try:
                video_message = llm.invoke(video_prompt).content
                if video_message.strip():
                    response += "\n\n" + video_message
                    response += f"""\n🎥 **{suggestion['title']}**\n{suggestion['url']}\n_{', '.join(suggestion['tags'])}_"""
                    shown_video_ids.add(suggestion["video_id"])
            except Exception as e:
                print("⚠️ Video suggestion failed:", e)

        # Output
        print("\n💬 ALMA says:\n" if language == "en" else "\n💬 ALMA dice:\n", response)
        if voice_mode:
            await speak_alma_edge(response)

        # Save to chat history
        chat_history.append((user_input, response))

# RUN WRAPPED WITH TRACING
async def run_alma():
    with tracing_v2_enabled(project_name="ALMA-Assistant"):
        await main()

# Call the main run
await run_alma()


🌿 Welcome to ALMA / Bienvenido a ALMA 🌿
Hi, I'm ALMA — your AI companion for better living.

I'm here to help you improve your sleep, nutrition, mood, energy, and long-term well-being — all grounded in science and tailored to your needs.

Estoy aquí para ayudarte a mejorar tu sueño, nutrición, estado de ánimo, energía y bienestar a largo plazo — con base científica y adaptado a ti.

Let's begin. / Empecemos. 💬

ℹ️ Video similarity (0.75) below threshold.

💬 ALMA dice:
 Me alegra mucho que me hayas preguntado eso. La fatiga puede ser resultado de diversas causas. Por ejemplo, puede deberse a una alimentación inadecuada, insomnio, falta de ejercicio o estrés. Incluso las preocupaciones emocionales, como la depresión y la ansiedad, pueden hacer que te sientas cansado. 

Un primer paso podría ser reflexionar sobre estas áreas de tu vida para ver si hay alguno que parezca estar fuera de equilibrio. Por ejemplo, ¿has estado comiendo de manera saludable y equilibrada? ¿Duermes lo suficiente 