# Multimodal YouTube RAG Agent - Step by Step Tutorial

This notebook demonstrates how to build the **ReAct Agent** used in the Value Investing Chatbot.
We combine **LangChain**, **OpenAI**, and **Pinecone** to create an agent that can:
1. Answer questions based on video content (RAG).
2. Ingest new videos on demand.
3. Retrieve raw documents.
4. Transcribe audio files.

## 1. Setup and Imports
First, we import the necessary libraries and load environment variables.

In [None]:
import os
import sys
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Ensure we can import from our local modules
sys.path.append(os.getcwd())

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain.chains import RetrievalQA
from langchain.agents import AgentExecutor, create_react_agent, Tool
from langchain.memory import ConversationBufferWindowMemory
from langchain import hub

# Import our custom tools
import ingest_videos
from backend import speech_to_text

## 2. Configuration
Set up the API keys and Pinecone index name.

In [None]:
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
INDEX_NAME = "youtube-rag-index"

if not PINECONE_API_KEY:
    print("Error: PINECONE_API_KEY not found. Please check your .env file.")
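The check above covers only the Pinecone key. The OpenAI classes used later read `OPENAI_API_KEY` from the environment by default, so it can help to validate every required variable in one place. The sketch below is one possible way to do that; the helper names `missing_env_vars` and `require_env_vars` are hypothetical and not part of this project.

```python
import os

def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

def require_env_vars(required, env=None):
    """Raise a RuntimeError that lists every missing variable, if any."""
    missing = missing_env_vars(required, env)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

# Example with an explicit dict, so the check can be tried without a .env file.
example_env = {"PINECONE_API_KEY": "pc-123", "OPENAI_API_KEY": ""}
print(missing_env_vars(["PINECONE_API_KEY", "OPENAI_API_KEY"], example_env))
# prints ['OPENAI_API_KEY']
```

In the notebook itself, calling `require_env_vars(["PINECONE_API_KEY", "OPENAI_API_KEY"])` right after `load_dotenv()` would stop execution early instead of failing later inside an API call.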

## 3. Initialize Vector Store & LLM
We use `OpenAIEmbeddings` for vectorization and `ChatOpenAI` (GPT-4o) as our reasoning engine.

In [None]:
# 1. Embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# 2. Vector Store (Pinecone)
vector_store = PineconeVectorStore(index_name=INDEX_NAME, embedding=embeddings)

# 3. Retriever
retriever = vector_store.as_retriever(search_kwargs={"k": 5})

# 4. LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)
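To make `search_kwargs={"k": 5}` concrete, here is a toy sketch of top-k retrieval by cosine similarity. It uses hand-written three-dimensional vectors in place of real embeddings, and it assumes a cosine metric; the actual metric depends on how the Pinecone index was created, and Pinecone performs this search server-side.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, corpus, k=5):
    """Return the texts of the k corpus entries most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy "embeddings": the first axis loosely stands for investing content.
toy_corpus = [
    ("margin of safety", [0.9, 0.1, 0.0]),
    ("intrinsic value", [0.7, 0.3, 0.1]),
    ("cooking pasta", [0.0, 0.1, 0.9]),
]
print(top_k([1.0, 0.0, 0.0], toy_corpus, k=2))
# prints ['margin of safety', 'intrinsic value']
```

A larger `k` gives the LLM more context per question at the cost of a longer prompt.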

## 4. Define Tools
The agent needs tools to interact with the world. We define four tools:
- **RAG Answer**: Uses `RetrievalQA` to answer questions from the knowledge base.
- **Ingestion**: Calls `ingest_videos.process_video` to add a new video.
- **STT** (speech to text): Calls `speech_to_text.transcribe_audio`.
- **Retriever**: Returns the raw retrieved documents without generating an answer.

In [None]:
# Tool 1: RAG Answer Tool
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)

# Tool 2: YouTube Ingestion Tool
def ingest_video_func(url: str):
    try:
        ingest_videos.process_video(url, vector_store)
        return f"Successfully ingested video: {url}"
    except Exception as e:
        return f"Error ingesting video: {str(e)}"

# Tool 3: Speech to Text Tool
def speech_to_text_func(file_path: str):
    try:
        text = speech_to_text.transcribe_audio(file_path)
        return text if text else "No transcription available."
    except Exception as e:
        return f"Error transcribing audio: {str(e)}"

# Tool 4: Retriever Tool
def retriever_func(query: str):
    # `invoke` replaces the deprecated `get_relevant_documents`
    docs = retriever.invoke(query)
    return "\n\n".join(
        f"Content: {d.page_content}\nSource: {d.metadata.get('source', 'Unknown')}"
        for d in docs
    )

# Create Tool Objects
tools = [
    Tool(
        name="rag_answer_tool",
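        # `Chain.run` is deprecated; `invoke` returns a dict with the answer under "result"
        func=lambda question: qa_chain.invoke({"query": question})["result"],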
        description="Use this to answer questions about value investing based on the video knowledge base. Input should be a fully formed question."
    ),
    Tool(
        name="youtube_ingestion_tool",
        func=ingest_video_func,
        description="Use this to ingest a new YouTube video into the knowledge base. Input should be a valid YouTube URL."
    ),
    Tool(
        name="speech_to_text_tool",
        func=speech_to_text_func,
        description="Use this to transcribe an audio file to text. Input should be a local file path."
    ),
    Tool(
        name="retriever_tool",
        func=retriever_func,
        description="Use this to retrieve raw documents/context from the vector store without generating an answer. Input is a search query."
    )
]
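Each tool is essentially a name, a description, and a function that maps a string to a string. The sketch below shows, in plain Python, how an executor could look up the tool the LLM chose and run it. It is an illustration only, not LangChain's implementation; `echo_tool`, `shout_tool`, and `run_tool` are hypothetical names.

```python
def echo_tool(text: str) -> str:
    """Stand-in tool: returns its input with a prefix."""
    return f"echo: {text}"

def shout_tool(text: str) -> str:
    """Stand-in tool: returns its input in upper case."""
    return text.upper()

# Map tool names (what the LLM writes after "Action:") to callables.
registry = {"echo_tool": echo_tool, "shout_tool": shout_tool}

def run_tool(name: str, tool_input: str, registry: dict) -> str:
    """Run the named tool and always return a string observation."""
    tool = registry.get(name)
    if tool is None:
        # An unknown name becomes an observation the LLM can react to.
        return f"{name} is not a valid tool, try one of {sorted(registry)}."
    try:
        return tool(tool_input)
    except Exception as exc:
        # Same idea as the try/except wrappers around ingestion and transcription.
        return f"Error running {name}: {exc}"

print(run_tool("shout_tool", "margin of safety", registry))
# prints MARGIN OF SAFETY
```

Returning errors as strings, as the ingestion and transcription wrappers do, lets the agent read the failure and try something else rather than crashing the run.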

## 5. Create ReAct Agent
We pull the conversational ReAct prompt (`hwchase17/react-chat`) from LangChain Hub, create the agent, and wrap it in an executor with a sliding-window memory.

In [None]:
# Pull Prompt
prompt = hub.pull("hwchase17/react-chat")

# Create Agent
agent = create_react_agent(llm, tools, prompt)

# Setup Memory
memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    k=5,
    return_messages=False  # react-chat is a plain string template, so pass history as text
)

# Create Executor
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=memory,
    verbose=True,
    handle_parsing_errors=True
)
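A ReAct agent alternates between the LLM writing a step in text and the executor running the requested tool. The sketch below shows one way that text could be parsed into either a tool call or a final answer. It is a simplified illustration with a hypothetical `parse_step` function, not the parser LangChain ships; the `Action:`, `Action Input:`, and `Final Answer:` labels follow the usual ReAct text format.

```python
import re

# Matches "Action: <tool name>" followed on the next line by "Action Input: <input>".
ACTION_RE = re.compile(
    r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL
)

def parse_step(llm_text: str):
    """Classify one LLM step as a finish, a tool action, or a parse error."""
    if "Final Answer:" in llm_text:
        return ("finish", llm_text.split("Final Answer:", 1)[1].strip())
    match = ACTION_RE.search(llm_text)
    if match:
        return ("action", match.group(1).strip(), match.group(2).strip())
    return ("error", "Could not parse LLM output")

step = (
    "Thought: Do I need to use a tool? Yes\n"
    "Action: rag_answer_tool\n"
    "Action Input: What is margin of safety?"
)
print(parse_step(step))
# prints ('action', 'rag_answer_tool', 'What is margin of safety?')
```

The `"error"` branch corresponds to the situation `handle_parsing_errors=True` is meant for: the malformed output is reported back to the LLM as an observation so it can retry, instead of raising an exception.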

## 6. Run the Agent
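Now we can ask questions. Because `verbose=True`, the executor prints each intermediate Thought, Action, and Observation before the final answer.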

In [None]:
response = agent_executor.invoke({"input": "What is the most important rule of value investing?"})
print(response["output"])
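Follow-up questions depend on the window memory configured with `k=5`: only the five most recent exchanges are passed back to the agent. The sketch below imitates that windowing with a `deque`; the `WindowMemory` class is a hypothetical stand-in for illustration, not LangChain's `ConversationBufferWindowMemory`.

```python
from collections import deque

class WindowMemory:
    """Keep only the last k (user, assistant) exchanges."""

    def __init__(self, k: int = 5):
        # A deque with maxlen drops the oldest exchange automatically.
        self.turns = deque(maxlen=k)

    def save(self, user_input: str, ai_output: str) -> None:
        self.turns.append((user_input, ai_output))

    def as_text(self) -> str:
        """Render the window as text suitable for a string prompt."""
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)

memory_sketch = WindowMemory(k=2)
for i in range(1, 4):
    memory_sketch.save(f"question {i}", f"answer {i}")
print(memory_sketch.as_text())
# prints:
# Human: question 2
# AI: answer 2
# Human: question 3
# AI: answer 3
```

With a window of two, the first exchange has already been dropped by the third turn, so a follow-up that refers to it would no longer have that context.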