### This notebook demonstrates a full RAG pipeline:

- Downloading and transcribing YouTube audio

- Chunking and embedding the transcript

- Building a semantic search index

- Enabling strict, reference-based conversational QA and summarization

## 1. Import RAG Pipeline and Agent Utilities

In this cell, we import all the necessary functions and classes from our custom modules (`rag_pipeline.py` and `agent.py`).  
These utilities handle environment setup, audio downloading, transcription, document splitting, embedding, vector search, and agent construction for conversational QA and summarization.


In [None]:
# Import RAG pipeline utilities
from rag_pipeline import (
    load_environment,
    download_audio,
    process_audio,
    load_and_split_documents,
    create_vectorstore,
    build_qa_chain
)

# Import agent utilities
from agent import build_agent

## 2. Download Audio from YouTube

Here, we specify the YouTube URL for the IELTS video we want to process.  
The `download_audio` function downloads the audio track of the video and saves it as an MP3 file in the output directory.  
The resulting path is displayed for confirmation.


In [None]:
youtube_url = "https://www.youtube.com/watch?v=kHTnAx6f-j0&list=PLWWR_9t3vo3OfJ62HL-nnwRaSAiLnaMSM&index=2"

In [None]:
audio_path = download_audio(youtube_url)

In [None]:
audio_path

## 3. Load Environment and Transcribe Audio

These cells load your OpenAI API key and initialize the LLM and the OpenAI client.  
`process_audio` then handles the downloaded audio:  
- Splits it into manageable chunks  
- Transcribes each chunk using OpenAI Whisper  
- Saves the transcript to a text file  

This is the core step for converting video content into searchable text.
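
The chunking step can be pictured with a small sketch. How `process_audio` actually splits the file is not shown in this notebook, so the helper `chunk_windows` and the window sizes below are hypothetical; the sketch only illustrates cutting a long recording into fixed-length windows that can each be sent for transcription separately.

```python
def chunk_windows(total_ms: int, chunk_ms: int) -> list[tuple[int, int]]:
    """Return (start, end) millisecond windows that cover the whole recording."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    # The last window is clipped so it never runs past the end of the audio.
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


# Hypothetical sizes: a 25-minute recording cut into 10-minute windows.
windows = chunk_windows(total_ms=25 * 60_000, chunk_ms=10 * 60_000)
for start, end in windows:
    print(f"transcribe {start / 60_000:.0f} min to {end / 60_000:.0f} min")
```

Fixed windows are the simplest choice; a real pipeline might instead cut at silences so that words are not split across chunks.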


In [None]:
api_key, llm, client = load_environment()

In [None]:
process_audio(client=client)

## 4. Load and Split Transcript Documents

After transcription, we need to load the transcript files and split them into smaller text chunks.  
This helps with efficient embedding and retrieval, as semantic search works best on small, focused pieces of text.  
The resulting list of document chunks is displayed.
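
As a rough illustration of what splitting involves, here is a minimal character-based splitter with overlap. It is not the implementation behind `load_and_split_documents`, whose splitter and settings are not shown here; the function name `split_text` and the sizes are assumptions chosen for the example.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a boundary still appears intact in one of the two neighbouring chunks.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


# Synthetic 500-character string standing in for a transcript.
sample = "".join(str(i % 10) for i in range(500))
chunks = split_text(sample, chunk_size=200, overlap=40)
print([len(c) for c in chunks])
```

Smaller chunks make retrieval more precise but carry less context per hit, so the chunk size and overlap are usually tuned together.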


In [None]:
docs = load_and_split_documents()

In [None]:
docs

## 5. Assign Source Metadata to Document Chunks

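For traceability, each chunk carries metadata identifying its source. The cell below shortens the `source` field from a full file path to the file name without its extension, which the code treats as the video ID.  
This is useful for providing references when answering questions and for debugging.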


In [None]:
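import os

# Replace each chunk's full source path with the bare file name (no extension),
# which is treated here as the video ID for references.
for chunk in docs:
    video_id = os.path.splitext(os.path.basename(chunk.metadata["source"]))[0]
    chunk.metadata["source"] = video_id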

In [None]:
docs

## 6. Generate Embeddings and Build FAISS Vector Store

Now, we generate vector embeddings for each document chunk using OpenAI embeddings.  
These embeddings are stored in a FAISS vector index, enabling fast semantic search and retrieval for question answering.
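
To make "semantic search" concrete, the toy sketch below ranks stored vectors by similarity to a query vector. It is a plain-Python stand-in, not FAISS: the three-dimensional vectors, the chunk names, and the helpers `cosine` and `top_k` are all made up for illustration, and a real FAISS index may use a different metric (such as L2 distance) over much larger vectors.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(index.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings; real ones come from the embedding model.
toy_index = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.8, 0.6, 0.0],
    "chunk_c": [0.0, 0.0, 1.0],
}
nearest = top_k([0.9, 0.1, 0.0], toy_index, k=2)
print(nearest)
```

This brute-force scan compares the query against every vector; a vector index exists to make the same lookup fast when there are many chunks.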


In [None]:
vectorstore = create_vectorstore(api_key=api_key, docs=docs)

## 7. Create a Strict QA Prompt and Build QA Chain (Retriever)

We define a strict prompt template to ensure the model only answers questions based on the transcript, not external knowledge.  
Then, we build a RetrievalQA chain that uses the vector store and the prompt to answer questions, returning both the answer and the supporting source documents.
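
The actual prompt used by `build_qa_chain` lives in `rag_pipeline.py` and is not reproduced here. The sketch below only suggests what a strict, transcript-only template might look like; the wording, the refusal sentence, and the names `STRICT_TEMPLATE` and `render_prompt` are assumptions.

```python
# Hypothetical template: the model is told to rely on the retrieved
# excerpts only and to refuse when they do not contain the answer.
STRICT_TEMPLATE = (
    "Answer the question using only the transcript excerpts below.\n"
    "If the excerpts do not contain the answer, reply exactly: "
    "\"I don't know based on the transcript.\"\n\n"
    "Transcript excerpts:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def render_prompt(context_chunks: list[str], question: str) -> str:
    """Fill the template with retrieved chunks and the user's question."""
    context = "\n---\n".join(context_chunks)
    return STRICT_TEMPLATE.format(context=context, question=question)


# Placeholder excerpts stand in for chunks returned by the retriever.
prompt = render_prompt(
    ["Placeholder excerpt one.", "Placeholder excerpt two."],
    "What does the speaker recommend?",
)
print(prompt)
```

Giving the model an explicit refusal sentence makes out-of-scope answers easy to spot when testing the chain.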


In [None]:
qa_chain = build_qa_chain(vector_store=vectorstore, chat_model=llm, return_source_documents=True)

## 8. Initialize and Test the Conversational Agent

Finally, we build a conversational agent that can answer questions and provide summaries, strictly using the transcript content.  
We test the agent with a series of queries to check that it responds appropriately and refuses to answer questions not covered in the transcript. The final query asks about the first question, which checks that the agent keeps track of the conversation.


In [None]:
agent = build_agent(chat_model=llm, chain=qa_chain)

In [None]:
from langchain_core.tracers.context import tracing_v2_enabled

with tracing_v2_enabled():
    print(agent.invoke("What is the key difference between how band 5–6 students and band 7–9 students answer in Part 1 of the test? Provide one example that illustrates this difference."))
    print(agent.invoke("What does the teacher mean by “test mode,” and why does adopting that mode negatively affect students’ scores?"))
    print(agent.invoke("Which strategy should candidates avoid when addressing each bullet point on the cue card, and what do band 9 candidates do instead?"))
    print(agent.invoke("If I face a Part 3 question on a topic I know little about, what steps are recommended to still give an acceptable answer?"))
    print(agent.invoke("When answering a question like “What skills does a person need to be a great chef?”, what four stages do band 9 students go through in building their response?"))
    print(agent.invoke("Summarize the content of the video transcript."))
    print(agent.invoke("What was the first question I asked?"))
