# Build your own NotebookLM

This notebook walks through the core components of our NotebookLM implementation, from document processing and content extraction to memory integration and podcast generation. Each section pairs a short explanation with code that shows how the system works under the hood.

## Table of Contents
1. [Document Processing](#1-document-processing)
2. [Audio Transcription](#2-audio-transcription)
3. [Web Scraping](#3-web-scraping)
4. [Embedding Generation](#4-embedding-generation)
5. [Vector Database Storage](#5-vector-database-storage)
6. [Memory Layer Integration](#6-memory-layer-integration)
7. [RAG Query Processing](#7-rag-query-processing)
8. [Memory Storage for Conversations](#8-memory-storage-for-conversations)
9. [Podcast Script Generation](#9-podcast-script-generation)
10. [Text-to-Speech Audio Generation](#10-text-to-speech-audio-generation)

In [1]:
# Necessary imports
import os
import datetime
import tempfile
from pathlib import Path
from dotenv import load_dotenv

load_dotenv()

True

## 1. Document Processing

The document processing module handles various file formats (PDF, TXT, Markdown) and converts them into structured chunks with metadata for citations and retrieval.

### Key features:
- **Multi-format support**: PDF, TXT, MD files
- **Page tracking**: Maintains page numbers for accurate citations
- **Character positions**: Tracks start/end positions for precise references
- **Metadata preservation**: Source file, type, and processing timestamp
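The page-tracking and character-position ideas can be sketched in plain Python. This is not the `DocumentProcessor` code. `SketchChunk`, `chunk_pages`, the 1,000-character limit, and the `<type>_<index>_<hash>` ID scheme are all hypothetical. The ID scheme is modeled on IDs like `pdf_0_2c889c2c` in the output below, and it assumes an 8-character MD5 prefix as the hash.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchChunk:
    """Hypothetical stand-in for DocumentChunk, carrying the citation fields shown below."""
    chunk_id: str
    content: str
    source_file: str
    source_type: str
    page_number: Optional[int]
    start_char: int
    end_char: int  # inclusive in this sketch

    def get_citation_info(self) -> dict:
        return {
            "source": self.source_file,
            "type": self.source_type,
            "chunk_id": self.chunk_id,
            "page": self.page_number,
            "char_range": f"{self.start_char}-{self.end_char}",
        }


def chunk_pages(pages: List[str], source_file: str, source_type: str = "pdf",
                max_chars: int = 1000) -> List[SketchChunk]:
    """Split each page into chunks of at most max_chars, breaking at spaces."""
    chunks: List[SketchChunk] = []
    offset = 0  # character offset of the current page within the whole document
    for page_number, text in enumerate(pages, start=1):
        start, n = 0, len(text)
        while start < n:
            # Skip whitespace so each chunk begins on a word.
            while start < n and text[start].isspace():
                start += 1
            if start >= n:
                break
            end = min(start + max_chars, n)
            if end < n:
                # Back up to the last space so a word is not cut in half.
                space = text.rfind(" ", start, end)
                if space > start:
                    end = space
            content = text[start:end]
            digest = hashlib.md5(content.encode("utf-8")).hexdigest()[:8]
            chunks.append(SketchChunk(
                chunk_id=f"{source_type}_{len(chunks)}_{digest}",
                content=content,
                source_file=source_file,
                source_type=source_type,
                page_number=page_number,
                start_char=offset + start,
                end_char=offset + end - 1,
            ))
            start = end
        offset += n
    return chunks


for c in chunk_pages(["alpha beta gamma delta", "epsilon"], "demo.pdf", max_chars=11):
    print(c.chunk_id, c.page_number, c.get_citation_info()["char_range"], repr(c.content))
```

Offsets run across the whole document rather than restarting on each page. That is why the second page's chunk starts at character 22 in this example.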

In [2]:
from src.document_processing.doc_processor import DocumentProcessor, DocumentChunk

# Initialize the document processor
doc_processor = DocumentProcessor()

# Process a PDF document
pdf_path = "data/raft.pdf"
doc_chunks = doc_processor.process_document(pdf_path)

# Each chunk contains rich metadata for citations
for chunk in doc_chunks[:3]:
    print(f"Chunk ID: {chunk.chunk_id}")
    print(f"Source: {chunk.source_file}")
    print(f"Page: {chunk.page_number}")
    print(f"Content: {chunk.content[:100]}...")
    print(f"Citation Info: {chunk.get_citation_info()}")
    print("-" * 50)

# Process a text file
txt_path = "data/notes.txt"
text_chunks = doc_processor.process_document(txt_path)
print(f"Processed {len(text_chunks)} chunks from text file")
print(f"Citation Info: {text_chunks[0].get_citation_info()}")

INFO:src.document_processing.doc_processor:Processing document: raft.pdf
INFO:src.document_processing.doc_processor:Processed PDF: 53 chunks from 11 pages
INFO:src.document_processing.doc_processor:Processing document: notes.txt
INFO:src.document_processing.doc_processor:Processed text file: 5 chunks


Chunk ID: pdf_0_2c889c2c
Source: raft.pdf
Page: 1
Content: RAFT: Adapting Language Model to Domain Specific RAG
Tianjun Zhang Shishir G. Patil Naman Jain Sheng...
Citation Info: {'source': 'raft.pdf', 'type': 'pdf', 'chunk_id': 'pdf_0_2c889c2c', 'chunk_index': 0, 'page': 1, 'char_range': '0-964', 'total_pages': 11, 'page_width': 612.0, 'page_height': 792.0, 'processed_at': '2025-10-06T22:11:10.290567'}
--------------------------------------------------
Chunk ID: pdf_1_62caaf00
Source: raft.pdf
Page: 1
Content: help in answering the question, which we call,
distractor documents. RAFT accomplishes this
by citin...
Citation Info: {'source': 'raft.pdf', 'type': 'pdf', 'chunk_id': 'pdf_1_62caaf00', 'chunk_index': 1, 'page': 1, 'char_range': '965-1942', 'total_pages': 11, 'page_width': 612.0, 'page_height': 792.0, 'processed_at': '2025-10-06T22:11:10.290567'}
--------------------------------------------------
Chunk ID: pdf_2_90a18e35
Source: raft.pdf
Page: 1
Content: specific document collec

## 2. Audio Transcription

Our system supports two types of audio transcription: direct audio files and YouTube videos, both using AssemblyAI for high-accuracy transcription with speaker diarization.

### Audio Processing Features:
- **Speaker diarization**: Automatically identifies different speakers
- **Timestamp preservation**: Maintains timing information for each utterance
- **High accuracy**: Uses AssemblyAI's advanced speech recognition
- **YouTube integration**: Downloads the video's audio with yt-dlp, then transcribes it with AssemblyAI
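Here is a minimal sketch of how diarized utterances might be merged into a single citable chunk. It produces the `[MM:SS] Speaker X: ...` lines and the `speakers`, `start_timestamp`, `end_timestamp`, and `confidence` metadata seen in the output below. It assumes each utterance is a dict with `speaker`, `text`, `start`/`end` in milliseconds, and `confidence`, loosely mirroring AssemblyAI's utterance fields. `utterances_to_chunk` is a hypothetical helper, and the timings in the example are illustrative.

```python
from typing import Dict, List


def format_timestamp(ms: int) -> str:
    """Render a millisecond offset as [MM:SS]."""
    total_seconds = ms // 1000
    return f"[{total_seconds // 60:02d}:{total_seconds % 60:02d}]"


def utterances_to_chunk(utterances: List[Dict]) -> Dict:
    """Merge diarized utterances into one chunk with speaker and timing metadata."""
    if not utterances:
        raise ValueError("need at least one utterance")
    lines = [
        f"{format_timestamp(u['start'])} Speaker {u['speaker']}: {u['text']}"
        for u in utterances
    ]
    return {
        "content": "\n".join(lines),
        "metadata": {
            "speakers": sorted({f"Speaker {u['speaker']}" for u in utterances}),
            "start_timestamp": utterances[0]["start"],  # milliseconds
            "end_timestamp": utterances[-1]["end"],     # milliseconds
            # Average per-utterance confidence as a single chunk-level score
            "confidence": sum(u["confidence"] for u in utterances) / len(utterances),
        },
    }


# Illustrative timings and confidences
chunk = utterances_to_chunk([
    {"speaker": "A", "text": "The stale smell of old beer lingers.", "start": 1360, "end": 3900, "confidence": 0.97},
    {"speaker": "B", "text": "It takes heat to bring out the odor.", "start": 4200, "end": 6100, "confidence": 0.95},
])
print(chunk["content"])
```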

### 2.1 Direct Audio Transcription

In [3]:
from src.audio_processing.audio_transcriber import AudioTranscriber

# Initialize with AssemblyAI API key
# audio_transcriber = AudioTranscriber("your_assemblyai_api_key")
audio_transcriber = AudioTranscriber(os.getenv("ASSEMBLYAI_API_KEY"))

# Transcribe an audio file with speaker diarization
audio_path = "data/harvard.wav"
audio_chunks = audio_transcriber.transcribe_audio(
    audio_path,
    enable_speaker_diarization=True,
    enable_auto_punctuation=True
)

# Each chunk represents a speaker utterance
for chunk in audio_chunks[:5]:
    print(f"\nSpeaker: {chunk.metadata.get('speakers', [])}")
    print(f"Time: {chunk.metadata.get('start_timestamp', 0)/1000:.2f}s - {chunk.metadata.get('end_timestamp', 0)/1000:.2f}s")
    print(f"Content: {chunk.content}")
    print(f"Confidence: {chunk.metadata.get('confidence', 'N/A'):.3f}")
    print("-" * 40)

INFO:src.audio_processing.audio_transcriber:AudioTranscriber initialized with AssemblyAI
INFO:src.audio_processing.audio_transcriber:Starting transcription for: harvard.wav
INFO:httpx:HTTP Request: POST https://api.assemblyai.com/v2/upload "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.assemblyai.com/v2/transcript "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET https://api.assemblyai.com/v2/transcript/7ef032fe-8d9c-4c33-91c5-17ef9671fa7b "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET https://api.assemblyai.com/v2/transcript/7ef032fe-8d9c-4c33-91c5-17ef9671fa7b "HTTP/1.1 200 OK"
INFO:src.audio_processing.audio_transcriber:Transcription completed for: harvard.wav
INFO:src.audio_processing.audio_transcriber:Created 1 chunks from transcript



Speaker: ['Speaker B', 'Speaker A']
Time: 1.36s - 17.68s
Content: [00:01] Speaker A: The stale smell of old beer lingers.
[00:04] Speaker B: It takes heat to bring out the odor.
[00:06] Speaker A: A cold dip restores health and zest.
[00:09] Speaker B: A salt pickle tastes fine with ham.
[00:12] Speaker A: Tacos al pastor are my favorite.
[00:14] Speaker B: A zestful food is the hot cross bun.
Confidence: 0.958
----------------------------------------


### 2.2 YouTube Video Transcription
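Before downloading, the transcriber needs the video ID. The log below shows the audio cached as `<temp dir>/youtube_transcriber/<video_id>.m4a`. The following stdlib-only sketch covers both steps. It assumes the common `watch?v=`, `youtu.be/`, `/shorts/`, and `/embed/` URL shapes. `extract_video_id` and `audio_cache_path` are hypothetical helpers; the actual download would be done by yt-dlp.

```python
import tempfile
from pathlib import Path
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the 11-character YouTube video ID for common URL shapes, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/")):
            candidate = parsed.path.split("/")[2]
    return candidate if len(candidate) == 11 else None


def audio_cache_path(video_id: str, ext: str = "m4a") -> Path:
    """Where the downloaded audio would live: <temp dir>/youtube_transcriber/<id>.<ext>."""
    return Path(tempfile.gettempdir()) / "youtube_transcriber" / f"{video_id}.{ext}"


video_id = extract_video_id("https://www.youtube.com/watch?v=D26sUZ6DHNQ")
print(video_id, audio_cache_path(video_id))
```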

In [4]:
from src.audio_processing.youtube_transcriber import YouTubeTranscriber

# Initialize YouTube transcriber
youtube_transcriber = YouTubeTranscriber(os.getenv("ASSEMBLYAI_API_KEY"))

# Extract and transcribe YouTube video
youtube_url = "https://www.youtube.com/watch?v=D26sUZ6DHNQ"
youtube_chunks = youtube_transcriber.transcribe_youtube_video(youtube_url)

print(f"Transcribed {len(youtube_chunks)} utterances from YouTube video")

# Show speaker-separated content
for chunk in youtube_chunks[:3]:
    print(f"Video ID: {chunk.metadata.get('video_id')}")
    print(f"Speaker: {chunk.metadata.get('speaker')}")
    print(f"Timestamp: {chunk.metadata.get('start_time')}s")
    print(f"Content: {chunk.content}")
    print("-" * 40)

INFO:src.audio_processing.youtube_transcriber:YouTubeTranscriber initialized
INFO:src.audio_processing.youtube_transcriber:Downloading audio from: https://www.youtube.com/watch?v=D26sUZ6DHNQ
INFO:src.audio_processing.youtube_transcriber:Audio downloaded successfully: /var/folders/l7/_rny0yhj7yzcq_bjc7gzw4140000gn/T/youtube_transcriber/D26sUZ6DHNQ.m4a
INFO:src.audio_processing.youtube_transcriber:Starting transcription with speaker diarization...
INFO:httpx:HTTP Request: POST https://api.assemblyai.com/v2/upload "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.assemblyai.com/v2/transcript "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET https://api.assemblyai.com/v2/transcript/99f074b1-86d2-48ab-a3ef-bd39ca2cee31 "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET https://api.assemblyai.com/v2/transcript/99f074b1-86d2-48ab-a3ef-bd39ca2cee31 "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET https://api.assemblyai.com/v2/transcript/99f074b1-86d2-48ab-a3ef-bd39ca2cee31 "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET https://api.assemblyai.com/v2/transcript/99f074b1-86d2-48ab-a3ef-bd39ca2cee31 "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET https://api.assemblyai.com/v2/transcri

Transcribed 1 utterances from YouTube video
Video ID: D26sUZ6DHNQ
Speaker: A
Timestamp: 240s
Content: Speaker A: 99% of developers don't get sockets. What actually is a socket? You've probably seen this kind of socket, but I'm referring to a completely different kind of socket. In the software industry or computer science universe, we simply do not give justice to certain concepts like sockets, anonymous pipes, ephemeral ports, file descriptors, etc. These concepts end up being amorphous hand wavy constructs in most developers minds. And there is a whole host of additional concepts and jargon that are not taught in nearly enough detail and that needs to be addressed. So in this video I'm going to explain to you exactly what you need to know about sockets and give you an introductory taste of what sockets are. So the next time someone asks you, hey, could you explain to me what a socket is? How does TCP or UDP leverage sockets, and what layer of the OSI model do sockets operate in? You 

## 3. Web Scraping

The web scraping module uses Firecrawl to extract clean, structured content from websites while preserving the context needed for citations.

### Web Scraping Features:
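- **Clean extraction**: Converts pages to Markdown and strips most ads and navigation, though some site chrome (such as the Substack subscription text in the output below) can still end up in the chunks
- **Metadata preservation**: Keeps the page title, URL, and page structure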
- **Citation-ready**: URL fragments for precise source linking
- **Chunking**: Intelligent text segmentation with overlap
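The overlap and URL-fragment ideas can be sketched as follows. The window sizes are assumptions, since the module's real values aren't shown. The `#chunk-<i>` fragment and the use of the page title as `source_file` follow the output below. `chunk_with_overlap` and `build_web_chunks` are hypothetical helpers.

```python
from typing import Dict, List


def chunk_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Fixed-size character windows where consecutive windows share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Windows starting at or after len(text) - overlap would only repeat the
    # previous window's tail, so the range stops before them.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_web_chunks(url: str, title: str, markdown: str, **chunk_kwargs) -> List[Dict]:
    """Attach citation metadata, including a per-chunk URL fragment, to each window."""
    return [
        {
            "content": piece,
            "chunk_index": i,
            "source_file": title,  # the output below cites web chunks by page title
            "source_type": "web",
            "metadata": {
                "original_url": url,
                "title": title,
                "url_fragment": f"{url}#chunk-{i}",
            },
        }
        for i, piece in enumerate(chunk_with_overlap(markdown, **chunk_kwargs))
    ]


demo = build_web_chunks("https://example.com/post", "Example Post", "abcdefghij", chunk_size=4, overlap=1)
print([c["content"] for c in demo], demo[-1]["metadata"]["url_fragment"])
```

The overlap keeps a sentence that straddles a boundary fully inside at least one chunk, as long as the sentence is no longer than the overlap. The cost is some duplicated text in the index.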

In [5]:
from src.web_scraping.web_scraper import WebScraper

# Initialize with Firecrawl API key
# web_scraper = WebScraper("your_firecrawl_api_key")
web_scraper = WebScraper(os.getenv("FIRECRAWL_API_KEY"))

# Scrape a website
url = "https://blog.dailydoseofds.com/p/5-chunking-strategies-for-rag"
web_chunks = web_scraper.scrape_url(url)

print(f"Scraped {len(web_chunks)} chunks from {url}")

# Each chunk contains web-specific metadata
for chunk in web_chunks[:2]:
    print(f"URL: {chunk.metadata.get('original_url', 'N/A')}")
    print(f"Title: {chunk.metadata.get('title', 'No title')}")
    print(f"Chunk: {chunk.chunk_index}")
    print(f"Content: {chunk.content[:200]}...")
    print(f"Citation: [Source: {chunk.source_file}, Type: {chunk.source_type}]")
    print(f"URL Fragment: {chunk.metadata.get('url_fragment')}")
    print("-" * 50)

INFO:src.web_scraping.web_scraper:WebScraper initialized with Firecrawl
INFO:src.web_scraping.web_scraper:Scraping URL: https://blog.dailydoseofds.com/p/5-chunking-strategies-for-rag
INFO:src.web_scraping.web_scraper:Successfully scraped https://blog.dailydoseofds.com/p/5-chunking-strategies-for-rag: 24 chunks created


Scraped 24 chunks from https://blog.dailydoseofds.com/p/5-chunking-strategies-for-rag
URL: https://blog.dailydoseofds.com/p/5-chunking-strategies-for-rag
Title: 5 Chunking Strategies For RAG - by Avi Chawla
Chunk: 0
Content: [![Daily Dose of Data Science](https://substackcdn.com/image/fetch/$s_!heKx!,w_80,h_80,c_fill,f_auto,q_auto:good,fl_progressive:steep,g_auto/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic...
Citation: [Source: 5 Chunking Strategies For RAG - by Avi Chawla, Type: web]
URL Fragment: https://blog.dailydoseofds.com/p/5-chunking-strategies-for-rag#chunk-0
--------------------------------------------------
URL: https://blog.dailydoseofds.com/p/5-chunking-strategies-for-rag
Title: 5 Chunking Strategies For RAG - by Avi Chawla
Chunk: 1
Content: By subscribing, I agree to Substack's [Terms of Use](https://substack.com/tos), and acknowledge its [Information Collection Notice](https://substack.com/ccpa#personal-data-collected) and [Privacy Poli...
Citation: [Sour

## 4. Embedding Generation

After processing all modalities (documents, audio, web), we generate semantic embeddings for each chunk to enable similarity search and retrieval.

### Embedding Features:
- **Compact embedding model**: `BAAI/bge-small-en-v1.5`, a small BGE (BAAI General Embedding) model that produces 384-dimensional vectors
- **Batch processing**: Efficient embedding generation for large document sets
- **Query optimization**: Specialized query embedding for better retrieval
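Batching, query-side handling, and the cosine similarity used to compare vectors can all be sketched without a model. The instruction string is the one the BGE authors document for the English v1.5 models, where it is optional. Whether `EmbeddingGenerator.generate_query_embedding` prepends it is an assumption. `batched`, `prepare_query`, and `cosine_similarity` are illustrative helpers.

```python
import math
from typing import Iterator, Sequence

# Query instruction documented by the BGE authors for the English v1.5 models
# (optional for v1.5); whether this project prepends it is an assumption.
BGE_QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices so a model can embed many chunks per forward pass."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def prepare_query(query: str) -> str:
    """Queries get the instruction prefix; documents are embedded as-is."""
    return BGE_QUERY_INSTRUCTION + query


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


texts = [f"chunk {i}" for i in range(84)]
print([len(batch) for batch in batched(texts, 32)])
print(prepare_query("What is the main topic of research?"))
```

When embeddings are L2-normalized, cosine similarity reduces to a plain dot product. That is why normalized vectors are convenient for vector search.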

In [7]:
from src.embeddings.embedding_generator import EmbeddingGenerator

# Initialize embedding generator with BGE model
embedding_generator = EmbeddingGenerator(model_name="BAAI/bge-small-en-v1.5")

# Generate embeddings for all processed chunks
all_chunks = doc_chunks + text_chunks + audio_chunks + youtube_chunks + web_chunks
# all_chunks = doc_chunks + text_chunks + audio_chunks + web_chunks
embedded_chunks = embedding_generator.generate_embeddings(all_chunks)

print(f"Generated embeddings for {len(embedded_chunks)} chunks")
print(f"Embedding dimension: {embedded_chunks[0].embedding.shape[0]}")

# Show embedding information
for i, embedded_chunk in enumerate(embedded_chunks[:3]):
    print(f"\nChunk {i+1}:")
    print(f"Source: {embedded_chunk.chunk.source_file}")
    print(f"Type: {embedded_chunk.chunk.source_type}")
    print(f"Embedding shape: {embedded_chunk.embedding.shape}")
    print(f"Content preview: {embedded_chunk.chunk.content[:100]}...")
    print(f"Embedding vector: {embedded_chunk.embedding[:50]}...")

# Generate query embedding for search
query = "What is the main topic of research?"
query_embedding = embedding_generator.generate_query_embedding(query)
print(f"\nQuery embedding shape: {query_embedding.shape}")
print(f"Query embedding vector: {query_embedding[:50]}...")

INFO:src.embeddings.embedding_generator:Initializing embedding model: BAAI/bge-small-en-v1.5
INFO:src.embeddings.embedding_generator:Model initialized successfully. Embedding dimension: 384
INFO:src.embeddings.embedding_generator:Generating embeddings for 84 chunks
INFO:src.embeddings.embedding_generator:Successfully generated 84 embeddings


Generated embeddings for 84 chunks
Embedding dimension: 384

Chunk 1:
Source: raft.pdf
Type: pdf
Embedding shape: (384,)
Content preview: RAFT: Adapting Language Model to Domain Specific RAG
Tianjun Zhang Shishir G. Patil Naman Jain Sheng...
Embedding vector: [-4.3315459e-02  3.0148618e-02 -1.9829737e-02 -1.0716271e-02
  2.3525460e-02 -2.3313519e-02 -5.7224091e-02  1.4782891e-02
  3.9421041e-02 -3.9844923e-02  1.4756397e-02 -4.6600543e-02
  1.9644288e-02  3.7566558e-02  3.8175888e-02  3.7036702e-02
  1.2093888e-02  5.1872578e-02  2.7578833e-02  2.7658312e-02
  8.9942496e-03  1.5821070e-03 -6.4824166e-04 -1.4120574e-02
 -5.1528174e-02  6.2688198e-03 -2.5406437e-02 -1.3140347e-02
 -2.7181443e-02 -2.3928148e-01  4.2123292e-03  7.4709230e-03
  1.7114243e-02  2.9168392e-02  1.7402349e-03 -8.1382063e-04
 -3.5261698e-02 -2.1194108e-02  2.6068753e-02  1.9246899e-02
 -2.2787806e-04 -7.8815585e-03  3.6228679e-03 -8.4047886e-03
  2.2571726e-02 -7.2218925e-02 -1.6292971e-02 -1.4067589e-02
 -7.2960

## 5. Vector Database Storage

All embedded chunks are stored in a Milvus vector database together with their citation metadata, enabling fast similarity search and precise citations.

### Vector Database Features:
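- **Milvus Lite**: A lightweight Milvus that runs inside the Python process and stores data in a local file (`./milvus_lite.db` here), so no separate server is needed
- **IVF_FLAT indexing**: Clusters vectors into `nlist` buckets and scans only the closest buckets at query time, trading a little recall for speed (the Milvus Lite documentation has stated that Lite supports only the FLAT index type, so the requested IVF_FLAT settings may not take effect there)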
- **Rich metadata**: Complete citation information stored with vectors
- **Flexible search**: Score thresholding and result limiting
- **Precise retrieval**: Get exact chunks by ID for citations
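To see what IVF_FLAT does, here is a toy, pure-Python version. Vectors are bucketed by their nearest centroid, and a query scans only the `nprobe` closest buckets, computing exact ("flat") L2 distances inside them. A real index learns `nlist` centroids with k-means (the log below shows `nlist=1024`). The hypothetical `ToyIVFFlat` class takes fixed centroids for brevity and only illustrates the idea; it is not Milvus code. Because L2 is a distance, smaller scores mean closer matches.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance: 0.0 for identical vectors, larger means less similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyIVFFlat:
    """Inverted-file index over uncompressed vectors, bucketed by nearest centroid."""

    def __init__(self, centroids: List[Vector]):
        # A real IVF index learns `nlist` centroids with k-means; here they are given.
        self.centroids = centroids
        self.lists: Dict[int, List[Tuple[str, Vector]]] = {i: [] for i in range(len(centroids))}

    def _closest_lists(self, vec: Vector, k: int) -> List[int]:
        order = sorted(range(len(self.centroids)), key=lambda i: l2(vec, self.centroids[i]))
        return order[:k]

    def insert(self, chunk_id: str, vec: Vector) -> None:
        self.lists[self._closest_lists(vec, 1)[0]].append((chunk_id, vec))

    def search(self, query: Vector, limit: int = 5, nprobe: int = 1,
               max_distance: float = math.inf) -> List[Tuple[str, float]]:
        # Scan only the nprobe buckets nearest the query, with exact distances inside them.
        hits = [
            (chunk_id, l2(query, vec))
            for list_id in self._closest_lists(query, nprobe)
            for chunk_id, vec in self.lists[list_id]
        ]
        hits = [hit for hit in hits if hit[1] <= max_distance]  # score thresholding
        return sorted(hits, key=lambda hit: hit[1])[:limit]     # ascending distance


index = ToyIVFFlat(centroids=[[0.0, 0.0], [10.0, 10.0]])
for chunk_id, vec in [("a", [0.0, 1.0]), ("b", [1.0, 0.0]), ("c", [10.0, 9.0])]:
    index.insert(chunk_id, vec)
print(index.search([0.0, 0.9], limit=3))             # bucket near the origin only
print(index.search([0.0, 0.9], limit=3, nprobe=2))   # both buckets scanned
```

Raising `nprobe` scans more buckets, which improves recall at the cost of speed. With `nprobe` equal to the number of buckets, the search becomes exhaustive.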

In [8]:
from src.vector_database.milvus_vector_db import MilvusVectorDB

# Initialize Milvus vector database
vector_db = MilvusVectorDB(
    db_path="./milvus_lite.db",
    collection_name="notebookLM_collection"
)

# Create index for fast similarity search
vector_db.create_index()

# Insert all embedded chunks
vector_db.insert_embeddings(embedded_chunks)
print(f"Stored {len(embedded_chunks)} embeddings in vector database")

# Perform similarity search
query_embedding = embedding_generator.generate_query_embedding(
    "Most prominent quantum computing algorithms"
)

search_results = vector_db.search(
    query_embedding,
    limit=5,
)

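print(f"\nFound {len(search_results)} relevant chunks:")
# In the output below, scores rise down the list, which suggests a distance metric
# (such as L2) where a lower score means a closer match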
for result in search_results:
    print(f"Score: {result['score']:.3f}")
    print(f"Source: {result['citation']['source_file']}")
    print(f"Type: {result['citation']['source_type']}")
    print(f"Content: {result['content'][:100]}...")
    print("-" * 40)

# Retrieve specific chunk by ID
chunk_id = search_results[0]['id']
chunk_data = vector_db.get_chunk_by_id(chunk_id)
print(f"\nRetrieved chunk: {chunk_data['source_file']}")

  from pkg_resources import DistributionNotFound, get_distribution
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
INFO:src.vector_database.milvus_vector_db:Milvus client initialized with database: ./milvus_lite.db
INFO:src.vector_database.milvus_vector_db:Collection 'notebookLM_collection' created successfully
INFO:src.vector_database.milvus_vector_db:Creating IVF_FLAT index with nlist=1024
INFO:src.vector_database.milvus_vector_db:Index created successfully
INFO:src.vector_database.milvus_vector_db:Inserted 84 embeddings into database
INFO:src.vector_database.milvus_vector_db:Search completed: 5 results found
INFO:src.vector_database.milvus_vector_db:Attempting to retrieve chunk with ID: txt_2_55a5ca52
INFO:src.vector_database.milvus_vector_db:Query ret

Stored 84 embeddings in vector database

Found 5 relevant chunks:
Score: 0.465
Source: notes.txt
Type: txt
Content: Quantum Circuit: A sequence of quantum gates applied to an initial state of qubits, represented by a...
----------------------------------------
Score: 0.537
Source: notes.txt
Type: txt
Content: Developing robust Quantum Error Correction (QEC) codes is necessary for building a fault-tolerant, l...
----------------------------------------
Score: 0.539
Source: notes.txt
Type: txt
Content: Quantum Computing Fundamentals
I. Core Concepts
Qubit (Quantum Bit): The basic unit of information i...
----------------------------------------
Score: 0.645
Source: notes.txt
Type: txt
Content: Measurement/Collapse: The act of observing a qubit forces it out of its superposition and into a def...
----------------------------------------
Score: 0.688
Source: notes.txt
Type: txt
Content: Superconducting Qubits: Qubits built using superconducting circuits (like Josephson junctions) opera...


## 6. Memory Layer Integration

The memory layer uses Zep to maintain conversation context and user sessions, enabling personalized and contextual interactions.

### Memory Features:
- **Session management**: Persistent user sessions across interactions
- **Context preservation**: Maintains conversation history and context
- **Source tracking**: Remembers which documents were referenced
- **User personalization**: Adapts responses based on user preferences
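Zep handles storage and fact extraction server-side. To make the session model concrete, here is a hypothetical in-memory stand-in keyed by `(user_id, session_id)`. It saves turns, renders recent turns as prompt context, and clears a session. Unlike Zep, it simply replays raw turns instead of extracting facts and entities. `InMemorySessionStore` and `Turn` are illustrative names, not part of `NotebookMemoryLayer`.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass
class Turn:
    user_message: str
    assistant_response: str
    sources: List[str] = field(default_factory=list)
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


class InMemorySessionStore:
    """Hypothetical local stand-in for a Zep-style store, keyed by (user_id, session_id)."""

    def __init__(self) -> None:
        self._turns: Dict[Tuple[str, str], List[Turn]] = defaultdict(list)

    def save_turn(self, user_id: str, session_id: str, turn: Turn) -> None:
        self._turns[(user_id, session_id)].append(turn)

    def context(self, user_id: str, session_id: str, last_n: int = 3) -> str:
        """Render the most recent turns as plain text for the next prompt."""
        recent = self._turns.get((user_id, session_id), [])[-last_n:]
        return "\n".join(f"User: {t.user_message}\nAssistant: {t.assistant_response}" for t in recent)

    def clear_session(self, user_id: str, session_id: str) -> None:
        """Forget one session; other sessions of the same user are untouched."""
        self._turns.pop((user_id, session_id), None)


store = InMemorySessionStore()
store.save_turn("demo_user", "demo_session", Turn("What is RAFT?", "(assistant answer)", ["raft.pdf"]))
print(store.context("demo_user", "demo_session"))
```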

In [9]:
from src.memory.memory_layer import NotebookMemoryLayer

# Initialize memory layer with Zep
# create_new_session=True deletes any existing session with this ID and starts a fresh one (see the log below)
memory = NotebookMemoryLayer(
    user_id="demo_user",
    session_id="demo_session",
    create_new_session=True
)

# The memory layer is now bound to this user/session pair
print(f"Memory initialized for user: {memory.user_id}")
print(f"Session ID: {memory.session_id}")

# Memory will be used by RAG system to:
# 1. Maintain conversation context
# 2. Store user preferences
# 3. Track document interactions
# 4. Enable follow-up questions

# Memory stores all conversation context for future reference
print("Memory layer ready for conversation tracking")

INFO:httpx:HTTP Request: GET https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET https://api.getzep.com/api/v2/users/demo_user "HTTP/1.1 200 OK"
INFO:src.memory.memory_layer:Using existing user: demo_user
INFO:httpx:HTTP Request: DELETE https://api.getzep.com/api/v2/threads/demo_session "HTTP/1.1 200 OK"
INFO:src.memory.memory_layer:Deleted previous session: demo_session
INFO:httpx:HTTP Request: POST https://api.getzep.com/api/v2/threads "HTTP/1.1 201 Created"
INFO:src.memory.memory_layer:Created new session: demo_session
INFO:src.memory.memory_layer:NotebookMemoryLayer initialized for user demo_user, session demo_session


Memory initialized for user: demo_user
Session ID: demo_session
Memory layer ready for conversation tracking


## 7. RAG Query Processing

The RAG (Retrieval-Augmented Generation) system combines vector search with LLM generation to provide accurate, cited responses grounded in our source documents.

### RAG Features:
- **Semantic retrieval**: Finds most relevant chunks using vector similarity
- **Citation integration**: Automatic inline citations with source tracking
- **Context optimization**: Intelligent chunk selection and formatting
- **LLM integration**: Generates the answer with an OpenAI chat model (`gpt-4o-mini` in this notebook), called through LiteLLM
- **Source transparency**: Complete traceability of information sources
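The citation mechanics can be sketched as plain string handling. The sketch numbers the retrieved chunks, asks the model to cite them as `[n]`, and maps the numbers in the answer back to sources. The prompt wording and the `format_context`, `build_prompt`, and `cited_sources` helpers are assumptions, not the prompt `RAGGenerator` actually uses.

```python
import re
from typing import Dict, List


def format_context(results: List[Dict]) -> str:
    """Number each retrieved chunk so the model can cite it inline as [1], [2], ..."""
    blocks = []
    for i, result in enumerate(results, start=1):
        location = result["source_file"]
        if result.get("page_number"):
            location += f", page {result['page_number']}"
        blocks.append(f"[{i}] ({location})\n{result['content']}")
    return "\n\n".join(blocks)


def build_prompt(query: str, results: List[Dict]) -> str:
    """Grounded-answer prompt; the wording is illustrative."""
    return (
        "Answer the question using only the numbered sources below. "
        "Cite each claim with its source number, e.g. [1].\n\n"
        f"Sources:\n{format_context(results)}\n\nQuestion: {query}"
    )


def cited_sources(response: str, results: List[Dict]) -> List[Dict]:
    """Map [n] markers in the answer back to retrieved chunks, ignoring invalid numbers."""
    numbers = sorted({int(n) for n in re.findall(r"\[(\d+)\]", response)})
    return [results[n - 1] for n in numbers if 1 <= n <= len(results)]


results = [
    {"source_file": "raft.pdf", "page_number": 1, "content": "RAFT: Adapting Language Model to Domain Specific RAG"},
    {"source_file": "notes.txt", "content": "Quantum Computing Fundamentals"},
]
print(build_prompt("What is RAFT?", results))
```

Mapping citations back after generation means the source list contains only the chunks the model actually cited. The full retrieval set can be larger; for example, the log below shows 10 search results but 4 sources used.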

In [10]:
from src.generation.rag import RAGGenerator

# Initialize RAG system
rag_generator = RAGGenerator(
    embedding_generator=embedding_generator,
    vector_db=vector_db,
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    model_name="gpt-4o-mini"
)

# Process a user query
user_query = "What is RAFT and how does it differ from RAG and fine tuning?"

# RAG system performs:
# 1. Query embedding generation
# 2. Vector similarity search
# 3. Context formatting with citations
# 4. LLM response generation
# 5. Citation integration

rag_result = rag_generator.generate_response(user_query)

print(f"Query: {rag_result.query}")
print(f"Response: {rag_result.response}")
print(f"Sources used: {len(rag_result.sources_used)}")

# Show citation details
for i, source in enumerate(rag_result.sources_used, 1):
    print(f"\n[{i}] {source['source_file']}")
    if source.get('page_number'):
        print(f"    Page: {source['page_number']}")
    print(f"    Type: {source['source_type']}")
    print(f"    Relevance: {source['relevance_score']:.3f}")

# The response includes inline citations like [1], [2], etc.
print("\nCitation summary:")
print(rag_result.get_citation_summary())

INFO:src.generation.rag:RAG Generator initialized with gpt-4o-mini
INFO:src.generation.rag:Generating response for: 'What is RAFT and how does it differ from RAG and f...'
INFO:src.vector_database.milvus_vector_db:Search completed: 10 results found
22:26:48 - LiteLLM:INFO: utils.py:3258 - 
LiteLLM completion() model= gpt-4o-mini; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= gpt-4o-mini; provider = openai
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
22:26:55 - LiteLLM:INFO: utils.py:1260 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
INFO:src.generation.rag:Response generated successfully using 4 sources


Query: What is RAFT and how does it differ from RAG and fine tuning?
Response: RAFT, which stands for "Adapting Language Model to Domain Specific RAG," is a training strategy designed to enhance the performance of language models in answering questions within specific domains, particularly in "open-book" settings where the model can reference a set of documents during inference [4]. 

RAFT differs from the traditional RAG (Retrieval-Augmented Generation) approach in that it adapts large language models (LLMs) to read solutions from a curated set of positive and negative documents, rather than relying solely on the outputs from a retriever that mixes memorization and reading [2]. This adaptation allows RAFT to better process context and extract relevant information, which is crucial for answering questions accurately.

In comparison to standard fine-tuning methods, RAFT incorporates both instructional tuning and context comprehension into its training dataset. This dual approach helps p

## 8. Memory Storage for Conversations

Each RAG interaction can be saved to memory to preserve context and keep the conversation continuous. Here's what a sample conversation turn looks like:

```python
conversation_turn = {
    "user_message": user_query,
    "assistant_response": rag_result.response,
    "sources_used": rag_result.sources_used,
    "timestamp": datetime.datetime.now().isoformat(),  # cell 1 imports the datetime module
    "session_metadata": {
        "retrieval_count": rag_result.retrieval_count,
        "generation_tokens": rag_result.generation_tokens
    }
}
```

### Memory Storage Features:
- **Per-turn storage**: Every saved conversation turn is preserved
- **Rich metadata**: Includes sources, timestamps, and generation stats
- **Context continuity**: Enables natural follow-up conversations
- **Session management**: Clean session boundaries for different topics

In [11]:
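# Save the conversation turn (query, response, and sources) to memory
memory.save_conversation_turn(rag_result)
# Zep indexes new messages asynchronously; wait before reading context back
memory.wait_for_indexing()

# Get the conversation context (facts and entities Zep extracted)
context = memory.get_conversation_context()
print(f"\nConversation Context:\n{context}")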

INFO:httpx:HTTP Request: POST https://api.getzep.com/api/v2/threads/demo_session/messages "HTTP/1.1 201 Created"
INFO:httpx:HTTP Request: POST https://api.getzep.com/api/v2/threads/demo_session/messages "HTTP/1.1 201 Created"
INFO:httpx:HTTP Request: POST https://api.getzep.com/api/v2/graph "HTTP/1.1 202 Accepted"
INFO:src.memory.memory_layer:Saved conversation turn with 4 sources
INFO:src.memory.memory_layer:Waiting 10s for Zep indexing...
INFO:httpx:HTTP Request: GET https://api.getzep.com/api/v2/threads/demo_session/context "HTTP/1.1 200 OK"



Conversation Context:

FACTS and ENTITIES represent relevant context to the current conversation.

# These are the most relevant facts and their valid date ranges
# format: FACT (Date range: from - to)
<FACTS>
  - RAFT is a specialized training method that enhances the capabilities of language models in specific domains. (2025-10-06 12:10:00 - present)
  - RAFT differs from traditional RAG. (2025-10-06 12:01:23 - present)
  - RAFT has an advantage in processing context more effectively than DSF models. (2025-10-06 12:01:23 - present)
  - RAFT involves organizing the training dataset in a way that includes distractor documents and ensures some portions lack oracle documents in their context. (2025-10-06 12:01:23 - present)
  - RAFT differs from standard fine-tuning methods. (2025-10-06 12:01:23 - present)
</FACTS>

# These are the most relevant entities
# Name: ENTITY_NAME
# Label: entity_label (if present)
# Attributes: (if present)
#   attr_name: attr_value
# Summary: entity summary


Memory enables:
1. Follow-up questions with context
2. Reference to previous discussions
3. Personalized responses based on history
4. Source preference learning
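Zep returns context as text with a `<FACTS>` block, as in the output above. The sketch below defines two helpers. The first pulls the fact sentences out of that block. The second appends memory context to a follow-up question with a clear separator and a length cap. Both helpers and the 2,000-character cap are hypothetical.

```python
import re
from typing import List


def extract_facts(context: str) -> List[str]:
    """Return the fact sentences from a <FACTS> block, without their date ranges."""
    block = re.search(r"<FACTS>(.*?)</FACTS>", context, re.DOTALL)
    if not block:
        return []
    facts = []
    for line in block.group(1).splitlines():
        # Lines look like "  - Some fact. (2025-10-06 12:01:23 - present)"
        match = re.match(r"^\s*-\s*(.*?)\s*(?:\([^()]*\))?\s*$", line)
        if match and match.group(1):
            facts.append(match.group(1))
    return facts


def build_followup_input(query: str, context: str, max_context_chars: int = 2000) -> str:
    """Put memory context after the question, clearly separated and length-capped."""
    context = context.strip()
    if not context:
        return query
    if len(context) > max_context_chars:
        context = context[:max_context_chars].rstrip() + " ..."
    return f"{query}\n\nConversation Context:\n{context}"


sample = """<FACTS>
  - RAFT differs from traditional RAG. (2025-10-06 12:01:23 - present)
  - RAFT differs from standard fine-tuning methods. (2025-10-06 12:01:23 - present)
</FACTS>"""
facts = extract_facts(sample)
print(build_followup_input("Can you elaborate more on the distractor and oracle document part?",
                           "\n".join(f"- {fact}" for fact in facts)))
```

Keeping the question first and separated from the context prevents the two from running together into one sentence. Capping the context also stops a long memory dump from crowding out the retrieved sources.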

In [12]:
# Example follow-up query
followup_query = "Can you elaborate more on the distractor and oracle document part?"

# Memory provides context from the previous conversation; separate it clearly from the question
followup_result = rag_generator.generate_response(f"{followup_query}\n\nConversation Context:\n{context}")
# followup_result = rag_generator.generate_response(followup_query)
print(f"\nFollow-up response: {followup_result.response}")

# Clear session when needed
memory.clear_session()
print("\nSession cleared - fresh conversation context")

INFO:src.generation.rag:Generating response for: 'Can you elaborate more on the distractor and oracl...'
INFO:src.vector_database.milvus_vector_db:Search completed: 10 results found
22:28:47 - LiteLLM:INFO: utils.py:3258 - 
LiteLLM completion() model= gpt-4o-mini; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= gpt-4o-mini; provider = openai
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
22:28:57 - LiteLLM:INFO: utils.py:1260 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
INFO:src.generation.rag:Response generated successfully using 4 sources



Follow-up response: In the context of the RAFT training method, the use of oracle and distractor documents plays a crucial role in enhancing the performance of language models in specific domains. 

1. **Oracle Documents**: These are the documents that contain the relevant information needed to answer specific questions. In the RAFT framework, for a certain fraction (P) of the questions in the dataset, the oracle document is retained alongside distractor documents. This means that the model is trained with access to the correct context that directly relates to the questions being asked [1].

2. **Distractor Documents**: These are documents that do not contain answer-relevant information but are included in the training process to challenge the model. For the remaining fraction (1 - P) of the questions, only distractor documents are included, without any oracle document. This approach is designed to help the model learn to discern relevant information from irrelevant information, there

INFO:httpx:HTTP Request: DELETE https://api.getzep.com/api/v2/threads/demo_session "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.getzep.com/api/v2/threads "HTTP/1.1 201 Created"
INFO:src.memory.memory_layer:Session demo_session cleared and recreated



Session cleared - fresh conversation context


## 9. Podcast Script Generation

Turn source material (documents or, as in this example, a scraped web page) into an engaging two-speaker podcast conversation using LLM-generated scripts.

In [None]:
from src.podcast.script_generator import PodcastScriptGenerator

# Initialize script generator
script_generator = PodcastScriptGenerator(
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    model_name="gpt-4o-mini"
)

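# Scrape a web page and generate a podcast script from its content
# (this reuses `url` and `web_chunks`, overwriting the section 3 results)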
url = "https://thinkingmachines.ai/blog/lora/"
web_chunks = web_scraper.scrape_url(url)
# document_content = "\n\n".join([chunk.content for chunk in doc_chunks[:10]])

podcast_script = script_generator.generate_script_from_website(
    website_chunks=web_chunks,
    source_url=url,
    podcast_style="conversational",  # or "educational", "interview", "debate"
    target_duration="10 minutes"
)

print(f"\nGenerated podcast script:")
print(f"Source: {podcast_script.source_document}")
print(f"Total lines: {podcast_script.total_lines}")
# print(f"Estimated duration: {podcast_script.estimated_duration}")

# Display the conversation
for i, line_dict in enumerate(podcast_script.script, 1):
    speaker, dialogue = next(iter(line_dict.items()))
    print(f"\n{i}. {speaker}: {dialogue}")

# Save script as JSON
script_json = podcast_script.to_json()

os.makedirs("outputs", exist_ok=True)

# Write as UTF-8 so non-ASCII characters in the dialogue (e.g. curly quotes) save on any platform
with open("outputs/generated_podcast.json", "w", encoding="utf-8") as f:
    f.write(script_json)
print("\nScript saved to outputs/generated_podcast.json")

INFO:src.podcast.script_generator:Podcast script generator initialized with gpt-4o-mini
INFO:src.web_scraping.web_scraper:Scraping URL: https://thinkingmachines.ai/blog/lora/
INFO:src.web_scraping.web_scraper:Successfully scraped https://thinkingmachines.ai/blog/lora/: 46 chunks created
INFO:src.podcast.script_generator:Generating podcast script from website: https://thinkingmachines.ai/blog/lora/
22:30:26 - LiteLLM:INFO: utils.py:3258 - 
LiteLLM completion() model= gpt-4o-mini; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= gpt-4o-mini; provider = openai
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
22:30:46 - LiteLLM:INFO: utils.py:1260 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
INFO:src.podcast.script_generator:Generated website script with 20 lines



Generated podcast script:
Source: https://thinkingmachines.ai/blog/lora/
Total lines: 20

1. Speaker 1: Welcome everyone to today's episode of our podcast! I'm thrilled to have you join us as we explore some intriguing insights from a recent document titled 'LoRA Without Regret.'.

2. Speaker 2: Thanks for tuning in! This topic is super relevant in the world of AI and machine learning, especially when it comes to fine-tuning large language models. So, what exactly is LoRA?

3. Speaker 1: Great question! LoRA stands for Low-Rank Adaptation, and it's a method used to fine-tune large language models more efficiently. Instead of adjusting the entire model, it focuses on a smaller set of parameters, making the process faster and less resource-intensive.

4. Speaker 2: Exactly! The document mentions that today's language models can have trillions of parameters. It just doesn’t make sense to use all of that when you only need to make specific updates, right?

5. Speaker 1: Absolutely! Using 


### Podcast Script Features:
- **Multiple styles**: Conversational, educational, interview, debate formats
- **Duration control**: 5, 10, 15, or 20-minute target lengths
- **Natural dialogue**: AI-generated conversations between two speakers
- **JSON export**: Structured format for further processing
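The `podcast_script` object used above exposes `script`, `source_document`, `total_lines`, and `to_json()`. The display loop implies that each `script` entry is a single-key `{speaker: dialogue}` dict. A minimal stand-in under that assumption might look like the following. `PodcastScriptSketch` and `estimated_minutes` (which assumes about 150 spoken words per minute) are hypothetical and are not the project's class:

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


# Hypothetical stand-in mirroring only the attributes the notebook uses:
# script, source_document, total_lines, and to_json().
@dataclass
class PodcastScriptSketch:
    source_document: str
    # Each entry maps one speaker label to one line of dialogue
    script: list[dict[str, str]] = field(default_factory=list)

    @property
    def total_lines(self) -> int:
        return len(self.script)

    def estimated_minutes(self, words_per_minute: int = 150) -> float:
        # Assumption: ~150 spoken words per minute (not taken from the project)
        words = sum(len(text.split()) for line in self.script for text in line.values())
        return words / words_per_minute

    def to_json(self) -> str:
        # ensure_ascii=False keeps typographic quotes such as ’ readable in the file
        return json.dumps(
            {
                "source_document": self.source_document,
                "total_lines": self.total_lines,
                "script": self.script,
            },
            ensure_ascii=False,
            indent=2,
        )


# Usage: iterate the script exactly as the display loop above does
demo = PodcastScriptSketch(
    source_document="https://example.com/post",
    script=[
        {"Speaker 1": "Welcome to the show!"},
        {"Speaker 2": "Thanks for having me."},
    ],
)
for i, line in enumerate(demo.script, 1):
    speaker, dialogue = next(iter(line.items()))
    print(f"{i}. {speaker}: {dialogue}")
print(f"~{demo.estimated_minutes():.2f} minutes")
```

A word-count estimate like this is one rough way to compare a generated script against the requested target duration.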

## 10. Text-to-Speech Audio Generation

Convert podcast scripts into high-quality multi-speaker audio using Kokoro TTS.
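In the install cell below, pip reports `Requires-Python >=3.10,<3.13` for every Kokoro release from 0.8.1 through 0.9.4. A quick interpreter check can therefore flag an unsupported Python before you try to install. This is a minimal sketch. The bounds are copied from that pip output and may change in later releases:

```python
import sys

# Bounds copied from pip's "Requires-Python >=3.10,<3.13" message for kokoro 0.8.1 through 0.9.4;
# later releases may change them, so treat these values as an assumption.
KOKORO_MIN_PYTHON = (3, 10)
KOKORO_MAX_PYTHON_EXCLUSIVE = (3, 13)


def kokoro_python_supported(version=None):
    """Return True if the (major, minor) Python version is inside Kokoro's declared range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return KOKORO_MIN_PYTHON <= major_minor < KOKORO_MAX_PYTHON_EXCLUSIVE


if not kokoro_python_supported():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} is outside the range "
        "kokoro>=0.9.4 declares; switch to a 3.10, 3.11, or 3.12 environment before installing."
    )
```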

In [39]:
!pip install "kokoro>=0.9.4" soundfile

ERROR: Ignored the following versions that require a different python version: 0.8.1 Requires-Python >=3.10,<3.13; 0.8.2 Requires-Python >=3.10,<3.13; 0.8.3 Requires-Python >=3.10,<3.13; 0.8.4 Requires-Python >=3.10,<3.13; 0.9.2 Requires-Python >=3.10,<3.13; 0.9.4 Requires-Python >=3.10,<3.13
ERROR: Could not find a version that satisfies the requirement kokoro>=0.9.4 (from versions: 0.2.1, 0.2.2, 0.2.3, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.7.4, 0.7.6, 0.7.8, 0.7.9, 0.7.11, 0.7.12, 0.7.13, 0.7.14, 0.7.15, 0.7.16)
ERROR: No matching distribution found for kokoro>=0.9.4

In [38]:
from src.podcast.text_to_speech import PodcastTTSGenerator

# Initialize TTS generator
tts_generator = PodcastTTSGenerator()

# Generate audio from the podcast script created in the previous section
output_dir = "outputs/"

audio_files = tts_generator.generate_podcast_audio(
    podcast_script=podcast_script,
    output_dir=output_dir,
    combine_audio=True
)

print(f"Generated {len(audio_files)} audio files:")
for audio_file in audio_files:
    file_name = Path(audio_file).name
    print(f"  - {file_name}")

    # Label each file by its naming pattern
    if "complete_podcast" in file_name:
        print("    📻 Complete podcast ready!")
    elif "speaker_1" in file_name:
        print("    🎤 Speaker 1 segment")
    elif "speaker_2" in file_name:
        print("    🎤 Speaker 2 segment")

print("\nPodcast generation complete!")
print(f"Total segments: {podcast_script.total_lines}")
print(f"Output directory: {output_dir}")

# The combined file is ready for playback or distribution; default to None
# instead of raising StopIteration if no combined file was produced
complete_podcast = next((f for f in audio_files if "complete_podcast" in Path(f).name), None)
print(f"Complete podcast: {complete_podcast}")

Kokoro not installed. Install with: pip install kokoro>=0.9.4


ImportError: Kokoro TTS not available. Install with: pip install kokoro>=0.9.4 soundfile

### Text-to-Speech Features:
- **Multi-speaker support**: Distinct voices for Speaker 1 and Speaker 2
- **Natural speech**: Kokoro TTS for high-quality, natural-sounding audio
- **Segment generation**: Individual files for each dialogue segment
- **Audio combining**: Automatic creation of complete podcast file
- **Professional quality**: 24kHz sample rate for clear audio
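The **Audio combining** step can be illustrated with the standard library's `wave` module. This is a minimal sketch and not the project's `PodcastTTSGenerator` code. It assumes each segment is already mono 16-bit PCM at 24 kHz, so real TTS output, which is often floating-point, would first need converting. The 0.3-second gap and the `combine_segments` and `tone` helpers are hypothetical choices:

```python
import array
import math
import os
import sys
import tempfile
import wave

SAMPLE_RATE = 24_000  # 24 kHz, the rate listed in the features above


def combine_segments(segments, out_path, gap_seconds=0.3, sample_rate=SAMPLE_RATE):
    """Concatenate mono 16-bit PCM segments into one WAV file with silence between them.

    Each segment is an array('h') of samples for one dialogue line.
    Returns the number of frames written.
    """
    silence = array.array("h", [0] * round(gap_seconds * sample_rate))
    combined = array.array("h")
    for i, segment in enumerate(segments):
        if i > 0:
            combined.extend(silence)  # short pause between dialogue lines
        combined.extend(segment)

    if sys.byteorder == "big":
        combined.byteswap()  # WAV PCM data is little-endian

    with wave.open(out_path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(combined.tobytes())
    return len(combined)


def tone(seconds, freq, sample_rate=SAMPLE_RATE, amplitude=8000):
    """Generate a sine tone as a placeholder for a synthesized speech segment."""
    n = round(seconds * sample_rate)
    return array.array(
        "h",
        (int(amplitude * math.sin(2 * math.pi * freq * t / sample_rate)) for t in range(n)),
    )


# Usage: two placeholder "speaker" segments combined into one file
out_path = os.path.join(tempfile.gettempdir(), "combined_podcast_demo.wav")
n_frames = combine_segments([tone(0.5, 220.0), tone(0.5, 330.0)], out_path)
print(f"Wrote {n_frames} frames to {out_path}")
```

Inserting a fixed silence between segments is the simplest way to keep speaker turns from running together. A production pipeline might instead vary the pause or crossfade the segments.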

## Complete Workflow Integration

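In the main application, these components run as a single pipeline. Sources (documents, audio, and web pages) are processed into chunks with citation metadata, then embedded and stored in the vector database. Those chunks are retrieved to answer queries with citations, conversations are tracked across turns by the memory layer, and content can optionally be turned into a podcast script and audio.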
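To make the hand-offs between stages concrete, here is a toy sketch of the ingest, retrieve, and cite path that uses only the standard library. It is not the application's code. Sentence splitting, bag-of-words counts, and an in-memory list stand in for the real document processor, embedding model, and vector database. `Chunk`, `chunk_sentences`, `embed`, `cosine`, and `answer_with_citation` are hypothetical names:

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    # Like the notebook's DocumentChunk idea: content plus character positions for citations
    source: str
    start: int
    end: int
    content: str


def chunk_sentences(text, source):
    """Split text into sentence chunks, recording start/end character offsets."""
    return [
        Chunk(source, m.start(), m.end(), m.group())
        for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]?", text)
    ]


def embed(text):
    """Toy bag-of-words vector, a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def answer_with_citation(query, chunks, k=1):
    """Retrieve the top-k chunks and format each with a [source:start-end] citation."""
    q = embed(query)
    index = [(embed(c.content), c) for c in chunks]  # stands in for the vector database
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)[:k]
    return [f"{c.content.strip()} [{c.source}:{c.start}-{c.end}]" for _, c in ranked]


# Usage: ingest a tiny document, then answer a question with a citation
doc = (
    "LoRA fine-tunes a small set of low-rank parameters. "
    "Podcast scripts are generated as dialogue between two speakers."
)
chunks = chunk_sentences(doc, "notes.txt")
print(answer_with_citation("How does LoRA fine-tune parameters?", chunks))
```

Keeping character offsets on every chunk is what lets the final answer point back to an exact span of the source, which is the same role the page and position metadata play in Section 1.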

## Summary

This NotebookLM implementation provides a complete pipeline from raw content to interactive AI conversations and podcast generation:

1. **Multi-modal processing**: Documents, audio, video, and web content
2. **Semantic search**: Vector embeddings for intelligent retrieval
3. **Cited responses**: Transparent source attribution with interactive citations
4. **Memory integration**: Contextual conversations with history
5. **Podcast generation**: AI-powered script creation and text-to-speech

The system is designed to be modular, scalable, and production-ready, with each component handling specific aspects of the content-to-conversation pipeline.