In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Open AI API Key:")
open_ai_model = "gpt-5-nano-2025-08-07"

Open AI API Key:··········


# Retrieval Augmented Generation (RAG)

[Meta AI introduced the RAG method](https://ai.meta.com/blog/retrieval-augmented-generation-streamlining-the-creation-of-intelligent-natural-language-processing-models/), emphasizing its potential for knowledge-intensive tasks.

## 🔍 **1. What is RAG?**

- RAG, or Retrieval Augmented Generation, boosts large language models (LLMs) by tapping into external knowledge sources.

- Meta AI pioneered RAG to tackle knowledge-heavy tasks efficiently.

- Combines info retrieval with text generation, enabling LLMs to access fresh, reliable info.

- Ideal for tasks needing accurate, current data.

## 🤔 **2. Why Was RAG Developed?**

- LLMs excel in mimicking human text but face limitations.

- High training/fine-tuning costs.

- Knowledge is static, outdated post-training.

- "Hallucinating" issue: confidently giving wrong info.

- RAG overcomes these by merging LLM prowess with real-time data access.

## 🛠️ **3. How Does RAG Work?**

- On receiving a query (like a question), RAG fetches relevant documents/passages from external sources (like Wikipedia).

- Blends these retrieved docs with the query to create an enriched context. This is then processed by a text generator (e.g., GPT-3) to generate the final answer.
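The two steps above can be sketched in plain Python. This is a toy illustration, not how LangChain does it: the knowledge base, the word-overlap scoring, and the function names `retrieve` and `build_augmented_prompt` are all assumptions made for the example, and the final call to a text generator is left out.

```python
import re

# Hypothetical mini knowledge base standing in for an external source.
KNOWLEDGE_BASE = [
    "RAG combines information retrieval with text generation.",
    "FAISS is a library for efficient similarity search over vectors.",
    "Text splitters break long documents into smaller chunks.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, k=1):
    """Return the k passages that share the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        passages,
        key=lambda passage: len(query_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:k]


def build_augmented_prompt(query, passages):
    """Blend the retrieved passages with the query into one prompt string."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


query = "What does RAG combine?"
top_passages = retrieve(query, KNOWLEDGE_BASE, k=1)
augmented_prompt = build_augmented_prompt(query, top_passages)
print(augmented_prompt)  # this string is what would be sent to the text generator
```

Real systems replace the word-overlap score with vector similarity over embeddings, which is what the rest of this notebook builds.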

<img src="https://docs.aws.amazon.com/images/sagemaker/latest/dg/images/jumpstart/jumpstart-fm-rag.jpg">

## 🌟 **4. Key Features of RAG:**

- RAG stays current, accessing the latest info, unlike static-knowledge LLMs.

- Integrates fresh info without the high cost of retraining the whole LLM.

- Sources reliable info, reducing wrong answers or "hallucinations."

## 🛠 **5. Practical Implementations:**

- Answering evolving topic questions.

- Useful in domains needing real-time accuracy (e.g., medical, legal).

- Boosts chatbots/virtual assistants with factual, updated replies.

## 📌 **In Summary:**

- RAG: Marrying vast LLM knowledge with the latest real-world info.

- Ensures models stay knowledgeable, up-to-date, and accurate.

# 🛠️ **RAG Implementation in LangChain**

1. 🧠 **LLM**: The brain of the system, generating human-like text.

2. 🌐 **Vector Store**: The heart of retrieval - stores text embeddings for quick, efficient access.

3. 🔍 **Vector Store Retriever**: The system's "search engine," finding relevant documents via vector similarities.

4. 🔄 **Embedder**: Transforms text into vectors, making it readable for the system.

5. 💬 **Prompt**: Captures the initial user query or statement, kicking off the process.

6. 📚 **Document Loader**: Manages the import and preparation of documents for processing.

7. 🧩 **Document Chunker**: Breaks down large documents into smaller segments for better efficiency.

8. 👤 **User Input**: The starting point, where the user's query activates the RAG workflow.


# 🌐 **The RAG System and Its Subsystems**

### 1. 🗂️ **Index Subsystem**
<img src="https://python.langchain.com/assets/images/rag_indexing-8160f90a90a33253d0154659cf7d453f.png">

   - **Components**: Embedder, Vector Store, Document Loader, Document Chunker.

   - **Function**: Processes and organizes data into an accessible format.

   - **Role**: Creates a searchable database of vectorized information.

### 2. 🔎 **Retrieval Subsystem**

   - **Components**: User Input, Prompt, Vector Store Retriever.

   - **Function**: Matches user queries with relevant data.

   - **Role**: Fetches the most pertinent information from the index based on user input.

### 3. 🤖 **Augment Subsystem**

   - **Components**: LLM, User Input, Retrieved Data.

   - **Function**: Integrates user queries with retrieved data.

   - **Role**: Generates accurate and context-rich responses, blending human-like text generation with factually correct information.


[Build a RAG agent with LangChain](https://docs.langchain.com/oss/python/langchain/rag)



<img src="https://mintcdn.com/langchain-5e9cc07a/I6RpA28iE233vhYX/images/rag_indexing.png?w=1100&fit=max&auto=format&n=I6RpA28iE233vhYX&q=85&s=675f55e100bab5e2904d27db01775ccc">

<img src="https://mintcdn.com/langchain-5e9cc07a/I6RpA28iE233vhYX/images/rag_retrieval_generation.png?w=1100&fit=max&auto=format&n=I6RpA28iE233vhYX&q=85&s=d390a6a758e688ec36352d30b22249b0">


Together, these subsystems form a seamless flow, transforming user queries into comprehensive and reliable responses.

# Load documents
LangChain provides many document loaders.

Check out [the documentation](https://docs.langchain.com/oss/python/integrations/document_loaders/index#common-file-types) to see how many are available.

## 📂 **Understanding Document Loaders in LangChain**

- 📚 LangChain document loaders load data from various sources into Document objects.

- 📄 A Document is text with metadata.

- 🌐 Loaders fetch data from text files, web pages, video transcripts, etc.

- 🔄 Main role: Retrieve data for further processing.

- 🛠️ Method: Use `load` to fetch data and return it as a list of Document objects.

- 🧠 Some loaders support lazy loading (data loads into memory only when needed).
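To make "text with metadata" and "lazy loading" concrete, here is a minimal sketch in plain Python. `SimpleDocument` and `InMemoryLoader` are hypothetical stand-ins invented for this example, not LangChain classes; they only mirror the `load` / `lazy_load` pattern described above.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class InMemoryLoader:
    """Hypothetical loader over a dict of {source_name: text}."""

    def __init__(self, sources: dict):
        self.sources = sources

    def lazy_load(self) -> Iterator[SimpleDocument]:
        # Yield one document at a time, so each is built only when requested.
        for name, text in self.sources.items():
            yield SimpleDocument(page_content=text, metadata={"source": name})

    def load(self) -> list:
        # Eager variant: materialize every document into a list.
        return list(self.lazy_load())


loader = InMemoryLoader({"a.txt": "First file.", "b.txt": "Second file."})
docs = loader.load()                   # all documents at once
first_lazy = next(loader.lazy_load())  # only the first document is built
print(docs[0].metadata, first_lazy.page_content)
```

The lazy variant matters mostly for large corpora, where holding every document in memory at once is wasteful.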

## 🔧 **How to Use Document Loaders**

1. 📥 Import the loader class from `langchain_community.document_loaders`.

2. 🏗️ Create an instance of your chosen class with the source to load (a URL for `WebBaseLoader`, a file or directory path for file-based loaders).

3. 🚀 Use `load()` to load the source into Document format.


In [22]:
from langchain_community.document_loaders import WebBaseLoader

coursera_2026_trends = WebBaseLoader("https://www.coursera.org/articles/ai-trends").load()

google_trends = WebBaseLoader("https://trends.withgoogle.com/trends/us/artificial-intelligence-search-trends/?hl=en-US").load()

forbes_gen_ai_trends = WebBaseLoader("https://www.forbes.com/sites/bernardmarr/2025/10/13/10-generative-ai-trends-in-2026-that-will-transform-work-and-life/").load()

In [24]:
forbes_gen_ai_trends[0]

Document(metadata={'source': 'https://www.forbes.com/sites/bernardmarr/2025/10/13/10-generative-ai-trends-in-2026-that-will-transform-work-and-life/', 'title': '10 Generative AI Trends In 2026 That Will Transform Work And Life', 'description': 'Generative AI is moving into a new phase in 2026, reshaping industries from entertainment to healthcare while creating fresh opportunities and challenges.', 'language': 'en'}, page_content="10 Generative AI Trends In 2026 That Will Transform Work And LifeNewslettersGamesShare a News TipFeaturedFeaturedBreaking NewsWhite House WatchDaily Cover StoriesAmerica's 2025 Top Wealth Management Teams Private Wealth List  | Paid ProgramThe Forbes CIO Next List: 2025 | Paid ProgramAmerica's 2025 Top Wealth Management Teams High Net Worth List When Teens Fundraise To End Blood Cancer, They Change Lives, Including Their Own | Paid ProgramThe Employee Well-Being Imperative | Paid ProgramBest-In-State Top Next-Gen Wealth Advisors 2025AI’s Nuanced Impact And 

# Chunk documents

🔢 **Exploring Text Splitters in LangChain**

- 📖 Text splitters divide long texts into smaller, meaningful parts.

- 🧩 Aim: Make large texts easier to handle for analysis or processing.

### How Text Splitters Work:

1. ✂️ Split text into small, meaningful chunks (like sentences).

2. 📏 Combine these chunks into a larger one until a certain size is reached.

3. 📌 Once the size is reached, start a new chunk with some overlap for context.
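The three steps above can be sketched as a small sentence packer. This is an illustrative simplification, not LangChain's implementation: the function name `merge_sentences`, the regex used to find sentence boundaries, and measuring overlap in whole sentences are all assumptions made for the example.

```python
import re


def merge_sentences(text, chunk_size=60, overlap_sentences=1):
    """Greedily pack sentences into chunks of at most chunk_size characters.

    Each new chunk starts with the last `overlap_sentences` sentences of the
    previous chunk, unless that overlap would not fit next to the incoming
    sentence. A single sentence longer than chunk_size is kept whole.
    """
    # Step 1: split into small units (here: sentences).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > chunk_size:
            # Step 2 is done for this chunk: the size limit was reached.
            chunks.append(" ".join(current))
            # Step 3: start the next chunk with some overlap for context.
            current = current[-overlap_sentences:] if overlap_sentences else []
            if len(" ".join(current + [sentence])) > chunk_size:
                current = []  # drop the overlap when it would not fit
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = (
    "RAG retrieves documents. It adds them to the prompt. "
    "The LLM then answers. Sources can be returned too."
)
chunks = merge_sentences(text, chunk_size=60, overlap_sentences=1)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

With these inputs every chunk shares one sentence with its neighbor, which is the continuity that overlap is meant to provide.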

### Customization Axes:

1. 🛠️ How the text is split.

2. 📐 How chunk size is measured.

## Getting Started with Text Splitters

- 🚀 Default choice: `RecursiveCharacterTextSplitter`.

- 📋 Works by: Splitting text based on a list of separator characters, tried in order.

- 🔄 If a chunk is still too large, it moves to the next separator in the list.

- 📌 Default split characters: `["\n\n", "\n", " ", ""]`.
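The fallback through the separator list can be illustrated with a short recursive function. This is a deliberately simplified sketch, not the library's code: the name `recursive_split` is hypothetical, separators are discarded, and small pieces are not merged back up to the size limit or overlapped, both of which the real splitter handles.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces of at most chunk_size characters.

    The first separator is tried first; any piece that is still too long is
    split again with the next separator. The empty separator (or running out
    of separators) falls back to fixed-size character slices.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    separator, remaining = separators[0], separators[1:]
    pieces = []
    for part in text.split(separator):
        pieces.extend(recursive_split(part, chunk_size, remaining))
    return pieces


sample = "alpha beta\n\ngamma delta epsilon\nzeta"
pieces = recursive_split(sample, chunk_size=12)
print(pieces)

# With no separators present, the last resort is plain character slicing.
sliced = recursive_split("abcdefghij", chunk_size=4)
print(sliced)
```

Paragraph breaks are tried first because they are the most natural boundaries; spaces and raw character slices are only used when nothing coarser produces a small enough piece.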

### Additional Controls:

- 📏 `length_function`: Defines how chunk length is calculated (default: character count; a token counter is a common alternative).

- 🔍 `chunk_size`: Sets the maximum chunk size, measured with `length_function`.

- 🔀 `chunk_overlap`: Determines overlap between chunks for continuity.

- 📊 `add_start_index`: Option to include each chunk's start position in the original document in metadata.

In [25]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 50,
    length_function = len
)

coursera_2026_trends_chunks = text_splitter.transform_documents(coursera_2026_trends)

google_trends_chunks = text_splitter.transform_documents(google_trends)

forbes_gen_ai_trends_chunks = text_splitter.transform_documents(forbes_gen_ai_trends)

# Index System

- 🎯 **Purpose:** Efficiently organize data for easy retrieval.

### Steps in the Index System:
1. 📚 **Load Documents (Document Loader):**
   - Import and read large amounts of data.
2. 🧩 **Chunk Documents (Document Chunker):**
   - Break down documents into smaller parts for better handling.
3. 🌐 **Embed Documents (Embedder):**
   - Convert text chunks into vector formats for searchability.
4. 💾 **Store Embeddings (Vector Store):**
   - Keep embeddings and their textual counterparts for retrieval.
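Steps 3 and 4, together with the idea of caching embeddings so repeated text is not embedded twice, can be sketched without any external service. Everything here is an assumption for illustration: `toy_embed` is a hashed word-count vector rather than a real embedding model, and `CachedEmbedder` and `build_index` are hypothetical names.

```python
import hashlib
import re

DIM = 16  # hypothetical embedding size, kept tiny for illustration


def toy_embed(text):
    """Map text to a DIM-length count vector by hashing each word to a slot."""
    vector = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        slot = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIM
        vector[slot] += 1.0
    return vector


class CachedEmbedder:
    """Reuse stored vectors for texts already seen, keyed by a hash of the text."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.cache = {}
        self.misses = 0  # number of times the underlying embedder was called

    def embed(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.embed_fn(text)
        return self.cache[key]


def build_index(chunks, embedder):
    """Return (vector, text) pairs: each embedding kept next to its text."""
    return [(embedder.embed(chunk), chunk) for chunk in chunks]


chunks = [
    "GenAI adoption is growing.",
    "Multimodal models are a trend.",
    "GenAI adoption is growing.",  # duplicate: served from the cache
]
embedder = CachedEmbedder(toy_embed)
index = build_index(chunks, embedder)
print(len(index), "entries,", embedder.misses, "embedding calls")
```

When each embedding call is a paid API request, skipping duplicates and re-runs is the main reason to put a cache in front of the embedder.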

In [26]:
from langchain_openai import OpenAIEmbeddings
from langchain_classic.embeddings import CacheBackedEmbeddings  
from langchain_community.vectorstores import FAISS
from langchain_classic.storage import LocalFileStore

store = LocalFileStore("./cache/")

# create an embedder
core_embeddings_model = OpenAIEmbeddings()

embedder = CacheBackedEmbeddings.from_bytes_store(
    core_embeddings_model,
    store,
    namespace = core_embeddings_model.model
)

# store embeddings in vector store
vectorstore = FAISS.from_documents(coursera_2026_trends_chunks, embedder)

vectorstore.add_documents(google_trends_chunks)

vectorstore.add_documents(forbes_gen_ai_trends_chunks)

['11c24448-b469-4ac8-8fdc-a668a21cf02a',
 '6fc89884-6880-4758-a6ae-6688fb9a1959',
 '2b82cf86-8735-4210-9b4e-afba08453826',
 '35709ab2-d21e-4ee5-bf8f-b0a72eab6c78',
 'a8c3034b-70fc-462e-a101-56f4cbca58bb',
 'e7b012b9-f791-4ee4-9ae6-c4bfdc82db09',
 '1a6b256e-5d03-446b-827f-1b13a3685de4',
 '222b7c47-ba9f-416b-93a1-ab01289f5b48',
 'db5eaf09-b002-4463-a93e-7f8fa3e2448d',
 'c2dea1f7-9691-4681-9ba1-492a0bdd521f',
 '49f690e3-ae3c-439a-a0f0-efab14379863',
 '1954d6f0-4259-404d-a602-f457dbc4c5ef',
 '00e0097b-d9a5-49b1-a075-78ec8ce97d4b',
 'f5cf8d64-5b9d-4e35-9f28-9075a0b84a90',
 '6770b781-b05a-4c36-b269-dd4fc43d9aa1',
 'c2fd0658-28d4-4f94-8033-3e2d9ef0d4b7',
 'a9c5ec90-dc39-43cd-84fa-4a3d5ad3f59d',
 'd87d3073-ac7b-4d88-8149-892bf7f49e60',
 'acf271d2-809c-4ee6-b4ea-bf6ab8d1ff5a',
 '7fcb85a2-0adc-43b7-ac1c-112faaa2db37',
 '5612bacb-a09e-4698-b7a4-45a3d0fc2674',
 'f8504344-a87c-4d1e-b632-ea24ac8c2c8e',
 '84a133da-7259-47e0-862c-051846e132ca',
 '34dbe127-c44f-4738-b6af-4cfda836e8ba',
 '23272bf7-0182-

# 🔍 **Retrieval System**

- 🎯 **Purpose:** Fetch relevant information based on user queries.

### Steps in the Retrieval System:

1. 💬 **Obtain User Query (User Input):**
   - Capture the user's question or statement.

2. 🔄 **Embed User Query (Embedder):**
   - Convert the user's query into a vector format, aligning with indexed documents.

3. 🔍 **Vector Search (Vector Store Retriever):**
   - Search for document embeddings in the Vector Store that closely match the user query.

4. 📄 **Return Relevant Documents:**
   - Provide the top matching documents, ensuring pertinence to the query.
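The vector search and ranking steps can be sketched with cosine similarity over a tiny hand-written index. This is an assumption-laden illustration: the three-dimensional vectors, the chunk texts, and the `top_k` helper are invented for the example, and a real vector store such as FAISS uses optimized index structures rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=2):
    """Return the k (score, text) pairs with the highest cosine similarity."""
    scored = [(cosine_similarity(query_vector, vector), text) for vector, text in index]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Hypothetical 3-dimensional embeddings; real ones have hundreds of dimensions.
index = [
    ([1.0, 0.0, 0.0], "chunk about healthcare AI"),
    ([0.0, 1.0, 0.0], "chunk about cybersecurity"),
    ([0.9, 0.1, 0.0], "chunk about medical imaging"),
]
query_vector = [1.0, 0.0, 0.0]  # stands in for the embedded user query
results = top_k(query_vector, index, k=2)
for score, text in results:
    print(round(score, 3), text)
```

The query must be embedded with the same model as the indexed chunks; otherwise the similarity scores are meaningless.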



In [27]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langsmith import Client


In [28]:
# instantiate a retriever
retriever = vectorstore.as_retriever()

In [29]:
llm = ChatOpenAI(model=open_ai_model)

# 🔍 **Augment System**

- 🚀 **Purpose:** Improve LLM's input with additional context.

### Steps in the Augment System:

1. 🌟 **Create Initial Prompt (Prompt):**
   - Begin with the user's initial question or statement.

2. 🧩 **Augment Prompt with Retrieved Context (Context Integration):**
   - Blend the initial prompt with context from the Vector Store for a richer input.

3. ⚡ **Send Augmented Prompt to LLM (Input Enhancement):**
   - Pass the enhanced prompt to the LLM.

4. 📬 **Receive LLM's Response (Output Reception):**
   - Obtain the LLM's comprehensive response after processing the augmented prompt.
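A plain-Python sketch of these four steps follows. The template text is the one the pulled `moche/rag-prompt` prompt displays in this notebook's output; the retrieved chunks and the `fake_llm` function are hypothetical placeholders, so nothing here calls a real model.

```python
# Template text as displayed by the pulled prompt in this notebook.
TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n\n"
    "Question: {question} \n\nContext: {context} \n\nAnswer:"
)


def join_chunks(texts):
    """Join retrieved chunk texts with blank lines between them."""
    return "\n\n".join(texts)


def augment(question, retrieved_texts):
    """Steps 1-2: start from the question and blend in the retrieved context."""
    return TEMPLATE.format(question=question, context=join_chunks(retrieved_texts))


def fake_llm(prompt):
    """Hypothetical stand-in for the LLM call in steps 3-4."""
    return f"(model response to a {len(prompt)}-character prompt)"


retrieved = ["chunk one", "chunk two"]  # placeholder for retriever output
augmented_prompt = augment("What are the trends in AI?", retrieved)
response = fake_llm(augmented_prompt)
print(augmented_prompt)
print(response)
```

In the notebook's chain, the same roles are played by `retriever | format_docs`, `prompt`, `llm`, and `StrOutputParser()`.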

In [33]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

client = Client(api_url=os.environ["LANGCHAIN_ENDPOINT"])
prompt = client.pull_prompt("moche/rag-prompt")

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [34]:
prompt

PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, metadata={'lc_hub_owner': 'moche', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': 'cbf9a5588c05a829d7924b7865122580e5b45523ae8fdae9513c1c05de280b04'}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\n\nQuestion: {question} \n\nContext: {context} \n\nAnswer:")

In [35]:
prompt.input_variables

['context', 'question']

In [36]:
# This is the entire augment system!
rag_chain.invoke("What are the trends in AI?")

'A major trend is increasing integration of Generative AI (GenAI) into apps and workflows. Generative AI has been the biggest trend and is expected to unlock trillions of dollars in value across industries. Adoption is already widespread (about 73% of US companies use AI in some capacity), with GenAI remaining a central focus into 2026.'

In [37]:
rag_chain.invoke("What areas of AI should students consider?")

'- Healthcare AI: apply foundational AI concepts to health care problems, improve the patient experience, and innovate in AI in Healthcare. \n- Cybersecurity AI: study AI trends in cybersecurity to protect sensitive information and address security challenges. \n- Multimodal AI and AI literacy: explore multimodal models and large language models, and build foundational AI literacy to understand AI’s impact on work and society.'

## Return sources

In [None]:
from langchain_core.runnables import RunnableParallel

rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain_with_source = RunnableParallel({"context": retriever, "question": RunnablePassthrough()}).assign(answer=rag_chain_from_docs)

In [44]:
result = rag_chain_with_source.invoke("What areas of AI should students consider?")
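The shape of the value returned by a chain like this can be mimicked in plain Python. This is a rough mental model only: `fake_retriever` and `fake_answer` are hypothetical stand-ins, and the dictionary handling is an approximation of what the parallel step and `assign` do, not their actual implementation.

```python
def fake_retriever(question):
    """Hypothetical retriever returning documents as plain dicts."""
    return [
        {
            "page_content": "Healthcare AI is growing.",
            "metadata": {"source": "https://example.com/a"},
        }
    ]


def fake_answer(state):
    """Hypothetical answer step that only reports how much context it received."""
    return f"Answer based on {len(state['context'])} document(s)."


def run_with_sources(question):
    # Parallel step: build "context" and "question" from the same input.
    state = {"context": fake_retriever(question), "question": question}
    # Assign step: add "answer" while keeping the keys that already exist.
    state["answer"] = fake_answer(state)
    return state


result_sketch = run_with_sources("What areas of AI should students consider?")
sources = [doc["metadata"]["source"] for doc in result_sketch["context"]]
print(sorted(result_sketch), sources)
```

Because the retrieved documents stay in the result next to the answer, their `source` metadata can be shown to the user as citations.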

### **Note**: Display as HTML

In [45]:
from IPython.display import HTML, display
import html

def format_rag_results_html(result):
    """Format RAG results as beautiful HTML"""
    
    question = result.get('question', '')
    answer = result.get('answer', '')
    context_docs = result.get('context', [])
    
    # Process context documents
    processed_docs = []
    for doc in context_docs:
        if hasattr(doc, 'page_content'):  # Document object (from langchain)
            processed_docs.append({
                'source': doc.metadata.get('source', 'Unknown'),
                'title': doc.metadata.get('title', 'Untitled'),
                'content': doc.page_content
            })
        elif isinstance(doc, dict):
            # Handle dict with metadata key (Document-like structure)
            if 'metadata' in doc:
                processed_docs.append({
                    'source': doc['metadata'].get('source', 'Unknown'),
                    'title': doc['metadata'].get('title', 'Untitled'),
                    'content': doc.get('page_content', '')
                })
            else:
                # Simple dict structure
                processed_docs.append({
                    'source': doc.get('source', 'Unknown'),
                    'title': doc.get('title', 'Untitled'),
                    'content': doc.get('content', doc.get('page_content', ''))
                })
    
    # Escape HTML special characters
    def escape_html(text):
        return html.escape(str(text))
    
    # Generate HTML
    html_content = f"""
    <style>
        .rag-container {{
            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;
            max-width: 1200px;
            margin: 20px auto;
            padding: 20px;
            background: #f8f9fa;
            border-radius: 8px;
        }}
        .rag-header {{
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            color: white;
            padding: 20px;
            border-radius: 8px;
            margin-bottom: 20px;
        }}
        .rag-header h1 {{
            margin: 0;
            font-size: 24px;
        }}
        .question-box {{
            background: white;
            padding: 20px;
            border-radius: 8px;
            margin-bottom: 20px;
            border-left: 4px solid #667eea;
            box-shadow: 0 2px 4px rgba(0,0,0,0.1);
        }}
        .question-box h2 {{
            margin: 0 0 10px 0;
            color: #333;
            font-size: 18px;
        }}
        .question-text {{
            font-size: 16px;
            color: #555;
            font-weight: 500;
        }}
        .answer-box {{
            background: #e8f5e9;
            padding: 20px;
            border-radius: 8px;
            margin-bottom: 20px;
            border-left: 4px solid #4caf50;
            box-shadow: 0 2px 4px rgba(0,0,0,0.1);
        }}
        .answer-box h2 {{
            margin: 0 0 10px 0;
            color: #2e7d32;
            font-size: 18px;
        }}
        .answer-text {{
            font-size: 16px;
            color: #333;
            line-height: 1.6;
        }}
        .context-section {{
            background: white;
            padding: 20px;
            border-radius: 8px;
            margin-bottom: 20px;
            box-shadow: 0 2px 4px rgba(0,0,0,0.1);
        }}
        .context-section h2 {{
            margin: 0 0 15px 0;
            color: #333;
            font-size: 18px;
            border-bottom: 2px solid #667eea;
            padding-bottom: 10px;
        }}
        .context-item {{
            background: #f5f5f5;
            padding: 15px;
            border-radius: 6px;
            margin-bottom: 15px;
            border-left: 3px solid #667eea;
        }}
        .context-item:last-child {{
            margin-bottom: 0;
        }}
        .context-source {{
            font-size: 12px;
            color: #666;
            margin-bottom: 8px;
            font-weight: 600;
        }}
        .context-title {{
            font-size: 14px;
            color: #333;
            margin-bottom: 8px;
            font-weight: 600;
        }}
        .context-content {{
            font-size: 14px;
            color: #555;
            line-height: 1.5;
            max-height: 150px;
            overflow-y: auto;
            white-space: pre-wrap;
        }}
        .badge {{
            display: inline-block;
            padding: 4px 8px;
            background: #667eea;
            color: white;
            border-radius: 4px;
            font-size: 11px;
            margin-right: 8px;
        }}
    </style>
    
    <div class="rag-container">
        <div class="rag-header">
            <h1>üîç RAG (Retrieval-Augmented Generation) Results</h1>
        </div>
        
        <div class="question-box">
            <h2>❓ Question</h2>
            <div class="question-text">{escape_html(question)}</div>
        </div>
        
        <div class="answer-box">
            <h2>üí° Answer</h2>
            <div class="answer-text">{escape_html(answer)}</div>
        </div>
        
        <div class="context-section">
            <h2>üìö Retrieved Context ({len(processed_docs)} documents)</h2>
    """
    
    for i, doc in enumerate(processed_docs, 1):
        content_preview = doc['content'][:300] + ('...' if len(doc['content']) > 300 else '')
        html_content += f"""
            <div class="context-item">
                <span class="badge">Document {i}</span>
                <div class="context-source">🔗 Source: {escape_html(doc['source'])}</div>
                <div class="context-title">📄 {escape_html(doc['title'])}</div>
                <div class="context-content">{escape_html(content_preview)}</div>
            </div>
        """
    
    html_content += """
        </div>
    </div>
    """
    
    return html_content

# Example: render the `result` returned by rag_chain_with_source above.
# To try a different question, uncomment the line below before displaying:
# result = rag_chain_with_source.invoke("What does Neural Architecture Search have to do with how NVidia creates its models?")


display(HTML(format_rag_results_html(result)))