<a href="https://colab.research.google.com/github/SandeepKonduruFeb12/aiml/blob/master/aiml/silver/A1LangChain_RAG_PdfSummaryHclKey.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Summary of Notebook Steps

This Google Colab notebook outlines the process of building a Retrieval Augmented Generation (RAG) system using LangChain and ChromaDB for summarizing PDF documents. The steps followed are:

1.  **Install Libraries**: Essential Python packages such as `langchain`, `langchain_openai`, `langchain_community`, `langchain_chroma`, `pypdf`, and `sentence-transformers` were installed.
2.  **Configure LLM**: A `ChatOpenAI` model was configured using a custom API endpoint (`https://aicafe.hcl.com/AICafeService/...`) and an API key fetched from Colab secrets, serving as a placeholder for any compatible large language model.
3.  **Load PDF Document**: A sample PDF document (`transformers_part1.pdf`) was downloaded and loaded into the notebook using `PyPDFLoader`.
4.  **Chunk Text**: The loaded PDF content was split into smaller, overlapping text chunks using `RecursiveCharacterTextSplitter`.
5.  **Generate Embeddings**: `SentenceTransformerEmbeddings` (specifically, the `thenlper/gte-base` model) were initialized to convert these text chunks into numerical vector representations.
6.  **Create Vector Store (ChromaDB)**: A ChromaDB vector store was created from the text chunks and their corresponding embeddings. This vector store was also configured to persist to a local directory.
7.  **Set up Retriever**: A retriever was configured from the ChromaDB vector store to efficiently search and retrieve the most relevant text chunks based on a query.
8.  **Define System Prompt**: A `system_prompt` was defined to guide the AI, instructing it to provide concise and accurate summaries of the document, focusing on key concepts and architecture, and to cite specific parts of the document in its answers.
9.  **Build LangChain RAG Chain**: A `rag_chain` was constructed using LangChain's Expression Language, integrating the `retriever`, a `ChatPromptTemplate` (incorporating the system prompt and retrieved context), and the configured `llm`.
10. **Test Summarization**: The `rag_chain` was invoked with a sample question, and the generated summary, complete with references, was displayed, demonstrating the successful implementation of the RAG system.
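
Step 4 relies on overlapping chunks so that a sentence cut at a chunk boundary still appears whole in a neighbouring chunk. The sketch below illustrates only the overlap arithmetic with a fixed-size sliding window. It is a simplified stand-in, not the algorithm `RecursiveCharacterTextSplitter` uses (which also tries to split on separators such as paragraph and line breaks). The helper name `sliding_window_chunks` is hypothetical; the sizes 1000 and 200 match the values used later in this notebook.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# 2500 characters of dummy text (illustrative input, not taken from the PDF)
sample_text = "".join(str(i % 10) for i in range(2500))
chunks = sliding_window_chunks(sample_text)
print([len(c) for c in chunks])
```

With a step of 800 characters, the last 200 characters of one chunk are repeated at the start of the next, which is the property the overlap setting is meant to provide.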

In [None]:
!pip install langchain
!pip install langchain_openai

Collecting langchain_openai
  Downloading langchain_openai-1.1.0-py3-none-any.whl.metadata (2.6 kB)
Downloading langchain_openai-1.1.0-py3-none-any.whl (84 kB)
Installing collected packages: langchain_openai
Successfully installed langchain_openai-1.1.0


In [None]:
!pip install langchain_community langchain_chroma

Collecting langchain_community
  Downloading langchain_community-0.4.1-py3-none-any.whl.metadata (3.0 kB)
Collecting langchain_chroma
  Downloading langchain_chroma-1.0.0-py3-none-any.whl.metadata (1.9 kB)
Collecting langchain-classic<2.0.0,>=1.0.0 (from langchain_community)
  Downloading langchain_classic-1.0.0-py3-none-any.whl.metadata (3.9 kB)
Collecting requests<3.0.0,>=2.32.5 (from langchain_community)
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting dataclasses-json<0.7.0,>=0.6.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting chromadb<2.0.0,>=1.0.20 (from langchain_chroma)
  Downloading chromadb-1.3.5-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.2 kB)
Collecting build>=1.0.3 (from chromadb<2.0.0,>=1.0.20->langchain_chroma)
  Downloading build-1.3.0-py3-none-any.whl.metadata (5.6 kB)
Collecting pybase64>=1.4.1 (from chromadb<2.0.0,>=1.0.20->langchain_chroma)
  Downloadi

In [None]:
!pip install pypdf



In [None]:
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from google.colab import userdata
from langchain_openai import ChatOpenAI

# API KEY: Get your key from https://aicafe.hcl.com/AICafe/#/tutorials/api-docs
API_KEY = userdata.get('hcl')  # <-- fetching key from Colab secrets

# --- Endpoint config ---
API_URL = (
    "https://aicafe.hcl.com/AICafeService/api/v1/subscription/openai/"
    "deployments/gpt-4.1/chat/completions?api-version=2024-12-01-preview"
)

llm = ChatOpenAI(
    base_url=API_URL,
    api_key=API_KEY, # Pass API_KEY here to satisfy the internal client requirement
    model="gpt-4.1",
    temperature=0,
    extra_headers={"api-key": API_KEY} # Explicitly pass API key in 'api-key' header
) # I want to minimize hallucination - temperature = 0 makes the model output more deterministic

pdf_path = "/content/sample_data/transformers_part1.pdf"

!wget -O "{pdf_path}" "https://lauracornei.github.io/assets/other/transformers_part1.pdf"
# Safety check for debugging: stop early if the download did not produce a file
if not os.path.exists(pdf_path):
    print(f"ERROR: PDF file not found at: {pdf_path}")
    raise FileNotFoundError(f"PDF file not found: {pdf_path}")

pdf_loader = PyPDFLoader(pdf_path)  # Creates the loader for this file

# Load the PDF into one Document per page
try:
    pages = pdf_loader.load()
    print(f"PDF has been loaded and has {len(pages)} pages")
except Exception as e:
    print(f"Error loading PDF: {e}")
    raise

extra_headers was transferred to model_kwargs.
Please confirm that extra_headers is what you intended.


--2025-11-27 09:38:17--  https://lauracornei.github.io/assets/other/transformers_part1.pdf
Resolving lauracornei.github.io (lauracornei.github.io)... 185.199.108.153, 185.199.109.153, 185.199.110.153, ...
Connecting to lauracornei.github.io (lauracornei.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1229562 (1.2M) [application/pdf]
Saving to: ‘/content/sample_data/transformers_part1.pdf’


2025-11-27 09:38:17 (41.1 MB/s) - ‘/content/sample_data/transformers_part1.pdf’ saved [1229562/1229562]

PDF has been loaded and has 32 pages


In [None]:
!pip install sentence-transformers

# Chroma is already imported from langchain_chroma above, so it is not re-imported here
from langchain_community.embeddings import SentenceTransformerEmbeddings

# Chunking process: split the pages into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)


pages_split = text_splitter.split_documents(pages) # We now apply this to our pages

persist_directory = "/content/sample_data/"
collection_name = "stock_market"

# Create the persistence directory if it does not exist yet
os.makedirs(persist_directory, exist_ok=True)

# Local embeddings (no token)
embeddings = SentenceTransformerEmbeddings(model_name = "thenlper/gte-base")


try:
    # Here, we actually create the Chroma database using our embeddings model
    vectorstore = Chroma.from_documents(
        documents=pages_split,
        embedding=embeddings,
        persist_directory=persist_directory,
        collection_name=collection_name
    )

    print("Created ChromaDB vector store!")

except Exception as e:
    print(f"Error setting up ChromaDB: {str(e)}")
    raise


# Now we create our retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}  # k is the number of chunks to return per query
)




The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Created ChromaDB vector store!
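
Conceptually, the similarity retriever configured above embeds the query, scores it against the stored chunk vectors, and returns the `k` best matches. The sketch below shows that idea with cosine similarity on tiny hand-written vectors. It is an illustration under assumptions, not a description of how Chroma is implemented: the metric Chroma applies depends on how the collection is configured, and real embeddings have hundreds of dimensions. The helper names `cosine_similarity` and `top_k` and all toy values are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return the k chunk ids with the highest cosine similarity to the query."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (made up for illustration)
toy_vectors = {
    "chunk-attention": [0.9, 0.1, 0.0],
    "chunk-encoder": [0.7, 0.7, 0.0],
    "chunk-unrelated": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.0, 0.0]

for chunk_id, score in top_k(toy_query, toy_vectors, k=2):
    print(chunk_id, round(score, 3))
```

A larger `k` gives the LLM more context at the cost of a longer prompt and a higher chance of including loosely related chunks.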


In [None]:
system_prompt = """
You are an intelligent AI assistant who summarizes and answers questions about the PDF document loaded into your knowledge base.
Base your answers only on the retrieved context provided below, and focus on the document's key concepts and architecture.
Keep your summaries concise and accurate.
Please always cite the specific parts of the document you use in your answers.
"""


### Create LangChain Retrieval Chain for Summarization

Now, I'll construct a simple RAG chain using the LangChain Expression Language (LCEL). This chain will:
1. Retrieve relevant document chunks based on the user's question.
2. Format these chunks along with the `system_prompt` and the original question into a single prompt.
3. Pass this combined prompt to the LLM to generate a summary.
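
The three steps above can be sketched as plain Python functions to make the data flow explicit. Everything here is a stand-in: `fake_retrieve`, `build_prompt`, and `fake_llm` are hypothetical names, the retriever is a keyword-overlap scorer rather than a vector search, and the "LLM" only reports what it received instead of calling a model.

```python
def fake_retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Stand-in retriever: rank chunks by how many question words they contain."""
    words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(words & set(c.lower().split())), reverse=True)
    return ranked[:k]


def build_prompt(system_prompt: str, context: list[str], question: str) -> list[tuple[str, str]]:
    """Combine the system prompt, retrieved context, and question into chat messages."""
    system_text = system_prompt + "\n\nContext: " + "\n---\n".join(context)
    return [("system", system_text), ("human", question)]


def fake_llm(messages: list[tuple[str, str]]) -> str:
    """Stand-in for the model call: only reports the size of its input."""
    return f"received {len(messages)} messages, {len(messages[0][1])} system characters"


# Toy data (made up for illustration)
toy_chunks = [
    "The encoder is a stack of identical layers.",
    "Positional encodings are added to the embeddings.",
    "The weather was pleasant that day.",
]
toy_question = "What is the encoder made of?"

toy_context = fake_retrieve(toy_question, toy_chunks)  # step 1: retrieve
toy_messages = build_prompt("You summarize documents.", toy_context, toy_question)  # step 2: format
print(fake_llm(toy_messages))  # step 3: generate
```

Note that the same question is used twice: once to retrieve context and once as the human message. That is the role of the dictionary at the start of the chain.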

In [None]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate

# Define the prompt template for summarization
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt + "\n\nContext: {context}"),
    ("human", "{question}")
])

# Create the RAG chain
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
)

print("LangChain RAG summarization chain created.")

LangChain RAG summarization chain created.
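
In the chain above, the retriever output is placed into `{context}` as-is, so the prompt receives the string form of a list of document objects. A common alternative is to join the chunk texts explicitly and label each with its page, which keeps the prompt readable and may make page citations easier for the model. The sketch below assumes each retrieved item exposes `page_content` and a `metadata` dict with a `page` key; `FakeDoc` is a stand-in so the example runs without LangChain, and `format_docs` is a hypothetical helper name.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document: chunk text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join chunk texts into one context string, tagging each with its page number."""
    parts = []
    for doc in docs:
        page = doc.metadata.get("page", "unknown")  # fall back if no page is recorded
        parts.append(f"[page {page}]\n{doc.page_content.strip()}")
    return "\n\n".join(parts)


# Toy documents (made up for illustration)
toy_docs = [
    FakeDoc("The encoder is a stack of N blocks. ", {"page": 1}),
    FakeDoc("Positional encodings are added to embeddings.", {"page": 2}),
    FakeDoc("A chunk without page metadata."),
]
print(format_docs(toy_docs))
```

With real LangChain objects, such a function could be composed after the retriever, for example `{"context": retriever | format_docs, "question": RunnablePassthrough()}`. This variation has not been run against the notebook's endpoint, so treat it as a starting point.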


### Test the Summarization Chain

Let's test the newly created `rag_chain` with an example question to summarize the document.

In [None]:
question = "Please summarize the key architectural components of Transformer models as described in the document."

print(f"Question: {question}\n")

response = rag_chain.invoke(question)

print("Summary from RAG chain:")
print(response.content)

Question: Please summarize the key architectural components of Transformer models as described in the document.

Summary from RAG chain:
Certainly! The document "Transformers – an in-depth tutorial Part I: Transformer’s architecture" by Laura-Maria Cornei outlines the following key architectural components of Transformer models:

1. **Encoder-Decoder Structure**:  
   The classic Transformer architecture consists of an encoder (a stack of N encoder blocks/layers) and a decoder (a stack of N decoder blocks/layers). Both encoder and decoder process data that has been preprocessed into embeddings with positional encodings (referred to as P.E.E. processed data) [Document, page 2].

2. **Input Preprocessing**:  
   Inputs are converted into word vector representations (embeddings), and positional encodings are added to these embeddings to provide information about the position of each token in the sequence [Document, page 2].

3. **Attention Mechanisms**:  
   There are three main attention