In [1]:
!pip install -qqq langchain langchain-core langchain-community langchain-huggingface langchain-text-splitters pypdf sentence_transformers chromadb huggingface_hub transformers accelerate

import os
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings, HuggingFacePipeline
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

print("Libraries installed and imported successfully!")

dummy_document_content = """
Generative AI (GenAI) is a type of artificial intelligence that can create new content,
including text, images, audio, and synthetic data. Unlike traditional AI that analyzes
existing data, GenAI models learn from vast datasets to generate novel outputs.

One of the most prominent applications of GenAI is in Natural Language Processing (NLP),
where models like GPT (Generative Pre-trained Transformer) can write articles, stories,
code, and even engage in conversational AI. These models are trained on massive text corpora
and can understand context, generate coherent responses, and perform various language-related tasks.

Another significant area is computer vision, where GenAI can create realistic images from text
descriptions (e.g., DALL-E, Midjourney), perform image-to-image translation, or generate
3D models. This has revolutionary implications for design, entertainment, and digital art.

Retrieval-Augmented Generation (RAG) is an architectural pattern that enhances GenAI models
by providing them with access to external, up-to-date, and domain-specific information. Instead
of relying solely on the knowledge encoded during their training, RAG models first retrieve relevant
documents or data snippets from a knowledge base (e.g., vector database) and then use this
retrieved information to inform their generation process. This approach helps in reducing
hallucinations, improving factual accuracy, and grounding the model's responses in specific sources.

Key components of a RAG system typically include:
1.  **Document Loader**: To ingest data from various sources (PDFs, web pages, databases).
2.  **Text Splitter**: To break down large documents into smaller, manageable chunks.
3.  **Embeddings Model**: To convert text chunks into numerical vector representations.
4.  **Vector Store**: To store and efficiently search these embeddings.
5.  **Retriever**: To fetch relevant chunks based on a query.
6.  **Generative Model (LLM)**: To synthesize the answer using the query and retrieved context.

RAG is particularly useful in enterprise settings for Q&A over internal documents,
legal research, customer support bots, and personalized content generation where accuracy
and traceability to sources are critical.
"""

with open("genai_rag_overview.txt", "w", encoding="utf-8") as f:
    f.write(dummy_document_content)
document_path = "genai_rag_overview.txt"

loader = TextLoader(document_path)
documents = loader.load()

print(f"\nLoaded {len(documents)} document(s).")
print(f"First 200 characters of the document: {documents[0].page_content[:200]}...")

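# chunk_size and chunk_overlap are measured in characters (the splitter's
# default length function is len), not in model tokens.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150
)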
texts = text_splitter.split_documents(documents)

print(f"Split into {len(texts)} chunks.")
print(f"First chunk: {texts[0].page_content[:150]}...")

print("\nInitializing HuggingFaceEmbeddings...")
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

print("Creating ChromaDB vector store...")
vectordb = Chroma.from_documents(
    documents=texts,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
print("ChromaDB vector store created and persisted.")

print("\nInitializing local LLM (this may take a minute to download)...")
model_id = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256)
llm = HuggingFacePipeline(pipeline=pipe)
print("LLM initialized.")

print("\nSetting up RAG chain...")
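# The default retriever returns up to 4 chunks per query. This document yields
# only 3 chunks, so every query retrieves all of them, and the assembled prompt
# can exceed flan-t5-base's 512-token input limit (see the tokenizer warning
# in the output).
retriever = vectordb.as_retriever()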

template = """Use the following context to answer the question.

Context: {context}

Question: {question}

Answer:"""

prompt = PromptTemplate.from_template(template)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

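# The input question goes to the retriever (whose results are joined into one
# context string) and is also passed through unchanged. Both values fill the
# prompt, the LLM completes it, and the parser returns a plain string.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)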
print("RAG chain ready. You can now ask questions!")

print("\n--- RAG Q&A Examples ---")

question1 = "What is Generative AI?"
print(f"\nQuestion 1: {question1}")
answer1 = rag_chain.invoke(question1)
print(f"Answer 1: {answer1}")

question2 = "How does RAG enhance GenAI models?"
print(f"\nQuestion 2: {question2}")
answer2 = rag_chain.invoke(question2)
print(f"Answer 2: {answer2}")

question3 = "List the key components of a RAG system."
print(f"\nQuestion 3: {question3}")
answer3 = rag_chain.invoke(question3)
print(f"Answer 3: {answer3}")



Libraries installed and imported successfully!

Loaded 1 document(s).
First 200 characters of the document: 
Generative AI (GenAI) is a type of artificial intelligence that can create new content,
including text, images, audio, and synthetic data. Unlike traditional AI that analyzes
existing data, GenAI mod...
Split into 3 chunks.
First chunk: Generative AI (GenAI) is a type of artificial intelligence that can create new content,
including text, images, audio, and synthetic data. Unlike trad...

Initializing HuggingFaceEmbeddings...



Creating ChromaDB vector store...
ChromaDB vector store created and persisted.

Initializing local LLM (this may take a minute to download)...


Device set to use cpu
Token indices sequence length is longer than the specified maximum sequence length for this model (525 > 512). Running this sequence through the model will result in indexing errors


LLM initialized.

Setting up RAG chain...
RAG chain ready. You can now ask questions!

--- RAG Q&A Examples ---

Question 1: What is Generative AI?
Answer 1: a type of artificial intelligence that can create new content, including text, images, audio, and synthetic data

Question 2: How does RAG enhance GenAI models?
Answer 2: providing them with access to external, up-to-date, and domain-specific information

Question 3: List the key components of a RAG system.
Answer 3: Document Loader**: To ingest data from various sources (PDFs, web pages, databases). 2. **Text Splitter**: To break down large documents into smaller, manageable chunks. 3. **Embeddings Model**: To convert text chunks into numerical vector representations. 4. **Vector Store**: To store and efficiently search these embeddings. 5. **Retriever**: To fetch relevant chunks based on a query. 6. **Generative Model (LLM)**: To synthesize the answer using the query and retrieved context. RAG is particularly useful in enterpris
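
The tokenizer warning in the output above (525 > 512) indicates that the retrieved context plus the prompt was longer than the model's input limit. One possible mitigation is to keep only as many retrieved chunks as fit a token budget before formatting the prompt. The following is a minimal sketch, not part of the original notebook: `fit_chunks_to_budget` and `approx_token_count` are hypothetical helper names, and the word-count estimate is only a rough stand-in for a real tokenizer.

```python
from typing import Callable, List


def approx_token_count(text: str) -> int:
    """Rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_chunks_to_budget(
    chunks: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = approx_token_count,
    separator: str = "\n\n",
) -> str:
    """Join chunks in ranked order, stopping before the budget is exceeded."""
    kept: List[str] = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            # Stop at the first chunk that does not fit, so the most
            # relevant chunks (listed first) are the ones that survive.
            break
        kept.append(chunk)
        used += cost
    return separator.join(kept)


# Toy data standing in for retrieved chunks, ordered by relevance.
ranked_chunks = [
    "alpha beta gamma",      # 3 words
    "delta epsilon",         # 2 words
    "zeta eta theta iota",   # 4 words
]
context = fit_chunks_to_budget(ranked_chunks, max_tokens=5)
print(context)
```

In the notebook, a helper like this could take the place of the plain join in `format_docs`, with `count_tokens=lambda s: len(tokenizer.encode(s))` so that counts come from the model's own tokenizer. The budget would need to sit below 512 to leave room for the prompt template and the question.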