<a href="https://colab.research.google.com/github/ShubhamW248/LLM-Practice/blob/main/Retrieval_Augmented_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Retrieval-Augmented Generation (RAG) Pipeline
This notebook implements a RAG system that combines information retrieval with language model generation to provide accurate, context-aware responses.

---

## **Setup & Dependencies**
Installing and importing required libraries for text processing, embeddings, and language models. Setting up GPU acceleration for improved performance on Colab's T4.

## **Text Resource & Chunking**
Using a short text about NASA's space exploration history as our knowledge base. The text is split into chunks of up to 500 characters, with a 50-character overlap, for efficient processing and retrieval.

## **Embedding & Indexing**
Converting text chunks into numerical vectors using a pre-trained embedding model (`all-MiniLM-L6-v2`). These vectors are stored in a Chroma vector store for quick similarity search.

## **Local Language Model**
Downloading and setting up a compact language model (OPT-350M) that runs directly on Colab's GPU, avoiding the need for external API calls.

## **Retrieval + Generation (RAG)**
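When a question is asked:
- The chunks most similar to the question are retrieved from ChromaDB (the top 3 in this notebook)
- The retrieved chunks are inserted into a prompt as context, and the language model generates an answer from that prompt
- The retrieved source chunks are returned alongside the answer for transparency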

## **Usage Example**
Demonstrating the pipeline with sample questions about NASA's history, showing how the system combines retrieved context with model generation to provide informed responses.

In [14]:
#!pip install langchain langchain-community transformers torch accelerate chromadb sentence-transformers

In [15]:
import torch
print("GPU Available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU Model:", torch.cuda.get_device_name(0))

GPU Available: True
GPU Model: Tesla T4
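
The later cells assume a GPU runtime (the embedding model is pinned to `'cuda'`). A minimal sketch of a CPU fallback follows; `pick_device` is a hypothetical helper that is not part of the original notebook.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Using device:", device)
```

The result could then be passed as `model_kwargs={'device': device}` when the embeddings are created, so the notebook still runs on a CPU-only runtime (more slowly).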


# Imports

In [16]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

# Model Setup

In [17]:
# Load model and tokenizer
model_name = "facebook/opt-350m"  # This is a smaller model suitable for Colab T4

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # Use half precision for memory efficiency
    device_map="auto"  # Automatically handle GPU/CPU placement
)

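# Create pipeline
# The model was already placed on the GPU by device_map="auto" above,
# so the pipeline does not need its own device_map.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.7
)

# Wrap the pipeline in a format compatible with LangChain
# (langchain.llms is deprecated; the class lives in langchain_community)
from langchain_community.llms import HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=pipe)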

Device set to use cuda:0


# Sample Data

In [18]:
# Sample text about space exploration
space_data = """
NASA's Space Exploration History

The National Aeronautics and Space Administration (NASA) was established in 1958, marking the beginning of organized space exploration in the United States. The Mercury program (1958-1963) was NASA's first human spaceflight program, putting the first Americans into space.

The Apollo Program (1961-1972) achieved the first human Moon landing with Apollo 11 in 1969. Neil Armstrong and Buzz Aldrin became the first humans to walk on the Moon, while Michael Collins orbited above. Five more successful Apollo missions followed, bringing more astronauts to the Moon's surface.

The Space Shuttle Program (1981-2011) marked a new era in space exploration. The shuttle was the first reusable spacecraft, conducting 135 missions over 30 years. It helped build the International Space Station (ISS), launched and repaired satellites, and conducted crucial scientific research.

The International Space Station, a joint project of five space agencies, began construction in 1998 and has been continuously occupied since 2000. It serves as a microgravity and space environment research laboratory where scientific research is conducted in astrobiology, astronomy, meteorology, and physics.
"""

# Text Processing

In [19]:
# Create text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len,
    separators=["\n\n", "\n", " ", ""]
)

# Split text into chunks
text_chunks = text_splitter.create_documents([space_data])
print(f"Split into {len(text_chunks)} chunks")

Split into 4 chunks
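
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping chunking. It uses plain fixed-size windows and is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on the listed separators); `chunk_text` is a hypothetical helper for illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


demo = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo)  # ['abcd', 'defg', 'ghij']
```

Each window repeats the last character of the previous one, which is the role the 50-character overlap plays above: a sentence cut at a chunk boundary still appears whole in at least one chunk more often than it would without overlap.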


# Initialize Embeddings and Vector Store

In [20]:
# Initialize embeddings (this will also download the model locally)
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cuda'}
)

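# Create vector store (persisted to ./chroma_db; re-running this cell adds the
# same chunks again, which can surface as duplicate retrieved chunks)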
vectorstore = Chroma.from_documents(
    documents=text_chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

# Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
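
The retriever returns the `k` chunks whose vectors are closest to the question's vector. The sketch below illustrates that idea with cosine similarity in plain Python. The distance metric and index that Chroma actually uses are configurable and may differ; the function names and the toy 2-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=3):
    """Return indices of the k chunk vectors most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-dimensional "embeddings", made up for illustration only
toy_chunks = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.1], toy_chunks, k=2))  # [0, 2]
```

With only four chunks in the store and `k=3`, most of the knowledge base is returned for every question, so retrieval quality matters more as the corpus grows.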

# Setup LLM and RAG Chain

In [21]:
# Create custom prompt template
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}

Question: {question}

Answer: """

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"]
)

# Create the RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt}
)
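
With `chain_type="stuff"`, all retrieved chunks are placed into the single `{context}` slot of the prompt. A rough sketch of that assembly step, assuming the chunks are joined with a blank line (`build_stuffed_prompt` is a hypothetical helper, not a LangChain function):

```python
TEMPLATE = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}

Question: {question}

Answer: """


def build_stuffed_prompt(chunks: list[str], question: str) -> str:
    """Join the retrieved chunks and substitute them into the template."""
    context = "\n\n".join(chunks)  # separator is an assumption for this sketch
    return TEMPLATE.format(context=context, question=question)


prompt_text = build_stuffed_prompt(
    ["NASA was established in 1958.", "Apollo 11 landed on the Moon in 1969."],
    "When was NASA established?",
)
print(prompt_text)
```

Because everything goes into one prompt, the combined length of the chunks plus the question has to fit in the model's context window, which is one reason to keep `chunk_size` and `k` small for a compact model.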

# Query Function

In [22]:
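def ask_question(question: str):
    """Ask the RAG pipeline a question and print the answer with its source chunks."""
    # invoke() replaces the deprecated qa_chain({...}) call style
    result = qa_chain.invoke({"query": question})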

    print("Question:", question)
    print("\nAnswer:", result["result"])
    print("\nRelevant Source Chunks:")
    for i, doc in enumerate(result["source_documents"], 1):
        print(f"\nChunk {i}:")
        print(doc.page_content)

# Test the pipeline
question = "When was NASA established?"
ask_question(question)



Question: When was NASA established?

Answer: Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: NASA's Space Exploration History

The National Aeronautics and Space Administration (NASA) was established in 1958, marking the beginning of organized space exploration in the United States. The Mercury program (1958-1963) was NASA's first human spaceflight program, putting the first Americans into space.

NASA's Space Exploration History

The National Aeronautics and Space Administration (NASA) was established in 1958, marking the beginning of organized space exploration in the United States. The Mercury program (1958-1963) was NASA's first human spaceflight program, putting the first Americans into space.

The International Space Station, a joint project of five space agencies, began construction in 1998 and has been continuously occupied since 2000. It serves as a mi
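
In the output above, the printed answer begins with the prompt itself, so the model's own continuation is hard to find. One way to isolate it is to keep only the text after the `Answer:` marker from the template. This is a minimal sketch with a hypothetical helper `extract_answer`; it assumes the marker does not also occur inside the retrieved context.

```python
def extract_answer(generated_text: str, marker: str = "Answer:") -> str:
    """Return the text after the first occurrence of marker, or the whole text if absent."""
    _, found, tail = generated_text.partition(marker)
    return tail.strip() if found else generated_text.strip()


echoed = (
    "Context: NASA was established in 1958.\n\n"
    "Question: When was NASA established?\n\n"
    "Answer: 1958"
)
print(extract_answer(echoed))  # 1958
```

Inside `ask_question`, this could be applied as `extract_answer(result["result"])`. The `transformers` text-generation pipeline also accepts a `return_full_text` argument that may avoid the echo at the source, though whether the LangChain wrapper preserves that setting can depend on the installed version.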

# Test the RAG Pipeline

In [9]:
# Test questions
questions = [
    "When was NASA established?",
    "What was the Space Shuttle Program's main achievement?",
    "What is the Artemis program?"
]

for question in questions:
    print("-" * 80)
    ask_question(question)
    print("-" * 80)
    print()

--------------------------------------------------------------------------------


ConnectionError: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7cc2208ad450>: Failed to establish a new connection: [Errno 111] Connection refused'))
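
In the output above, the error stops the loop at the first question, so the remaining questions are never asked. A minimal sketch of a loop that records a failure and moves on follows; `ask_all` and `flaky_ask` are hypothetical names, and `flaky_ask` is only a stand-in for `ask_question`.

```python
from typing import Callable


def ask_all(questions: list[str], ask: Callable[[str], None]) -> dict[str, str]:
    """Call ask on every question and record "ok" or the error type per question."""
    status = {}
    for q in questions:
        try:
            ask(q)
            status[q] = "ok"
        except Exception as exc:  # broad on purpose: any backend failure is recorded
            status[q] = f"failed: {type(exc).__name__}"
    return status


def flaky_ask(question: str) -> None:
    """Stand-in for ask_question that fails on one input, for illustration."""
    if "Artemis" in question:
        raise ConnectionError("backend unreachable")


report = ask_all(["When was NASA established?", "What is the Artemis program?"], flaky_ask)
print(report)
```

In the notebook, `ask_all(questions, ask_question)` would run all three test questions and show which ones failed instead of aborting on the first error.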