Prompt Engineering

📌 What is it?

Writing a well-designed input (prompt) to get the correct output from an LLM.

🎯 Example

❌ Bad Prompt:

    Tell me about milk

✅ Good Prompt:

    You are a shop assistant.
    Extract product name and quantity.
    Input: "2 milk and 1 bread"
    Output JSON:

👉 Output becomes structured:

    { "milk": 2, "bread": 1 }
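As a rough illustration of the good prompt above, the sketch below assembles the prompt from its parts and validates a JSON reply. The function names `build_extraction_prompt` and `parse_model_reply` are made up for this example, and the model reply is hard-coded rather than coming from a real LLM call.

```python
import json

def build_extraction_prompt(user_input: str) -> str:
    """Assemble the role, task, input, and output-format parts of the prompt."""
    return (
        "You are a shop assistant.\n"
        "Extract product name and quantity.\n"
        f'Input: "{user_input}"\n'
        "Output JSON:"
    )

def parse_model_reply(reply: str) -> dict[str, int]:
    """Parse the model's JSON reply and check that it maps names to integers."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in data.values()):
        raise ValueError("quantities must be integers")
    return data

prompt = build_extraction_prompt("2 milk and 1 bread")
# Assumption: this reply is hard-coded; a real application would get it from an LLM.
order = parse_model_reply('{ "milk": 2, "bread": 1 }')
```

Asking for JSON makes the reply easy to check in code, which is one practical reason the structured prompt is preferable to the open-ended one.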

RAG (Retrieval Augmented Generation)

Overview

Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language models (LLMs) by allowing them to retrieve relevant external information before generating a response.

Instead of relying only on pre-trained knowledge, RAG enables models to access up-to-date, domain-specific, and private data sources, making responses more accurate and context-aware.

RAG is widely used in:

Customer support chatbots
Healthcare report summarization
Legal and compliance systems
Financial analysis tools
Enterprise knowledge search systems

Why RAG is Important

Traditional LLMs like GPT-style models generate responses based only on training data. This creates limitations:

Limitations of standard LLMs

Knowledge cutoff (no real-time updates)
Hallucinations (false or made-up answers)
No access to private company data
Lack of personalization/context

RAG solves these problems by:

Fetching real-time relevant data
Grounding answers in actual documents
Reducing hallucinations
Keeping knowledge updated without retraining

Real-Life Example

Imagine two students preparing for an exam:

Student 1 (LLM only)

Reads books once
Answers from memory only
Cannot verify facts

Student 2 (RAG system)

Reads books
Can open books during exam
Verifies answers in real-time

Student 2 performs better because they can retrieve information when needed.

Key Benefits of RAG

Reduces Hallucination

LLMs generate more factual and grounded responses. Some LLMs are overconfident and give wrong answers. With an LLM alone, an answer cannot be checked against a source. With RAG, it can be checked against the retrieved documents, so the response is more grounded in real data.

Keeps Knowledge Updated

Works with real-time and dynamic data sources. An LLM has a knowledge cutoff date: it only knows information available up to its training date. With RAG, the model can use current, up-to-date data.

Cost Efficient

Avoids expensive retraining or fine-tuning of models. When new data arrives, it can be made available to the model through RAG without any training or fine-tuning.

Data Privacy

Sensitive enterprise data stays within controlled systems, because the model never sees the whole dataset at once. For each query, only the relevant data is fetched. For example, a large company that does not want to give a model full access to its data can control what is exposed by using RAG.

Context Awareness

Personalized responses using user-specific data.

Example:

     Airline chatbot knows your booking details (PNR, flight time, delay status)

RAG Architecture Overview

RAG consists of two major pipelines:

1. Ingestion Pipeline

This prepares data for retrieval.

Steps:

Data Collection
- PDFs
- Web pages
- Databases
- Excel files
- APIs

Chunking
- Large documents are split into smaller pieces (chunks)

Types of chunking:
- Fixed-size chunking
- Hierarchical chunking
- Semantic chunking

Embedding Generation
- Text is converted into numerical vectors using embedding models.

Popular embedding tools:
- OpenAI Embeddings
- Google Gemini Embeddings
- Sentence Transformers (Hugging Face)

Vector Database Storage
- Embeddings are stored in vector databases such as Pinecone, ChromaDB, FAISS, and Elasticsearch.

Vector databases enable semantic search (meaning-based search), not just keyword matching.
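The ingestion steps above can be sketched in a few lines. This is only a toy: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, the "vector database" is a plain Python list, and the names `DIM`, `toy_embed`, `chunk_words`, and `ingest` are all invented for this example.

```python
import hashlib
import math

DIM = 64  # toy vector size, chosen arbitrarily

def toy_embed(text: str) -> list[float]:
    """Hashed bag-of-words vector, L2-normalised. NOT a real embedding model."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def chunk_words(text: str, size: int) -> list[str]:
    """Fixed-size chunking: split the text every `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def ingest(documents: list[str], chunk_size: int = 8) -> list[tuple[str, list[float]]]:
    """Chunk every document, embed each chunk, and return (chunk, vector) pairs."""
    store = []
    for doc in documents:
        for chunk in chunk_words(doc, chunk_size):
            store.append((chunk, toy_embed(chunk)))
    return store

# Data collection is simulated with one in-memory string.
store = ingest([
    "To reset your password open settings and choose reset password. "
    "Refunds are processed within five business days."
])
```

In a real pipeline the list would be replaced by one of the vector databases listed above, and `toy_embed` by an actual embedding model.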

2. Retrieval Pipeline

This handles user queries.

Steps:

User Query Input

User asks a question.

Query Embedding

Query is converted into vector form.

Similarity Search

System finds the most relevant document chunks from the vector database.

Context Creation

Retrieved data is used as context.

Augmentation

Prompt is enriched with:
- User query
- Retrieved context

Generation

LLM generates final response using grounded context.
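A minimal sketch of the retrieval steps, assuming word counts as a stand-in for real embeddings and a Python list as a stand-in for the vector database. The helper names (`embed`, `cosine`, `retrieve`, `augment`) and the sample chunks are hypothetical, and the final LLM call is left out.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy sparse 'embedding': word counts. A stand-in for a real embedding model."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Embed the query and return the top_k most similar chunks."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

def augment(query: str, context: list[str]) -> str:
    """Build the final prompt from the retrieved context and the user query."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

chunks = [
    "Open settings and choose reset password.",
    "Refunds are processed within five business days.",
    "Flights can be rebooked from the bookings page.",
]
query = "How do I reset my password?"
context = retrieve(query, chunks, top_k=1)
prompt = augment(query, context)  # this prompt would be sent to the LLM for generation
```

Word-count vectors only match exact words, so this toy behaves like keyword search. A real embedding model is what makes the search meaning-based.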

Combined flow

Why It is Called RAG

Retrieval → Fetch relevant data
Augmented → Add context to prompt
Generation → Produce final AI response

RAG Technologies & Tools

Frameworks

1. LangChain

     LangChain = LEGO toolkit for AI apps

What it does
- Connects LLMs
- Connects vector databases
- Handles prompts
- Creates AI agents
- Builds chatbots
Features
- RAG pipelines
- Memory
- Tools
- Agents
- Chains
Best For
- Production AI apps
- Advanced workflows
- AI agents

Example

    from langchain.chains import RetrievalQA

2. LlamaIndex

Focused mainly on connecting data to LLMs.

What it does
- Reads PDFs
- Reads databases
- Creates indexes
- Retrieves data efficiently
Best For
- RAG projects
- Document Q&A
- Fast data ingestion

Example

    from llama_index.core import VectorStoreIndex  # llama-index 0.10 and later; older releases used "from llama_index import VectorStoreIndex"

3. Haystack

Enterprise-focused RAG framework.

What it does
- Search pipelines
- Question answering
- Document retrieval
Best For
- Enterprise search
- Large-scale systems

Easy Analogy

  Haystack = Enterprise Google Search for AI

Embedding Models

Embeddings convert text into numbers/vectors.

AI understands meaning using vectors.

Easy Example

  "I love cricket"
         ↓
  [0.23, 0.91, 0.44, ...]

  Similar sentences have similar vectors.
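One common way to measure how similar two vectors are is cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration. Real embedding models produce vectors with hundreds or thousands of dimensions, and the variable names here are hypothetical.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors, for illustration only (not output from any real model).
love_cricket = [0.23, 0.91, 0.44]    # "I love cricket"
enjoy_cricket = [0.25, 0.88, 0.40]   # a sentence with a similar meaning
tax_forms = [0.90, 0.05, 0.10]       # an unrelated sentence

similar = cosine_similarity(love_cricket, enjoy_cricket)
different = cosine_similarity(love_cricket, tax_forms)
```

With these values, `similar` comes out much closer to 1.0 than `different`, which is the property similarity search relies on.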

1. OpenAI text-embedding models

       Popular cloud embedding models from OpenAI.

Examples
- text-embedding-3-small
- text-embedding-3-large
Features
- High quality
- Accurate semantic search
- API-based
Best For
- Production apps
- High accuracy RAG
Drawback
- Paid API

2. Gemini embeddings

Embedding models from Google.

Features
- Good multilingual support
- Integrated with Gemini ecosystem
Best For
- Google Cloud users

3. Sentence Transformers

Open-source embedding models.

Built using Transformers.

Popular Models
- all-MiniLM-L6-v2
- bge-small
- mpnet
Features
- Free
- Runs locally
- Fast
Best For
- Learning RAG
- Local AI apps
- Offline systems

Easy Analogy

  Sentence Transformers = Free local embedding engine

Vector Databases

Stores embeddings/vectors. Used for similarity search.

Easy Analogy

Normal Database:

   Search by exact keyword

Vector Database:

   Search by meaning

Example:

Question:

  "How to reset password?"

It can also find:

  "Forgot password steps"

because meanings are similar.

Pinecone

Cloud vector database.
- Features
  - Fully managed
  - Fast similarity search
  - Scalable
- Best For
  - Production systems
  - Enterprise AI
- Drawback
  - Paid service

Easy Analogy

   Pinecone = Cloud storage for AI memory

FAISS

Created by Meta.
- Features
  - Very fast
  - Local vector search
  - Open source
- Best For
  - Learning
  - Local applications
  - High-performance search
- Drawback
  - No built-in cloud/database features

Easy Analogy

   FAISS = Fast local vector engine

ChromaDB

Simple vector database for beginners.
- Features
  - Easy setup
  - Python-friendly
  - Lightweight
- Best For
  - Small projects
  - Beginners
  - Prototypes

Easy Analogy

   ChromaDB = SQLite for vectors

Elasticsearch

Traditional search engine with vector search support.
- Features
  - Keyword search
  - Vector search
  - Hybrid search
- Best For
  - Enterprise search systems
  - Large-scale applications

Easy Analogy

 Elasticsearch = Google-like search engine with AI support

RAG Chunking Strategies

Fixed Size Chunking
- Splits text every fixed number of tokens
- Simple but may break context
Hierarchical Chunking
- Based on paragraphs, sentences, sections
- More structured and production-friendly
Semantic Chunking
- Splits based on meaning/topic changes
- High quality but computationally expensive
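To make the difference concrete, here is a small sketch contrasting fixed-size and hierarchical chunking. It counts words instead of tokens for simplicity, the function names are invented for this example, and semantic chunking is left out because it needs an embedding model.

```python
import re

def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Fixed-size chunking by word count (words stand in for tokens here)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def hierarchical_chunks(text: str, max_words: int) -> list[str]:
    """Split on paragraphs first; split a paragraph into sentences only if it is too long."""
    chunks = []
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if len(para.split()) <= max_words:
            chunks.append(para)
        else:
            # Break after sentence-ending punctuation followed by whitespace.
            chunks.extend(s for s in re.split(r"(?<=[.!?])\s+", para) if s)
    return chunks

text = "RAG retrieves documents. It then generates an answer.\n\nChunking splits text."

fixed = fixed_size_chunks(text, 4)
structured = hierarchical_chunks(text, 5)
```

Here the first fixed-size chunk ends in the middle of a sentence ("RAG retrieves documents. It"), while the hierarchical version keeps every sentence whole.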

Types of RAG Architectures

1. Standard RAG

The baseline Retrieval-Augmented Generation setup.

     A user query is embedded, relevant documents are retrieved from a vector
     database, and the LLM generates an answer grounded in that context.

Example

   User: “What are the side effects of ibuprofen?”

System:

Retrieves medical documents
Feeds them to the LLM
Generates a grounded answer

Usage

FAQ bots
Knowledge base assistants
Documentation search
Customer support

Best when:

Knowledge is static or slow-changing
No multi-step reasoning required

2. Hybrid RAG

Combines neural methods (embeddings/LLMs) with symbolic or rule-based systems (knowledge graphs, logic rules).

It enables:

structured reasoning
better interpretability
improved factual consistency

Example

    User: “Who is the CEO of the company that owns Instagram?”

System:

Uses knowledge graph to resolve relationships
Uses LLM for explanation
Produces accurate, structured answer

Usage

Enterprise knowledge systems
Compliance / regulated domains
Complex relational queries

Best when:

Structured + unstructured data both matter
Logical reasoning is required
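The Instagram example can be sketched with a tiny hand-written knowledge graph. This assumes the question has already been translated into a chain of relations (in practice an LLM or a query parser would do that). `GRAPH`, `follow`, and `build_prompt` are hypothetical names, and the LLM call itself is omitted.

```python
# Tiny hand-written knowledge graph: (subject, relation) -> object.
GRAPH = {
    ("Instagram", "owned_by"): "Meta",
    ("Meta", "ceo"): "Mark Zuckerberg",
}

def follow(entity: str, relations: list[str]) -> list[str]:
    """Follow a chain of relations from `entity`, returning every node visited."""
    path = [entity]
    for rel in relations:
        entity = GRAPH[(entity, rel)]
        path.append(entity)
    return path

def build_prompt(question: str, path: list[str]) -> str:
    """Hand the resolved facts to the LLM, which only has to explain them."""
    facts = " -> ".join(path)
    return f"Facts from knowledge graph: {facts}\nQuestion: {question}\nExplain the answer."

# Assumption: the question was already mapped to the relations "owned_by" then "ceo".
path = follow("Instagram", ["owned_by", "ceo"])
prompt = build_prompt("Who is the CEO of the company that owns Instagram?", path)
```

The symbolic lookup resolves the relationship exactly, and the neural part is only asked to phrase the explanation.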

3. RAG with Memory

Extends RAG by incorporating past interactions or stored user context into retrieval.

The system remembers:
- conversation history
- user preferences
- past queries
Memory can be:
- short-term (chat history)
- long-term (stored embeddings or profiles)
Example
- User:
  - Q1: “I’m a vegetarian.”
  - Q2: “Suggest high-protein foods.”
System:
- Retrieves nutrition docs
- Also recalls user preference (vegetarian)
- Filters answer accordingly
Usage
- Personal assistants
- Chatbots with continuity
- Recommendation systems
- Learning companions
Best when:
- Personalization matters
- Conversations span multiple turns
- Context evolves over time
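A minimal sketch of the vegetarian example, assuming a very crude preference detector (a keyword check) and a hand-tagged document list. The `Memory` class, `DOCS`, and `retrieve` are invented for illustration, and a real system would extract preferences far more carefully.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    short_term: list[str] = field(default_factory=list)      # chat history
    long_term: dict[str, str] = field(default_factory=dict)  # stored user profile

    def observe(self, message: str) -> None:
        """Record the message and capture a simple preference if one is stated."""
        self.short_term.append(message)
        # Crude keyword check, for illustration only.
        if "vegetarian" in message.lower():
            self.long_term["diet"] = "vegetarian"

# Hypothetical documents tagged with the diets they suit.
DOCS = [
    {"text": "Chicken breast is high in protein.", "diets": {"omnivore"}},
    {"text": "Lentils and paneer are high in protein.", "diets": {"omnivore", "vegetarian"}},
]

def retrieve(memory: Memory) -> list[str]:
    """Return documents, filtered by the remembered diet when there is one."""
    diet = memory.long_term.get("diet")
    return [d["text"] for d in DOCS if diet is None or diet in d["diets"]]

memory = Memory()
memory.observe("I'm a vegetarian.")
memory.observe("Suggest high-protein foods.")
results = retrieve(memory)
```

The second question never mentions diet, yet the stored preference still filters what is retrieved.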

4. Graph RAG

Uses knowledge graphs instead of chunks

Nodes = entities
Edges = relationships

Best for:

Fraud detection
Legal systems
Research platforms

5. Agentic RAG

RAG + autonomous decision-making. The system behaves like an agent that can decide:
- what to retrieve
- whether to retrieve
- which tools to call
- how many steps to take

It’s iterative and dynamic rather than being a single pass.

Example

User: “Compare Tesla and BYD revenue growth over the last 3 years.”

Agent:
1. Decides to fetch financial data
2. Calls APIs / retrieves reports
3. May refine query multiple times
4. Aggregates + reasons
5. Produces final comparison
Usage
- Research assistants
- Financial analysis
- Complex Q&A requiring multiple sources
- Autonomous workflows
Best when:
- Multi-step reasoning and query decomposition are needed
- External tools/APIs are involved
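The agent loop above can be sketched as follows. In a real agent an LLM makes the decisions, whereas here a simple rule stands in for it. The tool `fetch_revenue`, the placeholder company names, and the revenue figures are all made up for illustration and are not real financial data.

```python
# Hypothetical data source with made-up figures, for illustration only.
FAKE_REVENUE = {
    "CompanyA": [100.0, 120.0, 150.0],
    "CompanyB": [80.0, 120.0, 160.0],
}

def fetch_revenue(company: str) -> list[float]:
    """Hypothetical tool: returns three years of revenue for a company."""
    return FAKE_REVENUE[company]

def growth(series: list[float]) -> float:
    """Total growth from the first to the last value, as a fraction."""
    return (series[-1] - series[0]) / series[0]

def run_agent(companies: list[str], max_steps: int = 5) -> dict[str, float]:
    """Loop: decide what is still missing, call a tool, stop when done or out of steps."""
    gathered: dict[str, list[float]] = {}
    steps = 0
    while steps < max_steps:
        missing = [c for c in companies if c not in gathered]
        if not missing:  # decision: nothing left to retrieve
            break
        gathered[missing[0]] = fetch_revenue(missing[0])  # decision: next tool call
        steps += 1
    # Aggregate: compute growth for everything that was gathered.
    return {c: growth(s) for c, s in gathered.items()}

result = run_agent(["CompanyA", "CompanyB"])
```

The `max_steps` cap is a common safeguard so that an agent that keeps deciding to retrieve cannot loop forever.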

6. Multimodal RAG

Works with multiple data types:

Text
Images
Audio
Video

Use cases:

Medical imaging (X-rays)
Surveillance systems
Audio transcription systems

7. Self-RAG (Reflective RAG)

The model evaluates its own output and retrieval quality.

It can:
- decide whether retrieval is needed
- critique retrieved documents
- refine or re-retrieve before answering
Adds a self-feedback loop inside RAG.

Example

User: “Explain quantum entanglement simply.”
System:
- Retrieves docs
- Generates answer
- Checks: “Is this clear? sufficient?”
- If not → retrieves better sources → regenerates
Usage
- High-accuracy QA systems
- Research assistants
- Domains needing hallucination control
- Scientific / legal applications
Best when:
- Answer quality matters more than speed
- You want built-in validation

8. Adaptive RAG

The system adapts retrieval strategy based on query complexity.

Not every query is treated the same:

simple => no retrieval or light retrieval
complex => multi-step retrieval

Example

User:
- “Capital of France?” => direct answer (no retrieval)
- “Impact of inflation on emerging markets?” => deep retrieval
Usage
- Cost-optimized systems
- Scalable chatbots
- Mixed workloads (simple + complex queries)
Best when:
- You want efficiency + intelligence
- Handling large query volumes
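A rough sketch of the routing idea, assuming word count as a stand-in for query complexity. The cut-offs and the `TOP_K` values are arbitrary choices for this example, and a production system would more likely use a trained classifier or an LLM to make the decision.

```python
def classify(query: str) -> str:
    """Crude complexity heuristic based on word count. The cut-offs are arbitrary."""
    n = len(query.split())
    if n <= 3:
        return "none"   # answer directly, no retrieval
    if n <= 5:
        return "light"  # a single, small retrieval
    return "deep"       # multi-step or larger retrieval

# Hypothetical number of chunks to fetch for each strategy.
TOP_K = {"none": 0, "light": 3, "deep": 10}

def plan(query: str) -> tuple[str, int]:
    """Return the chosen strategy and how many chunks to retrieve."""
    strategy = classify(query)
    return strategy, TOP_K[strategy]

simple_plan = plan("Capital of France?")
complex_plan = plan("Impact of inflation on emerging markets?")
```

Routing simple queries away from retrieval is where the cost saving comes from, since those queries skip the embedding and search steps entirely.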

9. Corrective RAG

Focuses on fixing bad retrievals.

The system:
- detects irrelevant or low-quality documents
- filters or re-ranks them
- may re-query before answering

It improves retrieval robustness, not just generation.

Example
- User: “Explain CRISPR gene editing.”
System:
- Retrieves mixed-quality docs
- Detects irrelevant ones
- Filters + re-retrieves
- Produces accurate answer
Usage
- Noisy or unstructured data sources
- Enterprise search systems
- Systems with inconsistent indexing
Best when:
- Retrieval quality is unreliable
- Data sources are messy
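The detect, filter, and re-query steps might look like the sketch below. The relevance grader here is a simple word-overlap score standing in for a real grading model, and `retrieve`, `rewrite`, and `CORPUS` are hypothetical placeholders.

```python
def relevance(query: str, doc: str) -> float:
    """Fraction of query words found in the document. A stand-in for a real grader."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def corrective_retrieve(query, retrieve, rewrite, threshold=0.5, min_docs=1, max_tries=2):
    """Retrieve, drop low-scoring documents, and re-query with a rewritten query if too few remain."""
    kept = []
    for _ in range(max_tries):
        kept = [d for d in retrieve(query) if relevance(query, d) >= threshold]
        if len(kept) >= min_docs:
            break
        query = rewrite(query)  # try again with a cleaner query
    return kept

CORPUS = [
    "crispr gene editing cuts dna at a chosen site",
    "gene therapy pricing news",
]

def retrieve(query):
    """Hypothetical retriever: returns the whole (mixed-quality) corpus."""
    return CORPUS

def rewrite(query):
    """Hypothetical rewriter: strips a filler word from the query."""
    return query.replace("explain ", "")

docs = corrective_retrieve("explain crispr gene editing", retrieve, rewrite, threshold=0.8)
```

In this run the first pass keeps nothing because no document reaches the 0.8 threshold, so the query is rewritten and the second pass keeps only the relevant document.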

10. Attention-based RAG

Uses attention mechanisms to prioritize the most relevant parts of retrieved documents.

Instead of treating all retrieved chunks equally, the model:
- assigns weights to different passages
- focuses on high-signal content during generation
Example

User: “Causes of climate change?”

System:
- Retrieves multiple documents
- Uses attention to emphasize key sections (e.g., greenhouse gases)
- Generates answer based on most relevant snippets
Usage
- Long-document QA
- Summarization systems
- Legal / research analysis
Best when:
- Retrieved context is large or noisy
- Fine-grained relevance matters

11. Cost-Constrained RAG

Optimizes RAG pipelines under cost constraints (tokens, API calls, latency).

The system dynamically:
- limits retrieval depth
- chooses cheaper models when possible
- balances cost vs quality

Example

User: “Summarize this document.”

System:
- Checks budget
- Uses fewer retrieved chunks or smaller model
- Produces acceptable answer within cost limits
Usage
- Production systems at scale
- SaaS AI products
- High-volume query environments
Best when:
- Cost efficiency is critical
- Trade-offs between quality and expense are needed
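One way to limit retrieval depth under a budget is a greedy selection of the best-scoring chunks that still fit. The sketch below assumes word count as a rough proxy for tokens. The model names, the `small_limit` value, and the sample chunks are hypothetical.

```python
def select_within_budget(scored_chunks, token_budget):
    """Greedily keep the highest-scoring chunks whose combined size fits the budget."""
    selected, used = [], 0
    for score, text in sorted(scored_chunks, key=lambda pair: pair[0], reverse=True):
        cost = len(text.split())  # word count as a rough proxy for tokens
        if used + cost <= token_budget:
            selected.append(text)
            used += cost
    return selected, used

def pick_model(token_count, small_limit=50):
    """Choose the cheaper model when the context is small. Model names are hypothetical."""
    return "small-model" if token_count <= small_limit else "large-model"

# (relevance score, chunk text) pairs, made up for illustration.
scored = [
    (0.9, "Revenue grew in the last quarter."),
    (0.7, "Costs rose because of higher shipping and storage fees."),
    (0.4, "Headcount stayed flat."),
]

selected, used = select_within_budget(scored, token_budget=10)
model = pick_model(used)
```

With a budget of 10, the second chunk is skipped because it would exceed the limit, and the smaller third chunk is taken instead. Greedy selection is simple but not always optimal, which is part of the cost versus quality trade-off.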

12. XAI RAG (Explainable AI RAG)

Focuses on making RAG outputs explainable.

The system:
- shows retrieved sources
- provides reasoning traces
- justifies answers with evidence

Example

User: “Why was this loan rejected?”

System:
- Retrieves policy documents
- Generates answer
- Explains decision with cited rules and reasoning
Usage
- Finance / healthcare
- Legal systems
- Auditable AI applications
Best when:
- Transparency is mandatory
- Decisions must be

RAG is a powerful AI architecture that bridges the gap between:

- Static knowledge (LLMs)
- Dynamic real-world data (databases, APIs, documents)

It enables:

- Smarter AI systems
- Context-aware responses
- Enterprise-ready AI applications

Final Thought

RAG is not a single technology—it is a design pattern that combines retrieval systems and generative AI to create intelligent, real-world-ready applications.
