Home

Prompt Engineering
RAG (Retrieval Augmented Generation)
Embeddings + Vector DB
Function Calling / Tools

Prompt Engineering

📌 What is it?

Prompt engineering is the practice of writing a clear, specific input (prompt) so that an LLM returns the output you need.

🎯 Example

❌ Bad Prompt: Tell me about milk

✅ Good Prompt:

You are a shop assistant.
Extract product name and quantity.
Input: "2 milk and 1 bread"
Output JSON:

👉 Output becomes structured:

{ "milk": 2, "bread": 1 }
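The good prompt above can also be assembled and its reply checked in code. The following is a minimal Python sketch, not a definitive implementation: `build_prompt` and `parse_order` are hypothetical helper names, and the model reply is hard-coded instead of coming from a real LLM API.

```python
import json


def build_prompt(user_input: str) -> str:
    """Assemble the structured prompt: role, task, input, and output format."""
    return (
        "You are a shop assistant.\n"
        "Extract product name and quantity.\n"
        f'Input: "{user_input}"\n'
        "Output JSON:"
    )


def parse_order(model_reply: str) -> dict[str, int]:
    """Parse the model's JSON reply and check that every quantity is an integer."""
    order = json.loads(model_reply)
    if not isinstance(order, dict):
        raise ValueError("expected a JSON object")
    for product, quantity in order.items():
        if not isinstance(quantity, int):
            raise ValueError(f"quantity for {product!r} is not an integer")
    return order


prompt = build_prompt("2 milk and 1 bread")

# Hard-coded stand-in for the reply an LLM might return for this prompt.
reply = '{ "milk": 2, "bread": 1 }'
order = parse_order(reply)
print(order)
```

Asking for JSON in the prompt is what makes the reply machine-readable; validating it after parsing guards against replies that do not follow the requested format.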

RAG (Retrieval Augmented Generation)

Overview

Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language models (LLMs) by allowing them to retrieve relevant external information before generating a response.

Instead of relying only on pre-trained knowledge, RAG enables models to access up-to-date, domain-specific, and private data sources, making responses more accurate and context-aware.

RAG is widely used in:

Customer support chatbots
Healthcare report summarization
Legal and compliance systems
Financial analysis tools
Enterprise knowledge search systems

Why RAG is Important

Standard LLMs, such as GPT-style models, generate responses based only on their training data. This creates several limitations:

Limitations of standard LLMs

Knowledge cutoff (no real-time updates)
Hallucinations (false or made-up answers)
No access to private company data
Lack of personalization/context

RAG addresses these problems by:

Fetching relevant, current data at query time
Grounding answers in actual documents
Reducing hallucinations
Keeping knowledge updated without retraining

Real-Life Example

Imagine two students preparing for an exam:

Student 1 (LLM only)

Reads books once
Answers from memory only
Cannot verify facts

Student 2 (RAG system)

Reads books
Can open books during exam
Verifies answers in real-time

Student 2 performs better because they can retrieve information when needed.

Key Benefits of RAG

Reduces Hallucination

Responses are more factual and grounded. An LLM on its own can be overconfident and give wrong answers, and there is no source to check its output against. With RAG, the answer is based on retrieved documents, so it can be verified against real data.

Keeps Knowledge Updated

Works with real-time and dynamic data sources. An LLM has a knowledge cutoff: it only knows information available up to its training date. With RAG, the model can use current, up-to-date data.

Cost Efficient

Avoids expensive retraining or fine-tuning of models. When new data arrives, it can be made available to the model through RAG without training or fine-tuning it again.

Data Privacy

Sensitive enterprise data stays within controlled systems. The model does not get access to the whole dataset at once; for each query, only the relevant data is retrieved and passed to it. A large company that does not want to give a model full access to its data can control what is exposed by using RAG.

Context Awareness

Personalized responses using user-specific data.

Example:

An airline chatbot knows your booking details (PNR, flight time, delay status).

RAG Architecture Overview

RAG consists of two major pipelines:

1. Ingestion Pipeline

This prepares data for retrieval.

Steps:

Data Collection
- PDFs
- Web pages
- Databases
- Excel files
- APIs

Chunking
- Large documents are split into smaller pieces (chunks)

Types of chunking:

- Fixed-size chunking
- Hierarchical chunking
- Semantic chunking
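Fixed-size chunking, the simplest of the strategies listed above, can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `chunk_fixed_size`, the character-based sizes, and the `overlap` parameter are assumptions, not something this page prescribes.

```python
def chunk_fixed_size(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so the text around a
    chunk boundary is present in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(chunk_fixed_size("abcdefghij", chunk_size=4, overlap=1))
```

Hierarchical and semantic chunking split on document structure or meaning instead of a fixed character count, so they need more than this sketch shows.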

Embedding Generation

Text is converted into numerical vectors using embedding models.

Popular embedding tools:

- OpenAI Embeddings
- Google Gemini Embeddings
- Sentence Transformers (Hugging Face)
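To make "text is converted into numerical vectors" concrete, here is a toy Python sketch. It is not one of the embedding models listed above: `toy_embed` is a hypothetical function that hashes words into buckets, so it only captures word overlap, while a real embedding model is trained to capture meaning. It is shown only to illustrate the shape of the input and output.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Map text to a unit-length vector by hashing each word into one of dim buckets."""
    vector = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vector[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vector))
    # Text with no words yields the zero vector, which cannot be normalized.
    return [x / norm for x in vector] if norm > 0 else vector


def dot(a: list[float], b: list[float]) -> float:
    """Dot product; for unit-length vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


a = toy_embed("milk bread")
b = toy_embed("bread milk")
print(len(a), dot(a, b))
```

In a real pipeline, `toy_embed` would be replaced by a call to an actual embedding model, and the same model must be used for both documents and queries so that their vectors are comparable.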

Vector Database Storage

Embeddings are stored in a vector database or vector index such as:

- Pinecone
- ChromaDB
- FAISS
- Elasticsearch

Vector databases enable semantic search (meaning-based search), not just keyword matching.
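The idea behind semantic search can be sketched with a tiny in-memory store that ranks stored vectors by cosine similarity to a query vector. This is a toy under stated assumptions: `InMemoryVectorStore` is a hypothetical class, not the API of any product listed above, the 3-dimensional vectors are written by hand instead of produced by an embedding model, and the sample texts are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Toy vector store: keeps (vector, text) pairs and ranks them by cosine similarity."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self._items.append((vector, text))

    def search(self, query: list[float], top_k: int = 2) -> list[tuple[float, str]]:
        """Return the top_k (score, text) pairs, most similar first."""
        scored = [(cosine_similarity(query, vec), text) for vec, text in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
# Hand-written vectors and made-up texts stand in for real embeddings and documents.
store.add([1.0, 0.0, 0.0], "Refund policy")
store.add([0.0, 1.0, 0.0], "Baggage allowance")
store.add([0.9, 0.1, 0.0], "Cancellation fees")

results = store.search([1.0, 0.0, 0.0], top_k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

This sketch compares the query against every stored vector. Dedicated vector databases typically build an index so that search stays fast as the collection grows.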

2. Retrieval Pipeline

This handles user queries.

Steps:

User Query Input

User asks a question.

Query Embedding

Query is converted into vector form.

Similarity Search

System finds the most relevant document chunks from the vector database.

Context Creation

Retrieved data is used as context.

Augmentation

Prompt is enriched with:

- User query
- Retrieved context

Generation

LLM generates final response using grounded context.
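The augmentation step above can be sketched as a function that combines the user query with the retrieved chunks into one prompt. This is a minimal Python sketch: `build_augmented_prompt`, the prompt wording, and the sample chunks are assumptions made for illustration, and the call to an actual LLM for the generation step is left out.

```python
def build_augmented_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Combine the user query with retrieved chunks into one grounded prompt."""
    # Number each chunk so the answer can point back to its source.
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


# Made-up chunks standing in for the results of a similarity search.
prompt = build_augmented_prompt(
    "Is my flight delayed?",
    ["Flight time: 10:30.", "Delay status: delayed by 40 minutes."],
)
print(prompt)
```

Telling the model to rely only on the supplied context is one common way to keep the generated answer grounded in the retrieved documents.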

Combined flow

Why It is Called RAG

Retrieval → Fetch relevant data
Augmented → Add context to prompt
Generation → Produce final AI response

RAG Technologies & Tools

Frameworks

LangChain
LlamaIndex
Haystack

Embedding Models

OpenAI text-embedding models
Gemini embeddings
Sentence Transformers

Vector Databases

Pinecone
FAISS
ChromaDB
Elasticsearch
