Home
- Prompt Engineering
- RAG (Retrieval Augmented Generation)
- Embeddings + Vector DB
- Function Calling / Tools
Prompt engineering means writing a well-designed input (prompt) to get the correct output from an LLM.
❌ Bad Prompt: Tell me about milk
✅ Good Prompt: You are a shop assistant.
Extract product name and quantity: Input: "2 milk and 1 bread"
Output JSON:
👉 Output becomes structured:
{ "milk": 2, "bread": 1 }
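The good prompt above combines a role, a task, and an output format. A minimal Python sketch of that pattern is shown here. The function names `build_extraction_prompt` and `parse_order` are hypothetical, and the model call itself is left out, so the reply is hard-coded to the structured output shown above.

```python
import json


def build_extraction_prompt(user_input: str) -> str:
    """Assemble role + task + input + output format into one prompt string."""
    return (
        "You are a shop assistant.\n"
        "Extract product name and quantity.\n"
        f'Input: "{user_input}"\n'
        "Output JSON:"
    )


def parse_order(model_reply: str) -> dict[str, int]:
    """Parse the model's JSON reply into {product: quantity}.

    Raises ValueError if the reply is not a JSON object.
    """
    data = json.loads(model_reply)  # json.JSONDecodeError is a ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return {str(name): int(qty) for name, qty in data.items()}


prompt = build_extraction_prompt("2 milk and 1 bread")

# Assumption: the LLM answered with the structured output from the example.
order = parse_order('{ "milk": 2, "bread": 1 }')
```

Asking for JSON makes the reply machine-readable, so the calling code can validate it instead of guessing at free text.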
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language models (LLMs) by allowing them to retrieve relevant external information before generating a response.
Instead of relying only on pre-trained knowledge, RAG enables models to access up-to-date, domain-specific, and private data sources, making responses more accurate and context-aware.
Typical applications include:
- Customer support chatbots
- Healthcare report summarization
- Legal and compliance systems
- Financial analysis tools
- Enterprise knowledge search systems
Traditional LLMs like GPT-style models generate responses based only on training data. This creates limitations:
- Knowledge cutoff (no real-time updates)
- Hallucinations (false or made-up answers)
- No access to private company data
- Lack of personalization/context
RAG addresses these limitations by:
- Fetching relevant data in real time
- Grounding answers in actual documents
- Reducing hallucinations
- Keeping knowledge updated without retraining
Imagine two students preparing for an exam:
Student 1:
- Reads books once
- Answers from memory only
- Cannot verify facts
Student 2:
- Reads books
- Can open books during the exam
- Verifies answers in real time
Student 2 performs better because they can retrieve information when needed.
Benefits of RAG:
- Reduces Hallucination
  With RAG, LLMs generate more factual and grounded responses. A standalone LLM can be overconfident and give wrong answers, and its output cannot be checked against a source. With RAG, the answer can be verified against the retrieved documents, so responses are grounded in real data.
- Keeps Knowledge Updated
  Works with real-time and dynamic data sources. A standalone LLM has a knowledge cutoff: it only knows information available up to its training date. RAG lets it use current, up-to-date data.
- Cost Efficient
  Avoids expensive retraining or fine-tuning of models. When new data arrives, RAG gives the model access to it without retraining or fine-tuning.
- Data Privacy
  Sensitive enterprise data stays within controlled systems. The model does not access the whole dataset at once; for each query it fetches only the relevant data. A large company that does not want to give the model full access to its data can control that access through RAG.
- Context Awareness
  Personalized responses using user-specific data.
  Example: an airline chatbot knows your booking details (PNR, flight time, delay status).
The first phase prepares data for retrieval.
Steps:
- Data Collection
  - PDFs
  - Web pages
  - Databases
  - Excel files
  - APIs
- Chunking
  - Large documents are split into smaller pieces (chunks).
  - Types of chunking:
    - Fixed-size chunking
    - Hierarchical chunking
    - Semantic chunking
- Embedding Generation
  - Text is converted into numerical vectors using embedding models.
  - Popular embedding tools:
    - OpenAI Embeddings
    - Google Gemini Embeddings
    - Sentence Transformers (Hugging Face)
- Vector Database Storage
  - Embeddings are stored in vector databases such as:
    - Pinecone
    - ChromaDB
    - FAISS
    - Elasticsearch
Vector databases enable semantic search (meaning-based search), not just keyword matching.
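The preparation steps above (chunking, embedding, storage) can be sketched in plain Python. This is only an illustration under stated assumptions: `chunk_fixed_size`, `toy_embed`, and `build_index` are hypothetical names, the chunk overlap is an added assumption, the "embedding" is a simple word-count vector standing in for a real embedding model, and a Python list stands in for a vector database.

```python
import re
from collections import Counter


def chunk_fixed_size(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Fixed-size chunking: split text into chunks of at most `chunk_size`
    characters, each sharing `overlap` characters with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse word-count vector.
    A real model would return a dense vector that captures meaning,
    not just exact words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(documents: list[str], chunk_size: int = 200,
                overlap: int = 20) -> list[tuple[str, Counter]]:
    """Chunk every document, embed each chunk, and store (chunk, vector) pairs.
    The returned list plays the role of the vector database."""
    index = []
    for doc in documents:
        for chunk in chunk_fixed_size(doc, chunk_size, overlap):
            index.append((chunk, toy_embed(chunk)))
    return index


docs = [
    "Refunds are processed within 5 business days.",
    "Flights can be rebooked up to 2 hours before departure.",
]
index = build_index(docs, chunk_size=40, overlap=10)
```

In a real system, `toy_embed` would be replaced by one of the embedding tools listed above, and the list by one of the vector stores. The chunk, embed, store order stays the same.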
The second phase handles user queries.
Steps:
- User Query Input
  The user asks a question.
- Query Embedding
  The query is converted into vector form.
- Similarity Search
  The system finds the most relevant document chunks in the vector database.
- Context Creation
  The retrieved data is used as context.
- Augmentation
  The prompt is enriched with:
  - User query
  - Retrieved context
- Generation
  The LLM generates the final response using the grounded context.
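The query steps above can be sketched as follows. All names here (`toy_embed`, `cosine`, `retrieve`, `build_augmented_prompt`) are hypothetical, the word-count vector is only a stand-in for a real embedding model, and the final LLM call is omitted, so this shows retrieval and augmentation only.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Query embedding + similarity search: return the top_k closest chunks."""
    query_vec = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, toy_embed(c)), reverse=True)
    return ranked[:top_k]


def build_augmented_prompt(query: str, context_chunks: list[str]) -> str:
    """Augmentation: combine the retrieved context and the user query."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )


chunks = [
    "Refunds are processed within 5 business days.",
    "Flights can be rebooked up to 2 hours before departure.",
    "Checked baggage allowance is 23 kg per passenger.",
]
query = "How long do refunds take?"
top = retrieve(query, chunks, top_k=1)
prompt = build_augmented_prompt(query, top)
# `prompt` would then be sent to an LLM for the generation step (not shown).
```

Because the toy vectors only count words, this search matches on shared terms such as "refunds". A real embedding model is what turns the same pipeline into meaning-based search.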
- Retrieval → Fetch relevant data
- Augmented → Add context to prompt
- Generation → Produce final AI response