## RAG Basics (Retrieval-Augmented Generation) :

* RAG is a technique in which the LLM does not rely solely on what it memorized during training; it retrieves relevant information from external documents and uses it to generate the answer.

* It has two main steps: retrieval and generation. The system first finds the relevant data, then the LLM uses that data to produce the final response.

* RAG reduces hallucination because the model can ground its answer in actual documents instead of guessing.

* This approach suits dynamic knowledge such as PDFs, notes, company data, and policies that appeared after the model was trained.

* LLMs are static, but real-world data is dynamic. RAG fills this gap.

* It is the foundation of enterprise AI systems, where private data has to be used securely.

* RAG improves both accuracy and trust by combining search with generation.
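
The two steps can be sketched without any library. This is a toy illustration under loud assumptions, not a real RAG implementation: `DOCUMENTS` is a made-up knowledge base, `retrieve` ranks documents by shared words (a stand-in for embedding-based search), and `generate` is a hypothetical stub that only marks where an LLM call would go.

```python
import re

# Hypothetical in-memory "knowledge base", used only for illustration.
DOCUMENTS = [
    "Employees get 20 days of paid leave per year.",
    "The office is closed on public holidays.",
    "Expense reports must be submitted within 30 days.",
]


def tokenize(text):
    """Lowercase the text and return its words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=1):
    """Step 1 (retrieval): return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def generate(query, context):
    """Step 2 (generation): stub standing in for an LLM call (assumption, no model is used)."""
    joined = " ".join(context)
    return f"Question: {query}\nAnswer based on the documents: {joined}"


question = "How many days of paid leave do employees get?"
context = retrieve(question, DOCUMENTS)
print(generate(question, context))
```

In a real system the word-overlap scoring would be replaced by embedding similarity and the stub by an actual model call.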

## Embeddings (RAG Context) :

* Embeddings convert text into dense numerical vectors that capture its meaning.

* Texts with similar meanings have vectors that are close together; texts with different meanings are far apart.

* In RAG, embeddings place both the documents and the query in the same vector space.

* This enables semantic search, which matches on meaning rather than on exact words.

* Embeddings help find relevant content even when no keywords are shared.

* They make it possible to compare large amounts of text efficiently.

* They are the backbone of RAG's retrieval step.

In [11]:
from langchain_community.embeddings import HuggingFaceEmbeddings

# Load the embedding model
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Sample text
text = "LangChain helps build LLM applications"

# Convert the text into a vector
vector = embeddings.embed_query(text)

print(vector)

[-0.014138972386717796, -0.024635283276438713, 0.05002330616116524, -0.11973728984594345, 0.008941663429141045, -0.03222166746854782, -0.051894813776016235, 0.062491536140441895, 0.025787075981497765, -0.021769948303699493, -0.02081720158457756, -0.005019721109420061, 0.011507866904139519, -0.0011709253303706646, 0.07824187725782394, 0.035470061004161835, 0.008089656010270119, 0.08329253643751144, 0.04782746732234955, -0.07897520065307617, -0.021960780024528503, -0.04138180613517761, 0.04501473158597946, 0.05675424262881279, 0.06118106096982956, -0.07703030854463577, 0.043954066932201385, 0.03707760572433472, 0.0894288718700409, -0.02595287747681141, 0.008475967682898045, 0.1328008770942688, -0.05419498682022095, 0.0015967212384566665, -0.09040915220975876, 0.09495869278907776, 0.004634810145944357, -0.024181600660085678, -0.03745850548148155, -0.0740288719534874, -0.040495630353689194, 0.010981051251292229, 0.06296328455209732, -0.027635274454951286, 0.029761357232928276, -0.045458603
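
A list of floats like this only becomes useful once two vectors are compared. One common measure of "closeness" is cosine similarity; the sketch below uses made-up 3-dimensional vectors (real `all-MiniLM-L6-v2` embeddings have 384 dimensions), so the names `cat`, `kitten`, and `invoice` and their values are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, not real model output.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Vectors pointing in similar directions score close to 1.
print("cat vs kitten :", cosine_similarity(cat, kitten))
# Vectors pointing in different directions score close to 0.
print("cat vs invoice:", cosine_similarity(cat, invoice))
```

With real embeddings, the same function could be applied to two results of `embeddings.embed_query(...)`.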

## Vector Stores :

* A vector store is a database in which embeddings are stored.

* It is optimized for similarity search, which makes fast retrieval possible.

* The vector store compares the query's embedding against the stored embeddings.

* Popular vector stores include FAISS, Chroma, and Pinecone.

* Why it matters in AI/ML :

    - Comparing against a plain list becomes slow as the data grows.

    - A vector store handles millions of embeddings efficiently.

    - This is critical for real-time RAG systems.

In [None]:
from langchain_community.vectorstores import FAISS


texts = [
    "LangChain is a framework for LLM apps",
    "RAG improves LLM accuracy",
    "Embeddings capture text meaning"
]

# Create the vector store (reuses the `embeddings` object from the earlier cell)
vector_store = FAISS.from_texts(texts, embeddings)

# Similarity search
results = vector_store.similarity_search("What is RAG?")

print(results[0].page_content)
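
To make the idea concrete, here is a minimal sketch of what a vector store does conceptually. `TinyVectorStore` is a hypothetical class written for illustration, not part of FAISS or LangChain, and the 2-dimensional vectors are made up. It scans every stored vector on each search, which is exactly the brute-force approach that becomes slow at scale.

```python
import math


class TinyVectorStore:
    """Toy in-memory store: keeps (vector, text) pairs and scans all of them on search."""

    def __init__(self):
        self._items = []

    def add(self, vector, text):
        """Store one embedding together with its source text."""
        self._items.append((vector, text))

    def similarity_search(self, query_vector, k=1):
        """Return the k texts whose vectors have the highest cosine similarity to the query."""
        scored = [(self._cosine(query_vector, vec), text) for vec, text in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)


store = TinyVectorStore()
# Hypothetical 2-dimensional embeddings chosen by hand.
store.add([0.1, 0.9], "RAG improves LLM accuracy")
store.add([0.9, 0.1], "Embeddings capture text meaning")
store.add([0.7, 0.7], "LangChain is a framework for LLM apps")

print(store.similarity_search([0.2, 0.8], k=1))
```

Each search here costs time proportional to the number of stored vectors. Libraries such as FAISS also provide index structures intended to speed this up on large collections.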

## Document Loaders :

* Document loaders convert raw files into readable text.

* They support formats such as PDF, TXT, DOCX, and HTML.

* The loaded text is usually split into chunks, typically by a separate text splitter rather than by the loader itself, so that each chunk can be embedded and retrieved efficiently.

* They are the entry point of the RAG pipeline.

* Real-world data comes as files, not as ready-made text.

* Large documents need to be broken into manageable pieces.

* Sensible chunking (for example, with overlapping chunks) limits the context lost at chunk boundaries.

In [5]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("test.txt")  # path to a local .txt file
docs = loader.load()
# The file's content is now loaded as text

print(docs[0].page_content)

Hello I am sagar , and i am testing how Text loader work in langchain
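
Once text is loaded, it is usually split before embedding. The sketch below shows one simple way to do that: fixed-size character chunks with overlap. `chunk_text` and its parameters are hypothetical names chosen for illustration; LangChain ships its own splitters (for example `RecursiveCharacterTextSplitter`), which behave differently and are generally a better choice in practice.

```python
def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence
    cut at a boundary still appears whole in one of the two chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# Hypothetical sample text, used only to show the output shape.
sample = "RAG retrieves relevant documents first and then lets the LLM answer from them."
for i, chunk in enumerate(chunk_text(sample, chunk_size=40, overlap=10)):
    print(i, repr(chunk))
```

Character-based splitting is the crudest option; splitting on sentence or paragraph boundaries usually keeps more meaning per chunk.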


## Simple RAG Pipeline :

* In a simple RAG pipeline, documents are loaded and then embeddings are created from them.

* The embeddings are saved in a vector store.

* The user's query is converted into an embedding, and similar documents are retrieved.

* The retrieved text is added to the LLM's prompt to generate the final answer.

* Together, these steps form an end-to-end system.

* It makes it possible to use an LLM with private and up-to-date data.

* It is the basis of real applications such as chatbots, knowledge bases, and Q&A systems.

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import OpenAI
# RetrievalQA is exposed by langchain.chains (LangChain 0.x), not langchain_community.chains
from langchain.chains import RetrievalQA

# Step 1: Load documents
loader = TextLoader("test.txt")
documents = loader.load()

# Step 2: Create embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Step 3: Vector store
vector_store = FAISS.from_documents(documents, embeddings)

# Step 4: Load LLM (needs OPENAI_API_KEY set in the environment)
llm = OpenAI(temperature=0.2)

# Step 5: RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vector_store.as_retriever()
)

# Step 6: Ask question (invoke replaces the deprecated run method)
response = qa_chain.invoke({"query": "Explain RAG in simple words"})
print(response["result"])
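
The chain hides the step where retrieved text is placed into the prompt. A rough sketch of that step is shown below, assuming a simple "put everything into one prompt" strategy. `build_prompt`, its wording, and the `max_context_chars` budget are hypothetical; the template LangChain actually uses is different.

```python
def build_prompt(question, chunks, max_context_chars=500):
    """Join retrieved chunks into a numbered context block and append the question."""
    context_lines = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        if used + len(chunk) > max_context_chars:
            break  # stop once the context budget would be exceeded
        context_lines.append(f"[{i}] {chunk}")
        used += len(chunk)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical retrieved chunks, standing in for vector store results.
retrieved = [
    "RAG improves LLM accuracy",
    "Embeddings capture text meaning",
]
print(build_prompt("What is RAG?", retrieved))
```

The character budget is a crude stand-in for a token limit; a real pipeline would count tokens with the model's tokenizer.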
