# RAG System
Imagine you're asking a very knowledgeable friend (the Large Language Model, or LLM) a question. A Retrieval-Augmented Generation (RAG) system gives that friend a quick way to look things up in your documents before answering.
How RAG works:

1.  **Documents are Chunked and Embedded:** Your knowledge base is broken into small pieces, and each piece is converted into a numerical "meaning" representation.
2.  **Embeddings are Stored:** These numerical representations are then saved in a special database designed for quick similarity searches.
3.  **User Submits a Query:** You ask your question to the RAG system.
4.  **Query is Embedded:** Your question is also converted into a numerical "meaning" representation.
5.  **Relevant Chunks are Retrieved:** The system searches its database for document chunks whose meanings are most similar to your query's meaning.
6.  **Context is Formed:** The retrieved relevant text chunks are then added to your original query, creating an enriched prompt.
7.  **LLM Generates Answer:** A Large Language Model uses this enriched prompt to produce a response grounded in the retrieved text.
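
The seven steps above can be sketched end to end in plain Python. This is only a toy illustration: the bag-of-words `embed` function stands in for a real embedding model, the list `index` stands in for a vector database, and the sample chunks are made up. The final LLM call is left out, so the sketch stops at the enriched prompt.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, index, k=2):
    """Steps 4-5: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query, chunks):
    """Step 6: add the retrieved chunks to the original question."""
    context = "\n\n".join(chunks)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"

# Steps 1-2: chunk and embed the knowledge base (made-up sample chunks).
chunks = [
    "The library opens at nine in the morning.",
    "Tuition fees are due at the start of each semester.",
    "The cafeteria serves lunch from noon.",
]
index = [(c, embed(c)) for c in chunks]

# Step 3: the user's query.
query = "When does the library open?"
top_chunks = retrieve(query, index, k=1)
prompt = build_prompt(query, top_chunks)
print(prompt)  # Step 7 would send this prompt to an LLM.
```

The rest of this notebook replaces each toy piece with a real component: a PDF loader and text splitter, Gemini embeddings, a FAISS vector store, and a Gemini chat model.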

### RAG application built on Gemini

In [None]:
# install required packages
! pip install langchain langchain-community langchain-google-genai python-dotenv streamlit langchain-experimental sentence-transformers langchain-chroma langchainhub pypdf rapidocr-onnxruntime
! pip install faiss-cpu


In [None]:
from langchain_community.document_loaders import PyPDFLoader

# Load the PDF; PyPDFLoader returns one Document per page.
loader = PyPDFLoader("mypdf.pdf")
data = loader.load()

In [None]:
len(data)

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the pages into chunks of at most 1000 characters each.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
docs = text_splitter.split_documents(data)

print("Total number of chunks:", len(docs))
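
To see what chunk size and overlap mean, here is a simplified sliding-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraph and line breaks, while this sketch cuts at fixed character offsets. The function name `chunk_text` is hypothetical, and the default overlap of 200 is used here on the assumption that it matches the library default.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghij" * 3  # 30 characters
pieces = chunk_text(sample, chunk_size=12, chunk_overlap=4)
for piece in pieces:
    print(len(piece), piece)
```

The overlap means the end of one chunk is repeated at the start of the next, which lowers the chance that a sentence relevant to a query is cut in half at a chunk boundary.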

In [None]:
docs[7]

In [None]:
data

In [None]:
data[1]

### Get an API key: 
Head to https://ai.google.dev/gemini-api/docs/api-key to generate a Google AI API key, then store it in a `.env` file as `GOOGLE_API_KEY=<your key>` so that `load_dotenv()` can pick it up.
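
A missing key usually only shows up as an error on the first API call, so it can help to check for it up front. The helper below is a hypothetical sketch, not part of any library; it assumes the key is stored under the name `GOOGLE_API_KEY`, which is the variable the `langchain-google-genai` package looks for.

```python
import os

def require_api_key(env=os.environ, name="GOOGLE_API_KEY"):
    """Return the API key from env, or raise a clear error if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return key

# Demonstration with a plain dict instead of the real environment.
print(require_api_key({"GOOGLE_API_KEY": "demo-key"}))
```

In the notebook you would call `require_api_key()` with no arguments, right after `load_dotenv()`.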

Embedding models: https://python.langchain.com/v0.1/docs/integrations/text_embedding/


**A vector database** stores data as numerical "vectors" to enable semantic search, finding items based on meaning rather than keywords.   
**ChromaDB** is an easy-to-use, open-source vector database that simplifies storing and querying these embeddings, with built-in persistence for your data.   
**FAISS** is a high-performance library for efficient similarity search and clustering of dense vectors, widely used for building scalable vector search applications.
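
Conceptually, the simplest kind of vector index compares the query vector against every stored vector and returns the closest ones. The sketch below does this by brute force with Euclidean distance, which, as far as I know, is also the default metric of LangChain's FAISS wrapper. The class `FlatIndex`, the three-dimensional vectors, and the payload strings are all made up for illustration; real embeddings have hundreds of dimensions, and FAISS and Chroma use far more efficient data structures.

```python
import math

class FlatIndex:
    """Brute-force nearest-neighbour search over an in-memory list of vectors."""

    def __init__(self):
        self.vectors = []
        self.payloads = []

    def add(self, vector, payload):
        """Store a vector together with the text (or ID) it represents."""
        self.vectors.append(list(vector))
        self.payloads.append(payload)

    def search(self, query, k=1):
        """Return the k (distance, payload) pairs closest to the query vector."""
        scored = [(math.dist(query, v), p) for v, p in zip(self.vectors, self.payloads)]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]

index = FlatIndex()
index.add([0.9, 0.1, 0.0], "chunk about admissions")
index.add([0.1, 0.8, 0.1], "chunk about courses")
index.add([0.0, 0.2, 0.9], "chunk about campus location")

results = index.search([0.1, 0.1, 0.8], k=2)
for distance, payload in results:
    print(round(distance, 3), payload)
```

A smaller distance means a closer match, so the first result is the chunk whose vector lies nearest to the query vector.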

In [None]:
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS

from dotenv import load_dotenv

# Load GOOGLE_API_KEY (and any other settings) from the .env file.
load_dotenv()

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Embed a sample string to confirm that the API key and model work.
vector = embeddings.embed_query("hello, world!")
vector

In [None]:
vectorstore = FAISS.from_documents(documents=docs, embedding=embeddings)
# vectorstore = Chroma.from_documents(documents=docs, embedding=GoogleGenerativeAIEmbeddings(model="models/embedding-001"))

The `retriever` enables efficient semantic search by retrieving the most relevant document chunks from the vectorstore based on the user's query.

In [None]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 10})

retrieved_docs = retriever.invoke("Location of Himalayan Engineering College?")


In [None]:
len(retrieved_docs)

In [None]:
data[7]

In [None]:
print(retrieved_docs[5].page_content)

### LangChain
LangChain is an open-source framework designed to simplify the development of applications powered by large language models (LLMs). It provides tools for connecting LLMs to external data sources, enabling retrieval-augmented generation, and building advanced conversational and reasoning workflows.

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.3, max_tokens=500)

**Prompt engineering** is the practice of crafting effective inputs (prompts) to guide large language models (LLMs) toward producing accurate, relevant, and useful outputs.  
It involves experimenting with wording, structure, and context to optimize model responses for specific tasks or applications.  
Good prompt engineering can significantly improve the quality and reliability of LLM-powered solutions.

In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

In [None]:
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [None]:
response = rag_chain.invoke({"input": "who is the principal of Himalayan Engineering College?"})
print(response["answer"])
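
Besides `"answer"`, the dictionary returned by a chain built with `create_retrieval_chain` also carries the original `"input"` and the retrieved documents under `"context"`, which makes it possible to show where an answer came from. The sketch below uses a hypothetical `FakeDocument` class and a made-up response so that it runs without an API key; `summarize_sources` is an illustrative helper, not a LangChain function. It assumes the `source` and `page` metadata keys that `PyPDFLoader` sets, where `page` is a zero-based index.

```python
from dataclasses import dataclass, field

@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document, for illustration only."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def summarize_sources(response, max_chars=60):
    """List the source file, page index, and a short snippet for each retrieved chunk."""
    lines = []
    for doc in response.get("context", []):
        page = doc.metadata.get("page", "?")
        source = doc.metadata.get("source", "unknown")
        # Collapse whitespace so the snippet fits on one line.
        snippet = " ".join(doc.page_content.split())[:max_chars]
        lines.append(f"{source} (page index {page}): {snippet}")
    return lines

# Made-up response with the same shape as the chain's output.
fake_response = {
    "input": "example question",
    "context": [FakeDocument("First   retrieved chunk.", {"source": "mypdf.pdf", "page": 3})],
    "answer": "example answer",
}
sources = summarize_sources(fake_response)
print("\n".join(sources))
```

With the real chain, `summarize_sources(response)` can be called on the `response` from the cell above to check which pages the answer was drawn from.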

In [None]:
response = rag_chain.invoke({"input": "WELL TELL ME ABOUT THE COURSE OF COMPUTER ENGINEERING"})
print(response["answer"])

In [None]:
response = rag_chain.invoke({"input": "WHAT IS HCOE POPULAR FOR"})
print(response["answer"])