In [14]:
# Requirements
!pip install faiss-cpu
!pip install langchain-community
!pip install sentence-transformers
!pip install pypdf google-generativeai langchain-text-splitters



In [15]:
import os
import google.generativeai as genai
from langchain_community.vectorstores import FAISS  # vector database
from langchain_community.embeddings import HuggingFaceEmbeddings  # sentence embeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter  # chunking
from pypdf import PdfReader  # PDF text extraction

## Step 1: Configure the Models

In [16]:
# Configure LLM
key = os.getenv('GOOGLE_API_KEY')
genai.configure(api_key=key)
llm_model = genai.GenerativeModel('gemini-2.5-flash-lite')
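`os.getenv` returns `None` when the variable is unset, and the problem then surfaces later as an authentication error from the API call. A minimal sketch of failing fast instead, assuming the key is supplied through the `GOOGLE_API_KEY` environment variable as above; `require_env` is a hypothetical helper, not part of any library:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper: it only reads os.environ and raises early,
    before any API client is configured with a missing key.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set or is empty")
    return value


# Usage in the cell above (sketch):
# key = require_env('GOOGLE_API_KEY')
# genai.configure(api_key=key)
```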

In [17]:
# Configure Embedding Model
embedding_model = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')

## Step 2: Load the PDF File

In [19]:
loaded_file = PdfReader('RAG_CHATBOT.pdf')

In [20]:
raw_text = ''
for page in loaded_file.pages:
    text_only = page.extract_text()
    if text_only:
        raw_text += text_only
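`page.extract_text()` can come back empty for pages with no extractable text (for example, scanned images), which is why the loop checks `text_only` before appending. Appending pages back to back also fuses the last word of one page with the first word of the next. A minimal sketch of joining page texts with a newline separator; `join_pages` is a hypothetical helper, and the list of strings stands in for the values returned by `extract_text()`:

```python
from typing import Iterable, Optional


def join_pages(page_texts: Iterable[Optional[str]], separator: str = "\n") -> str:
    """Join per-page text, skipping pages that yielded no text.

    The separator keeps words at page boundaries apart.
    """
    return separator.join(text for text in page_texts if text)


pages = ["Case Study: RAG Chatbot", "", None, "Problem Statement"]
print(join_pages(pages))
# Case Study: RAG Chatbot
# Problem Statement
```

In the notebook this would replace the loop with `raw_text = join_pages(page.extract_text() for page in loaded_file.pages)`.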

In [21]:
print(raw_text)

Case Study: RAG Chatbot Powered by Google 
Gemini for Smart Document Q&A 
Project Title: Intelligent Document Q&A Assistant using Retrieval-Augmented Generation 
(RAG) with Gemini 
GitHub Repository: https://github.com/mukul-mschauhan/RAG-Using-Gemini 
Live Demo: https://gemini-rag2025.streamlit.app/ 
 
Problem Statement 
Across industries such as legal, finance, healthcare, and construction, professionals are 
required to extract insights from massive document repositories—contracts, product 
manuals, policies, reports, regulations, and emails. 
Traditional keyword-based search and static FAQs fail to deliver contextual, accurate 
answers. Employees waste hours scanning PDFs and notes, leading to operational 
inefficiencies, poor decision-making, and knowledge silos. 
There’s a critical need for an intelligent assistant that can understand natural language 
questions, reason over domain-specific documents, and deliver precise responses—
instantly. 
 
Business Objective 
To build an en

## Step 3: Chunking (Create Chunks)

In [22]:
splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_text(raw_text)

In [23]:
len(chunks)

16

In [24]:
print(chunks[0])

Case Study: RAG Chatbot Powered by Google 
Gemini for Smart Document Q&A 
Project Title: Intelligent Document Q&A Assistant using Retrieval-Augmented Generation 
(RAG) with Gemini 
GitHub Repository: https://github.com/mukul-mschauhan/RAG-Using-Gemini
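`chunk_size=300` caps each chunk at 300 characters (the splitter's default `length_function` is `len`), and `chunk_overlap=50` lets consecutive chunks share up to 50 characters, so text cut at a boundary still appears intact in a neighbouring chunk. `RecursiveCharacterTextSplitter` also tries to split on separators (paragraph breaks, then line breaks, then spaces) before cutting inside a word. The sketch below is a deliberately simpler fixed-window splitter, not LangChain's algorithm, that only shows how size and overlap interact:

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration: unlike RecursiveCharacterTextSplitter,
    it ignores separators and may cut in the middle of a word.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

With `chunk_size=300, chunk_overlap=50`, each window in this sketch starts 250 characters after the previous one.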


## Step 4: Create FAISS Vector Store

In [25]:
vector_store = FAISS.from_texts(chunks, embedding_model)
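`FAISS.from_texts` embeds every chunk with `embedding_model` and adds the vectors to a FAISS index; by default LangChain's wrapper builds an exact, brute-force index that ranks by Euclidean (L2) distance, and the retriever in the next step asks it for the `k=3` nearest chunks. The sketch below reproduces that idea in plain Python so the mechanics are visible without FAISS: store vectors, then return the `k` nearest by squared L2 distance. `BruteForceL2Index` is a hypothetical class and the 2-D vectors are made up; real `all-MiniLM-L6-v2` embeddings have 384 dimensions.

```python
import heapq
from typing import Sequence


class BruteForceL2Index:
    """Exact k-nearest-neighbour search by squared Euclidean (L2) distance.

    Plain-Python illustration of the idea behind faiss.IndexFlatL2;
    it scans every stored vector on each query.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: list[tuple[float, ...]] = []

    def add(self, vectors: Sequence[Sequence[float]]) -> None:
        """Append vectors; their position becomes their id."""
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError(f"expected {self.dim} dimensions, got {len(v)}")
            self.vectors.append(tuple(float(x) for x in v))

    def search(self, query: Sequence[float], k: int) -> list[tuple[int, float]]:
        """Return up to k (id, squared L2 distance) pairs, closest first."""
        if len(query) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(query)}")
        scored = (
            (sum((q - x) ** 2 for q, x in zip(query, v)), i)
            for i, v in enumerate(self.vectors)
        )
        # Ties on distance are broken by the lower id (insertion order).
        return [(i, d) for d, i in heapq.nsmallest(k, scored)]


index = BruteForceL2Index(dim=2)
index.add([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 2.0]])
print(index.search([1.0, 1.0], k=3))
# [(1, 1.0), (0, 2.0), (3, 2.0)]
```

Exact search costs one distance computation per stored chunk, which is negligible for the 16 chunks here; FAISS also offers approximate index types for much larger collections.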

## Step 5: Configure Retriever

In [26]:
retriever = vector_store.as_retriever(search_kwargs={'k': 3})  # return the 3 most similar chunks

## Step 6: Take the Query

In [27]:
query = 'Show me the steps to proceed with this project.'

## Step 7: Retrieval (R)

In [28]:
retrieved_documents = retriever.invoke(query)  # list of the k most similar Document objects

In [29]:
context = ' '.join(doc.page_content for doc in retrieved_documents)
context

'Live Demo: https://gemini-rag2025.streamlit.app/ \n \nProblem Statement \nAcross industries such as legal, finance, healthcare, and construction, professionals are \nrequired to extract insights from massive document repositories—contracts, product \nmanuals, policies, reports, regulations, and emails. 4. Ask questions in natural language \n5. Get contextual answers generated by Google Gemini 1.5 Flash using the retrieved \ndocuments \n \nArchitecture Overview \n1. Frontend: Streamlit web UI for uploading files and chat interface Case Study: RAG Chatbot Powered by Google \nGemini for Smart Document Q&A \nProject Title: Intelligent Document Q&A Assistant using Retrieval-Augmented Generation \n(RAG) with Gemini \nGitHub Repository: https://github.com/mukul-mschauhan/RAG-Using-Gemini'
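Joining the retrieved chunks with a single space runs them together, as the output above shows ("...and emails. 4. Ask questions in natural language", "...chat interface Case Study: ..."). A minimal sketch of assembling the context with a visible separator and an optional character budget; `build_context`, the separator string, and `max_chars` are illustrative choices, not LangChain features. It takes plain strings, so in the notebook it would be called with the `page_content` of each retrieved document.

```python
from typing import Optional


def build_context(chunks: list[str], separator: str = "\n\n---\n\n",
                  max_chars: Optional[int] = None) -> str:
    """Join retrieved chunks with a visible separator.

    Chunks are kept in the order given. Empty chunks are skipped. If
    max_chars is set, whole chunks are added until the next one
    (plus its separator) would exceed the budget.
    """
    parts: list[str] = []
    used = 0
    for chunk in chunks:
        text = chunk.strip()
        if not text:
            continue
        extra = len(text) + (len(separator) if parts else 0)
        if max_chars is not None and used + extra > max_chars:
            break
        parts.append(text)
        used += extra
    return separator.join(parts)


print(build_context(["First chunk. ", "", "Second chunk."], separator=" | "))
# First chunk. | Second chunk.
```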

## Step 8: Write an Augmentation Prompt (A)

In [30]:
prompt = f'''You are a helpful assistant using RAG.
Here is the context:
{context}

The query asked by the user is: {query}'''
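A reusable version of this prompt can label each section explicitly and tell the model to answer only from the retrieved context, and to say so when the context does not contain the answer. A minimal sketch; `build_rag_prompt` is a hypothetical helper and the instruction wording is an assumption to adapt, not a prescribed Gemini prompt format:

```python
def build_rag_prompt(context: str, query: str) -> str:
    """Assemble a grounded RAG prompt from retrieved context and a user query.

    Hypothetical helper; the instruction wording is illustrative.
    """
    return (
        "You are a helpful assistant answering questions about a document.\n"
        "Use only the information in the context below. If the context does not "
        "contain the answer, say that you could not find it in the document.\n\n"
        f"Context:\n{context.strip()}\n\n"
        f"Question: {query.strip()}\n"
        "Answer:"
    )


print(build_rag_prompt("Frontend: Streamlit web UI.", "Which UI framework is used?"))
```

In the notebook it would be used as `llm_model.generate_content(build_rag_prompt(context, query))`. Keeping the template in one function makes it easier to adjust the grounding instruction without touching the retrieval code.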

## Step 9: Generation (G)

In [31]:
print(llm_model.generate_content(prompt).text)

Here are the steps to proceed with the Intelligent Document Q&A Assistant project, based on the provided information:

**1. Project Setup and Environment:**

*   **Clone the Repository:** Start by cloning the GitHub repository: `https://github.com/mukul-mschauhan/RAG-Using-Gemini` to your local machine.
*   **Set up a Python Environment:** It's highly recommended to use a virtual environment (e.g., `venv` or `conda`) to manage project dependencies.
*   **Install Dependencies:** Install all necessary Python packages listed in the project's `requirements.txt` file. You can usually do this with `pip install -r requirements.txt`.

**2. Understanding the Core Components:**

*   **Frontend (Streamlit):**
    *   Familiarize yourself with the Streamlit code. This will handle:
        *   User interface for file uploads (documents).
        *   The chat interface for asking questions.
        *   Displaying the generated answers.
*   **Backend (RAG - Retrieval-Augmented Generation):**
    *   