<a href="https://colab.research.google.com/github/Sayandip2023/LLM-RAG-Agent-Projects-2025-/blob/main/Q%26A_Agent_alongwith_Document_Context_With_Gemini.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Installing the Necessary Libraries

In [2]:
!pip install pdfplumber python-docx google-generativeai google-search-results

Collecting pdfplumber
  Downloading pdfplumber-0.11.5-py3-none-any.whl.metadata (42 kB)
Collecting python-docx
  Downloading python_docx-1.1.2-py3-none-any.whl.metadata (2.0 kB)
Collecting google-search-results
  Downloading google_search_results-2.4.2.tar.gz (18 kB)
  Preparing metadata (setup.py) ... done
Collecting pdfminer.six==20231228 (from pdfplumber)
  Downloading pdfminer.six-20231228-py3-none-any.whl.metadata (4.2 kB)
Collecting pypdfium2>=4.18.0 (from pdfplumber)
  Downloading pypdfium2-4.30.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (48 kB)
Downloading pdfplumber-0.11.5-py3-none-any.wh

## Importing the Libraries

In [23]:
import os
import google.generativeai as genai
from google.colab import files
import pdfplumber
import docx

## API Key Configuration

In [24]:
os.environ["GOOGLE_GENERATIVE_AI_API_KEY"] = ""
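The cell above assigns an empty string, so a real key has to be pasted into the notebook before it runs. As a minimal sketch of one alternative, assuming you would rather be prompted than hardcode the key, the helper below reads the same environment variable and falls back to `getpass.getpass`. The name `resolve_api_key` and its `env`/`ask` parameters are hypothetical and not part of the original notebook.

```python
import getpass
import os

ENV_VAR = "GOOGLE_GENERATIVE_AI_API_KEY"  # same variable name the notebook uses


def resolve_api_key(env=None, ask=getpass.getpass):
    """Return the API key from the environment, prompting only if it is missing.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        # getpass hides the typed value, so the key is not echoed into cell output
        key = ask(f"Enter {ENV_VAR}: ").strip()
    if not key:
        raise ValueError(f"{ENV_VAR} is empty; an API key is required")
    return key
```

Under these assumptions, the configuration cell could call `resolve_api_key()` and store the result in `os.environ[ENV_VAR]`, which keeps the key out of the saved notebook.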

## Model Initialization

In [25]:
genai.configure(api_key=os.getenv("GOOGLE_GENERATIVE_AI_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-flash")

## Helper Functions

In [26]:
def extract_text(file_name):
    """Extract text from a document based on its file type."""
    lower_name = file_name.lower()  # match extensions case-insensitively (e.g. ".PDF")
    try:
        if lower_name.endswith(".pdf"):
            with pdfplumber.open(file_name) as pdf:
                # Extract each page once; pages with no extractable text are skipped
                page_texts = [page.extract_text() for page in pdf.pages]
                return "\n".join(text for text in page_texts if text)
        elif lower_name.endswith(".docx"):
            doc = docx.Document(file_name)
            return "\n".join([para.text for para in doc.paragraphs])
        elif lower_name.endswith(".txt"):
            with open(file_name, "r", encoding="utf-8") as f:
                return f.read()
        else:
            return "⚠️ Unsupported file format."
    except Exception as e:
        return f"⚠️ Error extracting text: {e}"


def ask_gemini(question, document_text, max_context_chars=5000):
    """Answer a question using Gemini, incorporating document context."""
    if not document_text.strip():
        return "⚠️ No text extracted from the document."

    context = document_text[:max_context_chars]  # Keep only the first max_context_chars characters; the rest is dropped

    prompt = f"""You are an expert AI assistant. Use ONLY the following information to answer the user's question.
    If the information isn't in the provided context, say you don't know. Be concise.

    --- Context ---
    {context}
    --- End Context ---

    Question: {question}
    Answer:"""

    try:
        response = model.generate_content(prompt)
        return response.text
    except Exception as e:
        return f"⚠️ Error generating response: {e}"
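To see what `ask_gemini` sends to the model, the prompt construction can be separated from the API call. The following is a minimal sketch, assuming a hypothetical helper `build_prompt` (not in the original notebook) that mirrors the slicing and template above. It shows the effect of `max_context_chars` without needing an API key.

```python
def build_prompt(question, document_text, max_context_chars=5000):
    """Build the grounded prompt; hypothetical helper mirroring ask_gemini."""
    # Plain slicing keeps the first max_context_chars characters and drops the rest
    context = document_text[:max_context_chars]
    return (
        "You are an expert AI assistant. Use ONLY the following information "
        "to answer the user's question.\n"
        "If the information isn't in the provided context, say you don't know. "
        "Be concise.\n\n"
        "--- Context ---\n"
        f"{context}\n"
        "--- End Context ---\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# A fact placed beyond the cutoff never reaches the model
document = "A" * 5000 + " The launch year was 2019."
prompt = build_prompt("What was the launch year?", document)
print("2019" in prompt)  # False: the sentence sits after character 5000
```

With the default limit, anything past the first 5,000 characters of a long document is invisible to the model, so questions about later sections would be expected to come back as "don't know". Raising `max_context_chars` is the simplest adjustment; how far it can go depends on the model's context window.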

## Main Program Flow

In [28]:
def main():
    """Main function to orchestrate the Q&A agent."""

    print("📂 Please upload a document (PDF, DOCX, or TXT)")
    uploaded = files.upload()

    if not uploaded:
        print("⚠️ No file uploaded. Exiting.")
        return

    file_name = list(uploaded.keys())[0]
    print(f"⏳ File '{file_name}' Uploaded!")

    document_text = extract_text(file_name)
    if document_text.startswith("⚠️"):  # Error messages from extract_text start with ⚠️; a document that merely contains the symbol is not an error
        print(document_text)
        return

    print("✅ Text extracted!")

    while True:
        user_question = input("\n🔍 Ask a question about the document (or type 'quit' to exit):\n⬇️ ").strip()

        if user_question.lower() == "quit":
            print("👋 Goodbye!")
            break

        answer = ask_gemini(user_question, document_text)
        print("\n🤖 AI Response:\n", answer)

if __name__ == "__main__":
    main()

📂 Please upload a document (PDF, DOCX, or TXT)


Saving Sayandip_Bhattacharyya_Resume.pdf to Sayandip_Bhattacharyya_Resume (7).pdf
⏳ File 'Sayandip_Bhattacharyya_Resume (7).pdf' Uploaded!
✅ Text extracted!

🔍 Ask a question about the document (or type 'quit' to exit):
⬇️ List 5 skills from Sayandip's resume.

🤖 AI Response:
 Python, TensorFlow, PyTorch, NLP, Computer Vision


🔍 Ask a question about the document (or type 'quit' to exit):
⬇️ When did Sayandip pass Class 10?

🤖 AI Response:
 2019


🔍 Ask a question about the document (or type 'quit' to exit):
⬇️ What was the percentage marks obtained by Sayandip in Class 12 exams?

🤖 AI Response:
 92%


🔍 Ask a question about the document (or type 'quit' to exit):
⬇️ Write an introduction for Sayandip based on his resume.

🤖 AI Response:
 Sayandip Bhattacharyya is a highly accomplished and award-winning computer science student (B.Tech expected 2025) with expertise in deep learning, natural language processing, and computer vision.  His experience includes internships at HEVA AI and NIT