**RAG - Retrieval-Augmented Generation with GPT:**

- RAG is a technique in which a language model (such as GPT) is enhanced by retrieving relevant context from a knowledge base before it generates a response.
  - Retrieve: Fetch relevant documents or data chunks from a knowledge source based on the user query.
  - Augment: Feed the retrieved chunks to the language model as additional context.
  - Generate: The language model (e.g., GPT) generates an answer, drawing on both the retrieved data and its own knowledge.
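
The three steps can be sketched without any external library. This is a toy illustration only: the word-overlap scoring, the `knowledge_base` sentences, and the `generate` stub are assumptions made for the example, standing in for a real embedding search and a real LLM call.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, chunks, k=2):
    """Retrieve: rank chunks by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(query_words & tokenize(c)), reverse=True)
    return ranked[:k]

def augment(query, retrieved):
    """Augment: place the retrieved chunks in the prompt as context."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def generate(prompt):
    """Generate: hypothetical stand-in for a real LLM call."""
    return f"[model answer based on a {len(prompt)}-character prompt]"

# Toy knowledge base (illustrative sentences, not the real manual).
knowledge_base = [
    "Warranty claims require a valid proof of purchase.",
    "The head office is in Bengaluru.",
    "Support is available by phone and email.",
]

question = "What is needed for a warranty claim?"
top_chunks = retrieve(question, knowledge_base, k=1)
prompt = augment(question, top_chunks)
print(prompt)
print(generate(prompt))
```

The rest of this notebook follows the same shape, with FAISS doing the retrieval and a hosted chat model doing the generation.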

In [None]:
!pip install PyPDF2



In [None]:
!pip install -U langchain-community



In [None]:
!pip install faiss-cpu



In [None]:
!pip install langchain_google_genai


In [None]:
import PyPDF2
# Embeddings, vector store, and the OpenAI chat wrapper come from langchain-community (installed above).
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.chat_models import ChatOpenAI  # only needed for the commented-out OpenAI option
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains import RetrievalQA

**1. Load text from PDF**

In [None]:
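def extract_data_from_pdf(pdf_path):
    """Return the text of every page in the PDF, joined with newlines."""
    with open(pdf_path, 'rb') as file:
        pdfreader = PyPDF2.PdfReader(file)
        # Pages without extractable text (e.g. scanned images) contribute an empty string.
        page_texts = [page.extract_text() or '' for page in pdfreader.pages]
    return '\n'.join(page_texts)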

**2. Split text into chunks**

In [None]:
def split_text(text):
  # Chunks of at most 300 characters; neighboring chunks may share up to 100 characters.
  splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=100)
  docs = splitter.create_documents([text])
  return docs
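
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-width sliding window. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and sentences); the helper name `sliding_window_chunks` is hypothetical and only illustrates how overlap repeats the tail of one chunk at the head of the next.

```python
def sliding_window_chunks(text, chunk_size=300, chunk_overlap=100):
    """Cut text into fixed-width chunks where consecutive chunks overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

sample = "".join(str(i % 10) for i in range(700))  # 700 characters of dummy text
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])            # → [300, 300, 300]
print(chunks[0][-100:] == chunks[1][:100])  # → True
```

The overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of storing some text twice.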

**3. Create vector store**

In [None]:
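def create_vector_store(docs):
  # OpenAIEmbeddings reads the key from the OPENAI_API_KEY environment variable.
  embeddings = OpenAIEmbeddings()
  # Embed every chunk and index the vectors in an in-memory FAISS store.
  vectorstore = FAISS.from_documents(docs, embeddings)
  return vectorstore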
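
Conceptually, a vector store keeps one embedding vector per chunk and, at query time, returns the chunks whose vectors are closest to the query's vector. The sketch below shows that idea in plain Python. The `TinyVectorIndex` class, the 3-dimensional vectors, and the choice of cosine similarity are all assumptions for illustration; a FAISS index may use a different distance metric and is far more efficient.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class TinyVectorIndex:
    """Hypothetical in-memory index: stores (text, vector) pairs and scans them all on search."""

    def __init__(self):
        self.items = []

    def add(self, text, vector):
        self.items.append((text, vector))

    def search(self, query_vector, k=2):
        scored = [(cosine_similarity(query_vector, vec), text) for text, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
        return [text for _, text in scored[:k]]

index = TinyVectorIndex()
# Made-up 3-dimensional "embeddings"; real embeddings have hundreds of dimensions or more.
index.add("warranty policy", [0.9, 0.1, 0.0])
index.add("office locations", [0.0, 0.2, 0.9])
index.add("refund process", [0.7, 0.3, 0.1])

results = index.search([1.0, 0.0, 0.0], k=2)
print(results)  # → ['warranty policy', 'refund process']
```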

**4. Setup RAG QA pipeline**

In [None]:
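def setup_rag_qa(vectorstore):
  # The retriever returns the chunks most similar to the query.
  retriever = vectorstore.as_retriever(search_type='similarity')
  # Alternative generator: llm = ChatOpenAI(model="gpt-4.1-nano")
  llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
  # RetrievalQA passes the retrieved chunks to the LLM as context for the answer.
  rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
  return rag_chain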

**Complete Workflow**

In [None]:
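import os
from google.colab import userdata

# Read the API keys from Colab secrets and expose them as environment variables.
os.environ['OPENAI_API_KEY'] = userdata.get('o_key')  # used by OpenAIEmbeddings
os.environ['GOOGLE_API_KEY'] = userdata.get('GOOGLE_API_KEY')  # used by ChatGoogleGenerativeAI
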

pdf_path = '/content/company_manual.pdf'

text = extract_data_from_pdf(pdf_path)
docs = split_text(text)
vectorstore = create_vector_store(docs)
qa_chain = setup_rag_qa(vectorstore)

In [None]:
# Ask a question
query = "Tell me about the company and the product. The output is expected in bullet form. Also give a brief summary of the warranty."
result = qa_chain.invoke({"query": query})

print(result['result'])

Here is the information you requested:

*   **About the Company:**
    *   TechNova Solutions Pvt. Ltd. is a global leader in consumer electronics.
    *   They are committed to delivering innovative technology and excellent customer service.
    *   Their core values include innovation, environmental responsibility, and customer satisfaction.
    *   Headquartered in Bengaluru, India, with major regional offices in Singapore, Germany, and the United States.

*   **About the Product:**
    *   The provided text refers to "consumer electronics" in general and does not specify a particular product.

*   **Brief of Warranty:**
    *   To make a warranty claim, users must register the product on the company's website within 15 days of purchase.
    *   A valid proof of purchase is required during the claim process.
    *   Warranty services include repair, replacement, or refund, at the company's discretion.


In [None]:
# Ask a question
query = "How to reach customer support, output format - json"
result = qa_chain.invoke({"query": query})

print(result['result'])

```json
{
  "customer_support_channels": [
    {
      "type": "Phone",
      "number": "+91-9999999999",
      "hours": "Mon-Fri, 9 AM to 6 PM IST"
    },
    {
      "type": "Email",
      "address": "support@technova.com",
      "response_time": "within 24 hours"
    },
    {
      "type": "Live Chat",
      "availability": "Available on our website and mobile app"
    },
    {
      "type": "Help Center",
      "url": "https://support.technova.com"
    },
    {
      "type": "Video Call Support",
      "note": "for troubleshooting smart home devices"
    },
    {
      "type": "Remote Diagnostics",
      "note": "for troubleshooting smart home devices"
    }
  ],
  "general_response_aim": "resolve most customer queries within 48 hours"
}
```
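
As the output above shows, the model wrapped its JSON in a Markdown code fence, so passing the raw string straight to `json.loads` would fail. A small helper along the following lines could strip the fence first. The name `parse_json_reply` and the sample string are hypothetical, and the sketch assumes the reply contains at most one fenced block.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built programmatically so this example stays readable

def parse_json_reply(reply):
    """Strip an optional Markdown code fence and parse the JSON inside."""
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, reply, flags=re.DOTALL)
    payload = match.group(1) if match else reply.strip()
    return json.loads(payload)

# A shortened stand-in for the model reply shown above.
sample_reply = FENCE + 'json\n{"type": "Email", "address": "support@technova.com"}\n' + FENCE
parsed = parse_json_reply(sample_reply)
print(parsed["address"])  # → support@technova.com
```

In the notebook this would be called as `parse_json_reply(result['result'])`. `json.loads` raises `json.JSONDecodeError` when the model returns malformed JSON, so callers may want to catch that.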
