# AI-Powered Document QA Pipeline Demonstration

## Step 1: Install Required Libraries
Install the required libraries by running the following command.



In [11]:
!pip install langchain streamlit faiss-cpu openai langchain_openai langchain_community pypdf

Collecting pypdf
  Downloading pypdf-5.2.0-py3-none-any.whl.metadata (7.2 kB)
Downloading pypdf-5.2.0-py3-none-any.whl (298 kB)
   298.7/298.7 kB 5.2 MB/s eta 0:00:00
Installing collected packages: pypdf
Successfully installed pypdf-5.2.0


---


## Step 2: Set Up the Environment
Import the necessary libraries and set the required environment variable.

In [15]:
import os
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
# from langchain_community.chat_models import ChatOpenAI
from langchain_community.vectorstores.faiss import FAISS
from langchain_openai import ChatOpenAI


# Set your OpenAI API Key


In [9]:
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass("Enter your OpenAI API key: ")  # prompt at runtime; never hardcode the key

## Step 3: Define Helper Functions
This step defines `create_vector_store` and `query_vector_store`, the two helper functions from the `test_api.py` file.

In [5]:
def create_vector_store(file_path):
    """
    Generates a FAISS vector store for a given file.
    Args:
        file_path (str): Path to the file for which the vector store needs to be created.

    Returns:
        FAISS: A FAISS vector store object.
    """
    # Use only the file name (no directories) so the save path below stays valid
    base_name, extension = os.path.splitext(os.path.basename(file_path))
    if extension.lower() != ".pdf":
        raise ValueError("Unsupported file format. Only PDFs are supported.")

    embeddings = OpenAIEmbeddings()
    document_loader = PyPDFLoader(file_path=file_path)
    document_content = document_loader.load()

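    splitter = RecursiveCharacterTextSplitter(
        chunk_size=2000,
        chunk_overlap=100,
        # Try coarse separators first; "" must come last because it always matches
        separators=["\n\n", "\n", r"(?<=\. )", " ", ""],
        is_separator_regex=True,  # needed for the lookbehind pattern to act as a regex
    )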
    split_documents = splitter.split_documents(documents=document_content)

    vector_store = FAISS.from_documents(split_documents, embeddings)
    vector_store.save_local(f"vector_store_{base_name}")

    return vector_store


def query_vector_store(vector_store, user_query):
    """
    Executes a query on a given FAISS vector store using an OpenAI language model.
    Args:
        vector_store (FAISS): The vector store to query.
        user_query (str): The query string to process.

    Returns:
        str: The response generated by the language model.
    """
    chat_model = ChatOpenAI(temperature=0.0, verbose=True)
    retriever_pipeline = RetrievalQA.from_chain_type(
        llm=chat_model,
        chain_type="stuff",
        retriever=vector_store.as_retriever()
    )

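    # `run` is deprecated in recent LangChain releases; `invoke` returns a dict
    return retriever_pipeline.invoke({"query": user_query})["result"]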
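
The `chunk_size` and `chunk_overlap` arguments in `create_vector_store` are easier to picture with a toy example. The sketch below is a simplified fixed-window splitter, not the algorithm `RecursiveCharacterTextSplitter` uses (that one splits on separators first and then merges pieces). The function name `chunk_text` and the sample string are hypothetical, chosen only to show how consecutive chunks share characters.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Hypothetical sample input: small numbers make the overlap visible
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk. With the notebook's settings the same idea applies at a larger scale: up to 2000 characters per chunk, with roughly 100 shared between neighbours.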



---

## Step 4: Upload a PDF File
Use the following code block to upload a PDF file to the Colab environment.

In [6]:
from google.colab import files

# Upload a PDF file
uploaded = files.upload()
file_name = list(uploaded.keys())[0]

Saving Test_Finance.pdf to Test_Finance.pdf


---

## Step 5: Create a Vector Store
Process the uploaded PDF and create a FAISS vector store.


In [12]:
# Generate the vector store
try:
    vector_store = create_vector_store(file_name)
    print("Vector store created successfully!")
except Exception as e:
    print(f"Error: {e}")

Vector store created successfully!
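
Once the store exists, answering a question comes down to two ideas: find the stored chunks whose vectors lie closest to the query vector, then place those chunks into a single prompt for the language model. The sketch below illustrates both with a brute-force search over hand-made 2-D vectors. It is an assumption-laden toy, not FAISS or LangChain code: the helper names, the vectors, and the prompt wording are all hypothetical, and real embeddings have many more dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, store, k=2):
    """Return the k chunk texts whose vectors are closest to query_vec."""
    ranked = sorted(store, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


def build_stuffed_prompt(question, chunks):
    """Concatenate all retrieved chunks into one prompt (the 'stuff' idea)."""
    context = "\n\n".join(chunks)
    return (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical store: (chunk text, hand-made 2-D "embedding")
toy_store = [
    ("Revenue grew in Q2.", (1.0, 0.0)),
    ("Travel expenses fell.", (0.0, 1.0)),
    ("Employee costs rose.", (0.1, 0.9)),
]

query_vector = (0.0, 0.8)  # stands in for the embedded user question
nearest = top_k(query_vector, toy_store, k=2)
prompt = build_stuffed_prompt("How did expenses change?", nearest)

print(nearest)
# → ['Employee costs rose.', 'Travel expenses fell.']
print(prompt)
```

The revenue chunk is left out of the prompt because its vector is far from the query. This is why chunk size matters: every retrieved chunk is pasted into the prompt in full, so large chunks use up the model's context window quickly.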


---

## Step 6: Query the Vector Store
Ask questions about the uploaded document and retrieve answers.

In [16]:
# Input your query
user_query = input("Enter your question: ")
# Fetch the response
try:
    response = query_vector_store(vector_store, user_query)
    print("Response:", response)
except Exception as e:
    print(f"Error: {e}")

Enter your question: What are the total expenses for Q2 2023?
Response: The total expenses for Q2 2023 are as follows:

- Employee benefit expenses: ₹20,311 crore
- Cost of technical sub-contractors: ₹3,116 crore
- Travel expenses: ₹426 crore
- Cost of software packages and others: ₹2,886 crore
- Communication expenses: ₹171 crore
- Consultancy and professional charges: ₹387 crore
- Depreciation and amortization expenses: ₹1,121 crore

Adding these up gives a total expense of ₹28,318 crore for Q2 2023.
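
Language models can slip on arithmetic, so it is worth re-adding the line items quoted in the response. The check below uses only the figures printed above; whether those items cover every expense category in the PDF is not verified here.

```python
# Line items exactly as quoted in the model's response (values in ₹ crore)
line_items_crore = {
    "Employee benefit expenses": 20_311,
    "Cost of technical sub-contractors": 3_116,
    "Travel expenses": 426,
    "Cost of software packages and others": 2_886,
    "Communication expenses": 171,
    "Consultancy and professional charges": 387,
    "Depreciation and amortization expenses": 1_121,
}

total = sum(line_items_crore.values())
print(f"Sum of listed items: ₹{total:,} crore")
# → Sum of listed items: ₹28,418 crore
```

The listed items sum to ₹28,418 crore, which is ₹100 crore more than the total stated in the response. Treat totals computed by the model as something to verify against the source document.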


---

## Conclusion
This Colab notebook demonstrates the full QA pipeline built on the functionality of `chatbot_app.py` and `test_api.py`. You can upload a PDF, create a FAISS vector store from it, and query that store for relevant answers generated by OpenAI's language models.
