### Persistent Vector Storage + Better QA (Question-Answer) Handling

In [1]:
import os
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_groq import ChatGroq
from langchain.chains import ConversationalRetrievalChain
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings  # Not used below; Step 3 uses HuggingFace embeddings instead

# Load environment variables
load_dotenv()

True

### Step 1 : Ingesting PDFs from the folder

In [3]:
# Folder where PDFs are stored
pdf_folder = "policies"

documents = []
loaded_files = []

for file in os.listdir(pdf_folder):
    if file.endswith(".pdf"):
        file_path = os.path.join(pdf_folder, file)
        loader = PyPDFLoader(file_path)
        docs = loader.load()
        for d in docs:
            # Attach metadata (source filename)
            d.metadata["source"] = file
        documents.extend(docs)
        loaded_files.append(file)

print(f"✅ Loaded {len(documents)} total pages/chunks from {len(loaded_files)} PDF files:")
for f in loaded_files:
    print(f" - {f}")

✅ Loaded 15 total pages/chunks from 3 PDF files:
 - Leave-Policy.pdf
 - posh-policy.pdf
 - Salary_Policy.pdf
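
The loop above relies on `os.listdir`, which returns entries in arbitrary order, and on `endswith(".pdf")`, which skips files named `.PDF`. A minimal sketch of a more predictable listing step is shown here; `list_pdfs` is a hypothetical helper name, not part of the original notebook or of LangChain.

```python
from pathlib import Path


def list_pdfs(folder: str) -> list[Path]:
    """Return the PDF files in `folder`, sorted by name, matching the extension case-insensitively."""
    candidates = (
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    return sorted(candidates, key=lambda p: p.name.lower())


# Usage with the notebook's folder name (skipped if the folder is absent).
if Path("policies").is_dir():
    for pdf_path in list_pdfs("policies"):
        print(f" - {pdf_path.name}")
```

Each returned path could then be passed to `PyPDFLoader(str(pdf_path))` in place of the `os.path.join` call.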


### Step 2 : Splitting Chunks

In [4]:
# Split into smaller chunks (for token efficiency)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

print(f"✅ Created {len(chunks)} chunks")

✅ Created 37 chunks
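
To make the role of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sliding-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter prefers to cut on separators such as paragraph breaks, newlines, and spaces, and treats `chunk_size` as an upper bound, so its chunks are usually shorter. `sliding_chunks` is a hypothetical name used only for illustration.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut `text` into fixed-size windows, each starting `chunk_size - chunk_overlap` characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters
pieces = sliding_chunks(sample, chunk_size=12, chunk_overlap=4)
for piece in pieces:
    print(len(piece), repr(piece))
```

The last 4 characters of each window reappear at the start of the next one, which is what lets a sentence that straddles a boundary remain retrievable from at least one chunk.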


### Step 3 : Embedding and Vector Storage (local ChromaDB)

In [7]:
%pip install sentence-transformers

Collecting tokenizers<0.22,>=0.21 (from transformers<5.0.0,>=4.41.0->sentence-transformers)
  Using cached tokenizers-0.21.4-cp39-abi3-win_amd64.whl.metadata (6.9 kB)
Using cached tokenizers-0.21.4-cp39-abi3-win_amd64.whl (2.5 MB)
Installing collected packages: tokenizers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.20.3
    Uninstalling tokenizers-0.20.3:
      Successfully uninstalled tokenizers-0.20.3
Successfully installed tokenizers-0.21.4
Note: you may need to restart the kernel to use updated packages.


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
chromadb 0.5.23 requires tokenizers<=0.20.3,>=0.13.2, but you have tokenizers 0.21.4 which is incompatible.


In [9]:
# Free alternative embeddings using SentenceTransformers
from langchain_community.embeddings import HuggingFaceEmbeddings

# Create embeddings using a small, fast model
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Build Chroma vectorstore
vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
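    persist_directory="./chroma_store"  # Created on disk if it does not already exist
)

# chromadb >= 0.4 writes to disk automatically and deprecates this call; it is only required on older versions
vectordb.persist()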
print("✅ Vectorstore created & persisted to disk (using HuggingFace embeddings)")

✅ Vectorstore created & persisted to disk (using HuggingFace embeddings)
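
The point of persisting the store is to avoid re-embedding the PDFs in every session. The sketch below shows one way to reopen the directory written above, assuming the same `langchain_community` `Chroma` wrapper and the same embedding model; `store_exists` and `load_vectorstore` are hypothetical helper names. Note that calling `Chroma.from_documents` again on the same directory should add the chunks a second time rather than replace them, so checking first is worthwhile.

```python
from pathlib import Path

PERSIST_DIR = "./chroma_store"  # Same directory as in the cell above


def store_exists(persist_dir: str) -> bool:
    """Return True if `persist_dir` is a directory that contains at least one entry."""
    path = Path(persist_dir)
    return path.is_dir() and any(path.iterdir())


def load_vectorstore(persist_dir: str, embedding_model):
    """Reopen a persisted Chroma store. The embedding model must match the one used to build it."""
    # Imported lazily so store_exists() can be used without chromadb installed.
    from langchain_community.vectorstores import Chroma

    return Chroma(persist_directory=persist_dir, embedding_function=embedding_model)


if store_exists(PERSIST_DIR):
    print(f"Found an existing store at {PERSIST_DIR}; reopen it with load_vectorstore().")
else:
    print(f"No store at {PERSIST_DIR}; build it with Chroma.from_documents() first.")
```

In a later session, `vectordb = load_vectorstore(PERSIST_DIR, embedding_model)` would stand in for the `Chroma.from_documents(...)` call, with `embedding_model` created exactly as in the cell above.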


### Step 4 : Loading the LLM

In [10]:
groq_api_key = os.getenv("GROQ_API_KEY")

llm = ChatGroq(
    model="llama-3.1-8b-instant", 
    api_key=groq_api_key,
    temperature=0
)

### Step 5 : Build Conversational Retrieval Chain

- The retriever fetches the 2 most relevant chunks for each query (`k=2`).
- The conversational chain supports multi-turn Q&A by passing earlier turns along with each new question.
- `chat_history` stores the previous turns as `(question, answer)` tuples.

In [11]:
retriever = vectordb.as_retriever(search_kwargs={"k": 2})

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

chat_history = []
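
As a rough illustration of what "top-2 most relevant chunks" means, the sketch below ranks a few hand-made vectors by cosine similarity to a query vector. This is an assumption-laden toy: the vectors and chunk labels are invented, and a real vector store such as Chroma uses an index and its own configured distance metric rather than this brute-force scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the labels of the k vectors most similar to the query, best first."""
    ranked = sorted(doc_vecs.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [label for label, _ in ranked[:k]]


# Invented 3-dimensional "embeddings" for illustration only.
toy_vectors = {
    "leave chunk": [0.9, 0.1, 0.0],
    "salary chunk": [0.1, 0.9, 0.0],
    "posh chunk": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.3, 0.0]
print(top_k(query_vec, toy_vectors, k=2))
```

With `k=2` only two chunks reach the LLM, so an answer that depends on a third chunk cannot be produced; raising `k` trades prompt length for recall.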

### Step 6 : Interactive QnA loop

In [12]:
while True:
    query = input("\nAsk a question (or type 'exit' to quit): ").strip()
    if query.lower() in ("exit", "quit"):
        break

    try:
        result = qa_chain.invoke({"question": query, "chat_history": chat_history})
        answer = result["answer"]
        sources = result["source_documents"]

        print("\nAnswer:")
        print(answer)

        print("\nSources:")
        for src in sources:
            print(f"- {src.metadata.get('source')}")

        # Update chat history
        chat_history.append((query, answer))

    except Exception as e:
        print(f"⚠️ Error: {e}")


Ask a question (or type 'exit' to quit):  What are POSH Policy guidelines



Answer:
Based on the provided context, the POSH Policy guidelines are as follows:

1. **Purpose**: To create and maintain a safe work environment, free from sexual harassment and discrimination for all employees.
2. **Scope**: BBIL SYSTEMS aims to adopt a zero-tolerance attitude against any kind of sexual harassment or discrimination caused by any employee towards any other person, including employees, clients, vendors, and contractors, in company premises or elsewhere in India or abroad.
3. **Applicability**: The policy applies to all employees of BBIL SYSTEMS, including those hired on a permanent, temporary, contracted, or part-time basis, directly or indirectly, or through a vendor organization.
4. **Definition of Employee**: An employee of BBIL SYSTEMS includes anyone carrying out work on behalf of the company, regardless of their employment status or basis.

These guidelines are based on the "The Sexual harassment of women at workplace (prevention, prohibition & redressal) Act, 2


Ask a question (or type 'exit' to quit):  How many Earned Leaves do I get in one year



Answer:
Unfortunately, the given context does not specify the number of Earned Leaves you get in one year. It only mentions that if you are unable to use all of your accrued Earned Leave during a calendar year, you may elect to carry forward any accrued but unused Earned leave into the next calendar year, subject to the maximum leave of 45 days.

Sources:
- Leave-Policy.pdf
- Leave-Policy.pdf



Ask a question (or type 'exit' to quit):  What is casual leave



Answer:
Casual leave is a type of leave that an employee can take for personal or miscellaneous reasons, such as a family event, a personal appointment, or a sudden need to attend to a personal matter. It is usually taken on short notice and is not related to illness or injury.

Sources:
- Leave-Policy.pdf
- Leave-Policy.pdf



Ask a question (or type 'exit' to quit):  What do you mean by Scope and purpose



Answer:
Based on the provided context, it appears that the Scope and purpose are related to the disclosure of information about IEnova and its affiliates, subsidiaries, and business relationships.

The Scope seems to refer to the categories of entities that are included in the disclosure, such as:

* Individuals mentioned in conditions a) to c)
* Partners or co-owners of these individuals
* Companies that are part of a business group or consortium to which IEnova belongs
* Companies over which IEnova has control or significant influence

The Purpose seems to be to provide transparency and disclosure about the business relationships and affiliations of IEnova and its related entities. This may be for regulatory, compliance, or reporting purposes.

Sources:
- Salary_Policy.pdf
- Salary_Policy.pdf



Ask a question (or type 'exit' to quit):  Can you describe the scope and functions in depth



Answer:
Based on the provided context, here's a detailed explanation of the scope and purpose of the POSH Policy guidelines:

**Purpose:**
The primary purpose of the POSH Policy guidelines is to create and maintain a safe work environment that is free from sexual harassment and discrimination for all employees. The policy aims to establish guidelines as per the guidelines of "The Sexual harassment of women at workplace (prevention, prohibition & redressal) Act, 2013."

In simpler terms, the purpose of the POSH Policy is to ensure that all employees, regardless of their gender, feel safe and respected in the workplace. It aims to prevent and prohibit any form of sexual harassment or discrimination, and provide a framework for addressing and resolving such issues.

**Scope:**
The scope of the POSH Policy guidelines is broad and inclusive. BBIL SYSTEMS aims to adopt a zero-tolerance attitude towards any kind of sexual harassment or discrimination caused by any employee during their tenur


Ask a question (or type 'exit' to quit):  exit
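
Two small refinements suggested by the transcript above: the same file is listed twice under "Sources" when both retrieved chunks come from it, and `chat_history` grows without bound as the session continues. The following is a minimal sketch of helpers for both; `unique_sources`, `trim_history`, and the default of 5 turns are assumptions for illustration, not part of the original loop.

```python
from types import SimpleNamespace


def unique_sources(source_documents) -> list[str]:
    """Return the `source` metadata of each document once, in first-seen order."""
    seen = []
    for doc in source_documents:
        name = doc.metadata.get("source", "unknown")
        if name not in seen:
            seen.append(name)
    return seen


def trim_history(history: list[tuple[str, str]], max_turns: int = 5) -> list[tuple[str, str]]:
    """Keep only the most recent `max_turns` (question, answer) pairs."""
    return history[-max_turns:] if max_turns > 0 else []


# Stand-ins for retrieved documents; only the `.metadata` attribute is needed here.
fake_docs = [
    SimpleNamespace(metadata={"source": "Leave-Policy.pdf"}),
    SimpleNamespace(metadata={"source": "Leave-Policy.pdf"}),
    SimpleNamespace(metadata={"source": "posh-policy.pdf"}),
]
print(unique_sources(fake_docs))
```

Inside the loop, `for name in unique_sources(sources): print(f"- {name}")` would replace the per-document print, and `chat_history = trim_history(chat_history + [(query, answer)])` would replace the `append` call.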
