### Persistent Vector Storage + Better QA (Question-Answer) Handling

In [1]:
import os
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_groq import ChatGroq
from langchain.chains import ConversationalRetrievalChain
from langchain_community.vectorstores import Chroma

# Load environment variables
load_dotenv()

True

### Step 1 : Ingesting PDFs from the folder

In [2]:
# Folder where PDFs are stored
pdf_folder = "policies"

documents = []
loaded_files = []

for file in os.listdir(pdf_folder):
    if file.lower().endswith(".pdf"):  # also match ".PDF"
        file_path = os.path.join(pdf_folder, file)
        loader = PyPDFLoader(file_path)
        docs = loader.load()
        for d in docs:
            # Replace the path PyPDFLoader stores in "source" with the bare filename
            d.metadata["source"] = file
        documents.extend(docs)
        loaded_files.append(file)

print(f"✅ Loaded {len(documents)} pages from {len(loaded_files)} PDF files:")
for f in loaded_files:
    print(f" - {f}")

✅ Loaded 15 pages from 3 PDF files:
 - Leave-Policy.pdf
 - posh-policy.pdf
 - Salary_Policy.pdf


### Step 2 : Splitting Chunks

In [3]:
# Split into smaller chunks (for token efficiency)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

print(f"✅ Created {len(chunks)} chunks")

✅ Created 37 chunks
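
To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally: that splitter prefers to break on separators such as paragraphs and line breaks, so real chunks are often shorter than `chunk_size`. The helper `sliding_chunks` is hypothetical and exists only for illustration.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(piece)
```

Each window repeats the last 4 characters of the previous one, which is the same idea as the 200-character overlap above: a sentence that straddles a chunk boundary still appears whole in at least one chunk.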


### Step 3 : Embedding and Vector Storage (local ChromaDB)

In [4]:
%pip install sentence-transformers

Note: you may need to restart the kernel to use updated packages.


In [5]:
# Free, local embeddings via SentenceTransformers (no API key required)
from langchain_community.embeddings import HuggingFaceEmbeddings

# Create embeddings using a small, fast model
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Build the Chroma vector store; persist_directory is created if it does not exist
vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    persist_directory="./chroma_store",
)

# chromadb >= 0.4 writes to disk automatically, so persist() is deprecated there;
# the call is only required on older versions.
vectordb.persist()
print("✅ Vectorstore created & persisted to disk (using HuggingFace embeddings)")



✅ Vectorstore created & persisted to disk (using HuggingFace embeddings)
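
The point of a persistent store is that later sessions can reopen it instead of re-embedding every PDF. A minimal sketch of that pattern follows, assuming the same `./chroma_store` directory and the same embedding model as above. `store_exists` and `load_or_build` are hypothetical helper names, not part of LangChain.

```python
from pathlib import Path

PERSIST_DIR = "./chroma_store"  # same directory used when the store was built


def store_exists(persist_dir: str) -> bool:
    """Return True if persist_dir exists and contains at least one entry."""
    path = Path(persist_dir)
    return path.is_dir() and any(path.iterdir())


def load_or_build(chunks, embedding_model, persist_dir: str = PERSIST_DIR):
    """Reopen the persisted store if present, otherwise embed chunks and build it."""
    # Imported lazily so the directory check above works without LangChain installed.
    from langchain_community.vectorstores import Chroma

    if store_exists(persist_dir):
        # Reopen the existing collection; nothing is re-embedded.
        return Chroma(
            persist_directory=persist_dir,
            embedding_function=embedding_model,
        )
    return Chroma.from_documents(
        documents=chunks,
        embedding=embedding_model,
        persist_directory=persist_dir,
    )
```

In the cell above, `vectordb = load_or_build(chunks, embedding_model)` could stand in for the direct `Chroma.from_documents` call. This also sidesteps a likely pitfall: rerunning `from_documents` against an existing directory typically adds the same chunks a second time, so retrieval starts returning duplicates.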




### Step 4 : Loading the LLM

In [6]:
groq_api_key = os.getenv("GROQ_API_KEY")

llm = ChatGroq(
    model="llama-3.1-8b-instant", 
    api_key=groq_api_key,
    temperature=0
)

### Step 5 : Build Conversational Retrieval Chain

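- The retriever fetches the 2 most relevant chunks (`k=2`).
- The conversational chain supports multi-turn Q&A: it uses the chat history to rewrite each follow-up as a standalone question before retrieving.
- `chat_history` stores the previous (question, answer) turns.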

In [7]:
retriever = vectordb.as_retriever(search_kwargs={"k": 2})

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

chat_history = []
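
Because `chat_history` is passed to the chain on every turn, it grows with each question and gradually inflates the prompt. One way to cap it is sketched below. `trim_history` is a hypothetical helper and the default of 5 turns is an arbitrary assumption, not a recommendation from LangChain.

```python
def trim_history(history, max_turns=5):
    """Keep only the most recent max_turns (question, answer) pairs."""
    if max_turns <= 0:
        return []  # history[-0:] would return everything, so handle this case explicitly
    return history[-max_turns:]


# Demo with placeholder turns
history = [(f"q{i}", f"a{i}") for i in range(8)]
recent = trim_history(history, max_turns=3)
print(recent)
```

In the loop, the trimmed list would be passed as `"chat_history": trim_history(chat_history)` while the full list is still kept for logging. The trade-off is that the chain can no longer resolve references to turns that have been dropped.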

### Step 6 : Interactive QnA loop

In [8]:
while True:
    query = input("\nAsk a question (or type 'exit' to quit): ").strip()
    if query.lower() in ("exit", "quit"):
        break

    try:
        result = qa_chain.invoke({"question": query, "chat_history": chat_history})
        answer = result["answer"]
        sources = result["source_documents"]

        print("\nYou Asked:")
        print(query)

        print("\nAnswer:")
        print(answer)

        print("\nSources:")
        for src in sources:
            print(f"- {src.metadata.get('source')}")

        # Update chat history
        chat_history.append((query, answer))

    except Exception as e:
        print(f"⚠️ Error: {e}")


Ask a question (or type 'exit' to quit):  What are POSH Policy guidelines



You Asked:
What are POSH Policy guidelines

Answer:
Based on the provided context, the POSH Policy guidelines for BBIL SYSTEMS are as follows:

1. **Purpose**: To create and maintain a safe work environment, free from sexual harassment and discrimination for all employees.
2. **Scope**: The policy applies to all employees of BBIL SYSTEMS, including those hired on a permanent, temporary, contracted, or part-time basis, directly or indirectly, or through a vendor organization.
3. **Applicability**: The policy applies to all employees of BBIL SYSTEMS, including those in company premises or elsewhere in India or abroad.
4. **Definition of Employee**: An employee of BBIL SYSTEMS includes anyone carrying out work on behalf of the company, regardless of their employment status or basis.

Additionally, the policy aims to adopt a zero-tolerance attitude towards any kind of sexual harassment or discrimination caused by any employee during their tenure in BBIL SYSTEMS towards any other person, i


Ask a question (or type 'exit' to quit):  How many Earned Leaves do I get in one year



You Asked:
How many Earned Leaves do I get in one year

Answer:
The text doesn't explicitly state the number of Earned Leaves you get in one year. However, it does mention that if you are unable to use all of your accrued Earned Leave during a calendar year, you may elect to carry forward any accrued but unused Earned Leave into the next calendar year, subject to the maximum leave of 45 days. This suggests that the maximum Earned Leave you can accrue in a year is 45 days.

Sources:
- Leave-Policy.pdf
- Leave-Policy.pdf



Ask a question (or type 'exit' to quit):  What is casual leave



You Asked:
What is casual leave

Answer:
Casual leave is a type of leave that an employee can take for personal or miscellaneous reasons, such as a family event, a personal appointment, or a sudden need to attend to a personal matter. It is usually taken on short notice and is not related to illness or injury.

Sources:
- Leave-Policy.pdf
- Leave-Policy.pdf



Ask a question (or type 'exit' to quit):  What do you mean by Scope and purpose



You Asked:
What do you mean by Scope and purpose

Answer:
Based on the provided context, it appears that the "Scope" and "Purpose" refer to the definition of the scope and purpose of the information disclosure, likely related to IEnova's business relationships and affiliations.

The scope seems to include:

- Individuals mentioned in conditions a) to c) (though these conditions are not explicitly stated in the provided text)
- Partners or co-owners of the individuals mentioned in conditions a) to c)
- Companies that are part of a business group or consortium to which IEnova belongs
- Companies over which one of the individuals referenced by conditions a) to c) have control or significant influence

The purpose of this scope is not explicitly stated in the provided text, but it is likely related to transparency and disclosure of IEnova's business relationships and affiliations, possibly for regulatory or compliance purposes.

Sources:
- Salary_Policy.pdf
- Salary_Policy.pdf



Ask a question (or type 'exit' to quit):  Can you describe the scope and functions in depth



You Asked:
Can you describe the scope and functions in depth

Answer:
Based on the provided context, the POSH Policy guidelines for BBIL SYSTEMS aim to create and maintain a safe work environment that is free from sexual harassment and discrimination. Here's a detailed explanation of the scope and functions of the POSH Policy guidelines:

**Scope:**

The POSH Policy guidelines apply to all employees of BBIL SYSTEMS, including:

* Permanent employees
* Temporary employees
* Contracted employees
* Employees on a retainer ship basis
* Employees on a part-time basis
* Employees hired directly or indirectly
* Employees hired through vendor organizations

The scope also extends to clients, vendors, and contractors in company premises or elsewhere in India or abroad.

**Functions:**

The POSH Policy guidelines aim to:

1. **Create a safe work environment**: The policy aims to create a work environment that is free from sexual harassment and discrimination, where all employees feel safe and r


Ask a question (or type 'exit' to quit):  exit
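
The source lists above repeat the same filename because both retrieved chunks come from the same PDF. A small sketch of a more informative listing follows, assuming each chunk still carries the `page` key that `PyPDFLoader` normally sets (a 0-based page index). `format_sources` is a hypothetical helper, and the `SimpleNamespace` objects with made-up page numbers only stand in for retrieved documents.

```python
from types import SimpleNamespace


def format_sources(docs):
    """Return unique 'file (page N)' labels, preserving retrieval order."""
    seen = set()
    labels = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page")  # assumed 0-based, as set by PyPDFLoader
        key = (source, page)
        if key in seen:
            continue  # skip chunks from a file/page that was already listed
        seen.add(key)
        label = source if page is None else f"{source} (page {page + 1})"
        labels.append(label)
    return labels


# Stand-ins for retrieved Document objects; the page values are invented for the demo.
docs = [
    SimpleNamespace(metadata={"source": "Leave-Policy.pdf", "page": 2}),
    SimpleNamespace(metadata={"source": "Leave-Policy.pdf", "page": 2}),
    SimpleNamespace(metadata={"source": "Leave-Policy.pdf", "page": 3}),
]
print(format_sources(docs))
```

In the loop, `for label in format_sources(sources): print(f"- {label}")` would take the place of the current source printout.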
