## **Objective**

The objective of this project is to develop a chatbot application that uses an LLM to answer user queries based on the product user guide.

## **1. Data Loading, Preparation and Analysis** 


In [1]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
import re
from typing import List, Dict

In [2]:
# Load the files as documents
pdf_path = "D:/Projects/UserGuide_ Chat Application/sample_software_user_guide.pdf"

# Load PDF document
loader = PyPDFLoader(pdf_path)
doc = loader.load()

In [3]:
print(f"Loaded {len(doc)} pages from the PDF document.")
print(doc[0].page_content)  # Print content of the first page

Loaded 1 pages from the PDF document.
Acme Analytics – User Guide
1. Introduction
Welcome to Acme Analytics, a business intelligence tool for creating dashboards, reports, and data
visualizations. This guide will help you install, configure, and use the product effectively.
2. System Requirements
Operating System
Windows 10+, macOS 11+, Linux (Ubuntu 20.04+)
Processor
Intel i5 or equivalent
RAM
8 GB minimum, 16 GB recommended
Storage
500 MB free disk space
Dependencies
Python 3.9+, Node.js 16+
3. Installation & Setup
- Download the installer from the official Acme Analytics website.
- Run the installer and follow the on-screen instructions.
- Accept the license agreement and choose an installation directory.
- Launch the application from the Start Menu (Windows) or Applications folder (Mac).
4. Core Features
- Dashboard Creation – Build interactive dashboards with drag-and-drop widgets.
- Report Generation – Export data into PDF, Excel, or CSV formats.
- Data Integration – Connect to S

In [4]:
# The user guide has numbered sections, so use semantic (topic-based) chunking.
def semantic_chunking(document: str) -> List[Dict[str, str]]:
    """Split the user guide into chunks at numbered headings (e.g. '1. Introduction').

    Returns a list of dicts: {"section": <heading line>, "content": <section body>}.
    """
    chunks = []

    # Split at each newline that is followed by "<digits>." and whitespace.
    sections = re.split(r'\n(?=\d+\.\s+)', document)

    for section in sections:
        if not section.strip():
            continue

        # Extract section title (first line)
        lines = section.strip().split('\n',1)
        section_title = lines[0].strip()
        section_body = lines[1].strip() if len(lines)>1 else ""

        chunks.append({
            "section": section_title,
            "content": section_body
        })
    return chunks
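
The heading regex can be checked in isolation on a small string. The sketch below uses a hypothetical miniature guide (not the real PDF text) and the same pattern as `semantic_chunking`. It also shows one caveat worth keeping in mind: numbered list items inside a section look like headings to this pattern and get split into their own chunks.

```python
import re

# Hypothetical miniature guide, used only to illustrate the split.
sample = (
    "Demo Guide\n"
    "1. Introduction\n"
    "Welcome.\n"
    "2. Setup\n"
    "Follow these steps:\n"
    "1. Download the installer\n"
    "2. Run it"
)

# Same pattern as in semantic_chunking: split at a newline followed by "<digits>. ".
parts = re.split(r'\n(?=\d+\.\s+)', sample)

# The first line of each part is what the function would treat as the section title.
titles = [p.split('\n', 1)[0] for p in parts]
print(titles)
```

This prints five titles, including `'1. Download the installer'` and `'2. Run it'`. The sample guide in this notebook uses `-` bullets for its steps, so it is not affected, but a guide with numbered steps would probably need a stricter heading pattern.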



In [5]:
chunks = semantic_chunking(doc[0].page_content)
print(chunks)
# for c in chunks:
#     print(f"-- {c['section']} --")
#     print(c["content"])
#     print()



[{'section': 'Acme Analytics – User Guide', 'content': ''}, {'section': '1. Introduction', 'content': 'Welcome to Acme Analytics, a business intelligence tool for creating dashboards, reports, and data\nvisualizations. This guide will help you install, configure, and use the product effectively.'}, {'section': '2. System Requirements', 'content': 'Operating System\nWindows 10+, macOS 11+, Linux (Ubuntu 20.04+)\nProcessor\nIntel i5 or equivalent\nRAM\n8 GB minimum, 16 GB recommended\nStorage\n500 MB free disk space\nDependencies\nPython 3.9+, Node.js 16+'}, {'section': '3. Installation & Setup', 'content': '- Download the installer from the official Acme Analytics website.\n- Run the installer and follow the on-screen instructions.\n- Accept the license agreement and choose an installation directory.\n- Launch the application from the Start Menu (Windows) or Applications folder (Mac).'}, {'section': '4. Core Features', 'content': '- Dashboard Creation – Build interactive dashboards with

In [6]:
# Vector embeddings and FAISS index creation
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
import os
from langchain.schema import Document
from langchain.storage import LocalFileStore


In [7]:
from getpass import getpass
# Fetch your OPENAI API Key as an environment variable
api_key = getpass("Enter your OpenAI API key: ")
os.environ["OPENAI_API_KEY"] = api_key


In [8]:
# LangChain vector stores such as FAISS, Chroma, and Pinecone expect a list of Document objects, not plain dicts,
# so convert the chunks to Document format (Document is already imported above).

langchain_chunks = [
    Document(page_content=c["content"], metadata={"section": c["section"]})
    for c in chunks
]

print(langchain_chunks[1])

page_content='Welcome to Acme Analytics, a business intelligence tool for creating dashboards, reports, and data
visualizations. This guide will help you install, configure, and use the product effectively.' metadata={'section': '1. Introduction'}
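
The chunk list printed earlier contains a title-only entry (`'Acme Analytics – User Guide'`) whose content is empty, and the section headings are stored only in metadata. One possible refinement, sketched below under the assumption that empty text adds nothing useful to the index and that including the heading in the embedded text helps retrieval, is to clean the chunks before building `Document` objects. The helper name `prepare_chunks` is hypothetical and is not part of LangChain.

```python
from typing import Dict, List


def prepare_chunks(chunks: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop chunks with an empty body and prepend the heading to the text to embed."""
    prepared = []
    for c in chunks:
        body = c["content"].strip()
        if not body:
            continue  # e.g. the title-only chunk has nothing to embed
        prepared.append({
            "section": c["section"],
            "content": f'{c["section"]}\n{body}',
        })
    return prepared


# Small stand-in for the real chunk list produced by semantic_chunking.
example = [
    {"section": "Acme Analytics – User Guide", "content": ""},
    {"section": "1. Introduction", "content": "Welcome to Acme Analytics."},
]
cleaned = prepare_chunks(example)
print(len(cleaned))
print(cleaned[0]["content"])
```

The output of `prepare_chunks(chunks)` could then be passed to the same list comprehension that builds `langchain_chunks`.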


In [9]:
# Initialise the embedding function
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Run the lines below once to build the index. They are left commented out here
# because the next cell loads the saved index from disk.
# Create a FAISS vector store from the documents and embeddings:
# vector_store = FAISS.from_documents(langchain_chunks, embeddings)

# Save the FAISS index to disk:
# vector_store.save_local("faiss_index_user_guide")
# print("FAISS index created and saved to 'faiss_index_user_guide' directory.")



In [10]:
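# Load the index from disk.
# allow_dangerous_deserialization=True permits loading the pickled docstore;
# only enable it for index files you created yourself.
loaded_vector_store = FAISS.load_local("faiss_index_user_guide", embeddings, allow_dangerous_deserialization=True)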
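
Because the index-building cell is commented out, loading fails on a machine where the index has not been saved yet. A minimal sketch of a guard is shown below. It assumes `save_local` writes `index.faiss` and `index.pkl` into the target folder (the default index name), and the helper `faiss_index_exists` is hypothetical, not a LangChain function.

```python
import os
import tempfile


def faiss_index_exists(folder: str, index_name: str = "index") -> bool:
    """Return True if both files expected from save_local are present in the folder."""
    return all(
        os.path.isfile(os.path.join(folder, f"{index_name}{ext}"))
        for ext in (".faiss", ".pkl")  # assumed default file names
    )


# Demonstrate on a temporary folder with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    before = faiss_index_exists(tmp)
    for ext in (".faiss", ".pkl"):
        open(os.path.join(tmp, "index" + ext), "wb").close()
    after = faiss_index_exists(tmp)

print(before, after)
```

In the notebook, a check like `faiss_index_exists("faiss_index_user_guide")` could decide whether to call `FAISS.from_documents` and `save_local`, or to go straight to `FAISS.load_local`.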


## **2. Create the RAG Chain**

In [11]:
# create a retriever from the vectorstore for top 3 results
retriever = loaded_vector_store.as_retriever(search_kwargs={"k": 3})
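
To make `k=3` concrete, here is a toy version of top-k retrieval in plain Python. It assumes Euclidean distance as the ranking metric and uses made-up 2-D vectors. Real embeddings have many more dimensions, and FAISS uses optimised index structures rather than a full sort, so this only illustrates the idea.

```python
import math
from typing import List, Sequence, Tuple


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k vectors closest to the query."""
    scored = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Hypothetical "chunk embeddings" and a query vector.
toy_vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
nearest = top_k((0.9, 0.1), toy_vectors, k=3)
print([i for i, _ in nearest])
```

The three closest vectors come back in order of increasing distance, and the fourth is dropped. With only a handful of chunks in this guide, `k=3` returns a large share of the document on every query.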

In [12]:
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

In [13]:
llm = ChatOpenAI(model="gpt-4.1-nano", temperature=0)

In [14]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff"
)
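
The `"stuff"` chain type places all retrieved chunks into a single prompt alongside the question. The sketch below shows the idea with a hypothetical template. The wording is an assumption for illustration and is not LangChain's actual prompt.

```python
from typing import List


def stuff_prompt(question: str, passages: List[str]) -> str:
    """Concatenate all passages into one context block, followed by the question."""
    context = "\n\n".join(passages)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Passages taken from the System Requirements section of the sample guide.
prompt = stuff_prompt(
    "How much RAM is recommended?",
    ["RAM\n8 GB minimum, 16 GB recommended", "Storage\n500 MB free disk space"],
)
print(prompt)
```

This approach is simple and works well when the retrieved chunks are short, as they are here. For long documents the combined context can exceed the model's context window, which is when other chain types become relevant.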

In [15]:
query = "My App aint starting what to do ?"
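# .run() is deprecated in recent LangChain releases; invoke() returns a dict with the answer under "result".
answer = qa_chain.invoke({"query": query})["result"]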
print(answer)



Ensure that all dependencies are installed and then try restarting your system. If the problem persists, you may want to check for any error messages and consult the support team at support@acmeanalytics.example or visit the online knowledge base at https://docs.acmeanalytics.example for further assistance.
