# A Simple website AI Assistant with RAG

In [1]:
# Define the list of pages of sonfmeets.com website
page_list = [
    "https://softmeets.com/",
    "https://softmeets.com/automation/",
    "https://softmeets.com/internet-of-things/",
    "https://softmeets.com/artificial-intelligence/",
    "https://softmeets.com/analytics/",
    "https://softmeets.com/about-us/",
    "https://softmeets.com/customers/",
    "https://softmeets.com/partners/",
    "https://softmeets.com/contact-us/"
    "https://softmeets.com/vetting-of-detail-project-report-of-automation-of-drinking-water-project-with-national-institute-of-technology-durgapur/",
    "https://softmeets.com/implementation-of-document-management-system-at-sail-bokaro-steel-plant/",
    "https://softmeets.com/application-development-system-modernization-and-support-upv-tms/",
    "https://softmeets.com/barqat/",
    "https://softmeets.com/rbms/",
    "https://softmeets.com/upv/",
    "https://softmeets.com/cms/",
    "https://softmeets.com/privacy-policy/"
]


### Load the contents of web pages

In [2]:
# Import libraries
from langchain_community.document_loaders import WebBaseLoader
import bs4

In [3]:
# Load the contents of web pages
loader = WebBaseLoader(
    web_paths= page_list,
    bs_kwargs= {
        "parse_only": bs4.SoupStrainer(["h1", "h2", "h3", "p", "div class='x-content'", "li"])
    },
    bs_get_text_kwargs= { "separator": " \n", "strip": True}
)

In [4]:
documents = loader.load()

In [5]:
len(documents)

16

In [6]:
print(documents[1].page_content)

Home 
Innovation 
Automation 
Internet of Things 
Artificial intelligence 
Analytics 
About 
About Us 
Case Study 
Customers 
Partners 
Contact Us 
Home 
Innovation 
Automation 
Internet of Things 
Artificial intelligence 
Analytics 
About 
About Us 
Case Study 
Customers 
Partners 
Contact Us 
Automation 
Our automation solutions leverage advanced technology to perform tasks with minimal human intervention, enhancing efficiency, accuracy, and scalability. Softmeets streamlines workflows, reduces costs, and minimizes errors, all while enabling seamless 24/7 operations. By automating repetitive tasks, we empower organizations to allocate resources to strategic initiatives, enhance safety by managing hazardous processes, and ensure compliance through consistent and reliable execution. 
With real-time monitoring and predictive analytics, automation drives data-driven decision-making and operational agility, resulting in better productivity, customer satisfaction, and long-term sustainabil

### Split the documents into chunks

In [7]:
# Import libraries
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [8]:
# Define splitter
splitter = RecursiveCharacterTextSplitter(chunk_size= 500, chunk_overlap= 50)

In [9]:
# Split the documents
splitted_docs = splitter.split_documents(documents)

In [10]:
len(splitted_docs)

147

In [11]:
print(splitted_docs[5].page_content)

With real-time monitoring and predictive analytics, automation drives data-driven decision-making and operational agility, resulting in better productivity, customer satisfaction, and long-term sustainability. 
Empowering Efficiency, 
Elevating Accuracy 
Increased Efficiency 
Faster Execution: 
Automation completes tasks faster than manual efforts, reducing process cycle times. 
24/7 Operation: 
Automated systems can run continuously without breaks or downtime. 
Streamlined Workflows:


In [12]:
print(splitted_docs[6].page_content)

Streamlined Workflows: 
Reduces bottlenecks by automating sequential tasks. 
Cost Savings 
Reduced Labor Costs: 
Minimizes the need for repetitive manual work, allowing employees to focus on strategic tasks. 
Lower Operational Costs: 
ptimizes resource usage, such as energy and materials, in industrial automation. 
Error Reduction: 
Fewer errors mean lower costs associated with rework or waste. 
Improved Accuracy 
Minimized Human Errors:


### Embedding and ingest in Vector Database

In [13]:
# Import libraries
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

In [14]:
# Define embedding model
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

In [15]:
# Store in in-memory chroma DB
vector_db = Chroma.from_documents(
    documents= splitted_docs,
    embedding= embeddings,
    collection_name="softmeets_website"
)

### Define Retriever and RAG Chain

In [16]:
# Set Retriever
retriever = vector_db.as_retriever()

In [19]:
# Import Libraries
from langchain_openai import ChatOpenAI
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

In [18]:
# LLM
llm = ChatOpenAI(model="gpt-4o")

In [20]:
# Define Prompt Template
template = """
You are an assistant for question-answering task. Use the following pieces of retrieved context for answering the question.
If you don't know the answer, just say you don't know, don't try to make up an answer.
Your answer should be to the point and consice.
Question: {question}
Context: {context}
"""
prompt = ChatPromptTemplate.from_template(template)

In [21]:
# Define Chain
rag_chain = (
    { "context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

### Test RAG Chain

In [22]:
rag_chain.invoke("Tell me about Softmeets")

'Softmeets Info Solutions Pvt. Ltd. is a company that places significant importance on protecting the privacy and security of personal information. Their privacy policy outlines how they collect, use, and safeguard information provided by users. They collect data entered knowingly on their services, which may include personal details like name, phone number, email address, photograph, and school affiliation, primarily used for notifications. They emphasize strong privacy protections and require users to agree to their Privacy Statement when providing personal information.'

In [23]:
rag_chain.invoke("Tell me their Office address")

'The office address is Webel IT Park (3rd Floor), Kalyanpur Satellite Township, Asansol – 713302, Dist – Paschim Bardhaman (W.B), India.'

In [24]:
rag_chain.invoke("Who are their clients?")

"I don't know."

In [25]:
rag_chain.invoke("Tell me about the UPV?")

'UPV (Unified Plant View) is a system implemented at the SAIL IISCO Steel Plant to automate the production process and provide real-time monitoring and analysis reporting across various departments such as Blast Furnace, Sinter Plant, and Basic Oxygen Furnace. It integrates different plant operations into a single window to facilitate streamlined decision-making, real-time data visualization, process optimization, centralized monitoring, and predictive maintenance capabilities.'

In [26]:
rag_chain.invoke("Tell me about Rolling Block project")

'The Rolling Block Management System (RBMS) is an advanced system designed to manage train movements and track maintenance more dynamically and efficiently compared to traditional fixed block signaling systems. It enhances operational safety, optimizes track utilization, and ensures uninterrupted train services while allowing for scheduled maintenance activities. The RBMS increases line capacity, promotes efficient track usage, reduces congestion, and supports eco-friendly operations. It works seamlessly with advanced train control systems like CBTC (Communications-Based Train Control) or ETCS (European Train Control System), utilizing predictive analytics for proactive planning and resource allocation.'

In [27]:
rag_chain.invoke("Which technologies they use?")

'They use technologies such as Web and Android platforms, Angular, HTML-5, CSS3 for frontend, and Node.js with Express for the backend. Additionally, they work with innovations in the fields of Automation, Internet of Things, Artificial Intelligence, and Analytics.'

In [28]:
rag_chain.invoke("Tell me about their innovation")

"I don't have specific details on their innovation from the provided context."

In [29]:
rag_chain.invoke("Do thay work on IOT?")

'Yes, they work on IoT (Internet of Things).'

In [30]:
rag_chain.invoke("Any certfication do they have?")

"I don't know. The provided context does not contain any information about certifications."

In [31]:
rag_chain.invoke("What they do in AI?")

"I don't know."