## AI-ML Driven Chatbot

The aim of this project is to develop an AI/ML-driven chatbot as part of our problem statement no. 2.

### **As a prerequisite, we install all the libraries required for the project.**

In [None]:
# !pip install langchain
# !pip install chromadb
# !pip install PyPDF2
# !pip install faiss-cpu
# !pip install gradio

### **Then we import the required libraries and modules, listed below.**

In [None]:
from PyPDF2 import PdfReader
from langchain.embeddings import HuggingFaceEmbeddings, HuggingFaceInferenceAPIEmbeddings, HuggingFaceInstructEmbeddings
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS, Chroma
from langchain.llms import HuggingFaceHub
from langchain.chains import RetrievalQA, LLMChain

import warnings
warnings.filterwarnings('ignore')

### **The file can be loaded either from Google Drive or from the local system. Here we use Google Drive, where our file is located.**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


**Our approach uses a PDF that contains the questions a user is likely to ask, together with their answers (for example, questions about contact details). At present we use the questions already available in the FAQ section of the CPGRAMS portal as a reference. The chatbot can be scaled by updating the PDF with more questions and answers: write them in Microsoft Word, convert the document to PDF, and load the new file in the same way.**

In [None]:
reader = PdfReader('/content/drive/MyDrive/CPGRAMS QNA2.pdf')

In [None]:
raw_text = ''
for i, page in enumerate(reader.pages):
  text = page.extract_text()
  if text:
    raw_text += text

In [None]:
raw_text

'5. What happens to the grievances? How are the grievances dealt with in Central \nMinistries/Departments?  1. What are the contact details of the Department of Administrative Reforms and \nPublic Grievances?  \nAns: Department of Administrative Reforms and Public Grievances, 5th floor, Sardar Patel \nBhavan, Sansad Marg, New Delhi – 110001  \nWebsite:: www.darpg.gov.in Tele fax : 23741006  \n2. Where can the grievances be sent?  \n \nAns: Grievances can be directed to the following departments: \n \na) The Department of Administrative Reforms and Public Grievances at \npgportal.gov.in. \nb) The Department of Pensions and Pensioners’ Welfare (DP&PW) at \npgportal.gov.in/pension. \n \nThese nodal agencies accept grievances online through pgportal.gov.in, as well as by \npost or in person. \n \n3. How do I lodge the grievance?  \n \nAns: Complaints can be submitted online. However, if internet access is unavailable or for \nany other reason, individuals are welcome to mail their grievanc

In [None]:
raw_text[:500]

'5. What happens to the grievances? How are the grievances dealt with in Central \nMinistries/Departments?  1. What are the contact details of the Department of Administrative Reforms and \nPublic Grievances?  \nAns: Department of Administrative Reforms and Public Grievances, 5th floor, Sardar Patel \nBhavan, Sansad Marg, New Delhi – 110001  \nWebsite:: www.darpg.gov.in Tele fax : 23741006  \n2. Where can the grievances be sent?  \n \nAns: Grievances can be directed to the following departments: \n \na) Th'
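Because the loop above appends each page's text directly to `raw_text`, the last line of one page and the first line of the next can end up on the same line, which matters later because the splitter breaks the text at `'\n'`. A minimal sketch of one way to avoid that, assuming each page object exposes `extract_text()` as PyPDF2 pages do; the helper name `join_pages` and the stand-in `FakePage` class are hypothetical and exist only for illustration:

```python
def join_pages(pages, separator="\n"):
    """Concatenate the text of each page, inserting `separator` between pages."""
    parts = []
    for page in pages:
        text = page.extract_text()
        if text:  # skip pages with no extractable text
            parts.append(text.strip("\n"))
    return separator.join(parts)


class FakePage:
    """Hypothetical stand-in for a PDF page so the sketch runs without a PDF file."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


pages = [FakePage("Q1. First question?"), FakePage(None), FakePage("Q2. Second question?")]
joined = join_pages(pages)
print(joined)
```

With the real file, this would be called as `join_pages(reader.pages)`.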

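### **_To keep each retrieved passage small enough to pass to the language model, the code uses a text splitter to divide the extracted text into smaller chunks. The splitter breaks the text at newlines (`separator`), merges the pieces into chunks of at most 1000 characters (`chunk_size`), and repeats no text between consecutive chunks (`chunk_overlap = 0`)._**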

In [None]:
text_splitter = CharacterTextSplitter(separator = '\n',
                      chunk_size = 1000, chunk_overlap = 0, length_function = len,)
texts = text_splitter.split_text(raw_text)

In [None]:
texts[0]

'5. What happens to the grievances? How are the grievances dealt with in Central \nMinistries/Departments?  1. What are the contact details of the Department of Administrative Reforms and \nPublic Grievances?  \nAns: Department of Administrative Reforms and Public Grievances, 5th floor, Sardar Patel \nBhavan, Sansad Marg, New Delhi – 110001  \nWebsite:: www.darpg.gov.in Tele fax : 23741006  \n2. Where can the grievances be sent?  \n \nAns: Grievances can be directed to the following departments: \n \na) The Department of Administrative Reforms and Public Grievances at \npgportal.gov.in. \nb) The Department of Pensions and Pensioners’ Welfare (DP&PW) at \npgportal.gov.in/pension. \n \nThese nodal agencies accept grievances online through pgportal.gov.in, as well as by \npost or in person. \n \n3. How do I lodge the grievance?  \n \nAns: Complaints can be submitted online. However, if internet access is unavailable or for'
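To make the roles of `separator` and `chunk_size` concrete, here is a simplified, pure-Python sketch of separator-based chunking without overlap. It illustrates the idea only and is not LangChain's implementation, which handles further cases such as overlap; the function name `split_into_chunks` is hypothetical:

```python
def split_into_chunks(text, separator="\n", chunk_size=1000):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        if not piece.strip():
            continue  # drop empty pieces
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # The piece still fits, or it is a single oversized piece that must stand alone.
            current = candidate
        else:
            # The piece does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "line one\nline two\nline three\nline four"
chunks = split_into_chunks(sample, chunk_size=20)
print(chunks)
```

Each chunk ends at a line boundary, so a question and its answer stay together only if they fit within `chunk_size`.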

### **_This snippet imports the required modules, prompts the user for an API token without echoing it, and stores the token in an environment variable for later use. The token authenticates requests to the Hugging Face API and can be created at https://huggingface.co/settings/tokens. We use the `getpass` module so that the token is not displayed in the notebook._**

In [None]:
import os
from getpass import getpass
HF_token = getpass()
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HF_token

··········
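When the notebook is re-run often, typing the token each time becomes tedious. A minimal sketch of one option, assuming the token may already be present in the `HUGGINGFACEHUB_API_TOKEN` environment variable and falling back to a hidden prompt otherwise; the helper name `get_hf_token`, its `prompt` parameter, and the `DEMO_HF_TOKEN` variable are hypothetical:

```python
import os
from getpass import getpass


def get_hf_token(env_var="HUGGINGFACEHUB_API_TOKEN", prompt=getpass):
    """Return the token from the environment, or ask for it without echoing."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        token = prompt("Hugging Face API token: ").strip()
        if not token:
            raise ValueError("No API token provided")
        os.environ[env_var] = token  # make it visible to libraries that read the variable
    return token


# Demonstration with a stand-in prompt so that the sketch runs non-interactively.
os.environ.pop("DEMO_HF_TOKEN", None)
demo_token = get_hf_token(env_var="DEMO_HF_TOKEN", prompt=lambda _msg: "hf_example")
print(demo_token == os.environ["DEMO_HF_TOKEN"])
```

In the notebook, the call would simply be `HF_token = get_hf_token()`.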


### **This line creates an embeddings client for the embedding model 'BAAI/bge-base-en-v1.5', hosted by Hugging Face. The client uses the API token stored in `HF_token` to call the model. We then use the FAISS library to index the chunk embeddings and perform similarity search.**

In [None]:
# model_name (lowercase) is the field that selects the hosted embedding model
embeddings = HuggingFaceInferenceAPIEmbeddings(api_key = HF_token, model_name = 'BAAI/bge-base-en-v1.5')

In [None]:
db = FAISS.from_texts(texts, embeddings)
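
Conceptually, the index stores one vector per chunk and, for a given query, returns the chunks whose vectors are closest to the query vector. The following pure-Python sketch illustrates that idea with made-up 3-dimensional vectors and cosine similarity. It calls neither FAISS nor the embedding model, the function names are hypothetical, and the distance measure FAISS actually uses depends on how the index is configured:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the texts of the k entries most similar to query_vec."""
    scored = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _vec in scored[:k]]


# Toy "index": (chunk text, made-up embedding vector) pairs.
toy_index = [
    ("How do I lodge the grievance?", [0.9, 0.1, 0.0]),
    ("What are the contact details?", [0.1, 0.9, 0.1]),
    ("Where can the grievances be sent?", [0.7, 0.2, 0.3]),
]
query_vector = [1.0, 0.0, 0.1]  # made-up embedding of a user query
results = top_k(query_vector, toy_index, k=2)
print(results)
```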

### **We then configure the language model (`google/flan-t5-xxl`) and the question-answering chain that the chatbot uses to answer questions.**

In [None]:
llm = HuggingFaceHub(
    repo_id="google/flan-t5-xxl",
    model_kwargs={"temperature": 0.5, "max_length": 64,"max_new_tokens":512}
)

In [None]:
from langchain.chains.question_answering import load_qa_chain
chain = load_qa_chain(llm, chain_type = 'stuff')

### **Let us now run a query against the index.**

In [None]:
query = "How to register for grievance?"
search = db.similarity_search(query)
chain.run(input_documents = search, question = query)

' Visit CPGRAMS Portal: Go to https://pgportal.gov.in/  Register/Login: New users register at https://pgportal.gov.in/Registration, existing users log in at https://pgportal.gov.in/Login  Lodge a Grievance: Access "Lodge Grievance," choose category, provide details, and submit. Receive a unique Grievance Registration Number for tracking and follow-up.'
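
The `'stuff'` chain type places all retrieved chunks into a single prompt together with the question and sends that prompt to the model once. A simplified sketch of that assembly step, assuming a generic prompt wording; the template text, the `build_stuff_prompt` name, and the sample chunks are illustrative and are not LangChain's exact prompt:

```python
# Illustrative template; the wording is an assumption, not LangChain's own prompt.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuff_prompt(chunks, question):
    """Join all retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Made-up chunks standing in for the output of db.similarity_search(query).
retrieved_chunks = [
    "Grievances can be lodged online at pgportal.gov.in.",
    "Grievances can also be sent by post.",
]
prompt = build_stuff_prompt(retrieved_chunks, "How to register for grievance?")
print(prompt)
```

Because every retrieved chunk is included, the combined prompt must fit within the model's input limit, which is one reason to keep `chunk_size` and the number of retrieved chunks modest.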

# Combined code with all necessary imports and installs (assumes the PDF file has been uploaded to Google Drive)

In [None]:
# -*- coding: utf-8 -*-
"""app.ipynb

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/1FVQMK-vj2XSfOCCm7-kv6P6OWeTTcdAS
"""

# !pip install langchain
# !pip install chromadb
# !pip install PyPDF2
# !pip install faiss-cpu
# !pip install gradio  # Install Gradio

import os
from getpass import getpass
from PyPDF2 import PdfReader
from langchain.embeddings import HuggingFaceEmbeddings, HuggingFaceInferenceAPIEmbeddings, HuggingFaceInstructEmbeddings
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS, Chroma
from langchain.llms import HuggingFaceHub
from langchain.chains import RetrievalQA, LLMChain
import gradio as gr  # Import Gradio

from google.colab import drive
drive.mount('/content/drive')

reader = PdfReader('/content/drive/MyDrive/CPGRAMS QNA2.pdf')

raw_text = ''
for i, page in enumerate(reader.pages):
  text = page.extract_text()
  if text:
    raw_text += text

text_splitter = CharacterTextSplitter(separator = '\n',
                      chunk_size = 1000, chunk_overlap = 0, length_function = len,)
texts = text_splitter.split_text(raw_text)

HF_token = getpass()
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HF_token

# model_name (lowercase) is the field that selects the hosted embedding model
embeddings = HuggingFaceInferenceAPIEmbeddings(api_key = HF_token, model_name = 'BAAI/bge-base-en-v1.5')

db = FAISS.from_texts(texts, embeddings)

llm = HuggingFaceHub(
    repo_id="google/flan-t5-xxl",
    model_kwargs={"temperature": 0.5, "max_length": 64,"max_new_tokens":512}
)

from langchain.chains.question_answering import load_qa_chain
chain = load_qa_chain(llm, chain_type = 'stuff')

# Define a function to answer questions
def answer_question(query):
    search = db.similarity_search(query)
    return chain.run(input_documents = search, question = query)

# Create a Gradio interface
iface = gr.Interface(fn=answer_question, inputs="text", outputs="text")

# Launch the interface
iface.launch()


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
··········
Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://47aa9f5b70023104f4.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
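
Since the interface is reachable through a public URL, `answer_question` may receive empty input, and calls to the hosted model can fail. A minimal sketch of a wrapper that handles both cases before the function is passed to `gr.Interface`; the wrapper name `make_safe_answerer`, the messages, and the stand-in `fake_answer` function are hypothetical:

```python
def make_safe_answerer(answer_fn):
    """Wrap answer_fn so that blank input and runtime errors yield a readable message."""

    def safe_answer(query):
        query = (query or "").strip()
        if not query:
            return "Please enter a question."
        try:
            return answer_fn(query)
        except Exception as exc:  # for example, a network or API error
            return f"Sorry, the question could not be answered ({type(exc).__name__})."

    return safe_answer


def fake_answer(query):
    """Hypothetical stand-in for answer_question so the sketch runs offline."""
    if "fail" in query:
        raise RuntimeError("simulated API failure")
    return f"Answer to: {query}"


safe = make_safe_answerer(fake_answer)
print(safe("  "))
print(safe("How do I lodge the grievance?"))
print(safe("please fail"))
```

In the notebook, this would be wired up as `gr.Interface(fn=make_safe_answerer(answer_question), inputs="text", outputs="text")`.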


