# POC - EPFO Question and Answer System
## Account (UAN) (creation, documents required, claims), KYC (procedure, update)
This is an end-to-end LLM project built on Google PaLM and LangChain. It develops a question-and-answer system for the EPFO (Employees' Provident Fund Organisation), one of the world's largest social security organisations in terms of clientele and the volume of financial transactions undertaken. The system attempts to answer questions about accounts (UAN: creation, documents required, claims), KYC (procedure, update), and related topics using the Google PaLM large language model.

## Project Architecture:
1. **CSV loading:** The CSV loader from the LangChain document loaders loads the question-and-answer CSV file.
2. **Database question embedding:** Questions from the CSV file are embedded using <u>Hugging Face embeddings</u>.
3. **Vector database:** The embedded questions and their corresponding answers are stored using <u>FAISS</u>.
4. **Creating a retrieval chain:** A retrieval chain is built from a <u>prompt template</u> and the <u>Google PaLM API</u>.

## Output:
The output is an answer to the input question. The following happens in the background:
1. The retrieval chain searches the vector database for questions similar to the one asked.
2. The answers stored with the questions found in step 1 are passed to the Google PaLM LLM, which phrases the final answer.
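The retrieval step above can be illustrated with a minimal, self-contained sketch. It is not the project's implementation: the three FAQ entries are hypothetical, and a simple word-count vector stands in for the neural embedding, so only the overall flow (embed the question, rank stored questions by similarity, return their answers) carries over.

```python
import math
import re
from collections import Counter

# Hypothetical mini FAQ; the real project loads its pairs from a CSV file.
FAQ = [
    ("What is KYC", "KYC links identity details to the UAN."),
    ("How do I reset a forgotten password", "Use the Forgot Password option."),
    ("What happens to my UAN when I change my job",
     "Declare your UAN with your subsequent employer."),
]

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a neural model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, k=2):
    """Return the k FAQ pairs whose questions are most similar to the query."""
    q = embed(query)
    ranked = sorted(FAQ, key=lambda qa: cosine(q, embed(qa[0])), reverse=True)
    return ranked[:k]

top = retrieve("I change my job, what about my UAN")
for question, answer in top:
    print(question, "->", answer)
```

In the actual project the ranking is done by FAISS over instructor embeddings, and the retrieved answers are then handed to the LLM rather than printed.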

## Installing the required packages, modules and libraries

In [None]:
# Mounting google drive
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
# Installing all the required modules from requirements.txt
# langchain==0.0.284
# python-dotenv==1.0.0
# streamlit==1.22.0
# tiktoken==0.4.0
# faiss-cpu==1.7.4
# protobuf~=3.19.0

from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

!pip install -r requirements.txt

Saving requirements.txt to requirements.txt
User uploaded file "requirements.txt" with length 107 bytes
Collecting langchain==0.0.284 (from -r requirements.txt (line 1))
  Downloading langchain-0.0.284-py3-none-any.whl (1.7 MB)
Collecting python-dotenv==1.0.0 (from -r requirements.txt (line 2))
  Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Collecting streamlit==1.22.0 (from -r requirements.txt (line 3))
  Downloading streamlit-1.22.0-py2.py3-none-any.whl (8.9 MB)
Collecting tiktoken==0.4.0 (from -r requirements.txt (line 4))
  Downloading tiktoken-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)

In [None]:
# Restart the runtime once all the required installations are done

In [None]:
# Install the packages needed by the instructor embeddings
!pip install InstructorEmbedding
!pip install sentence_transformers

from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceInstructEmbeddings

# Initialize instructor embeddings using the Hugging Face model
instructor_embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")

load INSTRUCTOR_Transformer
max_seq_length  512


In [None]:
from langchain.llms import GooglePalm
api_key = 'key from MakerSuite'  # placeholder: replace with your own API key

llm = GooglePalm(google_api_key=api_key, temperature=0.3)

## CSV Loading

In [None]:
# Load the data from the EPFO FAQs
from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path='/content/drive/MyDrive/Career/Applied AI/Case Studies/EPFO LLM Project/EPFO_FAQs.csv', encoding='unicode_escape', source_column="Question ")

# Store the loaded data in the 'data' variable
data = loader.load()

# Keep only the first 41 rows, which hold the actual questions
data = data[:41]

# Inspect the loaded data
data

[Document(page_content='Question: What is Universal Account Number or UAN\nAnswer: UAN is 12-digit number provided to each member of EPFO. The UAN acts as an umbrella for the multiple Member IDs allotted to an individual. This number acts as a pivot to link multiple Member Identification Numbers (Member Id) allotted to a single member under single Universal Account Number. UAN duly seeded with KYC detail. This enables the member to avail various online services directly without the need for any intermediation by the employer.', metadata={'source': 'What is Universal Account Number or UAN', 'row': 0}),
 Document(page_content='Question: What is KYC\nAnswer: Know Your Customer or KYC is a one-time process which helps in identity verification of subscribers by linking UAN with KYC details. The Employees / Employers need to provide KYC details viz., Aadhaar, PAN, Bank etc., for unique identification of the employees enabling seamless online services.', metadata={'source': 'What is KYC', 'ro
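To make the shape of the loaded documents concrete, here is a small sketch using only the standard library. It mimics the output format shown above (one `Question: ...\nAnswer: ...` text per row, with the question as `source` and the row index in the metadata); it is an illustration under that assumption, not LangChain's own code. The two sample rows and the helper `rows_to_documents` are hypothetical.

```python
import csv
import io

# Hypothetical two-row sample; the real data lives in EPFO_FAQs.csv.
SAMPLE_CSV = (
    "Question,Answer\n"
    "What is KYC,KYC is a one-time identity verification process.\n"
    "What is UAN,UAN is a 12-digit number.\n"
)

def rows_to_documents(csv_text, source_column):
    """Turn each CSV row into a dict with page_content and metadata."""
    docs = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader):
        # One "column: value" line per column, joined by newlines
        content = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
        docs.append({
            "page_content": content,
            "metadata": {"source": row[source_column], "row": i},
        })
    return docs

docs = rows_to_documents(SAMPLE_CSV, source_column="Question")
print(docs[0]["page_content"])
```

Because each document's text contains both the question and the answer, a match on the question brings the answer along with it at retrieval time.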

## Creating a vector database and question embedding

In [None]:
# Create a FAISS instance for vector database from 'data'
vectordb = FAISS.from_documents(documents=data,embedding=instructor_embeddings)

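# Create a retriever for querying the vector database.
# Note: the LangChain docs pass the threshold as
# as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.7});
# a bare score_threshold keyword, as used here, may not be applied.
retriever = vectordb.as_retriever(score_threshold = 0.7)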

In [None]:
# Sample question and the similar questions retrieved from the vector database
rdocs = retriever.get_relevant_documents("What should I do if I change my job")
rdocs

[Document(page_content='Question: What is to be done in case I change the job and join somewhere else\nAnswer: You need to simply declare your UAN with your subsequent employer.', metadata={'source': 'What is to be done in case I change the job and join somewhere else', 'row': 33}),
 Document(page_content='Question: I have changed my job. Should I activate my UAN again\nAnswer: UAN has to be activated only once. You do not have to re-activate it every time you switch jobs.', metadata={'source': 'I have changed my job. Should I activate my UAN again', 'row': 16}),
 Document(page_content='Question: Two UAN allotted to me what should I do\nAnswer: In case two UAN are allotted to you, this could be because of not filling of Date of Exit by your previous employer in ECR filing and / or you have applied for transfer of service in your current establishment. In such a case, it is suggested to immediately report the matter to your employer and through email to EPFO (uanepf@epfindia.gov.in) by 

In [None]:
rdocs = retriever.get_relevant_documents("What is the procedure to change the password and can i link two mobile phones to a single account")
rdocs

[Document(page_content='Question: What to do if I forgot my password and my registered mobile with UAN has also changed\nAnswer: Please click on \x93Forgot Password\x94 at Member Interface of Unified Portal. Provide your UAN with CAPTCHA. System will ask whether OTP is to be sent on registered mobile or some other mobile. System will ask to enter your basic details (Name, DOB and Gender). After successful matching of basic details system will ask to provide your Aadhar or PAN. If KYC details are matched system will ask new mobile number and OTP will be sent to the new mobile. After successful verification of OTP, you can reset your password.', metadata={'source': 'What to do if I forgot my password and my registered mobile with UAN has also changed', 'row': 32}),
 Document(page_content='Question: How to change my UAN linked mobile number\nAnswer: After login into the Member Interface of Unified Portal, there is a provision in \x93Member Profile\x94 section to change your mobile number.

In [None]:
rdocs = retriever.get_relevant_documents("How to link an AADHAR with UAN")
rdocs

[Document(page_content='Question: What can I do if my UAN is not seeded with Aadhaar\nAnswer: Member can himself seed UAN with Aadhaar by visiting member portal. Thereafter the employer must approve the same to complete the linkage. Alternatively, member can ask his employer to link Aadhaar with UAN. The member can use \x93e-KYC Portal\x94 under Online Service available on home page of EPFO website or e-KYC service under EPFO in UMANG APP to link his/her UAN with Aadhaar without employer\x92s intervention.', metadata={'source': 'What can I do if my UAN is not seeded with Aadhaar', 'row': 10}),
 Document(page_content='Question: How can I seed my KYC details with UAN\nAnswer: o Login to your EPF account at the unified member portal o Click on the \x93KYC\x94 option in the \x93Manage\x94 section o You can select the details (PAN, Bank Account, Aadhar etc) which you want to link with UAN o Fill in the requisite fields o Now click on the \x93Save\x94 option o Your request will be displayed 

## Create RetrievalQA chain along with prompt template

In [None]:
from langchain.prompts import PromptTemplate

prompt_template = """Given the following context and a question, generate an answer based on this context only.
In the answer try to provide as much text as possible from "Answer" section in the source document context without making much changes.
If the answer is not found in the context, kindly state "I don't know." Don't try to make up an answer.

CONTEXT: {context}

QUESTION: {question}"""


PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
chain_type_kwargs = {"prompt": PROMPT}


# RetrievalQA was already imported above
chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    input_key="query",
    return_source_documents=True,
    chain_type_kwargs=chain_type_kwargs,
)
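With `chain_type="stuff"`, the retrieved documents are concatenated and placed into the `{context}` slot of the prompt before it is sent to the LLM. The sketch below shows that assembly step in plain Python. The helper `build_stuff_prompt` is hypothetical, the template is a shortened version of the one above, and the blank-line separator is an assumption about how the documents are joined.

```python
# Shortened version of the notebook's prompt template
PROMPT_TEMPLATE = """Given the following context and a question, generate an answer based on this context only.

CONTEXT: {context}

QUESTION: {question}"""

def build_stuff_prompt(retrieved_texts, question, separator="\n\n"):
    """Join all retrieved texts into one context and fill the template."""
    context = separator.join(retrieved_texts)
    return PROMPT_TEMPLATE.format(context=context, question=question)

# Texts in the same shape as the retriever output shown earlier
retrieved = [
    "Question: What is to be done in case I change the job and join somewhere else\n"
    "Answer: You need to simply declare your UAN with your subsequent employer.",
    "Question: I have changed my job. Should I activate my UAN again\n"
    "Answer: UAN has to be activated only once.",
]

final_prompt = build_stuff_prompt(retrieved, "What should I do if I change my job")
print(final_prompt)
```

Since every retrieved document goes into a single prompt, this approach suits short FAQ entries like these; long documents could exceed the model's input limit.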

In [None]:
chain("What should I do if I change my job")

{'query': 'What should I do if I change my job',
 'result': 'You need to simply declare your UAN with your subsequent employer.',
 'source_documents': [Document(page_content='Question: What is to be done in case I change the job and join somewhere else\nAnswer: You need to simply declare your UAN with your subsequent employer.', metadata={'source': 'What is to be done in case I change the job and join somewhere else', 'row': 33}),
  Document(page_content='Question: I have changed my job. Should I activate my UAN again\nAnswer: UAN has to be activated only once. You do not have to re-activate it every time you switch jobs.', metadata={'source': 'I have changed my job. Should I activate my UAN again', 'row': 16}),
  Document(page_content='Question: Two UAN allotted to me what should I do\nAnswer: In case two UAN are allotted to you, this could be because of not filling of Date of Exit by your previous employer in ECR filing and / or you have applied for transfer of service in your curren

In [None]:
chain("What is the procedure to change the password")

{'query': 'What is the procedure to change the password',
 'result': '\nAnswer: Please click on “Change Password” at Member Interface of Unified Portal. Provide your UAN with CAPTCHA. System will send the OTP on your mobile which is seeded with UAN and you can reset the password.',
 'source_documents': [Document(page_content='Question: In which format I should create my UAN password\nAnswer: The password should be alphanumeric, have minimum 1 Special Character and 8 - 25 character long. Special characters are: ! @ # $ % ^ & * ( ) Sample Password : abc@1973', metadata={'source': 'In which format I should create my UAN password', 'row': 9}),
  Document(page_content='Question: What to do if I forgot my password\nAnswer: Please click on \x93Forgot Password\x94 at Member Interface of Unified Portal. Provide your UAN with CAPTCHA. System will send the OTP on your mobile which is seeded with UAN and you can reset the password.', metadata={'source': 'What to do if I forgot my password', 'row':

In [None]:
chain("How to link an AADHAR with UAN")

{'query': 'How to link an AADHAR with UAN',
 'result': 'Member can himself seed UAN with Aadhaar by visiting member portal. Thereafter the employer must approve the same to complete the linkage. Alternatively, member can ask his employer to link Aadhaar with UAN. The member can use “e-KYC Portal” under Online Service available on home page of EPFO website or e-KYC service under EPFO in UMANG APP to link his/her UAN with Aadhaar without employer’s intervention.',
 'source_documents': [Document(page_content='Question: What can I do if my UAN is not seeded with Aadhaar\nAnswer: Member can himself seed UAN with Aadhaar by visiting member portal. Thereafter the employer must approve the same to complete the linkage. Alternatively, member can ask his employer to link Aadhaar with UAN. The member can use \x93e-KYC Portal\x94 under Online Service available on home page of EPFO website or e-KYC service under EPFO in UMANG APP to link his/her UAN with Aadhaar without employer\x92s intervention.'

In [None]:
chain("What if I purchase a mobile phone, do I need to create an account")

{'query': 'What if I purchase a mobile phone, do I need to create an account',
 'result': "\nAnswer: I don't know.",
 'source_documents': [Document(page_content='Question: Can I apply online claim if my mobile is not linked with Aadhaar\nAnswer: No, you cannot submit online claim if your mobile is not linked with Aadhaar. At the time of claim submission, OTP is sent to Aadhaar linked mobile only.', metadata={'source': 'Can I apply online claim if my mobile is not linked with Aadhaar', 'row': 29}),
  Document(page_content='Question: What are the minimum details which are required to be linked with UAN for availing online services\nAnswer: Mobile, Aadhar and Bank account number.', metadata={'source': 'What are the minimum details which are required to be linked with UAN for availing online services', 'row': 21}),
  Document(page_content='Question: Can one mobile number be linked with multiple UANs\nAnswer: One mobile number can be used for registration with one UAN only.', metadata={'sou

## Observations:
1. For each question asked, similar questions were found in the vector database.
2. Several similar questions were found for each question asked, and the LLM combined their answers into a single concise response.
3. For a question unrelated to the underlying question-and-answer document, the retrieval chain answers "I don't know."


## Saving the vector database for app development
The prepared vector database will be used as an input for app development. If the questions and answers change, or the embedding procedure is improved, the vector database must be rebuilt and saved again.

In [None]:
import pickle

In [None]:
# Save the vector database
pickle_path = '/content/drive/MyDrive/Career/Applied AI Course/Case Studies/EPFO LLM Project/'
with open(pickle_path + 'vectordb.pickle', 'wb') as pickle_out:
    pickle.dump(vectordb, pickle_out)

In [None]:
# Save the retriever
pickle_path = '/content/drive/MyDrive/Career/Applied AI Course/Case Studies/EPFO LLM Project/'
with open(pickle_path + 'retriever.pickle', 'wb') as pickle_out:
    pickle.dump(retriever, pickle_out)
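On the app side, the saved files would be read back with `pickle.load`. The sketch below shows the round trip with a plain dictionary standing in for the vector database, so it runs without LangChain or Google Drive. The helpers `save_pickle` and `load_pickle` and the temporary path are hypothetical.

```python
import os
import pickle
import tempfile

def save_pickle(obj, path):
    """Write obj to path in binary pickle format."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_pickle(path):
    """Read an object back from a pickle file.

    Only unpickle files you created yourself: loading a pickle can run code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)

# Stand-in object; in the app this would be the saved vector database.
stand_in = {"name": "vectordb", "rows": 41}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectordb.pickle")
    save_pickle(stand_in, path)
    restored = load_pickle(path)

print(restored)
```

The environment that loads the pickle needs the same library versions that created it, since pickle stores references to classes rather than the classes themselves.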