# Exercise 3: DOCUMENT RETRIEVAL IMPLEMENTATION USING FAISS

# **Description:**
### The objective of this assignment is to design and implement a document retrieval system that uses the FAISS (Facebook AI Similarity Search) library to retrieve relevant chunks or passages from a given PDF document in response to user queries.

# **Installing all the dependencies**

In [1]:
!pip install langchain

Collecting langchain
  Downloading langchain-0.1.11-py3-none-any.whl (807 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.6.4-py3-none-any.whl (28 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langchain-community<0.1,>=0.0.25 (from langchain)
  Downloading langchain_community-0.0.27-py3-none-any.whl (1.8 MB)
Collecting langchain-core<0.2,>=0.1.29 (from langchain)
  Downloa

In [2]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-4.1.0-py3-none-any.whl (286 kB)
Installing collected packages: pypdf
Successfully installed pypdf-4.1.0


In [3]:
!pip install sentence-transformers

Collecting sentence-transformers
  Downloading sentence_transformers-2.5.1-py3-none-any.whl (156 kB)
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-2.5.1


In [4]:
!pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.0 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.8.0


# **Importing all the necessary packages**

In [5]:
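# Third-party integrations live in langchain_community as of LangChain 0.1
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS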
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# **Code Implementation**

### **Loading PDF file**

In [26]:
# Loading the PDF 'sample' by using PyPDFLoader
loader = PyPDFLoader("sample.pdf")
documents = loader.load()

### **Splitting the PDF**

In [27]:
# Splitting the PDF into chunks of up to 1000 characters using CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=30, separator="\n")
docs = text_splitter.split_documents(documents=documents)
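
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification, not the `CharacterTextSplitter` implementation: the real splitter first splits on the separator and then merges pieces up to the size limit, while this hypothetical `split_with_overlap` helper slides a window over raw characters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring windows share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Toy input (hypothetical, not taken from sample.pdf)
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

The overlap means that a sentence cut at a chunk boundary still appears partly in the next chunk, which can help retrieval when the relevant passage straddles two chunks.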

### **Embedding and Indexing the document**

In [28]:
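# Loading the embedding model (the chunks are embedded in the next step)
embeddings = HuggingFaceEmbeddings()

# Embedding the chunks and building a FAISS index from the resulting vectors
vectorstore = FAISS.from_documents(docs, embeddings)

# Saving the index to a local folder so it can be reused without re-embedding
vectorstore.save_local("faiss_index_constitution")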
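
Conceptually, this step turns each chunk into a vector and stores the vectors together with a mapping back to the source text. The sketch below illustrates that idea with a toy bag-of-words "embedding"; `VOCAB`, `toy_embed`, and `build_index` are hypothetical names introduced for illustration only and stand in for the sentence-transformers model and the FAISS index, which work on dense learned vectors instead.

```python
from collections import Counter

# Hypothetical four-word vocabulary; a real embedding model has no such list
VOCAB = ["game", "player", "ai", "character"]


def toy_embed(text: str) -> list[float]:
    """Map text to a vector of word counts over VOCAB (a stand-in for a real model)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


def build_index(chunks: list[str]) -> tuple[list[list[float]], dict[int, str]]:
    """Return one vector per chunk plus a position -> chunk lookup table."""
    vectors = [toy_embed(chunk) for chunk in chunks]
    id_to_chunk = dict(enumerate(chunks))
    return vectors, id_to_chunk


vectors, id_to_chunk = build_index(["player game", "ai character"])
print(vectors)
print(id_to_chunk)
```

The lookup table is what lets a search return the original text: the index itself only knows about vectors and their positions.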

### **Perform Similarity Search**

In [23]:
# Defining query string
query = "Player avatar and dialog with non-player characters based on generative AI"

# Performing similarity search using FAISS index
docs = vectorstore.similarity_search(query)
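
A similarity search embeds the query and returns the stored vectors closest to it. The following sketch shows the idea as an exhaustive nearest-neighbour scan with Euclidean (L2) distance, assuming toy three-dimensional vectors; the labels, the vectors, and the `nearest` helper are all hypothetical. FAISS provides optimized index structures for the same task, so this is an illustration of the principle and not of its internals.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(index: dict[str, list[float]], query_vec: list[float], k: int = 2) -> list[str]:
    """Return the labels of the k stored vectors closest to query_vec."""
    scored = sorted((l2_distance(vec, query_vec), label) for label, vec in index.items())
    return [label for _, label in scored[:k]]


# Toy vectors standing in for chunk embeddings (hypothetical values)
toy_index = {
    "chunk about NPC dialog": [0.9, 0.1, 0.0],
    "chunk about game difficulty": [0.1, 0.9, 0.0],
    "chunk about references": [0.0, 0.1, 0.9],
}
toy_query = [0.8, 0.2, 0.1]
top = nearest(toy_index, toy_query, k=2)
print(top)
```

With L2 distance a smaller value means a closer match, so the results are sorted in ascending order of distance.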

### **Printing the result**

In [25]:
print(docs[0].page_content)

players  can exert greater control over game scripts and characters,  customize  the game’s appearance,  understand  the emotional  states 
of characters,  analyze  the game more eﬃciently,  and optimize  both game diﬃculty  and the overall player experience.  Additionally,  
developers  can leverage  generative  AI to simulate  player behavior  and add new content,  resulting  in an improved  gaming  experience.  
According  to Ramirez  Gomez and Lankes (2021) [37] , the use of generative  AI in player avatars  provides  game developers  with 
exciting  opportunities  to create more immersive  game worlds and oﬀer players  a more thrilling  gaming  experience.  
Dobre et al. (2022) [38] highlighted  in their study that generative  AI has the potential  to enhance  the development  of more realistic  
non-player  characters  (NPCs) for video games. Through  the implementation  of machine  learning  techniques,  game developers  can
