# FAISS - Vector Search Library Developed by Facebook AI Research

Like ChromaDB, FAISS is a local vector store that runs on your own machine, whereas Pinecone is a cloud-hosted vector database.

In [3]:
# !pip -q install langchain
# !pip install pypdf
# !pip install sentence-transformers==2.2.2

In [5]:
# !pip install openai
# !pip install tiktoken
# !pip install faiss-cpu

#### Load the Documents

In [6]:
from langchain.document_loaders import PyPDFDirectoryLoader
loader = PyPDFDirectoryLoader("pdfs")
data = loader.load()

In [12]:
len(data)

290

#### Perform Chunking

In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [9]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
text_chunks = text_splitter.split_documents(data)

In [11]:
len(text_chunks)

1250
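
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-window splitter. It is a simplification: `chunk_text` is a hypothetical helper, not part of LangChain, and `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, so its real chunk boundaries will differ.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 20) -> List[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

With the values used above (500 and 20), each window starts 480 characters after the previous one, so neighbouring chunks share 20 characters of context.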

In [16]:
print(text_chunks[2].page_content)

much ear wax. 
These eclectic exercises will help you build a flexible and 
robust framework for working through a wide range of 
challenges, from truly grokking current events to handling 
the daily surprises of the business world. 
You‘ll learn how to:
• Calculate distributions to see the range of your beliefs
• Compare hypotheses and draw reliable conclusions • Calculate Bayes’ theorem and understand what it’s 
useful for
• Find the posterior, likelihood, and prior to check the


#### Create the Embedding Model for the Chunks

In [17]:
from langchain.embeddings import HuggingFaceEmbeddings

In [18]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

#### Build the FAISS Vector Store to Hold the Embeddings

In [19]:
from langchain.vectorstores import FAISS

vectorstore = FAISS.from_documents(text_chunks, embeddings)
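
Conceptually, the simplest kind of vector index stores every embedding and compares the query against all of them. The sketch below shows that idea in plain Python with Euclidean (L2) distance. `TinyFlatIndex` is a hypothetical toy class, not the FAISS or LangChain API, and the 2-dimensional vectors in the usage note stand in for real embeddings.

```python
import math
from typing import List, Sequence, Tuple


class TinyFlatIndex:
    """Toy exhaustive-search index (illustration only, not the FAISS API)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[List[float]] = []

    def add(self, vector: Sequence[float]) -> None:
        """Store one vector; its position in the list is its id."""
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.vectors.append(list(vector))

    def search(self, query: Sequence[float], k: int) -> List[Tuple[int, float]]:
        """Return up to k (id, distance) pairs, closest first."""
        if len(query) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(query)}")
        scored = [(i, math.dist(query, v)) for i, v in enumerate(self.vectors)]
        scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
        return scored[:k]
```

For example, after adding `[0.0, 0.0]`, `[1.0, 0.0]` and `[5.0, 5.0]`, calling `search([0.9, 0.1], k=2)` ranks the second vector first. FAISS does the same job with optimized native code and also offers approximate indexes for larger collections.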

In [24]:
query = "What is Bayesian model?"

In [25]:
docs = vectorstore.similarity_search(query, k=3)

In [26]:
for doc in docs:
    print(doc.page_content)

bayesian statistics the fun Way
One particularly nice thing about Bayesian statistics is that, because we 
can view it simply as reasoning about uncertain things, all of the tools and techniques of Bayesian statistics make intuitive sense.
Bayesian statistics is about looking at a problem you face, figuring out
1
bayesian t hinkinG  an D 
eve RyDay Reas Onin G
In this first chapter, I’ll give you an over -
view of Bayesian reasoning , the formal pro -
cess we use to update our beliefs about the 
world once we’ve observed some data. We’ll 
work through a scenario and explore how we can map 
our everyday experience to Bayesian reasoning. 
The good news is that you were already a Bayesian even before you 
picked up this book! Bayesian statistics is closely aligned with how people


In [29]:
import os

from dotenv import load_dotenv

load_dotenv()  # Load environment variables from .env

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
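
`os.getenv` returns `None` when the variable is missing, so a missing key only surfaces later when the LLM is called. A small guard can fail earlier with a clearer message. `require_env` below is a hypothetical helper written for this notebook, not part of `dotenv` or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in your shell"
        )
    return value


# Usage (run after load_dotenv()):
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```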


In [34]:
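from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

llm = OpenAI()  # reads OPENAI_API_KEY from the environment

# chain_type="stuff" puts all retrieved chunks into a single prompt
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=vectorstore.as_retriever())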
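
The `stuff` chain type simply "stuffs" every retrieved chunk into one prompt together with the question. The sketch below shows the idea only: `build_stuff_prompt` is a hypothetical helper and the template wording is made up for illustration, it is not LangChain's actual prompt.

```python
from typing import List


def build_stuff_prompt(question: str, contexts: List[str]) -> str:
    """Join all retrieved chunks into one context block and append the question."""
    context_block = "\n\n".join(contexts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context_block}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Because every chunk ends up in the same prompt, this approach works best when the retrieved chunks are few and short enough to fit in the model's context window.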

In [35]:
query = "What is Bayesian model?"

print(qa.run(query))

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


 Bayesian model is a hypothesis or belief about how the world works that makes predictions based on the observed data. It is a key component of Bayesian statistics, which is a method of reasoning about uncertain things.
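
The `huggingface/tokenizers` warning in the output above is harmless, and the message itself names the fix: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming you place it in the first cell of the notebook before the embedding model is used:

```python
import os

# Set the variable the warning asks for. setdefault keeps any value
# that was already exported in the shell instead of overwriting it.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
```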
