### FAISS

Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search sets of vectors of any size, including sets that may not fit in RAM, as well as supporting code for evaluation and parameter tuning.
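A flat Faiss index such as `IndexFlatL2` performs exact search: it compares the query with every stored vector and keeps the closest ones. The pure-Python sketch below only illustrates that idea; it does not call Faiss, and `flat_search` and the toy vectors are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(
    vectors: List[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[float, int]]:
    """Brute-force search: score every stored vector, keep the k closest.

    Returns (distance, position) pairs ordered from nearest to farthest.
    """
    scored = ((squared_l2(v, query), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)


stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
result = flat_search(stored, [0.9, 0.1], k=2)
print([pos for _, pos in result])  # [1, 0]
```

Exact search costs one distance computation per stored vector, which is fine for a small file like the one below; Faiss's other index types trade some accuracy for speed on much larger collections.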

In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
# Deprecated in recent LangChain releases in favour of langchain_ollama.OllamaEmbeddings
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

In [2]:
loader = TextLoader("data/about.txt")
documents = loader.load()
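# CharacterTextSplitter cuts only at its separator (default "\n\n"), so any
# paragraph longer than chunk_size is kept whole and triggers a warning.
text_splitter = CharacterTextSplitter(chunk_size=50, chunk_overlap=10)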

In [3]:
docs = text_splitter.split_documents(documents)
docs

Created a chunk of size 762, which is longer than the specified 50
Created a chunk of size 825, which is longer than the specified 50
Created a chunk of size 361, which is longer than the specified 50
Created a chunk of size 639, which is longer than the specified 50
Created a chunk of size 129, which is longer than the specified 50


[Document(metadata={'source': 'data/about.txt'}, page_content="Cricket is a bat-and-ball game that is played between two teams of eleven players on a field, at the centre of which is a 22-yard (20-metre; 66-foot) pitch with a wicket at each end, each comprising two bails (small sticks) balanced on three stumps. Two players from the batting team, the striker and nonstriker, stand in front of either wicket holding bats, while one player from the fielding team, the bowler, bowls the ball toward the striker's wicket from the opposite end of the pitch. The striker's goal is to hit the bowled ball with the bat and then switch places with the nonstriker, with the batting team scoring one run for each of these swaps. Runs are also scored when the ball reaches the boundary of the field or when the ball is bowled illegally."),
 Document(metadata={'source': 'data/about.txt'}, page_content='The fielding team aims to prevent runs by dismissing batters (so they are "out"). Dismissal can occur in var
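The warnings appear because `CharacterTextSplitter` only cuts at its separator (by default `"\n\n"`): a paragraph longer than `chunk_size` cannot be split further, so the chunks above are 129 to 825 characters long instead of 50. The sketch below is a simplified, hypothetical model of that behaviour (it ignores `chunk_overlap`) and is not LangChain's implementation.

```python
from typing import List


def split_then_merge(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Simplified model of separator-based splitting (no overlap handling).

    The text is cut only at `separator`; neighbouring pieces are joined while
    the result still fits in `chunk_size`. A piece that is already longer than
    `chunk_size` cannot be cut further, so it becomes an oversized chunk.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep merging
            continue
        if current:
            chunks.append(current)  # flush what we had so far
        if len(piece) > chunk_size:
            print(f"Oversized chunk of size {len(piece)} (limit {chunk_size})")
        current = piece
    if current:
        chunks.append(current)
    return chunks


text = "short one\n\nshort two\n\n" + "x" * 80
chunks = split_then_merge(text, chunk_size=25)
print([len(c) for c in chunks])  # [20, 80]
```

If chunks close to 50 characters are really wanted, a splitter that falls back to smaller separators, such as `RecursiveCharacterTextSplitter` from `langchain_text_splitters`, is the usual choice.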

In [4]:
embeddings = OllamaEmbeddings(model="gemma:2b")
# Note: this indexes the unsplit `documents`; pass `docs` to index the chunks created above.
db = FAISS.from_documents(documents, embeddings)
db


<langchain_community.vectorstores.faiss.FAISS at 0x22eff72a0f0>
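`FAISS.from_documents` embeds every document and adds the vectors to a Faiss index; the documents themselves live in a separate docstore, and the wrapper's `index_to_docstore_id` mapping links each index position to a docstore id (the UUIDs seen in the outputs below). The class here is a hypothetical, pure-Python model of that layout, with a made-up `toy_embed` function standing in for Ollama; it is not the real `FAISS` class.

```python
import uuid
from typing import Callable, Dict, List, Sequence


def toy_embed(text: str) -> List[float]:
    """Hypothetical 3-d 'embedding': counts of 'a', 'b' and 'c'."""
    return [float(text.count(ch)) for ch in "abc"]


class ToyVectorStore:
    """Loosely mirrors the FAISS wrapper's layout: vectors in an index
    (here a list), texts in a docstore keyed by id, and
    index_to_docstore_id linking index positions to docstore ids."""

    def __init__(self, embed: Callable[[str], List[float]]):
        self.embed = embed
        self.index: List[List[float]] = []
        self.docstore: Dict[str, str] = {}
        self.index_to_docstore_id: Dict[int, str] = {}

    @classmethod
    def from_texts(
        cls, texts: Sequence[str], embed: Callable[[str], List[float]]
    ) -> "ToyVectorStore":
        store = cls(embed)
        for text in texts:
            doc_id = str(uuid.uuid4())
            store.index_to_docstore_id[len(store.index)] = doc_id
            store.index.append(embed(text))
            store.docstore[doc_id] = text
        return store

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        q = self.embed(query)
        # Rank index positions by squared L2 distance to the query vector.
        ranked = sorted(
            range(len(self.index)),
            key=lambda i: sum((x - y) ** 2 for x, y in zip(self.index[i], q)),
        )
        return [self.docstore[self.index_to_docstore_id[i]] for i in ranked[:k]]


store = ToyVectorStore.from_texts(["aaa", "bbb", "abc"], toy_embed)
print(store.similarity_search("aaab", k=2))  # ['aaa', 'abc']
```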

In [5]:
query = "Who played cricket more? Men or Women?"
docs = db.similarity_search(query)
docs[0].page_content

'Cricket is a bat-and-ball game that is played between two teams of eleven players on a field, at the centre of which is a 22-yard (20-metre; 66-foot) pitch with a wicket at each end, each comprising two bails (small sticks) balanced on three stumps. Two players from the batting team, the striker and nonstriker, stand in front of either wicket holding bats, while one player from the fielding team, the bowler, bowls the ball toward the striker\'s wicket from the opposite end of the pitch. The striker\'s goal is to hit the bowled ball with the bat and then switch places with the nonstriker, with the batting team scoring one run for each of these swaps. Runs are also scored when the ball reaches the boundary of the field or when the ball is bowled illegally.\n\nThe fielding team aims to prevent runs by dismissing batters (so they are "out"). Dismissal can occur in various ways, including being bowled (when the ball hits the striker\'s wicket and dislodges the bails), and by the fielding s

### Vector store as a retriever

We can also convert the vector store into a retriever. `as_retriever()` wraps it in a `VectorStoreRetriever`, which can then be used with the many LangChain components that work with retrievers rather than with vector stores directly.
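Conceptually, a retriever is a thin adapter: it stores search settings and forwards `invoke(query)` to the store's search method. In LangChain the settings are passed as, for example, `db.as_retriever(search_kwargs={"k": 2})`. The classes below (`ToyRetriever`, `KeywordStore`) are hypothetical and only sketch the pattern.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol


class SearchesText(Protocol):
    def similarity_search(self, query: str, k: int = 4) -> List[str]: ...


@dataclass
class ToyRetriever:
    """Hypothetical adapter: keeps search settings, forwards invoke() to the store."""

    store: SearchesText
    search_kwargs: Dict[str, Any] = field(default_factory=dict)

    def invoke(self, query: str) -> List[str]:
        return self.store.similarity_search(query, **self.search_kwargs)


class KeywordStore:
    """Hypothetical store ranking texts by how many query words they contain."""

    def __init__(self, texts: List[str]):
        self.texts = texts

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        words = set(query.lower().split())
        ranked = sorted(self.texts, key=lambda t: -len(words & set(t.lower().split())))
        return ranked[:k]


store = KeywordStore(["cricket bat ball", "football goal", "cricket pitch wicket"])
retriever = ToyRetriever(store, search_kwargs={"k": 1})
print(retriever.invoke("cricket wicket"))  # ['cricket pitch wicket']
```

Keeping the settings on the retriever means downstream code only needs to call `invoke`, whatever store sits behind it.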

In [6]:
retriever = db.as_retriever()
docs = retriever.invoke(query)
docs[0].page_content

'Cricket is a bat-and-ball game that is played between two teams of eleven players on a field, at the centre of which is a 22-yard (20-metre; 66-foot) pitch with a wicket at each end, each comprising two bails (small sticks) balanced on three stumps. Two players from the batting team, the striker and nonstriker, stand in front of either wicket holding bats, while one player from the fielding team, the bowler, bowls the ball toward the striker\'s wicket from the opposite end of the pitch. The striker\'s goal is to hit the bowled ball with the bat and then switch places with the nonstriker, with the batting team scoring one run for each of these swaps. Runs are also scored when the ball reaches the boundary of the field or when the ball is bowled illegally.\n\nThe fielding team aims to prevent runs by dismissing batters (so they are "out"). Dismissal can occur in various ways, including being bowled (when the ball hits the striker\'s wicket and dislodges the bails), and by the fielding s

### Similarity search with score

FAISS also has some FAISS-specific methods. One of them is `similarity_search_with_score`, which returns not only the documents but also the distance between the query and each of them. By default this is an L2 (Euclidean) distance; the flat L2 index used here actually reports the squared distance. Either way, a lower score means a closer match.
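Because the square root is monotonic, ranking by squared L2 distance gives the same order as ranking by L2 distance; only the magnitude of the scores differs, which matters if you compare them with a fixed threshold. The sketch below uses made-up vectors and plain Python, not Faiss.

```python
import math
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_with_scores(
    vectors: List[Sequence[float]], query: Sequence[float]
) -> List[Tuple[int, float]]:
    # Lower distance means more similar, so sort in ascending order.
    scores = [(i, squared_l2(v, query)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1])


vectors = [[3.0, 4.0], [1.0, 1.0], [0.0, 0.0]]
query = [0.0, 0.0]
ranked = rank_with_scores(vectors, query)
for position, dist_sq in ranked:
    # Squared distance and plain L2 distance side by side: same order.
    print(position, dist_sq, math.sqrt(dist_sq))
```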

In [7]:
docs_and_score = db.similarity_search_with_score(query)
docs_and_score

[(Document(id='0375ab1e-c876-459b-ae37-1e96e604c7c2', metadata={'source': 'data/about.txt'}, page_content='Cricket is a bat-and-ball game that is played between two teams of eleven players on a field, at the centre of which is a 22-yard (20-metre; 66-foot) pitch with a wicket at each end, each comprising two bails (small sticks) balanced on three stumps. Two players from the batting team, the striker and nonstriker, stand in front of either wicket holding bats, while one player from the fielding team, the bowler, bowls the ball toward the striker\'s wicket from the opposite end of the pitch. The striker\'s goal is to hit the bowled ball with the bat and then switch places with the nonstriker, with the batting team scoring one run for each of these swaps. Runs are also scored when the ball reaches the boundary of the field or when the ball is bowled illegally.\n\nThe fielding team aims to prevent runs by dismissing batters (so they are "out"). Dismissal can occur in various ways, includ

### Similarity search by vector

In [8]:
embedding_vector = embeddings.embed_query(query)
embedding_vector

[0.23790918290615082,
 -1.5517252683639526,
 -0.07741346210241318,
 0.36311104893684387,
 0.8644593358039856,
 3.2688074111938477,
 1.303024411201477,
 0.16313911974430084,
 -1.2841843366622925,
 -0.6574084758758545,
 1.3112001419067383,
 0.9868209958076477,
 -0.6499816179275513,
 1.358393907546997,
 1.9309133291244507,
 -0.7299582958221436,
 0.3698870539665222,
 0.09775909036397934,
 0.04736475273966789,
 0.5660082697868347,
 0.7539756298065186,
 0.42710864543914795,
 -0.4066932201385498,
 -0.19984926283359528,
 -1.735933542251587,
 0.9603045582771301,
 0.002421056618914008,
 -0.42364779114723206,
 -0.13951128721237183,
 -2.3117032051086426,
 -0.5373821258544922,
 -0.28293463587760925,
 0.45096126198768616,
 -0.25114428997039795,
 0.5255200862884521,
 1.0571997165679932,
 0.9407007694244385,
 0.8557592630386353,
 0.14633306860923767,
 -1.1089304685592651,
 1.1210342645645142,
 0.9498705863952637,
 0.24565719068050385,
 -1.1346949338912964,
 -0.46458831429481506,
 -0.3274490535259247,


In [9]:
docs = db.similarity_search_by_vector(embedding_vector)
docs

[Document(id='0375ab1e-c876-459b-ae37-1e96e604c7c2', metadata={'source': 'data/about.txt'}, page_content='Cricket is a bat-and-ball game that is played between two teams of eleven players on a field, at the centre of which is a 22-yard (20-metre; 66-foot) pitch with a wicket at each end, each comprising two bails (small sticks) balanced on three stumps. Two players from the batting team, the striker and nonstriker, stand in front of either wicket holding bats, while one player from the fielding team, the bowler, bowls the ball toward the striker\'s wicket from the opposite end of the pitch. The striker\'s goal is to hit the bowled ball with the bat and then switch places with the nonstriker, with the batting team scoring one run for each of these swaps. Runs are also scored when the ball reaches the boundary of the field or when the ball is bowled illegally.\n\nThe fielding team aims to prevent runs by dismissing batters (so they are "out"). Dismissal can occur in various ways, includi
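`similarity_search_by_vector` skips the embedding step, so a query vector computed once with `embed_query` can be reused for several searches (for example with different `k`) without calling the model again. The sketch below is hypothetical: `CountingEmbedder` is a stand-in that counts model calls, and `search_by_vector` is plain Python rather than Faiss.

```python
from typing import List


class CountingEmbedder:
    """Hypothetical embedder that counts how often the (expensive) model runs."""

    def __init__(self) -> None:
        self.calls = 0

    def embed_query(self, text: str) -> List[float]:
        self.calls += 1
        return [float(len(text)), float(text.count(" "))]


def search_by_vector(vectors: List[List[float]], query_vec: List[float], k: int = 4) -> List[int]:
    """Rank stored vectors by squared L2 distance to a precomputed query vector."""
    order = sorted(
        range(len(vectors)),
        key=lambda i: sum((x - y) ** 2 for x, y in zip(vectors[i], query_vec)),
    )
    return order[:k]


embedder = CountingEmbedder()
stored = [[10.0, 1.0], [40.0, 6.0], [38.0, 7.0]]
query_vec = embedder.embed_query("Who played cricket more? Men or Women?")
top1 = search_by_vector(stored, query_vec, k=1)
top2 = search_by_vector(stored, query_vec, k=2)
print(embedder.calls, top1, top2)  # 1 [2] [2, 1]
```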

### Saving and loading from local

In [10]:
db.save_local("faiss_index1")

In [13]:
# load_local unpickles the saved docstore; only allow this for files you created yourself.
new_db = FAISS.load_local("faiss_index1", embeddings, allow_dangerous_deserialization=True)
docs = new_db.similarity_search(query)
docs

[Document(id='0375ab1e-c876-459b-ae37-1e96e604c7c2', metadata={'source': 'data/about.txt'}, page_content='Cricket is a bat-and-ball game that is played between two teams of eleven players on a field, at the centre of which is a 22-yard (20-metre; 66-foot) pitch with a wicket at each end, each comprising two bails (small sticks) balanced on three stumps. Two players from the batting team, the striker and nonstriker, stand in front of either wicket holding bats, while one player from the fielding team, the bowler, bowls the ball toward the striker\'s wicket from the opposite end of the pitch. The striker\'s goal is to hit the bowled ball with the bat and then switch places with the nonstriker, with the batting team scoring one run for each of these swaps. Runs are also scored when the ball reaches the boundary of the field or when the ball is bowled illegally.\n\nThe fielding team aims to prevent runs by dismissing batters (so they are "out"). Dismissal can occur in various ways, includi
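`save_local` writes the Faiss index together with a pickle file holding the docstore and `index_to_docstore_id`. Unpickling untrusted data can execute arbitrary code, which is why `load_local` refuses to proceed unless `allow_dangerous_deserialization=True` is set. As an illustration of a data-only alternative, here is a hypothetical JSON round-trip of vectors and texts; `save_store`, `load_store` and the file name are made up, and this is not LangChain's on-disk format.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_store(path: str, vectors: List[List[float]], texts: List[str]) -> None:
    """Write vectors and texts as plain JSON (data only, nothing executable)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"vectors": vectors, "texts": texts}, f)


def load_store(path: str) -> Dict[str, list]:
    """Read the JSON back; json.load only builds dicts, lists, strings and numbers."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if len(data["vectors"]) != len(data["texts"]):
        raise ValueError("vectors and texts are out of sync")
    return data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_index.json")
    save_store(path, [[0.1, 0.2], [0.3, 0.4]], ["first", "second"])
    restored = load_store(path)
    print(restored["texts"])  # ['first', 'second']
```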