#### Faiss
Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

In [3]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# Load the text file and split it into chunks of roughly 500 characters
loader = TextLoader("suraj.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=30)
docs = text_splitter.split_documents(documents)


Created a chunk of size 576, which is longer than the specified 500
Created a chunk of size 713, which is longer than the specified 500
Created a chunk of size 873, which is longer than the specified 500
Created a chunk of size 638, which is longer than the specified 500
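
These warnings are expected: CharacterTextSplitter cuts only at its separator (a blank line, "\n\n", by default), so a paragraph longer than chunk_size is kept whole rather than cut mid-text. The sketch below is a simplified pure-Python approximation of that behavior, not LangChain's implementation. The function name split_on_separator and the sample text are hypothetical, and chunk_overlap is ignored for brevity.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces up to chunk_size characters."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            # The piece still fits, so keep growing the current chunk
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single piece may exceed chunk_size; it is kept whole
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Three paragraphs of 30, 80 and 20 characters, with a 50-character limit
text = "a" * 30 + "\n\n" + "b" * 80 + "\n\n" + "c" * 20
chunks = split_on_separator(text, chunk_size=50)
sizes = [len(chunk) for chunk in chunks]
oversized = [size for size in sizes if size > 50]
print(sizes, oversized)
```

The 80-character paragraph has no separator inside it, so it comes out as one oversized chunk, which is the situation the warnings describe.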


In [4]:
docs

[Document(metadata={'source': 'suraj.txt'}, page_content='Suraj is a seasoned AI/ML Engineer with over four years of hands-on experience designing, developing, and deploying intelligent systems across a spectrum of industries, including healthcare, enterprise automation, and 3D environment reconstruction. With a strong foundation in both traditional machine learning and cutting-edge deep learning techniques, he has consistently delivered robust, scalable, and production-ready AI solutions that drive tangible business impact.'),
 Document(metadata={'source': 'suraj.txt'}, page_content='He began his career at Resolute AI Software, where he architected real-time AI services using OpenCV and LSTMs to detect anomalous user behavior in video feeds—drastically reducing manual monitoring workloads. At the same time, he spearheaded initiatives to modernize ML workflows by migrating legacy models to AWS SageMaker, implementing RESTful model endpoints using API Gateway and Lambda, and leveraging 

In [5]:
# Embed every chunk with a local Ollama model and index the vectors in FAISS
embeddings = OllamaEmbeddings(model="llama3.2")
db = FAISS.from_documents(docs, embeddings)
db

<langchain_community.vectorstores.faiss.FAISS at 0x1067e4370>

In [7]:
# Query the index; by default the four closest chunks are returned
query = "List the companies he worked on"
docs = db.similarity_search(query)
docs[0].page_content


'He began his career at Resolute AI Software, where he architected real-time AI services using OpenCV and LSTMs to detect anomalous user behavior in video feeds—drastically reducing manual monitoring workloads. At the same time, he spearheaded initiatives to modernize ML workflows by migrating legacy models to AWS SageMaker, implementing RESTful model endpoints using API Gateway and Lambda, and leveraging ensemble learning for predictive analytics in sales and customer churn domains. His work laid a strong foundation in MLOps, cloud-native deployment, and operational AI.'

#### As a Retriever
We can also convert the vector store into a retriever. This makes it easy to use with other LangChain components, most of which work with retrievers.
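
To make the idea concrete, here is a minimal sketch of the retriever pattern in plain Python. ToyVectorStore and ToyRetriever are hypothetical stand-ins, not LangChain classes, and the word-overlap scoring is only a placeholder for real embeddings. The point is that a retriever is a thin wrapper that fixes the search settings and exposes a single invoke(query) method.

```python
class ToyVectorStore:
    """Hypothetical stand-in for a vector store (not a LangChain class)."""

    def __init__(self, texts):
        self.texts = list(texts)

    def similarity_search(self, query, k=4):
        # Toy scoring: count words shared with the query (no embeddings)
        query_words = set(query.lower().split())

        def overlap(text):
            return len(query_words & set(text.lower().split()))

        ranked = sorted(self.texts, key=overlap, reverse=True)
        return ranked[:k]

    def as_retriever(self, k=4):
        return ToyRetriever(self, k)


class ToyRetriever:
    """Wraps a store and fixes its search settings behind invoke()."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyVectorStore([
    "He began his career at a software company",
    "He holds a master's degree",
    "He enjoys hiking on weekends",
])
retriever = store.as_retriever(k=1)
top = retriever.invoke("where did his career begin")
print(top)
```

With the real FAISS store, the number of results can be set in a similar way, for example db.as_retriever(search_kwargs={"k": 2}).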

In [8]:
retriever=db.as_retriever()
docs=retriever.invoke(query)
docs[0].page_content

'He began his career at Resolute AI Software, where he architected real-time AI services using OpenCV and LSTMs to detect anomalous user behavior in video feeds—drastically reducing manual monitoring workloads. At the same time, he spearheaded initiatives to modernize ML workflows by migrating legacy models to AWS SageMaker, implementing RESTful model endpoints using API Gateway and Lambda, and leveraging ensemble learning for predictive analytics in sales and customer churn domains. His work laid a strong foundation in MLOps, cloud-native deployment, and operational AI.'

#### Similarity Search with score
Some methods are specific to FAISS. One of them is `similarity_search_with_score`, which returns each document together with the distance between it and the query. The score is an L2 distance (Faiss's flat L2 index reports the squared Euclidean distance), so a lower score means a closer match.
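
As a rough illustration of how such a score orders results, the sketch below computes squared L2 distances by hand and sorts ascending. The three-dimensional vectors, their labels, and the helper names squared_l2 and rank_by_distance are made up for the example; real embeddings have far more dimensions.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_by_distance(query_vec, doc_vecs):
    """Return (label, distance) pairs sorted from closest to farthest."""
    scored = [(label, squared_l2(query_vec, vec)) for label, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy vectors standing in for chunk embeddings
doc_vecs = {
    "career": [1.0, 2.0, 0.0],
    "education": [4.0, 0.0, 1.0],
    "hobbies": [-3.0, 5.0, 2.0],
}
query_vec = [1.0, 1.0, 0.0]

ranking = rank_by_distance(query_vec, doc_vecs)
print(ranking)
```

The closest vector comes first with the smallest distance, which matches the ordering of the (document, score) pairs returned by the store.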

In [9]:
docs_and_score=db.similarity_search_with_score(query)
docs_and_score

[(Document(id='dff8ab54-f7cc-487c-a92b-8b27ff030a20', metadata={'source': 'suraj.txt'}, page_content='He began his career at Resolute AI Software, where he architected real-time AI services using OpenCV and LSTMs to detect anomalous user behavior in video feeds—drastically reducing manual monitoring workloads. At the same time, he spearheaded initiatives to modernize ML workflows by migrating legacy models to AWS SageMaker, implementing RESTful model endpoints using API Gateway and Lambda, and leveraging ensemble learning for predictive analytics in sales and customer churn domains. His work laid a strong foundation in MLOps, cloud-native deployment, and operational AI.'),
  np.float32(7678.937)),
 (Document(id='5aa2024a-96eb-44d7-9dee-e44f10222053', metadata={'source': 'suraj.txt'}, page_content='Suraj holds a Master’s degree in Applied Artificial Intelligence from Stevens Institute of Technology. He is AWS Certified and has completed a specialization in Generative AI with LLMs. Passi

In [11]:
# Embed the query text directly to get its raw vector
embedding_vector = embeddings.embed_query(query)
embedding_vector

[-1.3596292734146118,
 -1.5510677099227905,
 1.6135107278823853,
 -1.2145380973815918,
 0.22306373715400696,
 -0.743732750415802,
 0.4384339153766632,
 0.17169876396656036,
 0.46322980523109436,
 0.39545387029647827,
 0.7668346762657166,
 -0.6951468586921692,
 1.7802459001541138,
 0.6512146592140198,
 -0.06905478984117508,
 0.23536764085292816,
 0.11343193054199219,
 -1.4829609394073486,
 0.45334944128990173,
 -0.5792098641395569,
 -1.055586576461792,
 -1.8453829288482666,
 1.7883820533752441,
 0.40555527806282043,
 1.0911856889724731,
 -1.1419998407363892,
 2.2206826210021973,
 -2.230595827102661,
 1.9162901639938354,
 0.3997955322265625,
 0.32253432273864746,
 1.7801663875579834,
 0.5828211307525635,
 0.4423215687274933,
 -1.455322504043579,
 0.7580912113189697,
 0.7411032915115356,
 1.2661795616149902,
 -1.724794864654541,
 -0.512087881565094,
 -1.3300725221633911,
 1.2593400478363037,
 0.2418956309556961,
 0.2187410593032837,
 -1.7577353715896606,
 -0.7099931836128235,
 2.385691881

In [12]:
# Search with a precomputed vector instead of raw query text
docs_score = db.similarity_search_by_vector(embedding_vector)
docs_score

[Document(id='dff8ab54-f7cc-487c-a92b-8b27ff030a20', metadata={'source': 'suraj.txt'}, page_content='He began his career at Resolute AI Software, where he architected real-time AI services using OpenCV and LSTMs to detect anomalous user behavior in video feeds—drastically reducing manual monitoring workloads. At the same time, he spearheaded initiatives to modernize ML workflows by migrating legacy models to AWS SageMaker, implementing RESTful model endpoints using API Gateway and Lambda, and leveraging ensemble learning for predictive analytics in sales and customer churn domains. His work laid a strong foundation in MLOps, cloud-native deployment, and operational AI.'),
 Document(id='5aa2024a-96eb-44d7-9dee-e44f10222053', metadata={'source': 'suraj.txt'}, page_content='Suraj holds a Master’s degree in Applied Artificial Intelligence from Stevens Institute of Technology. He is AWS Certified and has completed a specialization in Generative AI with LLMs. Passionate about solving complex

In [13]:
# Save the index to a local folder
db.save_local("faiss_index")

In [14]:
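# Loading unpickles the saved docstore, so enable this flag only for
# index folders you created yourself or otherwise trust
new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
docs = new_db.similarity_search(query)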
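
The allow_dangerous_deserialization flag exists because part of the saved index is a pickle file, and unpickling can call arbitrary functions. The sketch below shows the mechanism with a harmless callable. The Payload class is hypothetical and has nothing to do with FAISS itself; a malicious file could name a far more harmful function in the same way.

```python
import pickle


class Payload:
    """Hypothetical object that tells pickle which callable to run on load."""

    def __reduce__(self):
        # On load, pickle calls len("abc") instead of rebuilding a Payload
        return (len, ("abc",))


blob = pickle.dumps(Payload())

# Loading the bytes runs the callable chosen by whoever wrote the file
result = pickle.loads(blob)
print(result, type(result))
```

The loaded value is the return value of the call, not a Payload instance, which is why pickle files from untrusted sources should not be loaded.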

In [15]:
docs

[Document(id='dff8ab54-f7cc-487c-a92b-8b27ff030a20', metadata={'source': 'suraj.txt'}, page_content='He began his career at Resolute AI Software, where he architected real-time AI services using OpenCV and LSTMs to detect anomalous user behavior in video feeds—drastically reducing manual monitoring workloads. At the same time, he spearheaded initiatives to modernize ML workflows by migrating legacy models to AWS SageMaker, implementing RESTful model endpoints using API Gateway and Lambda, and leveraging ensemble learning for predictive analytics in sales and customer churn domains. His work laid a strong foundation in MLOps, cloud-native deployment, and operational AI.'),
 Document(id='5aa2024a-96eb-44d7-9dee-e44f10222053', metadata={'source': 'suraj.txt'}, page_content='Suraj holds a Master’s degree in Applied Artificial Intelligence from Stevens Institute of Technology. He is AWS Certified and has completed a specialization in Generative AI with LLMs. Passionate about solving complex