<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Build Fast with AI](https://img.shields.io/badge/BuildFastWithAI-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://www.buildfastwithai.com/genai-course)
[![EduChain GitHub](https://img.shields.io/github/stars/satvik314/educhain?style=for-the-badge&logo=github&color=gold)](https://github.com/satvik314/educhain)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1FT4vu-qmBPZik5kuu0ENH8LDZcA6Gd3g?usp=sharing)
## Master Generative AI in 6 Weeks
**What You'll Learn:**
- Build with Latest LLMs
- Create Custom AI Apps
- Learn from Industry Experts
- Join Innovation Community

Transform your AI ideas into reality through hands-on projects and expert mentorship.

[Start Your Journey](https://www.buildfastwithai.com/genai-course)

*Empowering the Next Generation of AI Innovators*


## 🧠 FAISS: Efficient Similarity Search and Clustering Library

FAISS (Facebook AI Similarity Search) is an open-source library for **efficient similarity search and clustering of dense vectors**, enabling fast retrieval and storage of large-scale vector data. 🚀

🔑 **Key Features**:
- ⚡ High-performance vector search and clustering algorithms.
- 📊 Scalable to billions of vectors, even those too large to fit into memory.
- 💻 Runs on CPU, and many index types also have GPU implementations for extra speed on large workloads (this notebook installs the CPU build, `faiss-cpu`).
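The scalability point is easy to check with some memory arithmetic. The sketch below computes how much RAM one billion uncompressed float32 vectors would need at 3072 dimensions (the size produced by `text-embedding-3-large`, the model used later in this notebook), and compares that with a compressed code of 64 bytes per vector. The 64-byte budget is an assumption chosen for illustration, of the kind product quantization targets; the helper names are hypothetical.

```python
def raw_index_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Memory for uncompressed vectors (float32 = 4 bytes per value)."""
    return num_vectors * dim * bytes_per_value


def compressed_index_bytes(num_vectors: int, code_bytes: int) -> int:
    """Memory if each vector is replaced by a fixed-size compressed code."""
    return num_vectors * code_bytes


n = 1_000_000_000  # one billion vectors
dim = 3072         # output size of text-embedding-3-large

raw = raw_index_bytes(n, dim)
compressed = compressed_index_bytes(n, code_bytes=64)  # assumed 64-byte codes
print(f"raw float32:   {raw / 1e12:.3f} TB")       # 12.288 TB
print(f"64-byte codes: {compressed / 1e9:.0f} GB")  # 64 GB
```

Roughly 12 TB of raw vectors will not fit in a single machine's RAM, while compact codes can, which is why FAISS offers compressed index types alongside exact ones.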


### **Setup and Installation**

In [None]:
!pip install -qU langchain-community faiss-cpu langchain_openai

In [None]:
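from google.colab import userdata
import os

# Load the OpenAI API key from Colab's Secrets panel into an environment variable
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

from langchain_openai import OpenAIEmbeddings

# text-embedding-3-large produces 3072-dimensional vectors
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")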


### **Create Vector Store with FAISS**


In [None]:
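import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

# Embed a sample string once to learn the embedding dimension
dimension = len(embeddings.embed_query("hello world"))
# IndexFlatL2 performs exact (brute-force) search using L2 distance
index = faiss.IndexFlatL2(dimension)

vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),  # holds the Document objects, keyed by id
    index_to_docstore_id={},      # maps FAISS row positions to docstore ids
)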
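The `FAISS` wrapper combines three pieces: the FAISS index, which only knows vectors by their row position; the docstore, which holds documents by id; and `index_to_docstore_id`, which links the two. The toy class below is a plain-Python sketch of that bookkeeping, not LangChain's implementation. The class name `ToyVectorStore` and its methods are hypothetical, documents are reduced to plain strings, and a brute-force scan stands in for `IndexFlatL2`.

```python
from typing import Dict, List, Sequence, Tuple


class ToyVectorStore:
    """Hypothetical sketch: flat L2 'index' + docstore + position-to-id mapping."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[List[float]] = []            # plays the role of the FAISS index
        self.docstore: Dict[str, str] = {}              # doc id -> document text
        self.index_to_docstore_id: Dict[int, str] = {}  # row position -> doc id

    def add(self, doc_id: str, text: str, vector: Sequence[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError("vector has the wrong dimension")
        if doc_id in self.docstore:
            raise ValueError(f"duplicate id: {doc_id}")
        position = len(self.vectors)  # new vectors are appended at the end
        self.vectors.append(list(vector))
        self.docstore[doc_id] = text
        self.index_to_docstore_id[position] = doc_id

    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]:
        """Exact search: score every row by squared L2 distance, keep the k smallest."""
        scored = [
            (pos, sum((x - y) ** 2 for x, y in zip(vec, query)))
            for pos, vec in enumerate(self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1])  # smaller distance = closer match
        # Translate row positions back into documents via the mapping
        return [
            (self.docstore[self.index_to_docstore_id[pos]], dist)
            for pos, dist in scored[:k]
        ]


store = ToyVectorStore(dim=2)
store.add("a", "pancakes for breakfast", [0.0, 1.0])
store.add("b", "stock market falls", [1.0, 0.0])
print(store.search([0.1, 0.9], k=1)[0][0])  # pancakes for breakfast
```

Keeping the index ignorant of documents is what lets FAISS store nothing but compact numeric arrays; everything textual lives in the docstore.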

### **Add Documents to Vector Store**


In [None]:
from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
# Give each document a unique id so it can be retrieved or deleted later
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)

['394b6f0e-0713-49b2-99ce-678cfec60d1a',
 'f0c32d8d-a40e-46ca-b3e6-b4eed9320df8',
 '8346a614-6190-43fc-b6bd-9f9caf22a146',
 '48ed56b8-9f15-4d8a-ae43-21e110f04848',
 'ae1838cd-31ca-4d1e-b197-5b5bdffac3a9',
 'babaee1c-f03b-49b1-9b4e-cfa59df0cd20',
 'c32049f1-252b-4917-9ce0-940ac44d370e',
 '1de2c16d-f022-49b5-8f17-5cd5df420ce4',
 '4370d7b2-4b86-4a08-9d1f-5dbe94300598',
 '82ee146f-894e-4eab-b060-40902e28dca2']

### **Delete Document from Vector Store**


In [None]:
# Delete the last document (document_10) by its id
vector_store.delete(ids=[uuids[-1]])

True
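Removing rows from a flat FAISS index shifts the positions of the rows that follow, so the position-to-id mapping has to be rebuilt to stay aligned with the index. The function below is a plain-Python sketch of that compaction step, assuming the mapping is a `dict` from row position to document id as in `index_to_docstore_id`; the name `compact_mapping` is hypothetical, not a LangChain function.

```python
from typing import Dict, Iterable


def compact_mapping(
    index_to_docstore_id: Dict[int, str], ids_to_delete: Iterable[str]
) -> Dict[int, str]:
    """Drop deleted doc ids and renumber the surviving rows 0..n-1 in order."""
    doomed = set(ids_to_delete)
    missing = doomed - set(index_to_docstore_id.values())
    if missing:
        raise ValueError(f"unknown ids: {sorted(missing)}")
    survivors = [
        doc_id
        for _, doc_id in sorted(index_to_docstore_id.items())  # preserve row order
        if doc_id not in doomed
    ]
    return {position: doc_id for position, doc_id in enumerate(survivors)}


mapping = {0: "doc-a", 1: "doc-b", 2: "doc-c", 3: "doc-d"}
print(compact_mapping(mapping, ["doc-b"]))
# {0: 'doc-a', 1: 'doc-c', 2: 'doc-d'}
```

Without the renumbering, row 1 of the shrunken index would still point at `doc-b`, and every later search hit would resolve to the wrong document.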

### **Direct Query with Similarity Search**


In [None]:
results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter={"source": "tweet"},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]
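LangChain's FAISS wrapper applies the metadata `filter` after the vector search: it first fetches a larger pool of nearest neighbours (the `fetch_k` argument, 20 by default), then keeps only documents whose metadata match and returns the first `k`. The sketch below reproduces that post-filtering idea in plain Python on an already-ranked list; the function name and sample data are hypothetical, and only exact-equality filters are handled.

```python
from typing import Any, Dict, List, Sequence, Tuple

Doc = Tuple[str, Dict[str, Any]]  # (text, metadata)


def post_filter_search(
    ranked: Sequence[Doc], flt: Dict[str, Any], k: int, fetch_k: int
) -> List[Doc]:
    """Keep docs matching every key/value in `flt`, drawn from the first fetch_k hits."""
    candidates = list(ranked[:fetch_k])  # unfiltered nearest neighbours, closest first
    matches = [
        doc for doc in candidates
        if all(doc[1].get(key) == value for key, value in flt.items())
    ]
    return matches[:k]


# Hypothetical ranking for some query, already sorted by distance
ranked = [
    ("news: markets fall", {"source": "news"}),
    ("tweet: new LangChain project", {"source": "tweet"}),
    ("website: iPhone review", {"source": "website"}),
    ("tweet: LangGraph is great", {"source": "tweet"}),
]
print(post_filter_search(ranked, {"source": "tweet"}, k=2, fetch_k=4))  # both tweets
print(post_filter_search(ranked, {"source": "tweet"}, k=2, fetch_k=2))  # only one tweet
```

The second call shows the practical consequence: when matching documents are rare, a small `fetch_k` can return fewer than `k` results, so raising `fetch_k` is the usual fix.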


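### **Similarity Search with Score**

With the `IndexFlatL2` index created above, the score returned alongside each document is a squared L2 distance, so lower values mean closer matches.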

In [None]:
results = vector_store.similarity_search_with_score(
    "Will it be hot tomorrow?", k=1, filter={"source": "news"}
)
for res, score in results:
    # The score is a distance (lower = more similar); ".3f" prints three decimals
    print(f"* [DIST={score:.3f}] {res.page_content} [{res.metadata}]")

* [DIST=0.894] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]
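The score above (0.893661) is a squared L2 distance from `IndexFlatL2`. OpenAI embeddings are normalized to unit length, and for unit vectors the squared distance and cosine similarity are linked by ‖a − b‖² = 2 − 2·cos(a, b). The sketch below checks that identity and converts the score; the helper names are hypothetical, and the conversion is only valid under the unit-length assumption.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cosine_from_squared_l2(dist_sq: float) -> float:
    """Only valid for unit-length vectors: ||a - b||^2 = 2 - 2 * cos(a, b)."""
    return 1.0 - dist_sq / 2.0


# Verify the identity on two unit vectors 45 degrees apart
a = [1.0, 0.0]
b = [math.sqrt(0.5), math.sqrt(0.5)]
assert math.isclose(cosine_from_squared_l2(squared_l2(a, b)), cosine(a, b))

# Convert the distance printed above into a cosine similarity
print(round(cosine_from_squared_l2(0.893661), 4))  # 0.5532
```

Cosine similarity is often easier to read (1 means same direction), but the ranking is identical either way for normalized vectors.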


### **Query by Turning into a Retriever**

In [None]:
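# search_type="mmr" uses maximal marginal relevance; k=1 returns a single document
retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
# Extra keyword arguments such as filter are merged with search_kwargs for this call
retriever.invoke("Stealing from the bank is a crime", filter={"source": "news"})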

[Document(id='48ed56b8-9f15-4d8a-ae43-21e110f04848', metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]
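`search_type="mmr"` selects results by maximal marginal relevance: at each step it picks the candidate that best balances similarity to the query against similarity to results already chosen, which suppresses near-duplicates. In LangChain this runs over a pool of `fetch_k` candidates, with a `lambda_mult` weight that defaults to 0.5. The function below is a plain-Python sketch of the greedy MMR loop using cosine similarity; it borrows the `lambda_mult` name but is an illustration, not LangChain's implementation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedy MMR: indices of k candidates, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best, best_score = remaining[0], -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Penalty: similarity to the most similar already-selected candidate
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:  # strict ">" keeps the earliest candidate on ties
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.1], [1.0, -0.5]]  # docs 0 and 1 are exact duplicates
print(mmr(query, docs, k=2))                   # [0, 2]: the duplicate is skipped
print(mmr(query, docs, k=2, lambda_mult=1.0))  # [0, 1]: pure relevance keeps it
```

Setting `lambda_mult` to 1.0 reduces MMR to plain similarity ranking; lower values favour diversity.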

### **Saving and Loading**

In [None]:
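# Persist the FAISS index and its documents to the ./faiss_index folder
vector_store.save_local("faiss_index")

# Loading unpickles stored data, which can execute arbitrary code:
# only enable this for folders you created yourself or otherwise trust
new_vector_store = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)

# "qux" is an arbitrary placeholder query; it returns the nearest stored documents
docs = new_vector_store.similarity_search("qux")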

In [None]:
docs[0]

Document(id='8346a614-6190-43fc-b6bd-9f9caf22a146', metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!')
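`save_local` writes two files: the FAISS index (`index.faiss`) and a pickle (`index.pkl`) holding the docstore and the `index_to_docstore_id` mapping, and it is that pickle which makes `allow_dangerous_deserialization=True` necessary. To show what a store must persist in order to be reloaded, the hypothetical helpers below round-trip a toy store's vectors, mapping, and documents through JSON, a format that decodes only plain data. This is an illustrative alternative, not something LangChain provides, and it would not scale to a real FAISS index.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple


def save_store(folder: str, vectors: List[List[float]],
               index_to_id: Dict[int, str], docs: Dict[str, str]) -> None:
    """Write a toy store to folder/store.json (hypothetical file format)."""
    os.makedirs(folder, exist_ok=True)
    payload = {
        "vectors": vectors,
        # JSON object keys must be strings, so row positions are stringified
        "index_to_id": {str(pos): doc_id for pos, doc_id in index_to_id.items()},
        "docs": docs,
    }
    with open(os.path.join(folder, "store.json"), "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_store(folder: str) -> Tuple[List[List[float]], Dict[int, str], Dict[str, str]]:
    """Read the toy store back, restoring integer row positions."""
    with open(os.path.join(folder, "store.json"), encoding="utf-8") as f:
        payload = json.load(f)
    index_to_id = {int(pos): doc_id for pos, doc_id in payload["index_to_id"].items()}
    return payload["vectors"], index_to_id, payload["docs"]


with tempfile.TemporaryDirectory() as tmp:
    save_store(tmp, [[0.0, 1.0]], {0: "a"}, {"a": "pancakes for breakfast"})
    vectors, index_to_id, docs = load_store(tmp)
    print(docs[index_to_id[0]])  # pancakes for breakfast
```

All three parts must be saved together: a reloaded index without its mapping would return row numbers that no longer point to any document.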

### **Merging**

In [None]:
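# Build two small stores with the same embedding model so their indexes are compatible
db1 = FAISS.from_texts(["foo"], embeddings)
db2 = FAISS.from_texts(["bar"], embeddings)

# docstore._dict is a private attribute, inspected here only for demonstration
db1.docstore._dict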

{'ec568d45-3e04-4045-a187-3a3c10ca3e1a': Document(id='ec568d45-3e04-4045-a187-3a3c10ca3e1a', metadata={}, page_content='foo')}

In [None]:
db2.docstore._dict

{'c742a0fa-ad17-443f-8bd0-332eb7e25572': Document(id='c742a0fa-ad17-443f-8bd0-332eb7e25572', metadata={}, page_content='bar')}

In [None]:
db1.merge_from(db2)

In [None]:
db1.docstore._dict

{'ec568d45-3e04-4045-a187-3a3c10ca3e1a': Document(id='ec568d45-3e04-4045-a187-3a3c10ca3e1a', metadata={}, page_content='foo'),
 'c742a0fa-ad17-443f-8bd0-332eb7e25572': Document(id='c742a0fa-ad17-443f-8bd0-332eb7e25572', metadata={}, page_content='bar')}
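`merge_from` folds the second store into the first. The other index's vectors are appended after the existing rows, so their positions must be offset by the current size, and the two docstores are combined. Both indexes also need vectors of the same dimension, in practice produced by the same embedding model. The sketch below shows that offset bookkeeping on plain dictionaries; `merge_mappings` is a hypothetical name, and rejecting duplicate ids is a safety choice made for this sketch.

```python
from typing import Dict, Tuple


def merge_mappings(
    base_map: Dict[int, str], base_docs: Dict[str, str],
    other_map: Dict[int, str], other_docs: Dict[str, str],
) -> Tuple[Dict[int, str], Dict[str, str]]:
    """Append `other` after `base`, shifting other's row positions by base's size."""
    overlap = set(base_docs) & set(other_docs)
    if overlap:
        raise ValueError(f"cannot merge, duplicate ids: {sorted(overlap)}")
    offset = len(base_map)  # other's row 0 lands right after base's last row
    merged_map = dict(base_map)
    for pos, doc_id in other_map.items():
        merged_map[offset + pos] = doc_id
    merged_docs = {**base_docs, **other_docs}
    return merged_map, merged_docs


m, d = merge_mappings({0: "id-foo"}, {"id-foo": "foo"}, {0: "id-bar"}, {"id-bar": "bar"})
print(m)  # {0: 'id-foo', 1: 'id-bar'}
print(d)  # {'id-foo': 'foo', 'id-bar': 'bar'}
```

This mirrors the output above: after the merge, `db1` holds both the `foo` and `bar` documents, with `bar` occupying the row after `foo`.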