# Vector Stores

- Documentation for the vector stores available in LangChain:
- https://python.langchain.com/v0.2/docs/integrations/vectorstores/

In [2]:
import os
from dotenv import load_dotenv
load_dotenv()

True

## FAISS DB

## 🟢 Faiss

**Faiss (Facebook AI Similarity Search)** is an open-source library developed by Facebook AI for efficient similarity search and clustering of dense vectors.

### ✅ What is Faiss?
- A library specialized in fast search for similar vectors within very large datasets.
- Supports searching even datasets that cannot be loaded entirely into memory (RAM).
- Designed to handle high-dimensional vectors.

### ✅ Use cases:
- Semantic search.
- Information retrieval.
- Recommendation systems.
- Searching for similar images and videos.

### ✅ Advantages:
- Very high performance, even on huge datasets.
- Offers a variety of search algorithms (Flat, IVFFlat, HNSW, and others).
- Supports both CPU and GPU processing.
- Helper tools for evaluation and parameter tuning.

### ✅ Simple example:
- Once text has been converted into vectors (embeddings), Faiss can be used to find the nearest vectors (nearest neighbors).
- It is often used with embedding models such as OpenAI Embeddings or HuggingFace models.

### ✅ Resources:
- Official website: [https://faiss.ai/](https://faiss.ai/)
- GitHub repository: [https://github.com/facebookresearch/faiss](https://github.com/facebookresearch/faiss)
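
To make "nearest neighbors" concrete, here is a minimal pure-Python sketch of exhaustive search, which is roughly the idea behind a Flat index. It does not use the Faiss API; the helper names (`l2_distance`, `nearest_neighbors`) and the 3-dimensional vectors are made up for illustration.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query, vectors, k=2):
    """Return (index, distance) pairs for the k vectors closest to the query."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]

# Toy "embeddings" (hypothetical values, not produced by a real model)
vectors = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 3.0, 4.0],
]
query = [0.9, 0.1, 0.0]

results = nearest_neighbors(query, vectors, k=2)
for index, distance in results:
    print(index, round(distance, 4))
```

Exhaustive search compares the query with every stored vector, so its cost grows linearly with the dataset. Index types such as IVFFlat and HNSW trade some accuracy for speed on large collections.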


In [5]:
# Requires a Faiss package: install faiss-cpu to run on CPU, or faiss-gpu to run on GPU
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# load data
loader = TextLoader('speech.txt')
documents = loader.load()

# text splitting
text_splitter = CharacterTextSplitter(chunk_size = 1000, chunk_overlap = 30)
docs = text_splitter.split_documents(documents)

# Embedding
embedding = OllamaEmbeddings(model = 'mxbai-embed-large')

# vector store
db = FAISS.from_documents(docs, embedding)
db

<langchain_community.vectorstores.faiss.FAISS at 0x28263885490>
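
The `chunk_size` and `chunk_overlap` arguments used above can be illustrated with a simplified sliding-window splitter. This is only a sketch of the overlap idea: the real `CharacterTextSplitter` first splits on a separator (by default `"\n\n"`) and then merges the pieces, so its chunk boundaries will differ. The function `chunk_text` is a hypothetical helper, not a LangChain API.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

The overlap repeats the tail of one chunk at the head of the next, which helps when a relevant sentence straddles a chunk boundary.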

In [7]:
# Now we can query the DB
query = "How does the speaker describe the desired outcome of the war?"
similarity = db.similarity_search(query) # returns the documents most similar to the query
similarity

[Document(id='5e7ab3d1-8e2f-4530-bdb9-048de6cd51aa', metadata={'source': 'speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'),
 Document(id='77e4f324-22f9-48be-a1b0-39793e3f2f5d', metadata={'source': 'speech.txt'}, page_content='…\n

- The results come back sorted: the first is the most similar and the last is the least similar.

In [8]:
similarity[0].page_content

'It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'


## 🟢 Using a Vectorstore as a Retriever

In LangChain, a **Vectorstore** can be converted into a **Retriever**, which makes it easier to use with the other LangChain components that are built around **Retrievers**.

### ✅ What is a Retriever?
- An object whose job is to return the documents most similar to a given query.
- It sits on top of the Vectorstore and searches it by vector similarity.

### ✅ Why convert it to a Retriever?
- To get a uniform interface shared with the rest of LangChain.
- Most LangChain tools, such as Chains and Agents, integrate directly with a Retriever.
- It is easy to plug into semantic search or question-answering systems.

### ✅ How does it work?
- When it receives a text query, it converts the query into a vector (embedding).
- It searches the Vectorstore for the nearest vectors.
- It returns the documents most relevant to the query.

### ✅ Summary:
- **Vectorstore = the vector database.**
- **Retriever = a retrieval layer on top of the Vectorstore that integrates with LangChain.**
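
The relationship between the two can be sketched with toy classes. Everything here is hypothetical (`toy_embed`, `ToyVectorStore`, `ToyRetriever`) and only mirrors the shape of the interface; the real LangChain classes do considerably more.

```python
import math

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a letter-frequency vector."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts

class ToyVectorStore:
    """Stores texts next to their vectors and searches by L2 distance."""
    def __init__(self, texts, embed):
        self.embed = embed
        self.texts = list(texts)
        self.vectors = [embed(t) for t in self.texts]

    def similarity_search(self, query, k=4):
        q = self.embed(query)  # step 1: embed the query
        ranked = sorted(range(len(self.texts)),
                        key=lambda i: math.dist(q, self.vectors[i]))  # step 2: nearest vectors
        return [self.texts[i] for i in ranked[:k]]  # step 3: return the matching texts

    def as_retriever(self, k=4):
        return ToyRetriever(self, k)

class ToyRetriever:
    """Thin layer over the store exposing a single invoke(query) method."""
    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)

store = ToyVectorStore(["aaa bbb", "xyz xyz", "aab"], toy_embed)
retriever = store.as_retriever(k=2)
hits = retriever.invoke("aabb")
print(hits)
```

The retriever adds no search logic of its own. Its value is the uniform `invoke` interface, so that code consuming it does not need to know which store is behind it.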


In [None]:
# Same search as before, but through a retriever,
# so it plugs easily into the other LangChain components
retriever = db.as_retriever()
similarity = retriever.invoke(query)
similarity[0].page_content

'It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'

In [None]:
# To also get the score (the similarity measure) for each document
# score => L2 distance, so a lower score means more similar
docs_and_score = db.similarity_search_with_score(query)
docs_and_score

[(Document(id='5e7ab3d1-8e2f-4530-bdb9-048de6cd51aa', metadata={'source': 'speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'),
  np.float32(204.15417)),
 (Document(id='77e4f324-22f9-48be-a1b0-39793e3f2f5d', metadata={'source': 'spe
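
Because the score is a distance, results are ranked in ascending order. The sketch below shows, for unit-length vectors only, that ranking by squared L2 distance matches ranking by cosine similarity, since ‖a − b‖² = 2 − 2·cos(a, b). The vectors are made up, and the embeddings in this notebook are not necessarily normalized, so treat this as an illustration of the metric rather than a description of the scores above. As far as I know, Faiss's flat L2 index reports the squared distance, which leaves the ordering unchanged.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    # Plain dot product; equals cosine similarity only for unit-length inputs
    return sum(x * y for x, y in zip(a, b))

# Hypothetical vectors, normalized to unit length
query = normalize([1.0, 2.0, 0.0])
candidates = [normalize(v) for v in ([1.0, 2.1, 0.0], [0.0, 1.0, 1.0], [-1.0, 0.0, 0.5])]

by_distance = sorted(range(3), key=lambda i: squared_l2(query, candidates[i]))  # ascending
by_cosine = sorted(range(3), key=lambda i: -cosine(query, candidates[i]))       # descending

print(by_distance, by_cosine)
```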

In [12]:
# We can also search directly with an embedding vector instead of a text query
embedding_vector = embedding.embed_query(query)
embedding_vector

[0.5107460021972656,
 -0.39258888363838196,
 -0.28117015957832336,
 0.4477410912513733,
 -0.7403976321220398,
 -1.5128576755523682,
 -0.7013372778892517,
 0.338383287191391,
 0.046978987753391266,
 -0.13543474674224854,
 0.2859553098678589,
 0.05265656113624573,
 -0.3367568850517273,
 0.5879532694816589,
 -0.3939555287361145,
 -0.41364145278930664,
 -0.7216814160346985,
 -0.20970723032951355,
 0.256043016910553,
 0.04965982958674431,
 -0.03386341780424118,
 -0.40845733880996704,
 -0.6067925095558167,
 0.3827469050884247,
 -0.11284243315458298,
 0.27388694882392883,
 -0.5984520316123962,
 0.20384955406188965,
 1.1324933767318726,
 -0.11250436305999756,
 -0.1724998652935028,
 0.5426586270332336,
 -0.0407537966966629,
 -0.6853196620941162,
 0.4342827796936035,
 -0.6312527656555176,
 0.4277111887931824,
 -1.912022590637207,
 0.13513967394828796,
 -0.15915414690971375,
 0.2527821958065033,
 0.2892087697982788,
 0.5580652952194214,
 -0.4557502567768097,
 -1.4732164144515991,
 -0.282515704631

In [14]:
db.similarity_search_by_vector(embedding_vector)

[Document(id='5e7ab3d1-8e2f-4530-bdb9-048de6cd51aa', metadata={'source': 'speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'),
 Document(id='77e4f324-22f9-48be-a1b0-39793e3f2f5d', metadata={'source': 'speech.txt'}, page_content='…\n

In [15]:
# Save the database to disk so it can be reused later without rebuilding it
db.save_local('faiss_index')

In [18]:
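# Load the saved index
# allow_dangerous_deserialization=True => confirms that we trust the saved files
#   (part of the index is loaded with pickle, so only enable this for files you created yourself)
# Note: we must pass the same embedding that was used to convert the text into vectors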

new_df = FAISS.load_local('faiss_index', embedding, allow_dangerous_deserialization=True)
new_df

<langchain_community.vectorstores.faiss.FAISS at 0x2827db09b20>
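
The reason the embedding has to be passed again at load time can be shown with a toy index: only the texts and their vectors are written to disk, so the loaded index still needs a function to embed new queries. This sketch uses hypothetical names (`ToyIndex`, `toy_embed`) and stores JSON for simplicity. The real `FAISS.save_local` writes a binary index plus a pickled docstore, which is why loading asks for `allow_dangerous_deserialization`.

```python
import json
import math
import os
import tempfile

def toy_embed(text):
    """Hypothetical embedding: [number of characters, number of vowels]."""
    return [float(len(text)), float(sum(ch in "aeiou" for ch in text.lower()))]

class ToyIndex:
    def __init__(self, texts, vectors, embed):
        self.texts, self.vectors, self.embed = texts, vectors, embed

    @classmethod
    def from_texts(cls, texts, embed):
        return cls(list(texts), [embed(t) for t in texts], embed)

    def save_local(self, folder):
        # Only data is persisted; the embedding function itself is not saved
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, "index.json"), "w", encoding="utf-8") as f:
            json.dump({"texts": self.texts, "vectors": self.vectors}, f)

    @classmethod
    def load_local(cls, folder, embed):
        with open(os.path.join(folder, "index.json"), encoding="utf-8") as f:
            data = json.load(f)
        return cls(data["texts"], data["vectors"], embed)

    def similarity_search(self, query, k=1):
        q = self.embed(query)
        order = sorted(range(len(self.texts)), key=lambda i: math.dist(q, self.vectors[i]))
        return [self.texts[i] for i in order[:k]]

texts = ["peace", "a universal dominion of right", "war"]
index = ToyIndex.from_texts(texts, toy_embed)

with tempfile.TemporaryDirectory() as folder:
    index.save_local(folder)
    restored = ToyIndex.load_local(folder, toy_embed)

before = index.similarity_search("freedom", k=2)
after = restored.similarity_search("freedom", k=2)
print(before == after)
```

Loading with a different embedding function would still run, but queries would be mapped into a different vector space than the stored documents and the results would be meaningless.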

In [19]:
new_df.similarity_search(query)

[Document(id='5e7ab3d1-8e2f-4530-bdb9-048de6cd51aa', metadata={'source': 'speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'),
 Document(id='77e4f324-22f9-48be-a1b0-39793e3f2f5d', metadata={'source': 'speech.txt'}, page_content='…\n

## Chroma DB

## 🟢 Chroma

**Chroma** is an open-source vector database built for AI applications, with a focus on developer convenience and ease of use.

### ✅ What is Chroma?
- A database specialized in storing, managing, and retrieving vectors (embeddings).
- Built specifically for modern AI applications.
- Licensed under **Apache 2.0** (open source).

### ✅ Features:
- Easy to use and fast.
- Designed to boost developer productivity and integrate easily with AI applications.
- Supports similarity search between vectors.
- Integrates easily with LangChain.

### ✅ Use cases:
- Semantic search.
- Question-answering (QA) systems.
- Document retrieval.
- RAG (Retrieval Augmented Generation) applications.

### ✅ Useful links:
- Official website: [https://docs.trychroma.com/](https://docs.trychroma.com/)
- GitHub repository: [https://github.com/chroma-core/chroma](https://github.com/chroma-core/chroma)


In [20]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings

# Loader 
loader = TextLoader('speech.txt')
docs = loader.load()

# splitting
splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0)
docs = splitter.split_documents(docs)

# embedding
embedding = OllamaEmbeddings(model = 'mxbai-embed-large')

# vector store
db = Chroma.from_documents(docs, embedding)
db

<langchain_community.vectorstores.chroma.Chroma at 0x2827da74110>
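
The cell above uses `RecursiveCharacterTextSplitter`, which tries coarse separators first and falls back to finer ones only for pieces that are still too long. Here is a simplified sketch of that idea, assuming the separator order `"\n\n"`, `"\n"`, `" "`, `""`. The function `recursive_split` is a hypothetical helper that ignores overlap and other options of the real class.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces of at most chunk_size characters, preferring coarse separators."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: retry it with the finer separators
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks

text = "Para one is short.\n\nPara two is a bit longer than the limit allows."
chunks = recursive_split(text, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

The first paragraph fits the limit and stays whole, while the second is broken at word boundaries rather than mid-word.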

In [21]:
query = "What does the speaker believe is the main reason the United States should enter the war?"
similarity = db.similarity_search(query)

In [22]:
# Save to disk
# documents => the docs after splitting
# embedding => the embedding method used
# persist_directory => the directory where the database will be stored
Chroma.from_documents(documents=docs, embedding=embedding, persist_directory="./chroma_db")


<langchain_community.vectorstores.chroma.Chroma at 0x2827e293d10>

In [23]:
# Load from disk
# persist_directory => the directory where the database is stored
# embedding_function => the embedding method used (the same one used when saving)
db2 = Chroma(persist_directory="./chroma_db", embedding_function=embedding)



In [25]:
db2.similarity_search_with_score(query)

[(Document(metadata={'source': 'speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy,'),
  227.50631713867188),
 (Document(metadata={'source': 'speech.txt'}, page_content='To such a task we can dedicate our lives and our fortunes, everything that we are and everything that we have, with the pride of those who know that the day has come when America is privileged to spend her blood and her might for the principles that gave her birth and happiness and the peace which she has treasured. God helping her, she can do no other.

In [26]:
# Convert to retriever 
retriever = db2.as_retriever()
retriever.invoke(query)

[Document(metadata={'source': 'speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy,'),
 Document(metadata={'source': 'speech.txt'}, page_content='To such a task we can dedicate our lives and our fortunes, everything that we are and everything that we have, with the pride of those who know that the day has come when America is privileged to spend her blood and her might for the principles that gave her birth and happiness and the peace which she has treasured. God helping her, she can do no other.'),
 Document(metadata={'