## ***What is a Vector Store?***
- A vector store is a system designed to store and retrieve data represented as numerical vectors.

### *Key Features*
- ***Store*** - Ensures that vectors and their associated metadata are retained, either in memory for quick lookups or on disk for durability and large-scale use
    * **in-memory** - RAM
    * **on-disk** - hard drive
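- ***Similarity Search*** - Retrieve the stored vectors that are closest to a query vector
- ***Indexing*** - Provide a data structure or method that enables fast similarity search over high-dimensional vectors

    - Exact **KNN** (k-nearest neighbors) compares the query against every stored vector, which becomes slow as the collection grows
    - An index, often built with a **clustering** step, narrows the search to a small group of candidate vectors first
    - e.g. **approximate nearest neighbor (ANN) lookups**
- ***CRUD Operations*** - Manage the lifecycle of the data (create, read, update, delete)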
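
The features above can be made concrete with a toy in-memory store. This is only a sketch and not how FAISS or Chroma are implemented: `ToyVectorStore` and its method names are hypothetical, the vectors are hand-written rather than produced by an embedding model, and the search is an exact brute-force scan where a real index would typically use an approximate structure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store mapping id -> (vector, metadata)."""

    def __init__(self):
        self._rows = {}

    # --- CRUD -------------------------------------------------------
    def add(self, doc_id, vector, metadata=None):
        self._rows[doc_id] = (list(vector), dict(metadata or {}))

    def get(self, doc_id):
        return self._rows[doc_id]

    def update(self, doc_id, vector, metadata=None):
        if doc_id not in self._rows:
            raise KeyError(doc_id)
        self.add(doc_id, vector, metadata)

    def delete(self, doc_id):
        del self._rows[doc_id]

    # --- Similarity search (exact, no index) --------------------------
    def search(self, query, k=2):
        scored = [
            (cosine_similarity(query, vector), doc_id)
            for doc_id, (vector, _) in self._rows.items()
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [doc_id for _, doc_id in scored[:k]]


store = ToyVectorStore()
store.add("cricket", [0.9, 0.1, 0.0], {"topic": "sport"})
store.add("football", [0.7, 0.3, 0.1], {"topic": "sport"})
store.add("cooking", [0.0, 0.2, 0.9], {"topic": "food"})

nearest = store.search([1.0, 0.0, 0.0], k=2)
store.delete("cricket")
nearest_after_delete = store.search([1.0, 0.0, 0.0], k=2)
```

Because every query scans all rows, the cost grows linearly with the collection size, which is the problem an index is meant to solve.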


### Use-cases
1. Semantic Search
2. RAG
3. Recommender System
4. Images/Multimedia Search

# ***Vector Store Vs Vector Database***
* A **Vector Store** has two main components and works as a `System`
    - Storage
    - Retrieval
    - e.g. Chroma, FAISS
* A **Vector Database** is a `Vector Store` + `Database Components`
    - Distributed
    - Backup and restore
    - ACID transactions
    - Concurrency
    - Auth
    - e.g. Pinecone, Milvus, Qdrant

A vector database is effectively a vector store with extra database features (e.g., clustering, scaling, security, metadata filtering, durability).

## Vector Stores in LangChain

- Supported stores (e.g., FAISS, Pinecone, Chroma, Qdrant, Weaviate)
- All vector stores share a common interface
- Metadata is easy to handle
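
A common interface is what lets application code swap one backend for another without changes. The sketch below illustrates the idea only: `SupportsSearch`, `ListBackend`, and `DictBackend` are hypothetical names, not LangChain classes, and keyword overlap stands in for real embedding similarity. In LangChain itself the shared methods include `add_documents` and `similarity_search`, both used later in this notebook.

```python
from typing import Dict, List, Protocol


class SupportsSearch(Protocol):
    """Hypothetical minimal interface every backend agrees on."""

    def add_texts(self, texts: List[str]) -> None: ...

    def similarity_search(self, query: str, k: int = 1) -> List[str]: ...


def overlap(query: str, text: str) -> int:
    """Stand-in for similarity: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


class ListBackend:
    """Toy backend that keeps texts in a list."""

    def __init__(self) -> None:
        self._texts: List[str] = []

    def add_texts(self, texts: List[str]) -> None:
        self._texts.extend(texts)

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        ranked = sorted(self._texts, key=lambda t: overlap(query, t), reverse=True)
        return ranked[:k]


class DictBackend:
    """Toy backend that keeps texts in a dict keyed by insertion order."""

    def __init__(self) -> None:
        self._texts: Dict[int, str] = {}

    def add_texts(self, texts: List[str]) -> None:
        for text in texts:
            self._texts[len(self._texts)] = text

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        ranked = sorted(self._texts.values(), key=lambda t: overlap(query, t), reverse=True)
        return ranked[:k]


def build_and_query(store: SupportsSearch) -> List[str]:
    """Caller code depends only on the interface, not on the backend."""
    store.add_texts(["BPL started in 2012", "Chroma stores vectors"])
    return store.similarity_search("when did BPL start", k=1)


results = [build_and_query(ListBackend()), build_and_query(DictBackend())]
```

`build_and_query` never inspects which backend it received, which is the property that makes switching stores cheap.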

## ***ChromaDB***

* Chroma is a lightweight, open-source vector database that is especially friendly for local development and small- to medium-scale production needs.
* It has vector store features along with some database features.
* ChromaDB organizes data into `collections`, which play the role a `table` plays in an RDBMS.

### Practice Data about BPL

In [1]:
from langchain.schema import Document

doc1 = Document(
    page_content="The 2023 BPL final was held at Sher-e-Bangla National Stadium. Comilla Victorians defeated Sylhet Strikers by 7 wickets.",
    metadata={"title": "BPL 2023 Final", "category": "Match Summary", "date": "2023-02-16"}
)

doc2 = Document(
    page_content="Shakib Al Hasan had an outstanding performance in the 2022 season, leading Fortune Barishal to the final.",
    metadata={"title": "Shakib's Performance in BPL 2022", "category": "Player Highlight", "date": "2022-02-17"}
)

doc3 = Document(
    page_content="BPL started in 2012 and is one of the top T20 leagues in South Asia. It is governed by the Bangladesh Cricket Board (BCB).",
    metadata={"title": "History of BPL", "category": "Background", "date": "2012-02-10"}
)

doc4 = Document(
    page_content="Tamim Iqbal scored a brilliant century for Khulna Tigers in the 2020 season, one of the top innings in BPL history.",
    metadata={"title": "Tamim Iqbal's Century in 2020", "category": "Match Highlight", "date": "2020-01-28"}
)

doc5 = Document(
    page_content="Mashrafe Mortaza is the most successful captain in BPL history, having led his team to multiple titles.",
    metadata={"title": "Mashrafe’s Captaincy Record", "category": "Player Stats", "date": "2021-01-18"}
)

doc6 = Document(
    page_content="Comilla Victorians have won the BPL title a record four times, making them the most successful team in the league.",
    metadata={"title": "Most Successful Team", "category": "Team Stats", "date": "2023-02-17"}
)

doc7 = Document(
    page_content="The BPL uses the draft system instead of an auction. Local and foreign players are picked before each season.",
    metadata={"title": "Player Draft System", "category": "Rules and Format", "date": "2022-12-01"}
)

doc8 = Document(
    page_content="The 2019 BPL season was rebranded as the Bangabandhu BPL in honor of Sheikh Mujibur Rahman.",
    metadata={"title": "Bangabandhu BPL 2019", "category": "Special Edition", "date": "2019-12-20"}
)

doc9 = Document(
    page_content="BPL 2024 is scheduled to start in January, with 7 teams participating and matches held across Dhaka, Chattogram, and Sylhet.",
    metadata={"title": "Upcoming BPL 2024", "category": "Upcoming Events", "date": "2024-01-05"}
)

doc10 = Document(
    page_content="Andre Russell is one of the top international stars in BPL, known for his explosive batting and death-over bowling.",
    metadata={"title": "Andre Russell in BPL", "category": "Foreign Players", "date": "2021-02-15"}
)



In [2]:
documents = [doc1, doc2, doc3, doc4, doc5, doc6, doc7, doc8, doc9, doc10]

In [4]:
doc2.metadata['title']

"Shakib's Performance in BPL 2022"

In [5]:
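from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceBgeEmbeddings

# Note: HuggingFaceBgeEmbeddings is designed for BGE models; for all-MiniLM-L6-v2
# the general-purpose HuggingFaceEmbeddings class is the more usual choice.
embeddings = HuggingFaceBgeEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')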



In [None]:
vector_store = Chroma(
    embedding_function=embeddings,
    persist_directory='My-Vector-DB',
    collection_name='sample',
)



## Adding Documents

In [7]:
vector_store.add_documents(documents=documents)



['0da7bf83-2a17-4512-a444-b124b9ab5089',
 '72d12a26-00aa-4bbe-9a5b-0af934863c28',
 '92bb93fc-8071-4abd-9ffa-0f58b744f94c',
 'f656dc89-383d-4c9b-81b9-b32702cb8090',
 '260ca5ae-82b7-4bea-ad7d-80e938f6378f',
 'dc31ab4c-e53f-4394-8164-1d8344295eae',
 'a152b421-b85d-40c4-ac77-4e6bd3318d30',
 '51c5e7d7-3caf-4312-86f7-7f2a61a6329a',
 '66c0af74-7438-4146-9849-5b294a34b9e9',
 '95326341-c55b-4b1b-8e15-44c0f6d01c41']

In [10]:
vector_store.get(include=['embeddings', 'documents', 'metadatas'])

{'ids': ['0da7bf83-2a17-4512-a444-b124b9ab5089',
  '72d12a26-00aa-4bbe-9a5b-0af934863c28',
  '92bb93fc-8071-4abd-9ffa-0f58b744f94c',
  'f656dc89-383d-4c9b-81b9-b32702cb8090',
  '260ca5ae-82b7-4bea-ad7d-80e938f6378f',
  'dc31ab4c-e53f-4394-8164-1d8344295eae',
  'a152b421-b85d-40c4-ac77-4e6bd3318d30',
  '51c5e7d7-3caf-4312-86f7-7f2a61a6329a',
  '66c0af74-7438-4146-9849-5b294a34b9e9',
  '95326341-c55b-4b1b-8e15-44c0f6d01c41'],
 'embeddings': array([[-0.02689231,  0.02908844, -0.05808407, ..., -0.02871126,
         -0.02944018,  0.03548925],
        [ 0.04835459,  0.04547029, -0.03726059, ..., -0.05358697,
         -0.04143931, -0.04331527],
        [-0.01799518,  0.0371599 , -0.04485202, ..., -0.00150311,
          0.00068147,  0.0416664 ],
        ...,
        [-0.04388712,  0.06512021, -0.0073859 , ..., -0.03398222,
          0.02102759,  0.05663031],
        [-0.0406306 , -0.02024882, -0.02437574, ...,  0.0144648 ,
         -0.04356062,  0.0195601 ],
        [-0.01841543,  0.08995268, 

In [11]:
query = "Who is the top scorer in 2020 session"

In [12]:
vector_store.similarity_search(
    query=query,
    k=2
)

[Document(metadata={'category': 'Match Highlight', 'date': '2020-01-28', 'title': "Tamim Iqbal's Century in 2020"}, page_content='Tamim Iqbal scored a brilliant century for Khulna Tigers in the 2020 season, one of the top innings in BPL history.'),
 Document(metadata={'category': 'Player Highlight', 'date': '2022-02-17', 'title': "Shakib's Performance in BPL 2022"}, page_content='Shakib Al Hasan had an outstanding performance in the 2022 season, leading Fortune Barishal to the final.')]

In [17]:
vector_store.similarity_search_with_score(
    query=query,
    k=3
)

[(Document(metadata={'category': 'Match Highlight', 'date': '2020-01-28', 'title': "Tamim Iqbal's Century in 2020"}, page_content='Tamim Iqbal scored a brilliant century for Khulna Tigers in the 2020 season, one of the top innings in BPL history.'),
  1.074141298775187),
 (Document(metadata={'category': 'Player Highlight', 'date': '2022-02-17', 'title': "Shakib's Performance in BPL 2022"}, page_content='Shakib Al Hasan had an outstanding performance in the 2022 season, leading Fortune Barishal to the final.'),
  1.186674695625631),
 (Document(metadata={'category': 'Match Summary', 'date': '2023-02-16', 'title': 'BPL 2023 Final'}, page_content='The 2023 BPL final was held at Sher-e-Bangla National Stadium. Comilla Victorians defeated Sylhet Strikers by 7 wickets.'),
  1.2536855962215463)]
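
The number returned with each document is a distance, so a lower score means a closer match. Assuming the collection uses Chroma's default squared Euclidean (L2) distance and the embeddings are unit-length (both are assumptions about this setup, not something the output confirms), the score relates to cosine similarity by `distance = 2 - 2 * cosine`. The sketch below checks that identity on two hand-written unit vectors and then applies it to the top score above.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Two unit-length toy vectors (not real embeddings)
u = [1.0, 0.0]
v = [0.6, 0.8]

distance = squared_l2(u, v)
similarity = cosine(u, v)

# Under the stated assumptions, convert the top score from the cell above
top_score = 1.074141298775187
implied_cosine = 1 - top_score / 2
```

Under those assumptions the best match for this query has a cosine similarity of roughly 0.46, a moderate match rather than a near-duplicate.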

In [18]:
vector_store.similarity_search_with_score(
    query="",
    filter={
        "title": "Player Draft System"
    }
)

[(Document(metadata={'category': 'Rules and Format', 'date': '2022-12-01', 'title': 'Player Draft System'}, page_content='The BPL uses the draft system instead of an auction. Local and foreign players are picked before each season.'),
  2.046391831943181)]
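
Only one document comes back here because the metadata filter restricts the candidate set and ranking happens among the documents that match. A minimal sketch of that idea follows; the `rows`, their `distance` values, and `filtered_search` are hypothetical and do not reflect how Chroma evaluates filters internally.

```python
# Hypothetical rows: metadata plus a precomputed distance to some query
rows = [
    {"title": "Player Draft System", "category": "Rules and Format", "distance": 2.05},
    {"title": "History of BPL", "category": "Background", "distance": 1.90},
    {"title": "BPL 2023 Final", "category": "Match Summary", "distance": 1.95},
]


def filtered_search(rows, where, k=4):
    """Keep rows whose metadata equals every key/value in `where`, then rank by distance."""
    candidates = [
        row for row in rows
        if all(row.get(key) == value for key, value in where.items())
    ]
    return sorted(candidates, key=lambda row: row["distance"])[:k]


hits = filtered_search(rows, {"title": "Player Draft System"})
unfiltered = filtered_search(rows, {})
```

Without a filter the draft-system row would rank last of the three, yet with the filter it is the only candidate and is returned regardless of its distance.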

## **Updating an Existing Document**

In [20]:
print(doc1)

page_content='The 2023 BPL final was held at Sher-e-Bangla National Stadium. Comilla Victorians defeated Sylhet Strikers by 7 wickets.' metadata={'title': 'BPL 2023 Final', 'category': 'Match Summary', 'date': '2023-02-16'}


In [21]:
updated_doc1 = Document(
    page_content='The 2023 BPL final was held at North South University(NSU) SAC building and Khulna won the match and its a false news',
    metadata={'title': 'BPL 2023 Final', 'category': 'Match Summary', 'date': '2023-02-16'}
)

In [22]:
vector_store.update_document(
    document_id='0da7bf83-2a17-4512-a444-b124b9ab5089',
    document=updated_doc1
)

In [23]:
vector_store.get(include=['embeddings', 'documents', 'metadatas'])

{'ids': ['0da7bf83-2a17-4512-a444-b124b9ab5089',
  '72d12a26-00aa-4bbe-9a5b-0af934863c28',
  '92bb93fc-8071-4abd-9ffa-0f58b744f94c',
  'f656dc89-383d-4c9b-81b9-b32702cb8090',
  '260ca5ae-82b7-4bea-ad7d-80e938f6378f',
  'dc31ab4c-e53f-4394-8164-1d8344295eae',
  'a152b421-b85d-40c4-ac77-4e6bd3318d30',
  '51c5e7d7-3caf-4312-86f7-7f2a61a6329a',
  '66c0af74-7438-4146-9849-5b294a34b9e9',
  '95326341-c55b-4b1b-8e15-44c0f6d01c41'],
 'embeddings': array([[-0.09337316,  0.09153391, -0.01607833, ..., -0.01423268,
         -0.03410052,  0.08766431],
        [ 0.04835459,  0.04547029, -0.03726059, ..., -0.05358697,
         -0.04143931, -0.04331527],
        [-0.01799518,  0.0371599 , -0.04485202, ..., -0.00150311,
          0.00068147,  0.0416664 ],
        ...,
        [-0.04388712,  0.06512021, -0.0073859 , ..., -0.03398222,
          0.02102759,  0.05663031],
        [-0.0406306 , -0.02024882, -0.02437574, ...,  0.0144648 ,
         -0.04356062,  0.0195601 ],
        [-0.01841543,  0.08995268, 

In [24]:
vector_store.delete(ids=['0da7bf83-2a17-4512-a444-b124b9ab5089'])

In [25]:
vector_store.get(include=['embeddings', 'documents', 'metadatas'])

{'ids': ['72d12a26-00aa-4bbe-9a5b-0af934863c28',
  '92bb93fc-8071-4abd-9ffa-0f58b744f94c',
  'f656dc89-383d-4c9b-81b9-b32702cb8090',
  '260ca5ae-82b7-4bea-ad7d-80e938f6378f',
  'dc31ab4c-e53f-4394-8164-1d8344295eae',
  'a152b421-b85d-40c4-ac77-4e6bd3318d30',
  '51c5e7d7-3caf-4312-86f7-7f2a61a6329a',
  '66c0af74-7438-4146-9849-5b294a34b9e9',
  '95326341-c55b-4b1b-8e15-44c0f6d01c41'],
 'embeddings': array([[ 0.04835459,  0.04547029, -0.03726059, ..., -0.05358697,
         -0.04143931, -0.04331527],
        [-0.01799518,  0.0371599 , -0.04485202, ..., -0.00150311,
          0.00068147,  0.0416664 ],
        [-0.02563454,  0.09765103, -0.06250849, ...,  0.00658364,
          0.00703883,  0.04216996],
        ...,
        [-0.04388712,  0.06512021, -0.0073859 , ..., -0.03398222,
          0.02102759,  0.05663031],
        [-0.0406306 , -0.02024882, -0.02437574, ...,  0.0144648 ,
         -0.04356062,  0.0195601 ],
        [-0.01841543,  0.08995268, -0.03396286, ..., -0.05104535,
          0

In [26]:
retriever = vector_store.as_retriever()

In [28]:
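import os

from dotenv import load_dotenv
from langchain_groq import ChatGroq

# Load GROQ_API_KEY (and optionally HF_TOKEN) from a local .env file
load_dotenv()

groq_api_key = os.getenv("GROQ_API_KEY")

# Export HF_TOKEN only when it is set; assigning None to os.environ raises a TypeError
hf_token = os.getenv("HF_TOKEN")
if hf_token:
    os.environ["HF_TOKEN"] = hf_token

llm = ChatGroq(groq_api_key=groq_api_key, model="Llama3-8b-8192")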


In [29]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

message = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""

prompt = ChatPromptTemplate.from_messages([("human", message)])

# The retriever fills {context}; the raw input string passes through as {question}
rag_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm
response = rag_chain.invoke("Tell me about Timim Iqbal")

In [30]:
response.content

'According to the provided context, Timim Iqbal is the batsman who scored a brilliant century for Khulna Tigers in the 2020 season, one of the top innings in BPL history.'
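
Read left to right, the chain sends the question to the retriever while also passing it through unchanged, fills `{context}` and `{question}` in the prompt, and hands the result to the model. The sketch below mimics those steps in plain Python with a hypothetical `fake_retriever` and no LLM call. It also joins the retrieved texts with a `format_docs` helper, which is an assumption and not part of the chain above; without such a step the prompt receives the string form of the `Document` list, metadata included.

```python
TEMPLATE = """Answer this question using the provided context only.

{question}

Context:
{context}
"""

# Hypothetical two-document corpus standing in for the vector store
CORPUS = [
    "Tamim Iqbal scored a brilliant century for Khulna Tigers in the 2020 season.",
    "The BPL uses the draft system instead of an auction.",
]


def fake_retriever(question):
    """Stand-in for vector search: return texts sharing a word with the question."""
    words = set(question.lower().split())
    return [
        text for text in CORPUS
        if words & set(text.lower().replace(".", "").split())
    ]


def format_docs(texts):
    """Join retrieved texts into one context string."""
    return "\n\n".join(texts)


def build_prompt(question):
    """Retrieve, format the context, then fill the template."""
    context = format_docs(fake_retriever(question))
    return TEMPLATE.format(question=question, context=context)


prompt_text = build_prompt("Tell me about Tamim Iqbal")
```

The resulting `prompt_text` is what would be sent to the model, so printing it is a quick way to check what context the retriever actually supplied.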