## Install ChromaDB

In [None]:
!pip install chromadb -q

## Create ChromaDB client

In [None]:
import chromadb

# In-memory (ephemeral) client: all data is lost when the kernel restarts.
client = chromadb.Client()

## Create ChromaDB collection

In [None]:
# Raises if the name already exists; client.get_or_create_collection() is safe to re-run.
collection = client.create_collection(name="my_collection")

## Add data to ChromaDB collection

In [None]:
cricket_news = """
The T20 World Cup 2024 is in full swing, bringing excitement and drama to cricket fans worldwide.
India's team, captained by Rohit Sharma, is preparing for a crucial match against Ireland, with standout player Jasprit Bumrah expected to play a pivotal role in their campaign.
The tournament has already seen controversy, particularly concerning the pitch conditions at Nassau County International Cricket Stadium in New York, which came under fire after a low-scoring game between Sri Lanka and South Africa.
"""

football_news = """
The world of football is buzzing with excitement as major tournaments and league matches continue to captivate fans globally.
In the UEFA Champions League, the semi-final matchups have been set, with defending champions Real Madrid set to face Manchester City, while Bayern Munich will take on Paris Saint-Germain.
Both ties promise thrilling encounters, featuring some of the best talents in world football.
"""

election_news = """
As election season heats up, the latest developments reveal a highly competitive atmosphere across several key races.
The presidential election has seen intense campaigning from all major candidates, with recent polls indicating a tight race.
Incumbent President Jane Doe is seeking re-election on a platform of economic stability and healthcare reform, while her main rival, Senator John Smith, focuses on education and climate change initiatives.
"""

ai_revolution_news = """
The AI revolution continues to transform industries and reshape the global economy.
Significant advancements in artificial intelligence have led to breakthroughs in healthcare, with AI-driven diagnostics improving patient outcomes and reducing costs.
Autonomous systems are becoming increasingly prevalent in logistics and transportation, enhancing efficiency and safety.
"""

In [None]:
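# No embeddings are passed, so Chroma embeds the documents with its default
# embedding function (all-MiniLM-L6-v2 at the time of writing).
collection.add(
    documents=[cricket_news, football_news, election_news, ai_revolution_news],
    metadatas=[{"source": "cricket"}, {"source": "football"}, {"source": "election"}, {"source": "ai revolution"}],
    ids=["id1", "id2", "id3", "id4"],
)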

## Similarity search

In [None]:
results = collection.query(
    query_texts=["technology"],
    n_results=2
)

results
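
The dictionary returned by `collection.query` is nested: each field holds one inner list per query text. The sketch below shows one way to flatten the first query's results into records. The helper name `flatten_query_result` is hypothetical, and `sample_result` is a hand-written stand-in with made-up distances and placeholder documents, not real output from this notebook.

```python
def flatten_query_result(result, query_index=0):
    """Turn one query's slice of a Chroma-style result dict into a list of records."""
    ids = result["ids"][query_index]
    documents = result["documents"][query_index]
    metadatas = result["metadatas"][query_index]
    distances = result["distances"][query_index]
    return [
        {"id": i, "document": doc, "metadata": meta, "distance": dist}
        for i, doc, meta, dist in zip(ids, documents, metadatas, distances)
    ]


# Hand-written stand-in that mimics the nested shape; all values are illustrative.
sample_result = {
    "ids": [["id4", "id1"]],
    "documents": [["placeholder AI text", "placeholder cricket text"]],
    "metadatas": [[{"source": "ai revolution"}, {"source": "cricket"}]],
    "distances": [[1.2, 1.7]],
}

records = flatten_query_result(sample_result)
for record in records:
    print(record["id"], record["metadata"]["source"], record["distance"])
```

With the real `results` from the cell above, `flatten_query_result(results)` should work the same way, assuming the default fields (`documents`, `metadatas`, `distances`) are included in the response.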

In [None]:
collection.count()

## CRUD operations on Vector Database

#### Add data

In [None]:
blockchain_news = """
The blockchain industry continues to evolve rapidly, marked by significant technological advancements and regulatory developments.
This month, the spotlight is on the launch of Ethereum 3.0, which promises enhanced scalability and security features.
This upgrade is expected to drastically reduce transaction fees and increase processing speeds, making decentralized applications (dApps) more efficient and user-friendly.
"""

In [None]:
collection.add(
    documents=[blockchain_news],
    metadatas=[{"source": "blockchain"}],
    ids=["id5"],
)

In [None]:
collection.count()

In [None]:
results = collection.query(
    query_texts=["technology"],
    n_results=2
)

results

#### Read data

In [None]:
res = collection.get()
res

In [None]:
res = collection.get(ids=["id1", "id3"])
res

#### Update data

In [None]:
collection.update(
    ids=["id3"],
    documents=["This is a sample document about generative AI"],
    metadatas=[{"source": "gen ai"}],
)

In [None]:
res = collection.get(ids=["id3"])
res

In [None]:
res = collection.get()
res

#### Delete data

In [None]:
collection.count()

In [None]:
collection.delete(ids=["id2"])

In [None]:
collection.count()

In [None]:
results = collection.query(
    query_texts=["sport"],
    n_results=2
)

results
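
A nearest-neighbour query returns the `n_results` closest documents even when none of them is a good match, so with the football document deleted, a "sport" query still comes back with two hits. One common mitigation is to drop hits beyond a distance cutoff. The sketch below assumes the nested result shape Chroma returns; the helper name `within_distance`, the cutoff of `1.5`, and the sample distances are all hypothetical and would need tuning against real data.

```python
def within_distance(result, max_distance, query_index=0):
    """Keep only the (id, distance) pairs whose distance is at most max_distance."""
    ids = result["ids"][query_index]
    distances = result["distances"][query_index]
    return [(i, d) for i, d in zip(ids, distances) if d <= max_distance]


# Made-up distances for illustration; real values depend on the embedding model.
sample_result = {"ids": [["id1", "id3"]], "distances": [[1.1, 1.9]]}

close_matches = within_distance(sample_result, max_distance=1.5)
print(close_matches)  # [('id1', 1.1)]
```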

## Use alternative embedding model

In [None]:
!pip install sentence-transformers -q

In [None]:
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer("all-mpnet-base-v2")

In [None]:
embeddings = embedding_model.encode([cricket_news, football_news])

In [None]:
embeddings

In [None]:
len(embeddings[0])  # 768 dimensions for all-mpnet-base-v2
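
When embeddings are supplied by hand, the ids, documents, and vectors have to line up, and every vector needs the same dimension. A small pre-flight check along the following lines can catch mistakes before calling `add`. This is only a sketch: `check_batch` is a hypothetical helper, not part of Chroma, and the toy vectors stand in for real model output.

```python
def check_batch(ids, documents, embeddings):
    """Return the shared embedding dimension, or raise ValueError on a mismatch."""
    if not (len(ids) == len(documents) == len(embeddings)):
        raise ValueError("ids, documents and embeddings must have the same length")
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique")
    dimensions = {len(vector) for vector in embeddings}
    if len(dimensions) != 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dimensions)}")
    return dimensions.pop()


# Toy 3-dimensional vectors for illustration only.
toy_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
dimension = check_batch(["id1", "id2"], ["first text", "second text"], toy_embeddings)
print(dimension)  # 3
```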

In [None]:
new_collection = client.create_collection(name="my_new_collection")

In [None]:
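# Precomputed vectors are stored as-is; Chroma does not re-embed the documents.
new_collection.add(
    documents=[cricket_news, football_news],
    embeddings=embeddings.tolist(),  # plain lists work across Chroma versions
    metadatas=[{"source": "cricket"}, {"source": "football"}],
    ids=["id1", "id2"],
)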

In [None]:
new_collection.get()

In [None]:
# Embed the query with the same model used for the stored documents.
results = new_collection.query(
    query_embeddings=embedding_model.encode(["test worldcup"]).tolist(),
    n_results=1,
)

results
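
To make the `distances` field easier to interpret, here is a plain-Python sketch of the two measures most often used to compare embeddings. It assumes the collection uses squared L2 distance, which is believed to be Chroma's default when no other setting is given. The toy 3-dimensional vectors are invented for illustration. For unit-length vectors the two measures rank neighbours identically, because squared L2 equals `2 - 2 * cosine_similarity`.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Higher means more similar; 1.0 for vectors pointing the same way."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def squared_l2(a, b):
    """Lower means more similar; 0.0 for identical vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


# Toy vectors standing in for a query embedding and two document embeddings.
query = normalize([1.0, 2.0, 0.5])
doc_a = normalize([0.9, 2.1, 0.4])   # points in nearly the same direction as the query
doc_b = normalize([-1.0, 0.2, 3.0])  # points elsewhere

for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
    print(name, round(cosine_similarity(query, doc), 3), round(squared_l2(query, doc), 3))
```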