**Install required libraries**

In [1]:
%pip install chromadb sentence-transformers
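To confirm the install worked, you can read the installed package versions from the package metadata. Recording them is useful because Chroma's API has changed across releases. `installed_versions` is a hypothetical helper that uses only the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Hypothetical helper: map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


print(installed_versions(["chromadb", "sentence-transformers"]))
```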



**Sample Text Data**

In [2]:
documents = [
    "Django is a Python web framework",
    "Flask is used for lightweight web applications",
    "Machine learning learns patterns from data",
    "Deep learning uses neural networks",
    "Vector databases store embeddings"
]


**Pass Text Through an Embedding Model**

In [3]:
from sentence_transformers import SentenceTransformer

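# Downloads the model from the Hugging Face Hub on first use (no token is needed for this public model)
model = SentenceTransformer("all-MiniLM-L6-v2")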




In [4]:
# One 384-dimensional vector per document, returned as a NumPy array of shape (5, 384)
embeddings = model.encode(documents)
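`model.encode` hides several steps. According to its published configuration, all-MiniLM-L6-v2 tokenizes each sentence, runs a transformer that produces one vector per token, averages those token vectors (mean pooling, skipping padding), and finally scales the result to unit length. The sketch below reproduces only the pooling and normalization steps in NumPy, with random numbers standing in for real transformer outputs; `mean_pool_and_normalize` is a hypothetical helper, not part of the sentence-transformers API.

```python
import numpy as np


def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Hypothetical helper: average each sentence's token vectors (ignoring padding), then L2-normalize.

    token_embeddings: (batch, seq_len, dim) array of per-token vectors
    attention_mask:   (batch, seq_len) array, 1 for real tokens and 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                     # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                     # (batch, 1), avoids division by zero
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)


# Random stand-ins for transformer output: 2 sentences, 3 token slots, 4 dimensions
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 3, 4))
mask = np.array([[1, 1, 1], [1, 1, 0]])  # the second sentence has one padding slot
sentence_vectors = mean_pool_and_normalize(tokens, mask)
print(sentence_vectors.shape)                    # (2, 4)
print(np.linalg.norm(sentence_vectors, axis=1))  # every row has length 1 (up to rounding)
```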

In [5]:
print(type(embeddings))
print(len(embeddings))
print(len(embeddings[0]))


<class 'numpy.ndarray'>
5
384


In [6]:
print(embeddings[0])

[-4.16636579e-02 -6.39138669e-02 -4.92803752e-02 -7.78210396e-03
 -1.36073902e-02 -7.47318193e-02 -8.68346635e-03  2.13141199e-02
 -6.32586852e-02 -4.37665433e-02  2.03174190e-03  4.56523038e-02
 -1.39647154e-02  1.55595234e-02  4.64705862e-02 -2.19470225e-04
  3.56616564e-02  1.82337165e-02  2.70770136e-02  3.31673911e-03
 -4.63879928e-02  4.85313199e-02  4.01275791e-02  1.33116944e-02
 -1.65328719e-02 -3.81252915e-02 -2.21211128e-02  5.70918582e-02
  2.98301931e-02 -1.15168542e-02  1.04211094e-02 -2.00353209e-02
 -1.95910521e-02  3.09325401e-02 -2.25616600e-02  5.02759330e-02
  2.25658715e-02 -1.33574799e-01 -1.95865035e-02  1.40625238e-02
  2.15212498e-02  1.21233901e-02 -5.41655757e-02 -3.14494893e-02
 -8.56190920e-02 -2.52636392e-02 -3.79093699e-02 -2.74908058e-02
  6.17560446e-02 -1.24147043e-01 -1.13079049e-01 -2.04953868e-02
 -3.55523042e-02 -6.92708865e-02 -7.19052181e-02  4.36704792e-03
  5.19439541e-02 -5.85780814e-02 -8.07537362e-02 -7.36347139e-02
 -2.38504782e-02  1.56964 ...
(output truncated; the full vector has 384 values)
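A single vector of numbers is not meaningful on its own; what matters is how close two vectors are. Cosine similarity (the dot product of the two vectors after scaling each to unit length) is the usual measure: values near 1 mean "pointing the same way", 0 means unrelated. A minimal NumPy sketch with hypothetical toy vectors; passing the notebook's `embeddings` instead gives a 5 x 5 matrix of document-to-document similarities.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of an (n, dim) array."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


# Toy 3-dimensional vectors (hypothetical data)
toy = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
])
print(np.round(cosine_similarity_matrix(toy), 3))
# rows 0 and 1 are very similar (0.994); row 2 is orthogonal to both (0.0)
```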

**Store These Vectors in a Vector Database (ChromaDB)**

In [7]:
import chromadb

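# In-memory client: the data exists only for this session.
# chromadb.PersistentClient(path=<directory>) would keep it on disk instead.
client = chromadb.Client()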


**Create a collection (like a table)**

In [8]:
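# get_or_create_collection returns the existing collection if this cell is re-run,
# whereas create_collection raises an error when the name is already taken
collection = client.get_or_create_collection(name="my_text_data")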
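By default a Chroma collection ranks results by squared Euclidean (L2) distance. Many Chroma releases also accept `metadata={"hnsw:space": "cosine"}` when the collection is created; check the documentation for your installed version. For this notebook the choice matters less than it seems: all-MiniLM-L6-v2 returns unit-length vectors, and for unit vectors the squared L2 distance equals 2 - 2 x (cosine similarity), so both metrics produce the same ranking. The NumPy check below verifies that identity on two random unit vectors (hypothetical data, 384 dimensions to match the model).

```python
import numpy as np

rng = np.random.default_rng(42)
a, b = rng.normal(size=(2, 384))
a /= np.linalg.norm(a)  # scale to unit length, like the MiniLM embeddings
b /= np.linalg.norm(b)

squared_l2 = float(np.sum((a - b) ** 2))
cosine = float(a @ b)
print(squared_l2, 2 - 2 * cosine)  # equal up to floating-point error
```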


**Store text + vectors**

In [9]:
collection.add(
    documents=documents,
    embeddings=embeddings.tolist(),
    ids=[f"id_{i}" for i in range(len(documents))]
)
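`collection.add` pairs its three lists by position: the i-th id labels the i-th document and the i-th embedding. The lists therefore have to be the same length, each id has to be unique, and every vector in a collection should have the same dimension (384 here). `check_batch` is a hypothetical pre-flight helper that catches these mistakes before calling Chroma; it is not part of the Chroma API.

```python
def check_batch(documents, embeddings, ids):
    """Hypothetical helper: raise ValueError if a batch for collection.add() is malformed."""
    if not (len(documents) == len(embeddings) == len(ids)):
        raise ValueError(
            f"length mismatch: {len(documents)} documents, "
            f"{len(embeddings)} embeddings, {len(ids)} ids"
        )
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique within a batch")
    dims = {len(vector) for vector in embeddings}
    if len(dims) > 1:
        raise ValueError(f"embeddings have mixed dimensions: {sorted(dims)}")


check_batch(["a", "b"], [[0.1, 0.2], [0.3, 0.4]], ["id_0", "id_1"])  # passes silently

# Usage with the notebook data (In [9]):
# check_batch(documents, embeddings.tolist(), [f"id_{i}" for i in range(len(documents))])
```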


**Searching**

In [10]:
query = "What is used to store embeddings?"
query_embedding = model.encode([query])
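Chroma answers `collection.query` with an approximate nearest-neighbour index (HNSW), so it does not need to compare the query with every stored vector. For a handful of documents, an exact brute-force search should give the same ranking and shows the idea: compute the distance from the query vector to every stored vector and keep the k smallest. `top_k_l2` is a hypothetical helper using squared L2 distance (Chroma's default metric); it illustrates the concept and is not Chroma's implementation.

```python
import numpy as np


def top_k_l2(query_vector, stored_vectors, k=2):
    """Hypothetical helper: exact (brute-force) search returning indices and squared L2
    distances of the k stored rows closest to the query. Chroma itself uses HNSW."""
    distances = np.sum((stored_vectors - query_vector) ** 2, axis=1)
    order = np.argsort(distances)[:k]
    return order, distances[order]


# Toy 2-D vectors; in the notebook these would be `embeddings` and `query_embedding[0]`
toy_stored = np.array([[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]])
toy_query = np.array([0.9, 0.1])
idx, dist = top_k_l2(toy_query, toy_stored, k=2)
print(idx, dist)
```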


In [11]:
results = collection.query(
    query_embeddings=query_embedding.tolist(),
    n_results=2
)
print(results["documents"])


[['Vector databases store embeddings', 'Deep learning uses neural networks']]
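The result is a nested list because `query` accepts several query embeddings at once and returns one inner list per query. Besides `documents`, the results dictionary contains `ids` and `distances` (smaller means closer), which help explain why a hit was returned. `format_hits` is a hypothetical helper that lines these up for one query.

```python
def format_hits(results, query_index=0):
    """Hypothetical helper: one line per hit (rank, id, distance, text) for a single query
    of a Chroma-style results dict with "ids", "documents" and "distances" keys."""
    ids = results["ids"][query_index]
    docs = results["documents"][query_index]
    dists = results["distances"][query_index]
    return [
        f"{rank}. {doc_id}  distance={dist:.4f}  {doc}"
        for rank, (doc_id, dist, doc) in enumerate(zip(ids, dists, docs), start=1)
    ]


# Usage in the notebook, after In [11]:
# for line in format_hits(results):
#     print(line)
```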


**Insert new text**

In [12]:
new_text = "FAISS is another vector database"

# Encode the new text once and store it under a fresh id
collection.add(
    documents=[new_text],
    embeddings=model.encode([new_text]).tolist(),
    ids=["id_100"]
)
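Hand-picking ids such as `"id_100"` gets error-prone as more text is added. A small wrapper can encode a batch once and generate random UUID ids, which are extremely unlikely to clash with existing ids like `"id_0"` to `"id_4"`. `add_texts` is a hypothetical helper; it only calls `model.encode` and `collection.add`, as the notebook already does.

```python
import uuid


def add_texts(collection, model, texts, ids=None):
    """Hypothetical helper: encode `texts` once and add them to `collection`.

    If no ids are given, random UUID strings are generated for the new rows.
    """
    if ids is None:
        ids = [str(uuid.uuid4()) for _ in texts]
    vectors = model.encode(texts)  # one row per text
    collection.add(documents=list(texts), embeddings=vectors.tolist(), ids=ids)
    return ids


# Usage in the notebook:
# new_ids = add_texts(collection, model, ["FAISS is another vector database"])
```

If re-running a cell should overwrite rows rather than add them, Chroma collections also provide `upsert`, which takes the same `documents`, `embeddings`, and `ids` arguments.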


**Delete text**

In [13]:
# Removes "Flask is used for lightweight web applications" (documents[1], stored as id_1)
collection.delete(ids=["id_1"])
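To confirm the delete worked, you can ask the collection for the deleted id: `collection.get(ids=[...])` should return only the ids that still exist, and `collection.count()` gives the number of stored items, which after this notebook's steps should be 5 (5 original + 1 inserted - 1 deleted). `missing_ids` is a hypothetical helper built on `get`.

```python
def missing_ids(collection, ids):
    """Hypothetical helper: return the subset of `ids` that is no longer in `collection`."""
    present = set(collection.get(ids=ids)["ids"])
    return [i for i in ids if i not in present]


# Usage in the notebook:
# print(missing_ids(collection, ["id_1"]))
# print(collection.count())
```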
