# 3 - Vector DB (ChromaDB)

ChromaDB is an open-source embedding database that stores and manages high-dimensional vector representations (embeddings) of data such as text, images, or audio. It is optimized for similarity search and retrieval, which makes it a good fit for recommendation systems, information retrieval, and other machine learning applications.

More info at: https://docs.trychroma.com/docs/overview/getting-started
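
To make "similarity search" concrete, here is a minimal sketch of the idea in plain Python: rank stored vectors by cosine similarity to a query vector and keep the top k. The helper names (`cosine_similarity`, `top_k`) and the toy 3-dimensional vectors are assumptions made for illustration. ChromaDB itself relies on an index rather than a linear scan like this one, and its distance metric is configurable.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        vectors,
        key=lambda doc_id: cosine_similarity(query, vectors[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Toy "embeddings" (hypothetical values, for illustration only).
toy_vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.6, 0.8, 0.0],
    "c": [0.0, 1.0, 0.0],
}

nearest = top_k([1.0, 0.05, 0.0], toy_vectors, k=2)
print(nearest)
```

Real embeddings have hundreds of dimensions, but the ranking principle is the same.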

## 3.1 Embed and index documents

In [1]:
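import os

# Ask ChromaDB to disable anonymized telemetry.
# The variable is set before chromadb is imported so the setting is picked up.
os.environ["ANONYMIZED_TELEMETRY"] = "False"

import chromadb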

from sentence_transformers import SentenceTransformer

# Initialize a persistent ChromaDB client.
# The database will be stored in the folder ./data/chroma_db
client = chromadb.PersistentClient(path='./data/chroma_db')

# Create or retrieve a collection named "test".
# A collection is a logical grouping of documents + embeddings.
col = client.get_or_create_collection('test')

# Load a Sentence Transformers model to generate embeddings.
# The model "all-MiniLM-L6-v2" is lightweight, fast, and widely used.
model = SentenceTransformer('all-MiniLM-L6-v2')

# Texts that we want to index in the vector database
# (short French sentences about supply chain concepts).
texts = [
    "Le stock de sécurité permet d’absorber les variations de la demande et les retards fournisseurs",
    "Le processus S&OP aligne la demande, la production et les capacités logistiques",
    "Le lead time correspond au délai total entre la commande et la livraison",
    "Un lead time long augmente le besoin en stock de sécurité",
    "Le S&OP est un processus collaboratif entre ventes, production et supply chain",
    "La réduction du lead time améliore le taux de service client"
]

# Generate embeddings (numerical vector representations) for the texts.
# .tolist() converts the NumPy array returned by encode() into native Python lists.
embs = model.encode(texts).tolist()

# Add the texts and their embeddings into the collection.
# We assign simple string IDs ("0" through "5"), one per document.
col.add(ids=[str(i) for i in range(len(texts))], documents=texts, embeddings=embs)

# Print the total number of items currently stored in the collection.
print('Count:', col.count())




Count: 6
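
The IDs above are positional, so they change whenever the order of `texts` changes. One possible alternative, sketched below, is to derive each ID from the document text itself. The `stable_id` helper is hypothetical (it is not part of ChromaDB), and truncating the hash to 16 hex characters is an arbitrary choice for readability.

```python
import hashlib

def stable_id(text: str) -> str:
    """Hypothetical helper: derive a deterministic ID from the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]

# Two of the texts indexed above, reused here as sample input.
sample_texts = [
    "Le lead time correspond au délai total entre la commande et la livraison",
    "Un lead time long augmente le besoin en stock de sécurité",
]

# The same text always yields the same ID, regardless of its list position.
ids = [stable_id(t) for t in sample_texts]
print(ids)
```

A list built this way could be passed as `ids=` to `col.add`, or to `col.upsert` when re-running the cell should overwrite existing entries instead of adding them again.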


## 3.2 Query the vector database

In [5]:
# Embed each query with the same model used for indexing,
# then retrieve the 2 closest documents from the collection.
queries = [
    "Quel processus permet de coordonner ventes et opérations ?",
    "Quel est le délai de livraison d’un fournisseur ?"
]

for q in queries:
    print('Query:', q)
    query_embs = model.encode([q]).tolist()
    results = col.query(query_embeddings=query_embs, n_results=2)
    print('Results:', results['documents'])
    

Query: Quel processus permet de coordonner ventes et opérations ?
Results: [['Le S&OP est un processus collaboratif entre ventes, production et supply chain', 'Le processus S&OP aligne la demande, la production et les capacités logistiques']]
Query: Quel est le délai de livraison d’un fournisseur ?
Results: [['Le lead time correspond au délai total entre la commande et la livraison', 'Le stock de sécurité permet d’absorber les variations de la demande et les retards fournisseurs']]
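
As the output shows, `results['documents']` is a list of lists: one inner list per query. The sketch below flattens a result of that shape into ranked rows, assuming the result also carries a `distances` entry with the same nesting (typically included by default). The `pair_results` helper is hypothetical, and `mock_results` is a hand-written stand-in whose distance values are placeholders, not measured output.

```python
def pair_results(results):
    """Flatten a query result into (query_index, rank, document, distance) rows."""
    rows = []
    for q_idx, (docs, dists) in enumerate(
        zip(results["documents"], results["distances"])
    ):
        for rank, (doc, dist) in enumerate(zip(docs, dists), start=1):
            rows.append((q_idx, rank, doc, dist))
    return rows

# Stand-in shaped like a single-query result (placeholder values only).
mock_results = {
    "documents": [["doc A", "doc B"]],
    "distances": [[0.25, 0.5]],
}

rows = pair_results(mock_results)
for q_idx, rank, doc, dist in rows:
    print(f"query {q_idx} | rank {rank} | distance {dist:.2f} | {doc}")
```

Printing the distance next to each document makes it easier to judge whether the second hit is a near match or just the least bad candidate.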
