### Load Pre-trained Sentence Transformer Model

In [31]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
print("Loaded successfully")

BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.


Loaded successfully


### Test Encoding of Sample Text

In [30]:
from sentence_transformers import SentenceTransformer

# Load a pre-trained model
model = SentenceTransformer("all-MiniLM-L6-v2")

# Test encoding
text = "Turning Text into Numbers"
embedding = model.encode(text)

print("Embedding length:", len(embedding))
print("Sample of embedding:", embedding[:10])

Embedding length: 384
Sample of embedding: [ 0.05771338  0.12840357 -0.05404996  0.00614077 -0.08776704  0.08671194
  0.07675532  0.0499517   0.03868663 -0.01710961]
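Embeddings become useful once two of them are compared. The sketch below shows one common comparison, cosine similarity, on toy 3-dimensional vectors standing in for the 384-dimensional output of `model.encode`. The vectors and the helper name `cosine_similarity` are illustrative assumptions, not values produced by the model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors; real embeddings would come from model.encode(...)
v_cat = [0.9, 0.1, 0.0]
v_kitten = [0.8, 0.2, 0.1]
v_car = [0.0, 0.1, 0.9]

sim_close = cosine_similarity(v_cat, v_kitten)
sim_far = cosine_similarity(v_cat, v_car)

# Vectors pointing in similar directions score closer to 1
print("cat vs kitten:", sim_close)
print("cat vs car:", sim_far)
```

With real embeddings, the same function should accept the arrays returned by `model.encode`, since it only iterates over the values.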


### Initialize ChromaDB Client and Create a Collection

In [13]:
import chromadb

# Initialize an in-memory ChromaDB client (data is not persisted to disk)
client = chromadb.Client()

# Create a collection (roughly analogous to a SQL table)
collection = client.create_collection(name="my_kb_1")

### Add Documents to ChromaDB Collection

In [14]:
docs = [
    "Machine learning focuses on data.",
    "Docker containerizes apps."
]

# Encode the documents into vectors
embeddings = model.encode(docs)

# Add the documents to the collection with IDs
collection.add(
    documents=docs,
    embeddings=embeddings,
    ids=["id1", "id2"]
)


### Query ChromaDB with Multiple Questions

In [26]:
# List of queries
queries = [
    "How do computers learn from experience?",
    "What is Docker?",
    "Explain machine learning algorithms."
]

# Query the collection once per question
for query in queries:
    # Convert the query into a vector
    query_vector = model.encode(query)

    # Perform the query to get results based on semantic similarity
    results = collection.query(
        query_embeddings=[query_vector],
        n_results=1
    )

    # Print the top match for each query.
    # In each result list, the first [0] selects the first (and only) query
    # in this call, and the second [0] selects its top-ranked match.
    print(f"Query: {query}")
    print("Closest Document:", results['documents'][0][0])
    print("Document ID:", results['ids'][0][0])
    # Chroma returns a distance, not a similarity score: smaller means closer
    print("Distance (lower is closer):", results['distances'][0][0])
    print("="*50)  # Separator between results of different queries

Query: How do computers learn from experience?
Closest Document: Machine learning focuses on data.
Document ID: id1
Distance (lower is closer): 1.2167723178863525
==================================================
Query: What is Docker?
Closest Document: Docker containerizes apps.
Document ID: id2
Distance (lower is closer): 0.6844878196716309
==================================================
Query: Explain machine learning algorithms.
Closest Document: Machine learning focuses on data.
Document ID: id1
Distance (lower is closer): 0.618829071521759
==================================================
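The printed values are distances, so a smaller number means a closer match. The sketch below converts them into cosine similarities under two assumptions that the notebook does not state: that the collection uses Chroma's default squared L2 distance, and that the model's embeddings are unit-normalized. Under those assumptions, `||a - b||^2 = 2 - 2*cos(a, b)`, so `cos = 1 - d/2`. The helper names are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def squared_l2_to_cosine(distance):
    """Cosine similarity implied by a squared L2 distance between unit vectors."""
    return 1.0 - distance / 2.0

# Sanity check on two toy unit vectors: their dot product is 0.6
a = [1.0, 0.0]
b = [0.6, 0.8]
dot = sum(x * y for x, y in zip(a, b))
recovered = squared_l2_to_cosine(squared_l2(a, b))
print("dot product:", dot, "recovered cosine:", recovered)

# Distances copied from the query output above
distances = {
    "How do computers learn from experience?": 1.2167723178863525,
    "What is Docker?": 0.6844878196716309,
    "Explain machine learning algorithms.": 0.618829071521759,
}

for query, d in distances.items():
    print(f"{query} -> implied cosine similarity: {squared_l2_to_cosine(d):.3f}")
```

If both assumptions hold, the three queries correspond to cosine similarities of roughly 0.39, 0.66, and 0.69. That would make the first match noticeably weaker than the other two, even though it still returns the more relevant document.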
