## Here we add `Embeddings` to improve the similarity search functionality

In [15]:
import chromadb
import chromadb.utils.embedding_functions as embedding_functions
from chromadb.config import DEFAULT_TENANT, DEFAULT_DATABASE, Settings

import os

In [22]:
client = chromadb.PersistentClient(
    path="test",
    settings=Settings(),
    tenant=DEFAULT_TENANT,
    database=DEFAULT_DATABASE,
)

In [23]:
student_info = """
Alexandra Thompson, a 19-year-old computer science sophomore with a 3.7 GPA,
is a member of the programming and chess clubs who enjoys pizza, swimming, and hiking
in her free time in hopes of working at a tech company after graduating from the University of Washington.
"""

club_info = """
The university chess club provides an outlet for students to come together and enjoy playing
the classic strategy game of chess. Members of all skill levels are welcome, from beginners learning
the rules to experienced tournament players. The club typically meets a few times per week to play casual games,
participate in tournaments, analyze famous chess matches, and improve members' skills.
"""

university_info = """
The University of Washington, founded in 1861 in Seattle, is a public research university
with over 45,000 students across three campuses in Seattle, Tacoma, and Bothell.
As the flagship institution of the six public universities in Washington state,
UW encompasses over 500 buildings and 20 million square feet of space,
including one of the largest library systems in the world.
"""

## Create Embeddings

In [24]:
from dotenv import load_dotenv
load_dotenv()  # read GEMINI_API_KEY (and any other variables) from a local .env file

# Embedding models: https://python.langchain.com/v0.1/docs/integrations/text_embedding/
GOOGLE_API_KEY = os.getenv('GEMINI_API_KEY')

# Call the embedding function directly on a list of documents;
# it returns one embedding vector per input text.
google_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key=GOOGLE_API_KEY)
student_embeddings = google_ef([student_info, club_info, university_info])
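
If `GEMINI_API_KEY` is missing from the environment, `os.getenv` returns `None` and the failure only shows up later, when the embedding function is called. A minimal sketch of a guard that fails early is shown below; `require_env` is a hypothetical helper, not part of `chromadb` or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Hypothetical helper: raises RuntimeError when the variable is unset or empty,
    so a missing key is reported before any embedding call is made.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file."
        )
    return value
```

In this notebook it could replace the plain lookup, for example `GOOGLE_API_KEY = require_env('GEMINI_API_KEY')`.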

In [28]:
collection2 = client.get_or_create_collection(name="Students2", embedding_function=google_ef)

In [None]:
collection2.add(
    documents=[student_info, club_info, university_info],
    metadatas=[{"source": "student info"}, {"source": "club info"}, {"source": "university info"}],
    ids=["id1", "id2", "id3"],
)
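
The `add` call pairs each document with one metadata dict and one id by position, so the three lists need the same length and the ids should not repeat. A small pre-check along these lines can catch a mismatch before the call; `check_add_inputs` is a hypothetical helper written for this notebook, and Chroma may perform its own validation as well.

```python
def check_add_inputs(documents, metadatas, ids):
    """Hypothetical pre-check for the lists passed to collection.add().

    Raises ValueError if the lists differ in length or if an id is repeated.
    """
    if not (len(documents) == len(metadatas) == len(ids)):
        raise ValueError(
            f"Length mismatch: {len(documents)} documents, "
            f"{len(metadatas)} metadatas, {len(ids)} ids"
        )
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique within one add() call")


# Example with placeholder values that mirror the shape used above.
check_add_inputs(
    documents=["doc a", "doc b", "doc c"],
    metadatas=[{"source": "a"}, {"source": "b"}, {"source": "c"}],
    ids=["id1", "id2", "id3"],
)
```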

In [None]:
results = collection2.query(
    query_texts=["Whats the students name?"],
    n_results=2
)

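# Each field holds one list per query text, so [0] selects the results for our only query.
print(results['documents'][0][0].strip())  # best-matching document
print(results['ids'])
print(results['distances'])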
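
The query result is a dict of nested lists: the outer list has one entry per query text, and the inner list has one entry per returned document. The sketch below flattens that shape into one row per match. `flatten_results` is a hypothetical helper, and the `sample` dict uses made-up values that only imitate the structure of a real result.

```python
def flatten_results(results, query_index=0):
    """Turn the nested query result into a list of {id, distance, document} rows."""
    ids = results["ids"][query_index]
    documents = results["documents"][query_index]
    distances = results["distances"][query_index]
    return [
        {"id": doc_id, "distance": distance, "document": document.strip()}
        for doc_id, document, distance in zip(ids, documents, distances)
    ]


# Made-up sample with the same shape as a result for one query and n_results=2.
sample = {
    "ids": [["id1", "id3"]],
    "documents": [["\nstudent text\n", "\nuniversity text\n"]],
    "distances": [[0.25, 0.5]],
}

for row in flatten_results(sample):
    print(row["id"], row["distance"], row["document"])
```

With the real `results` from the cell above, `flatten_results(results)` gives the two matches in ranked order.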

**In this case, switching the embedding function reduced the distances between the query and the documents, which improved the results in some cases.**

```
The similarity search now returns information about the university instead of the club. Additionally, the distances are lower than with the default embedding model, which indicates a closer match between the query and the returned documents.
```
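
To make "distance between the vectors" concrete, here is a minimal pure-Python sketch of two common distance measures. It assumes the collection uses squared L2 distance, which is Chroma's default as far as we know; if the collection was configured with a different metric, such as cosine, the reported numbers follow that metric instead. The function names and vectors are illustrative only.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy 2-dimensional vectors standing in for real embeddings.
query_vec = [1.0, 0.0]
close_vec = [0.9, 0.1]
far_vec = [0.0, 1.0]

print(squared_l2(query_vec, close_vec), squared_l2(query_vec, far_vec))
print(cosine_distance(query_vec, close_vec), cosine_distance(query_vec, far_vec))
```

In both measures the smaller value belongs to the vector closer to the query, which is why a lower distance is read as a better match.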

### Let's update/delete data

#### Update

In [37]:
collection2.update(
    ids=["id1"],
    documents=["Istiaq Ahmed Fahad, a 19-year-old computer science sophomore with a 3.7 GPA"],
    metadatas=[{"source": "student info"}],
)

#### Delete

In [None]:
collection2.delete(ids=['id1'])

# Query again to confirm that the deleted document no longer appears in the results.
results = collection2.query(
    query_texts=["What is the student name?"],
    n_results=2
)

results