#### Install the necessary packages, or list them in a requirements.txt file and install from it in the terminal ###

**packages:**
- chromadb
- mistralai
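
Before running the cells below, a quick check along these lines can confirm that the packages are importable. This is only a sketch: `missing_packages` is a hypothetical helper, and `mistralai` is assumed to be the import name of the Mistral client.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if find_spec(name) is None]


# Assumed import names for the packages listed above.
# An empty list means both can be imported.
print(missing_packages(["chromadb", "mistralai"]))
```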

Chroma uses DefaultEmbeddingFunction as its default embedding function. It is based on the Sentence Transformers all-MiniLM-L6-v2 model.
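
To make the idea of an embedding concrete: the model maps each text to a fixed-length vector, and a vector database ranks documents by how close their vectors are to the query vector. The sketch below uses made-up 3-dimensional vectors and squared Euclidean distance, which is assumed here to correspond to Chroma's default `l2` distance. The vectors and the `squared_l2` helper are illustrative only.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy 3-dimensional "embeddings"; real ones come from the embedding model.
query = [1.0, 0.0, 0.0]
doc_vectors = {
    "close_doc": [0.9, 0.1, 0.0],
    "far_doc": [0.0, 1.0, 0.0],
}

# Sort document names from nearest to farthest, as a vector search would.
ranked = sorted(doc_vectors, key=lambda name: squared_l2(query, doc_vectors[name]))
print(ranked)
```

A smaller distance means a closer match, so `close_doc` is ranked first.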


In [29]:
import chromadb
# Import the ChromaDB library.
# It provides the client classes and methods
# for creating and managing vector databases.

client = chromadb.PersistentClient()
# Create a PersistentClient instance.
# PersistentClient() configures Chroma to save the database to,
# and load it from, the local machine.

In [31]:
collection = client.create_collection(name="exerciseembeddings")
# Create a new collection named "exerciseembeddings" in ChromaDB.
# Collections are where embeddings, documents, and metadata are stored.
# A collection indexes its embeddings and documents
# and enables retrieval and filtering.

# collection = client.get_collection(name="exerciseembeddings")
# Retrieves an existing collection; use this instead once the collection has been created.

In [33]:
collection.add(
    documents=[
        "Hello world",
        "MistralAI Vectors and Embeddings are easy!",
        """This latest generation continues to push the boundaries of cost efficiency, speed, and performance. Mistral Large 2 is exposed on la Plateforme and enriched with new features to facilitate building innovative AI applications.

Mistral Large 2
Mistral Large 2 has a 128k context window and supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash.

Mistral Large 2 is designed for single-node inference with long-context applications in mind – its size of 123 billion parameters allows it to run at large throughput on a single node. We are releasing Mistral Large 2 under the Mistral Research License, that allows usage and modification for research and non-commercial usages. For commercial usage of Mistral Large 2 requiring self-deployment, a Mistral Commercial License must be acquired by contacting us.

General performance
Mistral Large 2 sets a new frontier in terms of performance / cost of serving on evaluation metrics. In particular, on MMLU, the pretrained version achieves an accuracy of 84.0%, and sets a new point on the performance/cost Pareto front of open models.

Code & Reasoning
Following our experience with Codestral 22B and Codestral Mamba, we trained Mistral Large 2 on a very large proportion of code. Mistral Large 2 vastly outperforms the previous Mistral Large, and performs on par with leading models such as GPT-4o, Claude 3 Opus, and Llama 3 405B.

"""
    ],
    ids=["id1", "id2", "id3"]
)

# Add documents to the collection as a list.
# Chroma stores the text and handles embedding and indexing automatically.
# The embedding model can be customized.
# A unique string ID for each document is passed as a list.

In [34]:
results = collection.query(
    query_texts=["OpenAI"], # Chroma will embed this for you
    n_results=2, # how many results to return
    include=["embeddings", "documents", "metadatas", "distances"], # also return the embedding vectors
)
results

# The query takes a list of query texts,
# and Chroma returns the n_results most similar documents for each one.

{'ids': [['id1', 'id3']],
 'embeddings': [array([[-3.44773121e-02,  3.10231913e-02,  6.73492113e-03,
           2.61089392e-02, -3.93620469e-02, -1.60302490e-01,
           6.69240132e-02, -6.44139666e-03, -4.74505313e-02,
           1.47588924e-02,  7.08753988e-02,  5.55276386e-02,
           1.91933122e-02, -2.62512993e-02, -1.01095960e-02,
          -2.69405544e-02,  2.23073903e-02, -2.22266335e-02,
          -1.49692684e-01, -1.74930319e-02,  7.67623214e-03,
           5.43523543e-02,  3.25441011e-03,  3.17259766e-02,
          -8.46214294e-02, -2.94059888e-02,  5.15955910e-02,
           4.81240265e-02, -3.31475399e-03, -5.82791269e-02,
           4.19692621e-02,  2.22107004e-02,  1.28188923e-01,
          -2.23389398e-02, -1.16563467e-02,  6.29283637e-02,
          -3.28763351e-02, -9.12261009e-02, -3.11751850e-02,
           5.26994802e-02,  4.70348373e-02, -8.42030197e-02,
          -3.00562177e-02, -2.07448807e-02,  9.51771997e-03,
          -3.72177758e-03,  7.34329922e-03,  
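
Each field of the result holds one inner list per query text, which is why `ids` above is `[['id1', 'id3']]`. The sketch below pairs up the fields for a single query. `rows_for_query` is a hypothetical helper, and `example_results` is a hand-written stand-in whose distances are made up.

```python
def rows_for_query(results, query_index=0):
    """Pair up ids, documents, and distances for one query text."""
    return list(zip(
        results["ids"][query_index],
        results["documents"][query_index],
        results["distances"][query_index],
    ))


# Hand-written stand-in for a query result; the distances are made up.
example_results = {
    "ids": [["id1", "id3"]],
    "documents": [["Hello world", "This latest generation ..."]],
    "distances": [[1.5, 1.75]],
}

for doc_id, text, distance in rows_for_query(example_results):
    print(doc_id, distance, text[:30])
```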

In [35]:
client.heartbeat() 
# Returns the current time in nanoseconds as a heartbeat,
# confirming that the client is connected to ChromaDB.

1749036695219047600
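
The raw number is easier to read as a timestamp. This sketch assumes that the heartbeat is a count of nanoseconds since the Unix epoch.

```python
from datetime import datetime, timezone

heartbeat_ns = 1749036695219047600  # value returned by client.heartbeat() above

# Integer division avoids float rounding on such a large number.
seconds, _ = divmod(heartbeat_ns, 1_000_000_000)
heartbeat_time = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(heartbeat_time.isoformat())  # 2025-06-04T11:31:35+00:00
```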

In [None]:
client.reset()
# Empties and completely resets the database. This cannot be undone.
# Reset must first be enabled in the client settings (allow_reset=True).

In [37]:
client.delete_collection(name="exerciseembeddings")
# Deletes a collection and all associated embeddings, documents, and metadata.

In [41]:
all_names = [col.name for col in client.list_collections()]

if "exerciseembeddings" not in all_names:
    print("exerciseembeddings not found")
else:
    print("exerciseembeddings still exists")

# Builds a list of collection names to check that the old collection is gone.

exerciseembeddings not found


Sources:

https://docs.trychroma.com/docs/overview/getting-started

https://docs.trychroma.com/docs/run-chroma/persistent-client
