# Chroma 🦜

## Installation

```bash
pip install chromadb
```

## Import

```py
import chromadb
```

```py
# Create an in-memory (non-persistent) Chroma client
chroma_client = chromadb.Client()
```

```
Using embedded DuckDB without persistence: data will be transient
```

## Create a collection

Collections are where you'll store your embeddings, documents, and any additional metadata. You can create a collection with a name:

```py
collection = chroma_client.create_collection(name="my_collection")
```

```
No embedding_function provided, using default embedding function: SentenceTransformerEmbeddingFunction
```


## Add some text documents to the collection

- Chroma will store your text and handle tokenization, embedding, and indexing automatically:

```py
collection.add(
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"]
)
```
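The `ids`, `documents`, and `metadatas` arguments are parallel lists: entry *i* of each describes the same record, and each id should be unique so it identifies exactly one record. The helper below is a hypothetical, pure-Python sketch of that contract (`check_add_batch` is not part of Chroma's API; Chroma performs its own validation), which can be handy for catching a misaligned batch before calling `collection.add`.

```py
from typing import Any


def check_add_batch(ids: list[str], documents: list[str],
                    metadatas: list[dict[str, Any]]) -> list[tuple[str, str, dict[str, Any]]]:
    """Hypothetical pre-flight check mirroring the shape collection.add() expects."""
    # The lists are positional: entry i of each one describes the same record.
    if not (len(ids) == len(documents) == len(metadatas)):
        raise ValueError(
            f"length mismatch: {len(ids)} ids, {len(documents)} documents, "
            f"{len(metadatas)} metadatas"
        )
    # Each id must be unique so it later refers to exactly one record.
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate ids in batch")
    # Zip the lists into one tuple per record.
    return list(zip(ids, documents, metadatas))


records = check_add_batch(
    ids=["id1", "id2"],
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
)
print(records[1])  # ('id2', 'This is another document', {'source': 'my_source'})
```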

---

- If you have already generated embeddings yourself, you can pass them in directly with the `embeddings` argument:

```py
collection.add(
    embeddings=[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"]
)
```
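When you supply embeddings yourself, every vector stored in one collection needs the same length, because the underlying vector index is built for a fixed dimensionality. The 3-dimensional vectors above are toy values; a real embedding model (including the default one) produces much longer vectors, so hand-made and model-made embeddings should not be mixed in the same collection. The hypothetical helper below sketches the per-batch check in plain Python; it is an illustration, not Chroma's own validation code.

```py
def check_embedding_dims(embeddings: list[list[float]]) -> int:
    """Hypothetical check that all vectors in a batch share one dimensionality."""
    if not embeddings:
        raise ValueError("no embeddings given")
    dim = len(embeddings[0])
    for i, vec in enumerate(embeddings):
        # Any vector whose length differs from the first one is rejected.
        if len(vec) != dim:
            raise ValueError(f"embedding {i} has {len(vec)} dims, expected {dim}")
    return dim


print(check_embedding_dims([[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]]))  # 3
```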

```py
collection.add(
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"]
)
```

## Query the collection

You can query the collection with a list of query texts, and Chroma will return the `n_results` most similar results for each query text.
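Conceptually, a query embeds each query text with the collection's embedding function, measures its distance to every stored embedding, and keeps the `n_results` closest. The sketch below does this by brute force over toy 3-dimensional vectors, using squared Euclidean distance, which is assumed here to stand in for Chroma's default `l2` distance. Chroma itself searches an approximate nearest-neighbour (HNSW) index rather than scanning linearly. The function names `squared_l2` and `brute_force_query` and the query vector are hypothetical; the returned dict only mimics the nested, one-inner-list-per-query shape of `collection.query` results.

```py
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_query(query_embeddings: list[list[float]], ids: list[str],
                      embeddings: list[list[float]], documents: list[str],
                      n_results: int) -> dict[str, list[list]]:
    """Hypothetical linear-scan search that returns a Chroma-shaped result dict."""
    result: dict[str, list[list]] = {"ids": [], "documents": [], "distances": []}
    for q in query_embeddings:
        # Score every stored vector against this query; keep the n_results closest.
        scored = sorted(
            (squared_l2(q, emb), id_, doc)
            for id_, emb, doc in zip(ids, embeddings, documents)
        )[:n_results]
        # Append one inner list per query, ordered nearest first.
        result["ids"].append([id_ for _, id_, _ in scored])
        result["documents"].append([doc for _, _, doc in scored])
        result["distances"].append([dist for dist, _, _ in scored])
    return result


# Toy vectors only; real embeddings come from an embedding model.
res = brute_force_query(
    query_embeddings=[[6.0, 8.0, 9.0]],
    ids=["id1", "id2"],
    embeddings=[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
    documents=["This is a document", "This is another document"],
    n_results=2,
)
print(res["ids"])  # [['id2', 'id1']]
```

A linear scan costs one distance computation per stored vector, which is why a real vector store swaps it for an approximate index once collections grow large.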

```py
import json

# Ask for the 2 stored documents closest to a single query text
results = collection.query(
    query_texts=["My another document is a good document"],
    n_results=2
)

# Pretty-print the result dict as indented JSON
print(json.dumps(results, indent=2))
```

```json
{
  "ids": [
    [
      "id2",
      "id1"
    ]
  ],
  "embeddings": null,
  "documents": [
    [
      "This is another document",
      "This is a document"
    ]
  ],
  "metadatas": [
    [
      {
        "source": "my_source"
      },
      {
        "source": "my_source"
      }
    ]
  ],
  "distances": [
    [
      0.6579898595809937,
      0.802357017993927
    ]
  ]
}
```
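Each field in the result is a list of lists: the outer list has one entry per query text, and the inner list holds that query's matches ordered nearest first, so a smaller distance means a closer match. The snippet below walks the matches for the first (and only) query. It hard-codes a copy of the printed output so it runs on its own; in the notebook you would use `results` directly.

```py
# Copy of the printed output ("embeddings": null becomes None in Python).
results = {
    "ids": [["id2", "id1"]],
    "embeddings": None,
    "documents": [["This is another document", "This is a document"]],
    "metadatas": [[{"source": "my_source"}, {"source": "my_source"}]],
    "distances": [[0.6579898595809937, 0.802357017993927]],
}

# Index 0 selects the inner lists belonging to the first query text.
matches = list(zip(results["ids"][0], results["documents"][0],
                   results["metadatas"][0], results["distances"][0]))

for rank, (id_, doc, meta, dist) in enumerate(matches, start=1):
    print(f"{rank}. {id_} ({dist:.3f}) {doc!r} from {meta['source']}")
# 1. id2 (0.658) 'This is another document' from my_source
# 2. id1 (0.802) 'This is a document' from my_source
```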
