# Chroma

Chroma aims to make it easier to build large language model (LLM) applications by making the knowledge, facts, and skills captured in real-world documents easy to plug into an LLM.

Chroma provides tools to:

- Store embeddings and their metadata, alongside the source documents
- Embed documents and queries
- Search embeddings
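
To make the three steps concrete, here is a minimal pure-Python sketch of store, embed, and search. It is a toy illustration, not how Chroma is implemented: `toy_embed`, `squared_l2`, the three-word vocabulary, and the sample texts are all hypothetical stand-ins for a real embedding model and index.

```python
from collections import Counter


def toy_embed(text, vocabulary):
    """Hypothetical embedding: count how often each vocabulary word occurs."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


vocabulary = ["cat", "dog", "fish"]  # assumed toy vocabulary

# 1. Store: keep each embedding next to its metadata, keyed by document id.
store = {}
for doc_id, text, source in [
    ("doc1", "cat cat dog", "notion"),
    ("doc2", "fish fish fish", "google-docs"),
]:
    store[doc_id] = {
        "embedding": toy_embed(text, vocabulary),
        "metadata": {"source": source},
    }

# 2. Embed: the query goes through the same embedding function as the documents.
query_embedding = toy_embed("cat dog", vocabulary)

# 3. Search: rank stored ids by distance to the query, closest first.
ranked = sorted(
    store,
    key=lambda doc_id: squared_l2(store[doc_id]["embedding"], query_embedding),
)
print(ranked)
```

A real vector database replaces the word counts with a learned embedding model and the full sort with an approximate nearest-neighbor index, but the data flow is the same.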

Chroma's design prioritizes:

- Simplicity and developer productivity
- Analysis on top of search
- Speed: it also happens to be very quick

In [1]:
import chromadb
# setup Chroma in-memory, for easy prototyping. Can add persistence easily!
client = chromadb.Client()

# Create collection. get_collection, get_or_create_collection, delete_collection also available!
collection = client.create_collection("all-my-documents")

# Add docs to the collection. Can also update and delete. Row-based API coming soon!
collection.add(
    documents=["This is document1", "This is document2"], # we handle tokenization, embedding, and indexing automatically. You can skip that and add your own embeddings as well
    metadatas=[{"source": "notion"}, {"source": "google-docs"}], # filter on these!
    ids=["doc1", "doc2"], # unique for each doc
)

# Query/search 2 most similar results. You can also .get by id
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2,
    # where={"metadata_field": "is_equal_to_this"}, # optional filter
    # where_document={"$contains":"search_string"}  # optional filter
)

In [2]:
print(results)

{'ids': [['doc1', 'doc2']], 'embeddings': None, 'documents': [['This is document1', 'This is document2']], 'uris': None, 'data': None, 'metadatas': [[{'source': 'notion'}, {'source': 'google-docs'}]], 'distances': [[0.902620792388916, 1.0357502698898315]], 'included': [<IncludeEnum.distances: 'distances'>, <IncludeEnum.documents: 'documents'>, <IncludeEnum.metadatas: 'metadatas'>]}
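
The result is a dictionary of parallel lists: each populated field holds one inner list per query text, and the entries in an inner list line up by position. The sketch below copies the relevant fields from the printed output and flattens the hits for the first (and only) query. The `hits` structure and the `query_index` variable are illustrative names, not part of Chroma.

```python
# Fields copied from the printed result above; the fields that are None are omitted.
results = {
    "ids": [["doc1", "doc2"]],
    "documents": [["This is document1", "This is document2"]],
    "metadatas": [[{"source": "notion"}, {"source": "google-docs"}]],
    "distances": [[0.902620792388916, 1.0357502698898315]],
}

query_index = 0  # one query text was sent, so its hits are at index 0

# Zip the parallel inner lists into one record per hit.
hits = [
    {"id": doc_id, "document": document, "source": metadata["source"], "distance": distance}
    for doc_id, document, metadata, distance in zip(
        results["ids"][query_index],
        results["documents"][query_index],
        results["metadatas"][query_index],
        results["distances"][query_index],
    )
]

for hit in hits:
    print(f'{hit["id"]}  {hit["distance"]:.4f}  {hit["source"]}  {hit["document"]}')
```

A smaller distance means a closer match, so `doc1` is the better hit for this query.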


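### **Collection**

The collection is a core concept in Chroma. The code and comments below briefly introduce its main features and how to use it.

Get the collection named "test" from the client object. If the collection does not exist, an exception is raised.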

In [3]:
collection = client.get_collection(name="test") 

InvalidCollectionException: Collection test does not exist.

Some commonly used functions:

In [ ]:
collection.delete()