## Importing Data

This notebook contains two examples.

***

The first example shows how to load data that uses Chroma's default embedding function (SentenceTransformers).

In [None]:
! pip install chromadb chroma_datasets --quiet

In [3]:
import chromadb
from chroma_datasets import StateOfTheUnion
from chroma_datasets.utils import import_into_chroma

# In-memory client; no embedding function is passed, so Chroma's default is used
chroma_client = chromadb.Client()
collection = import_into_chroma(chroma_client=chroma_client, dataset=StateOfTheUnion)

# Return the single document closest to the query text
result = collection.query(query_texts=["The United States of America"], n_results=1)
print(result)

Loaded 41 documents into the collection named: StateOfTheUnion
{'ids': [['40']], 'embeddings': None, 'documents': [['Now is our moment to meet and overcome the challenges of our time.\nAnd we will, as one people.\nOne America.\nThe United States of America.\nMay God bless you all. May God protect our troops.']], 'metadatas': [[None]], 'distances': [[1.186147928237915]]}
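The query result is a dictionary of nested lists: the outer list has one entry per string in `query_texts`, and the inner list holds up to `n_results` hits, ordered from closest to farthest (smaller distance means more similar). The helper below is a small sketch, not part of Chroma, that flattens one query's hits into `(id, distance, document)` tuples; `top_hits` is a hypothetical name, and the dictionary mirrors the shape of the output above with the document text shortened.

```python
from typing import Any


# Hypothetical helper for illustration; not part of Chroma's API.
def top_hits(result: dict[str, Any], query_index: int = 0) -> list[tuple[str, float, str]]:
    """Flatten the hits for one query text into (id, distance, document) tuples.

    Each field is nested as result[field][query_index][rank], so zipping the
    inner lists lines up the id, distance and document of each ranked hit.
    """
    ids = result["ids"][query_index]
    distances = result["distances"][query_index]
    documents = result["documents"][query_index]
    return list(zip(ids, distances, documents))


# Same shape as the printed result above, with the document text shortened.
result = {
    "ids": [["40"]],
    "embeddings": None,
    "documents": [["Now is our moment to meet and overcome the challenges of our time. ..."]],
    "metadatas": [[None]],
    "distances": [[1.186147928237915]],
}

for doc_id, distance, document in top_hits(result):
    print(f"{doc_id}\t{distance:.3f}\t{document[:40]}")
```

With more than one query text, pass `query_index=1`, `2`, and so on to read the hits for each query in turn.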


The second example shows how to load data that was embedded with OpenAI embeddings. Here you must pass an `OpenAIEmbeddingFunction` configured with your OpenAI API key, because querying the collection requires embedding the query text with the same model that produced the stored embeddings.

In [4]:
import os

import chromadb
from chromadb.utils import embedding_functions
from chroma_datasets import PaulGrahamEssay
from chroma_datasets.utils import import_into_chroma

chroma_client = chromadb.Client()

# Queries must be embedded with the same model as the stored documents,
# so the collection is opened with an OpenAI embedding function.
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],  # read the key from the environment instead of hard-coding it
    model_name="text-embedding-ada-002"
)
paul_graham_coll = import_into_chroma(chroma_client=chroma_client, dataset=PaulGrahamEssay, embedding_function=openai_ef)
print(paul_graham_coll.count())


Loaded 104 documents into the collection named: PaulGrahamEssay
104
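Why must queries use the same embedding function as the stored data? Search works by comparing the query's vector against every stored vector, so both must come from the same model. The sketch below is a simplified, hypothetical stand-in: brute-force search with squared Euclidean distance (which, as far as we know, matches Chroma's default `l2` setting) over toy 3-dimensional vectors, whereas Chroma itself uses an approximate HNSW index. `squared_l2` and `nearest` are illustrative names, not Chroma APIs.

```python
# Hypothetical helpers for illustration; not part of Chroma's API.


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance; raises if the vectors have different lengths."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query: list[float], stored: dict[str, list[float]], n_results: int = 1) -> list[tuple[str, float]]:
    """Brute-force top-n search: score every stored vector and keep the closest ones."""
    scored = [(doc_id, squared_l2(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:n_results]


# Toy 3-dimensional "embeddings" standing in for vectors produced by one model.
stored = {"a": [0.1, 0.2, 0.3], "b": [0.9, 0.8, 0.7]}
print(nearest([0.1, 0.25, 0.3], stored))

try:
    # A query embedded by a different (here 2-dimensional) model cannot be compared.
    nearest([0.1, 0.2], stored)
except ValueError as err:
    print(err)
```

A dimension mismatch, such as 1536-dimensional `text-embedding-ada-002` vectors against the 384-dimensional vectors of the `all-MiniLM-L6-v2` model behind Chroma's default, is the loud failure. Vectors of equal length from two different models fail silently instead: the search still runs, but the distances are meaningless.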
