## Importing Data

This notebook contains two examples.

***

The first example shows how to load data that uses Chroma's default embedding function (SentenceTransformers).

In [4]:
! pip install chromadb --quiet
! pip install chroma_datasets --quiet

In [5]:
import chromadb
from chromadb.utils import embedding_functions
from chroma_datasets import StateOfTheUnion
from chroma_datasets.utils import import_into_chroma

chroma_client = chromadb.Client()
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="API_KEY",
    model_name="text-embedding-ada-002"
)
collection = import_into_chroma(chroma_client=chroma_client, dataset=StateOfTheUnion, embedding_function=openai_ef)
result = collection.query(query_texts=["The United States of America"], n_results=1)
print(result)

Loaded 42 documents into the collection named: StateOfTheUnion
{'ids': [['41']], 'embeddings': None, 'documents': [['We are the only nation on Earth that has always turned every crisis we have faced into an opportunity. \n \n The only nation that can be defined by a single word: possibilities. \n \n So on this night, in our 245th year as a nation, I have come to report on the State of the Union. \n \n And my report is this: the State of the Union is strong—because you, the American people, are strong. \n \n We are stronger today than we were a year ago. \n \n And we will be stronger a year from now than we are today. \n \n Now is our moment to meet and overcome the challenges of our time. \n \n And we will, as one people. \n \n One America. \n \n The United States of America. \n \n May God bless you all. May God protect our troops.']], 'metadatas': [[{'source': 'sotu'}]], 'distances': [[0.3151229918003082]]}
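
The query result printed above is a dictionary of parallel nested lists: the outer list has one entry per query text, and each inner list has one entry per returned match. A minimal sketch of flattening that shape into one record per match follows. `flatten_query_result` is a hypothetical helper name (it is not part of chromadb), and the sample dictionary is a shortened copy of the output above with the document text truncated.

```python
def flatten_query_result(result):
    """Turn the nested query result into one list of hit records per query text."""
    all_hits = []
    for q, ids in enumerate(result["ids"]):
        hits = []
        for i, doc_id in enumerate(ids):
            hits.append({
                "id": doc_id,
                "document": result["documents"][q][i],
                "metadata": result["metadatas"][q][i],
                "distance": result["distances"][q][i],
            })
        all_hits.append(hits)
    return all_hits


# Shortened copy of the printed result (document text truncated for brevity).
sample = {
    "ids": [["41"]],
    "embeddings": None,
    "documents": [["We are the only nation on Earth ..."]],
    "metadatas": [[{"source": "sotu"}]],
    "distances": [[0.3151229918003082]],
}

hits = flatten_query_result(sample)
best = hits[0][0]  # first (and only) match for the first query text
print(best["id"], best["metadata"]["source"], round(best["distance"], 3))
```

With `n_results=1` and a single query text, `hits[0][0]` is the only match; larger values of `n_results` would simply add more records to the inner list.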


The second example shows how to load data that was embedded with OpenAI embeddings. This requires passing an `OpenAIEmbeddingFunction`, because the collection must be configured with your API key before you can use and query it.

In [7]:
import chromadb
from chromadb.utils import embedding_functions
from chroma_datasets.utils import import_into_chroma
from chroma_datasets import PaulGrahamEssay

chroma_client = chromadb.Client()
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="API_KEY",
    model_name="text-embedding-ada-002"
)
pg_coll = import_into_chroma(chroma_client=chroma_client, dataset=PaulGrahamEssay, embedding_function=openai_ef)
print(pg_coll.count())


Found cached dataset parquet (/Users/jeff/.cache/huggingface/datasets/chromadb___parquet/chromadb--paul_graham_essay-93509c9cb92f393a/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7)


Loaded 104 documents into the collection named: PaulGrahamEssay
104
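
Both cells above pass the literal placeholder `"API_KEY"`. One way to keep a real key out of the notebook is to read it from an environment variable. The sketch below assumes the key is stored in `OPENAI_API_KEY`, which is a common convention rather than something this notebook specifies; `load_api_key` is a hypothetical helper, not part of chromadb or chroma_datasets.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY", environ=None):
    """Return the key stored in `env_var`, or raise if it is missing or blank."""
    # `environ` can be swapped for a plain dict in tests; defaults to the real environment.
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running this cell.")
    return key


# The returned value would replace the "API_KEY" placeholder, for example:
# openai_ef = embedding_functions.OpenAIEmbeddingFunction(
#     api_key=load_api_key(),
#     model_name="text-embedding-ada-002",
# )
```

Failing early with a clear message is a design choice here: a missing key otherwise tends to surface later as an authentication error at import or query time.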
