Correct Async functionality with AsyncClientAPI and add embedding model support #6660
base: main
Conversation
1. Corrected async usage to use the actual AsyncHttpClient.
2. Added optional argument `embedding_function` to support embedding models other than the default, e.g.:

```python
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

return OpenAIEmbeddingFunction(
    api_key_env_var="AZURE_OPENAI_API_KEY",
    model_name=config.index.config.get("model"),  # type: ignore
    deployment_id=config.index.config.get("azure_deployment"),
    api_base=config.index.config.get("azure_endpoint"),
    api_type="azure",
    api_version=config.index.config.get("api_version"),
)
```
@microsoft-github-policy-service agree
This reverts commit 6b83ebf.
@danielnyari There have been some recent updates to the chromadb example in #6648 that overlap with some of the ideas you have here (e.g., there is now support for an embedding function config) and are now merged into main. Thanks.
```diff
@@ -19,6 +22,7 @@
     OpenAIEmbeddingFunctionConfig,
     PersistentChromaDBVectorMemoryConfig,
     SentenceTransformerEmbeddingFunctionConfig,
+    AzureOpenAIEmbeddingFunctionConfig,
```
Can you comment more on this import?
I was just being a git noob with the rebase. Hope it makes sense now, with the config class committed, but anyway:

I wrote a separate config class for Azure OpenAI because, unfortunately, chromadb shares a single EmbeddingFunction for both OpenAI and Azure OpenAI.

For OpenAI you can just do:

```python
from chromadb.utils import embedding_functions

oai_embed = embedding_functions.OpenAIEmbeddingFunction(
    api_key="api-key",
    model_name="text-embedding-3-small",
)
```

But if you want to use Azure OpenAI you have to:

```python
from chromadb.utils import embedding_functions

oai_embed = embedding_functions.OpenAIEmbeddingFunction(
    api_key="api-key",
    model_name="text-embedding-3-small",
    deployment_id="text-embedding-3-small",
    api_base="https://test-endpoint.openai.azure.com/",
    api_type="azure",
    api_version="2024-12-01-preview",
)
```
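The two call shapes above differ only in the extra Azure fields. As a sketch of how the shared class can be driven from one config, here is a hypothetical helper (the function name, the dict keys, and the assumption that Azure is signalled by `api_type == "azure"` are all illustrative, not part of chromadb's API):

```python
def build_embedding_kwargs(cfg: dict) -> dict:
    """Assemble kwargs for chromadb's shared OpenAIEmbeddingFunction."""
    # Fields common to both OpenAI and Azure OpenAI.
    kwargs = {"api_key": cfg["api_key"], "model_name": cfg["model"]}
    if cfg.get("api_type") == "azure":
        # Azure OpenAI routes through the same class but needs the
        # deployment, endpoint, api_type, and api_version on top.
        kwargs.update(
            deployment_id=cfg["azure_deployment"],
            api_base=cfg["azure_endpoint"],
            api_type="azure",
            api_version=cfg["api_version"],
        )
    return kwargs
```

The resulting dict can then be splatted into `embedding_functions.OpenAIEmbeddingFunction(**kwargs)` for either provider.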
I did not subclass the existing OpenAIEmbeddingFunctionConfig class because it would cause trouble at autogen/python/packages/autogen-ext/src/autogen_ext/memory/chromadb/_chromadb.py, lines 183 to 201 in 593151e:
```python
elif isinstance(config, OpenAIEmbeddingFunctionConfig):
    try:
        return embedding_functions.OpenAIEmbeddingFunction(api_key=config.api_key, model_name=config.model)
    except Exception as e:
        raise ImportError(
            f"Failed to create OpenAI embedding function with model '{config.model}'. "
            f"Ensure openai is installed and API key is valid. Error: {e}"
        ) from e
elif isinstance(config, AzureOpenAIEmbeddingFunctionConfig):
    try:
        return embedding_functions.OpenAIEmbeddingFunction(
            api_key=config.api_key,
            api_type=config.api_type,
            model_name=config.model,
            api_base=config.azure_endpoint,
            deployment_id=config.azure_deployment,
            api_version=config.api_version,
        )
```
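The "trouble" with subclassing can be shown with minimal stand-in classes (hypothetical names, not the real config models): because the `isinstance` check for the OpenAI config runs first, an Azure config that subclassed it would always be caught by the OpenAI branch and never reach the Azure branch.

```python
class OpenAIEmbeddingFunctionConfigStub:
    """Stand-in for the OpenAI config class."""

class AzureSubclassConfigStub(OpenAIEmbeddingFunctionConfigStub):
    """What an Azure config would look like if it subclassed the OpenAI one."""

def dispatch(config) -> str:
    # Mirrors the elif chain above: OpenAI is checked before Azure.
    if isinstance(config, OpenAIEmbeddingFunctionConfigStub):
        return "openai-branch"
    elif isinstance(config, AzureSubclassConfigStub):
        return "azure-branch"
    return "unknown"

# A subclass instance satisfies the parent's isinstance check, so the
# Azure-specific branch is unreachable for it.
print(dispatch(AzureSubclassConfigStub()))  # openai-branch
```

Keeping AzureOpenAIEmbeddingFunctionConfig as a sibling class (rather than a subclass) keeps each branch reachable without reordering the chain.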
```diff
@@ -89,7 +93,7 @@ class ChromaDBVectorMemory(Memory, Component[ChromaDBVectorMemoryConfig]):
     collection_name="multilingual_memory",
     persistence_path=os.path.join(str(Path.home()), ".chromadb_autogen"),
     embedding_function_config=SentenceTransformerEmbeddingFunctionConfig(
-        model_name="paraphrase-multilingual-mpnet-base-v2"
+        model="paraphrase-multilingual-mpnet-base-v2"
```
Why is this argument converted to `model` from `model_name`?
I am aware that this is a breaking change. However, my thought process was the following:

1. There is already a config implementation for the AzureOpenAIChatCompletionClient/OpenAIChatCompletionClient at autogen/python/packages/autogen-ext/src/autogen_ext/models/openai/config/__init__.py, lines 100 to 108 in 6e7415e:

```python
class BaseOpenAIClientConfigurationConfigModel(CreateArgumentsConfigModel):
    model: str
    api_key: SecretStr | None = None
    timeout: float | None = None
    max_retries: int | None = None
    model_capabilities: ModelCapabilities | None = None  # type: ignore
    model_info: ModelInfo | None = None
    add_name_prefixes: bool | None = None
    default_headers: Dict[str, str] | None = None
```

2. I assume that most people who use Azure OpenAI use the same resource for chat completions and embeddings, which means the API key and endpoint will be the same.
3. This makes it easier to share configs if you have a higher-level constructor.
4. If `model` makes more sense for OpenAI/Azure OpenAI, then it should be uniform for all embedding functions.
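The sharing argument in points 2 and 3 can be sketched with plain dataclasses. All names here are illustrative (modeled loosely on the fields of `BaseOpenAIClientConfigurationConfigModel` above), not the real autogen API: one Azure resource config feeds both the chat-completion and embedding configs, and only the uniform `model` field differs per use.

```python
from dataclasses import dataclass

@dataclass
class AzureResource:
    """Shared settings for one Azure OpenAI resource (hypothetical)."""
    api_key: str
    azure_endpoint: str
    api_version: str

def make_config(resource: AzureResource, model: str) -> dict:
    # A higher-level constructor: chat client and embedding function
    # reuse the same resource fields; only `model` varies.
    return {
        "model": model,
        "api_key": resource.api_key,
        "azure_endpoint": resource.azure_endpoint,
        "api_version": resource.api_version,
    }

resource = AzureResource("key", "https://res.openai.azure.com/", "2024-12-01-preview")
chat_cfg = make_config(resource, "gpt-4o")
embed_cfg = make_config(resource, "text-embedding-3-small")
```

With a uniform `model` key, the two configs are interchangeable wherever a model name is expected, which is the consistency point 4 argues for.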
+1. Since this was added recently, hopefully it isn't a big change.
Why are these changes needed?

1. Corrected async usage to use the actual AsyncHttpClient.
2. Added optional argument `embedding_function` to support embedding models other than the default.

Related issue number

Checks