Add OpenAI API compatible assistant #424
Conversation
Thanks Nick for the PR. As much as I would like to have this generic assistant, it won't work as implemented right now. I've added a few comments below indicating places that would need to be made more flexible to serve "all" OpenAI-compatible APIs.

My suggestion would be to create an `OpenAIAPICompatible` base class with a public method `make_request` that can be overridden by subclasses. Plus, we also need to handle the different streaming methods.
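A rough sketch of what that could look like, assuming hypothetical hook names (`_url`, `_json`) and a simplified signature; none of this is existing ragna API:

```python
import json
from typing import Any, AsyncIterator

import httpx
import httpx_sse


class OpenAIAPICompatible:
    """Hypothetical base class; subclasses override the hooks below."""

    def __init__(self, base_url: str) -> None:
        self._base_url = base_url
        self._client = httpx.AsyncClient()

    def _url(self) -> str:
        # Overridable, e.g. for Azure's deployment-based URL scheme.
        return f"{self._base_url}/v1/chat/completions"

    def _json(self, prompt: str, *, max_new_tokens: int) -> dict[str, Any]:
        # Overridable, e.g. to add the "model" field most backends require.
        return {
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.0,
            "max_tokens": max_new_tokens,
            "stream": True,
        }

    async def make_request(
        self, prompt: str, *, max_new_tokens: int
    ) -> AsyncIterator[str]:
        # Default implementation: OpenAI-style SSE streaming. Backends that
        # stream differently (e.g. Ollama) would override this method.
        async with httpx_sse.aconnect_sse(
            self._client,
            "POST",
            self._url(),
            json=self._json(prompt, max_new_tokens=max_new_tokens),
        ) as event_source:
            async for sse in event_source.aiter_sse():
                if sse.data == "[DONE]":
                    break
                data = json.loads(sse.data)
                yield data["choices"][0]["delta"].get("content", "")
```

A subclass for, say, vLLM would then only need to override `_json` to add the model name, while an Azure subclass would only override `_url`.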
```python
    ) -> AsyncIterator[str]:
        import httpx_sse

        async with httpx_sse.aconnect_sse(
```
Not all APIs use SSE for streaming, e.g. Ollama (#376). Meaning, although the request is the same, the response comes back in a different protocol and has to be handled accordingly.
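For illustration, a sketch of what handling Ollama's protocol could look like, assuming its documented newline-delimited JSON streaming (one JSON object per line, with a `done` flag); the function name is made up:

```python
import json
from typing import AsyncIterator

import httpx


async def stream_jsonl(
    client: httpx.AsyncClient, url: str, payload: dict
) -> AsyncIterator[str]:
    # Ollama streams one JSON object per line rather than SSE events.
    async with client.stream("POST", url, json=payload) as response:
        async for line in response.aiter_lines():
            if not line.strip():
                continue
            chunk = json.loads(line)
            if chunk.get("done"):
                break
            yield chunk["message"]["content"]
```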
Given that we already have this logic (work by you!) for the Google assistant, can we just re-use it here?
If so, perhaps just a separate block of code, and another environment variable to control whether this block is executed?
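If that route is taken, the toggle itself could be as small as the following (the variable name is purely illustrative and does not exist in ragna):

```python
import os


def use_jsonl_streaming() -> bool:
    # Opt-in switch for backends like Ollama that do not stream via SSE.
    return os.environ.get("RAGNA_STREAM_PROTOCOL", "sse") == "jsonl"
```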
"temperature": 0.0, | ||
"max_tokens": max_new_tokens, | ||
"stream": True, |
All of OpenAI, Ollama, and vLLM need to be passed a model here as well.
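For example, the payload could grow a `model` field; how the model name would be configured in ragna is left open here:

```python
from typing import Any


def build_payload(prompt: str, model: str, max_new_tokens: int) -> dict[str, Any]:
    # "model" is required by OpenAI, Ollama, and vLLM; per the discussion
    # in this thread, llamafile works without it.
    return {
        "messages": [{"role": "user", "content": prompt}],
        "model": model,
        "temperature": 0.0,
        "max_tokens": max_new_tokens,
        "stream": True,
    }
```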
```python
        async with httpx_sse.aconnect_sse(
            self._client,
            "POST",
            f"{self._base_url}/v1/chat/completions",
```
Azure OpenAI needs a different scheme:

```
$AZURE_OPENAI_ENDPOINT/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-02-01
```
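A sketch of building that URL, assuming the endpoint comes from the conventional `AZURE_OPENAI_ENDPOINT` environment variable and reusing the `api-version` from the example above:

```python
import os


def azure_chat_completions_url(deployment: str) -> str:
    # `deployment` is the Azure deployment name, e.g. "gpt-35-turbo".
    endpoint = os.environ["AZURE_OPENAI_ENDPOINT"].rstrip("/")
    return (
        f"{endpoint}/openai/deployments/{deployment}"
        "/chat/completions?api-version=2024-02-01"
    )
```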
I don't think that Azure OpenAI is compatible with the OpenAI API. Granted, there is a strong connection between the two, and there may be an argument for special-casing Azure here, but I think that is a separate issue.
It is compliant. You just need to use a different URL, as posted above. The `gpt-35-turbo` in there is the name of the deployment, i.e. the model name. Meaning, similar to the llamafile approach in this PR, you don't pass a model in the request (see #424 (comment)). Apart from that, this is exactly the same API as the public OpenAI one.
> It is compliant.

My understanding was that it is not compliant. Can you point me to some resources on this, please?

As a quick check, I went to the OpenAPI spec for Azure, and I saw this example for the `/chat/completions` endpoint, which appears not to be compliant. A quick skim of the Azure API documentation also turned up several other examples that seem non-compliant.
> My understanding was that it is not compliant. Can you point me to some resources on this, please?

I can't. We have used it in an environment that I cannot share publicly.

> As a quick check, I went to the OpenAPI spec for Azure, and I saw this example for the `/chat/completions` endpoint, which appears not to be compliant.

IIUC, these are just extra parameters that Azure OpenAI supports but OpenAI does not. Meaning, Azure OpenAI seems to be a superset of the public API. I can attest to the fact that you do not need to set any of these parameters to use it.

> A quick skim of the Azure API documentation also turned up several other examples that seem non-compliant.

I did not see anything that would make it non-compliant. Could you share the specific section / example that you are referring to?
> Meaning, Azure OpenAI seems to be a superset of the public API.

Maybe this reduces to me having a different (incorrect?) definition of what a compliant API is, then. I would have assumed that a superset automatically made it non-compliant.

> I did not see anything that would make it non-compliant.

They had stuff related to an `operations` endpoint that I didn't recognise on a quick glance. Having a version number as a query parameter also seemed like it could create compatibility problems, unless things were deprecated/added at the same time for both API endpoints, which they aren't. But again, maybe I have a different or incorrect understanding of what compatibility means here.
Superseded by #425
PoC - do not merge.
Details
```shell
export BASE_URL=http://localhost:8080
```

(or wherever your model is available locally), then add

```toml
assistants = [
    "ragna.assistants.OpenAIApiCompatible",
]
```

to `ragna.toml`.