
Add OpenAI API compatible assistant #424

Closed

Conversation

@nenb (Contributor) commented May 24, 2024

PoC - do not merge.

Details

  1. Download https://huggingface.co/Mozilla/Mistral-7B-Instruct-v0.2-llamafile/blob/main/mistral-7b-instruct-v0.2.Q5_K_M.llamafile (or host your own model locally)
  2. export BASE_URL=http://localhost:8080 (or wherever your model is available locally)
  3. Add assistants = [ "ragna.assistants.OpenAIApiCompatible", ] to ragna.toml
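
For reference, the whole setup might look something like this (a sketch only: the llamafile flags may differ per version, and the ragna.toml line is the only config change needed):

    # start the llamafile server locally
    chmod +x mistral-7b-instruct-v0.2.Q5_K_M.llamafile
    ./mistral-7b-instruct-v0.2.Q5_K_M.llamafile --server --port 8080

    # point the assistant at it
    export BASE_URL=http://localhost:8080

    # ragna.toml (relevant line only)
    assistants = ["ragna.assistants.OpenAIApiCompatible"]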

@pmeier (Member) left a comment

Thanks Nick for the PR. As much as I would like to have this generic assistant, it won't work as implemented right now. I've added a few comments below indicating places that would need to be made more flexible to serve "all" OpenAI-compatible APIs.

My suggestion would be to create an OpenAIAPICompatible base class with a public method make_request that can be overridden by subclasses. We also need to handle the different streaming protocols.
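
Roughly what I have in mind (a rough sketch, not Ragna's actual internals; only the make_request idea comes from the suggestion above, everything else is a placeholder):

    from typing import AsyncIterator

    import httpx


    class OpenAIAPICompatible:
        # Sketch of the suggested base class. Subclasses override
        # make_request (and, where necessary, the streaming handling)
        # to adapt to a specific provider.

        def __init__(self, base_url: str) -> None:
            self._base_url = base_url
            self._client = httpx.AsyncClient()

        def make_request(self, prompt: str, max_new_tokens: int) -> dict:
            # Default: plain OpenAI-style endpoint and payload. Azure would
            # override the URL; OpenAI/Ollama/vLLM would add a "model" field.
            return dict(
                method="POST",
                url=f"{self._base_url}/v1/chat/completions",
                json={
                    "messages": [{"role": "user", "content": prompt}],
                    "temperature": 0.0,
                    "max_tokens": max_new_tokens,
                    "stream": True,
                },
            )

        async def answer(
            self, prompt: str, max_new_tokens: int
        ) -> AsyncIterator[str]:
            # Streaming differs per provider (SSE vs. line-delimited JSON),
            # so this would also need a provider-specific hook.
            raise NotImplementedError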

    ) -> AsyncIterator[str]:
        import httpx_sse

        async with httpx_sse.aconnect_sse(
@pmeier (Member):

Not all APIs use SSE for streaming, e.g. Ollama (#376). Meaning, although the request is the same, the response is coming back in a different protocol and has to be handled accordingly.

@nenb (Contributor, Author):

Given that we already have this logic (work by you!) for the Google assistant, can we just re-use it here?

If so, perhaps it could be just a separate block of code, plus another environment variable to control whether that block is executed? Something like the sketch below.
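
To make that concrete (the environment variable name is made up, and the JSONL branch mirrors how Ollama streams):

    import json
    import os

    import httpx
    import httpx_sse


    async def _stream_chunks(client: httpx.AsyncClient, url: str, payload: dict):
        # Hypothetical switch between the two streaming protocols.
        if os.environ.get("RAGNA_STREAMING_PROTOCOL") == "jsonl":
            # Ollama-style: one JSON object per line.
            async with client.stream("POST", url, json=payload) as response:
                async for line in response.aiter_lines():
                    if line.strip():
                        yield json.loads(line)
        else:
            # OpenAI-style: server-sent events.
            async with httpx_sse.aconnect_sse(
                client, "POST", url, json=payload
            ) as event_source:
                async for sse in event_source.aiter_sse():
                    if sse.data != "[DONE]":
                        yield json.loads(sse.data)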

Comment on lines +60 to +62:

    "temperature": 0.0,
    "max_tokens": max_new_tokens,
    "stream": True,
@pmeier (Member):

All of OpenAI, Ollama, and vLLM also need a model to be passed here.
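
i.e. the payload would need something like this (the model name is just an example):

    {
        "model": "mistral-7b-instruct",  # required by OpenAI, Ollama, and vLLM
        "temperature": 0.0,
        "max_tokens": max_new_tokens,
        "stream": True,
    }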

    async with httpx_sse.aconnect_sse(
        self._client,
        "POST",
        f"{self._base_url}/v1/chat/completions",
@pmeier (Member):

Azure OpenAI needs a different scheme:

    $AZURE_OPENAI_ENDPOINT/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-02-01
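
Side by side, the two URL schemes (deployment name and API version are just examples):

    # standard OpenAI-compatible scheme
    url = f"{base_url}/v1/chat/completions"

    # Azure OpenAI scheme; the deployment name takes the place of the model
    url = (
        f"{azure_endpoint}/openai/deployments/{deployment}"
        "/chat/completions?api-version=2024-02-01"
    )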

@nenb (Contributor, Author):

I don't think that Azure OpenAI is compatible with the OpenAI API. Granted, there is a strong connection between the two, and there may be an argument for special-casing Azure here, but I think that is a separate issue.

@pmeier (Member):

It is compliant. You just need to use a different URL, as posted above. The gpt-35-turbo in there is the name of the deployment, i.e. the model name. Meaning, similar to the llamafile approach in this PR, you don't pass a model in the request (see #424 (comment)). Apart from that, this is exactly the same API as the public OpenAI one.

@nenb (Contributor, Author):

> It is compliant.

My understanding was that it is not compliant. Can you point me to some resources on this please?

As a quick check, I went to the OpenAPI spec for Azure, and I saw this example for the /chat/completions endpoint, which appears not to be compliant. A quick skim of the Azure API documentation also seems to turn up several other examples.

@pmeier (Member):

> My understanding was that it is not compliant. Can you point me to some resources on this please?

I can't. We have used it in an environment that I cannot share publicly.

> As a quick check, I went to the OpenAPI spec for Azure, and I saw this example for the /chat/completions endpoint, which appears not to be compliant.

IIUC, these are just extra parameters that Azure OpenAI supports but OpenAI does not. Meaning, Azure OpenAI seems to be a superset of the public API. I can attest to the fact that you do not need to set any of these parameters to use it.

> A quick skim of the Azure API documentation also seems to turn up several other examples.

I did not see anything that would make it non-compliant. Could you share the specific section / example that you are referring to?

@nenb (Contributor, Author) commented May 28, 2024:

> Meaning, Azure OpenAI seems to be a superset of the public API.

Maybe this reduces to me having a different (incorrect?) definition of what a compliant API is, then. I would have assumed that a superset automatically made it non-compliant.

> I did not see anything that would make it non-compliant.

They had stuff related to an operations endpoint that I didn't recognise on a quick glance. Having a version number as a query parameter also seemed like it could create compatibility problems, unless things were deprecated/added at the same time for both API endpoints, which they aren't. But again, maybe I have a different or incorrect understanding of what compatibility means here.

@nenb (Contributor, Author) commented May 28, 2024

Superseded by #425

@nenb closed this May 28, 2024