Add OpenAI API compatible assistant #424
Conversation
Thanks Nick for the PR. As much as I would like to have this generic assistant, it won't work as implemented right now. I've added a few comments below indicating places that would need to be made more flexible to serve "all" OpenAI-compatible APIs.

My suggestion would be to create an `OpenAIAPICompatible` base class with a public method `make_request` that can be overridden by subclasses. Plus, we also need to handle the different streaming methods.
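A rough sketch of what that could look like, assuming hypothetical hook names (`_url`, `_json`) and a simplified signature; none of this is existing ragna API:

```python
import json
from typing import Any, AsyncIterator

import httpx
import httpx_sse


class OpenAIAPICompatible:
    """Hypothetical base class; subclasses override the hooks below."""

    def __init__(self, base_url: str) -> None:
        self._base_url = base_url
        self._client = httpx.AsyncClient()

    def _url(self) -> str:
        # Overridable, e.g. for Azure's deployment-based URL scheme.
        return f"{self._base_url}/v1/chat/completions"

    def _json(self, prompt: str, *, max_new_tokens: int) -> dict[str, Any]:
        # Overridable, e.g. to add the "model" field most backends require.
        return {
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.0,
            "max_tokens": max_new_tokens,
            "stream": True,
        }

    async def make_request(
        self, prompt: str, *, max_new_tokens: int
    ) -> AsyncIterator[str]:
        # Default implementation: OpenAI-style SSE streaming. Backends that
        # stream differently (e.g. Ollama) would override this method.
        async with httpx_sse.aconnect_sse(
            self._client,
            "POST",
            self._url(),
            json=self._json(prompt, max_new_tokens=max_new_tokens),
        ) as event_source:
            async for sse in event_source.aiter_sse():
                if sse.data == "[DONE]":
                    break
                data = json.loads(sse.data)
                yield data["choices"][0]["delta"].get("content", "")
```

A subclass for, say, vLLM would then only need to override `_json` to add the model name, while an Azure subclass would only override `_url`.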
```python
    ) -> AsyncIterator[str]:
        import httpx_sse

        async with httpx_sse.aconnect_sse(
```
Not all APIs use SSE for streaming, e.g. Ollama (#376). Meaning, although the request is the same, the response comes back in a different protocol and has to be handled accordingly.
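For illustration, a sketch of what handling Ollama's protocol could look like, assuming its documented newline-delimited JSON streaming (one JSON object per line, with a `done` flag); the function name is made up:

```python
import json
from typing import AsyncIterator

import httpx


async def stream_jsonl(
    client: httpx.AsyncClient, url: str, payload: dict
) -> AsyncIterator[str]:
    # Ollama streams one JSON object per line rather than SSE events.
    async with client.stream("POST", url, json=payload) as response:
        async for line in response.aiter_lines():
            if not line.strip():
                continue
            chunk = json.loads(line)
            if chunk.get("done"):
                break
            yield chunk["message"]["content"]
```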
Given that we already have this logic (work by you!) for the Google assistant, can we just re-use it here?
If so, perhaps just a separate block of code, and another environment variable to control whether this block is executed?
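If that route is taken, the toggle itself could be as small as the following (the variable name is purely illustrative and does not exist in ragna):

```python
import os


def use_jsonl_streaming() -> bool:
    # Opt-in switch for backends like Ollama that do not stream via SSE.
    return os.environ.get("RAGNA_STREAM_PROTOCOL", "sse") == "jsonl"
```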
"temperature": 0.0, | ||
"max_tokens": max_new_tokens, | ||
"stream": True, |
All of OpenAI, Ollama, and vLLM need to be passed a model here as well.
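For example, the payload could grow a `model` field; how the model name would be configured in ragna is left open here:

```python
from typing import Any


def build_payload(prompt: str, model: str, max_new_tokens: int) -> dict[str, Any]:
    # "model" is required by OpenAI, Ollama, and vLLM; per the discussion
    # in this thread, llamafile works without it.
    return {
        "messages": [{"role": "user", "content": prompt}],
        "model": model,
        "temperature": 0.0,
        "max_tokens": max_new_tokens,
        "stream": True,
    }
```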
```python
        async with httpx_sse.aconnect_sse(
            self._client,
            "POST",
            f"{self._base_url}/v1/chat/completions",
```
Azure OpenAI needs a different scheme:

```
$AZURE_OPENAI_ENDPOINT/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-02-01
```
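A sketch of building that URL, assuming the endpoint comes from the conventional `AZURE_OPENAI_ENDPOINT` environment variable and reusing the `api-version` from the example above:

```python
import os


def azure_chat_completions_url(deployment: str) -> str:
    # `deployment` is the Azure deployment name, e.g. "gpt-35-turbo".
    endpoint = os.environ["AZURE_OPENAI_ENDPOINT"].rstrip("/")
    return (
        f"{endpoint}/openai/deployments/{deployment}"
        "/chat/completions?api-version=2024-02-01"
    )
```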
I don't think that Azure OpenAI is compatible with the OpenAI API. Granted, there is a strong connection between the two, and there may be an argument for special-casing Azure here, but I think that is a separate issue.
It is compliant. You just need to use a different URL, as posted above. The `gpt-35-turbo` in there is the name of the deployment, i.e. the model name. Meaning, similar to the llamafile approach in this PR, you don't pass a model in the request (see #424 (comment)). Apart from that, this is exactly the same API as the public OpenAI one.
> It is compliant.

My understanding was that it is not compliant. Can you point me to some resources on this, please?

As a quick check, I went to the OpenAPI spec for Azure, and I saw this example for the `/chat/completions` endpoint, which appears not to be compliant. A quick skim of the Azure API documentation also turned up several other examples that seem non-compliant.
> My understanding was that it is not compliant. Can you point me to some resources on this, please?

I can't. We have used it in an environment that I cannot share publicly.

> As a quick check, I went to the OpenAPI spec for Azure, and I saw this example for the `/chat/completions` endpoint, which appears not to be compliant.

IIUC, these are just extra parameters that Azure OpenAI supports but OpenAI does not. Meaning, Azure OpenAI seems to be a superset of the public API. I can attest to the fact that you do not need to set any of these parameters to use it.

> A quick skim of the Azure API documentation also turned up several other examples that seem non-compliant.

I did not see anything that would make it non-compliant. Could you share the specific section / example that you are referring to?
> Meaning, Azure OpenAI seems to be a superset of the public API.

Maybe this reduces to me having a different (incorrect?) definition of what a compliant API is, then. I would have assumed that a superset automatically made it non-compliant.

> I did not see anything that would make it non-compliant.

They had stuff related to an `operations` endpoint that I didn't recognise on a quick glance. Having a version number as a query parameter also seemed like it could create compatibility problems, unless things were deprecated/added at the same time for both API endpoints, which they aren't. But again, maybe I have a different or incorrect understanding of what compatibility means here.
Superseded by #425
PoC - do not merge.
Details
```shell
export BASE_URL=http://localhost:8080
```

(or wherever your model is available locally), then add

```toml
assistants = [
    "ragna.assistants.OpenAIApiCompatible",
]
```

to `ragna.toml`.