Issue Description:
The endpoint http://localhost:8000/v1/chat/completions is failing to return a proper response when integrated with third-party AI chat frontends such as Chatbox and OpenCat. These applications are compatible with OpenAI endpoints and function correctly with other OpenAI-compatible endpoints, such as Ollama.
Operating System:
Python Version:
Affected Applications:
Expected Behavior:
When using the endpoint with the third-party AI chat applications, the prompt should be sent to the server, processed correctly, and the response should be returned and displayed within the frontend application.
Observed Behavior:
- The third-party application sends a request to the endpoint, and the request is received and processed correctly by the optillm.py server (e.g., using the re2-gpt4o-mini model).
- The server generates a response, but the calling app does not receive the response text in the reply to its POST request.
- The chat output appears empty within the Chatbox and OpenCat applications, even though the server logs indicate that the response was generated.
Steps to Reproduce:
- Set up a local server with the endpoint
http://localhost:8000/v1/chat/completions.
- Integrate with a third-party AI chat frontend like Chatbox or OpenCat (configured for OpenAI-compatible endpoints).
- Submit a prompt via the application.
- Observe the server logs, showing the request is processed correctly.
- The response is not displayed within the frontend application (empty output).
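To isolate whether the problem is on the server side or in the frontend, the endpoint can be queried directly, bypassing Chatbox/OpenCat. A minimal stdlib-only sketch (URL and model name taken from this report; the payload shape is what an OpenAI-compatible client would send):

```python
import json
import urllib.request

URL = "http://localhost:8000/v1/chat/completions"

# Same request shape an OpenAI-compatible frontend would send.
PAYLOAD = {
    "model": "re2-gpt4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
}

def query(url: str = URL, payload: dict = PAYLOAD) -> str:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    # OpenAI-compatible clients read the reply text from this path.
    return data["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(query())
```

If this script prints the reply text but the frontends still show an empty chat, the problem is likely in response metadata or framing rather than the content itself.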
Additional Information:
- The issue does not occur when using the Ollama endpoint, which works fine in the same applications.
- The server does return a response, as seen in the terminal, but the response is not visible in the frontend chat UI.
Possible Causes:
- There may be an issue with how the response is being returned via POST.
- The format of the response may not match exactly what the third-party applications expect; in theory it should be an OpenAI-compatible chat completion response.
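For reference, the response body that OpenAI-compatible clients expect has roughly the shape below. The field values here are illustrative, but `choices[0].message.content` is where clients like Chatbox and OpenCat read the reply text, so a mismatch on that path would produce exactly the empty output described:

```python
import time

# Minimal OpenAI-style chat completion response (field values illustrative).
response = {
    "id": "chatcmpl-123",             # arbitrary example id
    "object": "chat.completion",
    "created": int(time.time()),
    "model": "re2-gpt4o-mini",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3},
}

# Clients typically render nothing if this path is missing or empty.
print(response["choices"][0]["message"]["content"])
```

One related thing worth checking: if the frontend sends `"stream": true`, it expects a Server-Sent Events stream of chunks rather than a single JSON body; returning a non-streamed response to a streaming request is a plausible cause of blank output in such clients.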