
refactor assistant streaming and create OpenAI compliant base class #425

Merged

pmeier merged 5 commits into main from http-api-assistants on May 28, 2024

Conversation


@pmeier pmeier commented May 27, 2024

This came from an offline discussion with @nenb and supersedes #424. It also paves the way for #375.

The two main changes are:

  1. Factor out the streaming protocol logic, i.e. SSE and JSONL streaming, to avoid code duplication and to make switching easy when the protocol is the only difference between assistants (see the sketch below). The latter part led directly to 2.
  2. Implement a generic OpenAI-compliant base class that allows selecting the streaming protocol and an arbitrary URL, as well as optional model selection.

More details inline.
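To illustrate the first change, here is a minimal sketch of what factored-out protocol helpers could look like. The function names, the httpx usage, and the exact parsing are assumptions for illustration, not the code from this PR:

import json
from typing import Any, AsyncIterator

import httpx


async def sse_stream(
    client: httpx.AsyncClient, url: str, **kwargs: Any
) -> AsyncIterator[dict]:
    # SSE: payloads arrive as "data: {...}" lines; OpenAI terminates the
    # stream with the sentinel "data: [DONE]".
    async with client.stream("POST", url, **kwargs) as response:
        async for line in response.aiter_lines():
            if not line.startswith("data:"):
                continue
            data = line[len("data:") :].strip()
            if data == "[DONE]":
                break
            yield json.loads(data)


async def jsonl_stream(
    client: httpx.AsyncClient, url: str, **kwargs: Any
) -> AsyncIterator[dict]:
    # JSONL: every non-empty line is a complete JSON object.
    async with client.stream("POST", url, **kwargs) as response:
        async for line in response.aiter_lines():
            if line.strip():
                yield json.loads(line)

With helpers of this shape, an assistant only has to pick which generator to consume; the request-building code stays identical.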

pmeier (Member, Author)

This file was renamed to _http_api.py but has enough changes for git to not recognize it as such.



class HttpApiAssistant(Assistant):
    _API_KEY_ENV_VAR: Optional[str]
pmeier (Member, Author)

The API key is now optional. See #375 for a discussion.
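For illustration only, the lookup could then be skipped when no variable is named. The helper below is hypothetical, not code from this PR:

import os
from typing import Optional


def resolve_api_key(env_var: Optional[str]) -> Optional[str]:
    # Hypothetical helper: assistants that need no key (e.g. ones talking
    # to a local server) set _API_KEY_ENV_VAR = None and skip the lookup.
    if env_var is None:
        return None
    return os.environ.get(env_var)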

@@ -21,8 +21,8 @@ def _make_system_content(self, sources: list[Source]) -> str:
        )
        return instruction + "\n\n".join(source.content for source in sources)

async def _call_api(
pmeier (Member, Author)

The def _call_api abstraction for def answer was just a remnant of an old implementation that I forgot to clean up earlier:

async def answer(
    self, prompt: str, sources: list[Source], *, max_new_tokens: int = 256
) -> AsyncIterator[str]:
    async for chunk in self._call_api(
        prompt, sources, max_new_tokens=max_new_tokens
    ):
        yield chunk

This PR removes it, and all subclasses simply implement def answer directly.
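With the indirection gone, a subclass now looks roughly like this; a sketch, where the _stream helper name is an assumption standing in for each assistant's actual request logic:

class SomeAssistant(HttpApiAssistant):
    async def answer(
        self, prompt: str, sources: list[Source], *, max_new_tokens: int = 256
    ) -> AsyncIterator[str]:
        # No _call_api indirection anymore: stream the HTTP response and
        # yield text chunks straight from here.
        async for chunk in self._stream(prompt, sources, max_new_tokens):
            yield chunk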



-class AnthropicApiAssistant(ApiAssistant):
+class AnthropicAssistant(HttpApiAssistant):
pmeier (Member, Author)

Drive-by rename to align it with the other provider base classes.

pmeier (Member, Author)

This is a demonstration of how easy it is, after this PR, to add new OpenAI-compliant assistants.

yield cast(str, choice["delta"]["content"])


class OpenaiAssistant(OpenaiCompliantHttpApiAssistant):
pmeier (Member, Author)

The public OpenAI API fits the new scheme nicely.
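To make the "new scheme" concrete: adding another OpenAI-compliant provider might boil down to a handful of class attributes. The attribute names below are assumptions modelled on the PR description, not necessarily the exact names in the diff:

class MyProviderAssistant(OpenaiCompliantHttpApiAssistant):
    # Hypothetical provider: streams JSONL and serves a single model,
    # so no model field is sent in the request.
    _STREAMING_PROTOCOL = "jsonl"  # "sse" for OpenAI / Azure OpenAI
    _MODEL = None

    @property
    def _url(self) -> str:
        base_url = os.environ.get("MY_PROVIDER_BASE_URL", "http://localhost:9000")
        return f"{base_url}/v1/chat/completions"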

Comment on lines +21 to +24
@pytest.mark.parametrize(
    "assistant",
    [assistant for assistant in HTTP_API_ASSISTANTS if assistant._API_KEY_ENV_VAR],
)
pmeier (Member, Author)

See #375


@property
def _url(self) -> str:
    base_url = os.environ.get("RAGNA_LLAMAFILE_BASE_URL", "http://localhost:8080")
pmeier (Member, Author)

@nenb is port 8080 the default?

nenb (Contributor)

This seems to be the case.


pmeier commented May 27, 2024

If this PR is accepted, I'll have a go at #376 and bring it up to speed.

@nenb nenb (Contributor) left a comment

Great work, delighted that ragna can connect with many local LLMs so easily now, thank you!



pmeier commented May 28, 2024

Touching on #424 (comment):

> me having a different (incorrect?) definition of what a compliant API is

That is certainly up for debate. I think the most practical definition, given the variety of cases here, is: any REST API is OpenAI-compliant if it uses the same request and response schema as OpenAI.

For practicality reasons, we allow the following deviations:

  • The model can be passed in the request (OpenAI, Ollama), but doesn't have to be when the deployment only serves a single model (Azure OpenAI, Llamafile). A sketch follows below.
  • Streaming can be performed either with SSE (OpenAI, Azure OpenAI) or with JSONL (Llamafile, Ollama).
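A sketch of what the first deviation amounts to when building the request body. The helper is hypothetical; the field names follow OpenAI's chat completions schema:

from typing import Optional


def build_chat_request(prompt: str, model: Optional[str]) -> dict:
    body = {
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    if model is not None:
        # Omitted for single-model deployments like Azure OpenAI or Llamafile.
        body["model"] = model
    return body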

@pmeier pmeier merged commit a45bd90 into main May 28, 2024
10 checks passed
@pmeier pmeier deleted the http-api-assistants branch May 28, 2024 07:32