
Initial implementation of the inference system #869

Merged: 5 commits merged into main on Jan 21, 2023

Conversation

yk
Collaborator

@yk yk commented Jan 20, 2023

This PR introduces:

  • A server for coordination
  • A minimal worker
  • A text client

all building on Redis lists to stream data as it is produced.
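The Redis-list streaming pattern the PR describes can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: a real deployment would use redis-py's `rpush`/`blpop`, but here an in-memory stand-in keeps the sketch self-contained. The `FakeRedis` class, the `chat:{id}` key scheme, and the `END` sentinel are all assumptions.

```python
# Hypothetical sketch of streaming tokens through a Redis list.
# FakeRedis stands in for a real Redis connection so the example runs
# without a server; only rpush/lpop semantics are modeled.
from collections import defaultdict, deque


class FakeRedis:
    """Minimal stand-in for the two Redis list ops the pattern needs."""

    def __init__(self):
        self.lists = defaultdict(deque)

    def rpush(self, key, value):
        # Append to the tail of the list, like Redis RPUSH.
        self.lists[key].append(value)

    def lpop(self, key):
        # Pop from the head of the list, like Redis LPOP; None if empty.
        q = self.lists[key]
        return q.popleft() if q else None


END = "<END>"  # hypothetical sentinel marking the end of a stream


def worker_produce(r, chat_id, tokens):
    # The worker pushes tokens onto the chat's list as they are generated.
    for tok in tokens:
        r.rpush(f"chat:{chat_id}", tok)
    r.rpush(f"chat:{chat_id}", END)


def client_consume(r, chat_id):
    # The text client pops tokens until the sentinel arrives.
    # A real client would block on BLPOP instead of polling.
    out = []
    while True:
        tok = r.lpop(f"chat:{chat_id}")
        if tok is None:
            continue
        if tok == END:
            break
        out.append(tok)
    return "".join(out)


r = FakeRedis()
worker_produce(r, "42", ["Hello", ", ", "world"])
print(client_consume(r, "42"))  # -> Hello, world
```

Because producer and consumer share only the list key, the coordination server never has to hold the generated text itself.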

@yk yk marked this pull request as ready for review January 21, 2023 13:53
@yk yk requested a review from andreaskoepf as a code owner January 21, 2023 13:53
Collaborator

@andreaskoepf andreaskoepf left a comment


let's get the initial impl in.

    await asyncio.sleep(1)
    continue

    chat.message_request_state = MessageRequestState.in_progress
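The excerpt above is a fragment of a poll-and-sleep dequeue loop. A minimal self-contained reconstruction might look like the following; only the `asyncio.sleep`/`continue` lines and the `MessageRequestState.in_progress` assignment come from the diff, while the enum values, `Chat` class, and surrounding loop are assumptions for illustration.

```python
# Hypothetical reconstruction of the polling dequeue loop the diff
# excerpt is taken from. Everything except the sleep/continue and the
# state assignment is an illustrative assumption.
import asyncio
import enum


class MessageRequestState(enum.Enum):
    pending = "pending"
    in_progress = "in_progress"
    complete = "complete"


class Chat:
    def __init__(self):
        self.message_request_state = MessageRequestState.pending


async def dequeue_loop(queue, stop_after=1):
    handled = 0
    while handled < stop_after:
        if not queue:
            # Nothing queued yet: back off briefly, then poll again.
            await asyncio.sleep(0.01)
            continue
        chat = queue.pop(0)
        # Mark the request as claimed so no other worker picks it up.
        chat.message_request_state = MessageRequestState.in_progress
        handled += 1
    return handled


chat = Chat()
asyncio.run(dequeue_loop([chat]))
print(chat.message_request_state)  # -> MessageRequestState.in_progress
```

The sleep bounds how hard an idle worker hammers the queue, at the cost of up to one sleep interval of added latency per request.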

If we have >1 "message-broker" instances and CHATS in db/redis, then this "dequeue" operation will become a congestion point. One idea would be to define clear "configuration" tiers, e.g. based on GPU memory requirements, and have independent task queues for them.
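The reviewer's per-tier queue idea could be sketched like this. The tier names, memory thresholds, and routing functions are all hypothetical; the point is only that jobs are routed by resource requirement and each worker drains only the queues its hardware can serve.

```python
# Hypothetical sketch of independent task queues per GPU-memory tier,
# as suggested in the review. Tier names and thresholds are invented
# for illustration.
from collections import deque

TIERS = {          # minimum GPU memory (GiB) -> queue name
    8: "tier:small",
    24: "tier:medium",
    80: "tier:large",
}

queues = {name: deque() for name in TIERS.values()}


def queue_for_model(model_mem_gib):
    """Route a job to the smallest tier whose GPUs fit the model."""
    for min_mem in sorted(TIERS):
        if model_mem_gib <= min_mem:
            return TIERS[min_mem]
    raise ValueError("model too large for any tier")


def enqueue(job, model_mem_gib):
    queues[queue_for_model(model_mem_gib)].append(job)


def worker_dequeue(worker_mem_gib):
    """A worker drains the largest tier it can serve first, then
    falls through to smaller tiers, so big GPUs stay utilized."""
    for min_mem in sorted(TIERS, reverse=True):
        if min_mem <= worker_mem_gib and queues[TIERS[min_mem]]:
            return queues[TIERS[min_mem]].popleft()
    return None


enqueue("chat-123", model_mem_gib=6)      # lands in tier:small
print(worker_dequeue(worker_mem_gib=24))  # -> chat-123
```

Separate queues also shrink the congestion point: brokers contend only within a tier rather than on one global list.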

@andreaskoepf andreaskoepf merged commit 1709dc0 into main Jan 21, 2023
@andreaskoepf andreaskoepf deleted the initial-inference branch January 21, 2023 21:38