
Added database to inference server #1446

Merged: 4 commits merged into main from inference-db on Feb 10, 2023
Conversation

@yk yk (Collaborator) commented Feb 10, 2023

Also some robustification of the inference code and some cleanup.

await asyncio.sleep(1)
continue

chat.message_request_state = MessageRequestState.in_progress
chat_repository.set_chat_state(chat.id, interface.MessageRequestState.in_progress)
Collaborator:
The "dequeue" operation needs to immediately set the chat into a state that prevents other clients from also dequeuing it. Imagine we have 5k connected inference clients that all want to handle chats: this dequeue op will be a massive congestion point. A possible strategy could be to find and update in a single SQL statement sent to the DB, or to use a dedicated queue service.
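For illustration only, a minimal sketch of the "find and update in one SQL statement" idea, assuming a Postgres-backed chat table with a message_request_state column; the table, column, and connection details below are hypothetical and only mirror the snippet above:

    from sqlalchemy import create_engine, text

    # Hypothetical DSN; the actual inference server config may differ.
    engine = create_engine("postgresql://user:pass@localhost/inference")

    DEQUEUE_SQL = text(
        """
        UPDATE chat
           SET message_request_state = 'in_progress'
         WHERE id = (
               SELECT id
                 FROM chat
                WHERE message_request_state = 'pending'
                ORDER BY created_at
                LIMIT 1
                FOR UPDATE SKIP LOCKED
         )
        RETURNING id
        """
    )

    def dequeue_chat():
        # Claim at most one pending chat atomically; SKIP LOCKED makes
        # concurrent workers skip rows another worker is already claiming.
        with engine.begin() as conn:
            row = conn.execute(DEQUEUE_SQL).first()
            return row.id if row else None

Because the row is found and marked in a single statement, two workers can never claim the same chat, and the session does not need serializable isolation.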

yk (Collaborator, Author):

Yes, absolutely. This part will have to be reworked anyway, probably best to do it with Redis, too.
I'll merge this one in and improve it later.
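As a rough illustration of the Redis direction (not what this PR implements), a work queue can be built on Redis's atomic list operations; the key names and connection settings here are made up:

    import redis

    # Hypothetical connection; BLMOVE requires Redis 6.2+.
    r = redis.Redis(host="localhost", port=6379)

    def enqueue_chat(chat_id: str) -> None:
        # Producers push pending chat ids onto a list.
        r.lpush("chats:pending", chat_id)

    def dequeue_chat(timeout: int = 1):
        # BLMOVE atomically pops a chat id from the tail of the pending
        # list and parks it on an in-progress list, so no two workers
        # can receive the same chat.
        chat_id = r.blmove("chats:pending", "chats:in_progress", timeout, "RIGHT", "LEFT")
        return chat_id.decode() if chat_id else None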

@andreaskoepf andreaskoepf (Collaborator) left a comment

Looks good. I need to actually test it to get a feeling for it. My major concern is the chat-dequeue/worker-handshake op: as currently implemented, the DB session would probably have to run in serializable isolation mode. The dequeue op (find and assign a matching chat) IMO must be atomic from a DB-client perspective, so that even 10k concurrent requests are no problem.

if time.time() > time_limit:
    raise
logger.warning("Inference server not ready. Retrying...")
time.sleep(5)
Collaborator:

Maybe we could randomize this delay to de-synchronize client retry requests. Otherwise, all clients will try to reconnect at roughly the same time after a server restart or similar.
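A rough sketch of what that could look like, assuming the retry loop shown above; the base delay and jitter range are arbitrary:

    import random
    import time

    def wait_before_retry(base_delay: float = 5.0, jitter: float = 2.0) -> None:
        # Sleep for the base delay plus a random offset so that clients
        # which all lost the connection at the same moment do not retry
        # in lockstep after a server restart.
        time.sleep(base_delay + random.uniform(0, jitter))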

yk (Collaborator, Author):

good idea

@yk yk merged commit 90c3d56 into main Feb 10, 2023
@yk yk deleted the inference-db branch February 10, 2023 21:51