Added database to inference server #1446
Conversation
await asyncio.sleep(1)
continue

chat.message_request_state = MessageRequestState.in_progress
chat_repository.set_chat_state(chat.id, interface.MessageRequestState.in_progress)
The "dequeue" operation needs to set the chat immediately into a state that prevents others clients from also dequeuing it. Imagine we have 5k connected inference clients and all want to handle chats... this dequeue op will be a massive congestion point. A possible strategy could be to find and update in one sql-statement sent to the db .. or to use a dedicated queue service.
yes absolutely, this part will have to be reworked anyway; probably best to do it with Redis, too.
I'll merge this one in and improve it later
Looks good. I need to actually test it to get a feeling for it. My major concern is the chat-dequeue/worker-handshake op: as currently implemented, the DB session would probably have to run in serializable mode. The dequeue op (find and assign a matching chat) IMO must be atomic from a db-client perspective so that even 10k concurrent requests are no problem.
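For reference, a sketch of forcing serializable isolation with SQLAlchemy (the connection URL is a placeholder); the single-statement SKIP LOCKED approach sketched above would avoid the need for it:

```python
from sqlalchemy import create_engine

# isolation_level applies to every connection this engine creates.
engine = create_engine(
    "postgresql://user:pass@localhost/inference",  # placeholder URL
    isolation_level="SERIALIZABLE",
)
```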
inference/worker/utils.py
if time.time() > time_limit:
    raise
logger.warning("Inference server not ready. Retrying...")
time.sleep(5)
Maybe we could randomize this delay to de-synchronize client retry requests. Otherwise all clients will try to reconnect at roughly the same time after a server restart etc. Something like the sketch below.
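A minimal sketch of the jittered retry, assuming the `time_limit` and `logger` from the diff above; `check_server_ready` and the jitter bounds are hypothetical:

```python
import random
import time

while True:
    try:
        check_server_ready()  # hypothetical readiness probe
        break
    except Exception:
        if time.time() > time_limit:
            raise
        # Uniform jitter spreads reconnect attempts across clients, so a
        # server restart does not trigger a synchronized thundering herd.
        delay = random.uniform(2.5, 7.5)
        logger.warning(f"Inference server not ready. Retrying in {delay:.1f}s...")
        time.sleep(delay)
```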
good idea
Also includes some robustness improvements to the inference code and some cleanup.