
fastapi batch inference server demo. Fixes OpenNMT/CTranslate2/issues… #2489

Closed

Conversation

dongxiaolong

Based on the discussion in OpenNMT/CTranslate2#1140, I've submitted a simple batch translation demo using FastAPI. The code draws inspiration from both @hobodrifterdavid and ChatGPT. In performance testing I found it to run efficiently. We plan to deploy it in our initial products and hope to open-source it so the community can use and improve it together.
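For readers who haven't opened the diff, here is a minimal sketch of the request-batching pattern the demo is built around (not the PR's exact code): concurrent HTTP requests are queued and coalesced into a single `translate_batch` call. The model path, batch size, flush window, and whitespace tokenization are all placeholder assumptions; a real server would use the model's own tokenizer. It also assumes Python 3.10+, so the module-level `asyncio.Queue` binds lazily to the running loop.

```python
# batch_server.py -- a minimal sketch, not the PR's exact code.
import asyncio
from contextlib import asynccontextmanager

import ctranslate2
from fastapi import FastAPI
from pydantic import BaseModel

MODEL_DIR = "ende_ctranslate2"  # hypothetical CTranslate2 model directory
MAX_BATCH = 32                  # flush once this many requests are queued...
MAX_WAIT = 0.01                 # ...or after 10 ms of idle waiting

translator = ctranslate2.Translator(MODEL_DIR, device="auto")
queue: asyncio.Queue = asyncio.Queue()  # holds (text, Future) pairs

async def batcher() -> None:
    """Coalesce concurrent requests into single translate_batch calls."""
    while True:
        batch = [await queue.get()]  # block until at least one request
        try:
            while len(batch) < MAX_BATCH:
                batch.append(await asyncio.wait_for(queue.get(), MAX_WAIT))
        except asyncio.TimeoutError:
            pass  # flush window elapsed; translate what we have
        tokens = [text.split() for text, _ in batch]  # toy tokenization
        # Run the blocking CTranslate2 call off the event loop.
        results = await asyncio.get_running_loop().run_in_executor(
            None, translator.translate_batch, tokens)
        for (_, fut), res in zip(batch, results):
            fut.set_result(" ".join(res.hypotheses[0]))

@asynccontextmanager
async def lifespan(app: FastAPI):
    task = asyncio.create_task(batcher())
    yield
    task.cancel()

app = FastAPI(lifespan=lifespan)

class TranslateRequest(BaseModel):
    text: str

@app.post("/translate")
async def translate(req: TranslateRequest):
    fut = asyncio.get_running_loop().create_future()
    await queue.put((req.text, fut))
    return {"translation": await fut}
```

Run it with `uvicorn batch_server:app` and fire concurrent POSTs at `/translate`; everything arriving within the flush window shares one CTranslate2 call, which is where the batching speedup comes from.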

@vince62s
Member

Hello @dongxiaolong, thanks for this submission!
As is, I think a better place would be in the CTranslate2 repo, under a new sub-folder of examples.
However, I had in mind to revamp the OpenNMT-py server to make it clearer, more modular, and FastAPI-compatible.

In this scenario it would require more work. Please have a look at the following files:
onmt/translate/translation_server.py
onmt/bin/server.py

server.py is the front end, using Flask and waitress (which could be replaced by FastAPI / uvicorn), and the other file holds the inference logic, using either OpenNMT-py or CTranslate2. The code is quite old but should work; it needs to be revamped a bit, especially with the new inference_engine stuff.
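Not part of the PR, but to make the suggested revamp concrete, a rough sketch of what a FastAPI / uvicorn front end over the existing logic could look like. It assumes `TranslationServer` keeps the `start(config_file)` / `run(inputs)` interface from `onmt/translate/translation_server.py` and the `(translations, scores, n_best, times, aligns)` return shape used by the current Flask server; treat both as assumptions to verify against the code.

```python
# A sketch of onmt/bin/server.py's translate route on FastAPI instead of
# Flask/waitress. The TranslationServer interface below is assumed from the
# current code and should be double-checked.
from fastapi import FastAPI
from pydantic import BaseModel

from onmt.translate.translation_server import TranslationServer

CONFIG = "./available_models/conf.json"  # config path, as in the Flask server

translation_server = TranslationServer()
translation_server.start(CONFIG)

app = FastAPI()

class Segment(BaseModel):
    id: int
    src: str

@app.post("/translator/translate")
def translate(inputs: list[Segment]):
    # Assumed: run() takes [{"id": ..., "src": ...}] dicts and returns
    # (translations, scores, n_best, times, aligns).
    trans, scores, _, _, _ = translation_server.run(
        [{"id": seg.id, "src": seg.src} for seg in inputs])
    return [{"id": seg.id, "tgt": t, "score": s}
            for seg, t, s in zip(inputs, trans, scores)]
```

Serving it would then be `uvicorn server:app` in place of the waitress entry point.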

@dongxiaolong
Author

dongxiaolong commented Oct 13, 2023


I apologize for the delayed response. I remember you mentioned in a previous issue that you wanted to enhance the OpenNMT-py server, which is why I made this submission here. Both CTranslate2 and OpenNMT are outstanding projects, and they support models for a wide range of tasks. I also hope they can take on more features in the spirit of FastChat and vLLM, such as continuous batching, so users can deploy more conveniently and ship real-world products quickly. I'm eagerly looking forward to the updated server and to collaborating with the community to refine it. Thank you once again for your contributions and prompt response.

@rakesh-krishna

Thanks for this code. The performance improved about 3x because of the request batching.
