Closed
Labels: enhancement (New feature or request), research (Determine technical constraints)
Description
Replace Flask with another web framework (FastAPI?).
Motivation
Improve the API, concurrency, performance, and efficiency.
Additional Context
- Currently, waitress serves requests with 4 threads (its default).
- Ensure the readiness probe/health check is answered in a timely fashion. Otherwise, pod status and load balancing may be affected: the load balancer may start queuing requests if readiness probes are not answered in time.
- https://www.reddit.com/r/MachineLearning/comments/dy8hjh/p_cortex_deploy_models_from_any_framework_as
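To show why a fixed thread pool can delay the readiness probe, here is a minimal standard-library sketch. It only loosely models waitress (it is not waitress's actual scheduler) and assumes a hypothetical `predict()` that blocks for 0.5 s. When all 4 threads are busy with predictions, a health check on the same pool waits for one of them to finish; a separate thread reserved for health checks answers immediately.

```python
import time
from concurrent.futures import ThreadPoolExecutor

NUM_THREADS = 4          # mirrors waitress's default thread count
PREDICT_SECONDS = 0.5    # hypothetical blocking inference time

def predict():
    # Hypothetical stand-in for a blocking model call.
    time.sleep(PREDICT_SECONDS)
    return "prediction"

def health_check():
    return "ok"

def readiness_latency(dedicated_health_thread: bool) -> float:
    """Seconds a health check waits while NUM_THREADS predictions are in flight."""
    workers = ThreadPoolExecutor(max_workers=NUM_THREADS)
    # Either share the prediction pool or reserve one extra thread for probes.
    health = ThreadPoolExecutor(max_workers=1) if dedicated_health_thread else workers
    try:
        for _ in range(NUM_THREADS):
            workers.submit(predict)  # occupy every worker thread
        start = time.monotonic()
        health.submit(health_check).result()
        return time.monotonic() - start
    finally:
        workers.shutdown(wait=True)
        if health is not workers:
            health.shutdown(wait=True)

if __name__ == "__main__":
    print(f"shared pool:    {readiness_latency(False):.2f}s")
    print(f"dedicated pool: {readiness_latency(True):.2f}s")
```

With a shared pool the probe latency is roughly one prediction's duration, which is the situation the readiness-probe concern describes.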
Note from previous ticket #525:
Something like FastAPI would support multithreading, which may improve throughput
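As a rough illustration of the concurrency model an async framework enables, the sketch below uses plain `asyncio` (Python 3.9+) rather than FastAPI itself. It assumes a hypothetical blocking `predict_blocking()` and offloads it to a thread with `asyncio.to_thread`, so the event loop stays free to answer health checks; FastAPI's documentation describes running plain `def` path operations in a threadpool in a similar spirit.

```python
import asyncio
import time

PREDICT_SECONDS = 0.5  # hypothetical blocking inference time

def predict_blocking(x: int) -> int:
    # Hypothetical stand-in for a blocking model call.
    time.sleep(PREDICT_SECONDS)
    return x * 2

async def predict(x: int) -> int:
    # Run the blocking call in the default thread pool so the event loop is not blocked.
    return await asyncio.to_thread(predict_blocking, x)

async def health() -> str:
    return "ok"

async def main() -> tuple[list[int], float, float]:
    start = time.monotonic()
    tasks = [asyncio.create_task(predict(i)) for i in range(4)]
    await asyncio.sleep(0)  # let the prediction tasks start
    await health()          # answered while predictions are still running
    health_latency = time.monotonic() - start
    results = await asyncio.gather(*tasks)
    total = time.monotonic() - start
    return results, health_latency, total

if __name__ == "__main__":
    print(asyncio.run(main()))
```

The four predictions overlap (about 0.5 s total instead of 2 s sequentially), and the health check returns without waiting for any of them.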
API refactor checklist
- Revisit Python error wrapping
- Expose multiple workers for parallelism
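For the "Revisit Python error wrapping" item, one possible framework-agnostic shape is sketched below, so the same logic could move from Flask to whichever framework replaces it. `UserException`, `call_predictor`, and `handle_request` are hypothetical names, not part of the existing codebase: unexpected errors from user predictor code are wrapped once with their type and message, and every error maps to a status code and a JSON body.

```python
import json

class UserException(Exception):
    """Hypothetical wrapper for errors raised inside user-provided predictor code."""

    def __init__(self, message: str, status_code: int = 500):
        super().__init__(message)
        self.status_code = status_code

def call_predictor(predict_fn, payload):
    try:
        return predict_fn(payload)
    except UserException:
        raise  # already wrapped; keep its status code
    except Exception as e:
        # Wrap anything else, keeping the original exception as the cause.
        raise UserException(f"error in predictor: {type(e).__name__}: {e}") from e

def handle_request(predict_fn, payload) -> tuple[int, str]:
    """Return (HTTP status code, JSON body) for one prediction request."""
    try:
        result = call_predictor(predict_fn, payload)
        return 200, json.dumps({"result": result})
    except UserException as e:
        return e.status_code, json.dumps({"error": str(e)})
```

For example, `handle_request(lambda p: p["x"] + 1, {"x": 1})` returns `(200, '{"result": 2}')`, while a predictor that raises `KeyError` produces a 500 with the exception type in the error message.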