Merged
12 changes: 12 additions & 0 deletions docs/deployments/realtime-api/predictors.md
@@ -86,6 +86,10 @@ class PythonPredictor:
Useful for tasks that the client doesn't need to wait on before
receiving a response such as recording metrics or storing results.

Note: post_predict() and predict() run in the same thread pool. The
size of the thread pool can be increased by updating
`threads_per_process` in the api configuration yaml.

Args:
response (optional): The response as returned by the predict method.
payload (optional): The request payload (see below for the possible
@@ -245,6 +249,10 @@ class TensorFlowPredictor:
Useful for tasks that the client doesn't need to wait on before
receiving a response such as recording metrics or storing results.

Note: post_predict() and predict() run in the same thread pool. The
size of the thread pool can be increased by updating
`threads_per_process` in the api configuration yaml.

Args:
response (optional): The response as returned by the predict method.
payload (optional): The request payload (see below for the possible
@@ -353,6 +361,10 @@ class ONNXPredictor:
Useful for tasks that the client doesn't need to wait on before
receiving a response such as recording metrics or storing results.

Note: post_predict() and predict() run in the same thread pool. The
size of the thread pool can be increased by updating
`threads_per_process` in the api configuration yaml.

Args:
response (optional): The response as returned by the predict method.
payload (optional): The request payload (see below for the possible
2 changes: 1 addition & 1 deletion pkg/workloads/cortex/serve/serve.py
@@ -214,7 +214,7 @@ def predict(request: Request):

if util.has_method(predictor_impl, "post_predict"):
kwargs = build_post_predict_kwargs(prediction, request)
- tasks.add_task(predictor_impl.post_predict, **kwargs)
+ request_thread_pool.submit(predictor_impl.post_predict, **kwargs)

if len(tasks.tasks) > 0:
response.background = tasks
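
The behavior described in the docstring note can be illustrated with a small standalone sketch. It assumes `request_thread_pool` behaves like a standard `concurrent.futures.ThreadPoolExecutor`, which this diff does not confirm. `DemoPredictor`, `handle_request`, and `THREADS_PER_PROCESS` are hypothetical names used only for illustration and are not part of Cortex. The point is that `post_predict()` calls occupy the same workers as `predict()` calls, so slow post-processing can delay later predictions unless the pool is large enough.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the `threads_per_process` setting.
THREADS_PER_PROCESS = 2


class DemoPredictor:
    """Hypothetical predictor used only to illustrate the shared pool."""

    def __init__(self):
        self.recorded = []
        self._lock = threading.Lock()

    def predict(self, payload):
        return {"doubled": payload["x"] * 2}

    def post_predict(self, response, payload):
        # Simulate slow bookkeeping (e.g. storing results).
        time.sleep(0.05)
        with self._lock:
            self.recorded.append((payload["x"], response["doubled"]))


# Assumption: one pool shared by predict() and post_predict().
request_thread_pool = ThreadPoolExecutor(max_workers=THREADS_PER_PROCESS)


def handle_request(predictor, payload):
    # The caller waits for predict() to finish...
    response = request_thread_pool.submit(predictor.predict, payload).result()
    # ...but post_predict() is only submitted, never awaited, so the
    # response can be returned before it completes.
    if callable(getattr(predictor, "post_predict", None)):
        request_thread_pool.submit(
            predictor.post_predict, response=response, payload=payload
        )
    return response


predictor = DemoPredictor()
responses = [handle_request(predictor, {"x": i}) for i in range(4)]

# Wait for the queued post_predict() calls before inspecting results.
request_thread_pool.shutdown(wait=True)
```

With two workers, the third request's `predict()` has to wait until one of the earlier `post_predict()` calls finishes, which is the contention the note warns about. Raising the pool size shortens that wait.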