I created a project for detection and classification on image frames.
I implemented detection and classification engines using the following constructor:
import tensorrt as trt

def __init__(
    self,
    verbose: bool = False,
    workspace: int = 8,
) -> None:
    # TensorRT logger; raise severity to VERBOSE when requested.
    self.trt_logger = trt.Logger(trt.Logger.INFO)
    if verbose:
        self.trt_logger.min_severity = trt.Logger.Severity.VERBOSE
    # Register the built-in TensorRT plugins before building.
    trt.init_libnvinfer_plugins(self.trt_logger, namespace="")
    self.builder = trt.Builder(self.trt_logger)
    self.config = self.builder.create_builder_config()
    # Workspace size is given in GiB; max_workspace_size expects bytes.
    self.config.max_workspace_size = workspace * (2**30)
    self.network = None
    self.parser = None
    self.batch_size = None
I designed my pipeline to run inference on detection first, then pass all detected boxes into classification. I deployed this as an API. However, when I scaled the service by replicating it, the Requests Per Second (RPS) remained the same.
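Roughly, the pipeline looks like this (a minimal sketch; the detector/classifier interfaces and helper names here are illustrative stand-ins, not my real engine code):

```python
# Illustrative sketch of the two-stage detect-then-classify pipeline.
# DummyDetector/DummyClassifier are stand-ins for the real TensorRT engines.

class DummyDetector:
    def infer(self, frame):
        # Pretend we found two boxes as (x1, y1, x2, y2) tuples.
        return [(0, 0, 10, 10), (5, 5, 20, 20)]

class DummyClassifier:
    def infer(self, crop):
        # Pretend every crop is classified as "person".
        return "person"

def crop(frame, box):
    # Stand-in for cropping the frame to the box region.
    return frame

def run_pipeline(frame, detector, classifier):
    boxes = detector.infer(frame)                                # stage 1: detection
    labels = [classifier.infer(crop(frame, b)) for b in boxes]   # stage 2: classify each box
    return list(zip(boxes, labels))

results = run_pipeline(frame="fake-frame",
                       detector=DummyDetector(),
                       classifier=DummyClassifier())
```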
For example, when I replicated the service 5 times, GPU usage increased 5x, but the RPS did not improve.
Do you have any ideas on how to double the RPS?
P.S. I run this API with Docker Compose and use nginx as the load balancer.
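For reference, the load-balancing setup is roughly this (a sketch; the upstream name, service name, and ports are illustrative, not my actual config):

```nginx
# nginx reverse proxy in front of the replicated API service.
upstream api {
    # Docker Compose replicas all resolve behind this service name.
    server api:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://api;
    }
}
```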