v2: update triton, grpc payload size, cli raw output processing #4560
Conversation
I left some minor comments
@@ -119,6 +119,7 @@ func (rp *reverseGRPCProxy) Start() error {
 		opts = append(opts, grpc.Creds(rp.tlsOptions.Cert.CreateServerTransportCredentials()))
 	}
 	opts = append(opts, grpc.MaxConcurrentStreams(grpcProxyMaxConcurrentStreams))
+	opts = append(opts, grpc.MaxRecvMsgSize(util.GrpcMaxMsgSizeBytes))
Why are we not setting MaxCallSendMsgSize here as well, as with the other ones?
From what I can see, MaxCall... = client options, whereas Max... (without Call) means server options.
I do agree that Send and Recv should probably both be set. If so, using the same value would make sense.
will update
scheduler/pkg/agent/v2.go
Outdated
@@ -116,6 +116,7 @@ func getV2GrpcConnection(host string, plainTxtPort int) (*grpc.ClientConn, error

 	opts := []grpc.DialOption{
 		grpc.WithTransportCredentials(insecure.NewCredentials()),
+		grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(util.GrpcMaxMsgSizeBytes), grpc.MaxCallSendMsgSize(util.GrpcMaxMsgSizeBytes)),
Do we need to set this for control-plane operations?
It's not a bad thing to be explicit about what our defaults are, I think, rather than relying on whatever a library happens to set (which may or may not make sense for us, or may not even be set). With that said, I'd generally expect the control plane to use shorter messages, so wouldn't expect the same maxima.
I'm assuming this value might be used when creating buffers. I'd imagine smaller buffers are used when possible and expanded up to the maximum, but just in case we might want to use a lower value to avoid guzzling too much memory.
will remove here
scheduler/pkg/util/constants.go
Outdated
@@ -21,5 +21,6 @@ import "time"
 const (
 	GrpcRetryBackoffMillisecs = 100
 	GrpcRetryMaxCount         = 5 // around 3.2s in total wait duration
+	GrpcMaxMsgSizeBytes       = 100 * 1024 * 1024
Is this aligned with the max Kafka payload size? We are using it in the pipeline gateway.
That's currently 1 GB, so we could increase this by 10x here.
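For context, the gap between the two limits discussed above can be sketched as follows; the constant names mirror the ones in this PR, but the 1 GiB Kafka figure and the comparison itself are illustrative assumptions, not code from the repository:

```go
package main

import "fmt"

const (
	// GrpcMaxMsgSizeBytes is the 100 MiB gRPC limit added in this PR.
	GrpcMaxMsgSizeBytes = 100 * 1024 * 1024
	// kafkaMaxPayloadBytes is an assumed 1 GiB Kafka payload limit.
	kafkaMaxPayloadBytes = 1024 * 1024 * 1024
)

func main() {
	// A Kafka message at its maximum size would not fit through gRPC with
	// the current constant, which is the concern raised in the comment above.
	fmt.Println(kafkaMaxPayloadBytes / GrpcMaxMsgSizeBytes) // integer ratio between the two limits
}
```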
@@ -13,7 +13,7 @@ RCLONE_IMG ?= ${DOCKERHUB_USERNAME}/seldon-rclone:${CUSTOM_IMAGE_TAG}
 SCHEDULER_IMG ?= ${DOCKERHUB_USERNAME}/seldon-scheduler:${CUSTOM_IMAGE_TAG}

 MLSERVER_IMG ?= seldonio/mlserver:1.2.1
-TRITON_IMG ?= nvcr.io/nvidia/tritonserver:22.05-py3
+TRITON_IMG ?= nvcr.io/nvidia/tritonserver:22.11-py3
Any particular reason why we are upgrading?
The PyTorch backend is the main reason, as it's tied to a particular version of PyTorch.
Have you created an issue on MLServer, just so we don't forget?
@adriangonz is aware
A few thoughts for improvements, but this largely looks sensible.
operator/pkg/cli/infer.go
Outdated
if contents == nil {
	return true
} else {
	if contents.Fp32Contents == nil &&
🙃 Alphabetical order would be slightly easier to read and compare
operator/pkg/cli/infer.go
Outdated
		contents.Uint64Contents == nil &&
		contents.BytesContents == nil &&
		contents.IntContents == nil &&
		contents.Int64Contents == nil {
💭 Checking for just null values might not be very reliable. We probably want to check that len(contents.X) == 0, as this will cover both the null case and the not-null-but-empty case.
I think we might want to be careful with non-null empty, as that might be a valid result, e.g. indicating existence.
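To illustrate the len-based check suggested above: this is a minimal sketch, where TensorContents is a stand-in mirroring only a few fields of the generated v2 protobuf type, and hasData is a hypothetical helper rather than code from this PR.

```go
package main

import "fmt"

// TensorContents is a stand-in for the generated v2 inference protobuf
// message; only a few of its repeated fields are mirrored here.
type TensorContents struct {
	Fp32Contents   []float32
	IntContents    []int32
	Int64Contents  []int64
	Uint64Contents []uint64
	BytesContents  [][]byte
}

// hasData reports whether any contents field is non-empty. In Go,
// len(nil) == 0, so a single len check covers both the nil case and the
// non-nil-but-empty case, which is the reliability point raised above.
func hasData(c *TensorContents) bool {
	if c == nil {
		return false
	}
	return len(c.Fp32Contents) > 0 ||
		len(c.IntContents) > 0 ||
		len(c.Int64Contents) > 0 ||
		len(c.Uint64Contents) > 0 ||
		len(c.BytesContents) > 0
}

func main() {
	fmt.Println(hasData(nil))                                        // false: nil message
	fmt.Println(hasData(&TensorContents{Fp32Contents: []float32{}})) // false: non-nil but empty
	fmt.Println(hasData(&TensorContents{IntContents: []int32{1}}))   // true
}
```

Whether an empty-but-present slice should count as "no data" is exactly the judgment call in the reply above; the helper treats it as empty, which may not suit existence-style results.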
scheduler/pkg/util/constants.go
Outdated
@@ -21,5 +21,6 @@ import "time"
 const (
 	GrpcRetryBackoffMillisecs = 100
 	GrpcRetryMaxCount         = 5 // around 3.2s in total wait duration
+	GrpcMaxMsgSizeBytes       = 100 * 1024 * 1024
💭 This is a reasonable default for probably quite a lot of use cases, but might not be sufficient for all. For example, think about sending image or video data, especially if uncompressed. Alternatively, there might be organisational requirements to allow at least X (and for headers) or to not use more than Y for whatever reason.
Given that we want all our data-plane components to be consistent with this value, and also given users might need to change it, should we add a config value for this? Fine for that to be a separate PR, as we'd also need to do validation (e.g. is numeric, is not negative, etc.).
@@ -88,6 +88,7 @@ func (g *GatewayGrpcServer) Start() error {
 		opts = append(opts, grpc.Creds(g.tlsOptions.Cert.CreateServerTransportCredentials()))
 	}
 	opts = append(opts, grpc.MaxConcurrentStreams(maxConcurrentStreams))
+	opts = append(opts, grpc.MaxRecvMsgSize(util.GrpcMaxMsgSizeBytes))
Same point here as in other server components about setting the max send size to match this. A response from Kafka might be larger than the input (e.g. in the case of something generative, an up-scaler, etc.).
LGTM.
Note: MLServer would need an increase for this to fully work. So far I have not seen an issue with Triton server and gRPC payload sizes increased above the 4 MB default.