v2: update triton, grpc payload size, cli raw output processing #4560
Conversation
I left some minor comments
@@ -119,6 +119,7 @@ func (rp *reverseGRPCProxy) Start() error {
 		opts = append(opts, grpc.Creds(rp.tlsOptions.Cert.CreateServerTransportCredentials()))
 	}
 	opts = append(opts, grpc.MaxConcurrentStreams(grpcProxyMaxConcurrentStreams))
+	opts = append(opts, grpc.MaxRecvMsgSize(util.GrpcMaxMsgSizeBytes))
Why are we not setting MaxCallSendMsgSize here as well, as with the other ones?
From what I can see, MaxCall... = client options, whereas Max... (without Call) means server options.
I do agree that Send and Recv should probably both be set. If so, using the same value would make sense.
will update
scheduler/pkg/agent/v2.go
Outdated
@@ -116,6 +116,7 @@ func getV2GrpcConnection(host string, plainTxtPort int) (*grpc.ClientConn, error

 	opts := []grpc.DialOption{
 		grpc.WithTransportCredentials(insecure.NewCredentials()),
+		grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(util.GrpcMaxMsgSizeBytes), grpc.MaxCallSendMsgSize(util.GrpcMaxMsgSizeBytes)),
Do we need to set this for control-plane operations?
It's not a bad thing to be explicit about what our defaults are, I think, rather than relying on whatever a library happens to set (which may or may not make sense for us, or may not even be set). With that said, I'd generally expect the control plane to use shorter messages, so wouldn't expect the same maxima.
I'm assuming this value might be used when creating buffers. I'd imagine smaller buffers are used when possible and expanded up to the maximum, but just in case we might want to use a lower value to avoid guzzling too much memory.
will remove here
scheduler/pkg/util/constants.go
Outdated
@@ -21,5 +21,6 @@ import "time"
 const (
 	GrpcRetryBackoffMillisecs = 100
 	GrpcRetryMaxCount         = 5 // around 3.2s in total wait duration
+	GrpcMaxMsgSizeBytes       = 100 * 1024 * 1024
Is this aligned with the max Kafka payload size? We are using it in the pipeline gateway.
That's currently 1 GB, so we could increase this by 10x here.
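For context, the gap between the two limits discussed above can be sketched as follows; the constant names mirror the ones in this PR, but the 1 GiB Kafka figure and the comparison itself are illustrative assumptions, not code from the repository:

```go
package main

import "fmt"

const (
	// GrpcMaxMsgSizeBytes is the 100 MiB gRPC limit added in this PR.
	GrpcMaxMsgSizeBytes = 100 * 1024 * 1024
	// kafkaMaxPayloadBytes is an assumed 1 GiB Kafka payload limit.
	kafkaMaxPayloadBytes = 1024 * 1024 * 1024
)

func main() {
	// A Kafka message at its maximum size would not fit through gRPC with
	// the current constant, which is the concern raised in the comment above.
	fmt.Println(kafkaMaxPayloadBytes / GrpcMaxMsgSizeBytes) // integer ratio between the two limits
}
```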
@@ -13,7 +13,7 @@ RCLONE_IMG ?= ${DOCKERHUB_USERNAME}/seldon-rclone:${CUSTOM_IMAGE_TAG}
 SCHEDULER_IMG ?= ${DOCKERHUB_USERNAME}/seldon-scheduler:${CUSTOM_IMAGE_TAG}

 MLSERVER_IMG ?= seldonio/mlserver:1.2.1
-TRITON_IMG ?= nvcr.io/nvidia/tritonserver:22.05-py3
+TRITON_IMG ?= nvcr.io/nvidia/tritonserver:22.11-py3
Any particular reason why we are upgrading?
The PyTorch backend is the main reason, as it's tied to a particular version of PyTorch.
Have you created an issue on MLServer, just so we don't forget?
@adriangonz is aware
A few thoughts for improvements, but this largely looks sensible.
operator/pkg/cli/infer.go
Outdated
if contents == nil {
	return true
} else {
	if contents.Fp32Contents == nil &&
🙃 Alphabetical order would be slightly easier to read and compare
operator/pkg/cli/infer.go
Outdated
		contents.Uint64Contents == nil &&
		contents.BytesContents == nil &&
		contents.IntContents == nil &&
		contents.Int64Contents == nil {
💭 Checking for just null values might not be very reliable. We probably want to check that len(contents.X) == 0, as this will cover both the null case and the not-null-but-empty case.
I think we might want to be careful with non-null empty, as that might be a valid result, e.g. indicating existence.
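To illustrate the len-based check suggested above: this is a minimal sketch, where TensorContents is a stand-in mirroring only a few fields of the generated v2 protobuf type, and hasData is a hypothetical helper rather than code from this PR.

```go
package main

import "fmt"

// TensorContents is a stand-in for the generated v2 inference protobuf
// message; only a few of its repeated fields are mirrored here.
type TensorContents struct {
	Fp32Contents   []float32
	IntContents    []int32
	Int64Contents  []int64
	Uint64Contents []uint64
	BytesContents  [][]byte
}

// hasData reports whether any contents field is non-empty. In Go,
// len(nil) == 0, so a single len check covers both the nil case and the
// non-nil-but-empty case, which is the reliability point raised above.
func hasData(c *TensorContents) bool {
	if c == nil {
		return false
	}
	return len(c.Fp32Contents) > 0 ||
		len(c.IntContents) > 0 ||
		len(c.Int64Contents) > 0 ||
		len(c.Uint64Contents) > 0 ||
		len(c.BytesContents) > 0
}

func main() {
	fmt.Println(hasData(nil))                                        // false: nil message
	fmt.Println(hasData(&TensorContents{Fp32Contents: []float32{}})) // false: non-nil but empty
	fmt.Println(hasData(&TensorContents{IntContents: []int32{1}}))   // true
}
```

Whether an empty-but-present slice should count as "no data" is exactly the judgment call in the reply above; the helper treats it as empty, which may not suit existence-style results.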
scheduler/pkg/util/constants.go
Outdated
@@ -21,5 +21,6 @@ import "time"
 const (
 	GrpcRetryBackoffMillisecs = 100
 	GrpcRetryMaxCount         = 5 // around 3.2s in total wait duration
+	GrpcMaxMsgSizeBytes       = 100 * 1024 * 1024
💭 This is a reasonable default for probably quite a lot of use cases, but might not be sufficient for all. For example, think about sending image or video data, especially if uncompressed. Alternatively, there might be organisational requirements to allow at least X (and for headers) or to not use more than Y for whatever reason.
Given that we want all our data-plane components to be consistent with this value, and also given users might need to change it, should we add a config value for this? Fine for that to be a separate PR, as we'd also need to do validation (e.g. is numeric, is not negative, etc.).
@@ -88,6 +88,7 @@ func (g *GatewayGrpcServer) Start() error {
 		opts = append(opts, grpc.Creds(g.tlsOptions.Cert.CreateServerTransportCredentials()))
 	}
 	opts = append(opts, grpc.MaxConcurrentStreams(maxConcurrentStreams))
+	opts = append(opts, grpc.MaxRecvMsgSize(util.GrpcMaxMsgSizeBytes))
Same point here as in other server components about setting the max send size to match this. A response from Kafka might be larger than the input (e.g. in the case of something generative, an up-scaler, etc.).
LGTM.
Note: MLServer would need an increase for this to fully work. So far I have not seen an issue with Triton server and gRPC payload sizes increased above the 4 MB default.