
v2: update triton, grpc payload size, cli raw output processing #4560

Merged
merged 2 commits into SeldonIO:v2 on Jan 10, 2023
Conversation

ukclivecox
Contributor

  • Update to a more recent Triton server
  • Ensure the CLI handles raw payload conversion when the payload is non-null but all its entries are null
  • Increase max gRPC message sizes

Note - MLServer would also need its limit increased for this to work fully. So far we have not seen issues with Triton server when raising gRPC payload sizes above the 4MB default.

@ukclivecox ukclivecox added the v2 label Jan 8, 2023

@sakoush sakoush (Member) left a comment

I left some minor comments

@@ -119,6 +119,7 @@ func (rp *reverseGRPCProxy) Start() error {
opts = append(opts, grpc.Creds(rp.tlsOptions.Cert.CreateServerTransportCredentials()))
}
opts = append(opts, grpc.MaxConcurrentStreams(grpcProxyMaxConcurrentStreams))
opts = append(opts, grpc.MaxRecvMsgSize(util.GrpcMaxMsgSizeBytes))
Member

Why are we not setting MaxCallSendMsgSize here as well, as in the other ones?

Contributor

From what I can see, the MaxCall* variants are client options, whereas Max* (without Call) are server options.

I do agree that Send and Recv should probably both be set. If so, using the same value would make sense.

Contributor Author

will update

@@ -116,6 +116,7 @@ func getV2GrpcConnection(host string, plainTxtPort int) (*grpc.ClientConn, error

opts := []grpc.DialOption{
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(util.GrpcMaxMsgSizeBytes), grpc.MaxCallSendMsgSize(util.GrpcMaxMsgSizeBytes)),
Member

Do we need to set this for control-plane operations?

Contributor

It's not a bad thing to be explicit about what our defaults are, I think, rather than relying on whatever a library happens to set (which may or may not make sense for us, or may not even be set). With that said, I'd generally expect the control plane to use shorter messages, so wouldn't expect the same maxima.

I'm assuming this value might be used when creating buffers. I'd imagine smaller buffers are used when possible and expanded up to the maximum, but just in case we might want to use a lower value to avoid guzzling too much memory.

Contributor Author

will remove here

@@ -21,5 +21,6 @@ import "time"
const (
GrpcRetryBackoffMillisecs = 100
GrpcRetryMaxCount = 5 // around 3.2s in total wait duration
GrpcMaxMsgSizeBytes = 100 * 1024 * 1024
Member

Is this aligned with the max Kafka payload size? We are using this value in the pipeline gateway.

Contributor Author

That's currently 1GB, so this value could be increased 10x here.

@@ -13,7 +13,7 @@ RCLONE_IMG ?= ${DOCKERHUB_USERNAME}/seldon-rclone:${CUSTOM_IMAGE_TAG}
SCHEDULER_IMG ?= ${DOCKERHUB_USERNAME}/seldon-scheduler:${CUSTOM_IMAGE_TAG}

MLSERVER_IMG ?= seldonio/mlserver:1.2.1
-TRITON_IMG ?= nvcr.io/nvidia/tritonserver:22.05-py3
+TRITON_IMG ?= nvcr.io/nvidia/tritonserver:22.11-py3
Member

Any particular reason why we are upgrading?

Contributor Author

The PyTorch backend is the main reason, as it's tied to a particular version of PyTorch.

@sakoush sakoush (Member) commented Jan 9, 2023

Note - MLServer would need an increase for this to fully work. So far have not seen an issue with Triton server and increased payload sizes for gRPC above the 4MB default.

Have you created an issue on MLServer, just so we don't forget?

@ukclivecox ukclivecox (Contributor Author)

@adriangonz is aware

@agrski agrski (Contributor) left a comment

A few thoughts for improvement, but this largely looks sensible.

if contents == nil {
return true
} else {
if contents.Fp32Contents == nil &&
Contributor

🙃 Alphabetical order would be slightly easier to read and compare

contents.Uint64Contents == nil &&
contents.BytesContents == nil &&
contents.IntContents == nil &&
contents.Int64Contents == nil {
Contributor

💭 Checking for just null values might not be very reliable. We probably want to check that len(contents.X) == 0, as this will cover both the null case and the not-null-but-empty case.

Contributor Author

I think we need to be careful with non-null-but-empty, as that might be a valid result, e.g. indicating existence.



@@ -21,5 +21,6 @@ import "time"
const (
GrpcRetryBackoffMillisecs = 100
GrpcRetryMaxCount = 5 // around 3.2s in total wait duration
GrpcMaxMsgSizeBytes = 100 * 1024 * 1024
Contributor

💭 This is a reasonable default for probably quite a lot of use cases, but might not be sufficient for all. For example, think about sending image or video data, especially if uncompressed. Alternatively, there might be organisational requirements to allow at least X (and for headers) or to not use more than Y for whatever reason.

Given that we want all our data-plane components to be consistent with this value, and also given users might need to change it, should we add a config value for this? Fine for that to be a separate PR, as we'd also need to do validation (e.g. is numeric, is not negative, etc.).

@@ -88,6 +88,7 @@ func (g *GatewayGrpcServer) Start() error {
opts = append(opts, grpc.Creds(g.tlsOptions.Cert.CreateServerTransportCredentials()))
}
opts = append(opts, grpc.MaxConcurrentStreams(maxConcurrentStreams))
opts = append(opts, grpc.MaxRecvMsgSize(util.GrpcMaxMsgSizeBytes))
Contributor

Same point here as in other server components about setting the max send size to match this. A response from Kafka might be larger than the input (e.g. in the case of something generative, an up-scaler, etc.).

@sakoush sakoush (Member) left a comment

LGTM

@ukclivecox ukclivecox merged commit 37255bd into SeldonIO:v2 Jan 10, 2023