Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stepfunctions-tasks): add support for ModelClientConfig to SageMakerCreateTransformJob #11892

Merged

Conversation

setu4993
Copy link
Contributor

@setu4993 setu4993 commented Dec 6, 2020

Noticed support for ModelClientConfig was missing from this particular type of job, so attempted to add it.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@gitpod-io
Copy link

gitpod-io bot commented Dec 6, 2020

/**
* Configures the timeout and maximum number of retries for processing a transform job invocation.
*
* @default
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No default value is specified on the API page or the description page for the ModelClientConfig.

I have observed that the default value for InvocationsMaxRetries is 3 in practice, and InvocationsTimeoutInSeconds is less than 1200 (not sure about the exact value), but could be wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than Config, we use the suffix Options to specify a bag of properties within the CDK.
perhaps this property type should be called ModelClientOptions and the property modelClientOptions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that's the value we should document as the default text - if it's not documented we should deploy it and find out :)

@setu4993
Copy link
Contributor Author

setu4993 commented Dec 6, 2020

@shivlaks : The tests are succeeding now and this PR is ready for review. Looking forward to your recommendations on making this better.

/**
* Configures the timeout and maximum number of retries for processing a transform job invocation.
*
* @default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than Config, we use the suffix Options to specify a bag of properties within the CDK.
perhaps this property type should be called ModelClientOptions and the property modelClientOptions

/**
* Configures the timeout and maximum number of retries for processing a transform job invocation.
*
* @default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that's the value we should document as the default text - if it's not documented we should deploy it and find out :)

packages/@aws-cdk/aws-stepfunctions-tasks/README.md Outdated Show resolved Hide resolved
*
* @default
*/
readonly modelClientConfig?: ModelClientConfig;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to specify this property through state input? (i.e. a string that says $.field - if so we might need an enum like class here. if not, you can disregard this comment.

@shivlaks shivlaks changed the title feat(aws-stepfunctions-tasks): add support for ModelClientConfig to SageMakerCreateTransformJob feat(stepfunctions-tasks): add support for ModelClientConfig to SageMakerCreateTransformJob Dec 9, 2020
@setu4993
Copy link
Contributor Author

Thanks for your feedback, @shivlaks! Will update the PR shortly with your recommendations.

@setu4993 setu4993 force-pushed the feature/step-functions-transform-model-client-config branch from 281fa26 to 1244b37 Compare December 10, 2020 07:32
@mergify mergify bot dismissed shivlaks’s stale review December 10, 2020 07:32

Pull request has been modified.

Copy link
Contributor

@shivlaks shivlaks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@setu4993 thanks for the fast turnaround!! - couple of small suggestions.
take a look and let me know if you have any questions. we should be able to get this one merged soon :)

private renderModelClientOptions(config: ModelClientOptions): { [key: string]: any } {
return {
ModelClientConfig: {
InvocationsMaxRetries: config.invocationsMaxRetries ?? 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we should explicitly set retries to zero. it feels like a bad default, we should let the service defaults take over (which I think is 3?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 is actually the default for retries. Typically, after the first try fails, the job fails. I just ran into it today and so had a chance to confirm it :).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you recommend we let the service defaults take over? The reason I added those was to avoid the case when one of invocationsMaxRetries or invocationsTimeout was set, but not both. That is the only case when it would get to this function. Maybe simply replace this with: InvocationsMaxRetries: config.invocationsMaxRetries? would do it?

Copy link
Contributor Author

@setu4993 setu4993 Dec 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated these to InvocationsMaxRetries: config.invocationsMaxRetries? and similar for the timeout. So, CDK doesn't set the default values but lets the service-level defaults take over.

Had to revert this because it was failing builds.

Would appreciate any recommendation you have on what to do here given 0 is the default at the service-level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to set it after some basic validation? - i.e. it exists, and is greater than 0? - otherwise, I think it should be fine if the service defaults are 0. wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, it should also be < 3, so I can add validations for both > 0 and < 3 to it later today.

return {
ModelClientConfig: {
InvocationsMaxRetries: config.invocationsMaxRetries ?? 0,
InvocationsTimeoutInSeconds: config.invocationsTimeout?.toSeconds() ?? 300,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar comment here, rather than explicitly setting one, we should fall back to the service defaults? If it's 300, we're only documenting that behaviour but not explicitly setting it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I was trying to avoid the corner case when only 1 of them is defined. I can update these values based on what I've observed from practice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is 3 the limit on retries? I was just thinking the > 0 part for validation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, 3 is the limit on the number of retries.

Valid Range: Minimum value of 0. Maximum value of 3.

@mergify mergify bot dismissed shivlaks’s stale review December 11, 2020 07:24

Pull request has been modified.

@setu4993
Copy link
Contributor Author

@shivlaks : Updated the defaults based on what I observed over a couple batch transform jobs I had a chance to run today. The defaults appear to be 0 retries with 60 second timeout (same as for a SageMaker endpoint).

Docs from the endpoint page:

A customer's model containers must respond to requests within 60 seconds. The model itself can have a maximum processing time of 60 seconds before responding to invocations. If your model is going to take 50-60 seconds of processing time, the SDK socket timeout should be set to be 70 seconds.

Comment on lines 185 to 192
const retries = config.invocationsMaxRetries;
if (retries? (retries < 0 || retries > 3): false) {
throw new RangeError(`invocationsMaxRetries should be between 0 and 3, not ${retries}.`);
}
const timeout = config.invocationsTimeout?.toSeconds();
if (timeout? (timeout < 1 || timeout > 3600): false) {
throw new RangeError(`invocationsTimeout should be between 1 and 3600 seconds, not ${timeout}.`);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added validation for both the properties based on what is documented on the API page.

Copy link
Contributor

@shivlaks shivlaks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost there! one last thing regarding tokens.
check out my comment and let me know if you have any questions there.

@@ -173,6 +181,23 @@ export class SageMakerCreateTransformJob extends sfn.TaskStateBase {
};
}

private renderModelClientOptions(config: ModelClientOptions): { [key: string]: any } {
const retries = config.invocationsMaxRetries;
if (retries? (retries < 0 || retries > 3): false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these values may be represented as tokens (i.e. if they're passed in as parameters to the stack). we can only validate if they're not tokens, so we would want to add a check that !Token.isUnResolved(retries) before proceeding with the check

checkout Token.isUnresolved in the repo for a few examples.
Same comment applies to the timeout.

Co-authored-by: Shiv Lakshminarayan <shivlaks@amazon.com>
@mergify mergify bot dismissed shivlaks’s stale review December 15, 2020 07:54

Pull request has been modified.

Copy link
Contributor

@shivlaks shivlaks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@setu4993 thanks for contributing!!

@mergify
Copy link
Contributor

mergify bot commented Dec 15, 2020

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: 1eee0da
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify
Copy link
Contributor

mergify bot commented Dec 15, 2020

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit bf05092 into aws:master Dec 15, 2020
@setu4993
Copy link
Contributor Author

Thanks for your support on this, @shivlaks!

@setu4993 setu4993 deleted the feature/step-functions-transform-model-client-config branch December 15, 2020 19:03
flochaz pushed a commit to flochaz/aws-cdk that referenced this pull request Jan 5, 2021
…akerCreateTransformJob (aws#11892)

Noticed support for [ModelClientConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html#sagemaker-CreateTransformJob-request-ModelClientConfig) was missing from this particular type of job, so attempted to add it.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants