Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(glue): Timeout and workerType missing validation for Ray job types #29612

Open
haljarrett opened this issue Mar 26, 2024 · 2 comments
Open

(glue): Timeout and workerType missing validation for Ray job types #29612

haljarrett opened this issue Mar 26, 2024 · 2 comments
Labels
@aws-cdk/aws-glue Related to AWS Glue bug This issue is a bug. effort/medium Medium work item – several days of effort p3

Comments

@haljarrett
Copy link

Describe the bug

When developing Ray jobs via CDK, I bumped into the fact that specifying a timeout or not specifying a worker type on Job for Ray job types will succeed in synthesis, but fail deployment.

Expected Behavior

I'd expect an error to be thrown if workerType is not provided or timeout is provided, as both of these will cause deployment failures for Ray job types.

Current Behavior

Synthesis passes and failure occurs on deployment:

 UPDATE_FAILED [...] Timeout not supported for Ray jobs

or

CREATE_FAILED [...] Worker type cannot be null and only [Z.2X] worker types are supported for glueray jobs

Reproduction Steps

#!/usr/bin/env node
import * as cdk from 'aws-cdk-lib';
import { Code, GlueVersion, Job, JobExecutable, PythonVersion, Runtime, WorkerType } from '@aws-cdk/aws-glue-alpha';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'glue-job-issue-stack');
const job = new Job(stack, 'MyJob', {
  jobName: 'glue-issue-test-job',
  executable: JobExecutable.pythonRay({
    glueVersion: GlueVersion.V4_0,
    pythonVersion: PythonVersion.THREE_NINE,
    runtime: Runtime.RAY_TWO_FOUR,
    script: Code.fromAsset('lib/test-job.py')
  }),
  workerType: WorkerType.G_025X,
  workerCount: 1,
  timeout: cdk.Duration.minutes(10),
});

Possible Solution

Additional validation in packages/@aws-cdk/aws-glue-alpha/lib/job.ts would address this and provide a slightly smoother developer experience when working with ray job types. I would be interested in contributing a fix if triage determines this is valid bug.

Additional Information/Context

No response

CDK CLI Version

2.133.0

Framework Version

2.133.0

Node.js Version

20.10.0

OS

macOS

Language

TypeScript

Language Version

5.3.3

Other information

No response

@haljarrett haljarrett added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 26, 2024
@github-actions github-actions bot added the @aws-cdk/aws-glue Related to AWS Glue label Mar 26, 2024
@haljarrett haljarrett changed the title (aws-glue-alpha): Timeout and workerType missing validation for Ray job types (glue): Timeout and workerType missing validation for Ray job types Mar 26, 2024
@tim-finnigan tim-finnigan self-assigned this Mar 26, 2024
@tim-finnigan tim-finnigan added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels Mar 26, 2024
@tim-finnigan
Copy link

Thanks for creating this issue as well as the PR. I think adding validation as you described makes sense if the deployment is failing. Linking Ray job documentation for reference:

Did not see much on the topics in this issue, maybe more needs to be added on that.

@tim-finnigan tim-finnigan added p2 effort/medium Medium work item – several days of effort and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Mar 26, 2024
@tim-finnigan tim-finnigan removed their assignment Mar 26, 2024
@haljarrett
Copy link
Author

Hey Tim, thanks for attaching the docs there - in the same vein, this is also not noted in:

The worker type constraint is mentioned, but for both, timeout is just listed as an optional parameter, no mention that it must be omitted in some circumstances. Having the only documentation be in the request failure message is not ideal, but not sure if it's something that even can be enforced at the L1 / Cfn level, so validating at the L2 construct level should hopefully avoid some frustration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-glue Related to AWS Glue bug This issue is a bug. effort/medium Medium work item – several days of effort p3
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants