How to run existing Glue Job and How to create a new Glue Job #165

Closed
litethesko opened this issue Oct 2, 2022 · 3 comments

litethesko commented Oct 2, 2022

Hi Team,

Can someone help me with how to run an existing Glue job and how to create a new Glue job from PySpark code files? I have tried several approaches without success.

I opened AWS case #10860038241 to get help from the AWS Support team, but it did not resolve the problem.

Below is our conversation:

Message from AWS Support on Sep 27, 2022:

It was a pleasure talking to you. Over the Chime call, you explained that you were using the DDK workshop [1] and were trying to add a pipeline stage for the Glue job with the following code snippet:

    glue_job_stage = GlueTransformStage(
        self,
        "ddk-glue-job",
        environment_id=environment_id,
        job_name="my_glue_job",
        job_role=f"arn:aws:iam::512604200947:role/service-role/AWSGlueServiceRole-sample-raw-role",
        crawler_role=f"arn:aws:iam::512604200947:role/service-role/AWSGlueServiceRole-sample-raw-role",
    )

.
.
.
.add_stage(glue_job_stage)

You were getting a missing "target" property error, but you showed me that the parameter is documented as optional.

As I mentioned, I have not worked on DDK before, and since this issue is not directly related to CDK, I am going to need some time to set up my own DDK workshop and see if I can set up a Glue pipeline.

Please note that code development is outside of AWS Support, however, I will put forward my best effort to help you.

Please feel free to reply if you have any other questions.

We value your feedback. Please share your experience by rating this and other correspondences in the AWS Support Center. You can rate a correspondence by selecting the stars in the top right corner of the correspondence.

Message from AWS Support on Sep 28, 2022:

Hope you are doing well!

In order to understand the issues you were facing, I went through the ddk workshop provided in the DDK GitHub[1] and set it up in my account. I did not face any issue with the workshop sample.

Then I added my "GlueTransformStage" [2] in the ddk_app_stack.py as shown in the snippet below:

    glue_job_stage = GlueTransformStage(
        scope=self,
        id="ddk-glue-job",
        environment_id=environment_id,

.
.
.
.add_stage(glue_job_stage)

I was getting this error: "TypeError: __init__() missing 1 required keyword-only argument: 'executable'"

Further, when I tried adding the job_name or job_role or both, I was getting the same error you got "AttributeError: 'NoneType' object has no attribute 'role_arn'".

I do not understand why we are getting these errors with those parameters, as they are shown as optional in the doc [2]. I thought maybe there was something extra we needed to do, so I tried adding a different stage, "S3EventStage". This worked without any issues.

Then I took a deep dive into the documentation and found this test app [3]. Referring to that code, I provided the job name and the crawler name in my DDK app, and it deployed successfully. Please see my latest working "ddk_app_stack.py" (attached). So it looks like there is some issue with the code, because the doc says "If the Glue job or crawler names are not supplied, then they are created." and lists only "scope", "id" and "environment_id" as required parameters.

Since DDK is a community-driven open-source framework and not a service like CDK, we don't have a dedicated support model for it. You can create an issue on the GitHub repository [1] to get the maintainers' support.

Having said that, I am doing all I can to get a little bit more information for you on this issue. I cannot promise if I'd get an answer but if I do, by end of tomorrow, I will definitely relay it to you. Otherwise, I will be closing this case and you may continue working with them via a GitHub issue.

Please let me know if you have any other questions.

Have a wonderful rest of the week!

Reference:
[1] DDK GitHub
https://github.com/awslabs/aws-ddk

[2] GlueTransformStage doc
https://awslabs.github.io/aws-ddk/release/latest/api/core/stubs/aws_ddk_core.stages.GlueTransformStage.html

[3] test_basic_data_pipeline.py
https://github.com/awslabs/aws-ddk/blob/main/core/tests/unit/test_basic_data_pipeline.py#L56

Best regards,
Amazon Web Services

Attachments
ddk_app_stack.py.txt

Message from AWS Support on Sep 29, 2022:

I was able to find a little more information for you.

For GlueTransformStage to create the Glue job, the executable parameter must be supplied. An example can be found in [1].

If you want to use the existing job, then I believe you have to provide the job_name and the crawler_name. I was not able to do it with just "job_name".
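The two combinations described above can be sketched as a small validation helper. This is only an illustration: `glue_stage_kwargs` is a hypothetical name that is not part of DDK, and treating the two modes as mutually exclusive is a simplifying assumption based on what was reported to work in this thread, not on the library's documented behaviour.

```python
from typing import Any, Dict, Optional


def glue_stage_kwargs(
    environment_id: str,
    job_name: Optional[str] = None,
    crawler_name: Optional[str] = None,
    executable: Optional[Any] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for GlueTransformStage.

    Accepts only the two combinations reported to work in this thread and
    fails early otherwise, instead of failing later at synth time.
    """
    kwargs: Dict[str, Any] = {"environment_id": environment_id}

    if executable is not None and not (job_name or crawler_name):
        # New job: the stage needs an executable (for example a
        # JobExecutable from aws_cdk.aws_glue_alpha) in order to create it.
        kwargs["executable"] = executable
        return kwargs

    if executable is None and job_name and crawler_name:
        # Existing job: both names were needed; job_name alone did not work.
        kwargs["job_name"] = job_name
        kwargs["crawler_name"] = crawler_name
        return kwargs

    raise ValueError(
        "Pass either executable (new job) or both job_name and "
        "crawler_name (existing job)."
    )
```

Inside the stack, the result could then be unpacked into the stage constructor, for example `GlueTransformStage(self, "ddk-glue-job", **glue_stage_kwargs(...))`, mirroring the snippets earlier in this thread.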

I request you to create an issue on the DDK GitHub if you haven't already, to get their support.

If you have any other issue related to CDK, please create a new case.

Reference:
[1] Parameter executable example
https://github.com/awslabs/aws-ddk/blob/main/core/tests/unit/test_glue_transform_stage.py#L58

[2] CDK class JobExecutable
https://docs.aws.amazon.com/cdk/api/v2/docs/@aws-cdk_aws-glue-alpha.JobExecutable.html

Best regards,
Amazon Web Services

litethesko added the question label Oct 2, 2022
malachi-constant self-assigned this Oct 6, 2022

malachi-constant (Contributor) commented

Hi @litethesko, I put together this example. Take a look and let me know if you have any questions.

malachi-constant (Contributor) commented

@litethesko Was the example of any help to you?

malachi-constant (Contributor) commented

Closing issue for now, please reopen if there are further questions/issues.
