(glue-alpha): cannot create 2 partitionIndexes simultaneously #24813

Open
clueleaf opened this issue Mar 28, 2023 · 7 comments
Labels
@aws-cdk/aws-glue Related to AWS Glue bug This issue is a bug. effort/medium Medium work item – several days of effort p3

Comments

@clueleaf
Contributor

clueleaf commented Mar 28, 2023

Describe the bug

When passing 2 indexes to partitionIndexes of glue.Table, table creation fails.

Expected Behavior

Glue table and indexes are created.

Current Behavior

Creation of the table's partition indexes fails with:

Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table.

Reproduction Steps

Create a glue table with 2 indexes.

const bucket = new s3.Bucket(stack, 'DataBucket');
const database = new glue.Database(stack, 'MyDatabase', {
  databaseName: 'database',
});

const csvTable = new glue.Table(stack, 'CSVTable', {
  database,
  bucket,
  tableName: 'csv_table',
  columns: [
    { name: 'col1', type: glue.Schema.STRING },
    { name: 'col2', type: glue.Schema.STRING },
    { name: 'col3', type: glue.Schema.STRING },
  ],
  partitionKeys: [
    { name: 'year', type: glue.Schema.SMALL_INT },
    { name: 'month', type: glue.Schema.BIG_INT },
  ],
  partitionIndexes: [
    { indexName: 'index1', keyNames: ['month'] },
    { indexName: 'index2', keyNames: ['month', 'year'] },
  ],
  dataFormat: glue.DataFormat.CSV,
});

It sometimes fails even if only one index is passed to partitionIndexes and the rest are added using table.addPartitionIndex.

const csvTable = new glue.Table(stack, 'CSVTable', {
  database,
  bucket,
  tableName: 'csv_table',
  columns: [
    { name: 'col1', type: glue.Schema.STRING },
    { name: 'col2', type: glue.Schema.STRING },
    { name: 'col3', type: glue.Schema.STRING },
  ],
  partitionKeys: [
    { name: 'year', type: glue.Schema.SMALL_INT },
    { name: 'month', type: glue.Schema.BIG_INT },
  ],
  partitionIndexes: [{ indexName: 'index1', keyNames: ['month'] }],
  dataFormat: glue.DataFormat.CSV,
});

csvTable.addPartitionIndex({ indexName: 'index2', keyNames: ['month', 'year'] })

Possible Solution

I think this is a restriction of the Glue service.
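To illustrate the restriction, here is a minimal sketch that models it in plain TypeScript. `FakeGlueTable`, `startCreate`, `finishCreate`, and `tryCreate` are hypothetical names invented for this example; they are not part of the Glue API or the CDK, and the real service may behave differently in detail. The model only mirrors the error message quoted above: a new index is rejected while another one is still in the CREATING state.

```typescript
type IndexState = 'CREATING' | 'ACTIVE';

interface FakeIndex {
  name: string;
  state: IndexState;
}

// Hypothetical in-memory model of a table; NOT the real Glue service.
class FakeGlueTable {
  readonly indexes: FakeIndex[] = [];

  // Rejects the request if any other index is still being created.
  startCreate(name: string): void {
    const busy = this.indexes.filter((i) => i.state === 'CREATING')[0];
    if (busy) {
      throw new Error(
        `Index ${busy.name} is in CREATING state. ` +
          'Only 1 index can be created or deleted simultaneously per table.',
      );
    }
    this.indexes.push({ name, state: 'CREATING' });
  }

  // Marks the named index as fully created.
  finishCreate(name: string): void {
    for (const index of this.indexes) {
      if (index.name === name) {
        index.state = 'ACTIVE';
      }
    }
  }
}

// Returns the error message if the request is rejected, otherwise undefined.
function tryCreate(table: FakeGlueTable, name: string): string | undefined {
  try {
    table.startCreate(name);
    return undefined;
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

// Unordered: both requests are issued before either index becomes ACTIVE.
const racyTable = new FakeGlueTable();
const racyErrors = [tryCreate(racyTable, 'index2'), tryCreate(racyTable, 'index1')];

// Ordered: the second request waits until the first index is ACTIVE.
const orderedTable = new FakeGlueTable();
const orderedErrors: (string | undefined)[] = [];
for (const name of ['index1', 'index2']) {
  orderedErrors.push(tryCreate(orderedTable, name));
  orderedTable.finishCreate(name);
}
```

In this model the second entry of `racyErrors` holds the rejection message, while `orderedErrors` contains no errors, which is consistent with the failure being intermittent when nothing enforces an order between the two index resources.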

Additional Information/Context

No response

CDK CLI Version

2.70.0

Framework Version

No response

Node.js Version

18

OS

macOS Ventura

Language

TypeScript

Language Version

No response

Other information

No response

@clueleaf clueleaf added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 28, 2023
@github-actions github-actions bot added the @aws-cdk/aws-glue Related to AWS Glue label Mar 28, 2023
@khushail khushail added needs-reproduction This issue needs reproduction. investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Mar 28, 2023
@khushail khushail self-assigned this Mar 28, 2023
@khushail khushail removed the needs-triage This issue or PR still needs to be triaged. label Mar 28, 2023
@khushail
Contributor

Hi @clueleaf, thanks for reaching out.

It's stated in the available documentation that a table can have a maximum of 3 partition indexes. But it's also stated here:

`Partition indexes must be created one at a time. To avoid race conditions, we store the resource and add dependencies each time a new partition index is created.`

I am also getting the error when creating 2 indexes at the same time, but it succeeds when I add the partition index later on. Since a workaround exists, I am marking this as P2 for now, which means our team won't be able to work on it immediately. However, if you would like to contribute to resolving this bug, that would be great. Here is a contributing guide to get started.
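The "store the resource and add dependencies" idea quoted above can be sketched roughly as follows. This is a simplified model in plain TypeScript, not the actual CDK implementation: `FakeResource`, `FakeTable`, and the `partition-index-` id prefix are assumptions made up for the example.

```typescript
// Hypothetical stand-in for a deployable resource; NOT a CDK class.
class FakeResource {
  readonly id: string;
  readonly dependsOn: FakeResource[] = [];

  constructor(id: string) {
    this.id = id;
  }
}

// Hypothetical table model that remembers the most recently added index.
class FakeTable {
  readonly indexResources: FakeResource[] = [];
  private lastIndex?: FakeResource;

  addPartitionIndex(indexName: string): void {
    const resource = new FakeResource(`partition-index-${indexName}`);
    // Each new index depends on the previously added one, so the
    // indexes form a chain and can only be created one after another.
    if (this.lastIndex) {
      resource.dependsOn.push(this.lastIndex);
    }
    this.lastIndex = resource;
    this.indexResources.push(resource);
  }
}

const chainedTable = new FakeTable();
chainedTable.addPartitionIndex('index1');
chainedTable.addPartitionIndex('index2');
chainedTable.addPartitionIndex('index3');
```

With this chaining, `index2` depends on `index1` and `index3` depends on `index2`, so a deployment engine that honors dependencies would never create two of them at the same time. The error reported in this issue suggests that, at least in some cases, such an ordering is not in effect between the index resources.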

We also use +1s to help prioritize our work, and are happy to re-evaluate this issue based on community feedback. You can reach out to the cdk.dev community on Slack to solicit support for re-prioritization.

@khushail khushail removed their assignment Mar 28, 2023
@khushail khushail added p2 effort/small Small work item – less than a day of effort response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. needs-reproduction This issue needs reproduction. labels Mar 28, 2023
@khushail khushail self-assigned this Mar 28, 2023
@clueleaf
Contributor Author

clueleaf commented Mar 29, 2023

@khushail Thank you for your investigation.
One weird thing is that even if I use addPartitionIndex to add the second index later on, it fails in exactly the same way.
It's hard to tell why it succeeds sometimes but not always.

const bucket = new s3.Bucket(stack, 'DataBucket');
const database = new glue.Database(stack, 'MyDatabase', {
  databaseName: 'database',
});

const csvTable = new glue.Table(stack, 'CSVTable', {
  database,
  bucket,
  tableName: 'csv_table',
  columns: [
    { name: 'col1', type: glue.Schema.STRING },
    { name: 'col2', type: glue.Schema.STRING },
    { name: 'col3', type: glue.Schema.STRING },
  ],
  partitionKeys: [
    { name: 'year', type: glue.Schema.SMALL_INT },
    { name: 'month', type: glue.Schema.BIG_INT },
  ],
  partitionIndexes: [{ indexName: 'index1', keyNames: ['month'] }],
  dataFormat: glue.DataFormat.CSV,
});
csvTable.addPartitionIndex({ indexName: 'index2', keyNames: ['month', 'year'] })

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Mar 29, 2023
@khushail
Contributor

@clueleaf, could you please share the error that you see when it fails? I am not able to repro this error, so it would be a helpful reference when creating a PR.

@khushail khushail added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Mar 29, 2023
@clueleaf
Contributor Author

Sure.

**:**:** ** | CREATE_FAILED        | Custom::AWS           | CSVTablepartitionindexindex16247ABF6
Received response status [FAILED] from custom resource. Message returned: Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table. (RequestId: 9a709d0e-4e9d-49e3-8202-fd781b73266b)

 ❌  MyStack (MyStack) failed: Error: The stack named MyStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Received response status [FAILED] from custom resource. Message returned: Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table. (RequestId: 9a709d0e-4e9d-49e3-8202-fd781b73266b)
    at FullCloudFormationDeployment.monitorDeployment (/Users/***/node_modules/aws-cdk/lib/index.js:380:10236)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async deployStack2 (/Users/***/node_modules/aws-cdk/lib/index.js:383:145458)
    at async /Users/***/node_modules/aws-cdk/lib/index.js:383:128776
    at async run (/Users/***/node_modules/aws-cdk/lib/index.js:383:126782)

 ❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack named MyStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Received response status [FAILED] from custom resource. Message returned: Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table. (RequestId: 9a709d0e-4e9d-49e3-8202-fd781b73266b)
    at deployStacks (/Users/***/node_modules/aws-cdk/lib/index.js:383:129083)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async CdkToolkit.deploy (/Users/***/node_modules/aws-cdk/lib/index.js:383:147507)
    at async exec4 (/Users/***/node_modules/aws-cdk/lib/index.js:438:51799)

Stack Deployments Failed: Error: The stack named MyStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Received response status [FAILED] from custom resource. Message returned: Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table. (RequestId: 9a709d0e-4e9d-49e3-8202-fd781b73266b)

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Mar 30, 2023
@khushail
Contributor

Thanks, @clueleaf.

@khushail khushail removed their assignment Mar 30, 2023
@yuntaoL

yuntaoL commented May 3, 2023

I have the same issue; it worked previously.

@colifran colifran added effort/medium Medium work item – several days of effort and removed effort/small Small work item – less than a day of effort labels May 28, 2024
@colifran colifran changed the title (aws-glue-alpha): cannot create 2 partitionIndexes simultaneously (glue-alpha): cannot create 2 partitionIndexes simultaneously May 28, 2024
@pahud pahud added p3 and removed p2 labels Jun 11, 2024
@prazian

prazian commented Jun 28, 2024

IMO, the best fix is for the addPartitionIndex function to return the created index object instead of void, so that we could chain dependencies between the two indexes.

Something like this (currently doesn't work because it returns void):

const table = new S3Table(this, 'Something', {
  // ...
});

const pI1 = table.addPartitionIndex({
  indexName: 'year_month_day',
  keyNames: ['year', 'month', 'day'],
});
const pI2 = table.addPartitionIndex({
  indexName: 'country_site',
  keyNames: ['country', 'site'],
});
pI1.addDependency(pI2); // Doesn't work because pI1 and pI2 are void
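A self-contained sketch of how such an API could behave is shown below. Everything in it is hypothetical: `PartitionIndexHandle`, `SketchTable`, and `creationOrder` are names invented for this illustration and do not exist in the CDK. The sketch only shows that returning a handle is enough for callers to declare an order, and it does not detect dependency cycles.

```typescript
// Hypothetical handle returned by the proposed API; NOT a CDK class.
class PartitionIndexHandle {
  readonly indexName: string;
  readonly dependencies: PartitionIndexHandle[] = [];

  constructor(indexName: string) {
    this.indexName = indexName;
  }

  // Declares that this index must be created after `other`.
  addDependency(other: PartitionIndexHandle): void {
    this.dependencies.push(other);
  }
}

// Hypothetical table whose addPartitionIndex returns a handle instead of void.
class SketchTable {
  readonly handles: PartitionIndexHandle[] = [];

  addPartitionIndex(props: { indexName: string; keyNames: string[] }): PartitionIndexHandle {
    const handle = new PartitionIndexHandle(props.indexName);
    this.handles.push(handle);
    return handle;
  }
}

// Depth-first walk that lists every index after its dependencies.
// Assumes the dependency graph has no cycles.
function creationOrder(handles: PartitionIndexHandle[]): string[] {
  const order: string[] = [];
  const seen: PartitionIndexHandle[] = [];
  const visit = (handle: PartitionIndexHandle): void => {
    if (seen.indexOf(handle) !== -1) {
      return;
    }
    seen.push(handle);
    handle.dependencies.forEach((dep) => visit(dep));
    order.push(handle.indexName);
  };
  handles.forEach((handle) => visit(handle));
  return order;
}

const sketchTable = new SketchTable();
const first = sketchTable.addPartitionIndex({
  indexName: 'year_month_day',
  keyNames: ['year', 'month', 'day'],
});
const second = sketchTable.addPartitionIndex({
  indexName: 'country_site',
  keyNames: ['country', 'site'],
});
first.addDependency(second); // year_month_day is created after country_site
const sketchOrder = creationOrder(sketchTable.handles);
```

Here `sketchOrder` is `['country_site', 'year_month_day']`: because `first` depends on `second`, the `country_site` index is created before `year_month_day`, and the two creations can no longer overlap.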
