Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws-stepfunctions: Distributed Map State Item Processor executionType doesn't recognize ProcessorType.EXPRESS #30194

Closed
Commas929 opened this issue May 14, 2024 · 4 comments · Fixed by #30301 · 4 remaining pull requests
Closed

aws-stepfunctions: Distributed Map State Item Processor executionType doesn't recognize ProcessorType.EXPRESS #30194

Commas929 opened this issue May 14, 2024 · 4 comments · Fixed by #30301 · 4 remaining pull requests
Assignees
Labels
@aws-cdk/aws-stepfunctions Related to AWS StepFunctions bug This issue is a bug. effort/small Small work item – less than a day of effort p2

Comments

@Commas929
Copy link

Describe the bug

I have the following piece of CDK code but when I deploy it and check in AWS console, the step function execution type for the map run is Standard instead of Express.

The code:

    const qotOverrideDistributedMapState = new DistributedMap(this, 'Distributed Map State', {
      maxConcurrency: 1000,
      itemBatcher: new ItemBatcher({
        maxItemsPerBatch: 10
      }),
      itemReader: new S3CsvItemReader({
        bucket: qotOverrideBucket,
        key: JsonPath.stringAt('$.s3Key'),
        csvHeaders: CsvHeaders.useFirstRow()
      }),
      resultWriter: new ResultWriter({
        bucket: qotOverrideBucket,
        prefix: 'QoTOverrideProcessingResult',
      }),
      toleratedFailureCount: 0,
    }).itemProcessor(new LambdaInvoke(this, 'Invoke Lambda Processor', {
      lambdaFunction: qotOverrideProcessor,
    }), {
      mode: ProcessorMode.DISTRIBUTED,
      executionType: ProcessorType.EXPRESS,
    })

    const qotOverrideFileProcessor = new StateMachine(this, 'QoT Override File Processor', {
      role: Role.fromRoleArn(this, 'QoT-Override-File-Processor-SfnRole', qotOverrideSfnRole.roleArn, {
        mutable: false,
      }),
      definitionBody: DefinitionBody.fromChainable(qotOverrideDistributedMapState),
      stateMachineName: 'QoT-Override-File-Processor-Sfn',
      stateMachineType: StateMachineType.STANDARD,
      logs: {
        destination: new LogGroup(this, 'QoT-Override-File-Processor-LogGroup', {
          retention: RetentionDays.THREE_MONTHS,
          logGroupName: 'QoT-Override-File-Processor-Log',
        }),
        level: LogLevel.ERROR,
        includeExecutionData: true,
      },
    })

The generated State Machine definition copied from AWS console:

{
  "StartAt": "Distributed Map State",
  "States": {
    "Distributed Map State": {
      "Type": "Map",
      "End": true,
      "ItemProcessor": {
        "ProcessorConfig": {
          "Mode": "DISTRIBUTED",
          "ExecutionType": "STANDARD"
        },
        "StartAt": "Invoke Lambda Processor",
        "States": {
          "Invoke Lambda Processor": {
            "End": true,
            "Retry": [
              {
                "ErrorEquals": [
                  "Lambda.ClientExecutionTimeoutException",
                  "Lambda.ServiceException",
                  "Lambda.AWSLambdaException",
                  "Lambda.SdkClientException"
                ],
                "IntervalSeconds": 2,
                "MaxAttempts": 6,
                "BackoffRate": 2
              }
            ],
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
              "FunctionName": "arn:aws:lambda:us-east-1:502523373620:function:QoTOverrideProcessor",
              "Payload.$": "$"
            }
          }
        }
      },
      "MaxConcurrency": 1000,
      "ItemReader": {
        "Resource": "arn:aws:states:::s3:getObject",
        "ReaderConfig": {
          "InputType": "CSV",
          "CSVHeaderLocation": "FIRST_ROW"
        },
        "Parameters": {
          "Bucket.$": "$.s3Bucket",
          "Key.$": "$.s3Key"
        }
      },
      "ItemBatcher": {
        "MaxItemsPerBatch": 10
      },
      "ResultWriter": {
        "Resource": "arn:aws:states:::s3:putObject",
        "Parameters": {
          "Bucket.$": "$.s3Bucket",
          "Prefix": "QoTOverrideProcessingResult"
        }
      }
    }
  }
}

Expected Behavior

The ItemProcessor execution type should be Express

Current Behavior

The ItemProcessor execution type is Standard and the specified type in CDK is ignored.

Reproduction Steps

    const bucket = new Bucket(this, 'Bucket', {});
    const distributedMapState = new DistributedMap(this, 'Distributed Map State', {
      maxConcurrency: 1000,
      itemReader: new S3CsvItemReader({
        bucket: bucket,
        key: JsonPath.stringAt('$.s3Key'),
        csvHeaders: CsvHeaders.useFirstRow()
      }),
      toleratedFailureCount: 0,
    }).itemProcessor(new Pass(this, 'Pass State', {}), {
      mode: ProcessorMode.DISTRIBUTED,
      executionType: ProcessorType.EXPRESS,
    })

    const fileProcessor = new StateMachine(this, 'File Processor', {
      definitionBody: DefinitionBody.fromChainable(distributedMapState),
      stateMachineName: 'File-Processor-Sfn',
      stateMachineType: StateMachineType.STANDARD,
    })

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.136.0

Framework Version

No response

Node.js Version

NodeJS-18

OS

macOS Sonoma 14.4.1

Language

TypeScript

Language Version

No response

Other information

No response

@Commas929 Commas929 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels May 14, 2024
@github-actions github-actions bot added the @aws-cdk/aws-stepfunctions Related to AWS StepFunctions label May 14, 2024
@khushail khushail added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels May 14, 2024
@khushail khushail self-assigned this May 14, 2024
@khushail
Copy link
Contributor

khushail commented May 16, 2024

Hey @Commas929 , thanks for reporting this bug.

On investigating this issue, I checked the CDK Docs and found that this PR introduced the support for itemProcessor() where executiontype can be set to required enum value-

public itemProcessor(processor: IChainable, config: ProcessorConfig = {}): DistributedMap {

However later on this value was overwritten by change introduced in this PR -

rendered.ItemProcessor.ProcessorConfig.ExecutionType = this.mapExecutionType;

So I tried reunning the code with giving value of mapExecutionType to EXPRESS and it worked by assigning the same value to executionType. However I feel this might not be intended as such.

Sharing the code -

    const distributedMapState = new DistributedMap(this, 'Distributed Map State', {
      maxConcurrency: 1000,
      itemReader: new S3CsvItemReader({
        bucket: bucket,
        key: JsonPath.stringAt('$.s3Key'),
        csvHeaders: CsvHeaders.useFirstRow()
      }),
      mapExecutionType: StateMachineType.EXPRESS,
      resultPath: JsonPath.DISCARD,
      toleratedFailureCount: 0,
    }).itemProcessor(new Pass(this, 'Pass State', {}), {
      mode: ProcessorMode.DISTRIBUTED,
      executionType: ProcessorType.STANDARD,
    })

    const fileProcessor = new StateMachine(this, 'File Processor', {
      definitionBody: DefinitionBody.fromChainable(distributedMapState),
      stateMachineName: 'File-Processor-Sfn',
      stateMachineType: StateMachineType.STANDARD,
    })

and synthesized template -


"FileProcessor4197797F": {
   "Type": "AWS::StepFunctions::StateMachine",
   "Properties": {
    "DefinitionString": {
     "Fn::Join": [
      "",
      [
       "{\"StartAt\":\"Distributed Map State\",\"States\":{\"Distributed Map State\":{\"Type\":\"Map\",\"ResultPath\":null,\"End\":true,\"ItemProcessor\":{\"ProcessorConfig\":{\"Mode\":\"DISTRIBUTED\",\"ExecutionType\":\"EXPRESS\"},\"StartAt\":\"Pass State\",\"States\":{\"Pass State\":{\"Type\":\"Pass\",\"End\":true}}},\"MaxConcurrency\":1000,\"ItemReader\":{\"Resource\":\"arn:",
       {
        "Ref": "AWS::Partition"
       },
       ":states:::s3:getObject\",\"ReaderConfig\":{\"InputType\":\"CSV\",\"CSVHeaderLocation\":\"FIRST_ROW\"},\"Parameters\":{\"Bucket\":\"",
       {
        "Ref": "Bucket2001301D4930"
       },
       "\",\"Key.$\":\"$.s3Key\"}}}}}"
      ]
     ]
    },

You could try using Escape hatches for workaround. Marking the issue as appropriate.

@khushail khushail added p2 effort/small Small work item – less than a day of effort response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels May 16, 2024
@mazyu36
Copy link
Contributor

mazyu36 commented May 18, 2024

I encountered this problem before , and I’m working on it.

I think the executionType property in the itemProcessor config is not needed to set when use the DistributedMap Construct.

The mapExecutionType in the DistributedMap Construct is used to set the Processing Mode of the ItemProcessor, and its purpose overlaps with the executionType property in the ItemProcessor Config.

In the current implementation, the mapExecutionType in the DistributedMap Construct always overwrites the Processing Mode of the ItemProcessor.
So the executionType property in the itemProcessor config is meaningless.

The mapExecutionType in the DistributedMap Construct may have been an unnecessary property, but removing it would cause a breaking change, so it should be avoided.

@mergify mergify bot closed this as completed in #30301 Jun 13, 2024
mergify bot pushed a commit that referenced this issue Jun 13, 2024
… executionType in the ProcessorConfig (#30301)

### Issue # (if applicable)

Closes #30194 

### Reason for this change
In #27913, the ItemProcessor was introduced for use with the Map Class. With the executionType in the ProcessorConfig, it was possible to specify the executionType for the Map.
On the other hand, in #28821, the DistributedMap Class was introduced. The mapExecutionType of the DistributedMap class always overwrites the executionType of the ProcessorConfig.
Therefore, when using the DistributedMap class, the implementation ignores the executionType of the ProcessorConfig. However, this behavior cannot be inferred from the documentation.

### Description of changes
* Added to the docs that when using the DistributedMap Class, the executionType in the ProcessorConfig is ignored.
* Also added a warning.


### Description of how you validated changes
Add unit tests and integ tests



### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

sarangarav pushed a commit to sarangarav/aws-cdk that referenced this issue Jun 21, 2024
… executionType in the ProcessorConfig (aws#30301)

### Issue # (if applicable)

Closes aws#30194 

### Reason for this change
In aws#27913, the ItemProcessor was introduced for use with the Map Class. With the executionType in the ProcessorConfig, it was possible to specify the executionType for the Map.
On the other hand, in aws#28821, the DistributedMap Class was introduced. The mapExecutionType of the DistributedMap class always overwrites the executionType of the ProcessorConfig.
Therefore, when using the DistributedMap class, the implementation ignores the executionType of the ProcessorConfig. However, this behavior cannot be inferred from the documentation.

### Description of changes
* Added to the docs that when using the DistributedMap Class, the executionType in the ProcessorConfig is ignored.
* Also added a warning.


### Description of how you validated changes
Add unit tests and integ tests



### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
mazyu36 added a commit to mazyu36/aws-cdk that referenced this issue Jun 22, 2024
… executionType in the ProcessorConfig (aws#30301)

### Issue # (if applicable)

Closes aws#30194 

### Reason for this change
In aws#27913, the ItemProcessor was introduced for use with the Map Class. With the executionType in the ProcessorConfig, it was possible to specify the executionType for the Map.
On the other hand, in aws#28821, the DistributedMap Class was introduced. The mapExecutionType of the DistributedMap class always overwrites the executionType of the ProcessorConfig.
Therefore, when using the DistributedMap class, the implementation ignores the executionType of the ProcessorConfig. However, this behavior cannot be inferred from the documentation.

### Description of changes
* Added to the docs that when using the DistributedMap Class, the executionType in the ProcessorConfig is ignored.
* Also added a warning.


### Description of how you validated changes
Add unit tests and integ tests



### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@aws-cdk-automation
Copy link
Collaborator

Comments on closed issues and PRs are hard for our team to see. If you need help, please open a new issue that references this one.

@aws aws locked as resolved and limited conversation to collaborators Jul 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.