NIFI-15969 Fixed PutS3Object multipart upload data corruption for concurrent FlowFiles with same S3 key #11279

Closed
rakesh-rsky wants to merge 1 commit into apache:main from rakesh-rsky:fix/NIFI-15969-puts3object-multipart-uuid-key
Conversation

@rakesh-rsky
Contributor

NIFI-15969 Fixed PutS3Object multipart upload data corruption for concurrent FlowFiles with same S3 key.

Previously, the multipart upload state was tracked using only the processor identifier, bucket name, and object key. When two FlowFiles with the same name (and therefore the same object key) were uploaded concurrently to the same bucket, they shared one state tracking key. Parts from the two uploads were interleaved, producing a corrupt S3 object.

Fix:

The FlowFile UUID is now included in the state tracking key, so each FlowFile maintains its own independent multipart upload state. A retry of the same FlowFile keeps the same UUID and can still resume from the saved state. A FlowFile with a new UUID starts a fresh upload rather than inheriting stale state.
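
To illustrate the change in key shape, here is a minimal sketch. It is not the code from this PR: the class name, the helper names `legacyKey` and `uuidKey`, the `/` separator, and the in-memory `Map` standing in for the processor's state storage are all assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; none of these names are taken from the NiFi source.
class MultipartStateKeySketch {

    // Key shape described as the previous behavior: processor id, bucket, object key.
    static String legacyKey(String processorId, String bucket, String objectKey) {
        return processorId + "/" + bucket + "/" + objectKey;
    }

    // Key shape described as the fix: the FlowFile UUID is appended (separator assumed).
    static String uuidKey(String processorId, String bucket, String objectKey, String flowFileUuid) {
        return legacyKey(processorId, bucket, objectKey) + "/" + flowFileUuid;
    }

    public static void main(String[] args) {
        String processorId = "proc-1";
        String bucket = "my-bucket";
        String objectKey = "report.csv";
        String uuidA = "11111111-1111-1111-1111-111111111111";
        String uuidB = "22222222-2222-2222-2222-222222222222";

        // Old scheme: both FlowFiles map to one entry, so the second overwrites the first.
        Map<String, String> legacyState = new HashMap<>();
        legacyState.put(legacyKey(processorId, bucket, objectKey), "upload-A");
        legacyState.put(legacyKey(processorId, bucket, objectKey), "upload-B");
        System.out.println(legacyState.size()); // prints 1

        // New scheme: each FlowFile UUID gets its own entry.
        Map<String, String> fixedState = new HashMap<>();
        fixedState.put(uuidKey(processorId, bucket, objectKey, uuidA), "upload-A");
        fixedState.put(uuidKey(processorId, bucket, objectKey, uuidB), "upload-B");
        System.out.println(fixedState.size()); // prints 2

        // A retry of FlowFile A rebuilds the same key and finds its own state again.
        System.out.println(fixedState.get(uuidKey(processorId, bucket, objectKey, uuidA))); // prints upload-A
    }
}
```

In this sketch the retry case works because the key is a pure function of the processor identifier, bucket, object key, and FlowFile UUID, so the same FlowFile always resolves to the same entry.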

Summary

NIFI-15969

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull request contains commits signed with a registered key indicating Verified status

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using ./mvnw clean install -P contrib-check
    • JDK 21
    • JDK 25

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@turcsanyip self-requested a review May 26, 2026 10:11
Contributor

@turcsanyip left a comment

The fix is straightforward.

+1 merging

Thanks @rakesh-rsky!
