Skip to content

NIFI-9972 Put blob can now pull from another blob source#6765

Closed
malthe wants to merge 1 commit intoapache:mainfrom
malthe:issue-9972-put-blob-from-url-v2
Closed

NIFI-9972 Put blob can now pull from another blob source#6765
malthe wants to merge 1 commit intoapache:mainfrom
malthe:issue-9972-put-blob-from-url-v2

Conversation

@malthe
Copy link
Contributor

@malthe malthe commented Dec 7, 2022

Summary

NIFI-9972

This adds support for the "Put Blob from URL" operation which provides service-to-service copying of blobs – the client is only responsible for the orchestration which copies individual blocks, not for transferring the data in those blocks.

The functionality is added as an optional new data source of the PutAzureBlobStorage_v12 processor.

The test case is not comprehensive in the sense that only the built-in credential is tested (account key). Meanwhile, all credentials are indeed supported (defined by the AzureStorageCredentialsType enum):

  • Account key
  • SAS token
  • Access token
  • Managed identity
  • Service principal

The logic of the authentication code is that it's a SAS token (provided directly or derived using an account key) or it's based on OAuth2 (which captures the remaining cases).

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 8
    • JDK 11
    • JDK 17

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@malthe malthe force-pushed the issue-9972-put-blob-from-url-v2 branch 2 times, most recently from bbfa8d7 to 3ce9453 Compare December 7, 2022 19:02
Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for proposing this improvement @malthe. The feature of copying from one Blob location to another makes sense, but it raises the question as to whether it should be implemented in the Put Processor. The introduction of additional credentials handling for the source location adds a good bit of code that is specific to the copy operation. What do you think about implementing this capability in a new Processor named something like CopyAzureBlobStorage?

@malthe
Copy link
Contributor Author

malthe commented Jan 25, 2023

@exceptionfactory I thought about it and the justification for me to have it in the Put Processor is that it's the same REST endpoint that Azure provides. The copy source is implemented using special headers on the PUT request.

Of course, that's not to say that we shouldn't have a CopyAzureBlobStorage but I think it's a reasonable justification to be honest.

@exceptionfactory
Copy link
Contributor

Thanks for the reply @malthe. Although there is some value in co-locating functionality given the same REST endpoint, from a usability and maintenance perspective, a separate Processor might be better. On the other hand, I can see where keeping the implementation in this Processor makes some things easier. The additional authentication handling code for the source warrants some further review.

As @turcsanyip has some experience with these Azure Blob Processors, perhaps he has some additional thoughts about the best approach to support this capability.

@malthe malthe force-pushed the issue-9972-put-blob-from-url-v2 branch from 3ce9453 to 662607b Compare January 27, 2023 10:39
@malthe
Copy link
Contributor Author

malthe commented Feb 9, 2023

@exceptionfactory bump?

@exceptionfactory
Copy link
Contributor

Thanks for the reminder @malthe. With focus on the NiFi 1.20 release, I was waiting for some additional input from @turcsanyip.

Considering the proposed changes from another perspective, I'm also wondering about the copy-from-url feature from a usability perspective. Although the end result is putting a blob in the destination, the copy effectively ignores the input FlowFile content if I am following the implementation. For a new user, this may not be the most intuitive approach if the desired capability is to copy from one URL to another. Although the destination URL might be the same in the case of using FlowFile content, the expected flow design would be different. It can be challenging to decide when it makes sense to bundle functionality in one Processors, versus creating a new one, but at this point, a new Processor still seems like a better approach. Do you have any additional thoughts along these lines?

@turcsanyip
Copy link
Contributor

@malthe Sorry for my late response here.
I agree with @exceptionfactory that this "copy from url" functionality should rather go into its own processor.
The general contract of 'Put' processors is to upload the FlowFile content into the target external system. To keep the PutBlob processor aligned with this convention, I would not mix extra functionalities with the regular 'Put' role. This way also the configuration would be more straightforward for the users, I believe.
I understand that the same Azure endpoint is used and there are some common code parts but I think it could be handled via moving that code to abstract/util classes.

@exceptionfactory
Copy link
Contributor

Thanks for the additional perspective @turcsanyip!

Based on the feedback, I am closing this pull request.

@malthe, if you are interested in working on this feature, a new pull request with a Processor named something along the lines of CopyAzureBlobStorage sounds useful. Feel free to discuss any concerns or potential design approaches on the Jira issue. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants