NIFI-9972 Put blob can now pull from another blob source #6765
malthe wants to merge 1 commit into apache:main
Force-push: bbfa8d7 to 3ce9453
exceptionfactory left a comment
Thanks for proposing this improvement @malthe. The feature of copying from one Blob location to another makes sense, but it raises the question as to whether it should be implemented in the Put Processor. The introduction of additional credentials handling for the source location adds a good bit of code that is specific to the copy operation. What do you think about implementing this capability in a new Processor named something like CopyAzureBlobStorage?
@exceptionfactory I thought about it, and my justification for keeping it in the Put Processor is that it uses the same REST endpoint that Azure provides. The copy source is implemented using special headers on the […]. Of course, that's not to say that we shouldn't have a CopyAzureBlobStorage, but I think it's a reasonable justification, to be honest.
Thanks for the reply @malthe. Although there is some value in co-locating functionality given the same REST endpoint, from a usability and maintenance perspective, a separate Processor might be better. On the other hand, I can see where keeping the implementation in this Processor makes some things easier. The additional authentication handling code for the source warrants some further review. As @turcsanyip has some experience with these Azure Blob Processors, perhaps he has some additional thoughts about the best approach to support this capability.
Force-push: 3ce9453 to 662607b
@exceptionfactory bump?
Thanks for the reminder @malthe. With focus on the NiFi 1.20 release, I was waiting for some additional input from @turcsanyip. Considering the proposed changes from another angle, I'm also wondering about the usability of the copy-from-url feature. Although the end result is putting a blob in the destination, the copy effectively ignores the input FlowFile content, if I am following the implementation. For a new user, this may not be the most intuitive approach if the desired capability is to copy from one URL to another. Although the destination URL might be the same in the case of using FlowFile content, the expected flow design would be different. It can be challenging to decide when it makes sense to bundle functionality in one Processor versus creating a new one, but at this point, a new Processor still seems like a better approach. Do you have any additional thoughts along these lines?
@malthe Sorry for my late response here.
Thanks for the additional perspective @turcsanyip! Based on the feedback, I am closing this pull request. @malthe, if you are interested in working on this feature, a new pull request with a Processor named something along the lines of
Summary
NIFI-9972
This adds support for the "Put Blob from URL" operation which provides service-to-service copying of blobs – the client is only responsible for the orchestration which copies individual blocks, not for transferring the data in those blocks.
The functionality is added as an optional new data source of the PutAzureBlobStorage_v12 processor.
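The block-level orchestration mentioned above can be illustrated with a small planning step. The following is a minimal sketch, not the code from this pull request: it assumes the copy is carried out one block at a time (as with Azure's Put Block From URL operation, where the source range is passed in the x-ms-source-range header) and then committed with a block list. The class BlockCopyPlan, its nested Block type, and the block ID naming scheme are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of NiFi or the Azure SDK.
final class BlockCopyPlan {

    /** One block to be copied service-side: its ID and the source byte range. */
    static final class Block {
        final String blockId;     // Base64-encoded; every ID has the same length
        final String sourceRange; // assumed to be sent as the x-ms-source-range header value

        Block(String blockId, String sourceRange) {
            this.blockId = blockId;
            this.sourceRange = sourceRange;
        }
    }

    /** Splits a source blob of blobSize bytes into ranges of at most blockSize bytes. */
    static List<Block> plan(long blobSize, long blockSize) {
        if (blobSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("blobSize must be >= 0 and blockSize > 0");
        }
        List<Block> blocks = new ArrayList<>();
        int index = 0;
        for (long start = 0; start < blobSize; start += blockSize) {
            // Range ends are inclusive, so the last byte of a block is (next start - 1).
            long end = Math.min(start + blockSize, blobSize) - 1;
            // Fixed-width raw IDs keep all encoded block IDs the same length.
            String rawId = String.format(Locale.ROOT, "block-%08d", index++);
            String blockId = Base64.getEncoder()
                    .encodeToString(rawId.getBytes(StandardCharsets.UTF_8));
            blocks.add(new Block(blockId, "bytes=" + start + "-" + end));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // The client only plans the ranges; the data itself would move between services.
        for (Block block : plan(10, 4)) {
            System.out.println(block.blockId + " " + block.sourceRange);
        }
    }
}
```

In this sketch the client never reads the blob content; it only decides which ranges to request, which matches the description of the client being responsible for orchestration only.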
The test case is not comprehensive in the sense that only the built-in credential is tested (account key). Meanwhile, all credentials are indeed supported (defined by the AzureStorageCredentialsType enum).
The authentication logic uses either a SAS token (provided directly or derived using an account key) or OAuth2 (which covers the remaining cases).
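The two-way split between SAS and OAuth2 can be sketched as a simple mapping. This is a hedged illustration, not the pull request's implementation: the CredentialsType enum below is a local stand-in whose values are assumed rather than copied from AzureStorageCredentialsType, and the class SourceAuthSketch and its methods are hypothetical. It also assumes that a SAS token is appended to the source URL as a query string, while an OAuth2 token would travel separately in a request header.

```java
// Hypothetical sketch; the enums are local stand-ins, not NiFi's actual types.
final class SourceAuthSketch {

    /** Assumed credential kinds, modeled loosely on the description above. */
    enum CredentialsType { ACCOUNT_KEY, SAS_TOKEN, MANAGED_IDENTITY, SERVICE_PRINCIPAL, ACCESS_TOKEN }

    /** The two ways the copy source is authenticated. */
    enum SourceAuth { SAS, OAUTH2 }

    static SourceAuth resolve(CredentialsType type) {
        switch (type) {
            case SAS_TOKEN:   // token provided directly
            case ACCOUNT_KEY: // token derived from the account key
                return SourceAuth.SAS;
            default:          // all remaining cases
                return SourceAuth.OAUTH2;
        }
    }

    /** Builds the copy-source URL; only SAS changes the URL itself. */
    static String sourceUrl(String blobUrl, SourceAuth auth, String sasToken) {
        if (auth != SourceAuth.SAS) {
            return blobUrl; // an OAuth2 bearer token would be sent in a header instead
        }
        // Accept tokens with or without a leading '?'.
        String query = sasToken.startsWith("?") ? sasToken.substring(1) : sasToken;
        return blobUrl + "?" + query;
    }

    public static void main(String[] args) {
        SourceAuth auth = resolve(CredentialsType.ACCOUNT_KEY);
        System.out.println(sourceUrl("https://example.invalid/container/blob", auth, "sv=x&sig=y"));
    }
}
```

Keeping the decision in one place like this makes it easier to test each credential kind individually, which the description notes is not yet covered beyond the account key.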
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000
NIFI-00000
Pull Request Formatting
main branch
Verification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-check
Licensing
LICENSE and NOTICE files
Documentation