NIFI-10710 implement processor for AWS Polly, Textract, Translate, Tr…#6589
KalmanJantner wants to merge 16 commits into apache:main
exceptionfactory
left a comment
Thanks for the contribution @KalmanJantner, these look like useful new components!
The class files are missing the Apache license header, so that is one general adjustment to make. I recommend running a local Maven build with the -P contrib-check option to catch formatting issues.
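For reference, the header in question is the standard ASF source header, placed at the very top of each Java file. A minimal sketch follows; the class is a hypothetical placeholder, and the package declaration is left out only so the snippet compiles on its own.

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// Hypothetical placeholder class; only the header above is the point of this sketch.
class LicensedExample {
    static String name() {
        return "LicensedExample";
    }
}
```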
At a high level, this looks like an opportunity to start with version 2 of the AWS SDK, instead of using the existing version 1.
...ndle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/ml/AwsMlProcessor.java
...cessors/src/main/java/org/apache/nifi/processors/aws/ml/AwsResponseMetadataDeserializer.java
markap14
left a comment
Thanks for putting up the Pull Request @KalmanJantner! I think these will be really helpful processors. But I think there are a lot of conventions that we need to make sure we are following here: Processor names, descriptions, documentation, best practices, etc. And given @exceptionfactory's feedback that we're using the 1.x SDK instead of the 2.x SDK, I think we need to make sure that these get addressed.
...fi-aws-processors/src/main/java/org/apache/nifi/processors/aws/ml/AwsMLFetcherProcessor.java
...ndle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/ml/AwsMlProcessor.java
...ifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/ml/polly/PollyProcessor.java
...aws-processors/src/main/java/org/apache/nifi/processors/aws/ml/textract/TextractFetcher.java
exceptionfactory
left a comment
Thanks for the updates @KalmanJantner. I noted a number of additional suggestions, and several files are still missing license headers.
...ifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/ml/AwsMLJobStatusGetter.java
...i-aws-processors/src/main/java/org/apache/nifi/processors/aws/ml/polly/StartAwsPollyJob.java
...rs/src/main/java/org/apache/nifi/processors/aws/ml/transcribe/GetAwsTranscribeJobStatus.java
...-processors/src/test/java/org/apache/nifi/processors/aws/ml/polly/GetAwsPollyStatusTest.java
...essors/src/test/java/org/apache/nifi/processors/aws/ml/polly/MockAwsCredentialsProvider.java
...processors/src/test/java/org/apache/nifi/processors/aws/ml/textract/TextractFetcherTest.java
exceptionfactory
left a comment
Thanks for continuing to work on these new Processors @KalmanJantner. I noted a number of adjustments, mostly minor changes.
...-processors/src/test/java/org/apache/nifi/processors/aws/ml/polly/GetAwsPollyStatusTest.java
KalmanJantner
left a comment
@exceptionfactory Thank you for the feedback. I implemented the changes based on your suggestions.
...s-processors/src/main/java/org/apache/nifi/processors/aws/ml/polly/GetAwsPollyJobStatus.java
...processors/src/main/java/org/apache/nifi/processors/aws/ml/textract/StartAwsTextractJob.java
...-processors/src/test/java/org/apache/nifi/processors/aws/ml/polly/GetAwsPollyStatusTest.java
}

if (status == JobStatus.COMPLETED) {
...-processors/src/test/java/org/apache/nifi/processors/aws/ml/polly/GetAwsPollyStatusTest.java
@KalmanJantner, I will take a closer look at the latest changes soon. Please avoid introducing merge commits; instead, rebase and force-push changes to align with the main branch. Thanks!
Thanks @exceptionfactory, I have removed the merge commit.
exceptionfactory
left a comment
Thanks for the latest set of updates @KalmanJantner. I noted a number of minor stylistic adjustments, but this looks to be nearing completion.
Tagging @markap14 for additional review.
...processors/src/main/java/org/apache/nifi/processors/aws/ml/AwsMachineLearningJobStarter.java
...rs/src/main/java/org/apache/nifi/processors/aws/ml/AwsMachineLearningJobStatusProcessor.java
...rc/test/java/org/apache/nifi/processors/aws/ml/transcribe/GetAwsTranscribeJobStatusTest.java
.../src/test/java/org/apache/nifi/processors/aws/ml/translate/GetAwsTranslateJobStatusTest.java
pvillard31
left a comment
Playing with this PR. Some remarks:
- Set mime.type to application/json when we know that the generated FlowFile is going to contain a JSON payload (for example: StartAwsTextractJob, GetAwsTextractJobStatus, etc.).
- Please set a description on all properties and enum values where it makes sense. We should not assume the user knows the difference between Text Analysis and Document Text Detection, for example (in StartAwsTextractJob).
- Please add additionalDetails with examples of expected JSON payloads and links to AWS documentation where that makes sense (example: StartAwsTextractJob).
- The type of extract in GetAwsTextractJobStatus is a string and not an enum. If that's because we want to be able to reference a FlowFile attribute, then we should directly reference the FlowFile attribute that we created in StartAwsTextractJob, in this case ${type-of-service}. However, when trying to use EL, it fails because it's not one of the allowable values. Something to fix here.
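One way to address the last point could be a custom check that accepts either a known literal or an Expression Language reference, deferring the latter until it is evaluated per FlowFile. The following is a dependency-free sketch, not the PR's code: the class and method names are hypothetical, the two literal values are simply the ones mentioned above, and the `${...}` detection is a deliberately crude assumption rather than NiFi's actual Expression Language parsing.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of NiFi or of this pull request.
final class TextractTypeCheck {

    // Assumed literal values, taken from the names used in the review comment.
    private static final Set<String> KNOWN_TYPES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("Text Analysis", "Document Text Detection")));

    private TextractTypeCheck() {
    }

    /** Returns true for a known literal or for a value shaped like ${attribute-name}. */
    static boolean isValid(String value) {
        if (value == null) {
            return false;
        }
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        // Treat ${...} as an expression whose real value is only known per FlowFile.
        if (trimmed.length() > 3 && trimmed.startsWith("${") && trimmed.endsWith("}")) {
            return true;
        }
        return KNOWN_TYPES.contains(trimmed);
    }
}
```

With a check like this, `${type-of-service}` passes at configuration time, and the resolved value would still need to be checked against the known types when the FlowFile is processed.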
Note: I do see the additionalDetails page on the pull request, but it's not showing in the usage page of the processor. Maybe something is wrong somewhere when building the documentation?
Thank you for the comments. I went through and addressed them.
exceptionfactory
left a comment
Thanks for the latest round of updates @KalmanJantner.
Following some runtime testing, I noticed a few additional issues. For status retrieval Processors, it seems like it would be helpful to add a new Task ID property that defaults to reading the value from a FlowFile attribute. As implemented right now, the FlowFile attribute is hard-coded, and not reflected in documentation. Although the documentation could be updated with the addition of ReadsAttribute annotations on the Processors, introducing a new Property with default value would avoid the implicit attribute handling while supporting the same basic behavior.
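To illustrate the suggestion, here is a dependency-free sketch of a property whose default value is a reference to a FlowFile attribute, so that an explicit value can override it while the default keeps the current behavior. The attribute name `awsTaskId`, the class, and the simplified `${name}` substitution are all assumptions for illustration; in a real Processor this would presumably be a PropertyDescriptor with Expression Language support evaluated against the FlowFile.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of a "Task ID" property; not NiFi's Expression Language implementation.
final class TaskIdProperty {

    // Assumed default: read the task ID from a FlowFile attribute named "awsTaskId".
    static final String DEFAULT_VALUE = "${awsTaskId}";

    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([^}]+)\\}");

    private TaskIdProperty() {
    }

    /** Replaces each ${name} in the property value with the matching attribute, or "" if absent. */
    static String resolve(String propertyValue, Map<String, String> attributes) {
        Matcher matcher = REFERENCE.matcher(propertyValue);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            String replacement = attributes.getOrDefault(matcher.group(1), "");
            // quoteReplacement keeps '$' and '\' in attribute values from being read as group references.
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }
}
```

The benefit of this shape is that the attribute dependency becomes visible and configurable in the property itself instead of being hard-coded.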
...essors/src/main/java/org/apache/nifi/processors/aws/ml/textract/GetAwsTextractJobStatus.java
...processors/src/main/java/org/apache/nifi/processors/aws/ml/AwsMachineLearningJobStarter.java
…stom textract validation
exceptionfactory
left a comment
Thanks for the latest round of updates @KalmanJantner. While testing some error conditions, it looks like there is an issue with handling FlowFiles in at least some of the Start Processors. Attempting to process a FlowFile without having valid AWS credentials results in the following uncaught exception.
2023-01-23 10:33:23,888 WARN [Timer-Driven Process Thread-3] o.a.n.controller.tasks.ConnectableTask Processing halted: uncaught exception in Component [StartAwsTextractJob[id=97c0dc20-0185-1000-d3db-4140509dc964]]
org.apache.nifi.processor.exception.FlowFileHandlingException: StandardFlowFileRecord[uuid=24bdcb1d-49f6-45ed-81cf-db6fb433afba,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1674491600189-1, container=default, section=1], offset=0, length=2],offset=0,name=24bdcb1d-49f6-45ed-81cf-db6fb433afba,size=2] transfer relationship not specified. This FlowFile was not created in this session and was not transferred to any Relationship via ProcessSession.transfer()
at org.apache.nifi.controller.repository.StandardProcessSession.validateCommitState(StandardProcessSession.java:262)
at org.apache.nifi.controller.repository.StandardProcessSession.checkpoint(StandardProcessSession.java:277)
at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:559)
at org.apache.nifi.controller.repository.StandardProcessSession.commitAsync(StandardProcessSession.java:513)
at org.apache.nifi.processors.aws.AbstractAWSProcessor.onTrigger(AbstractAWSProcessor.java:303)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1356)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:246)
at org.apache.nifi.controller.scheduling.AbstractTimeBasedSchedulingAgent.lambda$doScheduleOnce$0(AbstractTimeBasedSchedulingAgent.java:59)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
...processors/src/main/java/org/apache/nifi/processors/aws/ml/AwsMachineLearningJobStarter.java
@KalmanJantner Following additional testing, I realized the uncaught exception was the result of running with a previous build. The latest commit appears to have resolved that problem and handles failures as expected through the failure relationship.
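For context on the trace above: the framework reports FlowFileHandlingException when a FlowFile taken from the session has not been transferred to any Relationship by the time the session commits. A minimal sketch of the pattern that avoids it follows, where every code path ends in exactly one routing decision. It is dependency-free and hypothetical: `Route`, `JobStartRouter`, and the `startJob` callback stand in for the NiFi relationships and the AWS client call, and `SUCCESS` is an assumed name.

```java
import java.util.function.Function;

// Dependency-free sketch; all names are hypothetical stand-ins, not NiFi or AWS SDK types.
final class JobStartRouter {

    enum Route { SUCCESS, FAILURE }

    private JobStartRouter() {
    }

    /**
     * Calls the supplied job starter and always returns a routing decision,
     * so a failed call (for example, invalid credentials) maps to FAILURE
     * instead of leaving the input without a destination.
     */
    static Route route(String input, Function<String, String> startJob) {
        try {
            startJob.apply(input);
            return Route.SUCCESS;
        } catch (RuntimeException e) {
            // A real Processor would log the error here before routing to failure.
            return Route.FAILURE;
        }
    }
}
```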
exceptionfactory
left a comment
Thanks for working through the feedback @KalmanJantner! The latest version looks like it covers previous code comments I had noted.
Do you have any additional feedback @pvillard31?
...processors/src/main/java/org/apache/nifi/processors/aws/ml/AwsMachineLearningJobStarter.java
...s-processors/src/main/java/org/apache/nifi/processors/aws/ml/polly/GetAwsPollyJobStatus.java
...essors/src/main/java/org/apache/nifi/processors/aws/ml/textract/GetAwsTextractJobStatus.java
...rs/src/main/java/org/apache/nifi/processors/aws/ml/transcribe/GetAwsTranscribeJobStatus.java
…java/org/apache/nifi/processors/aws/ml/AwsMachineLearningJobStarter.java Co-authored-by: exceptionfactory <exceptionfactory@apache.org>
…java/org/apache/nifi/processors/aws/ml/polly/GetAwsPollyJobStatus.java Co-authored-by: exceptionfactory <exceptionfactory@apache.org>
…java/org/apache/nifi/processors/aws/ml/transcribe/GetAwsTranscribeJobStatus.java Co-authored-by: exceptionfactory <exceptionfactory@apache.org>
…java/org/apache/nifi/processors/aws/ml/textract/GetAwsTextractJobStatus.java Co-authored-by: exceptionfactory <exceptionfactory@apache.org>
…java/org/apache/nifi/processors/aws/ml/polly/GetAwsPollyJobStatus.java Co-authored-by: exceptionfactory <exceptionfactory@apache.org>
…java/org/apache/nifi/processors/aws/ml/AwsMachineLearningJobStarter.java Co-authored-by: exceptionfactory <exceptionfactory@apache.org>
exceptionfactory
left a comment
Thanks for working through all of the feedback @KalmanJantner, and thanks for the review confirmation @pvillard31, the latest version looks good! +1 merging
…anscribe
Summary
NIFI-10710
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000
NIFI-00000
Pull Request Formatting
main branch
Verification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-check
Licensing
LICENSE and NOTICE files
Documentation