NIFI-2892 Implement AWS Kinesis Stream Get Processor#4822
NIFI-2892 Implement AWS Kinesis Stream Get Processor#4822ChrisSamo632 wants to merge 15 commits intoapache:mainfrom
Conversation
34acc84 to
a6dccab
Compare
50df03c to
bf94778
Compare
|
Thanks! Looking forward to this piece going out |
5d26754 to
8231dc3
Compare
...processors/src/main/java/org/apache/nifi/processors/aws/kinesis/stream/GetKinesisStream.java
Outdated
Show resolved
Hide resolved
8231dc3 to
5e467ef
Compare
|
Hi ! I'm interested in testing this out. |
@auyer I think you'd need to download the source from my branch, build it locally and then copy the new NARs into your image in the So, something like: # clone the code
git clone https://github.com/ChrisSamo632/nifi.git
git checkout NIFI-2892
# build the code
cd nifi/nifi-nar-bundles/nifi-aws-bundle
mvn -T 2.0C clean install
# get the NARs
find . -name "*.nar" -exec {} /tmpCopy these into a custom Docker Image with something like the following: FROM apache/nifi:1.13.0
RUN rm lib/nifi-aws*.nar
COPY [ "*.nar", "lib/" ]Then run the custom image in the way you normally would. Note that I've not tested the above, so you may need to correct the instructions as you go - when developing NiFi, I just build from source and run natively, I've not got to the point of trying to copy the into a Docker container yet although that is how I run NiFi normally. |
|
I've been testing this for a few days, and had no issues so far. I should say that I'm not running NiFi in production yet, but plan to do so soon, and this feature helps a lot ! Note: I will remove this repo after this PR gets into the next release. |
|
@ChrisSamo632 Any timelines for merging/releasing it out? |
@bhaveshpatelh it's ready to go as far as I'm concerned, but it needs someone to review and then a committer to merge it (there's a big backlog of PRs, I've no idea when anyone will get to this... much as I'd like it in too because we want to use it in production) |
|
@ChrisSamo632 - out of curiosity, did you consider the addition of a record reader/writer like with ConsumeKafkaRecord processors? At the moment, as far as I can tell, we would have one record per flow file. Using the records abstraction would provide options such as schema validation, format transformation, as well as having multiple records in one flow file (which is greatly improving the performances in case there is a high number of messages per second). |
@pvillard31 think I considered this in our original Slack conversation (or was that with @bbende?), but thought I'd leave out record writers at the minute for simplicity and to better understand how the Kinesis message (vs. Record) structure works With the KCL worker and multi-threaded consumer approach taken by the KCL library, we'd need a way of combining the records in the processor too (how do we combine records from different consumers? How about where the consumers are reading from different shards and/or multiple consumers from the same shard?) So I thought a record writer may be a sensible extension (fully agree it would be good to include from a nifi perspective) once there's more understanding of how people use the processor and how it works with different kinesis setups (so far I've only really tested it with simple streams) |
|
@ChrisSamo632 Thanks for picking up this task and implementing this non-trivial processor! I did not review the code in detail (yet) but spotted some issues regarding thread handling: The I believe one Worker per processor should be enough and it is not necessary to maintain a pool of workers. A single Worker can run multiple threads for executing the Kinesis RecordProcessors. As far as I saw, the Worker spins up a thread for each assigned shard by default (RecordProcessor-xxxx threads). So the parallel processing is provided with one Worker too. The code is simpler in this way and there is less overhead at runtime (each Worker has its own "maintenance" threads like LeaseRenewer-xxxx, LeaseCoordinator-xxxx). The processor cannot stop cleanly due to a bug in KCL. Interestingly, it has just been reported to AWS by someone else: awslabs/amazon-kinesis-client#796 And an idea: the way the processor receives data via KCL is quite similar to other Consume*** processors (like the already mentioned |
|
@turcsanyip thanks for the comments. I'd been a bit confused/concerned about the thread handling and tried (but failed) to find a suitable example, what you suggest looks like a sensible approach on first inspection - I'll try to make the changes sooner than later Good spot with the KCL issue, will try the downgrade as you suggest (unless a fix comes from AWS in the meantime) @pvillard31 I'll look again at using Record Reader/Writer at the same time in a fashion similar to the processors suggested above (although I may still suggest it as a future improvement ticket) |
|
@turcsanyip started to have a look at making these changes. Using KCL 1.13.3 instead of 1.14.x appears to be straight forward (i.e. no big API changes). The refactoring to use a single Worker per processor and yield once the Worker has been setup looks fairly straight forward and I agree a simpler implementation except that if I am to follow the Is there a straight forward way around this do you think? OIne concern I had with the original implementation was the fact that I was holding on to a single I'll look again at this further, but any guidance is welcome! @pvillard31 having looked again at the Record processing now I've the Azure processor to compare with, I think includiung them here should be fairly straight forward (famous last words) EDIT: the Session Factory problem can be sorted by changing the base abstract AWS processor to extend |
5a063c6 to
58d89e8
Compare
|
@turcsanyip updated per your suggestions. Note that this includes:
|
|
@ChrisSamo632 When I tried following commands to build the processor, it's throwing an error. |
|
@bhaveshpatelh the The easiest (but probably not fastest) thing is probably to just build the whole of nifi. For a faster build, you could try: cd nifi-api
mvn clean install
cd ../nifi-nar-bundles/nifi-aws-bundle
mvn -T 2.0C clean install(Note: I've not tried this, so no guarantees it will work) |
...essors/src/main/java/org/apache/nifi/processors/aws/kinesis/stream/ConsumeKinesisStream.java
Show resolved
Hide resolved
|
Hey @ChrisSamo632 - It seems there are some build issues with the unit tests. |
@turcsanyip I'd noticed ZOMBIE logs a few times and thought it seemed a "normal" thing for KCL (from what I could tell, but note I'm hardly a Kinesis/KCL expert). Taking a look around on the internet, there are many threads about people seeing this kind of behaviour in many different versions of the KCL, for example:
A common thing between these seems to be that the KCL settings likely need tweaking depending upon one's setup. With the ability to configure most KCL settings via Dynamic Properties on the processor, it's largely open for users to figure this out I guess depending upon their setup. That said, a common theme seems to be the need to increase the |
...essors/src/main/java/org/apache/nifi/processors/aws/kinesis/stream/ConsumeKinesisStream.java
Outdated
Show resolved
Hide resolved
...essors/src/main/java/org/apache/nifi/processors/aws/kinesis/stream/ConsumeKinesisStream.java
Outdated
Show resolved
Hide resolved
...essors/src/main/java/org/apache/nifi/processors/aws/kinesis/stream/ConsumeKinesisStream.java
Outdated
Show resolved
Hide resolved
...essors/src/main/java/org/apache/nifi/processors/aws/kinesis/stream/ConsumeKinesisStream.java
Outdated
Show resolved
Hide resolved
...essors/src/main/java/org/apache/nifi/processors/aws/kinesis/stream/ConsumeKinesisStream.java
Outdated
Show resolved
Hide resolved
...essors/src/main/java/org/apache/nifi/processors/aws/kinesis/stream/ConsumeKinesisStream.java
Outdated
Show resolved
Hide resolved
...essors/src/main/java/org/apache/nifi/processors/aws/kinesis/stream/ConsumeKinesisStream.java
Outdated
Show resolved
Hide resolved
turcsanyip
left a comment
There was a problem hiding this comment.
@ChrisSamo632 NIFI-8431 (Redundant validation of Dynamic Properties) has been merged. Could you please rebase your PR onto main and update AbstractConfigurableComponent according to that?
Please also find my comments regarding tests.
...org/apache/nifi/processors/aws/kinesis/stream/record/TestAbstractKinesisRecordProcessor.java
Outdated
Show resolved
Hide resolved
...org/apache/nifi/processors/aws/kinesis/stream/record/TestAbstractKinesisRecordProcessor.java
Outdated
Show resolved
Hide resolved
@turcsanyip yep I'd planned to as soon as I could after the PR that replicated my fix was merged (earlier today). Will include a part of the next commit |
…tKinesisStream dynamic properties
…sisStream; improve Session and Thread handling when initialising Kinesis Client Library Worker and processing Kinesis Records; prevent double-validation of dynamic processors (after NIFI-8266)
Set allowableValues for ConsumeKinesisStream#REPORT_CLOUDWATCH_METRICS property
…deprecated properties; improve provenance reporting)
…nstead of milliseconds)
…fields to avoid storing them and instead pull them from teh Context as needed; additional logging on KCL shutdown)
…revert AbstractProcessor change
turcsanyip
left a comment
There was a problem hiding this comment.
@ChrisSamo632 I found an error case that should be handled in this round in my opinion. If the configured Kinesis Stream does not exist (which is quite a typical user error), then it cannot be seen on the UI. The Kinesis Worker logs error messages but the processor looks fine. After a while, the worker stops but still nothing on the UI.
As far as I see, there is no straightforward way to handle worker errors (eg. pass exception handler in to the worker). WorkerStateChangeListener may be used to monitor worker state changes and log error on the processor when the worker stopped for some reason.
...essors/src/main/java/org/apache/nifi/processors/aws/kinesis/stream/ConsumeKinesisStream.java
Show resolved
Hide resolved
|
@turcsanyip I've implemented a check in Unit test added to show the behaviour (happily the unit tests don't connect to a Kinesis/DynamoDB instance so the Worker always fails eventually, now we throw a Newer KCL versions allow for better handling of this scenario, but we're limited to when we can do in KCL 1.13.3 (see NIFI-8531 for upgrading to KCL 2.x and a note about improving this scenario). |
…t Library Worker transitions to a SHUT_DOWN state without the processor being shutdown first
...essors/src/main/java/org/apache/nifi/processors/aws/kinesis/stream/ConsumeKinesisStream.java
Show resolved
Hide resolved
…isStream with WorkerState has unexpectedly SHUT_DOWN
turcsanyip
left a comment
There was a problem hiding this comment.
+1 LGTM
Merging to main.
@udaykale Thanks for the initial commit / PR.
@ChrisSamo632 Thanks for the further improvements and getting it across the finish line.
|
Just wondering, which version of KCL has this fix or do we still need to use KCL 1.13.3? |
@anekar3416 the KCL version was upgraded as part of NIFI-9520 in PR #5632, released in NiFi 1.16.0. The version of KCL currently used (after the linked ticket & PR) is 1.14.7 - upgrade to KCL 2.x requires larger changes to NiFi first in the future, e.g. NIFI-8287 |
This closes apache#4822. Co-authored-by: uday <udaygkale@gmail.com> Signed-off-by: Peter Turcsanyi <turcsanyi@apache.org>
Thank you for submitting a contribution to Apache NiFi.
Please provide a short description of the PR here:
Description of PR
Enables fetching data from AWS Kinesis Data Streams.
Builds upon the original PR from @udaykale (and comments there by @turcsanyip and @jaynpearl).
Based largely upon the AWS Kinesis Consumer example classes.
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically
main)?Is your initial contribution a single, squashed commit? Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not
squashor use--forcewhen pushing to allow for clean monitoring of changes.For code changes:
mvn -Pcontrib-check clean installat the rootnififolder?[ ] If applicable, have you updated theLICENSEfile, including the mainLICENSEfile undernifi-assembly?NOTICEfile, including the mainNOTICEfile found undernifi-assembly?.displayNamein addition to .name (programmatic access) for each of the new properties?For documentation related changes:
[ ] Have you ensured that format looks appropriate for the output in which it is rendered?Note:
Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.