-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NIFI-10442: Create PutIceberg processor #6368
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mark-bathori Thanks for implementing the Iceberg integration processor for NiFi! It looks interesting!
I managed to set up the processor quite easily to load data into a simple table. So the configuration seems to be straightforward. However, I found some improvement points like using Title Case for properties, fixing typos in descriptions, etc.
I also looked into the poms and added a couple of comments about provided vs compile dependencies, scope inherited from dependency management and also found some unused or duplicated dependencies. Pls. see my comments below.
Will continue with testing and reviewing advanced data conversions, data types, partitions, etc
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Outdated
Show resolved
Hide resolved
...eberg-services-api/src/main/java/org/apache/nifi/services/iceberg/IcebergCatalogService.java
Outdated
Show resolved
Hide resolved
...fi-iceberg-services/src/main/java/org/apache/nifi/services/iceberg/HadoopCatalogService.java
Outdated
Show resolved
Hide resolved
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Outdated
Show resolved
Hide resolved
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Outdated
Show resolved
Hide resolved
nifi-nar-bundles/nifi-iceberg-bundle/nifi-iceberg-processors/pom.xml
Outdated
Show resolved
Hide resolved
nifi-nar-bundles/nifi-iceberg-bundle/nifi-iceberg-processors/pom.xml
Outdated
Show resolved
Hide resolved
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Outdated
Show resolved
Hide resolved
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mark-bathori!
Thank you for working this feature out! I'm sure you put a lot of work into it!
To be honest, I had issues with the testing, but probably it is due to my lack of knowledge in the hive world. When I tried to put data into a simple table, as @turcsanyip mentions, I got an error that the table is not an iceberg table:
Component is invalid: 'Component' is invalid because Failed to perform validation due to org.apache.iceberg.exceptions.NoSuchIcebergTableException: Not an iceberg table: hive-catalog.nifitest.icebergtable (type=null)
When I tried to create an Iceberg table with CREATE TABLE nifitest.ib1 (id int, name string) STORED BY ICEBERG;
I got the following exception from Hive:
Error: Error while compiling statement: FAILED: ParseException line 1:65 mismatched input 'ICEBERG' expecting StringLiteral near 'BY' in table file format specification (state=42000,code=40000)
As I said, I think this is a configuration issue on my side, and I don't think it is related to the code. I will continue to figure that out. Nevertheless, I've found a few things in the code that I'd like to share with you below.
Update:
Using STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
instead of just STORED BY ICEBERG
solves my issue.
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Outdated
Show resolved
Hide resolved
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Show resolved
Hide resolved
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Outdated
Show resolved
Hide resolved
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Outdated
Show resolved
Hide resolved
...essors/src/test/java/org/apache/nifi/processors/iceberg/TestPutIcebergWithHadoopCatalog.java
Show resolved
Hide resolved
...rc/main/java/org/apache/nifi/processors/iceberg/appender/avro/AvroWithNifiSchemaVisitor.java
Outdated
Show resolved
Hide resolved
...-processors/src/main/java/org/apache/nifi/processors/iceberg/KerberosAwareBaseProcessor.java
Outdated
Show resolved
Hide resolved
...i-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/NifiRecordWrapper.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mark-bathori I added some more comments. Please also rebase your branch to main because there are merge conflicts between the branches.
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Outdated
Show resolved
Hide resolved
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Outdated
Show resolved
Hide resolved
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Outdated
Show resolved
Hide resolved
...rg-processors/src/main/java/org/apache/nifi/processors/iceberg/IcebergTaskWriterFactory.java
Outdated
Show resolved
Hide resolved
...-processors/src/main/java/org/apache/nifi/processors/iceberg/KerberosAwareBaseProcessor.java
Outdated
Show resolved
Hide resolved
Thanks @turcsanyip and @nandorsoma for the review comments. I tried to fix all of the mentioned issues in my current commit. I’ve also added a couple of new elements to my PR:
|
@mark-bathori Thanks for the latest changes! I set my review comments resolved and also tested the file format property and V1 tables. I looked into the data conversion code and now there are 3 separate implementations for Avro, Parquet and ORC conversions using the "low level" Iceberg API to write these data files. |
968cb29
to
34d974f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @mark-bathori! Thank you for the additional changes. I've found a few things in the new code and also commented on the outdated ones where I thought it would need your attention.
...ssors/src/main/java/org/apache/nifi/processors/iceberg/converter/IcebergRecordConverter.java
Outdated
Show resolved
Hide resolved
...ssors/src/main/java/org/apache/nifi/processors/iceberg/converter/IcebergRecordConverter.java
Outdated
Show resolved
Hide resolved
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Outdated
Show resolved
Hide resolved
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Outdated
Show resolved
Hide resolved
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Show resolved
Hide resolved
...essors/src/test/java/org/apache/nifi/processors/iceberg/TestPutIcebergWithHadoopCatalog.java
Show resolved
Hide resolved
...rc/main/java/org/apache/nifi/processors/iceberg/appender/avro/AvroWithNifiSchemaVisitor.java
Outdated
Show resolved
Hide resolved
...-processors/src/main/java/org/apache/nifi/processors/iceberg/KerberosAwareBaseProcessor.java
Outdated
Show resolved
Hide resolved
...rg-processors/src/main/java/org/apache/nifi/processors/iceberg/AbstractIcebergProcessor.java
Outdated
Show resolved
Hide resolved
...rg-processors/src/main/java/org/apache/nifi/processors/iceberg/AbstractIcebergProcessor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow I forgot to add this one to my previous set of review.
...dle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java
Outdated
Show resolved
Hide resolved
34d974f
to
e1553b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @mark-bathori!
Thanks for the additional changes. I've found a few minor issues, but I'm still uncertain about handling the primitive types in the IcebergRecordConverter
. Please see my inline comments.
...essors/src/main/java/org/apache/nifi/processors/iceberg/converter/GenericDataConverters.java
Outdated
Show resolved
Hide resolved
...rg-processors/src/main/java/org/apache/nifi/processors/iceberg/AbstractIcebergProcessor.java
Outdated
Show resolved
Hide resolved
...ssors/src/main/java/org/apache/nifi/processors/iceberg/converter/IcebergRecordConverter.java
Outdated
Show resolved
Hide resolved
90de976
to
a6039f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @mark-bathori! Pr looks good to me! Once CI jobs pass, I'm fine with merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mark-bathori Thanks for adding this new processor!
@nandorsoma Thanks for your review!
+1 LGTM
Meging to main.
This closes apache#6368. Signed-off-by: Peter Turcsanyi <turcsanyi@apache.org>
This closes apache#6368. Signed-off-by: Peter Turcsanyi <turcsanyi@apache.org>
This closes apache#6368. Signed-off-by: Peter Turcsanyi <turcsanyi@apache.org>
Hi Guys, thanks for this great feature. I've installed nifi 1.20.0 but not able to find this new process PutIceberg ? |
Those NARs are not part of the convenience binary due to ASF rules around the binary size. You can download NARs from Maven central repos and drop those in NiFi: |
Thank you very much, I was not aware of the NARs project. Is there any chance for this to be included in the upcoming versions by default? or It will stays in NARs. |
It'll very likely remain like this. Many components are just provided as NARs via Maven Central. We have to keep the binary size as small as possible. |
Hi @pvillard31, Thanks for your help!.
Files added to /srv/nifi/extensions:/opt/nifi/nifi-current/extensions: Nifi version:1.20
Do I need to add additional NAR file to work with s3 files system? Thanks and Best regards |
@AbdelrahimKA The base nar files don't contain any cloud specific dependency due their sizes. |
Hi, @AbdelrahimKA |
Hi, @MohamedAdelHsn Hope this helps you. |
@mark-bathori Could you please give more details about this? How can we build the NAR file from nifi-iceberg-bundle using include-hadoop-aws profiles. Thanks |
@quydx I managed to do so by cloning the nifi repo, then run the following maven build command mvn clean install -Pinclude-hadoop-aws -pl nifi-nar-bundles/nifi-iceberg-bundle/nifi-iceberg-processors-nar -am -T2C |
Summary
NIFI-10442
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000
NIFI-00000
Pull Request Formatting
main
branchVerification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-check
Licensing
LICENSE
andNOTICE
filesDocumentation