Skip to content

NIFI-13934 Remove Hive 3 Catalog from Iceberg Services#9452

Closed
exceptionfactory wants to merge 1 commit intoapache:mainfrom
exceptionfactory:NIFI-13934
Closed

NIFI-13934 Remove Hive 3 Catalog from Iceberg Services#9452
exceptionfactory wants to merge 1 commit intoapache:mainfrom
exceptionfactory:NIFI-13934

Conversation

@exceptionfactory
Copy link
Contributor

Summary

NIFI-13934 Removes the Hive 3 Catalog implementation from Iceberg services due to the Apache Hive project declaring Hive 3 end of life on 8 October 2024. Removing Hive 3 dependencies removes several flagged vulnerabilities associated with libthrift 0.9.3 and also avoids potential issues due to lack of future updates for Hive 3.

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

- Removed Iceberg Hive Metastore Catalog implementation due Hive 3 end of life declaration
- Removed associated dependency check suppressions no longer needed
@joewitt
Copy link
Contributor

joewitt commented Oct 26, 2024

That PR is a bit stunning in just how much nonsense gets deleted that was purely there to work around long standing reported vulns and pom wrangling and so on. Thanks for doing this. Will give it a look and verify results on the march to clean builds and vulnerability reports

@arpadboda
Copy link
Contributor

Sorry, but at first look I'm -1 to this PR:

First of all I would like to note that I completely agree with removing Hive 3 dependencies and removing these vulnerabilities, thank you David for bringing this topic up and initiating the discussion/changes!

My concerns lie with timing and complexity introduced by these changes.
By complexity I mean that this change actually remove a feature from a component, but leaves the component there, so whoever is willing to use it on his/her own risk needs to duplicate the whole thing with many potential conflicts later. Removing a NAR completely is fine, in case someone needs it later, just picks the last available version and it most probably works with latest NiFi. No vulnerabilities in NiFi, this is a win-win deal for everyone.
However this case is far more complex, leaving the NAR in NiFi but removing features used by members of the community, making their lives much more difficult.

By timining I mean:
-Do we want to have it in 2.0? As this is behind a profile, doesn't get built into NiFi unless the user explicitly wants it, no vulnerability is introduced by default, on the other hand we release 2.0 with half-functioning Iceberg services.
-Do we plan to merge it after 2.0? In this case we are not in a rush and we should do it properly (replacing the deprecated Hive 3 dependecy with Hive 4 libs, which should eliminate most of the security concerns raised here). I'm pretty sure some members of the community are happy to help in these activities, for eg @mark-bathori

@mark-bathori
Copy link
Contributor

+1 on @arpadboda's opinion, Hive 3 support should be removed but doing this right before the release and removing the hive support from Iceberg could lead to confusion.
Also I'm happy to work on the Hive4 upgrade and testing.

@exceptionfactory
Copy link
Contributor Author

Thanks for the feedback @joewitt, @arpadboda, and @mark-bathori.

Regarding removing a component versus removing a NAR, it is important to note that the HiveCatalogService is a separate Controller Service, distinct from the other implementations. Although the existing module layout placed this service together with the others, it could be repackaged in a separate NAR bundle for better isolation from the other implementations. From that perspective, I'm not quite following the concern regarding the complexity of removing a component versus removing a NAR.

Regarding the timing, I realize that Hive 4 is fairly new and that the decision to declare Hive 3 EOL is also fairly recent. As we have an opportunity to provide a cleaner baseline for NiFi 2.0, that is the reason for raising this right now. We could revisit this after the NiFi 2.0 release, and consider this an acceptable change given that Hive 3 is no longer receiving updates.

Anyone who needs Hive 3 Catalog support could continue to use the 2.0.0-M4 release version, with the understanding that it is not supported from either NiFi or Hive. I think there is actually more confusion from releasing the current version, because it implies support from the NiFi project, which we are not able to provide, given that Hive 3 is EOL.

@mark-bathori Do you have a rough idea of what would be involved in implementing support for Hive 4?

@arpadboda For clarification, if the current Hive 3 version of HiveCatalogService were in its own NAR, would you be in favor of removing it? I could see an intermediate change to first decouple that implementation to its own NAR, and then removing it if that is your primary concern.

@exceptionfactory
Copy link
Contributor Author

@arpadboda I should also note that the NiFi release is the source code, for which the project is responsible. Even though Iceberg support for nifi-assembly is behind a profile, the release process still publishes a NAR to Maven Central, releasing vulnerable and now unsupported components.

If I understand correctly, it sounds like you would be in favor of removal if this were in its own NAR? That sounds like a way forward, but it would be helpful if you can clarify.

@mark-bathori
Copy link
Contributor

@exceptionfactory Didn't dig too deep yet but the nifi-iceberg-test-utils will need some rework for sure because some of the classes used there were completle removed.

Originally the Service implementations were separate in the service NAR but they had to be moved due the Kerberos support (NIFI-11334). The problem is that for the processor instance isolation the Catalog implementation needs to be in the same NAR as the processor because the current implementation of @RequiresInstanceClassLoading doesn't create separated instances from other included NAR's. Based on these issues it is not an option currently to completely separate the Catalog logic from the processor NAR.

@exceptionfactory
Copy link
Contributor Author

Thanks for the additional background on the current structure @mark-bathori, I appreciate the challenges that the Kerberos support presents in terms of class loading.

In light of those issues, it sounds like more significant restructuring is necessary to provide the kind of decoupling required.

With general momentum around Polaris and REST Catalog capabilities, I expect more community interest in that direction, but of course there will still be different implementations in various places. Given the complexities that Kerberos support introduces, it seems like it may be necessary to decouple that, perhaps through a different PutIceberg Processor implementation. That should allow a different NAR hierarchy, supporting separate packaging. Do you have any additional thoughts along these lines? Aside from something like this, it seems that the only alternative is what I have proposed in terms of removing existing Hive 3 support, and then implementing Hive 4 support separately within the current structure.

@arpadboda
Copy link
Contributor

@exceptionfactory The decoupling you mention would be beneficial I agree, but that's far from the scope of this PR and I don't feel like anyone in the community has the capacity to do that in a very short term.

So in nutshell, here are the options I see:
A) The restructuring you mention to completely separate iceberg nar from hive catalog would be something we all like, but it's a very long road and I don't see that happening in the near future
B) Stay with the current, not too ideal structure, but update hive libs to v4, resolving the original goal of this discussion
C) just remove Hive 3 (merging this PR as is), which in my opinion would cause more pain than gain. If Hive 3 would be by default built into NiFi, introducing vulnerabilities, I would be +1 here, but it's not the case, so there is no vulnerability here unless someone explicitly wants to build and run NiFi with these extensions.

A is ideal but I see that very complex to happen in the near future, B seems feasible to me, in case of C I feel like it would introduce more pain than gain.

@exceptionfactory
Copy link
Contributor Author

@arpadboda and @mark-bathori In light of the current situation, I propose removing the current Iceberg bundle entirely for the initial release. We are not in a position to maintain EOL components, so in light of the current complexities, it seems much better to clear to current baseline and start with a new approach. Those who need the existing Iceberg support can continue to use the 2.0.0-M4 version. That will provide the opportunity to address the issues raised in a holistic manner.

I can put together a separate PR for that removal.

What do you think of that option?

@arpadboda
Copy link
Contributor

@exceptionfactory Thanks, sadly, but I have to say yes to that, please feel free to proceed!

@exceptionfactory
Copy link
Contributor Author

Thanks for the reply @arpadboda, and thanks again for pointing out the background for the current implementation @mark-bathori.

I am closing this PR and I have opened #9460 for NIFI-13938 to remove the existing Iceberg components for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants