SOLR-17012: Update Apache Hadoop to 3.3.6 and Apache Curator to 5.5.0 #1743

Merged
merged 11 commits into from
Oct 4, 2023

Conversation

solrbot
Collaborator

@solrbot solrbot commented Jul 1, 2023

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| org.apache.hadoop:hadoop-minikdc | test | patch | 3.3.5 -> 3.3.6 |
| org.apache.hadoop:hadoop-hdfs | test | patch | 3.3.5 -> 3.3.6 |
| org.apache.hadoop:hadoop-client-minicluster | test | patch | 3.3.5 -> 3.3.6 |
| org.apache.hadoop:hadoop-common | dependencies | patch | 3.3.5 -> 3.3.6 |
| org.apache.hadoop:hadoop-client-runtime | dependencies | patch | 3.3.5 -> 3.3.6 |
| org.apache.hadoop:hadoop-client-api | dependencies | patch | 3.3.5 -> 3.3.6 |
| org.apache.hadoop:hadoop-auth | dependencies | patch | 3.3.5 -> 3.3.6 |
| org.apache.hadoop:hadoop-annotations | dependencies | patch | 3.3.5 -> 3.3.6 |

Configuration

📅 Schedule: Branch creation - "before 3am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about these updates again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Renovate Bot

@solrbot solrbot force-pushed the renovate/org.apache.hadoop branch 3 times, most recently from c02e998 to 7846230 Compare August 18, 2023 08:25
@HoustonPutman
Contributor

I've given up hope for 3.4.0 at this point.

Contributor

@risdenk risdenk left a comment

need to check some of the test results...

@risdenk risdenk self-assigned this Oct 3, 2023
@risdenk
Contributor

risdenk commented Oct 3, 2023

8897 ERROR (jetty-launcher-8-thread-1) [n:127.0.0.1:43013_solr] o.a.s.s.CoreContainerProvider Could not start Solr. Check solr/home property and the logs
2023-10-02T21:23:26.3381926Z   2>           => java.lang.NoClassDefFoundError: org/apache/curator/framework/recipes/cache/CuratorCache

https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-project/pom.xml#L103

hadoop upgraded to curator 5.x - https://issues.apache.org/jira/browse/HADOOP-18515
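
A quick way to confirm this kind of mismatch is to probe the classpath for the class named in the `NoClassDefFoundError`. The helper below is a hypothetical diagnostic sketch (`ClasspathProbe` and `isPresent` are not part of Solr, Hadoop, or Curator), assuming, as the linked Hadoop change implies, that `CuratorCache` is only available once Curator 5.x is on the classpath.

```java
// Hypothetical diagnostic helper; not part of Solr, Hadoop, or Curator.
final class ClasspathProbe {
    private ClasspathProbe() {}

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isPresent(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the NoClassDefFoundError in the log above.
        String curatorCache = "org.apache.curator.framework.recipes.cache.CuratorCache";
        System.out.println(curatorCache + " present: " + isPresent(curatorCache));
    }
}
```

Run with the same classpath as the failing test; a `false` result would point at an older Curator jar being resolved alongside Hadoop 3.3.6.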

@risdenk
Contributor

risdenk commented Oct 3, 2023

#1427 is handled here

@risdenk risdenk changed the title Update org.apache.hadoop:* to v3.3.6 Update org.apache.hadoop:* to v3.3.6 and curator to 5.2.0 Oct 3, 2023
@solrbot solrbot changed the title Update org.apache.hadoop:* to v3.3.6 and curator to 5.2.0 Update org.apache.hadoop:* to v3.3.6 Oct 3, 2023
@risdenk risdenk changed the title Update org.apache.hadoop:* to v3.3.6 Update org.apache.hadoop:* to v3.3.6 and curator to 5.2.0 51 minutes ago Oct 3, 2023
@risdenk risdenk changed the title Update org.apache.hadoop:* to v3.3.6 and curator to 5.2.0 51 minutes ago Update org.apache.hadoop:* to v3.3.6 and curator to 5.2.0 Oct 3, 2023
@risdenk
Contributor

risdenk commented Oct 3, 2023

The smaller subset of errors after upgrading to curator 5.2.0

 2> 2190 ERROR (Curator-SafeNotifyService-0) [] o.a.h.s.t.d.ZKDelegationTokenSecretManager Error while processing Curator tokenCacheListener NODE_CREATED / NODE_CHANGED event
  2> 2191 ERROR (Curator-SafeNotifyService-0) [] o.a.c.f.l.MappingListenerManager Listener (org.apache.curator.framework.recipes.cache.CuratorCacheListenerBuilderImpl$2@395db076) threw an exception
  2>           => java.io.UncheckedIOException: java.io.IOException: Unknown version of delegation token 49
  2> 	at org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.lambda$startThreads$2(ZKDelegationTokenSecretManager.java:340)
  2> java.io.UncheckedIOException: java.io.IOException: Unknown version of delegation token 49
  2> 	at org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.lambda$startThreads$2(ZKDelegationTokenSecretManager.java:340) ~[hadoop-common-3.3.6.jar:?]

@solrbot
Collaborator Author

solrbot commented Oct 3, 2023

Edited/Blocked Notification

Renovate will not automatically rebase this PR, because it does not recognize the last commit author and assumes somebody else may have edited the PR.

You can manually request rebase by checking the rebase/retry box above.

Warning: custom changes will be lost.

@risdenk
Contributor

risdenk commented Oct 4, 2023

Some of the Hadoop test failures were just normal thread leaks that were handled by de729bb

There was another subset of failures that was more interesting. I found a solution to those Hadoop test failures: 40a8228

The failure was that Solr, through Hadoop's ZKDelegationTokenSecretManager, could not create a znode because it already existed. There is an existence check, but it races against multiple Solr instances starting up - https://github.com/apache/hadoop/blame/rel/release-3.3.6/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java#L270

There is probably a fix in ZKDelegationTokenSecretManager that would avoid the race condition, but making Solr startup more serial in tests worked.
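
The check-then-create race described above can be replayed deterministically. This is a minimal sketch using an in-memory stand-in for ZooKeeper, not Hadoop's or Curator's code; `FakeZk`, `NodeExistsException`, and `replayInterleaving` are hypothetical names introduced for illustration only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for ZooKeeper; every name in this class is hypothetical.
final class CheckThenCreateRace {
    static final class NodeExistsException extends Exception {
        NodeExistsException(String path) { super("NodeExists for " + path); }
    }

    static final class FakeZk {
        private final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

        boolean exists(String path) { return nodes.containsKey(path); }

        // Mirrors a ZooKeeper-style create: fails if the node is already there.
        void create(String path) throws NodeExistsException {
            if (nodes.putIfAbsent(path, new byte[0]) != null) {
                throw new NodeExistsException(path);
            }
        }
    }

    /**
     * Replays the bad interleaving: both clients run their existence check
     * before either one creates the node. Returns the number of failed startups.
     */
    static int replayInterleaving(FakeZk zk, String path) {
        boolean clientASawNode = zk.exists(path);
        boolean clientBSawNode = zk.exists(path);
        int failures = 0;
        for (boolean sawNode : new boolean[] {clientASawNode, clientBSawNode}) {
            if (!sawNode) {
                try {
                    zk.create(path);
                } catch (NodeExistsException e) {
                    failures++; // the check passed, yet the create still lost the race
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int failures = replayInterleaving(new FakeZk(), "/solr/security/zkdtsm/ZKDTSMRoot");
        System.out.println("failed startups: " + failures);
    }
}
```

With two clients, the second create fails even though its check passed, which matches the shape of the `NodeExistsException` in the trace below. Serializing startup only hides the window; it does not remove it.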

236 ERROR (jetty-launcher-8-thread-1) [n:127.0.0.1:56203_solr] o.a.s.s.CoreContainerProvider Could not start Solr. Check solr/home property and the logs
          => java.lang.RuntimeException: Could not start class org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager: java.io.IOException: Could not create namespace
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.init(DelegationTokenManager.java:149)
java.lang.RuntimeException: Could not start class org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager: java.io.IOException: Could not create namespace
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.init(DelegationTokenManager.java:149) ~[hadoop-common-3.3.6.jar:?]
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.initTokenManager(DelegationTokenAuthenticationHandler.java:163) ~[hadoop-common-3.3.6.jar:?]
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.init(DelegationTokenAuthenticationHandler.java:131) ~[hadoop-common-3.3.6.jar:?]
	at org.apache.hadoop.security.authentication.server.AuthenticationFilter.initializeAuthHandler(AuthenticationFilter.java:194) ~[hadoop-auth-3.3.6.jar:?]
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.initializeAuthHandler(DelegationTokenAuthenticationFilter.java:215) ~[hadoop-common-3.3.6.jar:?]
	at org.apache.solr.security.hadoop.HadoopAuthFilter.initializeAuthHandler(HadoopAuthFilter.java:124) ~[main/:?]
	at org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:180) ~[hadoop-auth-3.3.6.jar:?]
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.init(DelegationTokenAuthenticationFilter.java:181) ~[hadoop-common-3.3.6.jar:?]
	at org.apache.solr.security.hadoop.HadoopAuthFilter.init(HadoopAuthFilter.java:75) ~[main/:?]
	at org.apache.solr.security.hadoop.HadoopAuthPlugin.init(HadoopAuthPlugin.java:135) ~[main/:?]
	at org.apache.solr.core.CoreContainer.initializeAuthenticationPlugin(CoreContainer.java:569) ~[solr-core-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT a3945a2c3710b1a355abdea7a2e63b5353ad0723 [snapshot build, details omitted]]
	at org.apache.solr.core.CoreContainer.reloadSecurityProperties(CoreContainer.java:1185) ~[solr-core-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT a3945a2c3710b1a355abdea7a2e63b5353ad0723 [snapshot build, details omitted]]
	at org.apache.solr.core.CoreContainer.loadInternal(CoreContainer.java:854) ~[solr-core-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT a3945a2c3710b1a355abdea7a2e63b5353ad0723 [snapshot build, details omitted]]
	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:763) ~[solr-core-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT a3945a2c3710b1a355abdea7a2e63b5353ad0723 [snapshot build, details omitted]]
	at org.apache.solr.servlet.CoreContainerProvider.createCoreContainer(CoreContainerProvider.java:427) ~[solr-core-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT a3945a2c3710b1a355abdea7a2e63b5353ad0723 [snapshot build, details omitted]]
	at org.apache.solr.servlet.CoreContainerProvider.init(CoreContainerProvider.java:246) [solr-core-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT a3945a2c3710b1a355abdea7a2e63b5353ad0723 [snapshot build, details omitted]]
	at org.apache.solr.embedded.JettySolrRunner$1.lifeCycleStarted(JettySolrRunner.java:405) [solr-test-framework-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT a3945a2c3710b1a355abdea7a2e63b5353ad0723 [snapshot build, details omitted]]
	at org.eclipse.jetty.util.component.AbstractLifeCycle.setStarted(AbstractLifeCycle.java:253) [jetty-util-10.0.16.jar:10.0.16]
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:94) [jetty-util-10.0.16.jar:10.0.16]
	at org.apache.solr.embedded.JettySolrRunner.retryOnPortBindFailure(JettySolrRunner.java:614) [solr-test-framework-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT a3945a2c3710b1a355abdea7a2e63b5353ad0723 [snapshot build, details omitted]]
	at org.apache.solr.embedded.JettySolrRunner.start(JettySolrRunner.java:552) [solr-test-framework-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT a3945a2c3710b1a355abdea7a2e63b5353ad0723 [snapshot build, details omitted]]
	at org.apache.solr.embedded.JettySolrRunner.start(JettySolrRunner.java:523) [solr-test-framework-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT a3945a2c3710b1a355abdea7a2e63b5353ad0723 [snapshot build, details omitted]]
	at org.apache.solr.cloud.MiniSolrCloudCluster.startJettySolrRunner(MiniSolrCloudCluster.java:508) [solr-test-framework-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT a3945a2c3710b1a355abdea7a2e63b5353ad0723 [snapshot build, details omitted]]
	at org.apache.solr.cloud.MiniSolrCloudCluster.lambda$new$0(MiniSolrCloudCluster.java:320) [solr-test-framework-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT a3945a2c3710b1a355abdea7a2e63b5353ad0723 [snapshot build, details omitted]]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:294) [solr-solrj-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT a3945a2c3710b1a355abdea7a2e63b5353ad0723 [snapshot build, details omitted]]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Could not create namespace
	at org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.startThreads(ZKDelegationTokenSecretManager.java:275) ~[hadoop-common-3.3.6.jar:?]
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.init(DelegationTokenManager.java:146) ~[hadoop-common-3.3.6.jar:?]
	... 28 more
Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /solr/security/zkdtsm/ZKDTSMRoot
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:125) ~[zookeeper-3.9.0.jar:3.9.0]
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:53) ~[zookeeper-3.9.0.jar:3.9.0]
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1450) ~[zookeeper-3.9.0.jar:3.9.0]
	at org.apache.curator.framework.imps.CreateBuilderImpl$18.call(CreateBuilderImpl.java:1223) ~[curator-framework-5.2.0.jar:5.2.0]
	at org.apache.curator.framework.imps.CreateBuilderImpl$18.call(CreateBuilderImpl.java:1193) ~[curator-framework-5.2.0.jar:5.2.0]
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) ~[curator-client-5.2.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1190) ~[curator-framework-5.2.0.jar:5.2.0]
	at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:605) ~[curator-framework-5.2.0.jar:5.2.0]
	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:595) ~[curator-framework-5.2.0.jar:5.2.0]
	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:573) ~[curator-framework-5.2.0.jar:5.2.0]
	at org.apache.curator.framework.imps.CreateBuilderImpl$4.forPath(CreateBuilderImpl.java:461) ~[curator-framework-5.2.0.jar:5.2.0]
	at org.apache.curator.framework.imps.CreateBuilderImpl$4.forPath(CreateBuilderImpl.java:391) ~[curator-framework-5.2.0.jar:5.2.0]
	at org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.startThreads(ZKDelegationTokenSecretManager.java:272) ~[hadoop-common-3.3.6.jar:?]
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.init(DelegationTokenManager.java:146) ~[hadoop-common-3.3.6.jar:?]
	... 28 more

Contributor

@dsmiley dsmiley left a comment

The changes look good. We may not know for certain whether a single-threaded executor has perf implications here, but given the support burden, I'd rather err on the side of our convenience.

The change in Curator version from 4x to 5x means losing compatibility with ZooKeeper 3.4 -- which is fine. But I think it should be documented in the upgrade notes. Obviously it'd only apply to the (small?) subset of users actually using Hadoop-auth.

https://curator.apache.org/zk-compatibility-34.html (it refers to Curator 4.2, but I believe it applies equally to later 4.x versions).

@risdenk
Contributor

risdenk commented Oct 4, 2023

Argggg same exception with some nightly tests. It's just dumb that this race condition exists. I'm still poking around to see if there is a way forward.

@risdenk
Contributor

risdenk commented Oct 4, 2023

Sigh, so I found apache/hadoop#4885, which tried to fix the problem, but the race condition is still there :/ I think the fix should be to catch the exception if the znode exists.

@risdenk
Contributor

risdenk commented Oct 4, 2023

So I reverted the single threaded portion since there are other code paths that spin up Solr in parallel.

I implemented ae141d8 instead, which creates the znode upfront the same way Hadoop does, before Hadoop has a chance to check. This properly ignores the case where the znode already exists.
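
The "create upfront and ignore already-exists" idea can be sketched as follows. This is not the code from ae141d8; it uses an in-memory set in place of ZooKeeper, and `IdempotentCreate`, `ensureExists`, and `startConcurrently` are hypothetical names. In real Curator-based code the exception to tolerate would be `KeeperException.NodeExistsException`, the one shown in the stack trace above.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory stand-in for ZooKeeper; every name in this class is hypothetical.
final class IdempotentCreate {
    static final class NodeExistsException extends Exception {
        NodeExistsException(String path) { super("NodeExists for " + path); }
    }

    private final Set<String> nodes = ConcurrentHashMap.newKeySet();

    // Mirrors a ZooKeeper-style create: fails if the node is already there.
    void create(String path) throws NodeExistsException {
        if (!nodes.add(path)) {
            throw new NodeExistsException(path);
        }
    }

    /** Creates the node without checking first; "already exists" counts as success. */
    boolean ensureExists(String path) {
        try {
            create(path);
            return true;  // this caller created the node
        } catch (NodeExistsException e) {
            return false; // another caller won the race, which is fine
        }
    }

    /** Releases all clients at once; returns how many of them actually created the node. */
    static int startConcurrently(int clients, String path) {
        IdempotentCreate zk = new IdempotentCreate();
        CountDownLatch go = new CountDownLatch(1);
        AtomicInteger creators = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        try {
            for (int i = 0; i < clients; i++) {
                pool.execute(() -> {
                    try {
                        go.await(); // line every client up behind the same gate
                        if (zk.ensureExists(path)) {
                            creators.incrementAndGet();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            go.countDown();
            pool.shutdown();
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("clients did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return creators.get();
    }

    public static void main(String[] args) {
        System.out.println("creators: " + startConcurrently(8, "/solr/security/zkdtsm/ZKDTSMRoot"));
    }
}
```

However the threads interleave, exactly one caller creates the node and none of them fail, so parallel startup no longer depends on who gets there first.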

@risdenk risdenk requested a review from dsmiley October 4, 2023 14:59
@risdenk
Contributor

risdenk commented Oct 4, 2023

I'm running the regular and nightly tests; hopefully this is a reasonable step forward.

@risdenk
Contributor

risdenk commented Oct 4, 2023

nightly and regular tests all passed for me now.

@risdenk
Contributor

risdenk commented Oct 4, 2023

re: upgrade notes - https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-9.html#solr-8-2 has a note about zookeeper 3.5 but happy to put a note in as well.

@risdenk risdenk changed the title Update org.apache.hadoop:* to v3.3.6 and curator to 5.2.0 Update org.apache.hadoop:* to v3.3.6 and curator to 5.5.0 Oct 4, 2023
@risdenk
Contributor

risdenk commented Oct 4, 2023

upgraded to latest curator 5.5.0 since it had some decent bug fixes (no major features). only thing left is CHANGES and upgrade notes about curator 5.x and zookeeper 3.4

@risdenk risdenk changed the title Update org.apache.hadoop:* to v3.3.6 and curator to 5.5.0 Update Apache Hadoop to 3.3.6 and Apache Curator to 5.5.0 Oct 4, 2023
@risdenk risdenk changed the title Update Apache Hadoop to 3.3.6 and Apache Curator to 5.5.0 SOLR-17012: Update Apache Hadoop to 3.3.6 and Apache Curator to 5.5.0 Oct 4, 2023
@risdenk
Contributor

risdenk commented Oct 4, 2023

I created https://issues.apache.org/jira/browse/SOLR-17012 since other work depends on curator being 5.x so wanted to call it out more.

@risdenk risdenk merged commit 621bc2a into apache:main Oct 4, 2023
2 checks passed
risdenk added a commit that referenced this pull request Oct 4, 2023
@dsmiley
Contributor

dsmiley commented Oct 5, 2023

re: upgrade notes

The existing notes don't address my point at all. And as we embrace Curator (in a separate issue), the lack of ZK 3.4 compatibility will become more pronounced (not limited to Hadoop Auth).

@risdenk
Contributor

risdenk commented Oct 5, 2023

re: upgrade notes

> The existing notes don't address my point at all. And as we embrace Curator (in a separate issue), the lack of ZK 3.4 compatibility will become more pronounced (not limited to Hadoop Auth).

Agreed, I added new upgrade notes. I was just pointing out that we had already recommended ZK 3.5 a while ago, so hopefully 3.4 isn't as big of an issue.

@epugh
Contributor

epugh commented Oct 5, 2023

Is it worth having a ZK matrix in the ref guide, similar to the sections on Java in https://solr.apache.org/guide/solr/latest/deployment-guide/system-requirements.html#released-solr-and-java-versions ? I've seen people in various forums ask "Is my ZK too old to work with Solr X?"

@risdenk
Contributor

risdenk commented Oct 5, 2023

FWIW I put together https://issues.apache.org/jira/browse/HADOOP-18922 and apache/hadoop#6150 to address the race condition upstream.

janhoy pushed a commit that referenced this pull request Dec 6, 2023
janhoy added a commit that referenced this pull request Dec 6, 2023