Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-14702: Remove oppressive language (part1) #1712

Closed

Conversation

MarcusSorealheis
Copy link
Contributor

@MarcusSorealheis MarcusSorealheis commented Aug 2, 2020

Description

Please refer to this PR: #1711

Solution

Replaces master with primary and slave with secondary.

Tests

Run standard tests.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the master branch.
  • I have run ant precommit and the appropriate test suite.
  • I have added tests for my changes.
  • I have added documentation for the Ref Guide (for Solr changes only).

@MarcusSorealheis
Copy link
Contributor Author

taking this out of Draft status. 👨🏾‍💻

@MarcusSorealheis
Copy link
Contributor Author

@murblanc
Copy link
Member

murblanc commented Aug 2, 2020

PR description says:

Replaces master with leader and slave with follower.

Actual replacement is primary/secondary.

I don't see the renamed .png files. Are they part of a different PR?

}
@SuppressWarnings({"rawtypes"})
NamedList master = (NamedList) initArgs.get("master");
boolean enableMaster = isEnabled( master );
NamedList primary = (NamedList) initArgs.get("primary");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a communication between different Solr nodes?
If the current node got updated to new vocabulary primary/secondary but the remote node did not and did send data using master/slave, we might run into issues.

It might be desirable (if the above is true) to support both primary and master as return values (and secondary and slave) until a subsequent version of Solr not expected to work with nodes running code using master/slave.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also same issue in the other direction: if an updated node sends "primary" to a non updated node expecting "master", the non updated node will fail. Maybe we need to send the same data under both "primary" and "master" and the same data under both "secondary" and "slave", and accept either one on the receiving end.

I'm not familiar with how this code exactly works so might be off here, but this rename is not internal so must use caution (and test with renamed nodes talking to non renamed nodes, on both roles).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to look through it. Hopefully I can grok it. Not sure, though. If I can find a more backcompat approach, I will drop it in the PR. Open to suggestions from others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initArgs.containsKey("primary") ? initArgs.get("primary") : initArgs.get("master")?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @chatman commented in the general discussion, maybe these changes happening in a new major version branch does no require them to be backward/forward compatible?

If 8x and 9.0 servers are expected to talk and understand each other, the suggestion looks good on the receiving end. We need something on the sending end as well (send both values, because "primary" being sent to a 8x server expecting only "master" will not be understood).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except for the backcompat issue (which I don't think matters at all).

What's the real issue? Let's accept as input "master" or "primary" in one case and "slave" or "secondary" in the other, as well as send both values over the wire as output.
This will make new code and old code compatible (old talking to new and new talking to old). Once only new code is running, we can safely remove sending or accepting the offending terms.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this discussion is resolved?

What's the real issue? Let's accept as input "master" or "primary" in one case and "slave" or "secondary" in the other, as well as send both values over the wire as output.
This will make new code and old code compatible (old talking to new and new talking to old). Once only new code is running, we can safely remove sending or accepting the offending terms.

+1, this should allow us to be backward compatible (and I'd even suggest we keep that for the life of 9 to make upgrades easy)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I added I thought @tflobbe Can you let me know where there other places where I should anticipate receiving slave or master

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rolling upgrade across major versions isn't supported, and I don't see any reason to bend over backwards to support that.

Not to derail this effort at all, which I think is very important, but we do claim to support rolling upgrades across major versions, see for example https://lucene.apache.org/solr/guide/8_6/major-changes-in-solr-8.html#rolling-upgrades-with-solr-8. The page on upgrading a Solr cluster is really only meant for SolrCloud users (that's what was meant by "cluster" when it was written, which is actually incorrect since you can have a "cluster" without using SolrCloud), but it doesn't say not to do a rolling upgrade across major versions because we have other docs that tell you how to do exactly that.

I don't think it's a major issue though, we just need to clearly spell out the proper upgrade steps for 9.0 (which doesn't have to be part of this PR). There is going to be a lot for people to do (if they're using removed features and need to install plugins), this isn't going to add much IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except for the backcompat issue (which I don't think matters at all).

What's the real issue? Let's accept as input "master" or "primary" in one case and "slave" or "secondary" in the other, as well as send both values over the wire as output.
This will make new code and old code compatible (old talking to new and new talking to old). Once only new code is running, we can safely remove sending or accepting the offending terms.

The real issue here is that the terms should've been removed years ago, or never used. They are super stressful to interact with and every time I combed through my changes here it was really frustrating. I was in college when the project was released. These terms were not used at Grinnell College. I could give a history lesson if you'd like to understand, but I think we should align with the standards put forth by the IETF and expressed in the Apache Code of Conduct at a minimum. We should also remember to prioritize community over code. We shouldn't even be looking to preserve rolling upgrades in favor of keeping such stressful nomenclature. There's so much changing in 9 already that is backward incompatible.

@MarcusSorealheis
Copy link
Contributor Author

PR description says:

Replaces master with leader and slave with follower.

Actual replacement is primary/secondary.

I've fixed the description. That was a vestige of my initial PR where I ran into problems when I tried to run/understand the changes.

I don't see the renamed .png files. Are they part of a different PR?

On the PNG's, I've fixed them. I was aware of them and forgot to address. I think subsequent PRs will follow, but haven't determined the scope yet. I want to move away from surface level fixes that have deeply emotional implications for code readers and developers on a social level to surface level fixes that have deeply emotional implications for code readers on a technical level. Lots of words there. Better developer experience.

Copy link
Contributor

@chatman chatman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, changes look good. Please make sure all tests pass.
Also, since there are things that are backward incompatible here, let us target this issue for a 9.0 release only, and add upgrade notes in the relevant ref guide page.

solr/solr-ref-guide/src/query-settings-in-solrconfig.adoc Outdated Show resolved Hide resolved
@MarcusSorealheis MarcusSorealheis marked this pull request as draft August 2, 2020 23:56
@MarcusSorealheis
Copy link
Contributor Author

After all the recent changes, making sure the tests pass

@MarcusSorealheis
Copy link
Contributor Author

I ran it again and shard split tests didn't fail so there seems to be some inconsistency there. ReplicationHandler tests still failed. I'm digging as to what they test to see if I can determine why they don't fail when isolated but do fail when run together.

See below:

[junit4]   - org.apache.solr.handler.TestReplicationHandler.doTestStressReplication
[junit4]   - org.apache.solr.handler.TestReplicationHandler.doTestRepeater
[junit4]   - org.apache.solr.handler.TestReplicationHandler.testRateLimitedReplication
[junit4]   - org.apache.solr.handler.TestReplicationHandler.doTestIndexFetchWithPrimaryUrl

So strange

@MarcusSorealheis
Copy link
Contributor Author

Looks like doTestStressReplicaion is a confirmed Bad Apple as defined here: https://issues.apache.org/jira/browse/SOLR-12028

In other words, moving on to check the others. I'll batch my comments accordingly.

@MarcusSorealheis
Copy link
Contributor Author

I cannot get the additional three tests to fail with consistency when isolated, but I cannot say with any certainty that they are not an issue. A second (maybe third) set of eyes would be really helpful here. It's really tricky for me to understand why they are failing the test config files look good afaik. let me know if this warrants further investigation.

@murblanc
Copy link
Member

murblanc commented Aug 3, 2020

I cannot get the additional three tests to fail with consistency when isolated, but I cannot say with any certainty that they are not an issue. A second (maybe third) set of eyes would be really helpful here. It's really tricky for me to understand why they are failing the test config files look good afaik. let me know if this warrants further investigation.

Did you run these tests with same scrutiny before your changes? Maybe behavior is similar?

@noblepaul
Copy link
Contributor

This PR should be merged as soon as possible because it is a very large PR and can fall out of date very quickly.

@chatman . We should get things correct before merging it. This has to be vetted properly

@MarcusSorealheis
Copy link
Contributor Author

Did you run these tests with same scrutiny before your changes? Maybe behavior is similar?

I had not. I am checking now.

@ctargett
Copy link
Contributor

ctargett commented Aug 3, 2020

I looked at the Ref Guide changes, +1 on those from me (but I can't make it a blanket +1 since I can't comment on all the other code changes).

@MarcusSorealheis
Copy link
Contributor Author

in my most recent test, the previously failing tests all passed. Thanks @ErickErickson.

However, a new set of tests failed. Tests with failures [seed: 358D229C22BF813]:

[junit4]   - org.apache.solr.cloud.CloudExitableDirectoryReaderTest.test
 [junit4]   - org.apache.solr.cloud.CloudExitableDirectoryReaderTest.testWhitebox
 [junit4]   - org.apache.solr.cloud.CloudExitableDirectoryReaderTest.testCreepThenBite {seed=[358D229C22BF813:D4908E36E9A995B2]}
 [junit4]   - org.apache.solr.cloud.CloudExitableDirectoryReaderTest.testCreepThenBite {seed=[358D229C22BF813:60C632CADD6B5E9E]}
 [junit4]   - org.apache.solr.cloud.CloudExitableDirectoryReaderTest.testCreepThenBite {seed=[358D229C22BF813:EE2FA4168CAF1655]}
 [junit4]   - org.apache.solr.cloud.CloudExitableDirectoryReaderTest.testCreepThenBite {seed=[358D229C22BF813:DFC10FF3E0514D7C]}
 [junit4]   - org.apache.solr.cloud.CloudExitableDirectoryReaderTest.testCreepThenBite {seed=[358D229C22BF813:93008A5E41598DC7]}
 [junit4]   - org.apache.solr.schema.TestBulkSchemaConcurrent (suite)

@MarcusSorealheis
Copy link
Contributor Author

Then they passed with the same seed.

BUILD SUCCESSFUL

Total time: 2 minutes 43 seconds
Marcuss-MacBook-Pro:lucene-solr marcussorealheis$ history 2
  600   ant test  -Dtestcase=CloudExitableDirectoryReaderTest -Dtests.method=testCreepThenBite -Dtests.seed=358D229C22BF813 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=sr-Cyrl -Dtests.timezone=America/Argentina/San_Juan -Dtests.asserts=true -Dtests.file.encoding=UTF-8

@MarcusSorealheis
Copy link
Contributor Author

MarcusSorealheis commented Aug 4, 2020

all tests pass:

image

Can we agree that I can add the back-compat changes to support rolling upgrades in a subsequent PR @risdenk @tflobbe @ctargett @thelabdude @murblanc @noblepaul @elyograg @janhoy @chatman

Copy link
Member

@tflobbe tflobbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked at the PR only, do I don't know if there are missing references (other than those right next to some change). I left some comments, mostly regarding back compatibility.

Comment on lines +79 to +84
NamedList<Object> secondaryConfig = new NamedList<>();
secondaryConfig.add("fetchFromLeader", Boolean.TRUE);
secondaryConfig.add(ReplicationHandler.SKIP_COMMIT_ON_MASTER_VERSION_ZERO, switchTransactionLog);
secondaryConfig.add("pollInterval", pollIntervalStr);
NamedList<Object> replicationConfig = new NamedList<>();
replicationConfig.add("slave", slaveConfig);
replicationConfig.add("secondary", secondaryConfig);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is in the context of SolrCloud, so "secondary" doesn't apply and should be instead follower

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

@@ -239,7 +239,7 @@ final private void replicate(String nodeName, SolrCore core, ZkNodeProps leaderp
}

ModifiableSolrParams solrParams = new ModifiableSolrParams();
solrParams.set(ReplicationHandler.MASTER_URL, leaderUrl);
solrParams.set(ReplicationHandler.PRIMARY_URL, leaderUrl);
solrParams.set(ReplicationHandler.SKIP_COMMIT_ON_MASTER_VERSION_ZERO, replicaType == Replica.Type.TLOG);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing master

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof. good catch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, this is a reference to the replication handler. One of the few links between the old way and the new one.

@@ -733,12 +733,12 @@ private void handleBootstrapStatus(SolrQueryRequest req, SolrQueryResponse rsp)
}

static class BootstrapCallable implements Callable<Boolean>, Closeable {
private final String masterUrl;
private final String primaryUrl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the context of SolrCloud, should be "leader"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this whole file should go away, but that's a PR for another night...but in progress! 😍

Object skipCommitOnMasterVersionZero = initArgs.get(SKIP_COMMIT_ON_MASTER_VERSION_ZERO);
if (skipCommitOnMasterVersionZero != null && skipCommitOnMasterVersionZero instanceof Boolean) {
this.skipCommitOnMasterVersionZero = (boolean) skipCommitOnMasterVersionZero;
Object skipCommitOnPrimaryVersionZero = initArgs.get(SKIP_COMMIT_ON_MASTER_VERSION_ZERO);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing an update here? Also, since you changed the parameter used, this needs to handle back compatibility

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the only one that needs to change, possibly. Still stitching it all together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have to handle back-compat in this PR. I can commit to it prior to 9.0's release and even prioritize the work before opening PRs on the other work I am doing.

Comment on lines +434 to +435
log.info("Primary's generation: {}", latestGeneration);
log.info("Primary's version: {}", latestVersion);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my point, all these log in IndexFetcher talk about Master/Primary even when running in SolrCloud mode. This is likely adding to the confusion that Shawn talked about in the Jira. If these were "leader"/"Follower", we'd have solve it :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gothca

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shawn suggested that we shouldn't use the same terminology for the legacy architecture. I've updated the first PR in any event. At this time, keeping the same terminology seems to be the right way to go unless I rewrite the replication handler. There's an argument for that because I'm sure there are areas where we could improve, but it won't happen in this PR.

@@ -23,7 +23,7 @@ This screen provides status information about each collection & node in your clu
.Only Visible When using SolrCloud
[NOTE]
====
The "Cloud" menu option is only available on Solr instances running in <<getting-started-with-solrcloud.adoc#getting-started-with-solrcloud,SolrCloud mode>>. Single node or master/slave replication instances of Solr will not display this option.
The "Cloud" menu option is only available on Solr instances running in <<getting-started-with-solrcloud.adoc#getting-started-with-solrcloud,SolrCloud mode>>. Single node or primary/secondary replication instances of Solr will not display this option.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we say "legacy replication" instead? (especially if we decide to go with leader/follower)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are the only one truly adamant about changing to leader follower that I have seen thus far, so I'm not sure we will go that route. Unless, of course, you are willing to make the changes in that branch. You have a had start. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not?. We can do a sweeping change after we are done with this PR and before merging. Shouldn't be too bad. Also, my comment is about the "mode", which has some other arguments also in that thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd never seen that email

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I'm on that list.

@@ -24,7 +24,7 @@ In the left-hand navigation bar, you will see a pull-down menu titled "Collectio
====
The "Collection Selector" pull-down menu is only available on Solr instances running in <<solrcloud.adoc#solrcloud,SolrCloud mode>>.

Single node or master/slave replication instances of Solr will not display this menu, instead the Collection specific UI pages described in this section will be available in the <<core-specific-tools.adoc#core-specific-tools,Core Selector pull-down menu>>.
Single node or primary/secondary replication instances of Solr will not display this menu, instead the Collection specific UI pages described in this section will be available in the <<core-specific-tools.adoc#core-specific-tools,Core Selector pull-down menu>>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -19,7 +19,7 @@

The Core Admin API is primarily used under the covers by the <<collections-api.adoc#collections-api,Collections API>> when running a <<solrcloud.adoc#solrcloud,SolrCloud>> cluster.

SolrCloud users should not typically use the CoreAdmin API directly, but the API may be useful for users of single-node or master/slave Solr installations for core maintenance operations.
SolrCloud users should not typically use the CoreAdmin API directly, but the API may be useful for users of single-node or primary/secondary Solr installations for core maintenance operations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -61,7 +61,7 @@ The following components support distributed search:

=== Shards Whitelist

The nodes allowed in the `shards` parameter is configurable through the `shardsWhitelist` property in `solr.xml`. This whitelist is automatically configured for SolrCloud but needs explicit configuration for master/slave mode. Read more details in the section <<distributed-requests.adoc#configuring-the-shardhandlerfactory,Configuring the ShardHandlerFactory>>.
The nodes allowed in the `shards` parameter is configurable through the `shardsWhitelist` property in `solr.xml`. This whitelist is automatically configured for SolrCloud but needs explicit configuration for primary/secondary mode. Read more details in the section <<distributed-requests.adoc#configuring-the-shardhandlerfactory,Configuring the ShardHandlerFactory>>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -16,10 +16,10 @@
// specific language governing permissions and limitations
// under the License.

The Replication screen shows you the current replication state for the core you have specified. <<solrcloud.adoc#solrcloud,SolrCloud>> has supplanted much of this functionality, but if you are still using Master-Slave index replication, you can use this screen to:
The Replication screen shows you the current replication state for the core you have specified. <<solrcloud.adoc#solrcloud,SolrCloud>> has supplanted much of this functionality, but if you are still using Primary-Secondary index replication, you can use this screen to:

. View the replicatable index state. (on a master node)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed what here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"View the replicatable index state. (on a master node)"

@MarcusSorealheis
Copy link
Contributor Author

I suppose I should convert this back to draft. I don't think it all the comments makes sense, but there is one place I think it does make sense to preserve the name: ReplicationHandler. I didn't see the mail that went out from Anshum originally where there was some discussion. I started on this effort because I was sick of seeing things in the code base. I need to do some more investigation. I don't want to rush this out given the complexity and all the moving parts. I need to figure out if it makes sense to refactor some of these classes to separate concerns. Probably not because I don't want to add any duplicated logic unless there's a better reason than names. I'll come back sometime tomorrow, probably at night.

@tflobbe
Copy link
Member

tflobbe commented Aug 4, 2020

One way to get back compatibility and rolling upgrades could be to make 9.x code be able to read previous formats, but write new format, and make 8.x (since 8.7) read new and old, but write old? Anyone wanting to do a rolling upgrade to 9 would have to be on at least 8.7. Rolling upgrades to 8.7 would still work.
All the code other than the requests/responses could be changed in 8_x branch, in addition to master.

@noblepaul
Copy link
Contributor

+1 to @tflobbe

We should make the change not break rolling restarts. I like the suggestion

@MarcusSorealheis MarcusSorealheis marked this pull request as draft August 4, 2020 06:59
@MarcusSorealheis
Copy link
Contributor Author

PLEASE DO NOT MERGE — I've reverted back to the initial approach of using Leader/Follower because of various dependencies.

@MarcusSorealheis
Copy link
Contributor Author

Closing this PR in favor of the other: #1711

@MarcusSorealheis
Copy link
Contributor Author

MarcusSorealheis commented Aug 4, 2020

@tflobbe I like the suggestion. Can we save it for the subsequent PR? Will be. Much easier to read a review in that case. I'll set up a standalone 8_x cluster and upgrade some nodes on to 9_x provided rolling upgrades work today. That's the first test. Second test will be to add changes for handling legacy nomenclature to see what happens.

@ErickErickson
Copy link

ErickErickson commented Aug 4, 2020 via email

@ErickErickson
Copy link

ErickErickson commented Aug 4, 2020 via email

@MarcusSorealheis
Copy link
Contributor Author

closing this in favor of the finished product: #1711

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
9 participants