
13993 #191

Closed · wants to merge 4 commits

Conversation

aweisberg · Contributor

No description provided.

@aweisberg (Contributor Author) left a comment:

This gets the usual "well, what about a test?" response, but I wouldn't hold up approval on that. I think it's possible to cook up a unit test for this if you mock the gossiper and the network connection stuff.

{
// grab a snapshot of the current cluster from Gossiper. this is completely prone to race conditions, but it's
// good enough for the purposes of blocking until some certain percentage of nodes are considered 'alive'/connected.
Set<Map.Entry<InetAddressAndPort, EndpointState>> peers = new HashSet<>(Gossiper.instance.getEndpointStates());
Contributor Author:

When we reach this point, how do we know that Gossiper will already be seeded with endpoint states, so that we wait on a realistic portion of the cluster?

I assume that's implicit in the bootstrapping process that runs before we get here?

Contributor:

Yes, this is only called after we've joined the ring, done bootstrapping, started gossip, and so on.

@@ -254,6 +258,8 @@ public long getTimeout()
return DatabaseDescriptor.getRangeRpcTimeout();
}
},
PING(),
Contributor Author:

Should this go this early in the enum, before UNUSED_* and company? Does it make sense to ever insert anything into the middle of the enum?

Contributor:

Left a similar comment on the ticket, but this should be added at the end:


// remember to add new verbs at the end, since we serialize by ordinal
UNUSED_1,
UNUSED_2,
UNUSED_3,
UNUSED_4,
UNUSED_5,
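
To make the ordinal constraint concrete, here is a toy sketch (illustrative only, not Cassandra code) of what goes wrong if a verb is inserted mid-enum while peers running the old ordering are still around:

// Toy illustration of ordinal-based serialization: the "old" node and the "new" node
// disagree on ordinals once PING is inserted in the middle instead of appended.
enum OldVerb { MUTATION, READ, UNUSED_1 }
enum NewVerb { MUTATION, READ, PING, UNUSED_1 }

public class OrdinalSkewDemo
{
    public static void main(String[] args)
    {
        int onTheWire = NewVerb.PING.ordinal();            // new node writes 2
        OldVerb decoded = OldVerb.values()[onTheWire];     // old node decodes 2 as UNUSED_1
        System.out.println("sent PING, peer decoded " + decoded); // silent verb mismatch
        // Appending PING after UNUSED_5 instead leaves every existing ordinal intact.
    }
}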

Contributor:

See the discussion on the ticket, but I'll clean up the enum and comments.


public void blockForPeers()
{
// TODO make these yaml props?
Contributor Author:

So, should these be yaml props?

Contributor:

Well, I wanted to get more input before shovelling more stuff into the yaml :) If we want users to actually use this in a 'standard' way, then they probably should be in the yaml.

Contributor Author:

This is on by default, so no, I don't think we need to add it to the YAML; just clean up the TODO.

Contributor:

If we choose not to put these params (alivePercent and aliveTimeoutSecs) into the yaml, should they be documented and added to conf/jvm.options?

Contributor Author:

If we are going that route, it seems like it should just be in the YAML. I would say jvm.options should be for things we don't control because they aren't directly part of Cassandra.

Another option is to make it a YAML option but leave it undocumented, if you want to not clutter the YAML; but then operators get annoyed and confused and ask why it's not in the yaml and not documented.

I never really know how to balance the tension between not dumping thousands of config options into a file and not having undocumented but important knobs. I think this one doesn't need to be in the documented category yet, because setting it is really an escape hatch, not something people should have to worry about. This is just a feature that should silently work without people having to think about it. Later, if we find out we are wrong, we can add the option and document it. That stems the tide of config options people need to think about, which has value.

Contributor:

OK, sgtm. I will add the params to Config, but not add them to the yaml.

{
// TODO make these yaml props?
int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + "blockForPeers.percent", 70);
if (alivePercent < 0)
Contributor Author:

As a matter of practice, silently clamping values always seems wrong to me. They asked for something and you aren't going to give it to them, so someone is in error, and it shouldn't be silent; that way we can come back and supply the value we intended. I suppose that's how I interpret the principle of least surprise.

Contributor:

This is fair, but I felt that giving some upper bound didn't seem unreasonable. Should we really delay starting the process by an extra order of magnitude just because an operator added an extra zero to the configuration? I understand your point about least surprise, but that might be beyond the spirit of this ticket. wdyt?

Contributor Author:

Well, I think it really sucks when you think you are getting the benefit of some config option and you're not, and you don't know, or don't know why. Out-of-scope arguments aren't really applicable here, because this is the ticket adding the feature driven by these config options.

I am not sure what you mean by order of magnitude. The zero clamp means someone tried to enable this feature but didn't actually get it, so their stuff is silently unreliable.

The 100 clamp means they meant to set a value < 100 but had an extra digit, so startup is silently slower. I doubt they are going to care as much about that in production use cases.

I don't think you'll find many operators who complain about being informed, via a clear error message, that their config is not sane. This is also an option that people have to request explicitly in order to get an invalid value to clamp in the first place, so I really do think they want to know.

Contributor:

Just to clarify: you do not like the bounding of aliveTimeoutSecs, correct? alivePercent is just bounding the percentage.

Contributor Author:

I don't like silently clamping any of them. If you feel strongly about upper bounding aliveTimeoutSecs, it's the closest to being OK, because the DB still comes up and is reliable with up to 100 seconds, and most of the time it won't wait that long since it will get to 70%. If someone sets a value higher than that, we "know" it's not useful, so clamping it doesn't hurt them.

Contributor Author:

My issue isn't with having a bound; it's with enforcing it silently. Some of the time you can pick a value for them, but not all of the time.

Contributor:

Hmm, started thinking about this, and, yeah, I guess we can leave the upper bound unenforced and instead just log something that the operator can choose to ignore.
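
A minimal sketch of that "warn, don't silently clamp" idea, using the property name from the diff earlier in the thread (the actual patch may read and validate these values differently):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: leave the upper bound unenforced, but surface odd values in the log
// instead of silently rewriting them. "cassandra." stands in for Config.PROPERTY_PREFIX;
// the default of 70 matches the diff shown above.
final class BlockForPeersProps
{
    private static final Logger logger = LoggerFactory.getLogger(BlockForPeersProps.class);
    private static final String PROPERTY_PREFIX = "cassandra.";
    private static final int DEFAULT_ALIVE_PERCENT = 70;

    static int alivePercent()
    {
        int alivePercent = Integer.getInteger(PROPERTY_PREFIX + "blockForPeers.percent", DEFAULT_ALIVE_PERCENT);
        if (alivePercent > 100)
            // per the discussion above: don't silently clamp, just make the unusual value visible
            logger.warn("{}blockForPeers.percent is set to {}, which exceeds 100", PROPERTY_PREFIX, alivePercent);
        return alivePercent;
    }
}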

logger.info("choosing to block until {}% of peers are marked alive; max time to wait = {} seconds", targetAlivePercent, aliveTimeoutSecs);

// first, send out a ping message to open up the non-gossip connections
AtomicInteger connectedCount = sendPingMessages(peers);
Contributor Author:

Should we also set up the large message connections? We can set up connections in parallel now with Netty, right (sweet!)? I think we should bring up all of them: gossip, large, small.

Contributor:

I thought about that as well. I can force a message to go out on the large message connection, but the REQUEST_RESPONSE will come back on the small message connection. Unless, of course, I send some empty byte array that exceeds the OutboundMessagingPool#LARGE_MESSAGE_THRESHOLD, which is currently 64k. Admittedly, I'm reluctant to do that. I could, however, create a variant of the Ping/Pong messages (or modify those) to select either the large or the small message connection.

I guess the concern I had was that many apps might not need the large message connection, and thus it becomes an unused, but still consumed, resource. Every instance will need the gossip and small message connections, but not every use case calls for the large connection. wdyt?

Contributor Author:

Do the connections consume resources if there are no queued messages? Are there any buffers allocated? The resources consumed should be minimal, and even then, if we can theoretically provision these resources, we will get more predictable performance and behavior by provisioning them all up front instead of lazily, where we find out "oh, we didn't have enough" at some arbitrary later time.

Maybe we are trying to hack this process too much by keeping the transport system unaware of what we are attempting; if we made it possible to say "hey, connect on all things and send this small message on each", it would still be simple. In fact, do we even need to send a message, or is just instructing the transport system to open them enough?

@jasobrown (Contributor), Feb 14, 2018:

> Do the connections consume resources if there are no queued messages?

Reviewing OutboundMessageConnection: no, they do not consume much in terms of resources. I'm fine either way with creating the large message connections eagerly or lazily, so I'll add that in.

> do we even need to send a message or is just instructing the transport system to open them enough?

Because connections are unidirectional, and we'll want each "pair" of outbound/inbound connections for each connection type (gossip, small message, large message) to be established, the only way to do that is to send a 'real' message that the peer can respond to. That said, perhaps I can work in something that lets a Ping/Pong message "select" which type of transport it wants to be sent on.
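
A rough sketch of the shape this could take (the names PingSender, ConnectionKind, and the callback below are hypothetical stand-ins, not the real MessagingService API): send one small ping per connection type to every peer so both directions of each connection get established, and count pongs as they arrive.

import java.net.InetSocketAddress;
import java.util.Collection;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of "one ping per connection type"; names are illustrative only.
final class ConnectionWarmup
{
    enum ConnectionKind { GOSSIP, SMALL_MESSAGE, LARGE_MESSAGE }

    interface PingSender
    {
        // sends a tiny ping on the given connection kind; invokes onPong when the reply arrives
        void ping(InetSocketAddress peer, ConnectionKind kind, Runnable onPong);
    }

    // returns a live counter the startup code can poll while it waits for enough peers
    static AtomicInteger sendPingMessages(Collection<InetSocketAddress> peers, PingSender sender)
    {
        AtomicInteger pongCount = new AtomicInteger();
        for (InetSocketAddress peer : peers)
            for (ConnectionKind kind : ConnectionKind.values())
                sender.ping(peer, kind, pongCount::incrementAndGet);
        return pongCount;
    }
}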

init all connection types;
allow messages to set ConnType in MsgOut;
add params to Config, not yaml
move blockFor logic out of MessagingService into its own class;
added utests
/**
* Allows sender to explicitly state which connection type the message should be sent on.
*/
public final ConnectionType connectionType;
Contributor Author:

A part of me wants to say don't make MessageOut bigger, but it's probably a drop in the bucket in terms of allocation rate for processing network messages.

while (checkStatus(peers, connectedCount, startNanos, expirationNanos < System.nanoTime(), completedRounds) == State.CONTINUE)
{
completedRounds++;
Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS);
Contributor Author:

I think we want to check the condition pretty aggressively so that startup in test harnesses is as fast as possible, since we do it a lot. Like check every millisecond.
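
For what it's worth, a tiny sketch of that tighter poll (the 1 ms interval and helper names are illustrative; checkStatus in the diff above plays the role of conditionMet here):

import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import com.google.common.util.concurrent.Uninterruptibles;

// Illustrative only: poll the exit condition every millisecond instead of once per second,
// bounded by an overall deadline, so test-harness startup isn't stuck waiting out full seconds.
final class TightPoll
{
    static boolean pollUntil(BooleanSupplier conditionMet, long timeoutNanos)
    {
        long deadline = System.nanoTime() + timeoutNanos;
        while (System.nanoTime() < deadline)
        {
            if (conditionMet.getAsBoolean())
                return true;
            Uninterruptibles.sleepUninterruptibly(1, TimeUnit.MILLISECONDS);
        }
        return conditionMet.getAsBoolean();
    }
}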

@jolynch (Contributor) commented Mar 27, 2018:

@aweisberg since this has been merged should we close this one out?

@aweisberg closed this Apr 2, 2018