
13993 #191

Closed · wants to merge 4 commits

Conversation

aweisberg · Contributor

No description provided.

@aweisberg (Contributor Author) left a comment:

This gets the usual "well, what about a test?" response, but I wouldn't hold up approval on that. I think it's possible to cook up a unit test for this if you mock the gossiper and the network connection stuff.

{
// grab a snapshot of the current cluster from Gossiper. this is completely prone to race conditions, but it's
// good enough for the purposes of blocking until some certain percentage of nodes are considered 'alive'/connected.
Set<Map.Entry<InetAddressAndPort, EndpointState>> peers = new HashSet<>(Gossiper.instance.getEndpointStates());
Contributor Author:

When we reach this point, how do we know that Gossiper will already be seeded with endpoint states, so that we wait on a realistic portion of the cluster?

I assume that's implicit in the bootstrapping process that runs before we get here?

Contributor:

Yes, this is only called after we've joined the ring, done bootstrapping, started gossip, and so on.

@@ -254,6 +258,8 @@ public long getTimeout()
return DatabaseDescriptor.getRangeRpcTimeout();
}
},
PING(),
Contributor Author:

Should this go this early in the enum, before UNUSED_* and company? Does it make sense to ever insert anything into the middle of the enum?

Contributor:

Left a similar comment on the ticket, but this should be added at the end:


// remember to add new verbs at the end, since we serialize by ordinal
UNUSED_1,
UNUSED_2,
UNUSED_3,
UNUSED_4,
UNUSED_5,
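
To make the ordinal constraint concrete, here is a toy sketch (illustrative only, not Cassandra code) of what goes wrong if a verb is inserted mid-enum while peers running the old ordering are still around:

// Toy illustration of ordinal-based serialization: the "old" node and the "new" node
// disagree on ordinals once PING is inserted in the middle instead of appended.
enum OldVerb { MUTATION, READ, UNUSED_1 }
enum NewVerb { MUTATION, READ, PING, UNUSED_1 }

public class OrdinalSkewDemo
{
    public static void main(String[] args)
    {
        int onTheWire = NewVerb.PING.ordinal();            // new node writes 2
        OldVerb decoded = OldVerb.values()[onTheWire];     // old node decodes 2 as UNUSED_1
        System.out.println("sent PING, peer decoded " + decoded); // silent verb mismatch
        // Appending PING after UNUSED_5 instead leaves every existing ordinal intact.
    }
}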

Contributor:

See the discussion on the ticket, but I'll clean up the enum and comments.


public void blockForPeers()
{
// TODO make these yaml props?
Contributor Author:

So, should these be yaml props?

Contributor:

Well, I wanted to get more input before shovelling more stuff into the yaml :) If we want users to actually use this in a 'standard' way, then they probably should be in the yaml.

Contributor Author:

This is on by default, so no, I don't think we need to add it to the YAML; just clean up the TODO.

Contributor:

If we choose not to put these params (alivePercent and aliveTimeoutSecs) into the yaml, should they be documented and added to conf/jvm.options?

Contributor Author:

If we are going that route, it seems like it should just be in the YAML. I would say jvm.options should be for things we don't control because they aren't directly part of Cassandra.

Another option is to make it a YAML option but leave it undocumented, if you want to not clutter the YAML; but then operators get annoyed and confused and ask why it's not in the yaml and not documented.

I never really know how to balance the tension between not dumping thousands of config options into a file and not having undocumented but important knobs. I think this one doesn't need to be in the documented category yet, because setting it is really an escape hatch, not something people should have to worry about. This is just a feature that should silently work without people having to think about it. Later, if we find out we are wrong, we can add the option and document it. That stems the tide of config options people need to think about, which has value.

Contributor:

OK, sgtm. I will add the params to Config, but not add them to the yaml.

{
// TODO make these yaml props?
int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + "blockForPeers.percent", 70);
if (alivePercent < 0)
Contributor Author:

As a matter of practice, silently clamping values always seems wrong to me. They asked for something and you aren't going to give it to them, so someone is in error, and it shouldn't be silent; that way we can come back and supply the value we intended. I suppose that's how I interpret the principle of least surprise.

Contributor:

This is fair, but I felt that giving some upper bound didn't seem unreasonable. Should we really delay starting the process by an extra order of magnitude just because an operator added an extra zero to the configuration? I understand your point about least surprise, but that might be beyond the spirit of this ticket. wdyt?

Contributor Author:

Well, I think it really sucks when you think you are getting the benefit of some config option and you're not, and you don't know, or don't know why. Out-of-scope arguments aren't really applicable here, because this is the ticket adding the feature driven by these config options.

I am not sure what you mean by order of magnitude. The zero clamp means someone tried to enable this feature but didn't actually get it, so their stuff is silently unreliable.

The 100 clamp means they meant to set a value < 100 but had an extra digit, so startup is silently slower. I doubt they are going to care as much about that in production use cases.

I don't think you'll find many operators who complain about being informed, via a clear error message, that their config is not sane. This is also an option that people have to request explicitly in order to get an invalid value to clamp in the first place, so I really do think they want to know.

Contributor:

Just to clarify: you do not like the bounding of aliveTimeoutSecs, correct? alivePercent is just bounding the percentage.

Contributor Author:

I don't like silently clamping any of them. If you feel strongly about upper bounding aliveTimeoutSecs, it's the closest to being OK, because the DB still comes up and is reliable with up to 100 seconds, and most of the time it won't wait that long since it will get to 70%. If someone sets a value higher than that, we "know" it's not useful, so clamping it doesn't hurt them.

Contributor Author:

My issue isn't with having a bound; it's with enforcing it silently. Some of the time you can pick a value for them, but not all of the time.

Contributor:

Hmm, started thinking about this, and, yeah, I guess we can leave the upper bound unenforced and instead just log something that the operator can choose to ignore.
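
A minimal sketch of that "warn, don't silently clamp" idea, using the property name from the diff earlier in the thread (the actual patch may read and validate these values differently):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: leave the upper bound unenforced, but surface odd values in the log
// instead of silently rewriting them. "cassandra." stands in for Config.PROPERTY_PREFIX;
// the default of 70 matches the diff shown above.
final class BlockForPeersProps
{
    private static final Logger logger = LoggerFactory.getLogger(BlockForPeersProps.class);
    private static final String PROPERTY_PREFIX = "cassandra.";
    private static final int DEFAULT_ALIVE_PERCENT = 70;

    static int alivePercent()
    {
        int alivePercent = Integer.getInteger(PROPERTY_PREFIX + "blockForPeers.percent", DEFAULT_ALIVE_PERCENT);
        if (alivePercent > 100)
            // per the discussion above: don't silently clamp, just make the unusual value visible
            logger.warn("{}blockForPeers.percent is set to {}, which exceeds 100", PROPERTY_PREFIX, alivePercent);
        return alivePercent;
    }
}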

logger.info("choosing to block until {}% of peers are marked alive; max time to wait = {} seconds", targetAlivePercent, aliveTimeoutSecs);

// first, send out a ping message to open up the non-gossip connections
AtomicInteger connectedCount = sendPingMessages(peers);
Contributor Author:

Should we also set up the large message connections? We can set up connections in parallel now with Netty, right (sweet!)? I think we should bring up all of them: gossip, large, small.

Contributor:

I thought about that as well. I can force a message to go out on the large message connection, but the REQUEST_RESPONSE will come back on the small message connection. Unless, of course, I send some empty byte array that exceeds the OutboundMessagingPool#LARGE_MESSAGE_THRESHOLD, which is currently 64k. Admittedly, I'm reluctant to do that. I could, however, create a variant of the Ping/Pong messages (or modify those) to select either the large or the small message connection.

I guess the concern I had was that many apps might not need the large message connection, and thus it becomes an unused, but still consumed, resource. Every instance will need the gossip and small message connections, but not every use case calls for the large connection. wdyt?

Contributor Author:

Do the connections consume resources if there are no queued messages? Are there any buffers allocated? The resources consumed should be minimal, and even then, if we can theoretically provision these resources, we will get more predictable performance and behavior by provisioning them all up front instead of lazily, where we find out "oh, we didn't have enough" at some arbitrary later time.

Maybe we are trying to hack this process too much by keeping the transport system unaware of what we are attempting; if we made it possible to say "hey, connect on all things and send this small message on each", it would still be simple. In fact, do we even need to send a message, or is just instructing the transport system to open them enough?

@jasobrown (Contributor), Feb 14, 2018:

> Do the connections consume resources if there are no queued messages?

Reviewing OutboundMessageConnection: no, they do not consume much in terms of resources. I'm fine either way with creating the large message connections eagerly or lazily, so I'll add that in.

> do we even need to send a message or is just instructing the transport system to open them enough?

Because connections are unidirectional, and we'll want each "pair" of outbound/inbound connections for each connection type (gossip, small message, large message) to be established, the only way to do that is to send a 'real' message that the peer can respond to. That said, perhaps I can work in something that lets a Ping/Pong message "select" which type of transport it wants to be sent on.
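
A rough sketch of the shape this could take (the names PingSender, ConnectionKind, and the callback below are hypothetical stand-ins, not the real MessagingService API): send one small ping per connection type to every peer so both directions of each connection get established, and count pongs as they arrive.

import java.net.InetSocketAddress;
import java.util.Collection;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of "one ping per connection type"; names are illustrative only.
final class ConnectionWarmup
{
    enum ConnectionKind { GOSSIP, SMALL_MESSAGE, LARGE_MESSAGE }

    interface PingSender
    {
        // sends a tiny ping on the given connection kind; invokes onPong when the reply arrives
        void ping(InetSocketAddress peer, ConnectionKind kind, Runnable onPong);
    }

    // returns a live counter the startup code can poll while it waits for enough peers
    static AtomicInteger sendPingMessages(Collection<InetSocketAddress> peers, PingSender sender)
    {
        AtomicInteger pongCount = new AtomicInteger();
        for (InetSocketAddress peer : peers)
            for (ConnectionKind kind : ConnectionKind.values())
                sender.ping(peer, kind, pongCount::incrementAndGet);
        return pongCount;
    }
}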

init all connection types;
allow messages to set ConnType in MsgOut;
add params to Config, not yaml
move blockFor logic out of MessagingService into its own class;
added utests
/**
* Allows sender to explicitly state which connection type the message should be sent on.
*/
public final ConnectionType connectionType;
Contributor Author:

A part of me wants to say don't make MessageOut bigger, but it's probably a drop in the bucket in terms of allocation rate for processing network messages.

while (checkStatus(peers, connectedCount, startNanos, expirationNanos < System.nanoTime(), completedRounds) == State.CONTINUE)
{
completedRounds++;
Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS);
Contributor Author:

I think we want to check the condition pretty aggressively so that startup in test harnesses is as fast as possible, since we do it a lot. Like check every millisecond.
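
For what it's worth, a tiny sketch of that tighter poll (the 1 ms interval and helper names are illustrative; checkStatus in the diff above plays the role of conditionMet here):

import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import com.google.common.util.concurrent.Uninterruptibles;

// Illustrative only: poll the exit condition every millisecond instead of once per second,
// bounded by an overall deadline, so test-harness startup isn't stuck waiting out full seconds.
final class TightPoll
{
    static boolean pollUntil(BooleanSupplier conditionMet, long timeoutNanos)
    {
        long deadline = System.nanoTime() + timeoutNanos;
        while (System.nanoTime() < deadline)
        {
            if (conditionMet.getAsBoolean())
                return true;
            Uninterruptibles.sleepUninterruptibly(1, TimeUnit.MILLISECONDS);
        }
        return conditionMet.getAsBoolean();
    }
}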

@jolynch (Contributor) commented Mar 27, 2018:

@aweisberg since this has been merged should we close this one out?

@aweisberg closed this Apr 2, 2018