Service Fabric cluster membership providers #2542
Conversation
Force-pushed from 8abd85d to 2b33fa2
Force-pushed from 52d5f47 to 10d5140
Force-pushed from f8e8f5d to 1d32075
Note that the first commit (56cf4e2) has a separate PR opened for it: #2557, at the request of @galvesribeiro. The commit which this PR adds has only one change in non-ServiceFabric projects, just to specify the base for SF error codes.
Force-pushed from 1d32075 to 3f61812
Force-pushed from 3f61812 to ae3fe81
Similar to what I asked about Docker in #2569 (comment): is there an implied equivalency here of "service instance/partition is up" and "silo is running"? Can't we get into a situation where a silo is unresponsive, e.g. its messaging queues or threads are blocked, but SF will consider the instance/partition healthy and keep it in the list of silos in the cluster?
```csharp
/// </summary>
/// <param name="configuration">The Orleans cluster configuration.</param>
/// <returns>A <see cref="ServiceInstanceListener"/> which manages an Orleans silo.</returns>
public static ServiceReplicaListener CreateStateful(ClusterConfiguration configuration)
```
> `CreateStateful`
This doesn't appear to be used anywhere. Did you intend to add another sample for it? StatefulCalculator?
I do intend to add such a sample, but the intention was to add one when we have deeper integration for stateful services. The code for stateless and stateful services is identical except for the type returned - so I'm comfortable including both without having samples for both.
If you'd rather, I can include a 'stateful' calculator service now, too - but the stateful part would be a misnomer, since the Orleans part couldn't reasonably store state within SF until we specifically support that.
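For reference, a stateful service would register this listener roughly as follows. This is a sketch only: the class name `StatefulCalculatorService` and the containing type `OrleansServiceListener` are assumptions based on the surrounding discussion, not confirmed by this PR.

```csharp
using System.Collections.Generic;
using System.Fabric;
using Microsoft.ServiceFabric.Services.Communication.Runtime;
using Microsoft.ServiceFabric.Services.Runtime;
using Orleans.Runtime.Configuration;

// Hypothetical stateful service which hosts an Orleans silo via the listener
// discussed above. Names here are illustrative, not from this PR.
internal sealed class StatefulCalculatorService : StatefulService
{
    private readonly ClusterConfiguration configuration;

    public StatefulCalculatorService(StatefulServiceContext context, ClusterConfiguration configuration)
        : base(context)
    {
        this.configuration = configuration;
    }

    protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners()
    {
        // CreateStateful returns a ServiceReplicaListener, the stateful
        // counterpart of the ServiceInstanceListener used by CreateStateless.
        yield return OrleansServiceListener.CreateStateful(this.configuration);
    }
}
```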
```csharp
/// <param name="silos">The updated set of partitions.</param>
public void OnUpdate(FabricSiloInfo[] silos)
{
    this.gateways = silos.Select(silo => silo.GatewayAddress.ToGatewayUri()).ToList();
```
> `silos.Select(silo => silo.GatewayAddress.ToGatewayUri()).ToList();`
Doesn't this assume that every silo is a gateway? Or does it rely on cluster config being the same for all nodes?
Yes it does, thanks for pointing that out. I've replaced this line with this:

```csharp
var newGateways = new List<Uri>(silos.Length);
foreach (var silo in silos)
{
    if (!silo.IsGateway) continue;
    newGateways.Add(silo.GatewayAddress.ToGatewayUri());
}

this.gateways = newGateways;
```
@sergeybykov you're right. I'm considering the best approach for creating a more global view of an individual silo's health. The system used by the default oracle is quite good, but it's not easily transferable. How about if I divorce the ping functionality from the inbuilt membership oracle? The ping method is never actually hit, since we short-circuit it in the message receiver.
Before we rush to a solution, I think we need to step back and try to think through this challenge holistically, as it applies to any 'outsourced' membership implementation, e.g. Docker. I'm curious to hear @gabikliot's opinion on this.

Today, in the membership oracle those two concerns are intertwined, and for a (seemingly) good reason - we are able to detect network partitions and asymmetric disconnects in a cluster. Distribution of the remote monitoring (pinging) responsibility is integrated with updates to cluster state, so that we always have enough monitoring coverage. That part is conceptually easy to decouple from membership management. Although that would mean that we can't fully outsource failure detection, and would instead have to morph it into health monitoring. But still keep it fully functional as failure detection for the default case.

Maybe we don't have a choice, but it feels clumsy to have to run an 'overlay' failure detection protocol on top of another cluster membership protocol like SF or Docker. The problem, I think, stems from the fact that we cannot make a decision of whether or not a node is healthy locally. Nor can we have a simple monitoring service that would ping every node. We have to have a sufficient mesh of cross-pinging to ensure we detect network partitions and partial connectivity cases, because we require 100% bidirectional p2p traffic.

I don't know if I'm overcomplicating this, but these are my initial thoughts. We could, of course, do nothing, and just document that on those cluster management systems certain classes of failures wouldn't be detected. But that doesn't strike me as a good long term strategy.
```csharp
/// <summary>
/// Provides information about an Orleans cluster hosted on Service Fabric.
/// </summary>
internal class FabricServiceSiloResolver : IFabricServiceSiloResolver
```
> `FabricServiceSiloResolver`
Nit: this name confused me initially. But once I figured out what it is responsible for, I couldn't come up with a better name. Maybe FabricServiceMonitor or FabricClusterMonitor? But I'm not convinced they are better.
Good points, I suppose these are the options:
I like the last one for the long-term, but it does come with a complication: the current implementation stores health info in the membership table.
Is it possible to provide health events for nodes other than yourself?
Ok, about pings, health, etc. Here is what I think:

That way you clearly distinguish distributed detection and networking failures from the in-process health. A silo locally is ALWAYS better at determining its own health than someone remotely determining its health. Its decision will always be more accurate, since it does not need to deal with delays in those decisions, lost msgs, ... And all the distributed decision is about reachability, networking and agreeing on the membership, and is fully outsourced to the MBR implementation. I think having both SF MBR AND our pings is not a good idea, for multiple reasons. Does that make sense? It's basically the architecture that I saw multiple times in the past: sort of a hierarchical decision.
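If I follow the "silo throws from ping responder" idea mentioned later in this thread, a silo's ping responder could consult local health and throw, so whichever distributed mechanism probes it observes the failure. A minimal sketch; every name here is hypothetical, not an API from this PR:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of a ping responder that folds local health into the
// distributed liveness decision: a silo that knows it is unhealthy refuses
// the ping, and the remote prober treats the failed ping as a death vote.
internal sealed class PingResponder
{
    private readonly Func<bool> isLocallyHealthy;

    public PingResponder(Func<bool> isLocallyHealthy)
    {
        this.isLocallyHealthy = isLocallyHealthy;
    }

    public Task Ping()
    {
        if (!this.isLocallyHealthy())
        {
            // Throwing here makes the remote prober observe a failed ping,
            // without the prober needing to understand local health details.
            throw new InvalidOperationException("Silo failed its local health checks.");
        }

        return Task.CompletedTask;
    }
}
```

The point of the hierarchy is that the local check supplies the data, while a single distributed protocol still makes the membership decision.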
@ReubenBond, the "SiloControl system target has a Ping() method already, so that could be used for pings" suggestion was not meant for MBR. It was meant for manual/admin-like tools (Orleans manager and the like) to try to check if a silo is alive. But not for an automatic MBR.
Thanks for the input, @gabikliot.
The true test of health is whether or not the silo functions correctly when accessed remotely, so that's what needs to be tested. Almost every serious production system will have off-box 'synthetic transactions', ranging from a simple ping to a test of more complex functionality. There are too many issues that can't be detected locally. So having some way to detect which silos are functional or not is beneficial. SF won't check that our service functions correctly, just that it's running and that the node is accessible.

My point is not to separate pings out; it's about making Orleans' fault detector available to components other than the membership provider. Maybe I'm misunderstanding what you're saying.
This is a controversial statement to me. For example, in a network partition case a node can be unaware that it's unreachable to others. Yes, cluster management systems are supposed to detect a complete loss of network connectivity. My concern, though, is that such detection does not work up the stack - when failures happen at the messaging layer. We had a couple of bugs in the past where we ran out of buffers, and hence were failing to send or receive messages. A cluster management system wouldn't detect that. Local health check? Maybe, maybe not.

That's why, like @ReubenBond already mentioned, serious services set up external canary test solutions that periodically perform end-to-end transactions to determine whether the system is up from the external user's perspective, even if it appears 100% healthy from the inside.

I think it is possible to refactor the built-in p2p monitoring system in such a way that it continues to feed death votes into the existing membership oracle as is, but could also provide such votes to an external cluster management system. My concern here is the risk of such a refactoring. It seems like it can be treated as a P2 for the SF case, because at least it performs p2p node failure detection. I'm less sure about Docker. If it merely relies on process exit to detect a container failure, then we have a problem.
We can expose a health check API that the membership implementation will invoke, in addition to doing its own membership. That is fine. If we use someone else's membership, let them do all the pings and, if they support it, let them call our health check. That way they fully decide when to declare a silo dead and reach global agreement on that. In our current MBR Oracle we can add, in addition to pings, those health checks, and then we, as part of our MBR Oracle, will decide how we combine the pings plus health checks into one coherent decision. I gave one option above (silo throws from ping responder), but that is just an example. What is important is that ONE protocol makes the decision, potentially based on multiple sources of data (ping + health check).
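The health check API being described might look something like this. This is an interface sketch under my own naming, not code from this PR:

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical contract: whichever membership implementation is in use
// (Service Fabric, Docker, or the built-in oracle) periodically invokes this
// check and combines the result with its own ping/reachability data into a
// single coherent liveness decision.
public interface ISiloHealthCheck
{
    /// <summary>
    /// Returns true if the silo considers itself healthy (message queues
    /// draining, threads responsive, etc.); false otherwise.
    /// </summary>
    Task<bool> CheckHealthAsync(CancellationToken cancellationToken);
}
```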
I should have been clearer yesterday, @gabikliot - I was almost asleep when I realized. What I'm advocating is not 2 systems making decisions on cluster membership, but more like this:
So in the most degenerate case, the entire cluster is partitioned in approximately half at the application level (e.g., firewall config on half the nodes is messed up), and the smaller partition is killed by the other every 5 minutes. Does that sound sound?
Does it mean that in this case we don't use SF for health pinging, but just for rendezvous (finding the total set of potential nodes)? In such a case we wouldn't really use SF liveness, which I thought was a goal here. Similarly, we could use the standard Azure API to find the total set. But we don't do that either. We could. Both ways could work, but one needs to decide why this way, and make sure overall it works correctly. Depends on your goals. Lots of choices. Maybe you don't want to use SF liveness at all, if that doesn't bring you value.
SF's health pinging is its own business, and it covers probably 95+% of faults, but it can never tell us if our system is actually live. I'm saying:
i.e., we don't have two systems deciding on membership - just SF. We do, however, have our own system for making decisions about silo health, and we act on those decisions in a way which happens to affect membership. If I'm still not making myself clear, then perhaps this is best discussed via a call.
In this proposal, are we sending remote health checks (health pings) from one silo to another?
@gabikliot yes, the silos would monitor each other. We could alternatively have an entirely different SF service which performed the job of monitoring, but bundling them makes deployment easier on end-users. |
Ok, so that's what I am recommending NOT to do: two distributed monitoring systems, SF doing its own and us doing ours. I recommend using only one. I understand your proposal of layering them on top of each other. It CAN work; the question is if this is a good/best design. My intuition is not. Notice, we could have done a similar two-level membership with regular Azure, taking total membership from Azure and doing pings within it. It would have worked. But we decided it adds extra complication without benefit. So what you are suggesting CAN work, but the question is if that is the best way to build it. You can try, think through all the edge cases, write a pseudo protocol, and if you convince yourself it can work, then... no problem, go ahead.
These 2 distributed monitoring systems cover different things:
SF cannot ensure our liveness, and we cannot disable SF's internal monitoring (not that we would want to). Again, practically every real distributed system has application-level monitoring, including most systems built on Service Fabric. We did this with our services within the Azure Active Directory team at MS - we had synthetic transactions which would exercise various functionality and fire alerts. Another system would monitor those alerts and perform automatic recovery actions (e.g., rebooting the host).

Edit: Sergey opened a proposal in #2580
OK, you can build it this way as well. As I wrote, I would recommend having one system do both: either us doing liveness + health like now, OR SF doing liveness and us reporting health to SF (but NOT doing our own distributed health monitoring).
Congrats @ReubenBond and @sergeybykov!

@gabikliot Using some of the code you wrote back in 2012. :-)

I am glad it was useful, after all. It's always nice to know your work was not a throwaway, even if it took 4 years to come around :-) Hopefully users will find it useful.

Definitely it will, @gabikliot :)
Service Fabric allows you to report health about other services to its API. It's not 100% clear what path you guys decided on, but if one silo detects failure in another, you can report that failure to Service Fabric via its health reporting APIs. They give an example of a watchdog here.
I'm aware, but currently we're not performing application-level health checks when Service Fabric membership is being used. We created an issue to track that separately: #2580
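For anyone following along, the health-report path mentioned above uses the `System.Fabric.Health` API and looks roughly like this. A sketch only: the service URI and the source/property strings are illustrative.

```csharp
using System;
using System.Fabric;
using System.Fabric.Health;

// Sketch of a watchdog reporting another service's failure to Service
// Fabric's health store. The service name and identifiers are made up.
var fabricClient = new FabricClient();

var healthInformation = new HealthInformation(
    sourceId: "OrleansSiloWatchdog",
    property: "ApplicationLevelPing",
    healthState: HealthState.Error)
{
    Description = "Silo failed an application-level health ping.",
    TimeToLive = TimeSpan.FromMinutes(5),
    RemoveWhenExpired = true // clears the report if no fresh one arrives
};

fabricClient.HealthManager.ReportHealth(
    new ServiceHealthReport(new Uri("fabric:/MyApp/MySiloService"), healthInformation));
```

Because the report expires, a healthy silo that stops being reported against returns to an OK state without any explicit "clear" call.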
This is a step towards Deep Service Fabric Integration (#1059).

This adds support for using Service Fabric itself for cluster membership. Unlike existing membership providers (Azure Tables, Consul, etc), this implements `IMembershipOracle`, not `IMembershipTable`. There is also an implementation of `IMembershipTable` which you can see in this PR (in `ServiceFabricPropertyManagerMembershipProvider`, which will be deleted).

I've tested that this recovers from faults in manual testing by killing random hosts and removing and redeploying the service. There are also some automated tests to verify the behavior of the `FabricMembershipOracle`.