Heterogenous silos support #2443

Merged
merged 21 commits on Dec 7, 2016

benjaminpetit
Member

This is a PR for #2379

Currently, the GrainInterfaceMap is built on each silo, and is sent to clients when they connect to a gateway.

Proposed changes

I tried to divide the PR into logical commits so that it is easier to review:

  • To test this functionality easily, I added a list of grain types that are ignored by the AssemblyLoader. (bbc118d and f793f2f).
  • Add a notion of “cluster wide” GrainInterfaceMap, that is a merge of all GrainInterfaceMaps built locally on each silo. (568cf00, 4bdc77a and 3aab754)
  • Expose this new global GrainInterfaceMap to the clients (3aab754)
  • Rebuild the global GrainInterfaceMap when new silos join the cluster (a259b31 and adf6ea1)
  • Modify PlacementDirector to only consider “compatible” silos. (83057de and 77e80d8)
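
As a rough illustration of the "cluster wide" merge described above, and not the code from this PR, the sketch below merges per-silo maps into a lookup from grain type code to the silos that can host it. `LocalTypeMap`, `ClusterTypeMap`, and the use of plain strings as silo addresses are hypothetical stand-ins for `GrainInterfaceMap` and `SiloAddress`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a per-silo GrainInterfaceMap:
// only the set of grain type codes that the silo can host.
public sealed class LocalTypeMap
{
    public LocalTypeMap(IEnumerable<int> typeCodes)
    {
        TypeCodes = new HashSet<int>(typeCodes);
    }

    public HashSet<int> TypeCodes { get; }
}

public static class ClusterTypeMap
{
    // Merges the per-silo maps into one lookup: type code -> silos supporting it.
    // Silo addresses are plain strings here (an assumption made for brevity).
    public static Dictionary<int, HashSet<string>> Merge(
        IReadOnlyDictionary<string, LocalTypeMap> mapsBySilo)
    {
        var supportedSilos = new Dictionary<int, HashSet<string>>();
        foreach (var entry in mapsBySilo)
        {
            foreach (var typeCode in entry.Value.TypeCodes)
            {
                if (!supportedSilos.TryGetValue(typeCode, out var silos))
                {
                    silos = new HashSet<string>();
                    supportedSilos[typeCode] = silos;
                }
                silos.Add(entry.Key);
            }
        }
        return supportedSilos;
    }
}
```

With a lookup of this shape, a placement director can restrict its candidates to the silos listed for the grain's type code.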

How and when the global GrainInterfaceMap is recalculated

To avoid race conditions and complicated error handling, we chose to refresh the GrainInterfaceMap using a timer: on every timer tick, if a silo has joined the cluster since the last computation, we get the local GrainInterfaceMap from the new silo and merge it into the global map. If this fails, it is retried on the next tick. This process runs on each silo in the cluster.
The drawback of this approach is that a silo that has joined the cluster will not be chosen for placement by other silos until this timer fires. I don’t think this is an issue, and the timer interval can be tweaked in configuration.
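
The tick logic described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `TypeMapRefresher` and the `fetchLocalMap` delegate are hypothetical, silo addresses are plain strings, and removal of dead silos is left out.

```csharp
using System;
using System.Collections.Generic;

public sealed class TypeMapRefresher
{
    // Hypothetical remote call that returns the type codes hosted by a silo.
    private readonly Func<string, HashSet<int>> fetchLocalMap;

    private readonly Dictionary<string, HashSet<int>> mapsBySilo =
        new Dictionary<string, HashSet<int>>();

    public TypeMapRefresher(Func<string, HashSet<int>> fetchLocalMap)
    {
        this.fetchLocalMap = fetchLocalMap;
    }

    public IReadOnlyDictionary<string, HashSet<int>> MapsBySilo => mapsBySilo;

    // Body of one timer tick: fetch maps only for silos not seen yet.
    public void OnTick(IEnumerable<string> activeSilos)
    {
        foreach (var silo in activeSilos)
        {
            if (mapsBySilo.ContainsKey(silo))
            {
                continue; // already merged on an earlier tick
            }

            try
            {
                mapsBySilo[silo] = fetchLocalMap(silo);
            }
            catch (Exception)
            {
                // Leave the silo out of the map; the next tick retries it.
            }
        }
    }
}
```

A host could drive `OnTick` from a periodic timer such as `System.Threading.Timer`, with the period taken from configuration. Until a tick succeeds for a new silo, that silo is simply absent from the map, which matches the placement delay described above.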

Current implementation limitations

  • When a new GrainInterfaceMap is computed, it is not pushed to clients. Clients will need to disconnect and reconnect to get a refreshed type map.
  • The “excluded grain types” should be used for tests only.

@sergeybykov sergeybykov added this to the 1.4.0 milestone Nov 23, 2016
/// <summary>
/// The number of seconds to refresh the cluster grain interface map
/// </summary>
public TimeSpan TypeMapRefreshTimeout { get; set; }
Member

The comment implies that this is a period (amount of time between refreshes), but the property name implies this is the amount of time before an exception will be thrown (max amount of time for a single refresh before failure).

Member Author

Agreed, I renamed it to TypeMapRefreshInterval


private readonly PlacementStrategy defaultPlacementStrategy;
internal IDictionary<SiloAddress, GrainInterfaceMap> GrainInterfaceMapsBySilo
Member

@ReubenBond ReubenBond Dec 6, 2016

This could instead be IReadOnlyDictionary<,> by returning the dictionary itself and avoiding the copy.

Member Author

Yes you are right, completely missed that

public IList<SiloAddress> GetCompatibleSiloList(GrainId grain)
{
var typeCode = grain.GetTypeCode();
var compatibleSilos = GrainTypeManager.GetSupportedSilos(typeCode).Intersect(AllActiveSilos).ToList();
Member

@ReubenBond ReubenBond Dec 6, 2016

This method has to be cheap, since it's on the warm/hot path. Currently, it is too allocation-happy for comfort.

Member Author

It is really more of a warm path than a hot path (this code is executed for new placements only). I agree that we could optimize this method with some caching mechanism, but it is not trivial to implement (with a bug here, we could permanently miss a new silo that joined, or try to place activations on a silo that has been dead for a long time, etc.).

So yes, it can be optimized, but that will add complexity. I am not sure it is worth doing, especially in this PR.

I can add a comment in this method and open an issue to track this if you want?

Member

@ReubenBond ReubenBond Dec 6, 2016

I see, that makes sense. We can consider looking into it afterwards, then
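
For context on the allocation concern in this thread, one modest way to trim allocations without introducing a cache is sketched below. It is an assumption-laden illustration, not a change made in this PR: `CompatibleSilos.Filter` is hypothetical, silo addresses are plain strings, the active silos are assumed to already be available as a `HashSet`, and the supported list is assumed to contain no duplicates (unlike `Intersect`, this loop does not deduplicate).

```csharp
using System.Collections.Generic;

public static class CompatibleSilos
{
    // Keeps the supported silos that are currently active, preserving their order.
    // Allocates only the result list; no LINQ iterators or temporary sets.
    public static List<string> Filter(
        IReadOnlyList<string> supportedSilos,
        HashSet<string> activeSilos)
    {
        var result = new List<string>(supportedSilos.Count);
        for (var i = 0; i < supportedSilos.Count; i++)
        {
            var silo = supportedSilos[i];
            if (activeSilos.Contains(silo))
            {
                result.Add(silo);
            }
        }
        return result;
    }
}
```

This avoids the caching risks raised above (stale or missing silos) because it still reads the current membership on every call.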

@@ -13,18 +15,26 @@ namespace Orleans.Runtime
internal class GrainTypeManager
{
private IDictionary<string, GrainTypeData> grainTypes;
private IReadOnlyDictionary<SiloAddress, GrainInterfaceMap> grainInterfaceMapsBySilo;
Member

@ReubenBond ReubenBond Dec 6, 2016

What I meant in the other comment was to use Dictionary<,> here, since it's cheaper to enumerate:

    Method |       Mean |    StdDev |  Gen 0 | Allocated |
---------- |----------- |---------- |------- |---------- |
  Concrete | 27.2424 ns | 0.8356 ns |      - |       0 B |
 Interface | 51.0121 ns | 1.1681 ns | 0.0125 |      56 B |

EDIT: I don't think I ever sent that other comment - whoops
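
The gap in the table above is consistent with how `foreach` binds to enumerators in C#. The sketch below is only an illustration with hypothetical names (`EnumerationDemo` and its two methods): enumerating a concrete `Dictionary<,>` uses its struct enumerator, while enumerating through an interface goes through `IEnumerator<T>`, which typically boxes that struct on the heap.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerationDemo
{
    // foreach binds to Dictionary<,>.GetEnumerator(), which returns a struct,
    // so no heap allocation is needed for the enumerator.
    public static int SumConcrete(Dictionary<string, int> map)
    {
        var sum = 0;
        foreach (var pair in map)
        {
            sum += pair.Value;
        }
        return sum;
    }

    // foreach binds to IEnumerable<KeyValuePair<,>>.GetEnumerator(), which returns
    // an interface reference; the struct enumerator is typically boxed to satisfy it.
    public static int SumViaInterface(IReadOnlyDictionary<string, int> map)
    {
        var sum = 0;
        foreach (var pair in map)
        {
            sum += pair.Value;
        }
        return sum;
    }
}
```

Both methods compute the same result; only the enumerator type that `foreach` resolves to differs.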

Member Author

:)

@@ -70,6 +70,16 @@ public static string GetGrainTypeName(this IPlacementContext @this, int typeCode
return grainClass;
}

public static string GetGrainTypeNameAndSupportedSilos(this IPlacementContext @this, GrainId grainId, out IList<SiloAddress> supportedSiloAddresses, string genericArguments = null)
Member

Is this method used? I couldn't see any usages

Member Author

Relics of a previous rebase, removed

@sergeybykov
Contributor

@ReubenBond, please 'Squash and merge' when all your feedback is taken care of.

@@ -173,6 +209,14 @@ internal bool ContainsGrainInterface(int interfaceId)
}
}

internal bool ContainsGrainImplementation(int typeCode)
Member

No usages of this?

@galvesribeiro
Member

Awesome @benjaminpetit great work! 👍

@ReubenBond ReubenBond merged commit 3867d5a into dotnet:master Dec 7, 2016
@benjaminpetit benjaminpetit deleted the heterogeneous-rebase branch March 15, 2018 16:53
@github-actions github-actions bot locked and limited conversation to collaborators Dec 11, 2023