Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log Consistency Providers #1854

Merged
merged 15 commits into from
Jan 14, 2017

Conversation

sebastianburckhardt
Copy link
Contributor

An alternative provider API for persistent grains whose state is defined as a view of a log. This will be used for supporting event sourcing, and geo-replicated grains.

@sebastianburckhardt sebastianburckhardt added this to the Multi-Cluster milestone Jun 28, 2016
@sergeybykov sergeybykov modified the milestones: 1.4.0, Multi-Cluster Sep 23, 2016
sebastianburckhardt added a commit to sebastianburckhardt/orleans that referenced this pull request Sep 28, 2016
@@ -8,7 +9,7 @@ namespace Orleans.GrainDirectory
/// Strategy objects are used as keys to select the proper registrar.
/// </summary>
[Serializable]
internal abstract class MultiClusterRegistrationStrategy
public abstract class MultiClusterRegistrationStrategy
Copy link
Contributor

@xiazen xiazen Oct 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we have a public interface IMultiClusterRegistrationStrategy so that we don't need to make this internal class public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked into this a bit and I think this should be fixed by making the IProtocolServices more abstract. I will work on this.

{
public LogViewProviderAttribute()
{
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't you set up the ProviderName here ? otherwise I'm confused why you want to inherit from PersistenceProviderAttribute

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They inherit from the same base attribute because they are alternative ways to specify a persistence mechanism for the grain. For log view grains, users can either specify a storage provider or a log view provider. Those are different kinds of persistence providers, but they are somewhat interchangeable (a standard storage provider can be used for log view grains). See the code inside Catalog.SetupPersistenceProvider - it finds out which persistence attribute is specified, and retrieves the appropriate type of provider.

There is no default log view provider (because the default is to use the default storage provider), therefore I am not setting the string there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would probably be much beneficial to extend some of these explanations straight to the code for the benefit of coder readers in later times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense.

/// <typeparam name="TView">The type for the log view, i.e. state of this grain.</typeparam>
/// <typeparam name="TLogEntry">The type for log entries.</typeparam>
/// </summary>
public abstract class LogViewGrain<TView, TLogEntry> :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference or usage between LogViewGrain and LogViewGrainBase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are additional alternatives to the LogViewGrain class (QueuedGrain, JournaledGrain, not in this PR) that inherit from LogViewGrainBase but hide the log-view API underneath a terminology that is more specific to the intended situation.

Such classes can be defined by the user, too - it gives them a chance to control how exactly the storage API should look like. If they inherit from LogViewGrain, they cannot hide the logview api.

public Task Close()
{
return TaskDone.Done;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this method intend to do? based on its implementation, it is not related to its context CustomStorageProvider

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All providers have this method, it's in the IProvider interface - I think it's used to clean up (e.g. close database connections) when the silo shuts down. This provider does not have any cleanup it needs to do, so it just returns.

public Task Close()
{
return TaskDone.Done;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

#endregion
}

public struct AB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does A B stands for ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a struct containing two fields A and B. These fields don't really have any meaning, they are used by a unit test. The struct is there so a grain method can return both A and B at the same time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would nice to have in the comments this explanation. I bet plenty of people reading the code afterwards will have this same question. In other words, it would lighten the cognitive load to have it in writing there in the code. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@xiazen
Copy link
Contributor

xiazen commented Oct 10, 2016

submited a PR to your branch to clean code styles :). Was assigned by @jdom to do a first pass on code style cleaning


public Logger Log { get; private set; }

private static int counter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this count?

In general (maybe someone has an idea), I wonder if this would be problematic when running multi-silo cluster on one computer (say, a dev-box in reliable setup) or if it would work together with the DI implementation currently coming along too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used to construct a unique id for the provider. Matters only for tracing. Therefore, it is actually safe even if you have multiple silos on one computer in the same app domain (but I think a LOT of other stuff would break - static fields seem to be used all over the place).

/// Bits are toggled when writing, so that the retry logic can avoid appending an entry twice
/// when retrying a failed append.
/// </summary>
public string WriteVector { get; set; }
Copy link
Contributor

@veikkoeeva veikkoeeva Nov 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth considering: BitVector32 or BitArray? Otherwise a note for later refactoring, it looks like this is a candinate to turn into class of its own (and have easily separable tests for this part).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically, this is not actually a bitvector, but a map from strings to a bit. I will change the comment to clarify.

In an earlier version we used Dictionary<string,bool> here. But I changed it to use a special representation as a simple string to reduce serialization/deserialization overhead, which is somewhat critical here. This is also why it is not a separate class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind sacrificing maintainability for performance when necessary, but I don't yet see how it's necessary here. Can you help me understand why serialization speed is critical enough to justify this pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this metadata gets added to grain state of every grain, and is read/written from/to storage every time the grain state is read or written. If the grain state is already a large complicated object, perhaps it does not matter so much. But for grains with simple small state, serializing and deserializing an entire dictionary or hashset every time is a lot of unnecessary overhead and increases memory pressure. Also, it means you can't easily look at the content of that field in storage.

Copy link
Contributor

@jason-bragg jason-bragg Nov 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. Thanks for the clarification.
If I understand you're goals, you want to avoid the cost in cpu and space of serializing a dictionary, and you want the contents to be in a human readable format when written to storage. Is this correct?
If so, I suggest two properties. One for programmatic use, one for storage. Something like:

[NonSerialized]
public HashSet<string> Flag { get;set;}

public string FlagsString
{
    get
    {
        return Join(Flags, ",");
    }
    set
    {
        Flags = new HashSet<string>(Split(value, ","));
    }
}

The code uses the Flags fields, but when serialized the FlagsString property is used, converting Flags to/from a user friendly string.

Copy link
Contributor

@jason-bragg jason-bragg Nov 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An additional consideration for this type of string manipulation, in both existing PR and my suggestion, is the case where the delimiter is used in the string. If it is possible (which I suspect that it is) for the serviceId or the notification origin to contain the delimiter, the logic will break. If we are going to use a delimiter we need to add logic to address this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is worth constructing a separate HashSet - reading the original string directly is faster and does not allocate any memory. Note that this field is accessed only once per time it is serialized, so the speed of HashSet does not buy us anything.

If you are seriously concerned about code maintainability, I can encapsulate the operations in a static class and add a unit test.

As for commas: yes, I thought of that. ClusterIds must not contain commas. The code in GlobalConfiguration.cs enforces this when parsing configurations.

I noticed the documentation does not state this clearly, I will change that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. So origin is also a cluster id. KK. Still a whole (programmatic config), but it's very small. Probably safe to ignore.

As for hashset, the hash set was not for speed of access as much as ability to check containment. You'd mentioned using a dictionary, but if containment check is all that is needed, hashset is simpler. I don't understand the single access per serialize comment as it's accessed at least twice in the write and once per notification.

I'm not going to push on this too hard, but in general, yes, code maintainability is very important, and writing custom logic to encode multiple fields into a string is not preferable to using a container. Having said that, the complexity is mostly isolated behind the Toggle and Contains calls, so potential harm is limited.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify the perf argument... The two things that happen frequently: are
a) toggle bit then, store and send
b) receive in notification

And here is the corresponding work required for each implementation choice:

  • if using HashSet:
    a) HashSet.lookup, HashSet.update, convert to string
    b) construct HashSet from string
  • if using only string:
    a) indexof, update string
    b) nothing

I think the latter is quite a bit faster... string operations are close to the metal, and the strings are not large in this case. HashSet puts more pressure on GC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tangential, but https://youtu.be/sWgDk-o-6ZE?t=1178. In other words, strings likely are closer to metal than HashSets, but I wouldn't say they are close. Which is the reason I wondered about the layout and if it could be separated cleanly (and later maybe refactored by someone). In the linked snippet (guru) Sean explains how O(n*lg(n)) algorithm are likely more impated by layout than operations. Then at https://youtu.be/sWgDk-o-6ZE?t=1716 there's an example. The deeper point is that in larger piece of software keeping data in cache is important for performance performance (throughput and latency). From personal experience I believe this to be true, but maybe I just wanted to share this video as it's so good. :)

sendworkers = new Dictionary<string, NotificationWorker>();
this.maxnotificationbatchsize = maxnotificationbatchsize;

foreach (var x in configuration.Clusters)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A matter of style this brace thing. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will add them.

}

/// <summary>
/// last observed exception, or null if last notification attempts were successful for all clusters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Rationale or pointers to somewhere feels like would be nice here. Especially for those who come later and if they need to know this when working on other storage backends.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that we are saving the last exception here (for users to look at if they want) would need to be documented at the level of the grain API (QueuedGrain, LogViewGrain, JournaledGrain).

However, I am actually going to change this after discussion with @creyke: rather than saving the exception for inspection, I will add a virtual method that users can override to get notified when the exception happens. They can then decide to save it themselves if they want, or do anything else (custom grain health monitoring stuff, from example).

}


public enum NotificationQueueState : byte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like would ask for justification to changing the storage type. Layout?

We had a longish time ago discussion in other PR where some evidence indicated changing from int to byte was worse for performance (maybe @jason-bragg remembers better the exact issue).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naively I am assuming byte is better because it uses less storage. If anyone has any evidence to the contrary, let me know.

I did not actually know about using byte for enums until @jdom asked me to add it to one of the enums in one of my other PRs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not recall any perf issues with using byte enums, though it would not surprise me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concern was at #400 (comment). Byte enums look like causing boxing whereas int enums won't. I might misread, so better read. :) It does affect the storage layout though, if that matters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will remove it. It is not important in any way.

Thanks for the link, it's good be reminded that C# sometimes does boxing even if there is no good reason for it.

need_initial_read = true;
// kick off notification for initial read cycle with a bit of delay
// so that we don't do this several times if user does strong sync
await Task.Delay(10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a matter of preference and especially now that this is an "arbitrary technical delay", but using explicit TimeSpan would look more robust.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

// kick off notification for initial read cycle with a bit of delay
// so that we don't do this several times if user does strong sync
await Task.Delay(10);
Services.Verbose2("Notify (initial read)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this indicate rather this is nameof(KickOffInitialRead) just about to start an initial read worker.Notify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

public void SetALocal(string grainclass, int i, int a)
{
var grainRef = GrainClient.GrainFactory.GetGrain<ISimpleLogViewGrain>(i, grainclass);
grainRef.SetALocal(a).Wait();
Copy link
Contributor

@veikkoeeva veikkoeeva Nov 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.GetAwaiter().GetResult() would be better and set a good tone in the code, see at aspnet/Security#59. See also in other functions. Could these functions be asynchronous?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense.

The functions are synchronous because they are called from a different AppDomain (the class derives from MarshalByRefObject). Don't know if there is another way to do it.

}
// check local stability
for (int i = 0; i < numclusters; i++)
AssertEqual(333, Client[i].GetALocal(grainClass, x), grainidentity);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coud a const int Xyz = 333 introduced here to give a name to the constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, why not.

t.Add(Task.Run(() =>
{
while (Client[c].GetALocal(grainClass, x) != 1)
System.Threading.Thread.Sleep(100);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this too could be async?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clients are in a different AppDomain so they are called synchronously.

But I suppose the sleep can be replaced with Task.Delay. I will do that.

};

Func<int, Task> checker7 = (int variation) => Task.Run(() =>
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this too could be async and use await Task.Delay?

Copy link
Contributor

@veikkoeeva veikkoeeva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idle commenting while kids eat evening snack.


Services.Verbose("write apparently failed {0}", nextglobalstate);

while (true) // be stubborn until we can read what is there
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest using AsyncExecutorWithRetries rather than writing custom backoff logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think AsyncExecutorWithRetries is general enough... there are nested loops here, the control flow is very subtle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored the retry logic - still not using AsyncExecutor, but the code is now more readable because the retry delay is tracked in a separate class (ConnectionIssue).

Task<Task> waitForTaskTask = null;
Task waitForTask = null;

lock (this)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid lock 'this', 'this' is public so you can't protect it from other locks. No good way to be confident of avoiding deadlocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I don't usually do this. But here, the lock is actually exposed on purpose because code that is enqueueing work uses it.

Still, I don't like it much myself. It's probably a better idea for code maintenance to use a protected member, so only subclasses get access to the lock. I will do that.

…specific and not always accurate (depending on the consistency provider)
Copy link
Contributor

@jason-bragg jason-bragg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on naming and organization, but in general, it looks good.
Still need to walk through some details a little more, so will do another pass, but wanted to get these comments out for consideration.

public interface ILogViewAdaptorFactory
{
/// <summary> Returns true if a storage provider is required for constructing adaptors. </summary>
bool UsesStorageProvider { get; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently true in all cases. Can you elaborate on when it wouldn't be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Orleans.EventSourcing.CustomStorage.LogConsistencyProvider returns false, because it does not use a storage provider for interfacing with storage (rather it uses user-supplied read and update functions).

/// Functionality for use by log view adaptors that use custom consistency or replication protocols.
/// Abstracts communication between replicas of the log-consistent grain in different clusters.
/// </summary>
public interface IProtocolServices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this interface is scoped by the namespace, but I suspect the naming is a bit too generic. Maybe IMultiClusterConsistencyService? IConsistencyProtocol? ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. I will go with ILogConsistencyProtocolServices, since it makes it very clear that this is related to the LogConsistencyProvider and the LogConsistentGrain classes.

/// <summary>
/// Grain interface for grains that participate in multi-cluster-protocols.
/// </summary>
public interface IProtocolParticipant : IGrain
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As IProtocalService, name is a bit generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.


namespace Orleans.EventSourcing.Common
{
public static class StringEncodedWriteVector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's quite possible that someone who implements a LogConsistencyProvider in a different dll might want to reuse some of the classes inside OrleansEventSourcing/Common, including this one.

… behaves deterministically instead of starting after timed delay
… remote instances even though there is just a single global instance

public async Task<IProtocolMessage> SendMessage(IProtocolMessage payload, string clusterId)
{
var silo = Silo.CurrentSilo;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the silo be passed through the constructor instead? Similarly for InsideRuntimeClient.Current below. If not, that's fine, I'll submit a PR for it in the coming weeks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started refactoring, but this quickly caused cascading changes that are likely redundant with what you are doing in the other PR, so I will leave it the way it is for now.

/// <param name="mcConfig">The multi-cluster configuration</param>
/// <param name="myClusterId">The cluster id of this cluster</param>
/// <returns></returns>
IEnumerable<string> GetRemoteInstances(MultiClusterConfiguration mcConfig, string myClusterId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find any usages of this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intellisense glitch? At first I did not see any either, but now it reports 6 references. For example, in OrleansEventSourcing\Common\PrimaryBasedLogViewAdaptor.

///<para>Waits until all previously submitted entries appear in the confirmed prefix of the log, and forces a refresh of the confirmed prefix.</para>
/// </summary>
/// <returns></returns>
Task SynchronizeNowAsync();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Now' is superfluous here, right?

Given TryAppend isn't suffixed with Async, should this and the above have the same treatment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is somewhat superfluous ...I included "Now" to clarify that the only reason to call this function is if we want to wait for synchronization NOW - normally, we don't call it and it still happens in the background.

I was worried that someone who is not reading the documentation may see this method in Intellisense and think they have to call it after every change (as they used to with WriteAsync). I think I will add some more description to the XML comment.

BTW, I suppose that according to the concluding remark in #2324 by @sergeybykov, I should remove all the async suffixes from all new method names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me - I'm fine with SynchonizeNow and removing Async suffix from the methods.



/// <summary>
/// The id of this cluster. Returns "I" if no multi-cluster network is present.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not null? I would expect this value to equal Silo.ClusterId, which returns null when there is no multi-cluster configured. Similarly, I would expect ActiveClusters to return null or an empty collection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more convenient and consistent for the log-consistency providers if they don't each have to do a null check every time they look at this, and then handle it in some way, but instead all use the same fixed "default" multi-cluster configuration - and this default configuration works exactly as if running on a one-cluster multicluster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries 👍 do you think Silo.ClusterId could be made to return this same default value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that would make sense. But I guess we would need some other way to check whether there is a MultiClusterOracle present or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the code, Silo.ClusterId now always returns a non-null string (the pseudo-cluster-id if there is no multi-cluster network). To check whether there is a multi-cluster network, there is now a boolean Silo.HasMultiClusterNetwork.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need both ClusterId and DeploymentId as different values? Maybe we should consolidate? DeploymentId is really just a logical name for the cluster, it's just that it typically maps to the cloud service's deployment id when deploying to it, but it's an awful name when used in a different environment, as it doesn't map to its logical intended purpose

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need both ClusterId and DeploymentId as different values? Maybe we should consolidate? DeploymentId is really just a logical name for the cluster, it's just that it typically maps to the cloud service's deployment id when deploying to it, but it's an awful name when used in a different environment, as it doesn't map to its logical intended purpose

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think DeploymentId has an Azure-specific meaning. So, I don't think they are the same. ClusterId is a logical name for the cluster (e.g., "uswest", "europe", "front1", "front2"). DeploymentId is something like "a74ff75b9bd6424ebbff3bde9643074b" and changes everytime I redeploy the service.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly why I say it's an awful name, as it implies the wrong meaning. DeploymentId is the cluster name. It's just that when you deploy to azure cloud services (not even other flavors of azure), it picks its DeploymentId as the cluster name by default (and only if unset). Nevertheless when deploying in a different way (on premises, other clouds, other azure services, yams, etc) you must set the DeploymentId to a logical name explicitly, because it is in fact the cluster id.

{

// we use fake in-memory state as the storage
MyGrainState state;
Copy link
Member

@ReubenBond ReubenBond Dec 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the norm in this codebase is to use explicit access modifiers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok,

/// The protocol gateway is a relay that forwards incoming protocol messages from other clusters
/// to the appropriate grain in this cluster.
/// </summary>
internal interface IProtocolGateway : ISystemTarget
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if restricting this to a SystemTarget puts a bound on throughput.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it probably does limit throughput, at least a little. I don't think it's too bad right now, it did not show up in my measurements as a bottleneck, at least for the configurations I tested. Though it may be worth to optimize this in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and it's not a blocker.



[ThreadStatic]
static Random random;
Copy link
Member

@ReubenBond ReubenBond Dec 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the static SafeRandom class be sufficient instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we already have two SafeRandom classes in our codebase, one static and one not. Unfortunately I don't have access to either one. But the code is quite simple even without using that class.

/// <summary>
/// Queue state
/// </summary>
public NotificationQueueState QueueState = NotificationQueueState.Empty;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is marked as public, but I'm not convinced it should be - at least not for writes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right. I'll make it private.

bool first = true;
foreach (var e in logEntries)
{
SubmitInternal(time, e, pos++, first ? promise : null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the promise be specified on the first or the last entry? I would have assumed the last, so that the caller is notified on successful submission of the entire range of entries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not matter since all entries are submitted at the same time (as a batch).

Conflicts:
	src/Orleans/Logging/ErrorCodes.cs
	src/Orleans/Orleans.csproj
	src/OrleansRuntime/MultiClusterNetwork/MultiClusterOracle.cs
	src/OrleansRuntime/Silo/Silo.cs
	test/Tester/Tester.csproj
	test/TesterInternal/TesterInternal.csproj
/// <summary> Output the specified message at <c>Verbose2</c> log level. </summary>
void Verbose2(string format, params object[] args);
/// <summary> Output the specified message at <c>Verbose3</c> log level. </summary>
void Verbose3(string format, params object[] args);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we putting logging specific facades here? It will make it even harder than now to move away from our own logging and it is very likely that when we move to a logging abstraction, Verbose2, 3 and customer log levels will vanish and here it would mean breaking an API contract, which makes the move really hard. We should have a better way getting around this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point here was to provide common log formatting for all log consistency providers.

But I agree that listing all these methods in the interface is not clean (from a software maintenance perspective).

I will instead just expose the Logger object, and use a static class to provide the common log formatting. This class would not need to be part of the interface and can be hidden away together with the provider implementations in OrleansEvent Sourcing.dll

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this did not really work... ErrorCodes are internal to runtime, so I can't format errors from outside the runtime. I don't want to create a dependency from OrleansEventSourcing.dll on OrleansRuntime.dll. And the error reporting should use the same formatting as the logging. So, for now I will keep this as it is mostly, and only remove the verbosity overloads.

/// </summary>
/// <returns></returns>
string MyClusterId { get; }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like comments got lost here because of file renaming.
@jdom

Do we really need both ClusterId and DeploymentId as different values? Maybe we should consolidate? DeploymentId is really just a logical name for the cluster, it's just that it typically maps to the cloud service's deployment id when deploying to it, but it's an awful name when used in a different environment, as it doesn't map to its logical intended purpose

@sebastianburckhardt

I think DeploymentId has an Azure-specific meaning. So, I don't think they are the same. ClusterId is a logical name for the cluster (e.g., "uswest", "europe", "front1", "front2"). DeploymentId is something like "a74ff75b9bd6424ebbff3bde9643074b" and changes everytime I redeploy the service.

@dom

That's exactly why I say it's an awful name, as it implies the wrong meaning. DeploymentId is the cluster name. It's just that when you deploy to azure cloud services (not even other flavors of azure), it picks its DeploymentId as the cluster name by default (and only if unset). Nevertheless when deploying in a different way (on premises, other clouds, other azure services, yams, etc) you must set the DeploymentId to a logical name explicitly, because it is in fact the cluster id.

Yes, ClusterId and DeploymentId are really synonyms. Although in the hindsight, ClusterId is probably a better name. But it's too late to change it in our existing APIs. However, if here MyClusterId is contained within the log consistency provider scope, and within the geo-distribution context ClusterId makes more sense in conjunction with the notion of multi-cluster, I don't see a problem using MyClusterId within this context. Unless I missed some other implications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not strongly attached to any particular naming conventions. Perhaps we can use DeploymentId everywhere instead of ClusterId. Some questions though:

  • Is DeploymentId always guaranteed to have non-empty, non-null value? This is something I currently rely on for ClusterId.

  • For my multi-cluster Azure services samples, things get a bit confusing because the Orleans DeploymentId (e.g. uswest, europewest), as configured, would now be something different from the Azure Services Deployment Id (e.g. "a74ff75b9bd6424ebbff3bde9643074b", as in the management portal).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeploymentId is indeed always non-null. I completely agree that ClusterId is a better name. I did not mean that we should consolidate the two in this PR, we should discuss in its own issue.
Perhaps since ClusterId is a better name, we can get rid of DeploymentId in the much bigger breaking change release (2.0), since config will probably be changed too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is DeploymentId always guaranteed to have non-empty, non-null value?

Yes. It is required for cluster membership to work.

For my multi-cluster Azure services samples, things get a bit confusing because the Orleans DeploymentId (e.g. uswest, europewest), as configured, would now be something different from the Azure Services Deployment Id (e.g. "a74ff75b9bd6424ebbff3bde9643074b", as in the management portal).

Here I'm confused. In Azure PaaS DeploymentId is always a GUID, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On MSDN:
public static string DeploymentId { get; }

"Deployment IDs are opaque strings and should not be parsed or otherwise inspected because their format might change."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine, we use DeploymentId in AzureUtils by default, and that simplifies connecting silos and clients. We don't try to parse it. And app code can always override DeploymentId programmatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, the only think I will change is that I use the DeploymentId as a default ClusterId if ClusterId is not specified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, the only think I will change is that I use the DeploymentId as a default ClusterId if ClusterId is not specified.

Why not take it always? Why would somebody specify ClusterId separately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I am just following @jdom's advice:

we should discuss in its own issue.

The question of how/when to consolidate ClusterId and DeploymentId should not block this PR. We already have both of them in master, ClusterId was already introduced with #1109.

@sebastianburckhardt
Copy link
Contributor Author

I think I have addressed all comments at this point.

@sergeybykov
Copy link
Contributor

I kicked off functional tests and a load test.

If you have concerns about merging this after tests pass, please raise them soon.

@sergeybykov sergeybykov merged commit 0f4ba04 into dotnet:master Jan 14, 2017
@galvesribeiro
Copy link
Member

🎆 weeee! :shipit:

Thanks @sebastianburckhardt ! 😃

@sebastianburckhardt sebastianburckhardt deleted the geo-logviewproviders branch January 17, 2017 18:23
@github-actions github-actions bot locked and limited conversation to collaborators Dec 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet