New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ActivitySource started activities aren't using an accurate utcnow on framework. #45111
Comments
Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti Issue DetailsDescriptionActivity source started events on .net framework don't have accurate duration. This appears to be due to the use of DateTime.UtcNow: runtime/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs Line 1020 in 8861c41
In other code paths GetUtcNow() is used to get an accurate timestamp. runtime/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs Line 576 in 8861c41
This can be observed by durations being 00:00:00.0000001 on .net framework. Configuration.net framework 4.8 Regression?Not a regression, as activity source is new, but diagnostic source had accurate durations. Other informationI suspect the only workaround is to provide an accurate timestamp to the calls to StartActivity.
|
|
internal static DateTime GetUtcNow()
{
// .NET Core CLR gives accurate UtcNow
return DateTime.UtcNow;
} internal static DateTime GetUtcNow()
{
// DateTime.UtcNow accuracy on .NET Framework is ~16ms, this method
// uses combination of Stopwatch and DateTime to calculate accurate UtcNow.
var tmp = timeSync;
// Timer ticks need to be converted to DateTime ticks
long dateTimeTicksDiff = (long)((Stopwatch.GetTimestamp() - tmp.SyncStopwatchTicks) * 10000000L /
(double)Stopwatch.Frequency);
// DateTime.AddSeconds (or Milliseconds) rounds value to 1 ms, use AddTicks to prevent it
return tmp.SyncUtcNow.AddTicks(dateTimeTicksDiff);
} ... My first question is going to be "are we sure this is still the case"? The comment implies that netfx is still pulling from a lower-accuracy clock in Windows (since this used to be a common limitation for all OSs/languages). Is switching to the better clock something on the table for netfx, if it is? |
@cg110 although we maintain and support the Full Framework, but we are not actively doing any development there. Also, I don't think Activity timestamps is meant to be accurate to leas than 1 milliseconds. @Clockwork-Muse we know on Windows UtcNow is taking sometime to execute, this is because we have to ensure the time is syned with the system when having a leap seconds. Otherwise it'll be shifted. Closing it as a duplicate to #13091 |
Reference source suggests that fx is still using the less accurate https://docs.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getsystemtimeasfiletime There is https://docs.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getsystemtimepreciseasfiletime but it's unclear if it's good enough. DateTime.utcnow is documented as being inaccurate see https://docs.microsoft.com/en-us/dotnet/api/system.datetime.utcnow?view=net-5.0#remarks Some discussion here: #5883 And the fix to core: dotnet/coreclr#9736 |
@cg110 I don't think this will make much difference. I have pointed at what is the cause of that. Also, as I mentioned this is unlikely to be fixed in the full framework but we are looking at that in the .NET future releases. Thanks! |
I'm puzzled, I'd just like the DateTime.utcnow changed to GetUtcNow() to match the diagnosticsource behaviour. Is that not doable to give better accuracy? It's already used in the stop/dispose. Is that not going to be fixed because it's framework? Utcnow is inaccurate to 16ms so bad for timing things |
@cg110 in general it depends on where we can fix this. If we need to fix UtcNow, then the fix will not be in Full Framework. If the fix would be in System.Diagnostics.DiagnosticSource, then the fix will be spanned all frameworks and you'll get it as this is NuGet package which can be used in the Full Framework too. I'll reactivate this issue to track if anything can be done in System.Diagnostics.DiagnosticSource not depending on UtcNow. |
@cg110 - Generally speaking you shouldn't be using @tarekgh - Although the cost of the call was less than the resolution (100ns). And that's not the same thing as the ~16ms innacuraccy. |
@Clockwork-Muse make sense. By the way, this is not related to ActivitySource as the issue title suggest. This is part of Activity which existed for long time. I understand there is some changes from |
@tarekgh I believe the fix to update this line: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs#L1020 To use GetUtcNow() instead of DateTime.UtcNow, as someone did the legwork already for Activity class to have a "more accurate" utcnow on netfx in: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.DateTime.netfx.cs I can't use a more accurate timer on an Activity without adding custom stopwatch timestamps, Activity has made a decision to use DateTime for it's timings, which in general will be good enough, I rarely expect sub-ms cross calls, but sub 16ms is normal. Sorry if the title was confusing, it's only the code path for Activities started by ActivitySource that hit this problem. Diagnostic source activities don't. |
To clarify, |
@cg110 could you give it a try and measure if you see any difference? I am still concerned about the GetUtcNow though. it looks to me will be broken when the system introduce any leap seconds. |
Wall of unit test code (rather messy, but getting kids to bed here), two tests, both pass on .net core 3.1, on framework the first one fails at some point (hence 1,000 iterations to get it jitted and "fast"), the second (having used a delegate to access Activity.GetUtcNow()) doesn't. You're right it probably doesn't handle leapseconds (as few probably even think about leap seconds), but for getting a more realistic duration it does a better job. [Test]
public void CheckActivitySourceDurationAccuracy()
{
Activity.DefaultIdFormat = ActivityIdFormat.W3C;
using var _activitySource = new ActivitySource("ActivityTest");
using var _activityListener = new ActivityListener
{
ShouldListenTo = source => true,
Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.PropagationData,
SampleUsingParentId = (ref ActivityCreationOptions<string> options) => ActivitySamplingResult.PropagationData,
};
ActivitySource.AddActivityListener(_activityListener);
Activity parentActivity;
// need to loop to get code Jitted etc.
foreach (var i in Enumerable.Range(1, 1000))
{
using (parentActivity = _activitySource.StartActivity("UnitTest"))
{
}
Assert.AreNotEqual(new TimeSpan(1), parentActivity.Duration, "failed on Iteration {0}", i);
}
}
private delegate DateTime GetUtcNow();
private static GetUtcNow getUtcNow = createDelegate();
private static GetUtcNow createDelegate()
{
var activityGetUtcNow = typeof(Activity).GetMethod("GetUtcNow", BindingFlags.Static | BindingFlags.NonPublic);
return (GetUtcNow)Delegate.CreateDelegate(typeof(GetUtcNow), null, activityGetUtcNow);
}
private static DateTimeOffset GetActivityUtcNow()
{
// ActivitySource has a bug, and isn't using accurate start times, so use Activity.GetUtcNow():
var activitytimestamp = getUtcNow();
return activitytimestamp;
}
[Test]
public void CheckActivitySourceDurationAccuracyWithGetDateTime()
{
Activity.DefaultIdFormat = ActivityIdFormat.W3C;
using var _activitySource = new ActivitySource("ActivityTest");
using var _activityListener = new ActivityListener
{
ShouldListenTo = source => true,
Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.PropagationData,
SampleUsingParentId = (ref ActivityCreationOptions<string> options) => ActivitySamplingResult.PropagationData,
};
ActivitySource.AddActivityListener(_activityListener);
Activity parentActivity;
// need to loop to get code Jitted etc.
foreach (var i in Enumerable.Range(1, 1000))
{
using (parentActivity = _activitySource.StartActivity("UnitTest", ActivityKind.Internal, default(ActivityContext), null, null, GetActivityUtcNow()))
{
}
Assert.AreNotEqual(new TimeSpan(1), parentActivity.Duration, "failed on Iteration {0}", i);
}
} |
@cg110 thanks. I think we can do better and stay accurate with the system here. Will keep the issue to track looking at a better solution. |
Would it be possible to get a fix in a 5.0.x nuget for the DateTime to GetUtcNow change? Rather than have a full fix for leap seconds etc (as that is likely a much bigger/invasive change) While I can control our usages of ActivitySource and wrap the calls to replace startTime, I can't for 3rd party libraries. |
@cg110 how blocking this to you? usually we are very careful servicing any fix to a released product. More info will help here telling how this issue breaking you. Thanks! |
@tarekgh the problem is that it makes the usage far more complex, particularly as we were looking to increase usage of ActivitySource particularly for examining performance of calls, and where we spend the time. This now means we can't trust the duration/performance data. Would it help to raise a case via support? I can provider a wrapper/helper class that contains ActivitySource, and injects the starttime on the way through, but it's complex for places that could have quickly put in an activitySource and done a startActivity is a client call, as they have to either use the wrapper (so we have to make it available to everyone), or get the reflected time source. And it also won't help with 3rd parties who don't know of this problem. It shows up in the the code below (which is an extreme case) shows that start times for 10,000 calls are in a few buckets, and they did not start at the same time: code: using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
namespace testduration
{
class Program
{
static void Main(string[] args)
{
using var activitysource = new ActivitySource("Test");
using var listen = new ActivityListener
{
ShouldListenTo = source => true,
Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.PropagationData,
SampleUsingParentId = (ref ActivityCreationOptions<string> options) => ActivitySamplingResult.PropagationData,
};
ActivitySource.AddActivityListener(listen);
int x = 10000;
var results = new Activity[x];
foreach (var i in Enumerable.Range(0, x))
{
using var activity = activitysource.StartActivity("someCall", ActivityKind.Client);
results[i] = activity;
}
//foreach (var i in Enumerable.Range(0, x))
//{
// Console.WriteLine("Start {0:O}, duration {1:c}, theoretical end {2:O}", results[i].StartTimeUtc, results[i].Duration, results[i].StartTimeUtc + results[i].Duration);
//}
var groups = results.GroupBy(x => x.StartTimeUtc, x => x.Duration.TotalMilliseconds).Select(x => new {x.Key, Count= x.Count(), average = x.Average()});
foreach (var group in groups)
{
Console.WriteLine("DateTime {0:O}, count {1}", group.Key, group.Count);
}
Console.ReadLine();
}
}
} |
@cg110 Can't you just pass the start time when calling |
That's what I'm doing, by wrapping activitysource and setting start time by reflecting Activity.getutcnow, as it needs to be the same time source that endtime uses. But it's not an obvious thing to do. And no guarantee that 3rd party instrumentation will do it. And also adds complexity to every instrumentation. |
I am not expecting many libraries is using ActivitySource yet. most of the libraries is using I am trying to figure out why this would be an issue if we wait to fix it in 6.0 release? we can get it early in 6.0 too. |
Perhaps a different way around, why is a small 1 line bug fix not possible in a service release? I think I'll patch it up in the listener, and hope nothing else ever sets the starttime, as I suspect if I always set it to the reflected getutcnow it'd be good enough. (Or I could try and source a better time and set start and end time in the listener) |
Because there are a tones of 1 line fixes around for many issues. if we decide go to this route, the accumulated changes would be big and risky. This is why we have a servicing bar we measure against regardless if the fix is one line.
are you doing that in the listener to listen to all other activities created from other sources? or you care only about the activities created by your code? |
I'll probably update them all, although right now it's be just ones I've added, but that may change as more code adopts activitysource. Eg if we use opentelemetry instrumentation. My expectation is to do propagation for things that aren't ours (so parent and child isn't lost, as i thought of not listening but if some other library does listen then you can get gaps in parent->child) But allow the option to add sources to gather all data. |
I am checking if we can service the fix. I'll let you know what I'll get. |
Thanks for considering it. I do wonder if duration should actually be tied to stopwatch.gettimestamp() and measure actual ticks passed not rely on math with datetime. As even on core it can still be 16ms accuracy (and technically can change etc). Or perhaps a new property TimedDuration. I might look at sticking timestamps on activities as a custom property and see how that looks. |
[edited comment] just realized after writing this comment you meant to ask about your workaround and not the fix in the framework. |
Normally changes due to things like clock updates from NTP are rare, and not very "far" (within a second or less). If you're measuring durations in aggregate, such a change isn't likely to affect things enough to be noticeable. |
This is fixed by the PR #45176 and the fix will be released in the next servicing release 5.0.1. |
@tarekgh will there be a release of the nuget package? https://www.nuget.org/packages/System.Diagnostics.DiagnosticSource/ |
@chris-smith-zocdoc yes there will be updated NuGet package. @danmosemsft @ericstj 5.0.1 is released on Dec 8th and I am not seeing the package https://www.nuget.org/packages/System.Diagnostics.DiagnosticSource/ updated as we have some servicing fix there. When we expect this will get updated? |
Looks like the package wasn't added to the build for servicing. @Anipik can you do this for 5.0.2? |
yep i will add that |
I added the package, we can keep this open until we ship the package |
@Anipik When does 5.0.2 gets released to nuget? |
It will be shipped around second week of january |
The package is now released. https://www.nuget.org/packages/System.Diagnostics.DiagnosticSource/5.0.1 |
Description
Activity source started events on .net framework don't have accurate duration. This appears to be due to the use of DateTime.UtcNow:
runtime/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs
Line 1020 in 8861c41
In other code paths GetUtcNow() is used to get an accurate timestamp.
runtime/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs
Line 576 in 8861c41
This can be observed by durations being 00:00:00.0000001 on .net framework.
Configuration
.net framework 4.8
Windows 10 2004.
x64
Regression?
Not a regression, as activity source is new, but diagnostic source had accurate durations.
Other information
I suspect the only workaround is to provide an accurate timestamp to the calls to StartActivity.
The text was updated successfully, but these errors were encountered: