Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ActivitySource started activities aren't using an accurate utcnow on framework. #45111

Closed
cg110 opened this issue Nov 23, 2020 · 39 comments
Closed
Labels
area-System.Diagnostics.Activity enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@cg110
Copy link

cg110 commented Nov 23, 2020

Description

Activity source started events on .net framework don't have accurate duration. This appears to be due to the use of DateTime.UtcNow:

activity.StartTimeUtc = startTime == default ? DateTime.UtcNow : startTime.UtcDateTime;

In other code paths GetUtcNow() is used to get an accurate timestamp.

This can be observed by durations being 00:00:00.0000001 on .net framework.

Configuration

.net framework 4.8
Windows 10 2004.
x64

Regression?

Not a regression, as activity source is new, but diagnostic source had accurate durations.

Other information

I suspect the only workaround is to provide an accurate timestamp to the calls to StartActivity.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Diagnostics.Tracing untriaged New issue has not been triaged by the area owner labels Nov 23, 2020
@ghost
Copy link

ghost commented Nov 23, 2020

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Activity source started events on .net framework don't have accurate duration. This appears to be due to the use of DateTime.UtcNow:

activity.StartTimeUtc = startTime == default ? DateTime.UtcNow : startTime.UtcDateTime;

In other code paths GetUtcNow() is used to get an accurate timestamp.

This can be observed by durations being 00:00:00.0000001 on .net framework.

Configuration

.net framework 4.8
Windows 10 2004.
x64

Regression?

Not a regression, as activity source is new, but diagnostic source had accurate durations.

Other information

I suspect the only workaround is to provide an accurate timestamp to the calls to StartActivity.

Author: cg110
Assignees: -
Labels:

area-System.Diagnostics.Tracing, untriaged

Milestone: -

@cijothomas
Copy link
Contributor

@Clockwork-Muse
Copy link
Contributor

On corefx:

internal static DateTime GetUtcNow()
{
    // .NET Core CLR gives accurate UtcNow
    return DateTime.UtcNow;
}

On netfx:

internal static DateTime GetUtcNow()
{
    // DateTime.UtcNow accuracy on .NET Framework is ~16ms, this method
    // uses combination of Stopwatch and DateTime to calculate accurate UtcNow.

    var tmp = timeSync;

    // Timer ticks need to be converted to DateTime ticks
    long dateTimeTicksDiff = (long)((Stopwatch.GetTimestamp() - tmp.SyncStopwatchTicks) * 10000000L /
                                    (double)Stopwatch.Frequency);

    // DateTime.AddSeconds (or Milliseconds) rounds value to 1 ms, use AddTicks to prevent it
    return tmp.SyncUtcNow.AddTicks(dateTimeTicksDiff);
}

... My first question is going to be "are we sure this is still the case"? The comment implies that netfx is still pulling from a lower-accuracy clock in Windows (since this used to be a common limitation for all OSs/languages). Is switching to the better clock something on the table for netfx, if it is?

@tarekgh tarekgh added question Answer questions and provide assistance, not an issue with source code or documentation. and removed untriaged New issue has not been triaged by the area owner labels Nov 23, 2020
@tarekgh tarekgh added this to the Future milestone Nov 23, 2020
@tarekgh
Copy link
Member

tarekgh commented Nov 23, 2020

@cg110 although we maintain and support the Full Framework, but we are not actively doing any development there. Also, I don't think Activity timestamps is meant to be accurate to leas than 1 milliseconds.

@Clockwork-Muse we know on Windows UtcNow is taking sometime to execute, this is because we have to ensure the time is syned with the system when having a leap seconds. Otherwise it'll be shifted.

Closing it as a duplicate to #13091

@tarekgh tarekgh closed this as completed Nov 23, 2020
@cg110
Copy link
Author

cg110 commented Nov 23, 2020

Reference source suggests that fx is still using the less accurate https://docs.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getsystemtimeasfiletime

There is https://docs.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getsystemtimepreciseasfiletime but it's unclear if it's good enough.

DateTime.utcnow is documented as being inaccurate see https://docs.microsoft.com/en-us/dotnet/api/system.datetime.utcnow?view=net-5.0#remarks

Some discussion here: #5883

And the fix to core: dotnet/coreclr#9736

@tarekgh
Copy link
Member

tarekgh commented Nov 23, 2020

@cg110 I don't think this will make much difference. I have pointed at what is the cause of that. Also, as I mentioned this is unlikely to be fixed in the full framework but we are looking at that in the .NET future releases. Thanks!

@cg110
Copy link
Author

cg110 commented Nov 23, 2020

I'm puzzled, I'd just like the DateTime.utcnow changed to GetUtcNow() to match the diagnosticsource behaviour. Is that not doable to give better accuracy? It's already used in the stop/dispose.

Is that not going to be fixed because it's framework? Utcnow is inaccurate to 16ms so bad for timing things

@tarekgh
Copy link
Member

tarekgh commented Nov 23, 2020

@cg110 in general it depends on where we can fix this. If we need to fix UtcNow, then the fix will not be in Full Framework. If the fix would be in System.Diagnostics.DiagnosticSource, then the fix will be spanned all frameworks and you'll get it as this is NuGet package which can be used in the Full Framework too. I'll reactivate this issue to track if anything can be done in System.Diagnostics.DiagnosticSource not depending on UtcNow.

@tarekgh tarekgh reopened this Nov 23, 2020
@Clockwork-Muse
Copy link
Contributor

Clockwork-Muse commented Nov 23, 2020

@cg110 - Generally speaking you shouldn't be using DateTime.UtcNow for timing/measuring duration, you should be using Stopwatch/Timer, since they usually use a more accurate relative timer underneath. That said, it's only potentially at the 16ms inaccuracy, since more recent versions of Windows are using the more precise time call.

@tarekgh - Although the cost of the call was less than the resolution (100ns). And that's not the same thing as the ~16ms innacuraccy.

@tarekgh
Copy link
Member

tarekgh commented Nov 23, 2020

@Clockwork-Muse make sense.

By the way, this is not related to ActivitySource as the issue title suggest. This is part of Activity which existed for long time. I understand there is some changes from GetUtcNow but this apply to Activity and not really ActivitySource only. Could you modify the title and the description that this apply to Activity and not ActivitySource?

@cg110
Copy link
Author

cg110 commented Nov 23, 2020

@tarekgh I believe the fix to update this line: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs#L1020

To use GetUtcNow() instead of DateTime.UtcNow, as someone did the legwork already for Activity class to have a "more accurate" utcnow on netfx in: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.DateTime.netfx.cs

I can't use a more accurate timer on an Activity without adding custom stopwatch timestamps, Activity has made a decision to use DateTime for it's timings, which in general will be good enough, I rarely expect sub-ms cross calls, but sub 16ms is normal.

Sorry if the title was confusing, it's only the code path for Activities started by ActivitySource that hit this problem. Diagnostic source activities don't.

@cijothomas
Copy link
Contributor

To clarify,
if I do new Activity().Start (this is the DS path) - timestamp is accurate.
If I do new ActivitySource().StartActivity() - then timestamp is inaccurate.
Means this is tied to ActivitySource path only.

@tarekgh
Copy link
Member

tarekgh commented Nov 23, 2020

@cg110 could you give it a try and measure if you see any difference? I am still concerned about the GetUtcNow though. it looks to me will be broken when the system introduce any leap seconds.

@cg110
Copy link
Author

cg110 commented Nov 23, 2020

Wall of unit test code (rather messy, but getting kids to bed here), two tests, both pass on .net core 3.1, on framework the first one fails at some point (hence 1,000 iterations to get it jitted and "fast"), the second (having used a delegate to access Activity.GetUtcNow()) doesn't.

You're right it probably doesn't handle leapseconds (as few probably even think about leap seconds), but for getting a more realistic duration it does a better job.

        [Test]
        public void CheckActivitySourceDurationAccuracy()
        {
            Activity.DefaultIdFormat = ActivityIdFormat.W3C;
            using var _activitySource = new ActivitySource("ActivityTest");

            using var _activityListener = new ActivityListener
            {
                ShouldListenTo = source => true,
                Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.PropagationData,
                SampleUsingParentId = (ref ActivityCreationOptions<string> options) => ActivitySamplingResult.PropagationData,
            };

            ActivitySource.AddActivityListener(_activityListener);

            Activity parentActivity;
            // need to loop to get code Jitted etc.
            foreach (var i in Enumerable.Range(1, 1000))
            {
                using (parentActivity = _activitySource.StartActivity("UnitTest"))
                {
                }

                Assert.AreNotEqual(new TimeSpan(1), parentActivity.Duration, "failed on Iteration {0}", i);
            }
        }

        private delegate DateTime GetUtcNow();

        private static GetUtcNow getUtcNow = createDelegate();

        private static GetUtcNow createDelegate()
        {
            var activityGetUtcNow = typeof(Activity).GetMethod("GetUtcNow", BindingFlags.Static | BindingFlags.NonPublic);
            return (GetUtcNow)Delegate.CreateDelegate(typeof(GetUtcNow), null, activityGetUtcNow);
        }


        private static DateTimeOffset GetActivityUtcNow()
        {
            // ActivitySource has a bug, and isn't using accurate start times, so use Activity.GetUtcNow():
            var activitytimestamp = getUtcNow();
            return activitytimestamp;
        }

        [Test]
        public void CheckActivitySourceDurationAccuracyWithGetDateTime()
        {
            Activity.DefaultIdFormat = ActivityIdFormat.W3C;
            using var _activitySource = new ActivitySource("ActivityTest");

            using var _activityListener = new ActivityListener
            {
                ShouldListenTo = source => true,
                Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.PropagationData,
                SampleUsingParentId = (ref ActivityCreationOptions<string> options) => ActivitySamplingResult.PropagationData,
            };

            ActivitySource.AddActivityListener(_activityListener);

            Activity parentActivity;
            // need to loop to get code Jitted etc.
            foreach (var i in Enumerable.Range(1, 1000))
            {
                using (parentActivity = _activitySource.StartActivity("UnitTest", ActivityKind.Internal, default(ActivityContext), null, null, GetActivityUtcNow()))
                {
                }

                Assert.AreNotEqual(new TimeSpan(1), parentActivity.Duration, "failed on Iteration {0}", i);
            }
        }

@tarekgh
Copy link
Member

tarekgh commented Nov 23, 2020

@cg110 thanks. I think we can do better and stay accurate with the system here. Will keep the issue to track looking at a better solution.

@tarekgh tarekgh added enhancement Product code improvement that does NOT require public API changes/additions and removed question Answer questions and provide assistance, not an issue with source code or documentation. labels Nov 23, 2020
@cg110
Copy link
Author

cg110 commented Nov 24, 2020

Would it be possible to get a fix in a 5.0.x nuget for the DateTime to GetUtcNow change? Rather than have a full fix for leap seconds etc (as that is likely a much bigger/invasive change)

While I can control our usages of ActivitySource and wrap the calls to replace startTime, I can't for 3rd party libraries.

@tarekgh
Copy link
Member

tarekgh commented Nov 24, 2020

@cg110 how blocking this to you? usually we are very careful servicing any fix to a released product. More info will help here telling how this issue breaking you. Thanks!

@cg110
Copy link
Author

cg110 commented Nov 24, 2020

@tarekgh the problem is that it makes the usage far more complex, particularly as we were looking to increase usage of ActivitySource particularly for examining performance of calls, and where we spend the time. This now means we can't trust the duration/performance data.

Would it help to raise a case via support?

I can provider a wrapper/helper class that contains ActivitySource, and injects the starttime on the way through, but it's complex for places that could have quickly put in an activitySource and done a startActivity is a client call, as they have to either use the wrapper (so we have to make it available to everyone), or get the reflected time source.

And it also won't help with 3rd parties who don't know of this problem.

It shows up in the the code below (which is an extreme case) shows that start times for 10,000 calls are in a few buckets, and they did not start at the same time:
DateTime 2020-11-24T19:07:18.8399726Z, count 1
DateTime 2020-11-24T19:07:18.8599748Z, count 404
DateTime 2020-11-24T19:07:18.8609748Z, count 1091
DateTime 2020-11-24T19:07:18.8619748Z, count 1208
DateTime 2020-11-24T19:07:18.8629738Z, count 1219
DateTime 2020-11-24T19:07:18.8639735Z, count 1210
DateTime 2020-11-24T19:07:18.8649750Z, count 1219
DateTime 2020-11-24T19:07:18.8659740Z, count 933
DateTime 2020-11-24T19:07:18.8669747Z, count 1134
DateTime 2020-11-24T19:07:18.8679738Z, count 1035
DateTime 2020-11-24T19:07:18.8689743Z, count 546

code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace testduration
{
    class Program
    {
        static void Main(string[] args)
        {

            using var activitysource = new ActivitySource("Test");

            using var listen = new ActivityListener
            {
                ShouldListenTo = source => true,
                Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.PropagationData,
                SampleUsingParentId = (ref ActivityCreationOptions<string> options) => ActivitySamplingResult.PropagationData,

            };

            ActivitySource.AddActivityListener(listen);

            int x = 10000;

            var results = new Activity[x];

            foreach (var i in Enumerable.Range(0, x))
            {
                using var activity = activitysource.StartActivity("someCall", ActivityKind.Client);
                results[i] = activity;
            }

            //foreach (var i in Enumerable.Range(0, x))
            //{
            //    Console.WriteLine("Start {0:O}, duration {1:c}, theoretical end {2:O}", results[i].StartTimeUtc, results[i].Duration, results[i].StartTimeUtc + results[i].Duration);
            //}

            var groups = results.GroupBy(x => x.StartTimeUtc, x => x.Duration.TotalMilliseconds).Select(x => new {x.Key, Count= x.Count(), average = x.Average()});

            foreach (var group in groups)
            {
                Console.WriteLine("DateTime {0:O}, count {1}", group.Key, group.Count);
            }

            Console.ReadLine();

        }
    }
}

@tarekgh
Copy link
Member

tarekgh commented Nov 24, 2020

@cg110 Can't you just pass the start time when calling ActivitySource.StartActivity(...)? this will be very simple workaround.

@cg110
Copy link
Author

cg110 commented Nov 24, 2020

That's what I'm doing, by wrapping activitysource and setting start time by reflecting Activity.getutcnow, as it needs to be the same time source that endtime uses.

But it's not an obvious thing to do. And no guarantee that 3rd party instrumentation will do it. And also adds complexity to every instrumentation.

@tarekgh
Copy link
Member

tarekgh commented Nov 24, 2020

I am not expecting many libraries is using ActivitySource yet. most of the libraries is using new Activity(...) and my understanding is at that time the issue will not show up because it will use GetUtcTime and not DateTime.Now.
Also, the issue would be the time precision and not really wrong time. so the reported time still correct but not with high precision.

I am trying to figure out why this would be an issue if we wait to fix it in 6.0 release? we can get it early in 6.0 too.

@cg110
Copy link
Author

cg110 commented Nov 24, 2020

Perhaps a different way around, why is a small 1 line bug fix not possible in a service release?

I think I'll patch it up in the listener, and hope nothing else ever sets the starttime, as I suspect if I always set it to the reflected getutcnow it'd be good enough. (Or I could try and source a better time and set start and end time in the listener)

@tarekgh
Copy link
Member

tarekgh commented Nov 24, 2020

Perhaps a different way around, why is a small 1 line bug fix not possible in a service release?

Because there are a tones of 1 line fixes around for many issues. if we decide go to this route, the accumulated changes would be big and risky. This is why we have a servicing bar we measure against regardless if the fix is one line.

I think I'll patch it up in the listener, and hope nothing else ever sets the starttime, as I suspect if I always set it to the reflected getutcnow it'd be good enough. (Or I could try and source a better time and set start and end time in the listener)

are you doing that in the listener to listen to all other activities created from other sources? or you care only about the activities created by your code?

@cg110
Copy link
Author

cg110 commented Nov 24, 2020

I'll probably update them all, although right now it's be just ones I've added, but that may change as more code adopts activitysource. Eg if we use opentelemetry instrumentation.

My expectation is to do propagation for things that aren't ours (so parent and child isn't lost, as i thought of not listening but if some other library does listen then you can get gaps in parent->child)

But allow the option to add sources to gather all data.

@tarekgh
Copy link
Member

tarekgh commented Nov 24, 2020

I am checking if we can service the fix. I'll let you know what I'll get.

@cg110
Copy link
Author

cg110 commented Nov 24, 2020

Thanks for considering it.

I do wonder if duration should actually be tied to stopwatch.gettimestamp() and measure actual ticks passed not rely on math with datetime. As even on core it can still be 16ms accuracy (and technically can change etc). Or perhaps a new property TimedDuration.

I might look at sticking timestamps on activities as a custom property and see how that looks.

@tarekgh
Copy link
Member

tarekgh commented Nov 24, 2020

[edited comment]
@cg110 I don't think we need to do anything for the duration. we already call GetUtcNow() in the Stop() and then calculate the duration from that. So duration is always synced with the way we calculated start time.

just realized after writing this comment you meant to ask about your workaround and not the fix in the framework.

@Clockwork-Muse
Copy link
Contributor

As even on core it can still be 16ms accuracy (and technically can change etc).

Normally changes due to things like clock updates from NTP are rare, and not very "far" (within a second or less). If you're measuring durations in aggregate, such a change isn't likely to affect things enough to be noticeable.

@tarekgh
Copy link
Member

tarekgh commented Nov 25, 2020

This is fixed by the PR #45176 and the fix will be released in the next servicing release 5.0.1.

@tarekgh tarekgh closed this as completed Nov 25, 2020
@chris-smith-zocdoc
Copy link

@tarekgh will there be a release of the nuget package? https://www.nuget.org/packages/System.Diagnostics.DiagnosticSource/

@tarekgh
Copy link
Member

tarekgh commented Dec 10, 2020

@chris-smith-zocdoc yes there will be updated NuGet package.

@danmosemsft @ericstj 5.0.1 is released on Dec 8th and I am not seeing the package https://www.nuget.org/packages/System.Diagnostics.DiagnosticSource/ updated as we have some servicing fix there. When we expect this will get updated?

@ericstj
Copy link
Member

ericstj commented Dec 11, 2020

Looks like the package wasn't added to the build for servicing. @Anipik can you do this for 5.0.2?

@ericstj ericstj reopened this Dec 11, 2020
@Anipik
Copy link
Contributor

Anipik commented Dec 11, 2020

Looks like the package wasn't added to the build for servicing. @Anipik can you do this for 5.0.2?

yep i will add that

@Anipik
Copy link
Contributor

Anipik commented Dec 14, 2020

I added the package, we can keep this open until we ship the package

@cijothomas
Copy link
Contributor

@Anipik When does 5.0.2 gets released to nuget?

@Anipik
Copy link
Contributor

Anipik commented Dec 15, 2020

It will be shipped around second week of january

@tarekgh
Copy link
Member

tarekgh commented Jan 20, 2021

@tarekgh tarekgh closed this as completed Jan 20, 2021
@tarekgh tarekgh modified the milestones: Future, 6.0.0 Jan 21, 2021
@dotnet dotnet locked as resolved and limited conversation to collaborators Feb 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Diagnostics.Activity enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
Development

No branches or pull requests

9 participants