Support OpenTelemetry concepts on Activity #31373

tarekgh · 2019-11-02T00:07:01Z

This issue tracking the work as the part of the issue https://github.com/dotnet/corefx/issues/42305

Please look at dotnet/designs#98 for the detailed design.

noahfalk · 2020-03-03T03:12:19Z

@tarekgh is the primary dev owner, I am actively assisting. So far design feedback has been occurring in #32660

reyang · 2020-03-12T01:48:27Z

Looks great! Thanks @tarekgh and @noahfalk!

One question related to

using (Activity foo = new Activity(...))
{
    throw new FoobarException(...);
}

If it is possible to capture the exception and associate it with the Activity?

In OpenTelemetry Python there is such feature (although I think it is more like a question to the language/runtime rather than API) where we know if there is exception and derive status code.

tarekgh · 2020-03-12T02:03:39Z

Thanks @reyang for your review.

I don't think we can automatically capture the exception inside using block without try-catch. The user can write the code like:

using (Activity foo = Activity.StartNew(...))
{
    try
    {
        throw new FoobarException(...);
    }
    catch (Exception e)
    {
        foo?.SetCustomProperty("Exception", e);
    }
}

and if there is any listener, will get the notification in stopping the activity and then can check if the exception is set there using GetCustomProperty. I'll think if we'll have a better way to handle it.

reyang · 2020-03-12T02:11:27Z

If developers perform exception handling in a central place (e.g. the evil global exception handler, or catching exception from top level code), is there a way for them to know what's the enclosing activity? (I guess no?)

try {
    using (var foo = new Activity("foo"))
    {
        using (var baz = new Activity("baz")
        {
            DoSomething();
        }
        DoSomething();
    }
    using (var bar = new Activity("bar"))
    {
        DoSomething();
    }
} catch (FoobarException ex) {
    log(
        ex,
        MagicalFunctionWhichReturnsTheEnclosingActivity(ex).Name, // this is hard since activity is disposed already
        MagicalFunctionWhichReturnsTheVeryContext(ex).SpanId,
    );
}

chandramouleswaran · 2020-03-12T03:24:15Z

Can you also share the examples for creating an Activity from a parent activity ID only? Specifically around async workflows.

There are lots of use cases for us where we might spawn off an async task while the sync portion of the API needs to resume back in existing Activity context. In that case, we would still want to link the async task's activity to the API activity.

There are couple of options I can think of
Option 1 (taking inspiration from internal Ifx tools)
Create an Activity and do not dispose it.
Provide the ability to Unload it which returns to the older context.
Activity created can be passed to the task
Allow the option to Load the activity back and continue using a dispose pattern

Option 2
Get the serialized Activity data (may be just the current full ID is good enough if it has parent information)
Pass that along to the async task
Create a new activity with the serialized data as the parent
Use the dispose pattern

noahfalk · 2020-03-12T10:12:04Z

Thanks Tarek! Its a little tricky to comment on in GitHub because all the text is in the body of an issue. I'll do quotes but if the comments grow it might be helpful to put the text as a markdown document in a PR to allow inline commenting.

We also enhance and simplify the usage of the Activity

I think there is another major value prop worth calling out explicitly: it is now reasonable for a listener such as OpenTelemetry to observe Activity instrumentation from any code component in the app. Previously OpenTelemetry would have needed a-priori knowledge of the strings that would be used for DiagnosticSource name and IsEnabled() callbacks for each Activity it wanted to receive callbacks for.

The current usage pattern of the Activity is not neat and not that performant

I don't think the current proposal is making any performance improvement? Previously when we had ActivitySource that offered a faster path, but we've removed it.

Also, the listener should be more performant than the current listening way (through DiagnosticListener)

Similar to above, I don't think the current proposal is faster?

// The following pattern is useful for the Dependency Injection scenario to allow easily creating the listener with the reflection without the need to subclass and override the listener methods.
        ActivityListener listener = ActivityListener.CreateAndStart(shouldCreateActivity: (name, ContextBoundObject, links) => true, onActivityStarted: (activity) => { ... }, onActivityStopped: (activity) => { ... });

I think ContextBoundObject was unintended? Lower in the spec it is ActivityContext
The strongly typed objects in the parameters may make it tricky to use via reflection but something like expression trees might make it doable. I'll play with it.

public virtual bool ShouldCreateActivity(string activitySourceName, ActivityContext context, IEnumerable? links)

Naming suggestions:
activitySourceName -> activityName
context -> parent

For ActivityListener I assume the callback behavior is that OnActivityStarted/OnActivityStopped get invoked for every Activity, but ShouldCreateActivity only gets invoked for Activities created via Activity.StartNew? We should clarify that.

I've been thinking a bit about how we handle cases where someone is using Hierarchial ids rather than W3C ids. I think there may be an easy path where we add an overload to Activity.StartNew and ActivityListener.ShouldCreateActivity that uses a string for the id instead of the ActivityContext.

noahfalk · 2020-03-12T10:48:50Z

If it is possible to capture the exception and associate it with the Activity?

If we had a defined place on the Activity where Exception should be stored then the BCL (or any of our users) could make a helper method that lets you write code like this:

Activity.RunWithActivity("MyActivityName", activity =>
{
   ... code that does whatever work inside activity scope
   throw new Exception("oopsie");
});

That helper could be implemented something like:

class Activity
{
    public static RunWithActivity(string name, Action<Activity?> action)
    {
        try
        {
             Activity? a = Activity.StartNew(name);
             action(a);
        }
        catch(Exception e) when AssociateException(a, e)
        { Assert.Fail("This should be unreachable code") }
        finally
        {
             a?.Dispose();
        }
    }

    static bool AssociateException(Activity a, Exception e)
    {
         a.SetCustomProperty("Exception", e);
         return false; // this function gets evaluated as an exception filter, we don't want to catch the exception
    }
}

If developers perform exception handling in a central place (e.g. the evil global exception handler, or catching exception from top level code), is there a way for them to know what's the enclosing activity? (I guess no?)

In a catch handler no, but in an exception filter potentially yes. However if the exception is caught and then rethrown at some intermediate frame (as often happens with async) then the filter won't help either.

tarekgh · 2020-03-12T16:48:58Z

@noahfalk thanks for the feedback. I have updated the description according to your comments. please let me know if I missed anything there.

@chandramouleswaran thanks for the feedback too, I'll try to get back to you soon.

pakrym · 2020-03-12T16:50:25Z

Is there a fast way to check if activity should be created? Sometimes collecting context and links might be expensive, it would be nice to have a performant way to know beforehand if someone is listening on the activity.

tarekgh · 2020-03-12T17:00:33Z

@pakrym we Activity.StartNew(string name) which doesn't take a context or links.

if you are talking about the Listener side and want to avoid dealing with context/links there, we may think to add a callback to the listener which doesn't take a context/links. but I am not sure if this case concerns you.

namespace System.Diagnostics
{
    public class ActivityListener : IDisposable
    {
        ...
        public virtual bool ShouldCreateActivity(string activityName, ActivityContext parent, IEnumerable<ActivityLink>? links)
        public virtual bool ShouldCreateActivity(string activityName)
        ...   
   }
}

pakrym · 2020-03-12T17:11:06Z

As a producer of activity I want to know when any particular activity is enabled so I don't have to extract links/ActivityContext from, for example an incoming request, when no one is listening.

tarekgh · 2020-03-12T17:24:14Z

got it. would adding something like:

public static bool ActivityListener.IsEnabled(string activityName)

will it address your scenario?

pakrym · 2020-03-12T17:26:47Z

would adding something like:
public static bool ActivityListener.IsEnabled(string activityName)

Would it be as fast as DiagnosticSource.IsEnabled() check currently is (a single non-virtual field read)?

tarekgh · 2020-03-12T17:52:27Z

Would it be as fast as DiagnosticSource.IsEnabled() check currently is (a single non-virtual field read)?

If there are no registered listeners, then yes, we'll just check a static field value and if it is null we'll just return false. if there is any registered listener, we'll need to call them to know if any of the listeners interested in creating such an activity with the input name.

pakrym · 2020-03-12T17:56:01Z

Then I don't think it addresses my scenario, N virtual calls just to know if an activity should be created seems way to high for something like Kestrel.

noahfalk · 2020-03-17T00:10:25Z

in the best case but also the way it scales

Agreed it certainly scales differently. So far I can't come up with a scenario where that scaling appears to matter? If there are scenarios that are creating 100s or 1000s of Activities per request but somehow the requests are also really fast it would be an issue - but I don't forsee that happening in practice. If we had a plausible hypothetical example I might see it differently.

tarekgh · 2020-03-17T18:56:26Z

Summary to the status of the discussions

SpanContext.Tracestate will keep it as string per @reyang recommendation
SpanContext.IsValid/IsRemote will not be added until we find a value for it. per @reyang recommendation. IsValid is a trivial check (context == default). IsRemote, was recommended not to added before by Liudmila.
Added public bool TryGetContext(out ActivityContext context) to Activity class to retrieve the Activity context.
Status/Event will not be added to the Activity for now.
I added a couple of overloads that allow passing Kind, attributes, start time, context and links.
IsRecording would not be needed per @noahfalk comment: Support OpenTelemetry concepts on Activity #31373 (comment). @SergeyKanzhelev please let's know if you have a comment regarding that.
I added a setter to OperationName (to address UpdateName). @noahfalk please advise if you have any concern allowing the setter.
Removed the ActivityListener and added public static IDisposable StartListening(Func<string, ActivityContext, IEnumerable<ActivityLink>?, bool> shouldCreateActivity, Action<Activity> onActivityStarted, Action<Activity> onActivityStopped); to the Activity class.
Will not add Tracer for the mentioned complexity. @SergeyKanzhelev let's know if you have a comment about that.
@pakrym concern about the perf which @noahfalk replied.
- @davidfowl could you please help to look and advise for the ASP.NET scenario?
- I don't mind returning ActivitySource back if the perf is really a concern here and also @SergeyKanzhelev need to clarify Luidmila's comments regarding ActivitySource which mentioned here Proposed new System.Diagnostics APIs #32660 (review).

noahfalk · 2020-03-17T22:20:18Z

Thanks @tarekgh!

I added a setter to OperationName (to address UpdateName). @noahfalk please advise if you have any concern allowing the setter.

I am concerned that if we add that setter people will use it in ways that break pre-existing code. All .NET code for the past decade has been able to rely on OperationName being an immutable value. I recommend we do not add the setter and get feedback from @SergeyKanzhelev or @reyang to understand what the use case is to search for an alternate solution.

IsRecording would not be needed per @noahfalk comment: #31373 (comment).

I had an open question (to @SergeyKanzhelev and @reyang) in that portion that needs to be answered before we can resolve IsRecording: "The spec isn't clear about what the value prop is to create Spans where IsRecording is false given that they won't be reported to SpanProcessors?"

I added a couple of overloads that allow passing Kind, attributes, start time, context and links.

It is hard to provide feedback on this before the code changes are visible. Ideally new work would should up as changes to the APIs in the design doc. Changes that people are proposing but we ultimately decide not to do can show up in the Q&A section to record the request and the rationale for why it didn't happen or what the alternative is.

Earlier I commented about needing to handle the hierarchial ID case. I think we need an overload ShouldCreateActivity which uses a string instead of ActivityContext.

reyang · 2020-03-17T22:48:21Z

I am concerned that if we add that setter people will use it in ways that break pre-existing code. All .NET code for the past decade has been able to rely on OperationName being an immutable value. I recommend we do not add the setter and get feedback from @SergeyKanzhelev or @reyang to understand what the use case is to search for an alternate solution.

FYI - this PR has some context about the span name semantic. I don't think users would be blocked if there is no UpdateName. It could be a bit inconvenient as I could imagine, as we have to give users some guidance which is not aligned with other OpenTelemetry SDKs.

I had an open question (to @SergeyKanzhelev and @reyang) in that portion that needs to be answered before we can resolve IsRecording: "The spec isn't clear about what the value prop is to create Spans where IsRecording is false given that they won't be reported to SpanProcessors?"

I think the spec does not ask SDK to create a new span. Most implementations would return a dummy span singleton when we know it is not being reported to SpanProcessor.

benaadams · 2020-03-18T00:08:35Z

in the best case but also the way it scales

Agreed it certainly scales differently. So far I can't come up with a scenario where that scaling appears to matter? If there are scenarios that are creating 100s or 1000s of Activities per request but somehow the requests are also really fast it would be an issue. If we had a plausible hypothetical example I might see it differently.

Isn't it that entire design though? This is tracing through a latency chain of many components not a single request; its N requests occurring sequentially so is directly N x additive to the latency.

tarekgh · 2020-03-18T01:55:11Z

@noahfalk I have changed the design doc to reflect the latest changes. we may need to add more bullets in the Q & A section to capture the updates in the discussion here.

noahfalk · 2020-03-18T02:50:01Z

[@reyang] FYI - this PR has some context about the span name semantic. I don't think users would be blocked if there is no UpdateName. It could be a bit inconvenient as I could imagine, as we have to give users some guidance which is not aligned with other OpenTelemetry SDKs.

Thanks! Reading thought the PR I still felt like I was grasping at the scenario where a user would need to call UpdateName(). The closest explanation I could find was this sentence at the end "Instrumentation MUST NOT default to using URI path as span name, but MAY provide hooks to allow custom logic to override the default span name." If I follow correctly the worry was that the code creating the Span either can't access the URI or doesn't know what portions of it are dynamic. Some code later on does have that information and wants to use it to improve descriptive text shown in the UI. If that is accurate it suggests there are two different name concepts being squeezed into one property, the initial one is for classification and sampling within the app, and the later improved one is only intended to aid end-user understanding. If we wanted to model that in .Net my first thought would be to have two properties, Activity.OperationName and Activity.DisplayName. The default value of DisplayName would be identical to OperationName, but it can be changed. However this feels like a place we'd want to engage on the broader spec rather than coming up with our scheme in isolation. I agree that the best path in the immediate future would be to leave it out.

[@reyang] I think the spec does not ask SDK to create a new span. Most implementations would return a dummy span singleton when we know it is not being reported to SpanProcessor.

Cool, then it sounds like there is no issue if .Net returns a null Activity to represent this case rather than a dummy.

[@benaadams ] Isn't it that entire design though? This is tracing through a latency chain of many components not a single request; its N requests occurring sequentially so is directly N x additive to the latency.

By single request I am referring to a single trace-id from the point it enters the .Net Core app until they point we send a response message that completes the request. Certainly it is possible the code that implements this request could have many sequential steps within it, including the creation of many Activities. However I don't think there is a scenario where the implementation both creates a large number of Activities and it is already extremely fast? For example do we expect a scenario where a request completes in 100us needs 100 Activities to instrument all of its dependencies? Being fast is usually mutually exclusive with doing a large amount of work.

@noahfalk I have changed the design doc to reflect the latest changes. we may need to add more bullets in the Q & A section to capture the updates in the discussion here.

Thanks!

SergeyKanzhelev · 2020-03-18T23:32:02Z

I don't know how to comment on this thread. I'll try. Pretty-please, can we switch to google doc?

UpdateName is absolutely needed. Asp.Net Core listener wouldn't work without it: https://github.com/open-telemetry/opentelemetry-dotnet/blob/633dda37bdb171f337596ba4b53021b5d35f117c/src/OpenTelemetry.Collector.AspNetCore/Implementation/HttpInListener.cs#L148
IsRecording vs. no activity allocation. I commented before that not creating activity breaks propagation of the context. Which will break user scenarios. So we need Activity to be created or at least context propagated. Thus it needs some indication on whether recording enabled or not.
Having specific callback for sampling instead of having the sampling decision made in creation callback is not aligned with OpenTelemetry spec. Some features like attributes returned by Sampler to be associated with the Span will simply not work.

noahfalk · 2020-03-19T08:32:10Z

Pretty-please, can we switch to google doc?

I agree a single inline thread is hard to follow - I am already trying to move discussion to the PR which does allow for inline annotations and replies to different threads of conversation. I think this is similar to properties we'd get from a google doc. I don't really want to move yet again to google doc because (a) it is different than the convention every other .NET design issue uses, (b) every time we move the discussion forum it breaks the flow and makes it that much harder to understand the history or accumulate issues in one place.

It is very tempting (I am guilty too) to keep replying back to this issue. To try and break that pattern I copied each of the issues you raised to comments in relevant portions of the PR and responded there instead.

SergeyKanzhelev · 2020-03-19T16:42:47Z

@noahfalk yes, PR in designs repo is OK.

tarekgh · 2020-03-29T02:05:37Z

Just FYI, I have updated the design proposal dotnet/designs#98 to reflect the latest discussions and decisions.

chandramouleswaran · 2020-04-03T17:48:24Z

@chandramouleswaran thanks for the feedback too, I'll try to get back to you soon.

Any updates here?

tarekgh · 2020-04-03T18:12:37Z

@chandramouleswaran, sorry for the late reply I have missed this one unintentionally. thanks for the reminder.

ActivitySource.StartActivity overload APIs provide multiple options to specify the parent context.

        /// Creates a new Activity object if there is any listener to the Activity, returns null otherwise.
        public Activity? StartActivity(string name, ActivityKind kind = ActivityKind.Internal);

        /// Creates a new <see cref="Activity"/> object if there is any listener to the Activity events, returns null otherwise.
        public Activity? StartActivity(
                            string name,
                            ActivityKind kind,
                            ActivityContext parentContext, IEnumerable<KeyValuePair<string, string?>>? tags = null,
                            IEnumerable<ActivityLink>? links = null,
                            DateTimeOffset startTime = default);

        /// Creates a new <see cref="Activity"/> object if there is any listener to the Activity events, returns null otherwise.
        public Activity? StartActivity(
                            string name,
                            ActivityKind kind,
                            string parentId,
                            IEnumerable<KeyValuePair<string, string?>>? tags = null,
                            IEnumerable<ActivityLink>? links = null,
                            DateTimeOffset startTime = default);

Here are the options:

If you use the first overload which not providing ActivityContext nor parent Id, we'll create the Activity and have the parent pointing at the existing Activity.Current. This should address the async scenario as Activity.Current is defined as AsyncLocal which transfers with the context of the async call.
If you use the second overload, you can provide the parent ActivityContext (which includes the parent trace Id, span Id, flags and state). You have the control to set the parent context to the newly created Activity.
If you use the third overload, you can create the Activity object using the parent Id only.

Please let me know if you have more questions or anything unclear here.

tarekgh · 2020-04-14T20:41:22Z

dotnet/apireviews#117

terrajobst · 2020-04-17T17:02:05Z

(marked as approved because that's what we decided in the notes but I didn't update the item during the review)

tarekgh · 2020-05-18T20:53:28Z

Use new Activity to Replace OT Span: open-telemetry/opentelemetry-dotnet#660

tarekgh assigned noahfalk Nov 2, 2019

msftgits transferred this issue from dotnet/corefx Feb 1, 2020

msftgits added this to the Future milestone Feb 1, 2020

maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020

noahfalk assigned tarekgh and unassigned noahfalk Mar 3, 2020

tarekgh added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed enhancement Product code improvement that does NOT require public API changes/additions untriaged New issue has not been triaged by the area owner labels Mar 11, 2020

tarekgh modified the milestones: Future, 5.0 Mar 11, 2020

noahfalk changed the title ~~Make Activity and OpenTelemetry Span the same~~ Support OpenTelemetry Span operations on Activity Mar 12, 2020

noahfalk mentioned this issue Mar 12, 2020

Tracing and Counters Interest Group - Announcements #13019

Open

noahfalk changed the title ~~Support OpenTelemetry Span operations on Activity~~ Support OpenTelemetry concepts on Activity Mar 12, 2020

NickDarvey mentioned this issue Mar 12, 2020

Distributed tracing and Orleans dotnet/orleans#4992

Closed

gregkalapos mentioned this issue Apr 1, 2020

Support one or more of OpenTracing, OpenCensus, OpenTelemetry technologies elastic/apm-agent-dotnet#303

Closed

tarekgh added api-ready-for-review and removed api-needs-work API needs work before it is approved, it is NOT ready for implementation labels Apr 2, 2020

tarekgh mentioned this issue Apr 8, 2020

Distributed Context in ASP.NET and Open Telemetry #31372

Closed

3 tasks

reyang mentioned this issue Apr 9, 2020

Exception Handling best practice reyang/test#6

Open

terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review labels Apr 17, 2020

tarekgh mentioned this issue Apr 20, 2020

System.Diagnostics Tracing APIs #35220

Merged

tarekgh closed this as completed in #35220 Apr 23, 2020

tarekgh mentioned this issue Apr 23, 2020

System.Diagnostics Tracing APIs (#35220) #35326

Merged

kzu mentioned this issue May 28, 2020

Adding support for end-to-end activity tracing microsoft/vs-streamjsonrpc#470

Closed

tarekgh mentioned this issue Jun 11, 2020

Improve Activity API usability and OpenTelemetry integration (Part 2) dotnet/designs#134

Merged

tarekgh mentioned this issue Aug 3, 2020

System.Diagnostics.ActivitySource API Addition #40046

Closed

dotnet locked as resolved and limited conversation to collaborators Dec 11, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support OpenTelemetry concepts on Activity #31373

Support OpenTelemetry concepts on Activity #31373

tarekgh commented Nov 2, 2019 •

edited

noahfalk commented Mar 3, 2020

reyang commented Mar 12, 2020

tarekgh commented Mar 12, 2020 •

edited

reyang commented Mar 12, 2020 •

edited

chandramouleswaran commented Mar 12, 2020

noahfalk commented Mar 12, 2020

noahfalk commented Mar 12, 2020

tarekgh commented Mar 12, 2020

pakrym commented Mar 12, 2020

tarekgh commented Mar 12, 2020

pakrym commented Mar 12, 2020

tarekgh commented Mar 12, 2020

pakrym commented Mar 12, 2020

tarekgh commented Mar 12, 2020

pakrym commented Mar 12, 2020

noahfalk commented Mar 17, 2020

tarekgh commented Mar 17, 2020 •

edited

noahfalk commented Mar 17, 2020

reyang commented Mar 17, 2020

benaadams commented Mar 18, 2020

tarekgh commented Mar 18, 2020

noahfalk commented Mar 18, 2020 •

edited

SergeyKanzhelev commented Mar 18, 2020

noahfalk commented Mar 19, 2020

SergeyKanzhelev commented Mar 19, 2020

tarekgh commented Mar 29, 2020

chandramouleswaran commented Apr 3, 2020

tarekgh commented Apr 3, 2020 •

edited

tarekgh commented Apr 14, 2020

terrajobst commented Apr 17, 2020

tarekgh commented May 18, 2020 •

edited

Support OpenTelemetry concepts on Activity #31373

Support OpenTelemetry concepts on Activity #31373

Comments

tarekgh commented Nov 2, 2019 • edited

noahfalk commented Mar 3, 2020

reyang commented Mar 12, 2020

tarekgh commented Mar 12, 2020 • edited

reyang commented Mar 12, 2020 • edited

chandramouleswaran commented Mar 12, 2020

noahfalk commented Mar 12, 2020

noahfalk commented Mar 12, 2020

tarekgh commented Mar 12, 2020

pakrym commented Mar 12, 2020

tarekgh commented Mar 12, 2020

pakrym commented Mar 12, 2020

tarekgh commented Mar 12, 2020

pakrym commented Mar 12, 2020

tarekgh commented Mar 12, 2020

pakrym commented Mar 12, 2020

noahfalk commented Mar 17, 2020

tarekgh commented Mar 17, 2020 • edited

Summary to the status of the discussions

noahfalk commented Mar 17, 2020

reyang commented Mar 17, 2020

benaadams commented Mar 18, 2020

tarekgh commented Mar 18, 2020

noahfalk commented Mar 18, 2020 • edited

SergeyKanzhelev commented Mar 18, 2020

noahfalk commented Mar 19, 2020

SergeyKanzhelev commented Mar 19, 2020

tarekgh commented Mar 29, 2020

chandramouleswaran commented Apr 3, 2020

tarekgh commented Apr 3, 2020 • edited

tarekgh commented Apr 14, 2020

terrajobst commented Apr 17, 2020

tarekgh commented May 18, 2020 • edited

tarekgh commented Nov 2, 2019 •

edited

tarekgh commented Mar 12, 2020 •

edited

reyang commented Mar 12, 2020 •

edited

tarekgh commented Mar 17, 2020 •

edited

noahfalk commented Mar 18, 2020 •

edited

tarekgh commented Apr 3, 2020 •

edited

tarekgh commented May 18, 2020 •

edited