Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a way to the ActivityListeners to add more data to the Activity when it gets created #40339

Closed
tarekgh opened this issue Aug 4, 2020 · 30 comments · Fixed by #40534
Closed
Labels
api-approved API was approved in API review, it can be implemented area-System.Diagnostics.Tracing
Milestone

Comments

@tarekgh
Copy link
Member

tarekgh commented Aug 4, 2020

Background and Motivation

We have exposed the following API in the ActivitySource class to create and start the activity.

    public Activity? StartActivity(string name, ActivityKind kind, ActivityContext parentContext, IEnumerable<KeyValuePair<string, object?>>? tags = null, IEnumerable<ActivityLink>? links = null, DateTimeOffset startTime = default)

If there is any listener to the Activity events, the parameters passed to StartActivity will be sent to the listener to decide if want to create the Activity object and with what initial contents.
OpenTelemetry samplers create a listener to sample in or out the Activities. We have got a new OpenTelemetry sampler scenario which is, besides deciding to sample in or out, it also needs to add some more tags to the activity when it gets created. That is means we'll need to have a way in our APIs to allow the listeners to pass back some more data we include in the Activity object we'll create later.

Currently, the callback to the listener has the signature:

    public delegate ActivityDataRequest GetRequestedData<T>(ref ActivityCreationOptions<T> options);

Where ActivityCreationOptions is a struct that wraps the parameters passed to StartActivity. Currently ActivityCreationOptions is a read-only struct. The proposal here is to make it read/write struct and adding some extra properties the listeners/sampler can use to stick the extra data (e.g. the extra tags).
When having ActivityCreationOptions as read/write, this will allow us in the future support communicating more data.

Also, we previously exposed ActivityListener.AutoGenerateRootContextTraceId property to allow the listener to communicate if need automatically generating a trace id when we have a root parent. It could be not obvious the purpose of this property when looking at the interfaces. So, we decided to enhance this after having ActivityCreationOptions became read/write. The idea is to get rid of the property AutoGenerateRootContextTraceId in the listener and then have a TraceId property on the ActivityCreationOptions. The property will be a getter only and internally we'll encapsulate the logic of generating a new trace Id when needed.

Last, we are proposing to enhance some of the names used in our APIs.

Proposed API

namespace System.Diagnostics
{
    public struct ActivityCreationOptions<T> // We removed the readonly specifier.
    {
        // the getter will automatically create a collection if not created before. 
        // The collection will be used by the listener to add more tags
        public ActivityTagsCollection SamplingTags { get; } 

        // a replacement for ActivityListener.AutoGenerateRootContextTraceId
        public ActivityTraceId TraceId { get; } 
    }
}

We are removing the property

namespace System.Diagnostics
{
    public sealed class ActivityListener : IDisposable
    {
        public bool AutoGenerateRootContextTraceId { get; set;}
    }
}

We are renaming the following methods/types:

ActivityListener.GetRequestedDataUsingParentId to SampleUsingParentId

ActivityListener.GetRequestedDataUsingContext to Sample

The enum ActivityDataRequest to ActivitySamplingResult

The delegate GetRequestedData to SampleActivity

Before renaming, we had the following:

    public delegate ActivityDataRequest GetRequestedData<T>(ref ActivityCreationOptions<T> options);

    public sealed class ActivityListener : IDisposable
    {
        public GetRequestedData<string>? GetRequestedDataUsingParentId { get; set; }
        public GetRequestedData<ActivityContext>? GetRequestedDataUsingContext { get; set; }
    }

After renaming we'll have:

    public delegate ActivitySamplingResult SampleActivity<T>(ref ActivityCreationOptions<T> options);

    public sealed class ActivityListener : IDisposable
    {
        public SampleActivity<string>? SampleUsingParentId { get; set; }
        public SampleActivity<ActivityContext>? Sample { get; set; }
    }
@tarekgh tarekgh added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Aug 4, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Diagnostics.Tracing untriaged New issue has not been triaged by the area owner labels Aug 4, 2020
@ghost
Copy link

ghost commented Aug 4, 2020

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
See info in area-owners.md if you want to be subscribed.

@tarekgh tarekgh removed the untriaged New issue has not been triaged by the area owner label Aug 4, 2020
@tarekgh tarekgh added this to the 5.0.0 milestone Aug 4, 2020
@tarekgh
Copy link
Member Author

tarekgh commented Aug 4, 2020

@tarekgh
Copy link
Member Author

tarekgh commented Aug 4, 2020

CC @lmolkova @noahfalk @cijothomas

This proposal is pending the other renames, will try to add it shortly.

@cijothomas
Copy link
Contributor

@CodeBlanch to share his ideas.

ActivityDataRequest -> ActivitySamplingResult
public delegate ActivityDataRequest GetRequestedData -> public delegate ActivitySamplingResult GetRequestedSampling

Do consider the following as well:
Activity.IsAllDataRequested -> Activity.Sampled

@tarekgh
Copy link
Member Author

tarekgh commented Aug 4, 2020

@cijothomas you suggestions looks reasonable to me. do you think the enum fields names are still ok?

    public enum ActivityDataRequest
    {
        None,
        PropagationData,
        AllData,
        AllDataAndRecorded
    }

@noahfalk
Copy link
Member

noahfalk commented Aug 4, 2020

Do consider the following as well:
Activity.IsAllDataRequested -> Activity.Sampled

OpenTelemetry uses the term Sampled as well, but it has a different meaning than what ours would have if we renamed this way:

  • Span.Context.TraceFlags == SampleFlag means Activity.IsRecorded
  • Span.IsRecording means Activity.IsAllDataRequested

I'd recommend we not use the term 'Sampled' because redefining its meaning would add further confusion.

@tarekgh
Copy link
Member Author

tarekgh commented Aug 4, 2020

what about:

PropagationData -> Propagation
AllData -> Logging
AllDataAndRecorded -> Exporting

and ActivityDataRequest would be something like ActivityBehavior or something similar.
Activity.IsAllDataRequested would be Activity.IsExporting?

properly I confused you more now :-)

@macrogreg
Copy link

macrogreg commented Aug 4, 2020

About adding the SamplingTags property:

Thank you for spotting this, adding public ActivityTagsCollection SamplingTags { get; } is great and it will allow the sampler to attach some information to the Activity.

  • If we are doing this, could we please also add public IDictionary<string, object> SamplingCustomProperties { get; } with similar semantics.

(Especially, in the PropagationData-case, adding tags may be "surprising", but customer properties may make sense from the perspective a the monitoring solution.)

This would support samplers that prefer to attach something more than a tag to the activity.

An alternative to consider would be to allow a sampler to add a post-creation callback:

    public struct ActivityCreationOptions<T>
    {
        public ActivityTraceId TraceId { get; } 
        public Action<Activity> InitializationCallback { get; set; } 
    }

it would be then invoked if the sampler set it to something other than null (and if the activity was actually created):

listner.GetRequestedDataUsingContext = (options)
    {
        options.InitializationCallback = (activity) =>
        {
            activity.SetCustomProperty("key", new MyType());
        }

        return ActivityDataRequest.PropagationData;
    }

@macrogreg
Copy link

macrogreg commented Aug 5, 2020

About the "sampling" terminology:

Monitoring solutions may want to use data reduction techniques other than sampling. They may preferentially treat traces that originated at a particular upstream component, traces that were marked in some special way by some upstream component, or generally "filter" in some non-purely=probabilistic way.

Would it make sense to move away from the "sampling" terminology and use "DataReduction" or similar instead, and leave it out where possible?

E.g.:

    public struct ActivityCreationOptions<T>
    {
        public ActivityTraceId TraceId { get; } 

        public ActivityTagsCollection Tags { get; } 
        public IDictionary<string, object> CustomProperties { get; }
    }

or if we pick the other way:

    public struct ActivityCreationOptions<T>
    {
        public ActivityTraceId TraceId { get; } 

        public Action<Activity> InitializationCallback { get; set; } 
    }

and

ActivityListener.ApplyDataReductionUsingParentId . . .

ActivityListener.ApplyDataReduction . . .

ActivityDataRequest could be ActivityDataReductionPolicy

@tarekgh
Copy link
Member Author

tarekgh commented Aug 5, 2020

@macrogreg

If we are doing this, could we please also add public IDictionary<string, object> SamplingCustomProperties { get; } with similar semantics.

Is this a real scenario we are facing today? I am asking because the bar now is very high and would be good to hurry avoid APIs which we didn't spend enough time looking at. The proposed APIs in the issue is a blocking OpenTelemetry scenario, that is why we are considering it now.

If you can tell more about your scenario we can look at it if it is really blocking. otherwise, I would suggested to track this for next release. Thanks.

@tarekgh
Copy link
Member Author

tarekgh commented Aug 5, 2020

I forgot to comment on

 public Action<Activity> InitializationCallback { get; set; } 

I would avoid doing this one for perf reason. This will make the Activity creation is heavy process when listeners exist. remember we can have multiple listeners. Also, we can add this later if proved is needed.
As a workaround, maybe you can listen to the Activity Start event and then add any needed custom property which almost will be same as your callback.

@cijothomas
Copy link
Contributor

I'll keep sampling instead of DataReduction, as OpenTelemetry is using the term sampling.
https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#sampler

@macrogreg
Copy link

@tarekgh

perf reason . . . will make the Activity creation is heavy process when listeners exist

Agreed, just wanted to highlight the "felxible" alternative.

Is this a real scenario we are facing today?

The scenario described in open-telemetry/opentelemetry-dotnet#953 is "comply to the spec". That is valid. Yet, thinking beyond that for a moment and considering what did the spec actually want to achieve in spirit:

Potentially one could create some custom way to pass state between GetRequestedData and StartActivity and set the tags in StartActivity. But it would be complex and, potentially slow. So the spec aims to allow a sampler (data reducer) to attach the "appropriate" state to the span (activity) at the "right" time, so that it can be carried with it.

The scenario here is the same - stay in the spirit with the spec and allow attaching the state at the "right" time. Consider a data reducer that makes a non-trivial decision based on some parameters. The monitoring vendor may expect that this data is serialized along with the span (activity) metadata in some special, structured way. For this, the sampler needs to attach it to the activity. Such structured metadata may include sampling rates and data that influenced the sampling decision and may be relevant to the user while consuming the span information.

On a meta-level: if it is too late to add these scenarios, that makes sense. However, if we decide to address the gap identified, and thus change the ActivityCreationOptions struct: if the struct is being changed, would it not be better to do it in the best way we can at this time? If the change does occur - how changing one line is better/safer than two lines?

@tarekgh
Copy link
Member Author

tarekgh commented Aug 5, 2020

@macrogreg custom properties are not part of the specs at all. This is .NET thingy to allow users attach anythings to the Activity.

If the change does occur - how changing one line is better/safer than two lines?

I want to tell we are changing the APIs now to allow the blocking scenarios. Think about, if we got the APIs right in the first time, we wouldn't change it now. This is the purpose of having preview builds for us to collect feedback on the APIs before we ship it. Adding anything now is risky as we'll not have much chance to fix anything after that. That means we need to address the blocking scenario and not just adding any API we think it is useful. I see the API you are proposing is useful but I am not seeing it urgent or blocking to get it now. Getting feedback from this release will be make us confident on adding such API or it can prove the opposite that it better not adding it.

In general in current time, if any API proposal is not urgent or blocking, I would prefer pushing it to the next release. notice I am not pushing back on the API itself more than just the timing.

@macrogreg
Copy link

@tarekgh

custom properties are not part of the specs at all. This is .NET thingy to allow users attach anythings to the Activity.

True. I was approaching the whole thing from the .NET perspective more than from the OTel perspective. :)

That means we need to address the blocking scenario and not just adding any API we think it is useful.

Agree. All I am doing is to provide a data point, and - of course - the decision is up to the .NET team.
In my opinion the original scenario in this thread s not the OTel compliance. OTel compliance is merely an example. The actual scenario is that the sampler (aka data reducer) wants to add "stuff" to the activity if it ends up being created. I believe that is it best to either punt this complete scenario for later (no change now), of fix it well, for all kinds of "stuff" that may belong on the activity. However, the proposed solution stands in the middle and does not either.

Honestly, if I owned this, I would consider fixing the terminology (because it is very confusing now and cannot be fix later), but not adding any functional changes. Instead, I would do that changes in a later release. But you likely have a more holistic perspective on this. :) So all I am saying is that he proposed approach feels half-way and either full-way or no-way seems preferable, :)

@CodeBlanch
Copy link
Contributor

Proposed API looks good to me! I'm happy we came around to doing a bit of renaming. GetRequestedDataUsingContext was really super confusing. Lead to some funny chats though 🤣

I like @tarekgh's GetSamplingResult & @cijothomas's ActivitySamplingResult.

I would personally vote for GetRequestedDataUsingContext -> ShouldSample because we already have ShouldListenTo on ActivityListener. The one mistake everyone makes is they don't implement those two complicated scary callbacks and then their listeners go quiet. Everyone will implement ShouldListenTo so having ShouldSample in there might help with that situation?

If we rename ActivityDataRequest -> ActivitySamplingResult maybe the enums will work better as actions?

    public enum ActivitySamplingResult
    {
        None,
        Propagate,
        Populate,
        PopulateAndRecord
    }

@tarekgh
Copy link
Member Author

tarekgh commented Aug 5, 2020

@macrogreg

All I am doing is to provide a data point

We appreciate that. We have the proposal to hear all opinions.

In my opinion the original scenario in this thread s not the OTel compliance.

The comment open-telemetry/opentelemetry-dotnet#953 (comment) is saying otherwise. I'll let @lmolkova and @cijothomas comment on that.

I would consider fixing the terminology

Agree. this is what we are trying to do here too.

but not adding any functional changes

From the issue open-telemetry/opentelemetry-dotnet#953 this looks blocking to me and we need to provide something to make this scenario work nicely.

@tarekgh
Copy link
Member Author

tarekgh commented Aug 5, 2020

@CodeBlanch

I would personally vote for GetRequestedDataUsingContext -> ShouldSample

We also have this delegate, what you suggest naming it?

public delegate ActivityDataRequest GetRequestedData<T>(ref ActivityCreationOptions<T> options);

and GetRequestedDataUsingParentId?

@macrogreg
Copy link

I'll keep sampling instead of DataReduction, as OpenTelemetry is using the term sampling.
https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#sampler

True. But is the goal to pick the best term, or to use the same words as OTel does? .NET already uses different terms at many places.
DataReduction may or may not be the better term over Sampling; I'd prefer to focus on that rather than to simply try to use the OTel terms.

@CodeBlanch
Copy link
Contributor

@tarekgh I forgot about the delegate! How about this?

    public enum ActivitySamplingResult
    {
        None,
        Propagate,
        Populate,
        PopulateAndRecord
    }

    public delegate ActivitySamplingResult GetSamplingResult<T>(ref ActivityCreationOptions<T> options);

    public GetSamplingResult<string>? ShouldSampleParentId { get; set; }
    public GetSamplingResult<ActivityContext>? ShouldSample { get; set; }

Something a bit more provocative...

    public enum ActivitySamplingResult
    {
        None,
        Propagate,
        Populate,
        PopulateAndRecord
    }

    public delegate ActivitySamplingResult GetSamplingResult(ref ActivityCreationOptions<ActivityContext> options);

    public Func<string, ActivityContext?>? ParseCustomParentId { get; set; }
    public GetSamplingResult? ShouldSample { get; set; }

What if we changed the ParentId case to be a parse/conversion to ActivityContext which we then pass to the regular ShouldSample? If user doesn't implement that part, or it returns null, we just try it as a regular W3C style.

@cijothomas
Copy link
Contributor

I would personally vote for GetRequestedDataUsingContext -> ShouldSample because we already have ShouldListenTo on ActivityListener

ShouldListenTo is bit different - it returns true/false, and the name makes it clear that its expected return type is bool.

ShouldSample - gives the feeling that the return type is bool, but the return type is more than that. Hence my preference for GetSamplingResult because we are returning SamplingResult.

@cijothomas
Copy link
Contributor

  public enum ActivitySamplingResult
    {
        None,
        Propagate,
        Populate,
        PopulateAndRecord
    }

Naming is hard for me :) Populate and Record - could also be confusing....

@cijothomas
Copy link
Contributor

I'll keep sampling instead of DataReduction, as OpenTelemetry is using the term sampling.
https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#sampler

True. But is the goal to pick the best term, or to use the same words as OTel does? .NET already uses different terms at many places.
DataReduction may or may not be the better term over Sampling; I'd prefer to focus on that rather than to simply try to use the OTel terms.

I think .NET has been closely following OTel terminology. .NET has differences because picking OTel name would have conflicting with something already existing. (eg. Activity.IsAllDataRequested instead of Activity.IsRecording as Recorded already exists and means something else)

@cijothomas
Copy link
Contributor

Btw - Our goal is to get the ability to add tags from GetRequestedData callback, to Activity, if it gets created. This enables OTel .NET SDK to implement spec compliant Sampling, and help Liudmila/others who want to leverage this to add weightage to Activity as tag from Sampler. This is a critical issue.

Everything else including renaming is not blocker.

@noahfalk
Copy link
Member

noahfalk commented Aug 5, 2020

is the goal to pick the best term, or to use the same words as OTel does?

"Best" acknowledges that aligning to OTel isn't the only goal we have, which is accurate. However aligning to OTel is significant among those goals. It indicates a community of users agreed upon an industry standard term and using the same term that others use makes understanding more likely. There would need to be a compelling argument if OTel has a term they use in the same context and we are choosing not to use it. So far the only places we felt that burden had been met is if .NET already defined a different standard term and/or the name can't be changed due to back compat.

Monitoring solutions may want to use data reduction techniques other than sampling. They may preferentially treat traces that originated at a particular upstream component, traces that were marked in some special way by some upstream component, or generally "filter" in some non-purely=probabilistic way.

Although sampling is commonly implied to be probabilistic the term does have broader usage. Sampling can also be
stratified, non-random and/or non-representative. Everything you mentioned above was still a form of sampling.

@noahfalk
Copy link
Member

noahfalk commented Aug 5, 2020

I'm flexible, but given all the options I've seen so far this is what I would pick + rationale:

    public enum ActivitySamplingResult
    {
        None,
        PropagationData,
        AllData,
        AllDataAndRecorded
    }

    public delegate ActivitySamplingResult SampleActivity<T>(ref ActivityCreationOptions<T> options);

    public SampleActivity<string>? SampleUsingParentId { get; set; }
    public SampleActivity<ActivityContext>? Sample { get; set; }

Rationale:

  1. The alternate names for the ActivitySamplingResults enumerants didn't feel clearly better so I favored not making a change. Also the IsAllDataRequested property aligns with "AllData" and "AllDataAndRecorded". Changing those enumerants would have continued to cascade making the change scope yet larger.
  2. Sample vs. GetSamplingResult vs. ShouldSample - Agreed with arguments above that "ShouldXYZ" implies the return type is a boolean. 'Sample' seemed even simpler than our previous proposal 'GetSamplingResult' but I thought GetSamplingResult was just fine too.
  3. Sampling vs. DataReduction - Both terms are accurate. Sampling does include connotations of randomness, but I consider it the concensus term chosen across the industry and specifically by OTel and value that trait far higher.
  4. GetSamplingResultUsingParentId vs. ParseCustomParentId - Parse assumes that the result can be represented as an ActivityContext but that isn't the case for a hierarchial id.
  5. Not including sampler CustomProperties dictionary - this is a late change so I agree with @tarekgh that we should set the bar at:
    a) Changes that can never be done in the future if they aren't done right now (naming or eliminating APIs)
    b) Changes that resolve OpenTelemetry blocking issues
    Trying to minimize the total number of changes over time (because we are adding some stuff now, adding more stuff later) is a non-goal. Instead we should explicitly favor adding things later if it is possible to do so because it minimizes risk that we prematurely make an irreversible API change. Minimal time for feedback and experimentation make it more likely we would discover a reason to regret our choice afterwards and simultaneously be unable to revert it.

@CodeBlanch
Copy link
Contributor

Should we add a little more detail somewhere explaining how the callbacks are used? Something like:

Before:

        /// <summary>
        /// Set or get the callback used to decide if want to listen to <see cref="Activity"/> objects events which created using <see cref="ActivitySource"/> object.
        /// </summary>
        public Func<ActivitySource, bool>? ShouldListenTo { get; set; }

After:

        /// <summary>
        /// Set or get the callback used to decide if want to listen to <see cref="Activity"/> objects events which created using <see cref="ActivitySource"/> object.
        /// </summary>
        /// <remarks>
        /// <see cref="Activity"> objects created via the selected <see cref="ActivitySource"/> objects will be run through the sampling callbacks (<see cref="Sample"/> &amp; <see cref="SampleUsingParentId"/>). Only <see cref="Activity"/> objects sampled (<see cref="ActivitySamplingResult.PropagationData"/>, <see cref="ActivitySamplingResult.AllData"/>, or <see cref="ActivitySamplingResult.AllDataAndRecorded"/>) will trigger <see cref="ActivityStarted"/> and <see cref="ActivityStopped"/> events.
        /// </summary>
        public Func<ActivitySource, bool>? ShouldListenTo { get; set; }

Just trying to nudge people into the pit of success 😉

@tarekgh
Copy link
Member Author

tarekgh commented Aug 5, 2020

Thanks all for your discussion and suggestions. I have updated the proposal in the top with what we have concluded. I'll go ahead and start the process to get this approved and implemented.

@tarekgh tarekgh added api-ready-for-review API is ready for review, it is NOT ready for implementation blocked Issue/PR is blocked on something - see comments and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Aug 5, 2020
@noahfalk
Copy link
Member

noahfalk commented Aug 6, 2020

Should we add a little more detail somewhere explaining how the callbacks are used?

I like the idea of using remarks in ShouldListenTo to guide people that Sample will probably be important for them as well : )

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation blocked Issue/PR is blocked on something - see comments labels Aug 6, 2020
@terrajobst
Copy link
Member

terrajobst commented Aug 6, 2020

Video

  • What happens when you remove the readonly specifier? Does this potentially create defensive copies.
  • If the only reason is lazy initialization it seems better keep it readonly and use Unsafe and initialize
 namespace System.Diagnostics
 {
     public readonly struct ActivityCreationOptions<T>
     {
+        public ActivityTagsCollection SamplingTags { get; }
+        public ActivityTraceId TraceId { get; }
     }
     public sealed class ActivityListener : IDisposable
     {
-        public bool AutoGenerateRootContextTraceId { get; set;}
         // Renames:
-        public GetRequestedData<string>? GetRequestedDataUsingParentId { get; set; }
-        public GetRequestedData<ActivityContext>? GetRequestedDataUsingContext { get; set; }
+        public SampleActivity<string>? SampleUsingParentId { get; set; }
+        public SampleActivity<ActivityContext>? Sample { get; set; }
     }
-    public delegate ActivityDataRequest GetRequestedData<T>(ref ActivityCreationOptions<T> options);
+    public delegate ActivitySamplingResult SampleActivity<T>(ref ActivityCreationOptions<T> options);
}

@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Diagnostics.Tracing
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants