Telemetry: publish events for each actor create / restart / terminate #6293

Closed
Aaronontheweb opened this issue Dec 6, 2022 Discussed in #5587 · 1 comment
Labels
akka.net v1.4 Issues affecting Akka.NET v1.4 akka-actor enhancement
Milestone

Comments

@Aaronontheweb
Member

Discussed in #5587

Originally posted by Aaronontheweb February 7, 2022
Is your feature request related to a problem? Please describe.

I've been thinking about this in a few areas:

  1. Phobos - right now we use invocation counters on actor starts / stops to track actor churn. In platforms like Prometheus we compute the delta between start and stop events to get the total number of live actors. Once an OTel UpDownCounter is available (not yet implemented in .NET), we can try using it to report the total directly, which may be more accurate.
  2. Petabridge.Cmd - similar to how we report on shard counts right now, I'd love to update the actor commands to show a breakdown of how many actors are alive by hierarchy position and type.
  3. Homespun telemetry solutions - right now most of these are built on the logging system or on subscriptions to various data channels inside Akka.NET (e.g. Cluster.Subscribe), but the actual structure of the actor hierarchy itself is difficult to capture that way.
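
The delta computation from item 1, broken down by actor type as in item 2, could be sketched roughly as below. This is a minimal illustration only: `LiveActorCounter` and its members are hypothetical names, not part of Akka.NET, Phobos, or Petabridge.Cmd, and the sketch assumes it is called from a single thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical aggregator (not an Akka.NET or Phobos type).
// Keeps separate start and stop counters per actor type and derives
// the number of live actors as the delta between the two.
public sealed class LiveActorCounter
{
    private readonly Dictionary<Type, long> _started = new Dictionary<Type, long>();
    private readonly Dictionary<Type, long> _stopped = new Dictionary<Type, long>();

    public void RecordStart(Type actorType) => Increment(_started, actorType);

    public void RecordStop(Type actorType) => Increment(_stopped, actorType);

    // Live actors of one type = starts - stops for that type.
    public long LiveCount(Type actorType)
    {
        _started.TryGetValue(actorType, out var started);
        _stopped.TryGetValue(actorType, out var stopped);
        return started - stopped;
    }

    // Live actors across all types = total starts - total stops.
    public long TotalLive()
    {
        long total = 0;
        foreach (var count in _started.Values) total += count;
        foreach (var count in _stopped.Values) total -= count;
        return total;
    }

    private static void Increment(Dictionary<Type, long> counters, Type key)
    {
        counters.TryGetValue(key, out var current); // current is 0 when the key is missing
        counters[key] = current + 1;
    }
}
```

Keeping starts and stops as two monotonic counters mirrors the Prometheus-style approach described above; an up/down counter would collapse them into a single value.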

In particular, some of the work I've been doing around Phobos 2.0 has led me to wonder whether it makes sense to emit some basic telemetry around actor creation inside Akka.NET itself.

Describe the solution you'd like
I'll have to flesh this out in more detail, but the basic idea is to add three EventStream events that can be subscribed to:

public readonly struct ActorStarted
{
    public IActorRef Ref { get; } // can be used for topological sort / correlation
    public Type ActorType { get; }
}

public readonly struct ActorRestarted
{
    public IActorRef Ref { get; } // can be used for topological sort / correlation
    public Type ActorType { get; }
    public Exception CrashReason { get; }
}

public readonly struct ActorTerminated
{
    public IActorRef Ref { get; } // can be used for topological sort / correlation
    public Type ActorType { get; }
    public TerminatedReason Reason { get; } // stopped, parent decided, system shutting down
}

Having a TerminatedReason would be really helpful - implementing that alone would be a bit of work, but I think it's achievable.
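
For illustration, a subscriber for these events might look something like the sketch below. It assumes the event shapes from the proposal above, redeclared here with constructors inside a hypothetical `TelemetrySketch` namespace so the example stands alone; the `TerminatedReason` values are guessed from the comment in the proposal, and `ActorCountListener` is an invented name. The only Akka.NET APIs used are `UntypedActor` and `EventStream.Subscribe` / `Unsubscribe`. Nothing here should be read as the API that ships.

```csharp
using System;
using System.Collections.Generic;
using Akka.Actor;

namespace TelemetrySketch
{
    // Hypothetical: values taken from the comment on ActorTerminated.Reason.
    public enum TerminatedReason { Stopped, ParentDecided, SystemShuttingDown }

    // Proposed event shape, redeclared locally for the sketch.
    public readonly struct ActorStarted
    {
        public ActorStarted(IActorRef actorRef, Type actorType)
        {
            Ref = actorRef;
            ActorType = actorType;
        }

        public IActorRef Ref { get; }
        public Type ActorType { get; }
    }

    // Proposed event shape, redeclared locally for the sketch.
    public readonly struct ActorTerminated
    {
        public ActorTerminated(IActorRef actorRef, Type actorType, TerminatedReason reason)
        {
            Ref = actorRef;
            ActorType = actorType;
            Reason = reason;
        }

        public IActorRef Ref { get; }
        public Type ActorType { get; }
        public TerminatedReason Reason { get; }
    }

    // Hypothetical listener that tracks live actors per type.
    public sealed class ActorCountListener : UntypedActor
    {
        private readonly Dictionary<Type, int> _alive = new Dictionary<Type, int>();

        protected override void PreStart()
        {
            // Subscribe this actor to both event types on the system EventStream.
            var stream = Context.System.EventStream;
            stream.Subscribe(Self, typeof(ActorStarted));
            stream.Subscribe(Self, typeof(ActorTerminated));
        }

        protected override void PostStop()
        {
            // Drop all of this actor's subscriptions.
            Context.System.EventStream.Unsubscribe(Self);
        }

        protected override void OnReceive(object message)
        {
            switch (message)
            {
                case ActorStarted started:
                    Adjust(started.ActorType, +1);
                    break;
                case ActorTerminated terminated:
                    Adjust(terminated.ActorType, -1);
                    break;
                default:
                    Unhandled(message);
                    break;
            }
        }

        private void Adjust(Type actorType, int delta)
        {
            _alive.TryGetValue(actorType, out var current); // 0 when the type is unseen
            _alive[actorType] = current + delta;
        }
    }
}
```

Because each event carries `Ref`, a listener like this could also group counts by `Ref.Path` to approximate the hierarchy breakdown mentioned earlier.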

@Aaronontheweb Aaronontheweb added enhancement akka-actor akka.net v1.4 Issues affecting Akka.NET v1.4 labels Dec 6, 2022
@Aaronontheweb Aaronontheweb added this to the 1.4.47 milestone Dec 6, 2022
Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this issue Dec 6, 2022
Aaronontheweb added a commit that referenced this issue Dec 8, 2022
* added initial actor telemetry for #6293

* added basic telemetry tests for local actors

* added spec to validate that `RemoteActorRef` doesn't influence counters

* updated `SpawnActorBenchmarks` to include telemetry impact

* converted telemetry events into `sealed class`es with `internal` constructors

* removed `Reason`
@Aaronontheweb
Member Author

closed via #6294

Arkatufus pushed a commit to Arkatufus/akka.net that referenced this issue Dec 9, 2022
(cherry picked from commit 7f68c48)
Aaronontheweb added a commit that referenced this issue Dec 9, 2022
* add simple actor telemetry (#6294)

(cherry picked from commit 7f68c48)

* Update API Verify list

* Fix API verify list

* Fix API verify list

* fix API Verify list

Co-authored-by: Aaron Stannard <aaron@petabridge.com>