Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New telemetry isn't user friendly #129

Closed
SophisticaSean opened this issue Jul 28, 2021 · 8 comments
Closed

New telemetry isn't user friendly #129

SophisticaSean opened this issue Jul 28, 2021 · 8 comments

Comments

@SophisticaSean
Copy link

Comparing my fork here: https://github.com/SophisticaSean/nebulex/tree/sophisticasean/support-before-after-blocks-in-decorator

In that fork, it was easy to attach to all cache events in your elixir app like such:

lib/myapp/application.ex

 # attach to Nebulex tracing
    :ok =
      :telemetry.attach(
        "nebulex-tracer-myapp-start",
        [:nebulex, :start],
        &myapp.NebulexTelemetry.handle_event/4,
        nil
      )

    :ok =
      :telemetry.attach(
        "nebulex-tracer-myapp-end",
        [:nebulex, :end],
        &myapp.NebulexTelemetry.handle_event/4,
        nil
      )

    :ok =
      :telemetry.attach(
        "nebulex-decorate-tracer-myapp-start",
        [:nebulex, :decorate, :start],
        &myapp.NebulexTelemetry.handle_event/4,
        nil
      )

    :ok =
      :telemetry.attach(
        "nebulex-decorate-tracer-myapp-end",
        [:nebulex, :decorate, :end],
        &myapp.NebulexTelemetry.handle_event/4,
        nil
      )

    :ok =
      :telemetry.attach(
        "nebulex-rpc-tracer-myapp-start",
        [:nebulex, :rpc, :start],
        &myapp.NebulexTelemetry.handle_event/4,
        nil
      )

    :ok =
      :telemetry.attach(
        "nebulex-rpc-tracer-myapp-end",
        [:nebulex, :rpc, :end],
        &myapp.NebulexTelemetry.handle_event/4,
        nil
      )

Now, with the changes in master, I have to setup telemetry for every cache, 3 or 4 times if I want to include init event:

lib/myapp/application.ex

    :ok =
      :telemetry.attach(
        "nebulex-users-start",
        [:myapp, :caches, :users_cache, :command, :start],
        &myapp.NebulexTelemetry.handle_event/4,
        nil
      )

    :ok =
      :telemetry.attach(
        "nebulex-users-stop",
        [:myapp, :caches, :users_cache, :command, :stop],
        &myapp.NebulexTelemetry.handle_event/4,
        nil
      )

    :ok =
      :telemetry.attach(
        "nebulex-users-exception",
        [:myapp, :caches, :users_cache, :command, :exception],
        &myapp.NebulexTelemetry.handle_event/4,
        nil
      )

I already have 6 caches in this rather large app, which means the telemetry attachment is over 400 lines long and I have to remember to add all the events here anytime I add a new cache.

We're also missing events entirely. I want to get telemetry events (and my fork did this) for each adapter so I can plot traces like this:

Screen Shot 2021-07-28 at 4 13 53 PM

I'm missing which adapter the telemetry event is coming from. I'm also missing which macro we're using. It would be nice to see if its cacheable or cache_put for Nebulex.Caching module.

I'm happy to make these changes, but frankly you didn't provide feedback on my last MR that had all this data and had a better external API. So I'm a bit frustrated as I love this library but I would prefer you help me help you rather than taking a stab at the functionality you think I want.

@SophisticaSean
Copy link
Author

SophisticaSean commented Jul 28, 2021

The solution to problem number 1 is to not have the telemetry event be tied to the cache. You should have the telemetry event be tied to a deterministic, static Nebulex name and send which cache it came from as part of the event data.

This:

    :ok =
      :telemetry.attach(
        "nebulex-users-start",
        [:myapp, :caches, :users_cache, :command, :start],
        &myapp.NebulexTelemetry.handle_event/4,
        nil
      )

Should become this:

    :ok =
      :telemetry.attach(
        "nebulex-users-start",
        [:myapp, :caches, :command, :start],
        &myapp.NebulexTelemetry.handle_event/4,
        nil
      )

And then the cache the telemetry event belongs to would be passed into handle_event/4 as data.

@mjezew
Copy link

mjezew commented Jul 28, 2021

in the vein of user friendliness, some documentation feedback:

  1. there is no mention of the primary cache telemetry event
  2. it would be great to clarify that the telemetry event is the path to the cache (if this does not change)
  3. I'd love to see a stricter type than map on adapter_metadata

and +1 for changing the telemetry event names.
excited to see where this discussion goes! :)

@cabol
Copy link
Owner

cabol commented Jul 29, 2021

@SophisticaSean there are several things here. First of all about:

Now, with the changes in master, I have to setup telemetry for every cache, 3 or 4 times if I want to include init event:
You should have the telemetry event be tied to a deterministic, static Nebulex name and send which cache it came from as part of the event data.

Have you checked the option :telemetry_prefix (https://github.com/cabol/nebulex/blob/master/lib/nebulex/cache.ex#L38-L46)? Nebulex follows the same approach as Ecto, if you want a deterministic/static name you just need to configure the :telemetry_prefix for all your caches – The option :telemetry_prefix is documented in the docs.

We're also missing events entirely. I want to get telemetry events (and my fork did this) for each adapter so I can plot traces like this:

The events are at the cache level, you can get the events for each cache you are using, and the cache you know always what adapted is attached to. For a cache using a partitioned or replicated, you can get the events for the cache itself as well as the primary cached used underneath. And for the multi-level cache, you can get the events for each of the levels you're using. For example, if you have a partitioned cache with a telemetry prefix set to [:my_static_name], you will get:

  • For the partitioned cache itself: [:my_static_name, :command, :start], [:my_static_name, :command, :stop], [:my_static_name, :command, : exception].
  • For the underneath primary cache: [:my_static_name, :primary, :command, :start], [:my_static_name, :primary, :command, :stop], [:my_static_name, :primary, :command, : exception].

So, I agree that there is a lack of documentation in the adapters about this, as well as more examples with Telemetry, are missing, especially for the distributed/multi-level topologies, but the ability to get the events for each cache being used is there, all built-in adapters as well as the Redis and Cachex one support and emit these events.

I'm missing which adapter the telemetry event is coming from. I'm also missing which macro we're using. It would be nice to see if its cacheable or cache_put for Nebulex.Caching module.

I understand, but this is for me something more specific, besides, it is something that can be easily implemented on top of Nebulex if you need it. Anyhow, let's keep this feature request open to discussion, but it should be a separate issue.

I'm happy to make these changes, but frankly you didn't provide feedback on my last MR that had all this data and had a better external API. So I'm a bit frustrated as I love this library but I would prefer you help me help you rather than taking a stab at the functionality you think I want.

I actually gave you some feedback, if you check the PR, you will see I replied to your PR, and dropped several comments, and then I sent you also several update messages about the implementation I was working on at that time. Anyway, I think what you what is something like Spandex but for Nebulex, is that right? If so, one of the features I'd like to provide is something like spandex_ecto but for Nebulex, implementing spandex_nebulex. WDYT? Let me know your thoughts.

@cabol
Copy link
Owner

cabol commented Jul 29, 2021

Hey @mjezew !!

  1. there is no mention of the primary cache telemetry event

You're right, that documentation is missing. I will work on it!

  1. it would be great to clarify that the telemetry event is the path to the cache (if this does not change)

This is actually explained in the docs:

  1. I'd love to see a stricter type than map on adapter_metadata

I rather also strict types, but in this case is a bit tricky, because each adapter may have or add more attributes, then if we want to keep it flexible enough, we will end up having a field to the extra attributes being a map anyways, don't you think? – If you check the adapters, the adapter_meta has some common fields, but there are also specific fields. But if you have an idea, please share it so we can discuss it (in that case we can create a separate issue).

and +1 for changing the telemetry event names.

I think this can be achieved with the :telemetry_prefix option.

@SophisticaSean
Copy link
Author

Awesome. thanks for the heads up on :telemetry_prefix. That's all working now and as a bonus upgrading to master seems to have fixed this issue for me: #113 permanently.

I think something similar to spandex_ecto would be good, but we have to get the telemetry perfect inside of nebulex first.
How do you feel about this part:

I'm missing which adapter the telemetry event is coming from. I'm also missing which macro we're using. It would be nice to see if its cacheable or cache_put for Nebulex.Caching module.

Would love to see the adapter, as primary right now means different things depending on which cache that event comes from so it'd be nice to have adapter specific metadata available in the telemetry event.

@cabol
Copy link
Owner

cabol commented Aug 26, 2021

Hey @SophisticaSean 👋 ! Sorry for the late response, I took some time off but I'm back now 😄 !

About adding I'm missing which adapter the telemetry event is coming from, since the event has the adapter metadata, extracting the adapter from there should be easy to do, for example, when you define the metric you can use the tag_values:, like so:

Telemetry.Metrics.summary(
  "my_app.cache.command.stop.duration",
  unit: {:native, :millisecond},
  tags: [:cache, :adapter, :function_name],
  tag_values: &Map.merge(&1, %{cache: &1.adapter_meta.cache, adapter: &1.adapter_meta.cache.__adapter__})
)

Let me know if that works for you.

On the other hand, about I'm also missing which macro we're using. It would be nice to see if its cacheable or cache_put for Nebulex.Caching module, I'm not sure, but let me review it, will try to get back to you soon on this one.

@cabol
Copy link
Owner

cabol commented Aug 28, 2021

I've pushed some improvements for the Telemetry documentation on the adapters (4b10c5a). Basically, I added some docs to the adapters explaining better the primary cache events, also added in the Telemetry guide the example of how to include the adapter into the metrics tags, etc.

@cabol
Copy link
Owner

cabol commented Nov 13, 2021

Closing this issue since I haven't received any feedback a long time ago, feel free to reopen it or create a new one for reporting related issues or maybe new suggestions.

@cabol cabol closed this as completed Nov 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants