[idea] Amount of magic for publishing metrics #209

Open
allada opened this issue Jul 19, 2023 · 7 comments

@allada
Member

allada commented Jul 19, 2023

Looking for feedback on how automatic publishing metrics should be.

Right now I have two approaches: a procedural but verbose one, and a magical macro one.

Here's the syntax for the two:

// FilesystemStore example.
impl<Fe: FileEntry> MetricsComponent for FilesystemStore<Fe> {
    fn gather_metrics(&self, c: &mut CollectorState) {
        c.publish(
            "read_buff_size",
            self.read_buffer_size,
            "Size of the configured read buffer size",
        );
        c.publish(
            "active_drop_spawns",
            &self.shared_context.active_drop_spawns,
            "Number of active drop spawns",
        );
        c.publish(
            "temp_path",
            &self.shared_context.temp_path,
            "Path to the configured temp path",
        );
        c.publish(
            "content_path",
            &self.shared_context.content_path,
            "Path to the configured content path",
        );
        c.publish("evicting_map", &self.evicting_map, "");
    }
}

// VerifyStore example.
impl MetricsComponent for VerifyStore {
    fn gather_metrics(&self, c: &mut CollectorState) {
        c.publish(
            "verify_size",
            self.verify_size,
            "If the verification store is verifying the size of the data",
        );
        c.publish(
            "verify_hash",
            self.verify_hash,
            "If the verification store is verifying the hash of the data",
        );
        c.publish(
            "size_verification_failures",
            &self.size_verification_failures,
            "Number of failures the verification store had due to size mismatches",
        );
        c.publish(
            "hash_verification_failures",
            &self.hash_verification_failures,
            "Number of failures the verification store had due to hash mismatches",
        );
    }
}

// MemoryStore example.
impl MetricsComponent for MemoryStore {
    fn gather_metrics(&self, c: &mut CollectorState) {
        c.publish("evicting_map", &self.evicting_map, "");
    }
}
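
For readers who want to compile the procedural style on its own, here is a minimal sketch of the supporting types. `CollectorState`, `MetricValue`, the field types of `VerifyStore`, and the string-based storage are all assumptions made for illustration; the thread does not show the real definitions.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical collector: maps a metric name to (rendered value, help text).
#[derive(Default)]
pub struct CollectorState {
    pub metrics: BTreeMap<String, (String, String)>,
}

/// Hypothetical trait for leaf values that can be rendered as a metric.
pub trait MetricValue {
    fn render(&self) -> String;
}

impl MetricValue for bool {
    fn render(&self) -> String {
        self.to_string()
    }
}

// Atomics are passed by reference and read when metrics are gathered.
impl MetricValue for &AtomicU64 {
    fn render(&self) -> String {
        self.load(Ordering::Relaxed).to_string()
    }
}

impl CollectorState {
    pub fn publish(&mut self, name: &str, value: impl MetricValue, help: &str) {
        self.metrics
            .insert(name.to_string(), (value.render(), help.to_string()));
    }
}

pub trait MetricsComponent {
    fn gather_metrics(&self, c: &mut CollectorState);
}

/// Field types are assumed; only the field names come from the example above.
pub struct VerifyStore {
    pub verify_size: bool,
    pub verify_hash: bool,
    pub size_verification_failures: AtomicU64,
    pub hash_verification_failures: AtomicU64,
}

impl MetricsComponent for VerifyStore {
    fn gather_metrics(&self, c: &mut CollectorState) {
        c.publish(
            "verify_size",
            self.verify_size,
            "If the verification store is verifying the size of the data",
        );
        c.publish(
            "verify_hash",
            self.verify_hash,
            "If the verification store is verifying the hash of the data",
        );
        c.publish(
            "size_verification_failures",
            &self.size_verification_failures,
            "Number of failures the verification store had due to size mismatches",
        );
        c.publish(
            "hash_verification_failures",
            &self.hash_verification_failures,
            "Number of failures the verification store had due to hash mismatches",
        );
    }
}
```

Publishing a nested component such as `evicting_map` would need a further `MetricValue`-like impl that recurses into the child; that part is left out of this sketch.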

The macro version would look like this:

// FilesystemStore example.
publish_metrics! {
    FilesystemStore<Fe> {
        evicting_map,
        read_buffer_size "Size of the configured read buffer size" Bytes,
        shared_context {
            active_drop_spawns "Number of active drop spawns",
            temp_path "Path to the configured temp path",
            content_path "Path to the configured content path",
        }
    }
}

// VerifyStore example.
publish_metrics! {
    VerifyStore {
        verify_size "If the verification store is verifying the size of the data",
        verify_hash "If the verification store is verifying the hash of the data",
        size_verification_failures "Number of failures the verification store had due to size mismatches",
        hash_verification_failures "Number of failures the verification store had due to hash mismatches",
    }
}

// MemoryStore example.
publish_metrics! {
    MemoryStore {
        evicting_map,
    }
}

These two examples would do exactly the same thing, except that the macro version would also make it easy to denote the type (e.g. `Bytes`), which is a bit tricky to do with the procedural one.

Thoughts?

@allada
Member Author

allada commented Jul 19, 2023

cc: @chrisstaite-menlo, @aaronmondal

@chrisstaite-menlo
Collaborator

Oh, I do love the macros...

@chrisstaite-menlo
Collaborator

chrisstaite-menlo commented Jul 19, 2023

Although, it is going to be confusing because of the "magic" self... On second thought, could you go halfway?

impl<Fe: FileEntry> MetricsComponent for FilesystemStore<Fe> {
    fn gather_metrics(&self, c: &mut CollectorState) {
        publish_metrics!(
            c,
            {
                self.read_buffer_size "Size of the configured read buffer size",
                self.shared_context.active_drop_spawns "Number of active drop spawns"
            }
        );
    }
}

@allada
Member Author

allada commented Jul 19, 2023

If I were to go that far, I'd use something like:

impl<Fe: FileEntry> MetricsComponent for FilesystemStore<Fe> {
    fn gather_metrics(&self, c: &mut CollectorState) {
        publish!(c, read_buffer_size, "Size of the configured read buffer size");
        publish!(c, active_drop_spawns, "Number of active drop spawns");
    }
}

This would allow you to also do your custom stuff when needed.
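
As a rough sketch of what a per-field `publish!` could look like as a declarative macro: to my understanding, a `macro_rules!` macro defined outside the method cannot name `self` on its own because of macro hygiene, so this variant takes the receiver as an explicit argument. That extra argument, the `CollectorState` shape, and the simplified `FilesystemStore` are assumptions for illustration, not the syntax proposed above.

```rust
use std::collections::BTreeMap;

/// Hypothetical collector: maps a metric name to (rendered value, help text).
#[derive(Default)]
pub struct CollectorState {
    pub metrics: BTreeMap<String, (String, String)>,
}

impl CollectorState {
    pub fn publish(&mut self, name: &str, value: impl ToString, help: &str) {
        self.metrics
            .insert(name.to_string(), (value.to_string(), help.to_string()));
    }
}

/// Publishes `$this.$field` under the field's own name.
/// The receiver is passed explicitly so the macro never has to name `self`.
macro_rules! publish {
    ($collector:expr, $this:expr, $field:ident, $help:expr) => {
        $collector.publish(stringify!($field), &$this.$field, $help)
    };
}

/// Simplified stand-in for the real store; field types are assumed.
pub struct FilesystemStore {
    pub read_buffer_size: usize,
    pub temp_path: String,
}

impl FilesystemStore {
    pub fn gather_metrics(&self, c: &mut CollectorState) {
        publish!(c, self, read_buffer_size, "Size of the configured read buffer size");
        publish!(c, self, temp_path, "Path to the configured temp path");
        // Custom, hand-written publishing can be mixed in freely here.
        c.publish("temp_path_len", self.temp_path.len(), "Length of the temp path");
    }
}
```

Because each call is an ordinary statement, custom logic can sit between the macro calls, which is the flexibility mentioned above.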

@allada
Member Author

allada commented Jul 19, 2023

Keep in mind, though: this won't be a normal macro. It will require a procedural macro, which is much harder to understand because it rewrites the AST.

I also considered something like this:

#[metrics]
pub struct SharedContext {
    // Used in testing to know how many active drop() spawns are running.
    // TODO(allada) It is probably a good idea to use a spin lock during
    // destruction of the store to ensure that all files are actually
    // deleted (similar to how it is done in tests).
    #[metric("Number of active drop spawns")]
    pub active_drop_spawns: AtomicU64,

    #[metric("Path to the configured temp path")]
    temp_path: String,

    #[metric("Path to the configured content path")]
    content_path: String,
}

#[metrics]
pub struct FilesystemStore<Fe: FileEntry = FileEntryImpl> {
    #[metric]
    evicting_map: EvictingMap<Arc<Fe>, SystemTime>,

    #[metric]
    shared_context: Arc<SharedContext>,

    #[metric("Size of the configured read buffer size")]
    read_buffer_size: usize,
}

I really like this idea, but it makes it incredibly difficult to do some edge case management.

@allada
Member Author

allada commented Jul 19, 2023

I just went down a rabbit hole that I need to get out of... I have so many ideas on what to do with metrics and now I want to write a metrics library.

@aaronmondal
Member

Hmm, hard question 😅 I wonder how complex a procedural macro `publish_metrics` actually is. If it roughly rewrites Option 2 to something like Option 1, it might be on the simpler side, where we could get away with more-or-less straightforward templating. Then again, that's basically the same thing as implementing the traits explicitly.

From another angle, could a custom serde serializer/deserializer impl be useful here? With the container attributes that might allow behavior somewhat similar to the #[metric] approach. https://serde.rs/impl-serialize.html
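
To make the "straightforward templating" point concrete, here is a sketch of a declarative `publish_metrics!` that rewrites a flat version of Option 2 into Option 1. It is deliberately limited: no generics, no nested blocks like `shared_context { ... }`, no bare fields without help text, and no type annotations such as `Bytes`. Those features are where a procedural macro may become necessary. `CollectorState` and the `VerifyStore` field types are assumptions for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical collector: maps a metric name to (rendered value, help text).
#[derive(Default)]
pub struct CollectorState {
    pub metrics: BTreeMap<String, (String, String)>,
}

impl CollectorState {
    pub fn publish(&mut self, name: &str, value: impl ToString, help: &str) {
        self.metrics
            .insert(name.to_string(), (value.to_string(), help.to_string()));
    }
}

pub trait MetricsComponent {
    fn gather_metrics(&self, c: &mut CollectorState);
}

/// Expands `Type { field "help", ... }` into an explicit trait impl.
/// `self` and `c` are introduced inside the macro body, so hygiene is not an issue.
macro_rules! publish_metrics {
    ($ty:ident { $( $field:ident $help:literal ),* $(,)? }) => {
        impl MetricsComponent for $ty {
            fn gather_metrics(&self, c: &mut CollectorState) {
                $( c.publish(stringify!($field), &self.$field, $help); )*
            }
        }
    };
}

/// Simplified stand-in: plain integers instead of atomics (an assumption).
pub struct VerifyStore {
    pub verify_size: bool,
    pub verify_hash: bool,
    pub size_verification_failures: u64,
    pub hash_verification_failures: u64,
}

publish_metrics! {
    VerifyStore {
        verify_size "If the verification store is verifying the size of the data",
        verify_hash "If the verification store is verifying the hash of the data",
        size_verification_failures "Number of failures the verification store had due to size mismatches",
        hash_verification_failures "Number of failures the verification store had due to hash mismatches",
    }
}
```

The expansion is the same `c.publish(...)` sequence as the hand-written impl, which supports the observation that the flat case is little more than templating.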

@allada allada changed the title Amount of magic for publishing metrics [idea] Amount of magic for publishing metrics Sep 30, 2023
allada added a commit to allada/nativelink-fork that referenced this issue Jul 25, 2024
Metrics got an entire overhaul. Instead of relying on a broken
prometheus library to publish our metrics, we now use the
`tracing` library together with OpenTelemetry, bind the two,
and then publish into a prometheus library.

Metrics are now mostly derive-macros. This means that the struct
can express what it wants to export and a help text. The library
will choose if it is able to export it.

Tracing now works by calling `.publish()` on the parent structs;
those structs need to call `.publish()` on all the child members
they wish to publish data about. If a "group" is requested, use
the `group!()` macro, which under-the-hood calls `tracing::span`
with some special labels. At primitive layers, it will call the
`publish!()` macro, which will call `tracing::event!()` macro
under-the-hood with some special fields set. A custom
`tracing::Subscriber` will intercept all the events and spans
and convert them into a json-like object. This object can then
be exported as real json or encoded into other formats like
otel/prometheus.

closes: TraceMachina#1164, TraceMachina#650, TraceMachina#384, TraceMachina#209
towards: TraceMachina#206
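
The parent-calls-child structure described in this commit message can be sketched without any dependencies. The code below does not use `tracing` or the real `group!()`/`publish!()` macros; the `Publish` trait, the `Value` enum, and `flatten` are hypothetical stand-ins that only illustrate how nested groups could form a JSON-like tree and then be encoded into a flat, prometheus-style list.

```rust
use std::collections::BTreeMap;

/// Hypothetical JSON-like tree: leaves hold a rendered value, groups hold children.
#[derive(Debug, PartialEq)]
pub enum Value {
    Leaf { value: String, help: String },
    Group(BTreeMap<String, Value>),
}

/// Hypothetical trait: every publishable item inserts itself into its parent group.
pub trait Publish {
    fn publish(&self, name: &str, help: &str, parent: &mut BTreeMap<String, Value>);
}

impl Publish for u64 {
    fn publish(&self, name: &str, help: &str, parent: &mut BTreeMap<String, Value>) {
        let leaf = Value::Leaf { value: self.to_string(), help: help.to_string() };
        parent.insert(name.to_string(), leaf);
    }
}

impl Publish for String {
    fn publish(&self, name: &str, help: &str, parent: &mut BTreeMap<String, Value>) {
        let leaf = Value::Leaf { value: self.clone(), help: help.to_string() };
        parent.insert(name.to_string(), leaf);
    }
}

pub struct SharedContext {
    pub active_drop_spawns: u64,
    pub temp_path: String,
}

impl Publish for SharedContext {
    fn publish(&self, name: &str, _help: &str, parent: &mut BTreeMap<String, Value>) {
        // A struct opens a group and asks each child to publish into it.
        let mut group = BTreeMap::new();
        self.active_drop_spawns
            .publish("active_drop_spawns", "Number of active drop spawns", &mut group);
        self.temp_path
            .publish("temp_path", "Path to the configured temp path", &mut group);
        parent.insert(name.to_string(), Value::Group(group));
    }
}

pub struct FilesystemStore {
    pub shared_context: SharedContext,
    pub read_buffer_size: u64,
}

impl Publish for FilesystemStore {
    fn publish(&self, name: &str, _help: &str, parent: &mut BTreeMap<String, Value>) {
        let mut group = BTreeMap::new();
        self.shared_context.publish("shared_context", "", &mut group);
        self.read_buffer_size
            .publish("read_buffer_size", "Size of the configured read buffer size", &mut group);
        parent.insert(name.to_string(), Value::Group(group));
    }
}

/// Encodes the tree as flat (name, value) pairs, joining group names with `_`.
pub fn flatten(prefix: &str, node: &Value, out: &mut Vec<(String, String)>) {
    match node {
        Value::Leaf { value, .. } => out.push((prefix.to_string(), value.clone())),
        Value::Group(children) => {
            for (name, child) in children {
                let key = if prefix.is_empty() {
                    name.clone()
                } else {
                    format!("{}_{}", prefix, name)
                };
                flatten(&key, child, out);
            }
        }
    }
}
```

Keeping the tree as an intermediate form is what lets the same data be emitted as JSON or as flat metric names, which is the design the commit message describes.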
allada added a commit that referenced this issue Jul 26, 2024
zbirenbaum pushed a commit to zbirenbaum/nativelink that referenced this issue Jul 27, 2024