Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics #1102

Merged
43 commits merged into from Jun 24, 2021
Merged

Metrics #1102

43 commits merged into from Jun 24, 2021

Conversation

ghost
Copy link

@ghost ghost commented May 17, 2021

Summary of changes

Changes introduced in this pull request:

  • Adds the initial set of metrics for forest
  • Provides a simple way to set up the metrics stack (prometheus & grafana) using Docker

Reference issue to close (if applicable)

Closes #1096 & #1097

Other information and links

@ghost ghost marked this pull request as ready for review May 18, 2021 18:16
@cryptoquick
Copy link
Contributor

Nice! This looks like it measures block time and a few other things. How would you recommend I run it?

@ghost
Copy link
Author

ghost commented May 19, 2021

@cryptoquick At the moment those metrics are collected but are not viewable. Working to make them viewable today.

@cryptoquick
Copy link
Contributor

Okay, and this works without prometheus running, correct?

@ghost
Copy link
Author

ghost commented May 19, 2021

@cryptoquick that's correct

Copy link
Contributor

@cryptoquick cryptoquick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got an error when trying to run this:
thread 'async-std/runtime' panicked at 'calledResult::unwrap()on anErrvalue: InconsistentCardinality { expect: 7, got: 1 }', /home/hunter/.cargo/registry/src/github.com-1ecc6299db9ec823/prometheus-0.12.0/src/vec.rs:258:49
Screenshot from 2021-05-19 19-02-34

@cryptoquick
Copy link
Contributor

Also, you're missing the following metric as defined in the issue ACs:

  • Gauge of RocksDB size on disk

Maybe try measuring this after every flush:
https://doc.rust-lang.org/std/os/unix/fs/trait.MetadataExt.html#tymethod.size

@ghost
Copy link
Author

ghost commented May 20, 2021

@cryptoquick I didn't realize that moving this from draft to open would auto request review, that's my bad. I still have much to do, including the items you mentioned. I don't see a button that will move this from open -> draft again.

@cryptoquick
Copy link
Contributor

Oh, gotcha! Be sure to let us know when it's RFR. Also LMK if you think I could help in some way!

.circleci/config.yml Outdated Show resolved Hide resolved
@ec2
Copy link
Member

ec2 commented Jun 17, 2021

i know this isnt R4R yet, but you've incorrectly grouped Libp2p metrics (eg HELLO_REQUEST) as Gossip metrics.

Copy link
Contributor

@cryptoquick cryptoquick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just found one obvious typo.

I'm able to compile this and it runs fine, but I'm not able to test it on my laptop because, although I have docker configured on my desktop, I'm reluctant to do similar on my laptop, so my feedback on this will be somewhat limited for now.

blockchain/chain_sync/src/chain_muxer.rs Outdated Show resolved Hide resolved
@ec2
Copy link
Member

ec2 commented Jun 22, 2021

It LGTM. Im just setting up Docker on BigChungus right now to see this in live action

Copy link
Contributor

@cryptoquick cryptoquick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feedback I've provided has been addressed. @ec2, be sure to let us know how your Docker deploy works out.

Copy link
Member

@ec2 ec2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested out with the new metrics. Looks perfect!

Copy link
Contributor

@creativcoder creativcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! The only thing I felt is if Metrics could be global so we wouldn't have to pass it around everywhere. Alternatively, it could be a trait with default methods, which structs could then implement.

I tried running docker compose on aws machine but it fails. Posted in detail on slack.

@ghost
Copy link
Author

ghost commented Jun 23, 2021

Looks great! The only thing I felt is if Metrics could be global so we wouldn't have to pass it around everywhere. Alternatively, it could be a trait with default methods, which structs could then implement.

I tried running docker compose on aws machine but it fails. Posted in detail on slack.

This was an issue with the docker installation on Rahul's machine. I've updated the documentation to make installation easier.

Histogram, HistogramOpts,
};

lazy_static! {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kudos for making this a singleton! One more suggestion, why don't we try once_cell: async-rs/async-std#406 as people speak highly of it compared to lazy_static.

Copy link
Author

@ghost ghost Jun 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I looked into it, but the stdlib impl is only available in nightly and despite that, it's a container type that would require every routine that needs the OnceCell<T> to call a getter method to retrieve the inner T.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all ears if you understand the benefits we might get from adding that getter every time we collect a metric.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, the benefits would be:

  • Lack of macros
  • Faster hot get
    I am not sure if that is enough for us to switch over at this point for this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it faster than accessing a global static variable?

Is the concern with macros the incremental increase in compilation cost?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, you don't need to use the OnceCell container. Instead you can use: https://docs.rs/once_cell/1.8.0/once_cell/#lazy-initialized-global-data. Anyways if you feel the refactor is quite complex, then lazy_static it is! As for perf improvements, we will have to measure to see the benefits in context with forest's access patterns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why its faster on hot get. That's just what someone said. But i dont think its significant.
The concern with macros is increased compilation cost as well as the fact that there is code being generated. That being said, those are "meh" concerns for me... I'm ok with lazy_static. We use lazy_static in other parts of the codebase

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t feel enticed, seems more fashion than function

@ghost ghost merged commit 99fa386 into main Jun 24, 2021
@ghost ghost deleted the metrics branch June 24, 2021 15:12
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create Prometheus Server & Grafana Stack
3 participants