-
Notifications
You must be signed in to change notification settings - Fork 0
CP-1664: modify prometheus-cpp #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Expired metrics
No time taking under lock.
Less timestamping inside of locked code.
Conflicts: core/include/prometheus/detail/builder.h core/include/prometheus/detail/counter_builder.h core/include/prometheus/detail/histogram_builder.h core/include/prometheus/detail/summary_builder.h core/include/prometheus/family.h core/include/prometheus/registry.h core/src/counter.cc core/src/detail/counter_builder.cc core/src/detail/gauge_builder.cc core/src/detail/histogram_builder.cc core/src/detail/summary_builder.cc core/src/gauge.cc core/src/histogram.cc core/src/registry.cc core/src/summary.cc
…idbots/prometheus-cpp into CP-1664-modify-prometheus-cpp
| ClientMetric Collect() const; | ||
|
|
||
| private: | ||
| Gauge gauge_{0.0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code used other metrics to keep track of it's data. However, since the new code keeps track of its own retention time I have modified it to keep track of its own data to reduce duplication of the retention time variables.
|
|
||
| namespace prometheus { | ||
|
|
||
| enum class RetentionBehavior {Keep, Remove}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally, I had this as part of the class definition of Family but I wanted to avoid templating.
| return Add(labels, detail::make_unique<T>(args...)); | ||
| std::shared_ptr<T> Add(const std::map<std::string, std::string>& labels, Args&&... args) { | ||
| return Add(labels, std::make_shared<T>(std::forward<Args>(args)...)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is hear to avoid forward declarations of Args. It is less efficient since it creates an object that may or may not get used. We could instead move the definition here but that would be messy.
| protected: | ||
| bool has_family_{false}; | ||
| std::atomic<double> retention_time_{1e9}; | ||
| std::atomic<std::time_t> last_update_{std::time(nullptr)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced we need these to be atomic, but it should be safer. Also, I tried to program it so that we could remove the atomic qualifier and the code would still compile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you on this one. I think that atomic is worth having for these variables.
| bucket_counts_[bucket_index].Increment(); | ||
| sum_ = sum_ + value; | ||
| bucket_counts_[bucket_index] = bucket_counts_[bucket_index] + 1; | ||
| last_update_ = std::time(nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These three lines are atomic operations:
- sum_ = x; is equiv to sum_.store(x);
- sum_ is equiv to sum_.load()
- ...
The operators +=, -= didn't seem to be supported by my compiler. Also, the use of this syntax instead of load and store allows us to interchange atomic variables with non-atomic variables.
core/src/summary.cc
Outdated
| sum_ += value; | ||
| quantile_values_.insert(value); | ||
| last_update_ = std::time(nullptr); | ||
| if (alert) AlertIfNoFamily(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AlertIfNoFamily => PrintAlerts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean if the alert has no family? Is it that the metric/pointer is scheduled for removal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, clearly you wouldn't understand since your family is awesome!
Joking aside, prometheus keeps a list of metrics (families) for publishing to sockets (which eventually get added to the database). Now that we have the ability to expire metrics (remove them from their family), we want to alert the user if and when they are trying to use a metric that will not get published.
core/tests/counter_test.cc
Outdated
| TEST(CounterTest, inc) { | ||
| Counter counter; | ||
| counter.Increment(); | ||
| counter.Increment(1, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't test default and suppress alerts. Could, if I brought back the Increment method with no arguments.
core/src/summary.cc
Outdated
| sum_ += value; | ||
| quantile_values_.insert(value); | ||
| last_update_ = std::time(nullptr); | ||
| if (alert) AlertIfNoFamily(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean if the alert has no family? Is it that the metric/pointer is scheduled for removal?
| std::vector<Counter> bucket_counts_; | ||
| Counter sum_; | ||
| std::atomic<double> sum_{0.0}; | ||
| std::vector<std::atomic<double>> bucket_counts_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this modified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously histogram used a series of counters to keep track of it's buckets and counters used guages to keep track of their values. This nesting of data structures proved troublesome when we now want to keep track of retention time for each metric (we don't want each bucket to have it's own retention time, and each counter in each bucket have it's own retention time >_<)
| protected: | ||
| bool has_family_{false}; | ||
| std::atomic<double> retention_time_{1e9}; | ||
| std::atomic<std::time_t> last_update_{std::time(nullptr)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you on this one. I think that atomic is worth having for these variables.
core/src/family.cc
Outdated
| if (found) { | ||
| matched = true; | ||
| } else { | ||
| matched = false; | ||
| break; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (found) { | |
| matched = true; | |
| } else { | |
| matched = false; | |
| break; | |
| } | |
| matched = found; | |
| if (!matched) break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is sort of a bug in this logic. It should only update matched if not found.
| /* Check if the name is a match */ | ||
| if (boost::regex_match(name_, name_expr)) { | ||
| for (auto metric: metrics_) { | ||
| bool matched(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on this logic, I'm not too sure why we have both matched and found. I think it looks like you could just use matched and then set it to true under the constant or metric label conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bit of a bug in the logic, see followup comment.
| } | ||
| Change(value); | ||
| void Gauge::Increment(const double& value, const bool& alert) { | ||
| if (value < 0.0) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I think this line and the Decrement check makes sense, I also wonder if this check is necessary. Like if the user wants chooses to decrement by a negative number, there won't be any warning or error raised, so would it be better to allow the user to do so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree but I'm not sure that warrants changing the original behaviour or not (the more that we leave the same, the easier it will be for merge updates).
core/src/summary.cc
Outdated
|
|
||
| void Summary::Observe(const double value) { | ||
| const std::chrono::milliseconds& max_age, | ||
| const int& age_buckets) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this change is necessary (to minimize modifications) --> in particular const int& age_buckets versus const int age_buckets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You caught me. I can change it back but I'll wait to see what the other reviewers say.
| std::vector<std::shared_ptr<Family<Counter>>> counters_; | ||
| std::vector<std::shared_ptr<Family<Gauge>>> gauges_; | ||
| std::vector<std::shared_ptr<Family<Histogram>>> histograms_; | ||
| std::vector<std::shared_ptr<Family<Summary>>> summaries_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe these should change to maps, to mock the behaviour of metrics in a family.
| std::vector<Counter> bucket_counts_; | ||
| Counter sum_; | ||
| std::atomic<double> sum_{0.0}; | ||
| std::vector<std::atomic<double>> bucket_counts_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously histogram used a series of counters to keep track of it's buckets and counters used guages to keep track of their values. This nesting of data structures proved troublesome when we now want to keep track of retention time for each metric (we don't want each bucket to have it's own retention time, and each counter in each bucket have it's own retention time >_<)
core/src/family.cc
Outdated
| if (found) { | ||
| matched = true; | ||
| } else { | ||
| matched = false; | ||
| break; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is sort of a bug in this logic. It should only update matched if not found.
| /* Check if the name is a match */ | ||
| if (boost::regex_match(name_, name_expr)) { | ||
| for (auto metric: metrics_) { | ||
| bool matched(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bit of a bug in the logic, see followup comment.
core/src/family.cc
Outdated
| if (found) { | ||
| matched = true; | ||
| } else { | ||
| matched = false; | ||
| break; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (found) { | |
| matched = true; | |
| } else { | |
| matched = false; | |
| break; | |
| } | |
| if (!found) { | |
| matched = false; | |
| break; | |
| } |
| } | ||
| Change(value); | ||
| void Gauge::Increment(const double& value, const bool& alert) { | ||
| if (value < 0.0) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree but I'm not sure that warrants changing the original behaviour or not (the more that we leave the same, the easier it will be for merge updates).
core/src/summary.cc
Outdated
|
|
||
| void Summary::Observe(const double value) { | ||
| const std::chrono::milliseconds& max_age, | ||
| const int& age_buckets) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You caught me. I can change it back but I'll wait to see what the other reviewers say.
core/src/summary.cc
Outdated
| sum_ += value; | ||
| quantile_values_.insert(value); | ||
| last_update_ = std::time(nullptr); | ||
| if (alert) AlertIfNoFamily(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, clearly you wouldn't understand since your family is awesome!
Joking aside, prometheus keeps a list of metrics (families) for publishing to sockets (which eventually get added to the database). Now that we have the ability to expire metrics (remove them from their family), we want to alert the user if and when they are trying to use a metric that will not get published.
Co-Authored-By: sarahttan <sarah.t.tan@gmail.com>
sarahttan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the best of my knowledge, I've reviewed the changes, but it's definitely a bit hard to switch review sites ;)
core/src/metric_base.cc
Outdated
| bool MetricBase::HasFamily() { return has_family_; } | ||
|
|
||
| void MetricBase::AlertIfNoFamily() { | ||
| if (!has_family_) std::cerr << "prometheus-cpp:: This metric has no family" << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you extend the message to explain the consequence of not having a family (i.e. not publishing the metric to sockets) as you elaborated in one of your comments? It's not immediately clear why this important especially for new users.
core/include/prometheus/counter.h
Outdated
| /// | ||
| /// The counter will not change if the given amount is negative. | ||
| void Increment(double); | ||
| void Increment(const double& = 1, const bool& alert = true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this alert be set during construction of the metric so the user doesn't have to add this argument if it is false every time they call the Increment methods on Counters, Gauges etc? It seems a bit cumbersome to add during each update when this needs to only be set once. I think it will also make the user more intentional about the alerts during the creation of the metrics.
|
|
||
| namespace prometheus { | ||
|
|
||
| enum class RetentionBehavior {Keep, Remove}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment to explain the significance/behaviour of the retention i.e. keeps metric for specified retention time, removes when retention time is exceeded? I wasn't sure how the removal worked and add to search for it.
| /// | ||
| /// \param name Set the metric name. | ||
| /// \param help Set an additional description. | ||
| /// \param constant_labels Assign a set of key-value pairs (= labels) to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add documentation for the retention_behavior param.
core/include/prometheus/family.h
Outdated
| /// \return Zero or more samples for each dimensional data. | ||
| std::vector<MetricFamily> Collect() override; | ||
|
|
||
| bool UpdateRetentionTime(const double& retention_time, const std::string& re_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to make built-in types const ref? I don't see any benefit to this but if there is let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bools, ints, doubles no. Strings can have benifit as the addresss space is smaller than the string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I meant the ints, doubles etc except string.
core/include/prometheus/summary.h
Outdated
| std::chrono::milliseconds max_age = std::chrono::seconds{60}, | ||
| int age_buckets = 5); | ||
| const std::chrono::milliseconds& max_age = std::chrono::seconds{60}, | ||
| const int& age_buckets = 5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again I don't see any optimizations/improvements to making the built-in types const ref
| } | ||
|
|
||
| double Counter::Value() const { return gauge_.Value(); } | ||
| double Counter::Value() const { return value_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the change you've made here with the counters etc keeping track of their values.
| if (!m.second->Expired()) { | ||
| family.metric.push_back(std::move(CollectMetric(m.first, m.second))); | ||
| } else if (retention_behavior_ == RetentionBehavior::Remove) { | ||
| to_be_removed.push_back(m.second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that we don't want to hold the lock for a long time but if you're using unordered maps for the metrics and labels, then the removal should be constant time in the average case. If removal from the unordered maps approach linear time then I think a better hash function needs to be chosen. Hence, do we need to store the metrics to be removed and then remove them later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be a solution. The Remove method does a few things, so I think I was just playing it safe. Additionally, the Remove method also has a lock, so that would either need to be removed (and only called when the data is locked, which is problematic) or change it to a scoped lock, which would need some thought.
I like the idea of calling it in place but I'm leaning towards leaving it since I like to play it safe with locks.
core/include/prometheus/family.h
Outdated
| /// \return Zero or more samples for each dimensional data. | ||
| std::vector<MetricFamily> Collect() override; | ||
|
|
||
| bool UpdateRetentionTime(const double& retention_time, const std::string& re_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, add method documentation for "UpdateRetentionTime"
core/src/family.cc
Outdated
| } | ||
|
|
||
| template <typename T> | ||
| bool Family<T>::UpdateRetentionTime(const double& retention_time, const std::string& re_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't you just update the retention time for each metric internally every time the metric is updated (incremented, decremented etc.) and set a flag for removal every time the retention time is exceeded to circumvent the name/metric matching process?
Even if it remains as is, shouldn't this be a method that's called internally by the registry? Should the user explicitly have to call this function if they're set the retention behavior for a metric to remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rentention time is updated automatically, these are just for the developer to keep the metric alive if they are not updating the metric with new data. As for flagging them for removal, we want to flag them peroidically, otherwise we might get a metric that gets data once during the whole program and then never expires.
The current implementation has the metrics update the class member last_update_ instead of calling UpdateRetentionTime since that method is used to find the metric or group of metrics and update their retention time.
…oved erroneous pass by references).
The prometheus-cpp lib provides functionality to expose internal data to Prometheus via a series of sockets (unique to each node). Typically, each metric will get exposed for the lifetime of the process (node). Thus, if a node created 1000 metrics at the beginning of the program and never updated them again for the entire process, prometheus-cpp would continue exposing them to the socket for the entire process (similar to a while loop with 1000 print lines).
Additionally, prometheus-cpp uses a number of return by references, which poses the following issue:
This can be fixed by returning a shared pointer instead. However, the nuance to this is that a shared pointer that has been removed may update its data but that data will not get exposed.