-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prometheus integration #96
Comments
Tried implementing using prometheus_ex + implementing the standard PhoenixInstrumenter. Nevertheless I'm not 100% sure if the PhoenixInstrumenter makes sense as there are two separate webservers within rig and I assume that that this standard library is not able to fetch Information for both of them nor to differentiate between them. considering #102 as the new application architecture, the Metrics endpoint is planned to be exposed via :rig_control I'm not sure if to implement within that application in /lib/Metrics (at least they do so in https://github.com/oestrich/ex_venture/lib/Metrics) Deciding between Prometheus_ex & wObserver, at least for Prometheus metrics I'd go for Prometheus_ex, as implementing custom metrics seems to look more straight forward & cleaner from a Code perspective wObserver nevertheless provides this fancy frontend showing these standard observer Infos, which might be helpful… @kevinbader: please provide feedback on my thoughts and how you'd like to have it implemented, I'd like to support on this :) |
@vanillebear nice appreciate the support :) I think you're right, there is a dependency issue like this. Let me summarize:
The only way to guarantee that |
Assuming that the current implementation will be refactored (e.g. rig_api -> rig_control) still I hope what I'm doing on the current Project structure, is still valid in a future layout :) Currently working on the implementation - The initial Setup of the new OTP app is done (including updates in dockerfile, etc.) + Exporting the endpoint via rig_api. Current app structure:
I'd propose following metrics for the ProxyInstrumenter as @kevinbader suggested. rig_subscriptions_total --> As Prometheus Counter + Label ConnectionID?! rig_request_size_bytes -> As Prometheus Summary Currently no idea for metrics on the Event hub, but in general we could use labels to identify the type (e.g. Kafka) I'm not 100% sure on were to track the metrics as I'm not 100% aware of the RIG implementation in general. Could someone propose were to implement? I didn't provide a PR yet as still work in progress. Current version can be found in https://github.com/vanillebear/reactive-interaction-gateway/tree/feature/96-promotheus-integration |
Nice work! Some thoughts:
I wonder if/how they could be useful..
For certain proxy features we build a "correlation ID" that is basically the encoded process ID; the same mechanism could be used to create a "connection ID" for a given websocket or SSE connection. What would you use the label for? For each event type there is one subscription table, so having a Gauge per event type should be as easy as reporting the number of rows in the respective (ETS) table. The number of subscriptions per connection can be calculated by grouping the rows by pid, but I guess this will get a bit slow with the number of subscriptions: as one process (pid = process id) may have multiple subscriptions in the same table, the pid column is not unique => no index, slow counting. And if we'd want to report the number of subscriptions regardless of event types, we'd even have to count this for each table (again, one table per event type). So perhaps we'll skip that metric. Other ideas for metrics:
I think the instrumentation modules should live next to what they're measuring: if the measured thing changes, the corresponding metrics might change as well, so it keeps the changes local. That said, I would throw everything into On a different matter, I'd prefer having a health endpoint that returns OK instead of querying the APIs endpoint - in other words I think the docs are good, it's just the implementation that is missing (which doesn't need to be in the same PR, of course). Do you already have any plans for how to create the dashboard? Not that we'd strictly need one, but it would be pretty neat. Grafana seems to supported "scripted dashboards", but I haven't looked into alternatives yet (are there any?). |
I thought that this could show the overall usage over time within grafana, but actually you'll see the same Information when looking on the current usage over time. So skipping that.
The label would be used to be able to have a detailed view per Connection ID within grafana. If that's not really relevant we could skip that
Actually the metrics will be increased/decreased when an subscription is added/removed by calling the respective function of the instrumenter. e.g.: by calling
Hmm, then we would need an dependency to ex_prometheus in every module as we need to implement the Prometheus.Metric module.
Understood - I'll undo the change & create an issue for that.
There are a few examples for beam/elixir grafana Dashboards provided here Metrics planned as of latest discussion: RigProxyInstrumenter
RigEventhubInstrumenter
RigControlInstrumenter
|
Yeah you're right. I assume that when the connection goes down the counter (Gauge) is decreased again. But what if the connection process crashes? Perhaps we might just assume that this doesn't happen too often and should be ignored, but the measurement won't be reliable then. Unfortunately, Erlang monitors cannot be used here as they don't perform well when creating many processes in a short period of time (at least according to my initial benchmarks).
Sure, I think I've mixed that up as well. The instrumeters don't change and the code that actually invokes the instrumenters lives along with what they measure, right? Seems fine to me that way.
Thanks!
really cool, didn't know about this. Might be details, but wouldn't it make more sense to have |
Thanks for your Feedback. Regarding naming, I think i missunderstood the Prometheus naming conventions & Counter/Gauge/etc. metrics types are Prometheus core concepts so should be known to everyone who wants to use them. I'll rename! |
Just an idea, seems to make sense to me.. However, I don't think you've "misunderstood" anything - indeed, the official naming conventions don't take the same approach. The docs have examples where Counters are post-fixed with btw, quite interesting: https://promcon.io/2017-munich/slides/best-practices-and-beastly-pitfalls.pdf |
Hi @kevinbader I have problems with writing tests for this… e.g. for blacklisted sessions they are cleaned up by a separate process (within Do you have an idea how to do this? Or is ok to have not that high test coverage for this? |
@vanillebear does this help? https://github.com/Accenture/reactive-interaction-gateway/blob/master/apps/rig_auth/test/blacklist_test.exs#L37 |
@kevinbader Unfortunately not... The current Metrics handling is implemented in Distributed_set.ex within "remove_expired_records" not in Rig_Auth I tried using this, but still the remove_expired_records isn't called? Sorry for this… Unfortunately I'm not quite sure/can't find how this method is called in general so I'm not sure how to test :/ Maybe also the wrong place to track the metrics? EDIT: I think I'll move the implementation to Blacklist.ex |
In general I think Thinking out loud, an integration test would probably look somewhat like this:
So what about this then instead:
|
Hello @kevinbader, @mmacai: We aligned that this issue should be used for initial setup of prometheus exporting the /metrics on rig_api I'll create an separate issue for implementation of rig specific metrics. Also for the problematic "Blacklist" implementation I've created an separate issue --> #156 just not to forget |
awesome, thanks! |
Should be fairly easy for Phoenix request metrics, for example see https://blog.smartlogic.io/monitoring-your-elixir-app-with-prometheus/
Ideas for custom metrics:
EDIT: could be implemented using https://github.com/shinyscorpion/wObserver
The text was updated successfully, but these errors were encountered: