Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extra labels to prometheus report (warn and error level only) #68

Closed
wants to merge 5 commits into from

Conversation

sky-philipalmeida
Copy link

@sky-philipalmeida sky-philipalmeida commented Jan 8, 2020

Basic details about issues for prometheus report.
Afaik it can be largely improved.

Capabilities:

  • Expose detail messages for the outcome issues
  • Fine tune the output via lint level flag

Atm the final result on the Prometheus POV would be (example for service):
$popeye -o prometheus --pushgateway-address http://localhost:9091 -l error -n testns --out-detail verbose

popeye_sanitizer_reports_count{cluster="testcluster",issues="testns/kafka [POP-1100] No pods match service selector,testns/kafka [POP-1105] No associated endpoints",job="popeye",level="error",namespace="testns/kafka",resource="service"} | 1

Basic details about issues.
Can be largely improved just a POC
@sky-philipalmeida sky-philipalmeida changed the title Update prometheus.go Add extra labels to prometheus report - Issues Jan 8, 2020
@sky-philipalmeida sky-philipalmeida changed the title Add extra labels to prometheus report - Issues Add extra labels to prometheus report (warn and error level only) Jan 8, 2020
@sky-philipalmeida sky-philipalmeida marked this pull request as ready for review January 8, 2020 10:37
@derailed
Copy link
Owner

derailed commented Jan 8, 2020

@sky-philipalmeida Thank you for the PR and thoughts!!

One of the thing I love most about prometheus is it's all about that one thing ie gather metrics and not trying or pretending to be for anything else. IMHO this edges a bit off this essence as the labels are only there to support queries for series. I feel that Popeye prom output should all be about gathering signals and not shoehorn sanitization details. ie labels should be used for dimensionality and not kitchen drawers. Perhaps storing in a label a sanitizer report ID might be a better approach here?

That said, I am open for debate here of course, but these are my initial thoughts.

@sky-philipalmeida
Copy link
Author

@derailed I understand your pov.

So your are thinking, eg., to use the save option and later on alertmanager expose the saved file using a url by report id? That makes sense for sure and it is the proper way I think...

...but, there are some caveats afaik, I just checked and the current save options do not play well with HTML browser view, also prometheus output format does not support file save. So if this is the case afaik there should be a new HTML compatible report and support for the -s option on prometheus output format (that would also save to HTML if -s enabled).

Sincerely I think both approaches are valid.

Have a flag that the user could force a label with the issue list (basically my initial approach), this would make possible queries to specific namespaces, or only to specifc error codes which also makes sense.

If you are ok with this I have time to refactor in order to overcome the current Codesense as this is a quick win for my goal.

What do you think?

@derailed
Copy link
Owner

derailed commented Jan 8, 2020

@sky-philipalmeida Thank you so much for voicing your thoughts!

I think I agree with your approach so let me make sure we're on the same page.

Currently a user can generate and push prom metrics using the following command:

popeye -o prometheus --pushgateway-address=xxx

We want to store the associated sanitizer report location as a label along with the prom metrics.
So the user could issue the following command

popeye -o prometheus,json --pushgateway-address=xxx --save

So this command would allow the user to generate multiple report outputs and save them to disk and in the case the output is prometheus we would add a new label to the metrics with the location of the sanitizer report. This does also allow for generating different outputs in a single command no matter if prom is specified or not which I think could be useful. We generate the sanitization once and output to different formats.

  1. The -o option could be limited to 2 outputs formats or if more than 2 are specified the first non prometheus formatter would be the location for the metric labels.
  2. If only -o prometheus is specified and --save is given we would default to a json sanitizer report.

Would this satisfy your use cases? or can you think of a better way to achieve this?

Thank you Phil!!

pkg/popeye.go Outdated Show resolved Hide resolved
pkg/popeye.go Outdated Show resolved Hide resolved
internal/report/prometheus.go Outdated Show resolved Hide resolved
@sky-philipalmeida
Copy link
Author

sky-philipalmeida commented Jan 9, 2020

@derailed enabling multiple output formats will require careful refactor and maybe should be addressed in another feature request IMHO.

The current proposal enables:

  • save option on prometheus output format with default json output.
  • new label target if --save is present that will have the report path to the json file
  • new verbose option "--out-detail verbose" that enables new PromSQL queries and labels issues

Having the path on the target label would need maybe a webserver exposing the files in order to be usefull in Alertmanager links but that is just an ideia.

current implementation status:

popeye -o prometheus --pushgateway-address http://localhost:9091 -n test -l error --save

popeye_sanitizer_reports_count{cluster="test",job="popeye",level="error",namespace="test",resource="clusterrole",target="/var/folders/3n/14849dl537vb3ts_x_95l9_mfm4j0j/T/popeye/sanitizer_test_1578531141195068000.json"}

popeye -o prometheus --pushgateway-address http://localhost:9091 -n test -l error --out-detail verbose

popeye_sanitizer_reports_count{cluster="test",issues="test/test [POP-1100] No pods match service selector,test/test [POP-1105] No associated endpoints",job="popeye",level="error",namespace="test",resource="service"}

@derailed
Copy link
Owner

@sky-philipalmeida Thank you for review this! I think you are right we can tackle multiple reports in a separate push. As for the report location, I think we might need a different option. The report url could be provided in many shapes ie via http or others. So I am thinking perhaps --save does not quiet fit the bill. Looping in @guusvw here) are he owns the PR for s3 support.

NOTE: Not keen on the --out-details what happens if you reports are generating 10k+ issues per runs?
I think this should be out. The details are in the reports not the metrics.

I am kind of burnt this week but we need to think about --save being a flag to indicate how we generate the report url. ie file/http/s3 etc.. So I think perhaps --save file|s3|etc... might be more inline with what we're trying to achieve here??

popeye -o prometheus --pushgateway-address http://localhost:9091  --save=s3 -s3bucket=xxx 

@guusvw
Copy link
Contributor

guusvw commented Jan 13, 2020

I really like the idea of defining a "save storage" via --save and also have a flag for the concrete destination (s3 bucket, local file storage, FTP, ...), also be able to report Prometheus metrics in the same run would be awesome. (I'm currently doing this via two jobs)
I would also limit it to one "save report" & "report metrics" combination, anything else would make it to complicated and don't see a real use case for that.
It would also agree @sky-philipalmeida to do the refactoring not in this PR 👍
Maybe writing first a proposal and collect input and maybe we could afterwards split up the work.

@@ -226,6 +234,7 @@ func (p *Popeye) ensureOutput() error {
ext := "txt"
switch *p.flags.Output {
case "json":
case "prometheus":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case "prometheus":
case "json", "prometheus":

@sky-philipalmeida
Copy link
Author

@guusvw @derailed I will not proceed with the current change as we opted by another way. Using the json report parsing ourselfs into metric and exposing.
Thanks for all your help please feel free to close or change this PR as you see most fitted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants