Improve Dashboard Charts (feedback welcome) #438

Open
bensheldon opened this issue Oct 29, 2021 · 5 comments

@bensheldon
Owner

bensheldon commented Oct 29, 2021

I would love feedback on these charts:

  • A chart that simply shows the number of jobs performed successfully and the number that errored. Would be nice to have:
    • Last 1 hour, plus the next 15 minutes of upcoming queued/scheduled jobs, in 1-minute increments
    • A longer lookback: 72 hours hourly, and maybe 30 days daily
  • Queuing latency. This is the big operational one: the duration between when a job is queued/scheduled and when execution starts.
  • Some kind of min/max/avg (wish postgres made medians easier) banded-line chart of job duration
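
To make the first chart concrete, here is a minimal sketch of per-minute bucketing in plain Ruby. It is an illustration only: `Job`, `bucket_counts`, and the `finished_at`/`error` fields are hypothetical stand-ins for whatever the jobs table exposes, and a real dashboard would most likely do this grouping in SQL rather than in memory.

```ruby
# Hypothetical in-memory stand-in for a row of the jobs table (assumption).
Job = Struct.new(:finished_at, :error)

# Count succeeded and errored jobs per time bucket.
# `bucket_seconds` is 60 for 1-minute increments, 3600 for hourly, and so on.
def bucket_counts(jobs, bucket_seconds: 60)
  counts = Hash.new { |hash, key| hash[key] = { succeeded: 0, errored: 0 } }

  jobs.each do |job|
    # Truncate the timestamp down to the start of its bucket.
    epoch = job.finished_at.to_i
    bucket = Time.at(epoch - (epoch % bucket_seconds)).utc

    outcome = job.error.nil? ? :succeeded : :errored
    counts[bucket][outcome] += 1
  end

  # Return the buckets in chronological order.
  counts.sort_by { |bucket, _| bucket }.to_h
end
```

Passing `bucket_seconds: 3600` or `bucket_seconds: 86_400` would give the hourly and daily lookbacks from the same function.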

Other charts?

I'd like to make these charts load asynchronously (SJR) so that the dashboard doesn't load too slowly.

@bensheldon bensheldon added this to Inbox in Backlog Oct 29, 2021
@aried3r
Contributor

aried3r commented Oct 29, 2021

> * Have a chart that is simply number of jobs performed successfully, and errored

That would be awesome! When we faced configuration problems we saw the dashboard printing a chart of enqueued jobs, but not executed ones, so it seemed everything was in order at first. Of course we wouldn't rely on just the dashboard for this (see also #403) but in this particular case it would have helped.

> * Have some kind of min/max/avg (wish postgres made medians easier) [banded-line chart](https://github.com/gionkunz/chartist-js/issues/283#issuecomment-96365199) of job duration

We render some very basic charts internally and have had success with chartkick (or chartkick.js without the Ruby helpers), which makes lazy loading easy by passing an endpoint rather than data, together with active_median. Since Chartist seemed unmaintained, we use either Chartkick or Chart.js directly.
I'm not saying charting libraries should be switched, but it's something to keep in mind. Chart.js does support area charts.

As for medians, if you look at the postgres implementation of active_median, it uses the 50th percentile, which I believe is the median and is something PostgreSQL offers.
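
For illustration, the idea that the median is just the 50th percentile can be sketched in plain Ruby. This is not active_median's code; the helper names `percentile_cont` and `median` are hypothetical here, with the first one borrowing the name of PostgreSQL's ordered-set aggregate because it interpolates linearly between neighbouring values in the same spirit.

```ruby
# Continuous percentile: sort the values, find the fractional position,
# and interpolate linearly between the two neighbouring values.
def percentile_cont(values, fraction)
  raise ArgumentError, "fraction must be within 0..1" unless (0.0..1.0).cover?(fraction)
  return nil if values.empty?

  sorted = values.sort
  position = fraction * (sorted.length - 1)
  lower = position.floor
  upper = position.ceil
  weight = position - lower

  sorted[lower] + (sorted[upper] - sorted[lower]) * weight
end

# The median is the 50th percentile.
def median(values)
  percentile_cont(values, 0.5)
end
```

With an even number of values the result falls halfway between the two middle values, and with an odd number it is the middle value itself.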

@bensheldon
Owner Author

@aried3r thanks for sharing that story about an incident. It's helpful to know these things have consequences.

I'll check out Chart.js as an alternative to Chartist. I think I can do the minimum of what is needed without chartkick, to keep the dependencies as few as possible.

I think the feature I want is "Line Datasets", for building a time-series box-and-whisker-like plot: https://www.chartjs.org/docs/3.2.0/samples/area/line-datasets.html

And thanks for pointing to active_median. That doesn't look too bad, though probably all this stuff (all the charts, not just medians) will be row-scanning the good_jobs table and won't be the most performant.

@bensheldon bensheldon moved this from Inbox to Prioritized backlog in Backlog Nov 4, 2021
@bensheldon
Owner Author

bensheldon commented Oct 30, 2022

Continuing to think and work on this. For context, Heroku has these time bucket options:

[Screenshot: Heroku's time bucket options]

I really would like to have a cumulative flow diagram.

Some thoughts on what to show:

  • Queue Latency (performed_at - COALESCE(scheduled_at, created_at)). Bucketed based on when an execution is Performed. This should be split out by queue. Or Min, Avg, P95, Max.
  • Perform Latency (finished_at - performed_at). Could be split by queue or job class. Or Min, Avg, P95, Max.
  • Total Latency (finished_at - COALESCE(scheduled_at, created_at)). Could be split by queue or job class. Or Min, Avg, P95, Max.

I dunno quite how to offer both a (Min, Avg, P95, Max) breakdown and Queue- (or Job-) based breakdowns.
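
One possible shape, sketched in plain Ruby under stated assumptions: compute the three latencies per execution, group by queue, and then summarize each group, so both breakdowns live in one nested structure. `Execution`, `latencies`, `summarize`, and `queue_latency_by_queue` are hypothetical names for illustration, the P95 uses the simple nearest-rank method, and in practice this would probably be a SQL `GROUP BY` rather than in-memory Ruby.

```ruby
# Hypothetical stand-in for an execution row (assumption, not the real schema).
Execution = Struct.new(:queue_name, :created_at, :scheduled_at, :performed_at, :finished_at)

# The three latencies from the list above, in seconds.
def latencies(execution)
  # Ruby equivalent of COALESCE(scheduled_at, created_at).
  enqueued_at = execution.scheduled_at || execution.created_at
  {
    queue: execution.performed_at - enqueued_at,
    perform: execution.finished_at - execution.performed_at,
    total: execution.finished_at - enqueued_at
  }
end

# Min, Avg, P95, Max for a list of durations (nearest-rank P95).
def summarize(durations)
  return nil if durations.empty?

  sorted = durations.sort
  # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
  rank = (95 * sorted.length + 99) / 100
  {
    min: sorted.first,
    avg: sorted.sum / sorted.length,
    p95: sorted[rank - 1],
    max: sorted.last
  }
end

# Queue latency summarized per queue: { "default" => { min:, avg:, p95:, max: }, ... }
def queue_latency_by_queue(executions)
  executions
    .group_by(&:queue_name)
    .transform_values { |group| summarize(group.map { |e| latencies(e)[:queue] }) }
end
```

A chart could then pick one statistic (say `:p95`) and draw one line per queue, or pick one queue and draw the four statistics as a band; swapping `&:queue_name` for a job-class attribute would give the per-job breakdown.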

@sandstrom

Thanks for GoodJob, it's an awesome tool!

In the spirit of keeping the scope of this project down though, I wouldn't build a very advanced dashboard for it.

Instead, just expose the relevant metrics (queue length, etc) and let other tools provide the dashboard. These other tools could be Grafana/Prometheus, Cronitor.io (we're about to integrate them with GoodJob right now), BetterUptime, etc.

Better if this project stays focused on the ruby/jobs side, ideally with only a simple HTML-based dashboard (or only a JSON API with stats). Projects that spread themselves too thin tend to run into problems when the 1-2 maintainers enter a period in their life (family, for example) with less time to allocate to an open-source project.

On that topic, try to get more project team members onboard. Perhaps some of the existing contributors would be willing to help out (https://github.com/bensheldon/good_job/graphs/contributors).

@jonahgeorge
Sponsor

4 participants