Performance Improvement Calculator #226

Closed
benthomasson opened this issue Aug 4, 2020 · 7 comments

@benthomasson
Contributor

benthomasson commented Aug 4, 2020

Description

A common request from Tower operators is to improve the performance of their playbooks when applied to an inventory. This feature attempts to help them do that by pointing out the places where improvements would most benefit the overall playbook run.

The feature works a bit like the ROI calculator: it shows operators the current state of their system, and they can then tweak it to see what individual performance improvements would do to the overall performance of their playbooks. This is a visualization of Amdahl's Law as applied to Ansible playbooks.

The visualization could be based on this chart.

[Figure: Optimizing-different-parts.svg]

Where A and B would be different tasks in a playbook.

We can present a bar chart showing the duration of the tasks in a playbook and provide fields with speed ups (1.0X by default) for each task. They can then tweak the speed ups for the tasks to see the overall speed up calculated by Amdahl's Law.
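As a rough sketch of the calculation behind those fields: Amdahl's Law with several parts says the overall speed up is the original total time divided by the sum of each task's duration over its own speed up. The function name and the sample durations below are hypothetical, chosen only for illustration.

```python
def overall_speedup(durations, speedups):
    """Overall playbook speed up when task i becomes speedups[i] times faster.

    durations: measured task durations (any consistent unit).
    speedups:  per-task speed up factors, 1.0 meaning "unchanged".
    """
    if len(durations) != len(speedups):
        raise ValueError("need exactly one speed up per task")
    if any(s <= 0 for s in speedups):
        raise ValueError("speed ups must be positive")
    original = sum(durations)
    # Each task's new duration is its old duration divided by its speed up.
    improved = sum(d / s for d, s in zip(durations, speedups))
    if improved <= 0:
        raise ValueError("total duration must be positive")
    return original / improved


# Hypothetical task durations in seconds, for illustration only.
durations = [60.0, 30.0, 10.0]
print(overall_speedup(durations, [1.0, 1.0, 1.0]))   # defaults: no change
print(overall_speedup(durations, [2.0, 1.0, 1.0]))   # speed up the longest task
print(overall_speedup(durations, [1.0, 1.0, 10.0]))  # speed up the shortest task
```

With these numbers, doubling the speed of the 60 s task gives 100/70, roughly 1.43X overall, while making the 10 s task ten times faster gives only 100/91, roughly 1.10X. That contrast is the point the chart is meant to make visible.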

Additionally we can show tasks-per-host to graphically identify slow hosts. This could be in the same chart with expandable bars that expand to show bars for each host that ran that task. We can pre-expand some bars if the variance between durations is larger than some threshold which could be user defined as well.
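One possible way to pick which bars to pre-expand, assuming we have per-host durations for each task: compute the variance of the host durations and compare it with the threshold. The data shape, the function name, and the choice of population variance are all assumptions for this sketch, not a settled design.

```python
from statistics import pvariance


def tasks_to_expand(host_durations, threshold):
    """Return names of tasks whose per-host durations vary more than threshold.

    host_durations: {task_name: {host_name: duration_seconds}} (hypothetical shape).
    threshold:      variance cutoff in seconds squared, possibly user defined.
    """
    expanded = []
    for task, per_host in host_durations.items():
        values = list(per_host.values())
        # A single host has no spread to show, so leave the bar collapsed.
        if len(values) > 1 and pvariance(values) > threshold:
            expanded.append(task)
    return expanded


# Hypothetical sample data: host "h3" is much slower on the install task.
sample = {
    "install packages": {"h1": 10.0, "h2": 10.5, "h3": 40.0},
    "copy config": {"h1": 2.0, "h2": 2.1},
}
print(tasks_to_expand(sample, threshold=25.0))
```

A relative measure such as the coefficient of variation might work better than raw variance when task durations differ by orders of magnitude, but that is a design choice to settle later.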

This calculator can be used to compare the current state of a playbook run to hypothetical playbook runs based on user-provided speed ups. It can also be used to compare two runs of the same playbook, calculating the per-task speed ups and the overall playbook speed up.
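Comparing two real runs could be sketched like this, assuming each run is available as a mapping from task name to duration. The function name and data shape are hypothetical, and only tasks present in both runs are compared.

```python
def compare_runs(before, after):
    """Compare two runs of the same playbook.

    before, after: {task_name: duration_seconds} (hypothetical shape).
    Returns (per_task_speedups, overall_speedup), using only tasks that
    appear in both runs and have a positive duration in the later run.
    """
    shared = [t for t in before if t in after and after[t] > 0]
    if not shared:
        raise ValueError("the two runs have no comparable tasks")
    per_task = {t: before[t] / after[t] for t in shared}
    overall = sum(before[t] for t in shared) / sum(after[t] for t in shared)
    return per_task, overall


# Hypothetical durations: task "a" was made twice as fast, "b" is unchanged.
per_task, overall = compare_runs({"a": 40.0, "b": 60.0}, {"a": 20.0, "b": 60.0})
print(per_task)  # {'a': 2.0, 'b': 1.0}
print(overall)   # 1.25
```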

Mock up

Add mock up here when ready

Related PRs

Add PRs here when ready

Verification

Screenshot

Add screenshot of implementation here when implemented

Steps

Add verification steps here when ready for QE

@Ladas
Collaborator

Ladas commented Aug 4, 2020

@benthomasson we should be able to get the avg task time distribution for template from the event explorer API (after some tweaks and adding the real duration of tasks into rollups)

Then it's all UI magic to drag these, to compute possible speedups.

Btw. we should show avg task speed in the selected time period and maybe the distribution e.g. with quartile chart

https://github.com/RedHatInsights/tower-analytics-backend/issues/478
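For the average and quartile view mentioned above, a minimal sketch using only the Python standard library could look like the following. It assumes the task durations for the selected time period have already been fetched as a plain list; the function name is hypothetical and nothing here reflects the actual event explorer API.

```python
from statistics import mean, quantiles


def duration_summary(durations):
    """Summarize task durations as mean plus quartiles.

    durations: list of durations in seconds for one task over the selected
    time period (assumed to be already fetched; needs at least two values).
    """
    q1, median, q3 = quantiles(durations, n=4)  # Python 3.8+
    return {
        "mean": mean(durations),
        "q1": q1,
        "median": median,
        "q3": q3,
        "min": min(durations),
        "max": max(durations),
    }


# Hypothetical durations, for illustration only.
print(duration_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

The five values min, q1, median, q3, and max are what a box plot or quartile chart would need for each task.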

@Ladas
Collaborator

Ladas commented Aug 5, 2020

@benthomasson currently we track these task states (similar to Tower):

ok
failed
unreachable
skipped
retry
changed
ignored_failed
ignored_unreachable
rescued_failed
rescued_unreachable

I'll expose the duration of each, and we should show the distribution. We should probably also allow the user to filter on only some of these. E.g. unreachable and failed will be eliminated if we filter down to only successful jobs.

This then brings more useful insight, e.g. seeing some task taking a long time but always being skipped, or never changing anything, or having a lot of retries, etc. Each of these will provide a hint about how we can optimize the task.

And we'd probably be showing e.g. the average run of this task as changed for 1 host vs. the average run of this task as skipped for 1 host.
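A sketch of that per-state comparison, assuming each event is a record with task, host, status, and duration fields. These field names and the function name are hypothetical and do not describe the real rollup schema.

```python
from collections import defaultdict
from statistics import mean


def average_by_status(events, statuses=None):
    """Average per-host duration for each (task, status) pair.

    events:   iterable of dicts with "task", "host", "status", "duration"
              keys (hypothetical field names), one record per host per task.
    statuses: optional set of states to keep, e.g. {"changed", "skipped"}.
    """
    grouped = defaultdict(list)
    for event in events:
        if statuses is not None and event["status"] not in statuses:
            continue  # the user filtered this state out
        grouped[(event["task"], event["status"])].append(event["duration"])
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical events, for illustration only.
events = [
    {"task": "install", "host": "h1", "status": "changed", "duration": 30.0},
    {"task": "install", "host": "h2", "status": "changed", "duration": 50.0},
    {"task": "install", "host": "h3", "status": "skipped", "duration": 0.5},
    {"task": "install", "host": "h4", "status": "failed", "duration": 12.0},
]
print(average_by_status(events, statuses={"changed", "skipped"}))
```

Here the average changed run takes 40 s per host against 0.5 s for a skipped run, which is the kind of gap that would hint at where a task could be optimized.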

@cswiii

cswiii commented Sep 1, 2020

Perhaps we could call this something snazzy like "Performance Profiler"?

@benthomasson
Contributor Author

I have changed the name a few times myself. I was calling it "Performance Planner" in my head recently. Performance Profiler sounds good.

@benthomasson
Contributor Author

How do customers find the long-running templates? Do we need a visualization or table of the longest-running templates?

@benthomasson
Contributor Author

This would be useful for developers or architects.

@jctanner
Contributor

jctanner commented Jan 7, 2021

@jctanner jctanner closed this as completed Jan 7, 2021