API: Poor performance when querying playbooks or other resources with lots of children #158

Closed
dmsimard opened this issue Aug 19, 2020 · 5 comments
Labels
api Related to the API cli Related to the CLI performance Related to performance UI Related to the built-in user interface
Comments

@dmsimard
Contributor

dmsimard commented Aug 19, 2020

What component is this about ?

The API and, to an extent, its consumers (built-in UI, CLI, ara-web, etc.)

What is happening ?

For playbooks with many results (say, >=5k), responding to an API call on /api/v1/playbooks/<id> or rendering the playbook in the UI inside the browser can be pretty slow.

Some metrics from a deployment with gunicorn and a local mysql database on a single good virtual machine:

  • 5 hosts, 276 results, 315 files
    [screenshot: Selection_033]

  • 5 hosts, 1932 results, 203 files
    [screenshot: Selection_035]

  • 53 hosts, 7580 results, 44 files
    [screenshot: Selection_036]

  • 334 hosts, 21074 results, 231 files
    [screenshot: Selection_037]

  • 195 hosts, 27465 results, 97 files
    [screenshot: Selection_038]

This is because the API attempts, in good faith, to return all the data about a playbook, in the right context, in a single call without pagination. We do this for other resources as well: for example, plays include their tasks, which include their results, which include their host.

In hindsight, that was a mistake because it doesn't scale very well. Learning to design an API live in production :)
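
To illustrate why the nested approach scales poorly, here is a minimal sketch comparing the size of a fully nested response with a flat one that only carries counts. The field names (`plays`, `tasks`, `results`, `host`, `items`) and both payload shapes are hypothetical stand-ins for illustration, not the actual serializer output.

```python
import json


def nested_playbook(n_plays, n_tasks, n_results):
    """Hypothetical nested payload: playbook -> plays -> tasks -> results -> host."""
    return {
        "id": 1,
        "plays": [
            {
                "id": p,
                "tasks": [
                    {
                        "id": t,
                        "results": [
                            {"id": r, "status": "ok",
                             "host": {"id": r, "name": f"host{r}"}}
                            for r in range(n_results)
                        ],
                    }
                    for t in range(n_tasks)
                ],
            }
            for p in range(n_plays)
        ],
    }


def flat_playbook(n_plays, n_tasks, n_results):
    """Hypothetical flat payload: only counts; children are fetched separately."""
    return {
        "id": 1,
        "items": {
            "plays": n_plays,
            "tasks": n_plays * n_tasks,
            "results": n_plays * n_tasks * n_results,
        },
    }


def payload_size(payload):
    """Size of the JSON-encoded payload, in characters."""
    return len(json.dumps(payload))
```

In this sketch the nested payload grows with every result added, while the flat payload stays roughly the same size no matter how many children the playbook has.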

What should be happening ?

Querying for a playbook's details shouldn't return ALL of its child resources (plays, tasks, results, files, records, etc.) because these child resources can easily be obtained (with pagination and search parameters) by querying /api/v1/plays?playbook=<id>, /api/v1/results?playbook=<id>&status=failed and so on.
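
As a rough sketch of what a consumer would do instead, the snippet below builds a filtered URL for a child resource and walks through its pages. It assumes the API pages its responses in the style of Django REST framework's limit/offset pagination (a `results` list plus a `next` link); that response shape, the `limit` parameter, and the helper names `build_url` and `iter_children` are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_url(base, resource, **filters):
    """Build e.g. <base>/api/v1/results?playbook=1&status=failed (keys sorted)."""
    query = urlencode(sorted(filters.items()))
    return f"{base}/api/v1/{resource}?{query}"


def iter_children(fetch, url):
    """Yield every item of a paginated listing, one page at a time.

    `fetch` is any callable that takes a URL and returns the decoded JSON
    body as a dict. Assumed page shape: {"results": [...], "next": <url or None>}.
    """
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")  # None on the last page ends the loop
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library; in practice it could wrap whatever client the consumer already uses.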

Fixing this would be a significant API change.

@dmsimard dmsimard added api Related to the API UI Related to the built-in user interface cli Related to the CLI plugins Related to the Ansible plugins and removed plugins Related to the Ansible plugins labels Aug 19, 2020
@dmsimard
Contributor Author

Started exploring what this change means in practice in a WIP: https://review.opendev.org/#/c/746823/

@dmsimard dmsimard added this to the 1.5 milestone Aug 21, 2020
@Orabig

Orabig commented Sep 2, 2020

This seems like a critical issue, because at some point (when running very long playbooks, or playbooks on many hosts), the API call eventually fails because of a gunicorn worker timeout.

Our ARA is deployed on Kubernetes with a MySQL backend database, and in the ara-api pod we see the following in the logs (and the webpage returns a 500 error code):

[2020-09-02 14:45:35 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:31)
[2020-09-02 16:45:35 +0200] [31] [INFO] Worker exiting (pid: 31) 
[2020-09-02 14:45:36 +0000] [34] [INFO] Booting worker with pid: 34

The application is then useless because all logs about big playbooks are lost (it is impossible to retrieve them in any way)

@dmsimard
Contributor Author

dmsimard commented Sep 2, 2020

Hey @Orabig and thanks for the feedback -- there will definitely be a fix for this by the time 1.5 releases.

For the time being, I'm iterating on https://review.opendev.org/#/c/746823/ which I am running in production and I confirm that I can browse playbooks with over 60k results and hundreds of hosts just fine -- the page loads instantly.

It's not ready to land yet (pagination and filtering need tweaks) but it works, so I am confident about it.

@dmsimard dmsimard added the performance Related to performance label Sep 4, 2020
@dmsimard
Contributor Author

dmsimard commented Sep 5, 2020

More progress on the series of patches, now with filtering and pagination:

[screenshot: Screenshot from 2020-09-05 16-45-09]

Need to spend a bit more time on it but it's almost good to go for a first iteration.

@dmsimard
Contributor Author

dmsimard commented Sep 16, 2020

Hi @Orabig, FYI this will be released in 1.5 beta2 tomorrow.
