Profile and benchmark the ara callback and api server #6

Open
dmsimard opened this issue Sep 19, 2019 · 5 comments

@dmsimard dmsimard commented Sep 19, 2019

ansible-role-ara_api currently supports a few different deployment options:

  • with or without gunicorn to launch the django wsgi application for the API server
  • support for nginx in front of gunicorn
  • sqlite as the default database backend
  • support for mysql and postgresql

All of these options are already integration tested and we could leverage these tests to benchmark the performance of each option. The data would be interesting but also valuable in finding issues and improvement opportunities.
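The options above form a small matrix of scenarios. A minimal sketch of enumerating them for a benchmark run could look like this; the labels are illustrative names chosen here, not variables of the role, and it assumes nginx is only ever tested in front of gunicorn:

```python
from itertools import product

# Illustrative labels for the deployment options listed above; these are
# NOT actual variables of ansible-role-ara_api.
FRONTENDS = ["no-gunicorn", "gunicorn", "nginx+gunicorn"]
DATABASES = ["sqlite", "mysql", "postgresql"]


def benchmark_scenarios():
    """Return every (frontend, database) pair that could be benchmarked."""
    return list(product(FRONTENDS, DATABASES))


scenarios = benchmark_scenarios()
for frontend, database in scenarios:
    print(f"{frontend} / {database}")
```

Each pair would map to one of the existing integration test jobs, with timing and resource usage captured around it.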

@dmsimard dmsimard commented Oct 23, 2019

I'm interested in benchmarking a particular use case: rather than a single central API server with a central database, each callback would talk to a local API server that connects to a remote central database.

So instead of this:

+--------+
|callback| +-------+
+--------+         |
                   v
+--------+       +-+-+       +--------+
|callback| +---> |API| +---> |Database|
+--------+       +-+-+       +--------+
                   ^
+--------+         |
|callback| +-------+
+--------+

The callback would instead use a local API server:

+------------------+
| +--------+       |
| |callback|  API  | +-----------+
| +--------+       |             |
+------------------+             |
                                 v
+------------------+
| +--------+       |         +--------+
| |callback|  API  | +-----> |Database|
| +--------+       |         +--------+
+------------------+
                                 ^
+------------------+             |
| +--------+       |             |
| |callback|  API  | +-----------+
| +--------+       |
+------------------+
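
The two layouts differ only in where the callback points and where the API server's database lives. A rough sketch of the settings involved follows, assuming ara's `ARA_API_CLIENT`, `ARA_API_SERVER`, `ARA_DATABASE_ENGINE` and `ARA_DATABASE_HOST` settings behave as described in its documentation. The hostnames are placeholders and the helper functions are hypothetical:

```python
# Hostnames below are placeholders, not real deployments.
CENTRAL_API = "https://ara.example.org"
LOCAL_API = "http://127.0.0.1:8000"
CENTRAL_DB = "db.example.org"


def callback_settings(api_server):
    """Settings for the callback plugin when it talks to an API over HTTP."""
    return {"ARA_API_CLIENT": "http", "ARA_API_SERVER": api_server}


def api_server_settings(database_host):
    """Settings for an API server backed by a MySQL-compatible database."""
    return {
        "ARA_DATABASE_ENGINE": "django.db.backends.mysql",
        "ARA_DATABASE_HOST": database_host,
    }


# First diagram: every callback posts to one central API server.
central_layout = {
    "callback": callback_settings(CENTRAL_API),
    "api": api_server_settings(CENTRAL_DB),
}

# Second diagram: each callback posts to an API server on its own host,
# and only the database connection crosses the network.
local_layout = {
    "callback": callback_settings(LOCAL_API),
    "api": api_server_settings(CENTRAL_DB),
}
```

In the second layout the HTTP round trip stays on the loopback interface, so the benchmark would mostly measure database latency rather than API latency.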

@dmsimard dmsimard commented Oct 23, 2019

Found ansible-community/ara#80 while benchmarking

@zxaos zxaos commented Oct 23, 2019

@dmsimard do you know offhand if multiple API servers behind a LB is a supported configuration too? I may try that in an attempt to address performance.

That is to say:

┌─────────┐
│callback │────────────┐           ┌─────┐
└─────────┘            │        ┌─▶│ API │──────────┐
                       ▼        │  └─────┘          ▼
┌─────────┐     ┌─────────────┐ │  ┌─────┐     ┌────────┐
│callback │────▶│Load Balancer│─┼─▶│ API │────▶│Database│
└─────────┘     └─────────────┘ │  └─────┘     └────────┘
                       ▲        │  ┌─────┐          ▲
┌─────────┐            │        └─▶│ API │──────────┘
│callback │────────────┘           └─────┘
└─────────┘

@dmsimard dmsimard commented Oct 24, 2019

@zxaos the state is stored in the database so it should be possible.

AWX is built with the same backend (django/django-rest-framework) and they have a docker-compose stack that includes haproxy.

However, in the testing I've done, the main bottleneck wasn't the API server but the callback plugin itself. gunicorn and mysql were almost idle, so I'm not sure how much you would gain by adding more API servers until we improve the callback plugin.

For now I'll keep profiling to see if there is any low-hanging fruit (like ansible-community/ara@974fa25) and then I'll tag 1.2.
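
One generic way to look for that kind of hot spot is Python's built-in `cProfile`. The sketch below is not the profiling setup used here; `serialize_result` and `run_fake_playbook` are hypothetical stand-ins for per-task work done by a callback:

```python
import cProfile
import io
import json
import pstats


def serialize_result(result):
    """Hypothetical stand-in for work the callback does per task result."""
    return json.dumps(result, sort_keys=True)


def run_fake_playbook(task_count=1000):
    """Hypothetical workload: serialize one result per task."""
    return [serialize_result({"task": i, "status": "ok"}) for i in range(task_count)]


profiler = cProfile.Profile()
profiler.enable()
payloads = run_fake_playbook()
profiler.disable()

# Render the ten most expensive calls, sorted by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time surfaces the functions that dominate a run, which is usually where a cheap fix is most likely to pay off.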


@dmsimard dmsimard commented Oct 30, 2019

For the sake of testing 1.2 before release, I ran Ansible's own integration tests with it and figured it would be worthwhile to capture various metrics.

The API server being tested ran gunicorn with nginx in front and was configured to use a remote MariaDB server located in another virtual machine on the same hypervisor (<1ms latency).

The two virtual machines had modest specs (4 vCPU, 4 GB RAM) and were not heavily loaded throughout the tests.

API server

[Graph: API server CPU/disk/load]
[Graph: API server RAM/network]

Database server

[Graph: DB server CPU/disk/load]
[Graph: DB server RAM/network]

The data is currently available at https://api.trunk.demo.recordsansible.org.

Here are some results from after all tests completed:

  • 174 playbooks
  • 174 plays
  • 11655 tasks
  • 174 hosts
  • 11688 results
  • 1115 files (referencing 738 unique, compressed file contents)

37216 total API calls:

  • 174x POST /api/v1/playbooks
  • 174x PATCH /api/v1/playbooks
  • 174x POST /api/v1/plays
  • 174x PATCH /api/v1/plays
  • 11655x POST /api/v1/tasks
  • 11655x PATCH /api/v1/tasks
  • 174x POST /api/v1/hosts
  • 177x PATCH /api/v1/hosts
  • 11688x POST /api/v1/results
  • 1171x POST /api/v1/files
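
As a quick cross-check, the per-endpoint counts from this comment add up to the 37216 total once the 1171 POST /api/v1/files hits reported by goaccess are included:

```python
# Per-endpoint request counts taken from this comment; the files count
# comes from the goaccess "Requested Files" output.
calls = {
    ("POST", "/api/v1/playbooks"): 174,
    ("PATCH", "/api/v1/playbooks"): 174,
    ("POST", "/api/v1/plays"): 174,
    ("PATCH", "/api/v1/plays"): 174,
    ("POST", "/api/v1/tasks"): 11655,
    ("PATCH", "/api/v1/tasks"): 11655,
    ("POST", "/api/v1/hosts"): 174,
    ("PATCH", "/api/v1/hosts"): 177,
    ("POST", "/api/v1/results"): 11688,
    ("POST", "/api/v1/files"): 1171,
}

total = sum(calls.values())
task_and_result_calls = sum(
    count for (_, path), count in calls.items()
    if path in ("/api/v1/tasks", "/api/v1/results")
)
print(f"{total} calls, {task_and_result_calls / total:.0%} for tasks and results")
```

Tasks and results account for the overwhelming majority of the requests, so those are the endpoints where per-call overhead matters most.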

First call: [30/Oct/2019:21:29:11 +0000]
Last call: [30/Oct/2019:22:07:03 +0000]
Duration: ~38 minutes
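
The duration and the average request rate follow directly from the two timestamps and the 37216 total. A small sketch of the arithmetic:

```python
from datetime import datetime

# Timestamp format used in the access log entries.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

first = datetime.strptime("30/Oct/2019:21:29:11 +0000", LOG_TIME_FORMAT)
last = datetime.strptime("30/Oct/2019:22:07:03 +0000", LOG_TIME_FORMAT)

elapsed = last - first
seconds = elapsed.total_seconds()
rate = 37216 / seconds  # total API calls divided by wall-clock seconds

print(f"{elapsed} elapsed, {rate:.1f} requests/s on average")
```

That works out to 37 minutes 52 seconds, or roughly 16 requests per second averaged over the whole run.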

Longest playbook: 8m34s

I also ran the logs through goaccess which provided the following information:

1 - Unique visitors per day - Including spiders

 Hits       h% Vis.      v%   Bandwidth Data
 ----- ------- ---- ------- ----------- ----
 37216 100.00%    4 100.00%   26.40 MiB 30/Oct/2019

2 - Requested Files (URLs)       

 Hits      h% Vis.    v%   Bandwidth Mtd   Proto    Data
 ----- ------ ---- ----- ----------- ----- -------- ----
 11688 31.41%    4 0.03%   15.43 MiB POST  HTTP/1.1 /api/v1/results
 11655 31.32%    4 0.03%    3.74 MiB POST  HTTP/1.1 /api/v1/tasks
 1171   3.15%    4 0.03%    2.56 MiB POST  HTTP/1.1 /api/v1/files
 174    0.47%    4 0.03%  205.42 KiB POST  HTTP/1.1 /api/v1/playbooks
 174    0.47%    4 0.03%   44.47 KiB POST  HTTP/1.1 /api/v1/plays
 174    0.47%    4 0.03%   32.42 KiB POST  HTTP/1.1 /api/v1/hosts
 2      0.01%    1 0.01%   31.01 KiB PATCH HTTP/1.1 /api/v1/hosts/47
 2      0.01%    1 0.01%   31.16 KiB PATCH HTTP/1.1 /api/v1/hosts/129
 2      0.01%    1 0.01%   500.0   B PATCH HTTP/1.1 /api/v1/hosts/63
 1      0.00%    1 0.01%   348.0   B PATCH HTTP/1.1 /api/v1/tasks/250
 1      0.00%    1 0.01%   349.0   B PATCH HTTP/1.1 /api/v1/tasks/495
 1      0.00%    1 0.01%   362.0   B PATCH HTTP/1.1 /api/v1/tasks/734
[...]

5 - Visitor Hostnames and IPs    Total: 4/4

 Hits      h% Vis.     v%   Bandwidth Data
 ----- ------ ---- ------ ----------- ----
 13092 35.18%    1 25.00%   10.46 MiB <first ip>
 11932 32.06%    1 25.00%    8.42 MiB <second ip>
 7834  21.05%    1 25.00%    5.04 MiB <third ip>
 4358  11.71%    1 25.00%    2.49 MiB <fourth ip>

13 - HTTP Status Codes    

 Hits      h% Vis.     v%   Bandwidth Data
 ----- ------ ---- ------ ----------- ----
 37152 99.83%    8 80.00%   26.40 MiB 2xx Success
 56     0.15%    0  0.00%    2.41 KiB 4xx Client Error
 8      0.02%    2 20.00%    1.10 KiB 5xx Server Error

The client and server errors are worth digging into. Most of them appear to be related to files, for example:

Oct 30 21:52:04 gunicorn[21171]: Bad Request: /api/v1/files
Oct 30 21:52:04 gunicorn[21171]: 2019-10-30 21:52:04,150 WARNING django.request: Bad Request: /api/v1/files
Oct 30 21:52:04 gunicorn[21171]: 127.0.0.1 - - [30/Oct/2019:21:52:04 +0000] "POST /api/v1/files HTTP/1.0" 400 44 "-" "ara-http-client_1.2.0.0b1"

The 5xx errors occurred on POSTs to results:

[30/Oct/2019:21:47:53 +0000] "POST /api/v1/results HTTP/1.0" 201 500 "-" "ara-http-client_1.2.0.0b1"
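
To dig into these systematically, the access log lines can be parsed and the errors grouped by path and status. The sketch below assumes the lines follow the common access log layout, where the field right after the quoted request is the status code and the next one is the response size; the helper names are hypothetical:

```python
import re
from collections import Counter

# Timestamp, request line, status code, response size.
LOG_PATTERN = re.compile(
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# The two sample lines quoted in this comment.
SAMPLE_LINES = [
    'Oct 30 21:52:04 gunicorn[21171]: 127.0.0.1 - - [30/Oct/2019:21:52:04 +0000] '
    '"POST /api/v1/files HTTP/1.0" 400 44 "-" "ara-http-client_1.2.0.0b1"',
    '[30/Oct/2019:21:47:53 +0000] "POST /api/v1/results HTTP/1.0" 201 500 '
    '"-" "ara-http-client_1.2.0.0b1"',
]


def parse_line(line):
    """Return the parsed fields of one access log line, or None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    return entry


def errors_by_path(lines):
    """Count 4xx and 5xx responses per (path, status)."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["status"] >= 400:
            counts[(entry["path"], entry["status"])] += 1
    return counts


entries = [parse_line(line) for line in SAMPLE_LINES]
error_counts = errors_by_path(SAMPLE_LINES)
print(error_counts)
```

Under that assumption the second sample line reads as status 201 with a 500-byte response, so it may be worth double-checking which field matched when filtering the log for 5xx responses.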

Edit: Another data point -- a MySQL dump weighed 9MB in plain text and 4.5MB gzipped.
Edit #2: ara-manage generate crashed without completing after almost 2 minutes:

[ara] Generating static files for 174 playbooks at /tmp/test...
Traceback (most recent call last):
  File "virtualenv/bin/ara-manage", line 10, in <module>
    sys.exit(main())
  File "/home/fedora/.ara/virtualenv/lib64/python3.7/site-packages/ara/server/__main__.py", line 35, in main
    execute_from_command_line(sys.argv)
  File "/home/fedora/.ara/virtualenv/lib64/python3.7/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/home/fedora/.ara/virtualenv/lib64/python3.7/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/fedora/.ara/virtualenv/lib64/python3.7/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/fedora/.ara/virtualenv/lib64/python3.7/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/home/fedora/.ara/virtualenv/lib64/python3.7/site-packages/ara/ui/management/commands/generate.py", line 83, in handle
    self.render("result.html", destination, **data)
  File "/home/fedora/.ara/virtualenv/lib64/python3.7/site-packages/ara/ui/management/commands/generate.py", line 37, in render
    f.write(render_to_string(template, kwargs))
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce9' in position 7079: surrogates not allowed

real	1m50.589s
user	1m34.707s
sys	0m3.223s
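
For context on the traceback, `'\udce9'` is a lone surrogate, which Python typically produces when non-UTF-8 bytes are decoded with the `surrogateescape` error handler. The sketch below reproduces the same error in isolation and shows two ways of encoding such a string; it is not a proposed fix for `generate.py`, and the sample bytes are made up:

```python
# Latin-1 encoded bytes ("café"); 0xE9 on its own is not valid UTF-8.
raw = b"caf\xe9"

# surrogateescape smuggles the undecodable byte through as U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")

# A strict UTF-8 encode refuses lone surrogates, as in the traceback.
try:
    text.encode("utf-8")
    error = None
except UnicodeEncodeError as exc:
    error = str(exc)

# Option 1: round-trip the original bytes unchanged.
restored = text.encode("utf-8", errors="surrogateescape")

# Option 2: keep the output valid UTF-8 by writing an escape sequence.
escaped = text.encode("utf-8", errors="backslashreplace")

print(error)
print(restored, escaped)
```

Which handler is appropriate depends on whether the rendered files must be valid UTF-8 (`backslashreplace` or `replace`) or must preserve the original bytes (`surrogateescape`).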

Running tree on the target directory showed 12 directories and 2396 files totaling 46MB.
