Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up v3 security groups #2255

Merged

Conversation

andy-paine
Copy link
Contributor

@andy-paine andy-paine commented May 4, 2021

Summary

For users with large numbers of space visibilities (1000+), the query string in SecurityGroupPresenter::space_guid_hash_for was getting extremely large as it does a SELECT ... WHERE IN (<list of visible space guids>) for each security group. As the goal of this query is to effectively find which spaces the security group is applied to, the number of returned rows is strictly less than or equal to the size of the visible space guid lists. By fetching the list of all space guids a security group is applied to and filtering in Ruby, we avoid having to serialize the large visible space list and create + send the large query (sometimes multiple MB in length!) across to the DB.

Best case scenario (for this change)

  • Many spaces cause the old query string to be very large
  • Most spaces do not have a security group applied (not many rows in security_groups_spaces) so few rows are returned

This should be a big improvement as the extra cost of doing include? in the Cloud Controller should be neglible when only a few rows are returned

Using:

  • 100k orgs
  • 100k spaces
  • 40k security groups
  • 20k security group spaces

Timings:

  • Before: cf curl /v3/security_groups - 68s
  • After: cf curl /v3/security_groups - 0.37s
  • Before: cf curl /v3/security_groups?per_page=500 - Timed out after 30 minutes (watching the DB it was doing some weird stuff)
  • After: cf curl /v3/security_groups?per_page=500 - 1.74s

Worst case scenario (for this change)

  • Fewer spaces so query string not as large
  • Every space has every security group applied (many rows in security_groups_spaces) so lots of rows are returned

This could be a significant downgrade if the include? is expensive due to lots of rows being returned as the filtering is no longer done in the DB

Using:

  • 2k orgs
  • 2k spaces
  • 500 security groups
  • 1 million security group spaces (one for every space x security group pair)

NOTE: timings aren't comparable to above as there is significantly less overall data otherwise the measurements would have to be in hours as there would be 4 billion security_groups_spaces rows!

Timings:

  • Before: cf curl /v3/security_groups - 8.1s
  • After: cf curl /v3/security_groups - 4.6s
  • Before: cf curl /v3/security_groups?per_page=500 - 96s
  • After: cf curl /v3/security_groups?per_page=500 - 48s

Initially there appeared to be no significant difference between approaches however by including a select(:guid) in the query, this reduced the size of the returned dataset enough that the new query outperformed the old one even in this scenario.

Conclusion

This adjusted query seems to be an improvement in both the worst and best case scenario for this change. I would also suggest that the "best case scenario" above is much more realistic for a real world scenario of having many spaces but only a few security groups per space.

  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests

For users with access to a large number of spaces, the SQL query string
for checking which spaces a security group applies to gets quite big
(sometimes >1MB). By grabbing all the spaces a security group applies to
and doing the filtering in Ruby, this query string is much smaller. The
potential extra cost of running the `include?` seems to be offset by the
saving in creating/sending the large query string and using the
`select(:guid)` also saves some data transfer to further improve the
latency of this query.
@andy-paine andy-paine force-pushed the speed-up-v3-security-groups branch from 51c8d33 to 0a7353a Compare May 4, 2021 15:07
@sethboyles
Copy link
Member

sethboyles commented May 5, 2021

Thanks for the PR! This all makes sense and I will merge shortly.

Testing out the latter (worst) case on our environment, we saw a slight decrease in response time, from ~2m28s to ~2m9s

FWIW I experimented a little with the latter case, replacing dataset.select(:guid).all with dataset.select_map(:guid), the theory being that select_map avoids constructing a Ruby object per Space record, and instead simply returns an array, but only saw a minimal decrease in response time (from 2m9s -> 2m4s)

Additionally, I tried using eager loading, which batches all 2000 spaces into one query. Later in the presenter would have to be changed to use the already loaded array of Space records rather than a dataset. The queries themselves would execute quickly, but the Sequel ORM still takes a long time to stitch the 1 million associations together. The request ended up being longer than 3 minutes overall. I think eager loading might have had potential for your first case scenario, but given the improvement already shown I didn't try it out.

@sethboyles sethboyles merged commit 0c1374d into cloudfoundry:main May 5, 2021
This was referenced Aug 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants