Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache 'Administrator' group to improve performance of Workflow Tasks Page. #9161

Merged

Conversation

benbosman
Copy link
Member

@benbosman benbosman commented Oct 31, 2023

References

Description

Based on performance testing, with large amounts of Groups, the retrieval from the admin group from the database slows down

Instructions for Reviewers

The only change it to cache the Administrator Group, but only within the scope of one context.
This ensures the Group only is retrieved once in a request

This improved processing time from 14.5 seconds to 10.5 seconds on a database with many databases for the server/api/discover/search/objects?sort=score,DESC&page=0&size=10&configuration=workflow&embed=thumbnail&embed=item%2Fthumbnail call

Checklist

This checklist provides a reminder of what we are going to look for when reviewing your PR. You need not complete this checklist prior to creating your PR (draft PRs are always welcome). If you are unsure about an item in the checklist, don't hesitate to ask. We're here to help!

  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & integration tests). Exceptions may be made if previously agreed upon.
  • My PR passes Checkstyle validation based on the Code Style Guide.
  • My PR includes Javadoc for all new (or modified) public methods and classes. It also includes Javadoc for large or complex private methods.
  • My PR passes all tests and includes new/updated Unit or Integration Tests based on the Code Testing Guide.
  • If my PR includes new libraries/dependencies (in any pom.xml), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
  • If my PR modifies REST API endpoints, I've opened a separate REST Contract PR related to this change.
  • If my PR includes new configurations, I've provided basic technical documentation in the PR itself.
  • If my PR fixes an issue ticket, I've linked them together.

@tdonohue
Copy link
Member

tdonohue commented Nov 1, 2023

@benbosman : While I'm not against this change, I'm finding this difficult to test. Could you better describe the setup required to see the change in performance? Currently your description says: "on a database with many databases" and that's obviously a typo.

I've tried testing this using an Admin account that is a member of 20 groups, and having 15 Items currently on the Workflow Tasks page. In that scenario, I'm not seeing a noticeable difference in performance for that /search/objects?...&configuration=workflow endpoint. Performance in this setup is nearly identical with or without this PR, with that endpoint taking about 2.8 seconds to respond. So, I must not be at the right scale to notice anything. Could you give me a sense of the scale needed to test this? Or maybe provide a sample/test data dump from PostgreSQL?

All that said, this PR makes logical sense that caching the Group can be helpful. It just hasn't been easy for me to see the performance improvement.

NOTE: While testing this PR, I've come to realize the behavior you have said is problematic. The Workflow configuration of /search/objects is embedding entirely too much information and the response is very large and complex. See this comment back on the ticket: #9053 (comment)

This large/complex response quite obviously loads many objects into memory & results in bad performance.

@tdonohue tdonohue changed the title 107891: Cache administrator group Cache 'Administrator' group to improve performance of Workflow Tasks Page. Nov 1, 2023
@benbosman
Copy link
Member Author

Hi @tdonohue,
Based on the initial issue, I've used a client's database where the Administrator group is part of 400 other groups, like collection admins, collection workflow members, …
On top of this, I have 1200 workflow tasks in this database

The improvement here is specific to this setup with large amounts of parent groups

I sadly can't share the database, it's not a random generated one, but rather based on a client's database

@tdonohue tdonohue added the 1 APPROVAL pull request only requires a single approval to merge. label Nov 2, 2023
Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks @benbosman ! This looks good to me and makes sense. I've had difficulty proving the scale of the performance improvement, as I don't have a large enough data set. But, I've seen at least minor performance improvements on small data sets.

That said, I'm going to wait to merge this until sometime next week as I want to give @aroldorique an opportunity to test this as they reported the initial performance issue. Hopefully we find this at least provides some performance improvements (though we also plan to find additional performance improvements in #9164)

@tdonohue tdonohue added the port to dspace-7_x This PR needs to be ported to `dspace-7_x` branch for next bug-fix release label Nov 3, 2023
@jonas-atmire
Copy link
Contributor

jonas-atmire commented Nov 8, 2023

@tdonohue
I'll expand a bit on more in depth investigations of the load-related impact of the PR as I've been testing this as well

For a DB with around 3.5k eperson objects, I went through several different combinations of:

  • DSpace running without the PR merged
  • DSpace running with this PR merged
  • Tomcat/solr just being restarted
  • Tomcat/solr having already been running

I've noticed that if tomcat/solr has been up and running for a while, and the requests/response has been cached, that the impact is negligible, which is to be expected.

The following is the average response time over 3 tests:

  • With the PR merged, tomcat up and running for a while:
    • 0.82s
  • With the PR merged, tomcat up and running just being restarted:
    • 3.37s
  • Without the PR merged, tomcat up and running for a while:
    • 1.03s
  • Without the PR merged, tomcat up and running just being restarted:
    • 15.03s

So ignoring the caching here, and only taking the recently restarted tomcat/solr combo into account, this goes from 15 seconds to about 3 seconds.

@tdonohue
Copy link
Member

tdonohue commented Nov 9, 2023

Thanks @benbosman and @jonas-atmire ! Merging as this looks good to me & has had testing from the original reporter. Any follow-up work will move to #9164

@tdonohue tdonohue merged commit 5601ff9 into DSpace:main Nov 9, 2023
15 checks passed
@dspace-bot
Copy link

@tdonohue tdonohue modified the milestones: 7.6.1, 8.0 Nov 9, 2023
@tdonohue tdonohue removed the port to dspace-7_x This PR needs to be ported to `dspace-7_x` branch for next bug-fix release label Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1 APPROVAL pull request only requires a single approval to merge. bug high priority performance / caching Related to performance or caching issues
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

Very Slow Workflow Tasks Page when logged in EPerson is member of many Groups
4 participants