Cache 'Administrator' group to improve performance of Workflow Tasks Page. #9161
@benbosman: While I'm not against this change, I'm finding this difficult to test. Could you better describe the setup required to see the change in performance? Currently your description says: "on a database with many databases" and that's obviously a typo.
I've tried testing this using an Admin account that is a member of 20 groups, and having 15 Items currently on the Workflow Tasks page. In that scenario, I'm not seeing a noticeable difference in performance for that
All that said, this PR makes logical sense that caching the Group can be helpful. It just hasn't been easy for me to see the performance improvement.
NOTE: While testing this PR, I've come to realize the behavior you have said is problematic. The Workflow configuration of
This large/complex response quite obviously loads many objects into memory & results in bad performance.
Hi @tdonohue, the improvement here is specific to this setup, which has a large number of parent groups. Sadly, I can't share the database: it's not randomly generated, but based on a client's database.
👍 Thanks @benbosman ! This looks good to me and makes sense. I've had difficulty proving the scale of the performance improvement, as I don't have a large enough data set. But, I've seen at least minor performance improvements on small data sets.
That said, I'm going to wait to merge this until sometime next week as I want to give @aroldorique an opportunity to test this as they reported the initial performance issue. Hopefully we find this at least provides some performance improvements (though we also plan to find additional performance improvements in #9164)
@tdonohue For a DB with around 3.5k eperson objects, I went through several different combinations of:
I've noticed that if Tomcat/Solr have been up and running for a while, and the responses have been cached, the impact is negligible, which is to be expected. The following is the average response time over 3 tests:
So, ignoring the caching here and only taking the recently restarted Tomcat/Solr combo into account, this goes from 15 seconds to about 3 seconds.
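For anyone wanting to repeat this kind of comparison, here is a minimal sketch of averaging wall-clock time over a fixed number of runs. It is not the tooling used for the numbers above: `ResponseTimer` and `averageMillis` are hypothetical names, and the timed `Runnable` is a placeholder for whatever issues the actual request.

```java
/** Hypothetical helper: averages the wall-clock time of repeated calls. */
class ResponseTimer {

    /** Runs the task `runs` times and returns the mean duration in milliseconds. */
    static double averageMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        // Convert the summed nanoseconds to an average in milliseconds.
        return totalNanos / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the request being measured.
        Runnable placeholder = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println("unreachable");
            }
        };
        System.out.printf("average over 3 runs: %.3f ms%n", averageMillis(placeholder, 3));
    }
}
```

As noted in the comment above, measurements taken right after a restart and measurements taken on a warm instance should be averaged separately, since the cached case hides most of the difference.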
Thanks @benbosman and @jonas-atmire! Merging as this looks good to me & has had testing from the original reporter. Any follow-up work will move to #9164
Successfully created backport PR for |
References
Description
Based on performance testing, with large numbers of Groups, the retrieval of the Administrator group from the database slows down.
Instructions for Reviewers
The only change is to cache the Administrator Group, but only within the scope of one context.
This ensures the Group is retrieved only once per request.
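The idea can be illustrated with a minimal, self-contained sketch. It does not use the DSpace API and is not the code in this PR: `RequestContext`, `GroupLookup`, and the `"group:"` cache key are hypothetical stand-ins, assuming only that each request gets its own context object that can hold cached values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical stand-in for a request-scoped context; not the DSpace Context class. */
class RequestContext {
    private final Map<String, Object> cache = new HashMap<>();

    /** Returns the cached value for the key, loading and storing it on first access. */
    @SuppressWarnings("unchecked")
    <T> T cached(String key, Function<String, T> loader) {
        return (T) cache.computeIfAbsent(key, loader);
    }
}

/** Hypothetical group lookup that counts how often the simulated database is hit. */
class GroupLookup {
    final AtomicInteger databaseHits = new AtomicInteger();

    String findByName(RequestContext context, String name) {
        return context.cached("group:" + name, key -> {
            // Simulated expensive database query.
            databaseHits.incrementAndGet();
            return "Group(" + name + ")";
        });
    }
}

class AdminGroupCacheSketch {
    public static void main(String[] args) {
        GroupLookup lookup = new GroupLookup();
        RequestContext request = new RequestContext();

        // Repeated lookups within the same context reuse the cached value.
        for (int i = 0; i < 5; i++) {
            lookup.findByName(request, "Administrator");
        }
        System.out.println("hits within one context: " + lookup.databaseHits.get());

        // A new context starts with an empty cache, so the lookup runs again.
        lookup.findByName(new RequestContext(), "Administrator");
        System.out.println("hits after a second context: " + lookup.databaseHits.get());
    }
}
```

Because the cache lives on the context rather than in a static field, a change to the group in one request cannot leak stale data into a later one; the cost is one lookup per request instead of one per permission check.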
This improved processing time from 14.5 seconds to 10.5 seconds on a database with many databases for the
server/api/discover/search/objects?sort=score,DESC&page=0&size=10&configuration=workflow&embed=thumbnail&embed=item%2Fthumbnail
call
Checklist
This checklist provides a reminder of what we are going to look for when reviewing your PR. You need not complete this checklist prior to creating your PR (draft PRs are always welcome). If you are unsure about an item in the checklist, don't hesitate to ask. We're here to help!
pom.xml), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.