Improve performance for Groups with many EPerson members. Fix pagination on endpoints #9078
I tested this patch on DSpace 7.6. I was able to search and edit groups as expected, with no errors in the browser console or
I was not able to test the REST API directly because I don't know how to do it via simple HTTP requests.
@alanorth : Thanks for the tests! I added more details under "Instructions for Reviewers" on how to test pagination. If you are interested, it can be done either via the command line (wget or curl) or from the HAL Browser. (Pagination also has some automated tests in my updates to
Thanks @tdonohue. The HAL browser is easier than curl because we don't have to mess with signing tokens. I checked the two endpoints in the HAL browser and don't see a difference. Paging already seems to work on my unpatched DSpace 7.6 instance. For reference, I tried:
And I get this response in the HAL browser:
{
  "page": {
    "number": 2,
    "size": 1,
    "totalPages": 521,
    "totalElements": 521
  }
}
Perhaps I'm misunderstanding...
@alanorth : Apologies, I've misworded the description slightly. Yes, paging already works in 7.6... but this PR changes the paging to provide better performance. In 7.6, paging will first load every object into memory (in your test, 521 objects will be loaded into memory, as So, if you test with 7.6, you'll see slower paging behavior, as every page will load It can be difficult to see the speed difference on smaller repositories, but it should still be noticeable.
@tdonohue ok thanks for taking the time to explain. I was contemplating not commenting, assuming I was doing something stupid. So I am +1 on this by testing because it doesn't change functionality at all and, while I didn't notice the performance increase on localhost in my case, it didn't have any negative effect either. I will let someone else review the code since I don't know it well.
Hello @tdonohue , thank you very much for this improvement 🚀 !
However, I am a little bit concerned about the paginated queries that do not use any ordering. I think that in both cases we should apply an ordering; otherwise the pagination cannot be used in a real scenario.
I also found a few other small improvements that I suggest making.
dspace-api/src/main/java/org/dspace/eperson/GroupServiceImpl.java
dspace-api/src/main/java/org/dspace/eperson/dao/EPersonDAO.java
…t empty. Need to check *both* EPerson and subgroup counts.
@vins01-4science : Thanks for your review. I've addressed all feedback in the latest commit. That said, I did NOT fix the issues you pointed out with nested |
… comment about indeterminate ordering.
Looks good to me, thank you @tdonohue 🚀
Merging as this is at +2.
Successfully created backport PR for |
References
Description
This PR improves the performance of Groups which have many EPerson (or Group) members by adding proper pagination to these endpoints. (Previously all objects were always loaded into memory before pagination was applied.)
GET /api/eperson/groups/<:uuid>/epersons
GET /api/eperson/groups/<:uuid>/subgroups
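The difference between "load everything, then paginate" and "paginate at the source" can be illustrated with a minimal, self-contained Java sketch. All names here (PaginationSketch, FakeMemberStore, pageInMemory, pageAtSource) are hypothetical and are not DSpace APIs; the in-memory list merely stands in for the database, and the rowsRead counter shows how many rows each strategy touches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names are DSpace APIs.
class PaginationSketch {

    // Stand-in for a table of group members; counts how many rows are read.
    static class FakeMemberStore {
        private final List<String> rows = new ArrayList<>();
        int rowsRead = 0;

        FakeMemberStore(int total) {
            for (int i = 0; i < total; i++) {
                rows.add("member-" + i);
            }
        }

        // Reads every row, like an unpaginated query.
        List<String> findAll() {
            rowsRead += rows.size();
            return new ArrayList<>(rows);
        }

        // Reads only the requested window, like a query with an offset and a limit.
        List<String> find(int offset, int limit) {
            int from = Math.min(offset, rows.size());
            int to = Math.min(from + limit, rows.size());
            rowsRead += to - from;
            return new ArrayList<>(rows.subList(from, to));
        }
    }

    // Old approach: load everything, then slice the requested page in memory.
    static List<String> pageInMemory(FakeMemberStore store, int page, int size) {
        List<String> all = store.findAll();
        int from = Math.min(page * size, all.size());
        int to = Math.min(from + size, all.size());
        return all.subList(from, to);
    }

    // New approach: ask the store for just the requested page.
    static List<String> pageAtSource(FakeMemberStore store, int page, int size) {
        return store.find(page * size, size);
    }

    public static void main(String[] args) {
        FakeMemberStore a = new FakeMemberStore(521);
        FakeMemberStore b = new FakeMemberStore(521);
        System.out.println(pageInMemory(a, 2, 1) + " rows read: " + a.rowsRead);
        System.out.println(pageAtSource(b, 2, 1) + " rows read: " + b.rowsRead);
    }
}
```

Both methods return the same page, but the in-memory variant reads every row on every request, which is why the cost grows with group size rather than page size.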
This was implemented via new paginated methods in EPersonService/DAO and GroupService/DAO:
Warnings were added to the JavaDoc for the following methods, all of which will have very BAD performance for large groups:
Group.getMembers()
Group.getMemberGroups()
EPersonService.findByGroups() (unpaginated version)
GroupService.allMembers() (… allMembers()) as described below.
Finally, I discovered several usages of GroupService.allMembers() where the code only required the size() of that list. These usages were replaced with new countAllMembers() or countByParent() methods. All of these usages have existing detailed tests, which still pass with the newly refactored code:
EPersonService.delete(Context, EPerson, cascade)
GroupService.removeMember(Context, Group, EPerson)
GroupService.removeMember(Context, Group, Group)
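As a rough illustration of why a dedicated count is cheaper than calling size() on a fully loaded list, consider the sketch below. MemberCountSketch and FakeGroup are hypothetical stand-ins; the method names only echo allMembers() and countAllMembers() for readability and do not reproduce the real GroupService signatures or implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the DSpace GroupService implementation.
class MemberCountSketch {

    // Counts how many Member objects have been built, to make the cost visible.
    static int objectsCreated = 0;

    static class Member {
        final int id;

        Member(int id) {
            this.id = id;
            objectsCreated++;
        }
    }

    // Stand-in for the stored membership rows of one group.
    static class FakeGroup {
        private final int[] memberIds;

        FakeGroup(int memberCount) {
            memberIds = new int[memberCount];
            for (int i = 0; i < memberCount; i++) {
                memberIds[i] = i;
            }
        }

        // Like an unpaginated lookup: builds one object per member.
        List<Member> allMembers() {
            List<Member> members = new ArrayList<>(memberIds.length);
            for (int id : memberIds) {
                members.add(new Member(id));
            }
            return members;
        }

        // Like a COUNT-style query: answers without building any Member objects.
        int countAllMembers() {
            return memberIds.length;
        }
    }

    public static void main(String[] args) {
        FakeGroup group = new FakeGroup(1000);

        int viaList = group.allMembers().size();
        System.out.println("size() = " + viaList + ", objects created: " + objectsCreated);

        objectsCreated = 0;
        int viaCount = group.countAllMembers();
        System.out.println("count = " + viaCount + ", objects created: " + objectsCreated);
    }
}
```

Both calls yield the same number; only the list-based variant pays for constructing every member first.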
Instructions for Reviewers
… /epersons and /subgroups endpoints (size = how many to return, page = which page of results to return). With this PR in place, small pages (small values of size) will return a response quickly. Without this PR (e.g. in 7.6), the size of the page doesn't matter, as every object is loaded into memory regardless of size before a response is returned.
GET /api/eperson/groups/<:uuid>/epersons?page=0&size=1 (Only return one result at a time per page)
GET /api/eperson/groups/<:uuid>/subgroups?page=0&size=5 (Return first page of 5 results per page)
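For reviewers scripting these checks, the relationship between page, size, and the page metadata in the response can be sketched as follows. This is a minimal sketch under stated assumptions: PageMathSketch and its methods are hypothetical helpers (not part of DSpace), pages are assumed to be zero-based as in the examples above, and the group UUID passed in is a placeholder.

```java
// Hypothetical helper for reasoning about page/size parameters; not a DSpace API.
class PageMathSketch {

    // Number of pages needed to hold totalElements at the given page size.
    static long totalPages(long totalElements, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        return (totalElements + size - 1) / size;
    }

    // Index of the first element on a zero-based page.
    static long offset(int page, int size) {
        return (long) page * size;
    }

    // Builds the request path for one page of a group's members.
    // The endpoint argument is expected to be "epersons" or "subgroups".
    static String requestPath(String groupUuid, String endpoint, int page, int size) {
        return "/api/eperson/groups/" + groupUuid + "/" + endpoint
                + "?page=" + page + "&size=" + size;
    }

    public static void main(String[] args) {
        // Placeholder UUID, used purely for illustration.
        String uuid = "00000000-0000-0000-0000-000000000000";
        System.out.println(requestPath(uuid, "epersons", 0, 1));
        System.out.println(requestPath(uuid, "subgroups", 0, 5));
        System.out.println("pages for 521 elements at size 1: " + totalPages(521, 1));
    }
}
```

With 521 members and size=1, this arithmetic gives 521 pages, which matches the totalPages value reported in the HAL browser response quoted earlier in the thread.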