DCJ-708: Refactor query performance #2412

rushtong · 2024-10-16T13:50:21Z

Addresses

Ticket: https://broadworkbench.atlassian.net/browse/DCJ-708

Summary

This PR fixes a long-running query that caused the instance to crash. Users saw a never-ending loading screen and were eventually redirected back to their console with an error message. The root cause is that populating a large list of full dataset objects is very expensive, which led to out-of-memory errors (OOMs) and pod crashes/restarts.

Fixes:

- Rename service/dao methods to be more reflective of what they are actually doing
- Query only for the Dataset ID instead of the full dataset object
  - This allows the server to complete the response instead of OOM-ing and crashing the instance
- Update tests to match new signatures.
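
To illustrate the second fix, here is a minimal sketch of the difference between materializing full dataset objects and returning only their IDs. It is an in-memory stand-in, not the project's DAO: the `DatasetRow` and `DacMembership` records, the sample data, and both method names are assumptions made for illustration. In the real change the projection would happen in the SQL query itself.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for the dataset query; not the project's DAO.
class DatasetIdProjectionSketch {

  // A "full" dataset row; the properties map stands in for the expensive parts of the object.
  record DatasetRow(int datasetId, int dacId, String name, Map<String, String> properties) {}

  // Links a user to a DAC (assumed shape, for illustration only).
  record DacMembership(int dacId, int userId) {}

  static boolean isMember(List<DacMembership> memberships, int dacId, int userId) {
    return memberships.stream().anyMatch(m -> m.dacId() == dacId && m.userId() == userId);
  }

  // Before: every matching row is returned as a full object, payload included.
  static List<DatasetRow> findDatasetsByDacUserId(
      List<DatasetRow> datasets, List<DacMembership> memberships, int userId) {
    return datasets.stream()
        .filter(d -> isMember(memberships, d.dacId(), userId))
        .collect(Collectors.toList());
  }

  // After: only the integer IDs are returned, so the caller never holds the payload.
  static List<Integer> findDatasetIdsByDacUserId(
      List<DatasetRow> datasets, List<DacMembership> memberships, int userId) {
    return datasets.stream()
        .filter(d -> isMember(memberships, d.dacId(), userId))
        .map(DatasetRow::datasetId)
        .distinct()
        .sorted()
        .collect(Collectors.toList());
  }

  static List<DatasetRow> sampleDatasets() {
    return List.of(
        new DatasetRow(2, 10, "B", Map.of("key", "large payload")),
        new DatasetRow(1, 10, "A", Map.of("key", "large payload")),
        new DatasetRow(3, 20, "C", Map.of("key", "large payload")));
  }

  static List<DacMembership> sampleMemberships() {
    return List.of(new DacMembership(10, 42), new DacMembership(20, 99));
  }

  public static void main(String[] args) {
    // User 42 belongs to DAC 10, which owns datasets 1 and 2.
    System.out.println(findDatasetIdsByDacUserId(sampleDatasets(), sampleMemberships(), 42));
  }
}
```

A caller that only needs to check whether a dataset belongs to the user's DACs can work with the `List<Integer>` result and never load the larger objects.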

Have you read CONTRIBUTING.md lately? If not, do that first.

Label PR with a Jira ticket number and include a link to the ticket
Label PR with a security risk modifier [no, low, medium, high]
PR describes scope of changes
Get a minimum of one thumbs worth of review, preferably two if enough team members are available
Get PO sign-off for all non-trivial UI or workflow changes
Verify all tests go green
Test this change deployed correctly and works on dev environment after deployment


rjohanek · 2024-10-16T15:06:06Z

src/test/java/org/broadinstitute/consent/http/service/DarCollectionServiceTest.java

  void testCancelDarCollectionAsChair_ChairHasDatasets() {
    User user = new User();
-    user.setEmail("email");
+    user.setUserId(RandomUtils.nextInt(1, 10));


why a random user id for testing?


The only reason is to have something to query on. The actual value is irrelevant.
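
A minimal sketch of that point, assuming a stub keyed on the user ID: the check passes for any ID in the range, so the specific value does not matter. The class, the `IntFunction`-based stub, and the stubbed dataset IDs are hypothetical; `ThreadLocalRandom` is used here in place of `RandomUtils` only to keep the example free of external dependencies.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntFunction;

// Hypothetical illustration; not the project's test or service code.
class ArbitraryUserIdSketch {

  // Stand-in for a service method that delegates to a lookup keyed on the user ID.
  static List<Integer> findDatasetIdsForUser(int userId, IntFunction<List<Integer>> dao) {
    return dao.apply(userId);
  }

  // Runs the same check for whatever ID it is given.
  static boolean chairSeesStubbedDatasets(int userId) {
    List<Integer> stubbed = List.of(1, 2);
    // The stub answers only for the ID under test, like a mock keyed on that ID.
    IntFunction<List<Integer>> dao = id -> id == userId ? stubbed : List.of();
    return findDatasetIdsForUser(userId, dao).equals(stubbed);
  }

  public static void main(String[] args) {
    // Same range as the test in the diff: 1 inclusive, 10 exclusive.
    int userId = ThreadLocalRandom.current().nextInt(1, 10);
    System.out.println(chairSeesStubbedDatasets(userId));
  }
}
```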

rjohanek · 2024-10-16T15:07:24Z

src/main/java/org/broadinstitute/consent/http/service/DarCollectionService.java

+   * @return List of Dataset IDs
+   */
+  public List<Integer> findDatasetIdsByDACUser(User user) {
+    return datasetDAO.findDatasetIdsByDACUserId(user.getUserId());


why change from user email to user id?

The original code used string comparison on the email, which can be flaky depending on casing, etc. In theory, both should work, but I feel like the integer primary key is more stable than the email.


rjohanek

this looks great! so much more efficient!

fboulnois

looks good, it might be worth describing the performance increase in the description 👍

fboulnois · 2024-10-16T17:38:22Z

src/main/java/org/broadinstitute/consent/http/service/DarCollectionService.java

+   * @return List of Dataset IDs
+   */
+  public List<Integer> findDatasetIdsByDACUser(User user) {
+    return datasetDAO.findDatasetIdsByDACUserId(user.getUserId());


Yes, typically you want to query by the integer primary key because it will be indexed, which makes for fast searches.
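
A small sketch of the casing concern raised in this thread, using in-memory maps as a stand-in for the user table. The `UserLookupSketch` class and its methods are hypothetical, and the exact-match `HashMap` models a case-sensitive comparison; whether an email comparison is actually case-sensitive depends on the database and its collation, and indexing is a property of the schema that this sketch does not model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of lookup by email versus lookup by integer ID.
class UserLookupSketch {

  record User(int userId, String email) {}

  private final Map<Integer, User> byId = new HashMap<>();
  private final Map<String, User> byEmail = new HashMap<>();

  void add(User user) {
    byId.put(user.userId(), user);
    byEmail.put(user.email(), user);
  }

  // Exact string match: a difference in casing is enough to miss the row.
  Optional<User> findByEmail(String email) {
    return Optional.ofNullable(byEmail.get(email));
  }

  // Integer keys have a single representation, so the match is unambiguous.
  Optional<User> findById(int userId) {
    return Optional.ofNullable(byId.get(userId));
  }

  public static void main(String[] args) {
    UserLookupSketch users = new UserLookupSketch();
    users.add(new User(7, "Chair@Example.org"));
    System.out.println(users.findByEmail("chair@example.org").isPresent()); // false
    System.out.println(users.findById(7).isPresent()); // true
  }
}
```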

rushtong added 4 commits October 16, 2024 09:45

feat: refactor query to return just the values required by the service layer

14a26c7

feat: revert back to inner join

f6b03ef

feat: simplify/harden query to user id instead of email address

c3f80e7

feat: fix user id population in tests

e41dee4

rushtong marked this pull request as ready for review October 16, 2024 14:23

rushtong requested a review from a team as a code owner October 16, 2024 14:23

rjohanek reviewed Oct 16, 2024


rjohanek approved these changes Oct 16, 2024


fboulnois reviewed Oct 16, 2024


fboulnois approved these changes Oct 16, 2024


rushtong merged commit 6edb7af into develop Oct 16, 2024

rushtong deleted the gr-DCJ-708-refactor-query branch October 16, 2024 18:40
