pre load ondemand tablets for linear scans#3429

Merged
keith-turner merged 3 commits into apache:elasticity from keith-turner:accumulo-3309
Jul 5, 2023

Conversation

@keith-turner
Contributor

This PR contains my experimentation with making scans pre load ondemand tablets. Below are some log messages from running BulkSplitOptimizationIT, which creates around 57 splits in an ondemand table and scans it. The scan is faster than it was before this change, but still slower than if all the tablets were loaded.

2023-05-25T20:49:31,910 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 2 ondemand tablets to be hosted.
2023-05-25T20:49:37,046 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 2 ondemand tablets to be hosted.
2023-05-25T20:49:43,249 [logging.InternalLoggerFactory] DEBUG: Using SLF4J as the default logging framework
2023-05-25T20:49:43,308 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 4 ondemand tablets to be hosted.
2023-05-25T20:49:49,793 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 8 ondemand tablets to be hosted.
2023-05-25T20:49:55,886 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 11 ondemand tablets to be hosted.
2023-05-25T20:50:02,054 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 11 ondemand tablets to be hosted.
2023-05-25T20:50:09,089 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 11 ondemand tablets to be hosted.
2023-05-25T20:50:13,316 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 9 ondemand tablets to be hosted.
     100,000 records read |    2,369 records/sec |    7,900,000 bytes read |  187,168 bytes/sec | 42.208 secs   

These changes are not ready to commit; I am just posting them in case anyone wants to look them over.

@keith-turner
Contributor Author

Wondering if working on #3430 before continuing with this would be best.

@keith-turner
Contributor Author

With commit ddf749e I am seeing a big speedup when running BulkSplitOptimizationIT:

2023-05-26T21:39:06,004 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 2 ondemand tablets to be hosted.
2023-05-26T21:39:06,410 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 1 ondemand tablets to be hosted.
2023-05-26T21:39:06,471 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 1 ondemand tablets to be hosted.
2023-05-26T21:39:06,622 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 1 ondemand tablets to be hosted.
2023-05-26T21:39:06,665 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 1 ondemand tablets to be hosted.
2023-05-26T21:39:06,694 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 1 ondemand tablets to be hosted.
2023-05-26T21:39:06,721 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 1 ondemand tablets to be hosted.
2023-05-26T21:39:06,767 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 1 ondemand tablets to be hosted.
2023-05-26T21:39:06,903 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 2 ondemand tablets to be hosted.
2023-05-26T21:39:06,955 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 2 ondemand tablets to be hosted.
2023-05-26T21:39:07,052 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 3 ondemand tablets to be hosted.
2023-05-26T21:39:07,142 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 4 ondemand tablets to be hosted.
2023-05-26T21:39:07,268 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 5 ondemand tablets to be hosted.
2023-05-26T21:39:07,403 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 5 ondemand tablets to be hosted.
2023-05-26T21:39:07,540 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 5 ondemand tablets to be hosted.
2023-05-26T21:39:07,769 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 5 ondemand tablets to be hosted.
2023-05-26T21:39:07,936 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 5 ondemand tablets to be hosted.
2023-05-26T21:39:08,049 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 5 ondemand tablets to be hosted.
2023-05-26T21:39:08,165 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 5 ondemand tablets to be hosted.
2023-05-26T21:39:08,376 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting 3 ondemand tablets to be hosted.
     100,000 records read |   36,284 records/sec |    7,900,000 bytes read | 2,866,473 bytes/sec |  2.756 secs   

@keith-turner
Contributor Author

In #3437, processing of tablet hosting requests was moved from the tserver to the manager. For #3437, a subset of the changes in this PR were taken and improved. This PR still has client side changes that try to keep the next few tablets for a scan hosted.

With only the changes in #3437, the scan in BulkSplitOptimizationIT takes 17 seconds, because it makes a hosting request for each of the ~50 tablets being scanned and waits each time. With the changes in this PR, which include the changes in #3437 plus the changes that try to keep the next few tablets hosted, the same scan of ~50 tablets takes only 2.7 seconds. So I still need to continue experimenting with the changes in this PR.
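The timings quoted in this thread can be cross-checked with a little arithmetic. The sketch below is not part of the PR; the class and method names are made up for illustration. It only uses figures reported above: 100,000 records in 42.208 and 2.756 seconds, and the 17 second versus 2.7 second comparison over roughly 50 tablets.

```java
public class ScanTimings {
  // Records per second for a scan that read `records` records in `seconds` seconds.
  static double recordsPerSec(long records, double seconds) {
    return records / seconds;
  }

  // How many times faster the second run is compared to the first.
  static double speedup(double beforeSecs, double afterSecs) {
    return beforeSecs / afterSecs;
  }

  public static void main(String[] args) {
    // Figures copied from the log summaries quoted in this conversation.
    System.out.printf("May 25 run: %.0f records/sec%n", recordsPerSec(100_000, 42.208));
    System.out.printf("May 26 run: %.0f records/sec%n", recordsPerSec(100_000, 2.756));
    System.out.printf("speedup between the two runs: %.1fx%n", speedup(42.208, 2.756));

    // Comparison against #3437 alone: 17 seconds versus 2.7 seconds.
    System.out.printf("speedup over #3437 alone: %.1fx%n", speedup(17.0, 2.7));
    // Assuming about 50 tablets, the average wait per tablet with #3437 alone.
    System.out.printf("approx. seconds per tablet with #3437 alone: %.2f%n", 17.0 / 50);
  }
}
```

The per-tablet figure is only an average and assumes the ~50 tablet count mentioned above; the actual wait per hosting request is not reported in this thread.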

@keith-turner
Contributor Author

keith-turner commented May 31, 2023

I pulled some of the changes that were in the draft version of this PR into #3437 and #3438. This PR now focuses solely on making linear scans host tablets ahead of time. Those changes were force pushed in 2a4b318 and I am taking this out of draft.

@keith-turner keith-turner marked this pull request as ready for review May 31, 2023 19:51
@keith-turner keith-turner changed the title from "WIP experimenting with pre loading ondemand tablets" to "pre load ondemand tablets for linear scans" May 31, 2023
@keith-turner keith-turner linked an issue May 31, 2023 that may be closed by this pull request
return null;
if (tl != null && locationNeed == LocationNeed.REQUIRED) {
Map<KeyExtent,CachedTablet> extentsToHost =
findExtentsToHost(context, minimumHostAhead * 2, hostAheadRange, lcSession, tl);
Contributor

Why multiply hostAhead by 2?

Contributor Author

added a comment in 730a8eb
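The thread does not record the reason given in 730a8eb, so the following is only one plausible reading of why a look-ahead window of twice the minimum could be useful, not the PR's actual implementation. Under the assumed policy, the client looks at the next 2 * minimumHostAhead tablets and sends a hosting request only when the current tablet is unhosted or at least minimumHostAhead tablets in the window need hosting, so requests are batched instead of being sent on every tablet advance. All names below (`HostAheadSketch`, `findUnhosted`, `tabletsToRequest`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class HostAheadSketch {
  // Hypothetical helper: indexes of unhosted tablets among the current tablet
  // and the `lookAhead` tablets that follow it.
  static List<Integer> findUnhosted(boolean[] hosted, int current, int lookAhead) {
    List<Integer> result = new ArrayList<>();
    int end = Math.min(hosted.length - 1, current + lookAhead);
    for (int i = current; i <= end; i++) {
      if (!hosted[i]) {
        result.add(i);
      }
    }
    return result;
  }

  // Assumed policy (not confirmed by the thread): look at a window of
  // 2 * minimumHostAhead tablets, but only request hosting when the current
  // tablet is unhosted or at least minimumHostAhead tablets need hosting.
  // `current` must be a valid index into `hosted`.
  static List<Integer> tabletsToRequest(boolean[] hosted, int current, int minimumHostAhead) {
    List<Integer> unhosted = findUnhosted(hosted, current, minimumHostAhead * 2);
    boolean currentNeedsHosting = !hosted[current];
    if (currentNeedsHosting || unhosted.size() >= minimumHostAhead) {
      return unhosted;
    }
    return new ArrayList<>();
  }

  public static void main(String[] args) {
    // Nothing hosted yet: the current tablet and the window behind it are requested.
    System.out.println(tabletsToRequest(new boolean[10], 0, 2));

    // Only one tablet in the window is unhosted: below the minimum, so no request yet.
    boolean[] mostlyHosted = {true, true, true, true, false, false, false};
    System.out.println(tabletsToRequest(mostlyHosted, 0, 2));

    // After advancing one tablet, two tablets in the window need hosting: request both.
    System.out.println(tabletsToRequest(mostlyHosted, 1, 2));
  }
}
```

With this kind of policy the number of hosted tablets ahead of the scan would float between the minimum and twice the minimum, which is one way to avoid a request per tablet.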

}

CachedTablet followingTablet = _findTablet(context, currTablet.endRow(), true, false, true,
lcSession, LocationNeed.REQUIRED);
Contributor

The caller of this method requests tablet hosting for the result. Won't REQUIRED bring the tablets online?

Contributor Author

The method is only called when the calling method has a location need of REQUIRED. I had thought of passing the location need to this method and having it fail if it was not REQUIRED. I can try that and see what it looks like.

Contributor Author

I added the preconditions check, which helps tightly couple the method with the caller. Looking into the code, the _findTablet method does not host tablets. It uses the location need to determine whether it needs to do a metadata lookup for tablets that are in the cache without a location.

Contributor Author

added the check in 730a8eb
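The two points in this review thread can be illustrated with a minimal sketch. It is not the Accumulo code: the class, fields, and methods are hypothetical, the `NOT_REQUIRED` constant is an assumed counterpart to the `REQUIRED` value seen in the diff, and a plain `IllegalArgumentException` stands in for whatever precondition check the commit actually uses. The sketch shows a lookup that consults metadata only when a location is required and never requests hosting, plus a helper that fails fast when called without a REQUIRED location need.

```java
import java.util.HashMap;
import java.util.Map;

public class LocationNeedSketch {
  // Only REQUIRED appears in this thread; NOT_REQUIRED is an assumption.
  enum LocationNeed { REQUIRED, NOT_REQUIRED }

  // Cached location keyed by tablet end row; a null value means the tablet
  // is cached without a location.
  final Map<String, String> cache = new HashMap<>();
  // Stand-in for the metadata table: end row to current location.
  final Map<String, String> metadata = new HashMap<>();
  int metadataLookups = 0;

  // Returns the known location or null. It never requests hosting; the
  // location need only decides whether a metadata lookup is done.
  String findLocation(String endRow, LocationNeed need) {
    String location = cache.get(endRow);
    if (location == null && need == LocationNeed.REQUIRED) {
      metadataLookups++;
      location = metadata.get(endRow);
      cache.put(endRow, location);
    }
    return location;
  }

  // Helper that is only valid when the caller needs a location, mirroring
  // the idea of coupling the method to its caller with a precondition.
  String findFollowingLocation(String endRow, LocationNeed callerNeed) {
    if (callerNeed != LocationNeed.REQUIRED) {
      throw new IllegalArgumentException("location need must be REQUIRED, was " + callerNeed);
    }
    return findLocation(endRow, LocationNeed.REQUIRED);
  }

  public static void main(String[] args) {
    LocationNeedSketch sketch = new LocationNeedSketch();
    sketch.cache.put("m", null);
    sketch.metadata.put("m", "tserver1:9997");

    // No lookup happens when a location is not required.
    System.out.println(sketch.findLocation("m", LocationNeed.NOT_REQUIRED));
    // A lookup happens when a location is required.
    System.out.println(sketch.findFollowingLocation("m", LocationNeed.REQUIRED));
    System.out.println("metadata lookups: " + sketch.metadataLookups);
  }
}
```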

@keith-turner keith-turner merged commit 78dc351 into apache:elasticity Jul 5, 2023
@keith-turner keith-turner deleted the accumulo-3309 branch July 5, 2023 20:36
dlmarion added a commit that referenced this pull request Oct 5, 2023

The change in #3429 modified ClientTabletCache so that it would
preload Tablets for Scanners. It appears this change may
have caused the test in #3809 to fail. I modified the comparison
of the range to the tablet end key using the same logic that
ThriftScanner uses and it fixed the test.

Fixes #3809
dlmarion added a commit to dlmarion/accumulo that referenced this pull request Oct 6, 2023
Revert change introduced in apache#3654 to ManagerAssignmentIT for
an extra tablet being brought online, something that was introduced
in apache#3429 and fixed in apache#3812.
dlmarion added a commit that referenced this pull request Oct 6, 2023
Revert change introduced in #3654 to ManagerAssignmentIT for
an extra tablet being brought online, something that was introduced
in #3429 and fixed in #3812.
@ctubbsii ctubbsii added this to the 4.0.0 milestone Jul 12, 2024

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

OnDemand Follow-on: Optimize how clients request tablet hosting

3 participants