CXXCBC-242: SDK Support for Native KV Range Scans #419

Merged
merged 10 commits into from Jun 28, 2023

DemetrisChr
Contributor

@DemetrisChr DemetrisChr commented Jun 22, 2023

Updated the implementation to follow the newest RFC changes

Notable changes

  • Added prefix_scan as top-level scan type
  • Changed scan terms from byte vectors to strings
  • Seed for sampling scan is randomly generated if not specified
  • Removed range_scan_cancelled error code, returning request_canceled instead
  • Removed sorting
  • Added concurrency option to orchestrator to set the maximum allowed level of concurrency
  • The constructor of range_scan_orchestrator takes the vbucket map instead of the number of vbuckets
  • Implemented new concurrency approach for scanning vbuckets:
    • Streams start at the maximum level of concurrency. If a temporary failure occurs (i.e. the server returns a busy status), the stream is retried at a later point and the concurrency is reduced by 1
    • When a stream finishes successfully or with a benign error, another stream is started to take its place. After a temporary failure no replacement stream is started, which effectively reduces the concurrency (unless only one stream is running at a time, in which case scanning continues with that single stream)
    • The number of streams running on each node is tracked, and when a new stream should start, a vbucket on the least busy node is selected
  • Renamed range_scan_timeout to key_value_scan_timeout in timeout defaults
  • Removed batch_time_limit from options; 90% of the timeout is used instead
  • Timeouts now apply to both range_scan_create and range_scan_continue. Before a stream is retried, there is also a check on whether the time since the first attempt exceeds the timeout
  • The next methods of the scan result can return an error code in the case of a fatal error
  • Added cancel() method to scan_result that can cancel all streams
  • Errors on range scan continue or start are now separated into 'fatal' (which takes into account whether the scan is a sampling scan), 'retryable' or 'benign'
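
For reviewers who want a quick mental model of the concurrency rules listed above, here is a minimal standalone sketch. It is not the code in this PR: scan_scheduler, stream_outcome, start_next and finish are hypothetical names, and the vbucket map is reduced to a plain vector that maps each vbucket to a node index. It only illustrates picking a vbucket on the least busy node and lowering the concurrency by 1 (never below 1) after a retryable failure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical outcome categories, mirroring the benign / retryable / fatal split.
enum class stream_outcome { success, benign, retryable, fatal };

// Illustrative scheduler only; not the SDK's range_scan_orchestrator.
class scan_scheduler
{
  public:
    // vbucket_to_node[i] is the index of the node owning vbucket i
    // (an assumed stand-in for the real vbucket map).
    scan_scheduler(std::vector<std::size_t> vbucket_to_node, std::size_t max_concurrency)
      : vbucket_to_node_{ std::move(vbucket_to_node) }
      , concurrency_{ max_concurrency }
    {
        assert(max_concurrency >= 1);
        for (std::size_t vb = 0; vb < vbucket_to_node_.size(); ++vb) {
            pending_[vbucket_to_node_[vb]].push_back(static_cast<std::uint16_t>(vb));
        }
    }

    // Returns the next vbucket to scan, or nothing if the concurrency limit
    // is reached or no vbuckets are pending.
    std::optional<std::uint16_t> start_next()
    {
        if (running_total_ >= concurrency_) {
            return std::nullopt;
        }
        // Pick the node with the fewest running streams that still has work;
        // ties go to the lowest node index.
        std::optional<std::size_t> best_node;
        for (const auto& [node, queue] : pending_) {
            if (queue.empty()) {
                continue;
            }
            if (!best_node || running_[node] < running_[*best_node]) {
                best_node = node;
            }
        }
        if (!best_node) {
            return std::nullopt;
        }
        auto& queue = pending_[*best_node];
        const std::uint16_t vbucket = queue.front();
        queue.pop_front();
        ++running_[*best_node];
        ++running_total_;
        return vbucket;
    }

    // Records the end of a stream and applies the rules from the list above.
    void finish(std::uint16_t vbucket, stream_outcome outcome)
    {
        const std::size_t node = vbucket_to_node_.at(vbucket);
        assert(running_[node] > 0 && running_total_ > 0);
        --running_[node];
        --running_total_;
        if (outcome == stream_outcome::retryable) {
            pending_[node].push_back(vbucket); // retry this vbucket later
            if (concurrency_ > 1) {
                --concurrency_; // reduce concurrency, but keep at least one stream
            }
        } else if (outcome == stream_outcome::fatal) {
            pending_.clear(); // stop scheduling further streams
        }
    }

    std::size_t concurrency() const
    {
        return concurrency_;
    }

  private:
    std::vector<std::size_t> vbucket_to_node_;
    std::size_t concurrency_;
    std::map<std::size_t, std::deque<std::uint16_t>> pending_{};
    std::map<std::size_t, std::size_t> running_{};
    std::size_t running_total_{ 0 };
};
```

In this sketch a caller would call start_next() until it returns nothing, and call it again after each finish(). After a retryable failure the lowered limit means the call usually returns nothing, which matches the "no replacement stream" behaviour described above.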

@DemetrisChr DemetrisChr requested a review from avsej June 22, 2023 16:53
@DemetrisChr DemetrisChr marked this pull request as draft June 22, 2023 16:53
@DemetrisChr DemetrisChr removed the request for review from avsej June 22, 2023 16:53
@DemetrisChr DemetrisChr requested a review from avsej June 22, 2023 17:02
@DemetrisChr DemetrisChr marked this pull request as ready for review June 22, 2023 17:03
@avsej avsej merged commit 9394a00 into couchbaselabs:main Jun 28, 2023
13 of 14 checks passed
@DemetrisChr DemetrisChr deleted the range-scan branch June 28, 2023 08:28
2 participants