feat: increase batch size to 256MB and add configurable batching with runtime system property #186
Conversation
…iguration

- Increased default batch size from 5MB to 256MB (Stream API limit)
- Added system property 'coveo.push.batchSize' for runtime configuration
- Made batch size configurable via constructor parameter in UpdateStreamService, PushService, and StreamService
- Added validation to prevent exceeding the 256MB API limit
- Updated UpdateStreamService to create a new file container per batch and push it immediately
- Enhanced StreamDocumentUploadQueue with flushAndPush() for the per-batch workflow

Configuration:
- System property: -Dcoveo.push.batchSize=<bytes>
- Constructor parameter: new UpdateStreamService(source, options, maxQueueSize)
- Priority: constructor parameter > system property > default (256MB) (see the sketch after these notes)

Documentation:
- Added CONFIGURATION.md with a comprehensive configuration guide
- Added UPGRADE_NOTES.md with migration guidance and behavioral changes
- Updated README.md with a configuration section

Breaking changes:
- UpdateStreamService now pushes each batch immediately instead of accumulating multiple batches in one container
- Default batch size increased roughly 51x (5MB -> 256MB), which may affect push frequency

Closes: Update calls to stream catalog API handled per-batch with configurable size
Updated CONFIGURATION.md and README.md to reflect new constructor signatures and usage examples for UpdateStreamService, PushService, and StreamService. Added a new constructor to UpdateStreamService that accepts userAgents as an optional parameter for enhanced flexibility.
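For context, a minimal sketch of the resolution order described above (constructor parameter > coveo.push.batchSize system property > 256MB default); the BatchSizeResolver class and its resolve method are illustrative only, not part of the SDK:

```java
// Illustrative sketch only: resolves a batch size using the documented priority
// (constructor parameter > coveo.push.batchSize system property > 256MB default)
// and rejects values above the 256MB Stream API limit.
final class BatchSizeResolver {
    static final int MAX_BATCH_SIZE = 256 * 1024 * 1024;

    static int resolve(Integer constructorValue) {
        int resolved;
        if (constructorValue != null) {
            resolved = constructorValue;
        } else {
            String property = System.getProperty("coveo.push.batchSize");
            resolved = (property != null) ? Integer.parseInt(property) : MAX_BATCH_SIZE;
        }
        if (resolved <= 0 || resolved > MAX_BATCH_SIZE) {
            throw new IllegalArgumentException(
                "Batch size must be between 1 and " + MAX_BATCH_SIZE + " bytes, got " + resolved);
        }
        return resolved;
    }
}
```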
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 66b789e805
```diff
 public PushService(PushEnabledSource source) {
-  this(source, new BackoffOptionsBuilder().build());
+  this(source, new BackoffOptionsBuilder().build(), DocumentUploadQueue.DEFAULT_QUEUE_SIZE);
 }
```
Honor system batch size in PushService defaults
The new global batch-size system property is only read via DocumentUploadQueue.getConfiguredBatchSize(), but the default PushService constructor hardcodes DocumentUploadQueue.DEFAULT_QUEUE_SIZE. As a result, -Dcoveo.push.batchSize=... has no effect for PushService unless callers use the 3‑arg constructor, contradicting the documented global configuration and leading to unexpectedly large 256MB batches (or ignored smaller sizes). Consider using getConfiguredBatchSize() in the default constructors.
Useful? React with 👍 / 👎.
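A sketch of the change this suggestion implies, assuming DocumentUploadQueue.getConfiguredBatchSize() reads the coveo.push.batchSize property and falls back to the 256MB default:

```java
// Suggested shape of the default constructor (sketch): defer to the
// system-property-aware accessor instead of the hardcoded constant.
public PushService(PushEnabledSource source) {
    this(source, new BackoffOptionsBuilder().build(), DocumentUploadQueue.getConfiguredBatchSize());
}
```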
I'm not sure about this. If someone sets a customBatchSize in DocumentUploadQueue, should they also need to pass it to PushService? Or should DocumentUploadQueue be the single source of truth for batch size?
This seems concerning.
```java
logger.info("Creating file container, uploading, and pushing stream batch");

// Use the new createUploadAndPush method that handles the complete workflow
HttpResponse<String> response = this.updateStreamService.createUploadAndPush(stream);
```
Guard against null updateStreamService in flushAndPush
flushAndPush() unconditionally calls updateStreamService.createUploadAndPush(...), but updateStreamService is only set via setUpdateStreamService(...) and is never initialized in the constructors. If a caller constructs StreamDocumentUploadQueue directly (e.g., with an UploadStrategy) and relies on the auto-flush behavior when the batch limit is exceeded, the first auto-flush will throw a NullPointerException and drop the batch. Consider guarding against null or falling back to uploader.apply(...) when no UpdateStreamService is set.
Useful? React with 👍 / 👎.
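A minimal sketch of the suggested guard, assuming flushAndPush() has access to the queue's existing upload strategy and an accessor for the accumulated batch (both hedged in the comments):

```java
// Sketch: guard against a missing UpdateStreamService and fall back to the
// queue's upload strategy instead of throwing a NullPointerException.
public HttpResponse<String> flushAndPush() throws IOException, InterruptedException {
    var stream = this.getStream(); // assumed accessor for the accumulated batch
    if (this.updateStreamService == null) {
        logger.warn("No UpdateStreamService set; falling back to the default upload strategy");
        return this.uploader.apply(stream); // assumed existing UploadStrategy
    }
    logger.info("Creating file container, uploading, and pushing stream batch");
    return this.updateStreamService.createUploadAndPush(stream);
}
```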
alexprudhomme left a comment
lgtm. Did you test an e2e scenario?
```java
protected static final int DEFAULT_QUEUE_SIZE = 256 * 1024 * 1024;

/** System property name for configuring the default batch size */
public static final String BATCH_SIZE_PROPERTY = "coveo.push.batchSize";
```
Just trying to understand. Why would someone use this to lower the batch size, to 50 for example? Would it save cost since the file containers would be smaller than by default?
Common reasons:
- Memory limits on client machines (256MB batches can be too heavy and lead to memory issues)
- Faster feedback for smaller incremental updates
- Network stability in constrained environments
- Debugging or troubleshooting with smaller payloads
It’s not about cost savings.
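For illustration, two hypothetical ways a caller could lower the batch size to 50MB (the value and the my-indexer.jar name are made up for this example):

```java
// Option 1: JVM flag (hypothetical command line)
//   java -Dcoveo.push.batchSize=52428800 -jar my-indexer.jar   // 50MB batches
// Option 2: set the property programmatically before constructing the services
System.setProperty(DocumentUploadQueue.BATCH_SIZE_PROPERTY, String.valueOf(50 * 1024 * 1024));
```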
```java
 *
 * @param source The source to push documents to.
 * @param options The configuration options for exponential backoff.
 * @param maxQueueSize The maximum batch size in bytes before auto-flushing (default: 256MB, max: 256MB).
```
Maybe here I would add more emphasis on the 'in bytes' details and add an example
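One possible wording for that Javadoc, with the byte values spelled out (a suggestion, not the merged text):

```java
/**
 * @param maxQueueSize The maximum batch size in bytes before auto-flushing.
 *                     For example, 134217728 (128 * 1024 * 1024) means 128MB batches.
 *                     Defaults to 268435456 bytes (256MB), which is also the Stream API maximum.
 */
```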
PR Split Complete

This PR has been split into 3 smaller, independently reviewable PRs.

Merge Order

PRs should be merged in order: #187 → #188 → #189

Fixes Applied

The split PRs include fixes for the review feedback.

Once all 3 PRs are merged, this PR can be closed.
Summary
This PR increases the default batch size from 5MB to 256MB (matching the Coveo Stream API limit) and introduces configurable batching for all push services. The batch size can now be configured globally via system property (coveo.push.batchSize) or per-service via constructor parameter.

Key Changes

Features
- Batch size configurable via constructor parameter in all push services (UpdateStreamService, PushService, StreamService)
- Runtime configuration via the system property, e.g. -Dcoveo.push.batchSize=134217728

Architecture Improvements
- UpdateStreamService now follows catalog stream API best practices, where each batch (when exceeding the limit or on close) gets its own file container and is immediately pushed (see the sketch below).
- Better alignment: Matches Coveo's catalog stream API best practices
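As a rough illustration of that per-batch flow (the helper names and parameter type below are assumptions, not the actual UpdateStreamService API):

```java
// Sketch of the per-batch workflow described above: every flushed batch gets
// its own file container, is uploaded, and is pushed right away.
// createFileContainer, uploadBatch, and pushFileContainer are assumed helpers.
public HttpResponse<String> createUploadAndPush(StreamUpdate batch)
        throws IOException, InterruptedException {
    FileContainer container = createFileContainer(); // one fresh container per batch
    uploadBatch(container, batch);                   // upload the accumulated documents
    return pushFileContainer(container);             // push to the catalog stream API
}
```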