Skip to content

Conversation

@y-lakhdar
Copy link
Contributor

Summary

This PR increases the default batch size from 5MB to 256MB (matching the Coveo Stream API limit) and introduces configurable batching for all push services. The batch size can now be configured globally via system property (coveo.push.batchSize) or per-service via constructor parameter.

Key Changes

Features

  • Increased default batch size: From 5MB to 256MB for all services (UpdateStreamService, PushService, StreamService)
  • Configurable batching: Support for custom batch sizes up to 256MB via:
    • System property: -Dcoveo.push.batchSize=134217728
    • Constructor parameter for per-instance configuration
  • Validation: Automatic validation ensures batch sizes don't exceed the 256MB Stream API limit

Architecture Improvements

  • File container rotation per batch: UpdateStreamService now follows catalog stream API best practices where each batch (when exceeding the limit or on close) gets its own file container and is immediately pushed.
  • Workflow: Each flush operation now performs: Create container → Upload → Push (instead of reusing containers)
    *Better alignment: Matches Coveo's catalog stream API best practices

achouan-coveo and others added 5 commits January 15, 2026 14:43
…iguration

- Increased default batch size from 5MB to 256MB (Stream API limit)
- Added system property 'coveo.push.batchSize' for runtime configuration
- Made batch size configurable via constructor parameter in UpdateStreamService, PushService, and StreamService
- Added validation to prevent exceeding 256MB API limit
- Updated UpdateStreamService to create new file container per batch and push immediately
- Enhanced StreamDocumentUploadQueue with flushAndPush() for per-batch workflow

Configuration:
- System property: -Dcoveo.push.batchSize=<bytes>
- Constructor parameter: new UpdateStreamService(source, options, maxQueueSize)
- Priority: constructor parameter > system property > default (256MB)

Documentation:
- Added CONFIGURATION.md with comprehensive configuration guide
- Added UPGRADE_NOTES.md with migration guidance and behavioral changes
- Updated README.md with configuration section

Breaking changes:
- UpdateStreamService now pushes each batch immediately instead of accumulating multiple batches in one container
- Default batch size increased 51x (5MB -> 256MB) which may affect push frequency

Closes: Update calls to stream catalog API handled per-batch with configurable size
Updated CONFIGURATION.md and README.md to reflect new constructor signatures and usage examples for UpdateStreamService, PushService, and StreamService. Added a new constructor to UpdateStreamService that accepts userAgents as an optional parameter for enhanced flexibility.
@y-lakhdar y-lakhdar requested a review from a team as a code owner January 29, 2026 14:32
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 66b789e805

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 12 to 14
public PushService(PushEnabledSource source) {
this(source, new BackoffOptionsBuilder().build());
this(source, new BackoffOptionsBuilder().build(), DocumentUploadQueue.DEFAULT_QUEUE_SIZE);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor system batch size in PushService defaults

The new global batch-size system property is only read via DocumentUploadQueue.getConfiguredBatchSize(), but the default PushService constructor hardcodes DocumentUploadQueue.DEFAULT_QUEUE_SIZE. As a result, -Dcoveo.push.batchSize=... has no effect for PushService unless callers use the 3‑arg constructor, contradicting the documented global configuration and leading to unexpectedly large 256MB batches (or ignored smaller sizes). Consider using getConfiguredBatchSize() in the default constructors.

Useful? React with 👍 / 👎.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this. If someone sets a customBatchSize in DocumentUploadQueue, should they also need to pass it to PushService? Or should DocumentUploadQueue be the single source of truth for batch size?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems concerning.

Comment on lines +94 to +98
logger.info("Creating file container, uploading, and pushing stream batch");

// Use the new createUploadAndPush method that handles the complete workflow
HttpResponse<String> response = this.updateStreamService.createUploadAndPush(stream);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard against null updateStreamService in flushAndPush

flushAndPush() unconditionally calls updateStreamService.createUploadAndPush(...), but updateStreamService is only set via setUpdateStreamService(...) and is never initialized in the constructors. If a caller constructs StreamDocumentUploadQueue directly (e.g., with an UploadStrategy) and relies on the auto-flush behavior when the batch limit is exceeded, the first auto-flush will throw a NullPointerException and drop the batch. Consider guarding against null or falling back to uploader.apply(...) when no UpdateStreamService is set.

Useful? React with 👍 / 👎.

@y-lakhdar y-lakhdar changed the title feat: increase batch size to 256MB and add configurable batching with runtime system property support feat: increase batch size to 256MB and add configurable batching with runtime system property Jan 29, 2026
Copy link
Contributor

@alexprudhomme alexprudhomme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Did you test an e2e scenario ?

protected static final int DEFAULT_QUEUE_SIZE = 256 * 1024 * 1024;

/** System property name for configuring the default batch size */
public static final String BATCH_SIZE_PROPERTY = "coveo.push.batchSize";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just trying to understand. Why would someone use this to lower the batch size to 50 for example ? Would it save cost since the file containers would be smaller than by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Common reasons:

  • Memory limits on client machines (256MB batches can be too heavy and lead to memory issues)
  • Faster feedback for smaller incremental updates
  • Network stability in constrained environments
  • Debugging or troubleshooting with smaller payloads
    It’s not about cost savings.

Comment on lines 12 to 14
public PushService(PushEnabledSource source) {
this(source, new BackoffOptionsBuilder().build());
this(source, new BackoffOptionsBuilder().build(), DocumentUploadQueue.DEFAULT_QUEUE_SIZE);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this. If someone sets a customBatchSize in DocumentUploadQueue, should they also need to pass it to PushService? Or should DocumentUploadQueue be the single source of truth for batch size?

*
* @param source The source to push documents to.
* @param options The configuration options for exponential backoff.
* @param maxQueueSize The maximum batch size in bytes before auto-flushing (default: 256MB, max: 256MB).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe here I would add more emphasis on the 'in bytes' details and add an example

Comment on lines 12 to 14
public PushService(PushEnabledSource source) {
this(source, new BackoffOptionsBuilder().build());
this(source, new BackoffOptionsBuilder().build(), DocumentUploadQueue.DEFAULT_QUEUE_SIZE);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems concerning.

@y-lakhdar
Copy link
Contributor Author

PR Split Complete

This PR has been split into 3 smaller, independently reviewable PRs:

PR Title Status
#187 fix(PlatformClient): validate userAgents is not null Draft
#188 feat(DocumentUploadQueue): add configurable batch size infrastructure Draft
#189 feat(services): add configurable batch size with file container rotation Draft

Merge Order

PRs should be merged in order: #187#188#189

Fixes Applied

The split PRs include fixes for the review feedback:

  1. PushService system property: Now uses getConfiguredBatchSize() instead of hardcoded DEFAULT_QUEUE_SIZE
  2. Null guard in flushAndPush(): Guards against null updateStreamService with fallback to uploader.apply()

Once all 3 PRs are merged, this PR can be closed.

@y-lakhdar y-lakhdar closed this Jan 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants