Description
A good improvement to SolrIO would be to allow the caller to provide a commitWithin parameter. Currently the batch is passed to the underlying solrClient without one, so commits default to the configured server behavior.
The justification for exposing this is that the collection in the target Solr server might be configured in a way that is not suitable for a given Beam job. E.g. a server tuned to accept real-time updates with fast flush times from streaming Beam job 1, while Beam job 2 is doing a nightly bulk load.
This is related to BEAM-3849, BEAM-3848 and BEAM-3820, and should be considered together with them. I understand that Beam's policy is not to expose tuning parameters. For the IOs that interface with external systems, I recommend this policy be reconsidered. The IO modules typically wrap clients for the target systems (CloudSolrClient in this case), all of which have tunable parameters for good reason. My recommendation would be to keep SolrIO.write() providing sensible defaults but expose an additional builder, e.g. SolrIO.writeBuilder().withCommitWithinMs(300000).withBatchSize(9000).build().
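To make the proposal concrete, here is a minimal, self-contained sketch of the options object such a builder might produce. Everything in it is an assumption for illustration: SolrWriteOptions, SERVER_DEFAULT, DEFAULT_BATCH_SIZE and the default batch size of 1000 are hypothetical names and values, not part of the existing SolrIO API. Only the method names withCommitWithinMs and withBatchSize come from the suggestion above.

```java
/** Hypothetical options holder for the proposed builder; not an existing Beam class. */
final class SolrWriteOptions {
  /** Assumed sentinel: no commitWithin is sent, so the server's configured behavior applies. */
  static final int SERVER_DEFAULT = -1;
  /** Assumed default batch size, chosen only for this sketch. */
  static final int DEFAULT_BATCH_SIZE = 1000;

  private final int commitWithinMs;
  private final int batchSize;

  private SolrWriteOptions(int commitWithinMs, int batchSize) {
    this.commitWithinMs = commitWithinMs;
    this.batchSize = batchSize;
  }

  int commitWithinMs() {
    return commitWithinMs;
  }

  int batchSize() {
    return batchSize;
  }

  /** True only when the caller explicitly asked for a commitWithin value. */
  boolean overridesServerCommit() {
    return commitWithinMs != SERVER_DEFAULT;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    // Defaults preserve today's behavior: defer commit timing to the server.
    private int commitWithinMs = SERVER_DEFAULT;
    private int batchSize = DEFAULT_BATCH_SIZE;

    Builder withCommitWithinMs(int commitWithinMs) {
      if (commitWithinMs < 0) {
        throw new IllegalArgumentException("commitWithinMs must be >= 0");
      }
      this.commitWithinMs = commitWithinMs;
      return this;
    }

    Builder withBatchSize(int batchSize) {
      if (batchSize <= 0) {
        throw new IllegalArgumentException("batchSize must be > 0");
      }
      this.batchSize = batchSize;
      return this;
    }

    SolrWriteOptions build() {
      return new SolrWriteOptions(commitWithinMs, batchSize);
    }
  }
}
```

With this shape, a job that sets nothing keeps the server-configured commit behavior, while the nightly bulk load could call SolrWriteOptions.builder().withCommitWithinMs(300000).withBatchSize(9000).build() and have the writer pass the commitWithin value along with each batch only when overridesServerCommit() is true.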
Please feel free to assign to me if of interest and I'll provide a PR.
Imported from Jira BEAM-3862. Original Jira may contain additional context.
Reported by: timrobertson100.