METRON-672: SolrIndexingIntegrationTest fails intermittently #424
Conversation
Because this failure is extremely sporadic locally (it happens about 1 out of every 50 times I run the test), I did the following to test it:
isLoaded.set(true);
return null;
}
);
;
Can you kill the extra semicolon?
}
}
while(bytes == null || bytes.length == 0);
return;
Drop the return, since it's a void method.
@@ -38,6 +39,7 @@
private String enrichmentConfigsPath;
private String indexingConfigsPath;
private String profilerConfigPath;
private Optional<Function<ConfigUploadComponent, Void>> postStartCallback = Optional.empty();
Could this just use Consumer instead of Function? Since the second type parameter is Void, the Function is effectively just being a Consumer anyway.
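The suggestion can be illustrated with a small, self-contained sketch (a plain String stands in for ConfigUploadComponent here, so nothing from the PR is required to compile it): a Function&lt;T, Void&gt; forces every lambda to end with an explicit `return null`, while a Consumer&lt;T&gt; expresses the same "take input, return nothing" contract directly.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

public class CallbackStyles {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger(0);

        // Function<T, Void>: the lambda must return null explicitly.
        Optional<Function<String, Void>> asFunction = Optional.of(name -> {
            calls.incrementAndGet();
            return null; // pure boilerplate, carries no information
        });
        asFunction.ifPresent(f -> f.apply("config"));

        // Consumer<T>: same behavior, no dummy return value needed.
        Optional<Consumer<String>> asConsumer = Optional.of(name -> calls.incrementAndGet());
        asConsumer.ifPresent(c -> c.accept("config"));

        System.out.println(calls.get()); // prints 2
    }
}
```

Both callbacks run identically; the Consumer version simply drops the unused Void channel.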
Thanks for taking the effort to dig into this. Great work. Other than a couple minor comments, I'm very happy with this.
@justinleet comments addressed, let me know if there's anything else.
Thanks again for this. I'm +1 on it.
This failure is due to a change in default behavior when indexing was split off into a separate configuration file; in particular, the default batch size was changed from 5 to 1. This, by itself, is not a problem, but in the IndexingIntegrationTest (base class for the Solr and Elasticsearch integration tests), the writing of the input data may happen before the topology or the configuration fully loads, especially if the machine running the unit tests is under load (as with Travis). As a result, the first record may end up with the default batch size (of 1) and be written out immediately, because the indexing configs haven't loaded into ZooKeeper just yet. Eventually the configs do load and the batch size is set to 5. Meanwhile, we've written 10 records and are expecting 10 in return, but because the first record was written out immediately and the next 5 as a batch, another 4 are left pending in the BulkMessageWriterBolt. That is the failure scenario.
The fix is to ensure that we don't write out messages to Kafka until the configs are loaded, which is what this PR does.
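The fix can be sketched roughly as follows (the class and variable names here are illustrative stand-ins, not the PR's actual code): the test's data writer spins on a flag that the config-upload step flips once the configs have landed in ZooKeeper, instead of racing ahead and writing the first record under the default batch size.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class WaitForConfigsDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean isLoaded = new AtomicBoolean(false);

        // Simulates the config-upload component flipping the flag from its
        // post-start callback once the indexing configs reach ZooKeeper.
        Thread uploader = new Thread(() -> {
            try {
                Thread.sleep(100); // stand-in for the real upload latency
            } catch (InterruptedException ignored) {
            }
            isLoaded.set(true);
        });
        uploader.start();

        // The test's data writer blocks here until the configs are loaded,
        // so no record can be flushed under the default batch size of 1.
        while (!isLoaded.get()) {
            Thread.sleep(10);
        }
        System.out.println("configs loaded; safe to write to kafka");
        uploader.join();
    }
}
```

With this ordering, all 10 records see the intended batch size of 5, so the test never leaves a partial batch pending in the writer bolt.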