ReindexIT#testReindexTask can fail with assertion error. #46144
Pinging @elastic/es-distributed
The server log file contents are here:
It looks like the test ran very slowly. Not only was the reindex itself slow, but so was storing the task result afterwards.
Following is some of the log of activity right before/after:
It looks like archives were being built concurrently with the test run, which I imagine could cause the slowness, depending on the IOPS/bandwidth available on the disks. The following are my main suspects for harmful parallel activity (this is largely guesswork):
One hypothesis is that heavy writing (possibly from multiple threads) was going on while the test ran, and that this delayed the test enough for it to fail.
Ping @elastic/es-core-infra: I hope you can comment on this and perhaps read more out of the logs than I can about which activity was really running in parallel. Could you also add info on how our CI servers are sized? It seems the job used 16 parallel threads.
The fix for #46091 should also ensure this test stays within its 10s limit, so closing this.
Unfortunately the failure doesn't reproduce for me locally, and I didn't see any clues in the logs as to what could be causing the reindex to fail to complete. This failure first popped up in a master intake build.
Link to the build: https://elasticsearch-ci.elastic.co/view/Elasticsearch%20master/job/elastic+elasticsearch+master+multijob+fast+part1/1032/
Command to reproduce:
Relevant excerpt from the logs: