Only do a single listAll from FileSwitchDir #9666

mikemccand · 2015-02-12T00:39:22Z

With #6636 listAll now calls it twice (once for the mmap dir, once for niofs dir) ... I think we should fix this to be a single call?

rmuir · 2015-02-12T01:06:01Z

Patch is good to me. I still think the thing to fix would be removing the Files.isDirectory() call on each entry. Alternatively, since we are specialized here for the FSDir case, and if we dont need to filter out subdirectories, we could just implement the obvious list (java 8 Files.list().toArray but a little more for java7).

But, if this file listing is a hotspot for some reason, these changes will only make it Nx faster. Instead code should not list files unnecessarily. I am extremely concerned if we overoptimize here, that those problems will never get fixed.

rmuir · 2015-02-12T01:37:54Z

I ran a benchmark, on a static directory for simplicity. Test folder has 36,400 files, which is a bit on the high end, but its not unrealistic. I run one hundred iterations of each method.

The files.isDirectory() check is really the bad guy:
With my "fast drive" (ext4 on an SSD): its 2.6x faster (21ms per-call instead of 57ms per call)
With my "slow drive" (ntfs on a spinning disk): its over 60x faster (22ms per-call instead of 1.3s per call)

Still, as i said before, we are talking about something on the order of milliseconds and we need to ensure its not called unnecessarily unless its needed.

@imotov

A regression was introduced in the ES stats API call somewhere between ES 1.2.x. and 1.4.x that makes stats calls a lot more expensive. See elastic/elasticsearch#9681, https://groups.google.com/d/topic/elasticsearch/bOyBxgI9cMA/discussion and elastic/elasticsearch#9666 which are related issues. A ticket describing the exact issue is yet to be filed but I've been working offline on this with @imotov. The river makes frequent calls to the stats API which compounds the issue. Requesting only the needed thread_pool flags will contribute to mitigating the load on the cluster while the ES team fixes the stats issue on their side.

@imotov

A regression was introduced in the ES stats API call somewhere between ES 1.2.x. and 1.4.x that makes stats calls a lot more expensive. See elastic/elasticsearch#9681, https://groups.google.com/d/topic/elasticsearch/bOyBxgI9cMA/discussion and elastic/elasticsearch#9666 which are related issues. A ticket describing the exact issue is yet to be filed but I've been working offline on this with @imotov. The river makes frequent calls to the stats API which compounds the issue. Requesting only the needed thread_pool flags will contribute to mitigating the load on the cluster while the ES team fixes the stats issue on their side.

s1monw · 2015-02-23T08:47:16Z

LGTM

In #6636 we switched to a default FileSwitchDirectory that made .listAll run twice on the same underlying file system directory. This fixes listAll to do a single directory listing again. Closes #9666

In elastic#6636 we switched to a default FileSwitchDirectory that made .listAll run twice on the same underlying file system directory. This fixes listAll to do a single directory listing again. Closes elastic#9666

mikemccand self-assigned this Feb 12, 2015

mikemccand added the v0.12.0 label Feb 12, 2015

only do a single listAll

acda4fb

mikemccand added v2.0.0-beta1 v1.5.0 v1.4.4 and removed v0.12.0 labels Feb 12, 2015

spinscale added v1.4.5 and removed v1.4.4 labels Feb 19, 2015

gpstathis mentioned this pull request Feb 19, 2015

Clear unused flags to reduce load on cluster. richardwilly98/elasticsearch-river-mongodb#472

Closed

drewr force-pushed the master branch from dcc3da0 to 7c20a8a Compare February 20, 2015 16:48

mikemccand closed this in 6867f4a Feb 23, 2015

clintongormley added :Core/Infra/Core Core issues without another label >enhancement labels Mar 19, 2015

clintongormley changed the title ~~Core: only do a single listAll from FileSwitchDir~~ Only do a single listAll from FileSwitchDir Jun 6, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Only do a single listAll from FileSwitchDir #9666

Only do a single listAll from FileSwitchDir #9666

mikemccand commented Feb 12, 2015

rmuir commented Feb 12, 2015

rmuir commented Feb 12, 2015

s1monw commented Feb 23, 2015

Only do a single listAll from FileSwitchDir #9666

Only do a single listAll from FileSwitchDir #9666

Conversation

mikemccand commented Feb 12, 2015

rmuir commented Feb 12, 2015

rmuir commented Feb 12, 2015

s1monw commented Feb 23, 2015