Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only do a single listAll from FileSwitchDir #9666

Closed
wants to merge 1 commit into from

Conversation

mikemccand
Copy link
Contributor

With #6636 listAll now calls it twice (once for the mmap dir, once for niofs dir) ... I think we should fix this to be a single call?

@mikemccand mikemccand self-assigned this Feb 12, 2015
@rmuir
Copy link
Contributor

rmuir commented Feb 12, 2015

Patch is good to me. I still think the thing to fix would be removing the Files.isDirectory() call on each entry. Alternatively, since we are specialized here for the FSDir case, and if we dont need to filter out subdirectories, we could just implement the obvious list (java 8 Files.list().toArray but a little more for java7).

But, if this file listing is a hotspot for some reason, these changes will only make it Nx faster. Instead code should not list files unnecessarily. I am extremely concerned if we overoptimize here, that those problems will never get fixed.

@rmuir
Copy link
Contributor

rmuir commented Feb 12, 2015

I ran a benchmark, on a static directory for simplicity. Test folder has 36,400 files, which is a bit on the high end, but its not unrealistic. I run one hundred iterations of each method.

The files.isDirectory() check is really the bad guy:
With my "fast drive" (ext4 on an SSD): its 2.6x faster (21ms per-call instead of 57ms per call)
With my "slow drive" (ntfs on a spinning disk): its over 60x faster (22ms per-call instead of 1.3s per call)

Still, as i said before, we are talking about something on the order of milliseconds and we need to ensure its not called unnecessarily unless its needed.

gpstathis pushed a commit to gpstathis/elasticsearch-river-mongodb that referenced this pull request Feb 14, 2015
A regression was introduced in the ES stats API call somewhere between ES 1.2.x. and 1.4.x that makes stats calls a lot more expensive. See elastic/elasticsearch#9681, https://groups.google.com/d/topic/elasticsearch/bOyBxgI9cMA/discussion and elastic/elasticsearch#9666 which are related issues. A ticket describing the exact issue is yet to be filed but I've been working offline on this with @imotov. The river makes frequent calls to the stats API which compounds the issue. Requesting only the needed thread_pool flags will contribute to mitigating the load on the cluster while the ES team fixes the stats issue on their side.
@spinscale spinscale added v1.4.5 and removed v1.4.4 labels Feb 19, 2015
gpstathis pushed a commit to gpstathis/elasticsearch-river-mongodb that referenced this pull request Feb 19, 2015
A regression was introduced in the ES stats API call somewhere between ES 1.2.x. and 1.4.x that makes stats calls a lot more expensive. See elastic/elasticsearch#9681, https://groups.google.com/d/topic/elasticsearch/bOyBxgI9cMA/discussion and elastic/elasticsearch#9666 which are related issues. A ticket describing the exact issue is yet to be filed but I've been working offline on this with @imotov. The river makes frequent calls to the stats API which compounds the issue. Requesting only the needed thread_pool flags will contribute to mitigating the load on the cluster while the ES team fixes the stats issue on their side.
@s1monw
Copy link
Contributor

s1monw commented Feb 23, 2015

LGTM

mikemccand added a commit that referenced this pull request Feb 23, 2015
In #6636 we switched to a default FileSwitchDirectory that made
.listAll run twice on the same underlying file system directory.

This fixes listAll to do a single directory listing again.

Closes #9666
mikemccand added a commit that referenced this pull request Feb 23, 2015
In #6636 we switched to a default FileSwitchDirectory that made
.listAll run twice on the same underlying file system directory.

This fixes listAll to do a single directory listing again.

Closes #9666
@clintongormley clintongormley added :Core/Infra/Core Core issues without another label >enhancement labels Mar 19, 2015
@clintongormley clintongormley changed the title Core: only do a single listAll from FileSwitchDir Only do a single listAll from FileSwitchDir Jun 6, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
In elastic#6636 we switched to a default FileSwitchDirectory that made
.listAll run twice on the same underlying file system directory.

This fixes listAll to do a single directory listing again.

Closes elastic#9666
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants