Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make a hybrid directory default using mmapfs / niofs #6636

Merged
merged 1 commit into from Jul 9, 2014

Conversation

s1monw
Copy link
Contributor

@s1monw s1monw commented Jun 27, 2014

mmapfs is really good for random access but can have sideeffects if
memory maps are large depending on the operating system etc. A hybrid
solution where only selected files are actually memory mapped but others
mostly consumed sequentially brings the best of both worlds and
minimizes the memory map impact.
This commit mmaps only the dvd and tim file for fast random access
on docvalues and term dictionaries.

@s1monw
Copy link
Contributor Author

s1monw commented Jun 27, 2014

FYI - I think the name here can be improved... I just wanted to get the code out asap.

@s1monw
Copy link
Contributor Author

s1monw commented Jul 2, 2014

@rmuir @kimchy any comment or better ideas for the name?

@rjernst
Copy link
Member

rjernst commented Jul 2, 2014

The name file_switch doesn't mean anything to me. What about something like minimal_mmap or mixed_mmap_nio?

@rmuir
Copy link
Contributor

rmuir commented Jul 2, 2014

One idea is just default. Its whatever we think is a good default...

@kimchy
Copy link
Member

kimchy commented Jul 2, 2014

I like default

@rjernst
Copy link
Member

rjernst commented Jul 3, 2014

default is good

@dakrone
Copy link
Member

dakrone commented Jul 3, 2014

I don't think default is a good name, what happens if we make an improvement in the future and this is no longer the default? It also makes it harder to talk about in conversation:

"what directory type are you using?"
"oh, I'm using the default"
"... what version of ES are you on?"
"I'm on Elasticsearch 0.90/1.0/1.3"
"oh, then the default is actually niofs/mmapfs/mixed"
"okay, then I'm using that"
"wait, which operating system are you using?"
"windows"
"oh, then the default is mmapfs"
"I'm using a 32-bit JDK"
"oh, then the default is simplefs, you should really stop using a 32-bit windows JVM..."

I like mixed_mmap_nio personally.

@rmuir
Copy link
Contributor

rmuir commented Jul 3, 2014

@dakrone but that situation already exists today, no? Even if we name it mixed_mmap_nio, we'd have the exact same conversation.

@dakrone
Copy link
Member

dakrone commented Jul 3, 2014

@rmuir yep, it totally does, I do think that naming it default would make it worse though. It means an extra "wait, did you set it to 'default', or do you have it unset and you're using the defaults?" question all the time.

@rmuir
Copy link
Contributor

rmuir commented Jul 3, 2014

but if our default is always default then there is no confusion?

@dakrone
Copy link
Member

dakrone commented Jul 3, 2014

Yep, there is none, so default is a fine name for this. If it changes in the future though, we're deferring the conversation to give this a name (if/when this changes in ES version 3.4.1 or so)

@dakrone
Copy link
Member

dakrone commented Jul 4, 2014

Okay, after more discussion we agreed on default for now.

@dakrone dakrone removed the review label Jul 4, 2014
@s1monw
Copy link
Contributor Author

s1monw commented Jul 9, 2014

@rmuir @rjernst @dakrone @kimchy I updated this commit - would love to get another review

@s1monw s1monw added the review label Jul 9, 2014
`niofs` for the rest.
operating environment will be automatically chosen: `mmap` on
Windows 64bit, `simplefs` on Windows 32bit, and `default`
(hybrid `niofs` and `mmap`) for the rest.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should still be mmapfs (in both places) since that's what the name of the setting is, otherwise someone could set it to mmap and it wouldn't take effect.

@kimchy
Copy link
Member

kimchy commented Jul 9, 2014

LGTM, some comments were already made :)

@@ -46,26 +92,32 @@ public IndexStoreModule(Settings settings) {
@Override
public Iterable<? extends Module> spawnModules() {
Class<? extends Module> indexStoreModule = NioFsIndexStoreModule.class;
// Same logic as FSDirectory#open ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI this is a bit out of date with respect to FSDirectory.open. Actually we default to mmap on all 64-bit systems now, as its faster on the macos X too.

@s1monw
Copy link
Contributor Author

s1monw commented Jul 9, 2014

I pushed several new commits

@s1monw s1monw added the review label Jul 9, 2014
@s1monw
Copy link
Contributor Author

s1monw commented Jul 9, 2014

I think it's ready

@rmuir
Copy link
Contributor

rmuir commented Jul 9, 2014

looks good.

@kimchy
Copy link
Member

kimchy commented Jul 9, 2014

+1, LGTM

`mmapfs` is really good for random access but can have sideeffects if
memory maps are large depending on the operating system etc. A hybrid
solution where only selected files are actually memory mapped but others
mostly consumed sequentially brings the best of both worlds and
minimizes the memory map impact.
This commit mmaps only the `dvd` and `tim` file for fast random access
on docvalues and term dictionaries.

Closes elastic#6636
@s1monw s1monw merged commit d82a434 into elastic:master Jul 9, 2014
@s1monw s1monw removed the review label Jul 9, 2014
@s1monw s1monw deleted the file_switch_dir branch July 9, 2014 22:03
@s1monw s1monw changed the title [STORE] Make file_switch directory default using mmapfs / niofs [STORE] Make a hybrid directory default using mmapfs / niofs Jul 9, 2014
s1monw added a commit that referenced this pull request Jul 9, 2014
`mmapfs` is really good for random access but can have sideeffects if
memory maps are large depending on the operating system etc. A hybrid
solution where only selected files are actually memory mapped but others
mostly consumed sequentially brings the best of both worlds and
minimizes the memory map impact.
This commit mmaps only the `dvd` and `tim` file for fast random access
on docvalues and term dictionaries.

Closes #6636
@uschindler
Copy link
Contributor

There is already a similar issue in Lucene: https://issues.apache.org/jira/browse/LUCENE-1743

This one was not about random access (it did not exist at that time), but the idea is the same. A file switch by file name is more natural to me.

@uschindler
Copy link
Contributor

@s1monw
ElasticSearch has settings "index.compound_format" and "index.compound_on_flush" (see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html). If this is used (not many people do, but some do because they run out of file descriptors), this would make all files use NIO only, although they contain the term dictionary or docvalues.

Maybe add "cfs" to the list of extensions?

@rmuir
Copy link
Contributor

rmuir commented Jul 16, 2014

We should not add .cfs in my opinion. if such cfs options are enabled, it means we are mmaping the whole index again, which we want to avoid (purpose of this issue). By default, the only thing using .cfs are tiny segments flushed from indexwriter. Because they are small performance is not really critical there.

@uschindler
Copy link
Contributor

@s1monw @rmuir @kimchy One cool thing for the WeakRef haters: This reduces also load to GC, because we dont create so many weak refs when cloning MMapIndexinputs: Random access inputs dont really need to be cloned all the time and on every request (no state involved, as position is not needed). Stuff like posting lists are cloned all the time, but those are now not weak ref'ed because they are read using NIO.

@clintongormley clintongormley changed the title [STORE] Make a hybrid directory default using mmapfs / niofs Internal: Make a hybrid directory default using mmapfs / niofs Jul 16, 2014
mikemccand added a commit that referenced this pull request Feb 23, 2015
In #6636 we switched to a default FileSwitchDirectory that made
.listAll run twice on the same underlying file system directory.

This fixes listAll to do a single directory listing again.

Closes #9666
mikemccand added a commit that referenced this pull request Feb 23, 2015
In #6636 we switched to a default FileSwitchDirectory that made
.listAll run twice on the same underlying file system directory.

This fixes listAll to do a single directory listing again.

Closes #9666
mikemccand added a commit that referenced this pull request Feb 23, 2015
In #6636 we switched to a default FileSwitchDirectory that made
.listAll run twice on the same underlying file system directory.

This fixes listAll to do a single directory listing again.

Closes #9666
@clintongormley clintongormley added the :Distributed/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. label Jun 7, 2015
@clintongormley clintongormley changed the title Internal: Make a hybrid directory default using mmapfs / niofs Make a hybrid directory default using mmapfs / niofs Jun 7, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
In elastic#6636 we switched to a default FileSwitchDirectory that made
.listAll run twice on the same underlying file system directory.

This fixes listAll to do a single directory listing again.

Closes elastic#9666
@bleskes bleskes mentioned this pull request Mar 8, 2016
@clintongormley clintongormley added :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Distributed/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement release highlight v1.3.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants