Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resiliency: Forbid index names over 100 characters in length #7252

Merged
merged 1 commit into from
Aug 13, 2014

Conversation

dakrone
Copy link
Member

@dakrone dakrone commented Aug 13, 2014

Fixes #4417

I picked 100 sort of arbitrarily, I'm open to any suggestions for a better limit.

@jpountz
Copy link
Contributor

jpountz commented Aug 13, 2014

I was just looking at this page: http://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits and 255 seems to be a common limit across main filesystems? It seems that the limit is in bytes though, so maybe we should also check that index names are in ASCII?

@dakrone
Copy link
Member Author

dakrone commented Aug 13, 2014

@jpountz I think the name restrictions should fall under #6736, we already have tests for non-ascii names, so I'm not sure we want to make this breaking change in this issue, this is just to prevent DOSing a system.

@jpountz
Copy link
Contributor

jpountz commented Aug 13, 2014

But then 100 might not be enough, I just applied your patch and ran the following code and was still able to reproduce the issue:

        final char[] chars = new char[110];
        int c = 6068;
        int len = 0;
        while (true) {
            final int count = Character.toChars(c, chars, len);
            if (len + count > 100) {
                break;
            } else {
                len += count;
            }
        }
        final String indexName = new String(chars, 0, len);
        createIndex(indexName);

(This weird String uses a code point that is one UTF16 char but 3 UTF8 bytes, so although its length is less than 100, the number of bytes is greater than 255 which is the limit on my filesystem.)

So I think we either need to decrease the limit (at least for the case when the host encodes file names using UTF8) or enforce that index names are in ASCII first?

@jpountz
Copy link
Contributor

jpountz commented Aug 13, 2014

Or maybe we could temporarily do something like:

if (index is ASCII) {
  maxLength = 255; // common limit for most filesystems
} else {
  maxLength = 80; // protect against crazy chars in index names
}

@dakrone
Copy link
Member Author

dakrone commented Aug 13, 2014

So I think we either need to decrease the limit (at least for the case when the host encodes file names using UTF8) or enforce that index names are in ASCII first?

What about enforcing it is below 100 UTF-8 bytes? We could use index.getBytes("UTF-8").length?

@jpountz
Copy link
Contributor

jpountz commented Aug 13, 2014

That would work for me.

@dakrone
Copy link
Member Author

dakrone commented Aug 13, 2014

Closing in favor of #6736

@dakrone dakrone closed this Aug 13, 2014
@dakrone
Copy link
Member Author

dakrone commented Aug 13, 2014

Reopening, this will be back-ported to 1.3.

@dakrone dakrone reopened this Aug 13, 2014
@dakrone dakrone added v1.3.0 and removed v1.3.0 labels Aug 13, 2014
@jpountz
Copy link
Contributor

jpountz commented Aug 13, 2014

LGTM

@dakrone dakrone merged commit fe86edd into elastic:master Aug 13, 2014
@nik9000
Copy link
Member

nik9000 commented Aug 13, 2014

Quick sanity check: my longest index is 44 bytes and I encode a ton of
stuff in that. What I'm indexing, flavor of index, and time the index was
built. 100 bytes ought to be enough for everyone.

@jpountz jpountz removed the review label Aug 18, 2014
@clintongormley clintongormley changed the title Forbid index names over 100 characters in length Resiliency: Forbid index names over 100 characters in length Sep 8, 2014
@dakrone dakrone deleted the fix-long-index-names branch September 9, 2014 13:52
@klausbrunner
Copy link

@nik9000 @clintongormley I'm afraid just like 640k back in the day, 100 bytes isn't enough for everyone. We also encode a fair bit of prefix stuff into the index name, and due to the way our customers generate their "external" index names, it just isn't enough for all use cases.

My suggestion would be to make this configurable (with a default of 100), and I'll submit a patch if this has a chance of being accepted.

@nik9000
Copy link
Member

nik9000 commented Sep 30, 2014

I suppose I should have surrounded my last sentence with sarcasm tags. Its certainly enough for me at this point and will probably stay that way for the foreseeable future but I don't claim to speak for everyone.

@dakrone
Copy link
Member Author

dakrone commented Sep 30, 2014

@klausbrunner out of curiosity, how long is your longest index name?

@klausbrunner
Copy link

@dakrone About 140 characters, which can be a bit more in UTF8 bytes as we allow some non ASCII characters.

@clintongormley
Copy link

@dakrone any problem with making the limit 255 bytes?

@dakrone
Copy link
Member Author

dakrone commented Oct 14, 2014

@clintongormley no, no problem with it, that works fine for me.

@clintongormley
Copy link

ok good - i think we should get that into 1.4?

@dakrone
Copy link
Member Author

dakrone commented Oct 14, 2014

@clintongormley sure, want to open another issue for it and assign me? I will work on it when I get a chance, shouldn't take too long.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Elasticsearch goes into infinite loop with long database names
5 participants