Pig Storage index name with "_" #91

Closed
nmaillard opened this issue Oct 3, 2013 · 3 comments

Comments

@nmaillard

Hello everyone

This morning I ran into a situation where an "_" in an ES index name crashes the Pig storage (ESStorage).

Here is my Pig script:

DEFINE ESStorage org.elasticsearch.hadoop.pig.ESStorage('es.host=myhost');
A = LOAD '/file' USING PigStorage() AS (id:long, name, url:chararray, picture: chararray);
STORE A INTO 'index_1/artists' USING org.elasticsearch.hadoop.pig.ESStorage();

This crashes with a StringIndexOutOfBoundsException.

The cause is in org.elasticsearch.hadoop.rest.Resource.
On line 32 the location is split like this:
int location = resource.lastIndexOf("_");
I'm guessing this is meant to find the _search part of the resource in the load case.
In the store case, however, it turns my 'index_1/artists' into 'index',
and the next line:
location = localRoot.substring(0, root.length() - 1).lastIndexOf("/");
returns -1 since there is no "/".
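
To make the failure easier to follow, here is a minimal standalone sketch of the two steps described above. It is a reconstruction from the two quoted lines only, not the actual Resource source: the class name UnderscoreSplitSketch and both helper methods are hypothetical, and I'm assuming the -1 ends up in a later substring call, which is what would raise the exception.

```java
final class UnderscoreSplitSketch {

    // Hypothetical helper: cut the resource at the LAST "_", as the quoted line 32 does.
    static String rootByLastUnderscore(String resource) {
        int location = resource.lastIndexOf("_");
        return location < 0 ? resource : resource.substring(0, location);
    }

    // Hypothetical helper: mirrors the quoted follow-up line, which drops the
    // final character and then looks for the last "/".
    static int lastSlashBeforeEnd(String root) {
        return root.substring(0, root.length() - 1).lastIndexOf("/");
    }

    public static void main(String[] args) {
        // Load-style resource: the "_" really does belong to _search.
        String loadRoot = rootByLastUnderscore("index/type/_search");
        System.out.println(loadRoot + " -> " + lastSlashBeforeEnd(loadRoot));

        // Store-style resource: the only "_" is inside the index name.
        String storeRoot = rootByLastUnderscore("index_1/artists");
        int location = lastSlashBeforeEnd(storeRoot);
        System.out.println(storeRoot + " -> " + location);

        try {
            // Assumption: the real code uses the index in a substring call.
            storeRoot.substring(0, location);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("substring failed: " + e.getMessage());
        }
    }
}
```

With 'index_1/artists' the root collapses to 'index', the slash lookup yields -1, and any substring call that uses that value throws.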

I'm thinking of either matching more strictly on "_search" instead of "_",
or disallowing "_" in index names altogether and raising an error if one is present.
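
A rough sketch of what the first idea could look like, assuming the query part always starts with "/_search". The class StricterSplitSketch and its root method are hypothetical names for illustration, not the existing API, and this does not cover resources that carry no _search segment in some other form.

```java
final class StricterSplitSketch {

    // Hypothetical marker: only a path segment starting with "_search" counts
    // as the query part, so an "_" inside an index name is left alone.
    private static final String SEARCH_MARKER = "/_search";

    // Returns the index/type part of the resource, without any query suffix.
    static String root(String resource) {
        int location = resource.indexOf(SEARCH_MARKER);
        return location < 0 ? resource : resource.substring(0, location);
    }

    public static void main(String[] args) {
        // Store case: nothing to strip, the index name keeps its underscore.
        System.out.println(root("index_1/artists"));
        // Load case: the query suffix is cut off, the index name is untouched.
        System.out.println(root("index_1/artists/_search?q=*"));
    }
}
```

Both calls yield 'index_1/artists', so the underscore in the index name no longer affects the split.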

I'm using the current master. I'll go ahead and try my first idea of stricter matching and let you know how it works.

Thanks for all the hard work.

@costin
Member

costin commented Oct 3, 2013

The entire parsing of the URL needs to be overhauled, especially as one might not use _search and instead rely on an embedded QueryDSL.
For the time being, I suggest using a different index name as a temporary workaround.

Thanks!


Costin

@dmoore247

Same as issue #80, I hit this problem too :(

@costin
Member

costin commented Dec 5, 2013

This is fixed in master - let me know if you still encounter issues.

@costin costin closed this as completed Dec 5, 2013