Support external versioning of documents #343

Closed
aewhite opened this issue Dec 14, 2014 · 15 comments

Comments

@aewhite

aewhite commented Dec 14, 2014

As per ES Bulk API Docs it is possible to handle external version types. Currently when applying an external version I get the error:

org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [Conflict(409) - [VersionConflictEngineException[[index-name][0] [type][id]: version conflict, current [-1], provided [1418513087126]]]]; Bailing out..

Perhaps I missed a feature but my browsing of the code did not indicate that this feature was supported. I am using elasticsearch-hadoop-2.0.2 with elasticsearch-1.4.0

@costin
Member

costin commented Jan 12, 2015

@aewhite Hi,

This is supported through the es.mapping.version setting (explained here).

@aewhite
Author

aewhite commented Jan 12, 2015

That will set the version on the document, but when used with a custom version number, ES generates the error I referenced above. I need some way to set version_type to external on the bulk requests to ES. That's what I don't see documented or in the source code.
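
For readers following along, here is a minimal sketch of what a bulk index entry with an external version looks like on the wire. It assumes the underscore-prefixed metadata names (`_version`, `_version_type`) accepted by the ES 1.x bulk API; later ES releases use `version` and `version_type` instead. The helper name `bulk_index_lines` and the document body are made up for illustration, and this is not how es-hadoop builds its requests internally.

```python
import json


def bulk_index_lines(index, doc_type, doc_id, version, source):
    """Return the two newline-terminated lines of one bulk 'index' entry."""
    action = {
        "index": {
            "_index": index,
            "_type": doc_type,
            "_id": doc_id,
            "_version": version,
            # Assumption: ES 1.x metadata name; newer versions use "version_type".
            "_version_type": "external",
        }
    }
    # The bulk API expects one JSON object per line: action metadata, then the document.
    return json.dumps(action) + "\n" + json.dumps(source) + "\n"


# Values taken from the error message above; the document body is hypothetical.
payload = bulk_index_lines("index-name", "type", "id", 1418513087126, {"field": "value"})
print(payload, end="")
```

With external versioning, ES accepts the write only when the provided version is greater than the one it has stored, so a timestamp-like value such as the one above can be used as the version.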

@costin
Member

costin commented Jan 12, 2015

Sorry, I misread your post. Looks like an oversight which should be addressed.

@costin
Member

costin commented Jan 12, 2015

Just added support for it and published the builds. The version type can now be specified through es.mapping.version.type; if it is not specified, it defaults to external whenever a version mapping is specified.
In other words, if you specify es.mapping.version, version_type is automatically added as external. If you want to fully control the value, set es.mapping.version.type explicitly.

Please try it out and let me know how it works for you. Thanks!
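
A rough sketch of the settings described in this comment, written as a plain Python dict of the kind one might pass as job configuration. The setting names `es.resource`, `es.mapping.version` and `es.mapping.version.type` come from es-hadoop; the field name `updated_at` and both helper functions are hypothetical. `effective_version_type` only models the defaulting rule as stated in the comment and is not es-hadoop code.

```python
def es_hadoop_settings(resource, version_field=None, version_type=None):
    """Build a dict of es-hadoop settings for a versioned write."""
    settings = {"es.resource": resource}
    if version_field is not None:
        # Name of the document field that holds the version number.
        settings["es.mapping.version"] = version_field
    if version_type is not None:
        settings["es.mapping.version.type"] = version_type
    return settings


def effective_version_type(settings):
    """Model of the defaulting rule described above (an assumption, not library code)."""
    explicit = settings.get("es.mapping.version.type")
    if explicit is not None:
        return explicit
    if "es.mapping.version" in settings:
        return "external"
    return None


# "updated_at" is a made-up field name used only for this example.
implicit = es_hadoop_settings("index-name/type", version_field="updated_at")
explicit = es_hadoop_settings("index-name/type", version_field="updated_at",
                              version_type="internal")

print(effective_version_type(implicit))
print(effective_version_type(explicit))
```

The first configuration relies on the default and ends up with external versioning; the second overrides it.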

costin added a commit that referenced this issue Jan 12, 2015
@aewhite
Author

aewhite commented Jan 12, 2015

Great. It might be a while before I can test this personally since we have a workaround in place and priorities are in flux. I took a peek at the associated commits for this issue and I would expect them to solve it. Thanks.

@costin
Member

costin commented Jan 12, 2015

Sure - let me know how it goes. Any rough ETAs (just to know how to schedule the issue)? Cheers.

@aewhite
Author

aewhite commented Jan 13, 2015

Turns out that I am going to have a couple of days to work on ES after all. Expect an update by the end of the week.

@aewhite
Author

aewhite commented Jan 13, 2015

Ok, I tested with v2.1.0Beta3 (that didn't work). I didn't see a v2.1.0Beta4 but thought I would try since you tagged it in the issue. I then tested with the development snapshot and that did work.

I look forward to the official release. This feature will help us out a lot.

@aewhite
Author

aewhite commented Jan 13, 2015

Something I have noticed is that if a document does produce a conflict the entire job will fail with a 409.

I understand this behaviour from an ES standpoint but I'm not sure how ES-Hadoop should handle it. In my case I just want to ignore the conflict, since my process guarantees that newer versions are more up to date than older versions.

From a usability standpoint it does seem surprising that a single version conflict would cause a complete failure but perhaps that is the most correct stance to take.

@costin
Member

costin commented Jan 14, 2015

@aewhite Currently, es-hadoop takes a fail fast approach. In the future, we could potentially support error trapping so the problematic documents are 'logged' somewhere (which can be tricky depending on the environment - HDFS might be read-only or unavailable).
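
To make the distinction concrete, here is a small sketch of the kind of error trapping discussed above, done on the client side against a bulk response. It is not es-hadoop behaviour (es-hadoop fails fast, as noted). It assumes the usual bulk response shape, where each entry in `items` has a single key naming the action and a `status` code. The function name and the sample response are hand-written for illustration.

```python
def split_bulk_items(response):
    """Sort bulk response items into successes, version conflicts and other failures."""
    ok, conflicts, failures = [], [], []
    for item in response.get("items", []):
        # Each item is a one-key dict such as {"index": {...}}.
        _action, result = next(iter(item.items()))
        status = result.get("status")
        doc_id = result.get("_id")
        if status == 409:
            # Version conflict: a caller that trusts its versions may choose to skip these.
            conflicts.append(doc_id)
        elif status is not None and 200 <= status < 300:
            ok.append(doc_id)
        else:
            failures.append(doc_id)
    return ok, conflicts, failures


# Hand-written sample response, not captured from a real cluster.
sample = {
    "items": [
        {"index": {"_index": "index-name", "_type": "type", "_id": "1", "status": 201}},
        {"index": {"_index": "index-name", "_type": "type", "_id": "2", "status": 409,
                   "error": "version conflict"}},
        {"index": {"_index": "index-name", "_type": "type", "_id": "3", "status": 400,
                   "error": "parse failure"}},
    ]
}

ok, conflicts, failures = split_bulk_items(sample)
print(ok, conflicts, failures)
```

Keeping conflicts separate from other failures lets a job ignore stale writes while still failing on genuine errors.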

@aewhite
Author

aewhite commented Jan 14, 2015

Sounds good, in that case I guess my original issue is resolved.

@costin
Member

costin commented Jan 14, 2015

@aewhite By the way, there is an issue, created by a user, for ignoring 'failures' here - #308, in case you are interested. Feel free to add your input.

Closing this one...

@costin costin closed this as completed Jan 14, 2015
costin added a commit that referenced this issue Jan 14, 2015
costin added a commit that referenced this issue Jan 14, 2015
[DOC] Explain version_type setting
relates to #343
(cherry picked from commit a5148b8)
(cherry picked from commit ef76969)
(cherry picked from commit bc2f096)
@gautamjeyaraman

Hi, I too tried this with v2.1.0Beta3 and it didn't work. I found that this was added to v2.1.0Beta4, but it's not yet released, right? When will v2.1.0Beta4 be out? Can you give an approximate date, or should I go with the nightly build for now?

@costin
Member

costin commented Feb 26, 2015

Try the nightly/dev build - there's no ETA for Beta4 yet.

@gautamjeyaraman

It is working in the nightly/dev build. Thanks.
