
Support Elasticsearch 5 #259

Closed
jimmyjones2 opened this issue Sep 25, 2016 · 16 comments

@jimmyjones2 (Contributor) commented Sep 25, 2016

elasticdump 2.4.2, elasticsearch 5.0.0-beta1

elasticdump --input=http://localhost:9200 --output=abc.json --type=data

Sun, 25 Sep 2016 13:07:15 GMT | starting dump
Sun, 25 Sep 2016 13:07:15 GMT | Error Emitted => {"error":{"root_cause":[{"type":"parsing_exception","reason":"The field [fields] is no longer supported, please use [stored_fields] to retrieve stored fields or _source filtering if the field is not stored","line":1,"col":36}],"type":"parsing_exception","reason":"The field [fields] is no longer supported, please use [stored_fields] to retrieve stored fields or _source filtering if the field is not stored","line":1,"col":36},"status":400}
Sun, 25 Sep 2016 13:07:15 GMT | Total Writes: 0
Sun, 25 Sep 2016 13:07:15 GMT | dump ended with error (get phase) => Error: {"error":{"root_cause":[{"type":"parsing_exception","reason":"The field [fields] is no longer supported, please use [stored_fields] to retrieve stored fields or _source filtering if the field is not stored","line":1,"col":36}],"type":"parsing_exception","reason":"The field [fields] is no longer supported, please use [stored_fields] to retrieve stored fields or _source filtering if the field is not stored","line":1,"col":36},"status":400}

As a workaround (Elasticsearch 5 replaced "fields" with "stored_fields" in search request bodies, per the error above), the following works:

elasticdump --input=http://localhost:9200 --output=abc.json --type=data --searchBody '{"query": { "match_all": {} }, "stored_fields": ["*"], "_source": true }'
@evantahler (Collaborator) commented:

Are you able to tackle this change in a pull request?

@sspilleman commented:

You have to change:

./node_modules/elasticdump/elasticdump.js

self.options.searchBody = {"query": { "match_all": {} }, "fields": ["*"], "_source": true };

to

self.options.searchBody = {"query": { "match_all": {} }, "stored_fields": ["*"], "_source": true };

@evantahler (Collaborator) commented:

Looks like we need a test then, to poll for which version of ES we are connecting to before running any export action. That shouldn't be too bad.
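
A minimal sketch of what that version probe could look like, assuming a plain HTTP GET against the cluster root (the request dependency and the chooseSearchBody helper are illustrative, not elasticdump's actual code):

var request = require('request');

// ask the cluster root for its version, then pick the matching default
// searchBody: ES 5 renamed "fields" to "stored_fields" in search requests
function chooseSearchBody(baseUrl, callback) {
  request.get({ url: baseUrl, json: true }, function (err, resp, body) {
    if (err) { return callback(err); }
    var major = parseInt(body.version.number.split('.')[0], 10);
    var searchBody = { query: { match_all: {} }, _source: true };
    if (major >= 5) {
      searchBody.stored_fields = ['*'];
    } else {
      searchBody.fields = ['*'];
    }
    callback(null, searchBody);
  });
}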

@sspilleman commented Oct 15, 2016

Yes, or parameterize it on the command line somewhere, which might be easier as a quick fix. Polling is definitely the cleaner / safer option though, as it's error-prone otherwise.

@guenhter commented Nov 3, 2016

Testing before any action would be the most convenient way, so the user isn't bothered with any version-specific stuff.

@nfantone commented Nov 3, 2016

I'd like to point out that I just tried:

elasticdump --input=http://staging.elasticsearch:9200/some-index --output=http://localhost:9200/some-index --type=data

Where the output was a 5.0.0 Elasticsearch instance and the input was at 2.3.4, everything went smoothly.

@guenhter commented Nov 3, 2016

This works because the query used for reading is fine for ES 2.x.
It won't work the other way around.

@AlexKovalevich commented:

Is it possible to support the ingest pipeline name introduced in ES 5.0.0?
When doing a bulk index, I can add it like this:
{"index":{"_id":123,"_type":"my_type","pipeline":"my_pipeline"}}. But elasticdump no longer supports bulk mode, and I can't pass it as a parameter like this:
elasticdump --input=my_source_file.json --output="http://server:9200/my_index/?pipeline=my_pipeline"

@AlexKovalevich commented:

Adding "pipeline" to extraFields in
c:\Users\%myUser%\AppData\Roaming\npm\node_modules\elasticdump\lib\transports\elasticsearch.js
helped for me!

From line 319:

elasticsearch.prototype.setData = function(data, limit, offset, callback) {
  // nothing to write
  if (data.length === 0) { return callback(null, 0); }

  var self = this;
  var error = null;
  // metadata keys copied from each dumped record onto the bulk action line;
  // 'pipeline' is the addition
  var extraFields = ['routing', 'parent', 'timestamp', 'ttl', 'pipeline'];
  var writes = 0;

It reads pipeline from {"index":{"_id":123,"_type":"my_type","pipeline":"my_pipeline"}} and delivers it to ES correctly.
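
For context, an illustrative sketch (not the library's exact code) of how such an extraFields list is typically applied further down in setData, so that the pipeline value from the dump ends up on the bulk action line:

// for each dumped record, copy any matching metadata key onto the bulk
// action line, so Elasticsearch sees it during _bulk indexing
data.forEach(function (doc) {
  var meta = { index: { _index: doc._index, _type: doc._type, _id: doc._id } };
  extraFields.forEach(function (field) {
    if (doc[field] !== undefined) {
      meta.index[field] = doc[field];
    }
  });
  // meta is then serialized as the action line of the bulk payload,
  // followed by the document's _source as the document line
});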

@schmandforke commented Nov 21, 2016

Got another exception after the patch from @sspilleman:

elasticdump --input=http://xx.xx.xx.xxx:9200/.kibana --output=$ --type=data
Mon, 21 Nov 2016 20:34:23 GMT | Error Emitted => {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"No search type for [scan]"}],"type":"illegal_argument_exception","reason":"No search type for [scan]"},"status":400}
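
(A note on this second error: Elasticsearch 5 removed the scan search type entirely; the replacement is a regular scroll request sorted on _doc. The equivalent query by hand, reusing the host and index from the command above, would be roughly:)

curl -s 'http://xx.xx.xx.xxx:9200/.kibana/_search?scroll=1m' -H 'Content-Type: application/json' -d '{"size": 100, "sort": ["_doc"], "query": {"match_all": {}}}'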

@trompx commented Dec 1, 2016

Getting the same error as @schmandforke with Kibana 5.0.2.

@evantahler (Collaborator) commented:

Support for ES 5 will be solved by #268

@evantahler (Collaborator) commented:

@schmandforke and @trompx, can you please create a new issue for what you are seeing with Kibana, including an index dump which can be used to test against?

@nan008 commented Mar 9, 2017

@AlexKovalevich how did you attach the pipeline to the index? I am trying to restore the dump from the old server through a pipeline with elasticdump, to avoid reindexing. I added the pipeline to extraFields in elasticsearch.js, but the dump is still restored with the fields I want to get rid of from the old machine.

@yizipiaoxiang commented Apr 24, 2019

elasticdump 5.1.0, elasticsearch 5.5.1
[root@api-www bin]# elasticdump --type=data --searchBody '{"query": { "match_all": {} }, "stored_fields": ["*"], "_source": true }' --input=http://localhost:9200/es_index --output=$ | gzip > /workspace/data/es_index.json.gz

Problem solved

@AlexKovalevich commented:

@nan008
In my case I had a text dump with pipelines in it. The issue was that elasticdump was ignoring the pipeline value from the dump and wasn't passing it to ES, so my fix helped with that.

From what I can see in your comment, your source may not have pipeline values in the first place. In that case the solution may differ.

Keep in mind that when you import a single document, you can add the pipeline as a request param:
POST server/yourindice/?pipeline=pipeline_name
I don't know if it will work for bulk upload though, because bulk upload expects the pipeline to be in the action metadata:
{"index":{"_id":123,"pipeline":"pipeline_name"}}
{yourDocument}

So, if elasticdump supports it, I'd try reimporting the docs one by one with an explicit pipeline parameter; if it doesn't, I don't know of a clean stream-based solution. You'd have to add "pipeline":"pipeline_name" to your dump somehow.
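
If you do have to add it yourself, here is a minimal Node.js sketch, assuming the bulk-style dump format shown above with strictly alternating action and document lines (file names and the pipeline name are placeholders):

var fs = require('fs');
var readline = require('readline');

var rl = readline.createInterface({ input: fs.createReadStream('dump.json') });
var out = fs.createWriteStream('dump-with-pipeline.json');
var isActionLine = true; // assumes action and document lines strictly alternate

rl.on('line', function (line) {
  if (isActionLine) {
    // inject the pipeline into the bulk action metadata
    var action = JSON.parse(line);
    action.index.pipeline = 'pipeline_name';
    line = JSON.stringify(action);
  }
  isActionLine = !isActionLine;
  out.write(line + '\n');
});
rl.on('close', function () { out.end(); });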
