Unclear docs #990

ov7a · 2017-05-16T15:49:46Z

Partial copy of this thread:
https://discuss.elastic.co/t/writing-to-multiple-indices-and-documentation-about-it/82859

Some things in the documentation are unclear.

At some point it says:

Note that multiple indices and/or types are allowed only for reading

But the next paragraph is titled "Dynamic/multi resource writes" and states that

For writing, elasticsearch-hadoop allows the target resource to be resolved at runtime by using patterns (by using the {} format), resolved at runtime based on the data being streamed to Elasticsearch. That is, one can save documents to a certain index or type based on one or multiple fields resolved from the document about to be saved.

It seems like these two statements contradict each other. I'm still not sure whether it is possible to write to multiple indices.

Secondly, the timestamp example is also unclear. I think of a resource as 'index/type' (please correct me if I'm wrong). In the example

# index the documents based on their date
es.resource.write = my-collection/{@timestamp:YYYY.MM.dd}

the timestamp is the type, right? But we usually separate indices by time, not types. This seems wrong. I would expect the timestamp to be part of the index:

# index the documents based on their date
es.resource.write = my-collection.{@timestamp:YYYY.MM.dd}/{media_type}

Is this a valid resource? Will it work as expected (i.e. write to multiple indices)?

P.S. If this is the wrong place for the issue, please point me to the proper place to report it. I had zero replies on the forum for almost a month. Elasticsearch/docs says: "If you find an error in the documentation, you should open an issue or pull request on the repository which contains the docs"


jbaiera · 2017-05-16T16:21:06Z

This could certainly be cleared up a bit more:

Note that multiple indices and/or types are allowed only for reading

This refers to index and type names like _all/foo, where multiple indices are read by means of a pattern sent to Elasticsearch.

For writing, elasticsearch-hadoop allows the target resource to be resolved at runtime by using patterns (by using the {} format), resolved at runtime based on the data being streamed to Elasticsearch. That is, one can save documents to a certain index or type based on one or multiple fields resolved from the document about to be saved.

In this case, it explains that you can use a special pattern (denoted by curly braces) to have the connector determine at runtime which index and type to save each document to, using the values stored in the document's fields. This is different from the above because each document is resolved to a single target resource at write time. If you use this pattern with something that does not resolve to a single index, the write operation will fail. For instance, _all/{field} resolves to a single type, but the _all index does not correspond to a single index.
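To make the "one target per document" idea concrete, here is a minimal Python sketch. It is not the connector's implementation: `resolve_target`, `group_by_target`, and the multi-index check (rejecting `_all`, wildcards, and comma lists) are hypothetical names and rules chosen only for illustration.

```python
import re
from collections import defaultdict

# Matches simple {field} placeholders (no format suffix in this sketch).
PLACEHOLDER = re.compile(r"\{([^{}:]+)\}")


def resolve_target(resource, doc):
    """Fill every {field} placeholder with the value from this document."""
    target = PLACEHOLDER.sub(lambda m: str(doc[m.group(1)]), resource)
    index = target.split("/", 1)[0]
    # Hypothetical check: treat _all, wildcards and comma lists as multi-index.
    if index == "_all" or "*" in index or "," in index:
        raise ValueError(f"{target!r} does not name a single index")
    return target


def group_by_target(resource, docs):
    """Group documents by the single index/type each one resolves to."""
    groups = defaultdict(list)
    for doc in docs:
        groups[resolve_target(resource, doc)].append(doc)
    return dict(groups)


docs = [
    {"title": "a", "media_type": "book"},
    {"title": "b", "media_type": "music"},
    {"title": "c", "media_type": "book"},
]
groups = group_by_target("my-collection/{media_type}", docs)
```

Each document ends up under exactly one resolved target, while a resource such as `_all/{media_type}` is rejected in this sketch because its index part still names more than one index.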

es.resource.write = my-collection/{@timestamp:YYYY.MM.dd}

~ versus ~

es.resource.write = my-collection.{@timestamp:YYYY.MM.dd}/{media_type}

This is mostly to highlight that you can use the @timestamp field in a pattern and format it however you like. These patterns can appear in either the index path element or the type path element; it makes no difference. Multiple patterns can be used as well. They are resolved at runtime with data from the document, and the write works as long as the resolved resource points to a single index.
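As a rough illustration of how a formatted pattern could expand, the Python sketch below resolves both resources quoted above against one sample document. It is a sketch under stated assumptions, not the connector's code: the `resolve` and `to_strftime` helpers are hypothetical, only the `YYYY`, `MM`, and `dd` tokens are handled, and the `@timestamp` field is assumed to hold an ISO-8601 string.

```python
import re
from datetime import datetime

# Assumption: only these three format tokens are supported in this sketch.
TOKENS = {"YYYY": "%Y", "MM": "%m", "dd": "%d"}

# Matches {field} and {field:format}.
PLACEHOLDER = re.compile(r"\{([^{}:]+)(?::([^{}]+))?\}")


def to_strftime(fmt):
    """Translate the supported tokens into strftime directives."""
    for token, directive in TOKENS.items():
        fmt = fmt.replace(token, directive)
    return fmt


def resolve(resource, doc):
    """Expand every placeholder using values from the document."""
    def substitute(match):
        field, fmt = match.group(1), match.group(2)
        value = doc[field]
        if fmt is None:
            return str(value)
        # Assumption: date fields hold ISO-8601 strings.
        return datetime.fromisoformat(value).strftime(to_strftime(fmt))

    return PLACEHOLDER.sub(substitute, resource)


doc = {"@timestamp": "2017-05-16T15:49:46", "media_type": "video"}

# Date pattern in the type path element.
type_target = resolve("my-collection/{@timestamp:YYYY.MM.dd}", doc)

# Date pattern in the index path element, plain field pattern as the type.
index_target = resolve("my-collection.{@timestamp:YYYY.MM.dd}/{media_type}", doc)
```

With this sample document, the first resource resolves to `my-collection/2017.05.16` and the second to `my-collection.2017.05.16/video`, so both forms name a single index per document.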

Does that clear things up? I'll look into expanding the documentation around this to clarify the differences in "multiple indices" for each situation.

ov7a · 2017-05-16T17:12:04Z

Yes, it does clear the things up, thank you.

Is it OK to create a feature request for the ability to explicitly pass an index for a document through its metadata (alongside the id)?

jbaiera · 2017-05-16T17:18:10Z

@ov7a I think you can already do this by specifying a field pattern as the entire index path element, like {index}/type.

ov7a · 2017-05-16T17:25:31Z

What if the index name depends on other things (not only fields) that I do not want to store?
Or what if the index name is a complex function of multiple fields?

ov7a · 2017-05-16T17:31:13Z

E.g.
if (field1 == value1 || field2 == value2)
    index = index1
else
    index = index2

jbaiera · 2017-05-16T18:51:39Z

@ov7a Even if we added a configuration property that selects the entire index from a document field before writing, it would provide exactly the same functionality as the existing pattern use case I explained above. In the event that you need complex logic for selecting the index to send data to, I would suggest implementing that logic as part of your transformation steps before persisting to Elasticsearch. Finally, if you are concerned about adding unneeded fields to your index mappings, you can always mark those metafields to be excluded from the final document sent to Elasticsearch by using the es.mapping.exclude property. The fields will be available on the document for the purposes of filling in the index name, but will be omitted from the final rendered JSON data that is sent to Elasticsearch.
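The suggestion above could look roughly like the following Python sketch, which uses plain dictionaries in place of a real job. The property names `es.resource.write` and `es.mapping.exclude` come from this thread; the `target_index` field name, the `type` path element, and the `choose_index` and `with_target` helpers are hypothetical.

```python
def choose_index(doc):
    """Hypothetical rule mirroring the if/else from the question above."""
    if doc.get("field1") == "value1" or doc.get("field2") == "value2":
        return "index1"
    return "index2"


def with_target(doc, field="target_index"):
    """Transformation step: copy the document and add the computed index name."""
    enriched = dict(doc)
    enriched[field] = choose_index(doc)
    return enriched


# Settings a job might pass to the connector. "target_index" is the
# hypothetical helper field; it fills the index pattern and is then
# excluded from the JSON that is sent to Elasticsearch.
settings = {
    "es.resource.write": "{target_index}/type",
    "es.mapping.exclude": "target_index",
}

docs = [{"field1": "value1"}, {"field2": "other"}]
enriched = [with_target(d) for d in docs]
```

The index logic lives entirely in the transformation step, so it can be as complex as needed, and the helper field only exists to feed the pattern.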

ov7a · 2017-05-16T18:58:19Z

That was my thinking too, but es.mapping.exclude is ignored when es.input.json is specified :(

jbaiera · 2017-05-16T19:00:02Z

Yeah, es.input.json is meant to be a performance-boosting option that avoids the serialization overhead. Supporting es.mapping.exclude together with it would mean parsing the JSON fragments to remove the excluded fields, so we thought it best to leave it off.
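To show why exclusion and pre-serialized input pull in opposite directions, here is a small Python sketch of what removing a field from an already-serialized document involves. It is only an illustration, not the connector's code, and `strip_field` and `target_index` are hypothetical names.

```python
import json


def strip_field(raw_json, field):
    """Drop one top-level field from a JSON string.

    The string has to be parsed and serialized again, which is the
    work that passing ready-made JSON is meant to avoid.
    """
    parsed = json.loads(raw_json)
    parsed.pop(field, None)
    return json.dumps(parsed)


raw = '{"title": "a", "target_index": "index1"}'
stripped = strip_field(raw, "target_index")
```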

ov7a · 2017-05-16T19:06:30Z

So, if I want es.input.json and complex index logic I have two options:

Extra field stored
Opt out of json

The feature request I want to propose is to provide extra metadata for each document, so it would be possible to have both json input and complex logic.

jbaiera · 2017-05-16T19:14:11Z

It's fine with me if you open an enhancement ticket for that. Thanks!

jbaiera added the doc label May 16, 2017

ov7a mentioned this issue May 16, 2017

Ability to explicitly pass an index/type for a document through it metadata #991

Closed

jbaiera closed this as completed in 4d36e16 Jun 13, 2017

jbaiera added the v6.0.0 label Jun 13, 2017

jbaiera added the v6.0.0-beta1 label Jul 31, 2017
