Commit
[DOC] add section on array include/exclude
relates #482 #484
costin committed Jan 7, 2016
1 parent 2e992b9 commit 74a687f
Showing 3 changed files with 39 additions and 7 deletions.
36 changes: 30 additions & 6 deletions docs/src/reference/asciidoc/core/configuration.adoc
@@ -179,12 +179,6 @@ The document field/property name containing the document time-to-live. To specif
`es.mapping.timestamp` (default none)::
The document field/property name containing the document timestamp. To specify a constant, use the `<CONSTANT>` format.

added[2.1]
`es.mapping.date.rich` (default true)::
Whether to create a _rich_ +Date+ like object for +Date+ fields in {es} or return them as primitives (+String+ or +long+). By default this is
true. The actual object type is based on the library used; a notable exception is Map/Reduce, which provides no built-in +Date+ object and as such
returns +LongWritable+ and +Text+ regardless of this setting.

added[2.1]
`es.mapping.include` (default none)::
Field/property to be included in the document sent to {es}. Useful for _extracting_ the needed data from entities. The syntax is similar
@@ -218,6 +212,36 @@ in the document with the exception of any nested field named +description+. Addi
document id extracted from field +uuid+.
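
A minimal sketch of such a configuration (the field names +uuid+ and +description+ mirror the example described above; the exact patterns are illustrative):

[source,ini]
----
# send everything except nested description fields
es.mapping.include = *
es.mapping.exclude = *.description
# extract the document id from the uuid field
es.mapping.id = uuid
----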


[float]
[[cfg-field-info]]
==== Field information (when reading from {es})

added[2.1]
`es.mapping.date.rich` (default true)::
Whether to create a _rich_ +Date+ like object for +Date+ fields in {es} or return them as primitives (+String+ or +long+). By default this is
true. The actual object type is based on the library used; a notable exception is Map/Reduce, which provides no built-in +Date+ object and as such
returns +LongWritable+ and +Text+ regardless of this setting.
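
For example, to have dates returned as primitives instead (useful when the consuming library does its own date parsing):

[source,ini]
----
# disable rich Date objects; dates come back as String or long
es.mapping.date.rich = false
----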

added[2.2]
`es.field.read.as.array.include` (default empty)::
Fields/properties that should be considered as arrays/lists. Since {es} can map one or multiple values to a field, {eh} cannot determine from the mapping
whether to instantiate a single value or an array type (depending on the library used). When encountering multiple values, {eh} automatically uses
the array/list type, but in strict mapping scenarios (like Spark SQL) this might lead to an unexpected schema change.
The syntax is similar to that of {es} {ref}/search-request-source-filtering.html[include/exclude].
Multiple values can be specified by using a comma. By default, no value is specified, meaning no properties/fields are included.

For example:
[source,ini]
----
# mapping nested.bar as an array
es.field.read.as.array.include = nested.bar
----

`es.field.read.as.array.exclude` (default empty)::
Fields/properties that should *not* be considered as arrays/lists. Similar to `es.field.read.as.array.include` above. Multiple values can be specified by using a comma.
By default, no value is specified, meaning no properties/fields are excluded (and since no fields are included as indicated above, no field is treated as an array
beforehand).
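
For example (a sketch; the field names are illustrative and the wildcard follows the source-filtering syntax referenced above):

[source,ini]
----
# treat every field under nested as an array, except nested.foo
es.field.read.as.array.include = nested.*
es.field.read.as.array.exclude = nested.foo
----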

[float]
==== Metadata (when reading from {es})

8 changes: 7 additions & 1 deletion docs/src/reference/asciidoc/core/mapping.adoc
@@ -26,7 +26,13 @@ When it comes to handling dates, {es} always uses the http://en.wikipedia.org/wi

It is important to note that JSON objects (delimited by `{}` and typically associated with maps) are *unordered*, in other words, they do *not* maintain order. JSON
arrays (typically associated with lists or sequences) are *ordered*, that is, they *do* preserve the initial insertion order. This impacts the way objects are read from {es} as one might find the insertion structure to be different than the extraction one.
It is however easy to circumvent this problem - as JSON objects (maps) contain fields, use the field names (or keys) instead of their position inside the document to reliably get their values (in Java terms, think of a JSON object as a +HashMap+ as opposed to a +LinkedHashMap+).
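
To illustrate, a small self-contained Java sketch (not tied to {eh} itself): values are looked up by key, so the map's iteration order is irrelevant:

[source,java]
----
import java.util.HashMap;
import java.util.Map;

public class JsonByKey {
    public static void main(String[] args) {
        // a JSON object read back as a Map - field order is NOT guaranteed
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "Smith");
        doc.put("age", 31);

        // rely on the key, not the position, to reliably extract a value
        System.out.println(doc.get("name")); // prints Smith
    }
}
----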

[float]
[[mapping-multi-values]]
=== Handling array/multi-values fields

{es} treats fields with single or multiple values the same; in fact, the mapping provides no information about this. As a client, this means one cannot tell whether a field is single-valued or not until it is actually read. In most cases this is not an issue and {eh} automatically creates the necessary list/array on the fly. However, in environments with a strict schema such as Spark SQL, changing a field's actual value from its declared type is not allowed; worse yet, this information needs to be available even before reading the data. Since the mapping is not conclusive enough, {eh} allows the user to specify the extra information through <<cfg-field-info, field information>>, specifically +es.field.read.as.array.include+ and +es.field.read.as.array.exclude+.
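
As a sketch, assuming a multi-valued field named +participants+ (the name is illustrative), the array hint can be declared up front so the Spark SQL schema remains stable even for documents carrying a single value:

[source,ini]
----
# declare participants as an array before any data is read
es.field.read.as.array.include = participants
----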

[float]
=== Automatic mapping
2 changes: 2 additions & 0 deletions docs/src/reference/asciidoc/core/spark.adoc
@@ -540,6 +540,7 @@ JavaRDD<Map<String, Object>> filtered = esRDD.filter(doc ->
[float]
==== Type conversion

IMPORTANT: When dealing with multi-value/array fields, please see <<mapping-multi-values, this>> section and in particular <<cfg-field-info, these>> configuration options.

IMPORTANT: If automatic index creation is used, please review <<auto-mapping-type-loss,this>> section for more information.

{eh} automatically converts Spark built-in types to {es} {ref}/mapping-types.html[types] (and back) as shown in the table below:
@@ -1173,6 +1174,7 @@ JavaSchemaRDD people = JavaEsSparkSQL.esRDD(jsql, "spark/people", "?q=Smith" <1>
[float]
==== Spark SQL Type conversion

IMPORTANT: When dealing with multi-value/array fields, please see <<mapping-multi-values, this>> section and in particular <<cfg-field-info, these>> configuration options.

IMPORTANT: If automatic index creation is used, please review <<auto-mapping-type-loss,this>> section for more information.

{eh} automatically converts Spark built-in types to {es} {ref}/mapping-types.html[types] (and back) as shown in the table below:
