Commit
[DOC] add section on array include/exclude
relates #482 #484
costin committed Jan 7, 2016
1 parent 2e992b9 commit 74a687f
Showing 3 changed files with 39 additions and 7 deletions.
36 changes: 30 additions & 6 deletions docs/src/reference/asciidoc/core/configuration.adoc
@@ -179,12 +179,6 @@ The document field/property name containing the document time-to-live. To specif
`es.mapping.timestamp` (default none)::
The document field/property name containing the document timestamp. To specify a constant, use the `<CONSTANT>` format.

added[2.1]
`es.mapping.date.rich` (default true)::
Whether to create a _rich_ +Date+ like object for +Date+ fields in {es} or return them as primitives (+String+ or +long+). By default this is
true. The actual object type is based on the library used; a notable exception is Map/Reduce, which provides no built-in +Date+ object and as such
returns +LongWritable+ and +Text+ regardless of this setting.

added[2.1]
`es.mapping.include` (default none)::
Field/property to be included in the document sent to {es}. Useful for _extracting_ the needed data from entities. The syntax is similar
@@ -218,6 +212,36 @@ in the document with the exception of any nested field named +description+. Addi
document id extracted from field +uuid+.
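
A minimal sketch of such a configuration (the field names +uuid+ and +description+ mirror the example described above; the exact patterns are illustrative):

[source,ini]
----
# send everything except nested description fields
es.mapping.include = *
es.mapping.exclude = *.description
# extract the document id from the uuid field
es.mapping.id = uuid
----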


[float]
[[cfg-field-info]]
==== Field information (when reading from {es})

added[2.1]
`es.mapping.date.rich` (default true)::
Whether to create a _rich_ +Date+ like object for +Date+ fields in {es} or return them as primitives (+String+ or +long+). By default this is
true. The actual object type is based on the library used; a notable exception is Map/Reduce, which provides no built-in +Date+ object and as such
returns +LongWritable+ and +Text+ regardless of this setting.
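
For example, to have dates returned as primitives instead (useful when the consuming library does its own date parsing):

[source,ini]
----
# disable rich Date objects; dates come back as String or long
es.mapping.date.rich = false
----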

added[2.2]
`es.field.read.as.array.include` (default empty)::
Fields/properties that should be considered as arrays/lists. Since {es} can map one or multiple values to a field, {eh} cannot determine from the mapping
whether to instantiate a single value or an array type (depending on the library used). When encountering multiple values, {eh} automatically uses
the array/list type, but in strict mapping scenarios (like Spark SQL) this might lead to an unexpected schema change.
The syntax is similar to that of {es} {ref}/search-request-source-filtering.html[include/exclude].
Multiple values can be specified by using a comma. By default, no value is specified, meaning no properties/fields are included.

For example:
[source,ini]
----
# mapping nested.bar as an array
es.field.read.as.array.include = nested.bar
----

`es.field.read.as.array.exclude` (default empty)::
Fields/properties that should *not* be considered as arrays/lists. Similar to `es.field.read.as.array.include` above. Multiple values can be specified by using a comma.
By default, no value is specified, meaning no properties/fields are excluded (and since no fields are included as indicated above, no field is treated as an array
beforehand).
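
For example (a sketch; the field names are illustrative and the wildcard follows the source-filtering syntax referenced above):

[source,ini]
----
# treat every field under nested as an array, except nested.foo
es.field.read.as.array.include = nested.*
es.field.read.as.array.exclude = nested.foo
----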

[float]
==== Metadata (when reading from {es})

8 changes: 7 additions & 1 deletion docs/src/reference/asciidoc/core/mapping.adoc
@@ -26,7 +26,13 @@ When it comes to handling dates, {es} always uses the http://en.wikipedia.org/wi

It is important to note that JSON objects (delimited by `{}` and typically associated with maps) are *unordered*, in other words, they do *not* maintain order. JSON
arrays (typically associated with lists or sequences) are *ordered*, that is, they *do* preserve the initial insertion order. This impacts the way objects are read from {es} as one might find the insertion structure to be different than the extraction one.
It is however easy to circumvent this problem - as JSON objects (maps) contain fields, use the field names (or keys) instead of their position inside the document to reliably get their values (in Java terms, think of a JSON object as a +HashMap+ as opposed to a +LinkedHashMap+).
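
To illustrate, a small self-contained Java sketch (not tied to {eh} itself): values are looked up by key, so the map's iteration order is irrelevant:

[source,java]
----
import java.util.HashMap;
import java.util.Map;

public class JsonByKey {
    public static void main(String[] args) {
        // a JSON object read back as a Map - field order is NOT guaranteed
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "Smith");
        doc.put("age", 31);

        // rely on the key, not the position, to reliably extract a value
        System.out.println(doc.get("name")); // prints Smith
    }
}
----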

[float]
[[mapping-multi-values]]
=== Handling array/multi-values fields

{es} treats fields with single or multiple values the same; in fact, the mapping provides no information about this. As a client, this means one cannot tell whether a field is single-valued or not until it is actually read. In most cases this is not an issue and {eh} automatically creates the necessary list/array on the fly. However, in environments with a strict schema such as Spark SQL, changing a field's actual value from its declared type is not allowed; worse yet, this information needs to be available even before reading the data. Since the mapping is not conclusive enough, {eh} allows the user to specify the extra information through <<cfg-field-info, field information>>, specifically +es.field.read.as.array.include+ and +es.field.read.as.array.exclude+.
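
As a sketch, assuming a multi-valued field named +participants+ (the name is illustrative), the array hint can be declared up front so the Spark SQL schema remains stable even for documents carrying a single value:

[source,ini]
----
# declare participants as an array before any data is read
es.field.read.as.array.include = participants
----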

[float]
=== Automatic mapping
2 changes: 2 additions & 0 deletions docs/src/reference/asciidoc/core/spark.adoc
@@ -540,6 +540,7 @@ JavaRDD<Map<String, Object>> filtered = esRDD.filter(doc ->
[float]
==== Type conversion

IMPORTANT: When dealing with multi-value/array fields, please see <<mapping-multi-values, this>> section and in particular <<cfg-field-info, these>> configuration options.

IMPORTANT: If automatic index creation is used, please review <<auto-mapping-type-loss,this>> section for more information.

{eh} automatically converts Spark built-in types to {es} {ref}/mapping-types.html[types] (and back) as shown in the table below:
@@ -1173,6 +1174,7 @@ JavaSchemaRDD people = JavaEsSparkSQL.esRDD(jsql, "spark/people", "?q=Smith" <1>
[float]
==== Spark SQL Type conversion

IMPORTANT: When dealing with multi-value/array fields, please see <<mapping-multi-values, this>> section and in particular <<cfg-field-info, these>> configuration options.

IMPORTANT: If automatic index creation is used, please review <<auto-mapping-type-loss,this>> section for more information.

{eh} automatically converts Spark built-in types to {es} {ref}/mapping-types.html[types] (and back) as shown in the table below:
