es.read.field.exclude behaves strangely with arrays #590

jeffsteinmetz · 2015-11-04T04:47:11Z

Index a single document into spark/mappingtest, such as:

{ "foo" : [5,6], 
  "nested": { 
     "bar" : [  {"date":"2015-01-01", "scores":[1,2]},
                {"date":"2015-01-02", "scores":[3,4]} ], 
     "what": "now" 
   } 
}

then run

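    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.elasticsearch.spark.sql._  // adds esDF to SQLContext

    val conf = new SparkConf()
      .setAppName("test")
      .setMaster("local")
    conf.set("es.nodes", "localhost")
    // exclude the nested array of objects from the read
    conf.set("es.read.field.exclude", "nested.bar")

    val sc = new SparkContext(conf)
    val sql = new SQLContext(sc)
    val test = sql.esDF("spark/mappingtest")
    println(test.schema.treeString)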

This yields a schema that still includes nested.bar, and that also repeats its child fields and "what" at the root level:

root
 |-- foo: long (nullable = true)
 |-- nested: struct (nullable = true)
 |    |-- bar: struct (nullable = true)
 |    |    |-- date: timestamp (nullable = true)
 |    |    |-- scores: long (nullable = true)
 |    |-- what: string (nullable = true)
 |-- date: timestamp (nullable = true)
 |-- scores: long (nullable = true)
 |-- what: string (nullable = true)

expected

root
 |-- foo: long (nullable = true)
 |-- nested: struct (nullable = true)
 |    |-- what: string (nullable = true)

Changing the exclude to conf.set("es.read.field.exclude", "nested") returns:

root
 |-- foo: long (nullable = true)
 |-- bar: struct (nullable = true)
 |    |-- date: timestamp (nullable = true)
 |    |-- scores: long (nullable = true)
 |-- date: timestamp (nullable = true)
 |-- scores: long (nullable = true)
 |-- what: string (nullable = true)

expected

root
 |-- foo: long (nullable = true)

conf.set("es.read.field.exclude", "bar*") returns an even stranger schema:

root
 |-- foo: long (nullable = true)
 |-- nested: struct (nullable = true)
 |    |-- bar: struct (nullable = true)
 |    |    |-- date: timestamp (nullable = true)
 |    |    |-- scores: long (nullable = true)
 |    |-- what: string (nullable = true)
 |-- bar: struct (nullable = true)
 |    |-- date: timestamp (nullable = true)
 |    |-- scores: long (nullable = true)
 |-- date: timestamp (nullable = true)
 |-- scores: long (nullable = true)
 |-- what: string (nullable = true)

conf.set("es.read.field.exclude", "nested*") does, however, return:

root
 |-- foo: long (nullable = true)

I also get:

java.lang.IllegalArgumentException: fields should have distinct names.
        at org.apache.spark.sql.types.DataTypes.createStructType(DataTypes.java:214)
        at org.elasticsearch.spark.sql.SchemaUtils$.convertToStruct(SchemaUtils.scala:105)
        at org.elasticsearch.spark.sql.SchemaUtils$.discoverMapping(SchemaUtils.scala:61)
        at org.elasticsearch.spark.sql.EsSparkSQL$.esDF(EsSparkSQL.scala:27)
        at org.elasticsearch.spark.sql.EsSparkSQL$.esDF(EsSparkSQL.scala:21)

This happens when an exclude is set and the JSON document contains arrays with matching key names in different structs, likely because the exclude moves some fields to a different tree level, as seen above.
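To make the expected behavior concrete, here is a minimal sketch in plain Scala with no Spark or connector dependency. The `Field`, `Leaf`, `Struct` and `ExcludeSketch` names are hypothetical and are not the connector's classes. It also assumes that a pattern is matched against the full dotted path and that a trailing `*` matches any suffix; the real matching rules may differ.

```scala
// Hypothetical model of a mapping; these are not the connector's own classes.
sealed trait Field { def name: String }
final case class Leaf(name: String, tpe: String) extends Field
final case class Struct(name: String, children: List[Field]) extends Field

object ExcludeSketch {
  // Assumption: a pattern matches the full dotted path, and a trailing '*'
  // matches any suffix. The real matching rules may differ.
  def matches(pattern: String, path: String): Boolean =
    if (pattern.endsWith("*")) path.startsWith(pattern.dropRight(1))
    else pattern == path

  // Drop every field whose path matches an exclude pattern, together with
  // its whole subtree, so that no child can end up at another tree level.
  def exclude(fields: List[Field], patterns: List[String], prefix: String = ""): List[Field] =
    fields.flatMap { f =>
      val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
      if (patterns.exists(matches(_, path))) Nil
      else f match {
        case Struct(n, kids) => List(Struct(n, exclude(kids, patterns, path)))
        case leaf: Leaf      => List(leaf)
      }
    }

  // Sibling names that occur more than once. A non-empty result is the
  // condition the "fields should have distinct names" error complains about.
  def duplicateNames(fields: List[Field]): Set[String] =
    fields.groupBy(_.name).collect { case (n, fs) if fs.size > 1 => n }.toSet

  // The mapping of the sample document, with the types shown in the schemas above.
  val mapping: List[Field] = List(
    Leaf("foo", "long"),
    Struct("nested", List(
      Struct("bar", List(Leaf("date", "timestamp"), Leaf("scores", "long"))),
      Leaf("what", "string"))))
}
```

With this model, `ExcludeSketch.exclude(ExcludeSketch.mapping, List("nested.bar"))` keeps only foo and nested.what, and excluding "nested" or "nested*" keeps only foo, which are the expected schemas listed above. `duplicateNames` flags the case where leaked fields collide with siblings of the same name.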


costin · 2016-01-08T02:01:49Z

Hi,

Found the problem and fixed it (at least the filter parsing). However, more work is needed for Spark SQL, since the exclusion (in the mapping) is not applied when reading the _source itself. See #648 - will update after a bit more digging.

costin · 2016-01-27T20:01:43Z

As there hasn't been any update, closing the issue.
@jeffsteinmetz Let me know if the latest dev version doesn't work for you. Cheers!

jeffsteinmetz · 2016-01-29T17:36:57Z

Confirmed, all of the tests above worked as expected. Thank you.

costin · 2016-01-29T17:46:05Z

Thanks for confirming. Cheers!

costin added :Spark v2.2.0-rc1 v2.1.3 bug labels Nov 15, 2015

costin mentioned this issue Jan 7, 2016: Multiple errors using Spark SQL with elasticsearch-spark_2.11 with version 2.2.0-m1 #644 (Closed)

costin added a commit that referenced this issue Jan 8, 2016: Fix bug in filtering field information (a70a9ea), relates #590

costin added a commit that referenced this issue Jan 8, 2016: Fix bug in filtering field information (f65b70f), relates #590

costin added v2.2.0 and removed v2.2.0-rc1 labels Jan 8, 2016

costin closed this as completed Jan 27, 2016
