es.read.field.exclude behaves strangely with arrays #590

Closed
jeffsteinmetz opened this issue Nov 4, 2015 · 4 comments

@jeffsteinmetz

Index a single document into spark/mappingtest, such as:

{ "foo" : [5,6], 
  "nested": { 
     "bar" : [  {"date":"2015-01-01", "scores":[1,2]},
                {"date":"2015-01-02", "scores":[3,4]} ], 
     "what": "now" 
   } 
}

Then run:

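    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.elasticsearch.spark.sql._  // adds esDF to SQLContext

    val conf = new SparkConf()
      .setAppName("test")
      .setMaster("local")
    conf.set("es.nodes", "localhost")
    // Exclude the nested.bar array from the fields read back
    conf.set("es.read.field.exclude", "nested.bar")

    val sc = new SparkContext(conf)
    val sql = new SQLContext(sc)
    val test = sql.esDF("spark/mappingtest")
    println(test.schema.treeString)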

This yields a schema with nested.bar still included, plus stray copies of its fields and of nested.what at the top level:

root
 |-- foo: long (nullable = true)
 |-- nested: struct (nullable = true)
 |    |-- bar: struct (nullable = true)
 |    |    |-- date: timestamp (nullable = true)
 |    |    |-- scores: long (nullable = true)
 |    |-- what: string (nullable = true)
 |-- date: timestamp (nullable = true)
 |-- scores: long (nullable = true)
 |-- what: string (nullable = true)

Expected:

root
 |-- foo: long (nullable = true)
 |-- nested: struct (nullable = true)
 |    |-- what: string (nullable = true)

Changing the exclude to conf.set("es.read.field.exclude", "nested") returns:

root
 |-- foo: long (nullable = true)
 |-- bar: struct (nullable = true)
 |    |-- date: timestamp (nullable = true)
 |    |-- scores: long (nullable = true)
 |-- date: timestamp (nullable = true)
 |-- scores: long (nullable = true)
 |-- what: string (nullable = true)

Expected:

root
 |-- foo: long (nullable = true)

conf.set("es.read.field.exclude", "bar*") returns the even stranger:

root
 |-- foo: long (nullable = true)
 |-- nested: struct (nullable = true)
 |    |-- bar: struct (nullable = true)
 |    |    |-- date: timestamp (nullable = true)
 |    |    |-- scores: long (nullable = true)
 |    |-- what: string (nullable = true)
 |-- bar: struct (nullable = true)
 |    |-- date: timestamp (nullable = true)
 |    |-- scores: long (nullable = true)
 |-- date: timestamp (nullable = true)
 |-- scores: long (nullable = true)
 |-- what: string (nullable = true)

conf.set("es.read.field.exclude", "nested*") does, however, return:

root
 |-- foo: long (nullable = true)

I also get:

java.lang.IllegalArgumentException: fields should have distinct names.
        at org.apache.spark.sql.types.DataTypes.createStructType(DataTypes.java:214)
        at org.elasticsearch.spark.sql.SchemaUtils$.convertToStruct(SchemaUtils.scala:105)
        at org.elasticsearch.spark.sql.SchemaUtils$.discoverMapping(SchemaUtils.scala:61)
        at org.elasticsearch.spark.sql.EsSparkSQL$.esDF(EsSparkSQL.scala:27)
        at org.elasticsearch.spark.sql.EsSparkSQL$.esDF(EsSparkSQL.scala:21)

This happens when an exclude is set and the JSON document contains arrays with matching key names in different structs.
It is likely because the exclude leaves some fields moved to different tree levels, as seen above, so names that were distinct within their own structs end up colliding.
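
For reference, the expected schemas above amount to matching each exclude pattern against a field's full dotted path and dropping the matching subtree, while every surviving field stays at its original level. A minimal sketch of that behavior on a toy mapping tree follows. It is an illustration only, not elasticsearch-hadoop's implementation: `Field`, `Leaf`, `Struct` and `ExcludeSketch` are hypothetical names, and the wildcard handling is an assumption that treats `*` as "any run of characters".

```scala
// Toy model of a mapping: a field is either a leaf or a struct of named children.
// These types are hypothetical and are not part of elasticsearch-hadoop.
sealed trait Field { def name: String }
final case class Leaf(name: String, tpe: String) extends Field
final case class Struct(name: String, children: List[Field]) extends Field

object ExcludeSketch {
  // Assumption: '*' matches any run of characters; everything else is literal.
  private def toRegex(pattern: String): scala.util.matching.Regex =
    pattern.split("\\*", -1).map(java.util.regex.Pattern.quote).mkString(".*").r

  // A path is excluded when any pattern matches the whole dotted path.
  def isExcluded(path: String, patterns: Seq[String]): Boolean =
    patterns.exists(p => toRegex(p).pattern.matcher(path).matches())

  // Drop every field whose full path matches; survivors keep their tree level.
  def exclude(fields: List[Field], patterns: Seq[String], prefix: String = ""): List[Field] =
    fields.flatMap { f =>
      val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
      if (isExcluded(path, patterns)) Nil
      else f match {
        case Struct(n, cs) => List(Struct(n, exclude(cs, patterns, path)))
        case leaf          => List(leaf)
      }
    }

  // Flatten a tree into dotted paths, for easy comparison.
  def paths(fields: List[Field], prefix: String = ""): List[String] =
    fields.flatMap { f =>
      val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
      f match {
        case Struct(_, cs) => path :: paths(cs, path)
        case _             => List(path)
      }
    }

  // The mapping of the document from this report.
  val mapping: List[Field] = List(
    Leaf("foo", "long"),
    Struct("nested", List(
      Struct("bar", List(Leaf("date", "timestamp"), Leaf("scores", "long"))),
      Leaf("what", "string"))))
}
```

With this sketch, `ExcludeSketch.paths(ExcludeSketch.exclude(ExcludeSketch.mapping, Seq("nested.bar")))` gives `List("foo", "nested", "nested.what")`, and excluding `"nested"` or `"nested*"` gives `List("foo")`. No field changes level, so sibling names cannot collide.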

@costin
Member

costin commented Jan 8, 2016

Hi,

Found the problem and fixed it (at least the filter parsing). However, more work is needed for Spark SQL, since the exclusion (applied to the mapping) is not handled when reading the _source itself. See #648 - will update after a bit more digging.
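
To picture the remaining gap: if the exclude is applied only to the mapping, the rows built from `_source` can still carry the excluded fields and no longer line up with the schema. One way to keep them consistent is to run the same patterns over the document. The sketch below is a hypothetical illustration, not the connector's code: `SourceFilterSketch` and `filterSource` are made-up names, the document is modeled as a plain `Map[String, Any]`, and `*` is assumed to match any run of characters.

```scala
object SourceFilterSketch {
  // Assumption: '*' matches any run of characters; everything else is literal.
  private def toRegex(pattern: String): scala.util.matching.Regex =
    pattern.split("\\*", -1).map(java.util.regex.Pattern.quote).mkString(".*").r

  private def isExcluded(path: String, patterns: Seq[String]): Boolean =
    patterns.exists(p => toRegex(p).pattern.matcher(path).matches())

  // Remove excluded paths from a document, descending into objects and
  // into arrays of objects (array elements share the path of their field).
  def filterSource(doc: Map[String, Any], patterns: Seq[String],
                   prefix: String = ""): Map[String, Any] = {
    val kept: List[(String, Any)] = doc.toList.flatMap { case (key, value) =>
      val path = if (prefix.isEmpty) key else s"$prefix.$key"
      if (isExcluded(path, patterns)) Nil
      else {
        val filtered: Any = value match {
          case m: Map[_, _] =>
            filterSource(m.asInstanceOf[Map[String, Any]], patterns, path)
          case xs: Seq[_] =>
            xs.map {
              case m: Map[_, _] =>
                filterSource(m.asInstanceOf[Map[String, Any]], patterns, path)
              case other => other
            }
          case other => other
        }
        List(key -> filtered)
      }
    }
    kept.toMap
  }

  // The document from this report.
  val doc: Map[String, Any] = Map(
    "foo" -> List(5, 6),
    "nested" -> Map(
      "bar" -> List(
        Map("date" -> "2015-01-01", "scores" -> List(1, 2)),
        Map("date" -> "2015-01-02", "scores" -> List(3, 4))),
      "what" -> "now"))
}
```

Here `SourceFilterSketch.filterSource(SourceFilterSketch.doc, Seq("nested.bar"))` leaves `foo` and `nested.what` only, which matches the schema expected for that exclude.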

costin added v2.2.0 and removed v2.2.0-rc1 labels Jan 8, 2016
@costin
Member

costin commented Jan 27, 2016

As there hasn't been any update, closing the issue.
@jeffsteinmetz Let me know if the latest dev version doesn't work for you. Cheers!

costin closed this as completed Jan 27, 2016
@jeffsteinmetz
Author

Confirmed, all of the tests above worked as expected. Thank you.

@costin
Member

costin commented Jan 29, 2016

Thanks for confirming. Cheers!
