
es.field.read.as.array.include only works one level deep #589

Closed
jeffsteinmetz opened this issue Nov 3, 2015 · 9 comments
@jeffsteinmetz

I just ran into a real-world situation where an array of objects, each containing its own array, is nested inside another object.

val sc = new SparkContext(conf)
val cfg = collection.mutable.Map("es.field.read.as.array.include" -> "nested.bar,foo,nested.bar.scores")
val json = """{"foo" : [5,6], "nested": { "bar" : [{"date":"2015-01-01", "scores":[1,2]},{"date":"2015-01-01", "scores":[3,4]}], "what": "now" } }"""
sc.makeRDD(Seq(json)).saveJsonToEs("spark/mappingtest")
val df = new SQLContext(sc).read
      .options(cfg)
      .format("org.elasticsearch.spark.sql")
      .load("spark/mappingtest")
println(df.collect().toList)

Adding nested.bar.scores to es.field.read.as.array.include does not seem to give ES-Hadoop the hint that there is an array at this level.

throws:
EsHadoopIllegalStateException: Field 'nested.bar.scores' not found; typically this occurs with arrays which are not mapped as single value

@costin
Member

costin commented Jan 8, 2016

@jeffsteinmetz Hi,
I've pushed a fix to master that addresses this (the snapshot is currently building and should be in Maven shortly); please try it out.

@jeffsteinmetz
Author

Great, I'll take a look.

@jeffsteinmetz
Author

Using the latest snapshot build, I am still getting:
org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'nested.bar.scores' not found; typically this occurs with arrays which are not mapped as single value

@costin
Member

costin commented Jan 29, 2016

I've incorporated your example into the integration tests and it passes just fine.
Can you please take a look and let me know what the difference is?

@costin costin reopened this Jan 29, 2016
@costin
Member

costin commented Jan 29, 2016

Actually, looking at the test above, the difference is in the option name - I changed it from es.field.read.as.array.include to es.read.field.as.array.include to make it more consistent.
Updating the option strings or using ConfigurationOptions should fix the problem.
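For reference, a minimal sketch of the renamed option applied to the index from the original report (a config fragment, not a standalone program - the SparkContext `sc` and the `spark/mappingtest` index are assumed to exist):

```scala
// The option key was renamed for consistency:
//   old: es.field.read.as.array.include
//   new: es.read.field.as.array.include
val cfg = Map("es.read.field.as.array.include" -> "foo,nested.bar,nested.bar.scores")

val df = new SQLContext(sc).read
  .options(cfg)
  .format("org.elasticsearch.spark.sql")
  .load("spark/mappingtest")
```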

@jeffsteinmetz
Author

Using ES_READ_FIELD_AS_ARRAY_INCLUDE fixed the issue! Thanks man.

@costin
Member

costin commented Jan 29, 2016

@jeffsteinmetz Cheers. Thanks for the feedback. And your patience. And for continuing to report issues on this project - much appreciated!

@fpopic

fpopic commented Jul 29, 2016

Sorry if this is not the right place to ask:

Is it somehow possible to store a Scala Map[K, V] as a nested object in Elasticsearch?

When reading it back, I get:

Exception in thread "main" org.apache.spark.sql.AnalysisException:
cannot resolve 'cast(Extras as map<string,bigint>)' due to data type mismatch:
cannot cast
StructType(StructField(From_Stall_Id,LongType,true), StructField(HoldingPen_Id,LongType,true), StructField(To_Stall_Id,LongType,true))
to MapType(StringType, LongType, true);

Schema before writing:

root
 |-- Events_Id: long (nullable = true)
 |-- Extras: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

Schema after reading:

root
 |-- Events_Id: long (nullable = true)
 |-- Extras: struct (nullable = true)
 |    |-- HoldingPen_Id: string (nullable = true)
 |    |-- To_Stall_Id: string (nullable = true)
 |    |-- From_Stall_Id: string (nullable = true)

I use a UDF with withColumn to parse it correctly, but is it possible to avoid converting the StructType back to a MapType manually?

    val extrasUdf = sparkSQL.udf.register("es_extras", (from: Any, to: Any, pen: Any) => {

      val extras = scala.collection.mutable.Map.empty[String, Long]

      // Only keep the fields that actually came back as Longs (nulls are skipped)
      from match { case from: Long => extras += ("From_Stall_Id" -> from); case _ => }
      to match { case to: Long => extras += ("To_Stall_Id" -> to); case _ => }
      pen match { case pen: Long => extras += ("HoldingPen_Id" -> pen); case _ => }

      extras.toMap
    })
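The struct-to-map conversion the UDF performs can be sketched in plain Scala, assuming the three nullable Long fields from the schema above (`structToMap` is a hypothetical helper name, not part of the original code):

```scala
// Convert three optional Long fields back into a Map[String, Long],
// dropping any field that is absent (null in the DataFrame).
def structToMap(from: Option[Long], to: Option[Long], pen: Option[Long]): Map[String, Long] =
  Seq("From_Stall_Id" -> from, "To_Stall_Id" -> to, "HoldingPen_Id" -> pen)
    .collect { case (key, Some(value)) => key -> value }
    .toMap
```

This keeps the null-skipping behavior of the UDF without mutable state.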

@hemangakbari

@fpopic set the following while reading:
reader.option("es.read.metadata", "true")
reader.option("es.read.metadata.field", "Extras")
