[x] Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
The easier it is to track down the bug, the faster it is solved.
[ ] Feature Request. Start by telling us what problem you’re trying to solve.
Often a solution already exists! Don’t send pull requests to implement new features without
first getting our support. Sometimes we leave features out on purpose to keep the project small.
Issue description
Define some boolean fields in ES (using a mapping), index 0/1 values for those fields, and try to read them from Spark. This throws an exception: only the literal false/true values are supported.
I understand that custom dates don't work either (see issue #624). But for booleans, it seems pretty easy to support the full set of boolean values specified here: https://www.elastic.co/guide/en/elasticsearch/reference/current/boolean.html. That is:
False values: false, "false", "off", "no", "0", "" (empty string), 0, 0.0
True values: Anything that isn’t false.
Using the correct java.lang.Boolean value should be easy.
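The coercion table above could be mirrored with a small helper on the reader side; here is a minimal sketch (the class and method names are mine, not part of any library):

```java
// Sketch of a lenient boolean parser following the Elasticsearch coercion
// rules quoted above: false, "false", "off", "no", "0", "" (empty string),
// 0 and 0.0 are false; anything else is true. Names are illustrative only.
public final class EsBooleans {
    public static boolean parseEsBoolean(Object value) {
        if (value == null) {
            return false;
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        if (value instanceof Number) {
            // 0 and 0.0 are false; any other number is true
            return ((Number) value).doubleValue() != 0.0d;
        }
        String s = value.toString().trim();
        switch (s.toLowerCase()) {
            case "":
            case "false":
            case "off":
            case "no":
            case "0":
                return false;
            default:
                // "True values: anything that isn't false"
                return true;
        }
    }
}
```

With this, "0"/"1" as well as numeric 0/1 would round-trip to the expected java.lang.Boolean instead of failing the scroll.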
Possible workarounds:
reindex your ES dataset with false/true values, but this can be annoying in some cases
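Another stopgap, assuming the es.read.field.exclude option documented for recent es-hadoop releases is available, is to skip the problematic field on read (losing that column, of course):

```java
// Stopgap sketch: exclude the 0/1 boolean field from reads so the scroll
// does not fail while parsing it. Assumes es-hadoop's es.read.field.exclude
// option; "boolField" matches the field used in the reproduction below.
sparkConf.set("spark.es.read.field.exclude", "boolField");
```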
I think this makes sense, but if I am wrong, please tell me, no problem.
Steps to reproduce
Code:
package bug.reproduce;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;
/**
* Reproduce bug where reading 0/1 boolean values from ES is not working.
* Prepare the ES index with test data using the following curl or Sense commands:
DELETE /spark
POST /spark
{
"mappings": {
"data": {
"dynamic_templates": [
{
"startingWithBool": {
"match": "bool*",
"match_mapping_type": "string",
"mapping": {
"type": "boolean"
}
}
}
]
}
}
}
GET /spark/_mapping
POST /spark/data/1
{
"id": "1",
"boolField" : "false"
}
POST /spark/data/2
{
"id": "2",
"boolField" : "true"
}
POST /spark/data/3
{
"id": "3",
"boolField" : "0"
}
POST /spark/data/4
{
"id": "4",
"boolField" : "1"
}
GET /spark/data/1
GET /spark/data/2
GET /spark/data/3
GET /spark/data/4
*/
public class SparkEsBooleanBug {
public static void main(String[] args)
{
/**
* Setup Spark and ES connection
*/
SparkConf sparkConf = new SparkConf().setAppName("Spark-ES Reproduce Bug").setMaster("local");
// Point to ES local instance
sparkConf.set("spark.es.nodes", "localhost");
sparkConf.set("spark.es.port", "9200");
JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
SQLContext sqlContext = new SQLContext(javaSparkContext);
// Setup table wrapping ES index
DataFrame esDataFrame = JavaEsSparkSQL.esDF(sqlContext, "spark/data");
esDataFrame.registerTempTable("TEST_TABLE");
/**
* Read documents with false/true values-> no problem
*/
System.out.println("################# Reading documents with false/true values, this should be ok...");
DataFrame resultDataFrame = sqlContext.sql("SELECT * FROM TEST_TABLE WHERE id <= '2'");
resultDataFrame.show();
/**
* Read documents with 0/1 values-> exception:
* org.elasticsearch.hadoop.rest.EsHadoopParsingException: Cannot parse value [1] for field [boolField]
*/
System.out.println("################# Reading documents with 0/1 values, this raises an exception...");
resultDataFrame = sqlContext.sql("SELECT * FROM TEST_TABLE WHERE id >= '3'");
resultDataFrame.show();
}
}
Stack trace:
org.elasticsearch.hadoop.rest.EsHadoopParsingException: Cannot parse value [1] for field [boolField]
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:713)
at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:806)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:704)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:458)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:383)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:278)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:251)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:456)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: For input string: "1"
at scala.collection.immutable.StringLike$class.parseBoolean(StringLike.scala:238)
at scala.collection.immutable.StringLike$class.toBoolean(StringLike.scala:226)
at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:31)
at org.elasticsearch.spark.serialization.ScalaValueReader.parseBoolean(ScalaValueReader.scala:112)
at org.elasticsearch.spark.serialization.ScalaValueReader$$anonfun$booleanValue$1.apply(ScalaValueReader.scala:111)
at org.elasticsearch.spark.serialization.ScalaValueReader$$anonfun$booleanValue$1.apply(ScalaValueReader.scala:111)
at org.elasticsearch.spark.serialization.ScalaValueReader.checkNull(ScalaValueReader.scala:81)
at org.elasticsearch.spark.serialization.ScalaValueReader.booleanValue(ScalaValueReader.scala:111)
at org.elasticsearch.spark.serialization.ScalaValueReader.readValue(ScalaValueReader.scala:67)
at org.elasticsearch.spark.sql.ScalaRowValueReader.readValue(ScalaEsRowValueReader.scala:28)
at org.elasticsearch.hadoop.serialization.ScrollReader.parseValue(ScrollReader.java:726)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:711)
... 34 more
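For context, the `Caused by` above comes from Scala's strict `StringOps.toBoolean`, which accepts only "true"/"false" and throws IllegalArgumentException for anything else, including "1". Plain Java's `Boolean.parseBoolean`, by contrast, never throws; it simply maps anything other than "true" to false:

```java
// Java's Boolean.parseBoolean is lenient but lossy: it never throws, it
// just returns false for any input other than "true" (case-insensitive).
// Scala's StringOps.toBoolean (used by ScalaValueReader, per the trace
// above) instead throws IllegalArgumentException for inputs like "1".
public final class ParseBooleanDemo {
    public static void main(String[] args) {
        System.out.println(Boolean.parseBoolean("true")); // true
        System.out.println(Boolean.parseBoolean("1"));    // false, no exception
        System.out.println(Boolean.parseBoolean("0"));    // false
    }
}
```

Neither behavior matches the ES coercion rules, which is why a dedicated lenient parser (or a custom value reader) seems needed.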
Version Info
OS: Linux
JVM: 7
Hadoop/Spark: spark-sql_2.10, version 1.6.1
ES-Hadoop: elasticsearch-spark_2.10, version 2.3.2
ES: 2.3.2 (external instance)