Pushdown option not working as expected when using Spark DataFrames. I created a simple app in Scala to launch the Thrift server and serve DataFrames created from Elasticsearch documents.
A simple count query takes 15 seconds to return a count of 6000.
That's because the count operation cannot be pushed down, especially for a DataFrame. Spark doesn't push the operation down to the source; instead it loads all the items.
Several versions ago we considered implementing a custom count for RDDs, but it was reverted since it had different semantics: the count was returned in an instant while the RDD content was not initialized, which led to subtle behavioral changes.
Until Spark pushes this operation down, count ends up loading all the items.
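To make the semantics concrete, here is a tiny plain-Scala model of the behavior described above (an illustration only, not the actual es-hadoop internals; the `Source` type and its fields are invented for this sketch): filters can be evaluated source-side, but a count over the scan still pulls every matching document before the number comes back.

```scala
// Hypothetical in-memory "source" standing in for an Elasticsearch index.
final case class Source(docs: Seq[Map[String, Any]]) {
  var docsRead = 0 // tracks how many documents were pulled from the source

  // Filters CAN be pushed down: they are applied source-side, so fewer
  // documents are shipped to Spark.
  def scan(filter: Map[String, Any] => Boolean): Iterator[Map[String, Any]] =
    docs.filter(filter).iterator.map { d => docsRead += 1; d }
}

val source = Source(Seq.fill(6000)(Map("field" -> "value")))

// Counting over the scan behaves like DataFrame.count() without count
// pushdown: every matching document is read just to produce the number.
val n = source.scan(_ => true).size
assert(n == 6000)
assert(source.docsRead == 6000) // all 6000 items were loaded to count them
```

This is why the 15-second count is expected here: the time is spent materializing all 6000 documents, not computing the count itself.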
Issue description
Steps to reproduce
This is how I launch the server; I use a Spark JDBC driver from Progress to execute the count query:
spark-submit --jars /usr/local/bin/elasticsearch-hadoop-2.2.0.jar --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar
Code:
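The original snippet is not preserved here. A minimal `SimpleApp` that serves an Elasticsearch-backed DataFrame through the Thrift server might look like the sketch below (assumptions, not the reporter's actual code: the index name `myindex/mytype`, the table name `mytable`, and the `es.nodes` setting):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("SimpleApp")
      .set("es.nodes", "localhost") // assumption: local Elasticsearch node
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)

    // es-hadoop's Spark SQL data source; pushdown is enabled by default
    // in 2.2.x, set explicitly here for clarity.
    val df = sqlContext.read
      .format("org.elasticsearch.spark.sql")
      .option("pushdown", "true")
      .load("myindex/mytype") // assumption: index/type to expose

    df.registerTempTable("mytable") // table name the JDBC client queries

    // Start the Thrift JDBC/ODBC server on this context so the Progress
    // driver can connect and run the count query.
    HiveThriftServer2.startWithContext(sqlContext)
  }
}
```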
Stack trace:
Version Info
OS: CentOS 7
JVM : openjdk version "1.8.0_65"
Hadoop/Spark: spark-1.6.1-bin-hadoop2.6
ES-Hadoop : 2.2.0
ES : 2.1.1