Closed
Hi,
I am doing the following to fetch data from my ES instance:
```java
SparkConf conf = new SparkConf().setAppName("Simple Application")
        .set("es.resource", "myindex/account")
        .set("es.nodes", "192.168.224.94")
        .set("es.port", "9200")
        .set("es.index.auto.create", "no")
        .set("es.nodes.discovery", "false")
        .set("pushdown", "true");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
DataFrame myEsDump = JavaEsSparkSQL.esDF(sqlContext);
myEsDump.registerTempTable("allAccounts");
DataFrame accounts = sqlContext.sql("SELECT name FROM allAccounts WHERE name = 'Name-801'");
```
This runs fine and returns the record I want. However, it appears that no ES query is ever issued: I have enabled slow logging for all queries and I never see ES being queried. Why would all the ES documents be pulled in and the filter applied in the Spark layer? I thought that enabling pushdown would prevent this behavior.
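For what it's worth, I also tried expressing the same read through Spark's DataSource API, with the `pushdown` flag attached to the reader as an option instead of being set in `SparkConf`. This is only a sketch of that variant (it assumes the same node address/port and the elasticsearch-spark 2.2 connector's `org.elasticsearch.spark.sql` format name):

```java
// Sketch: requires a reachable ES node and the elasticsearch-spark jar on the classpath.
// Same query as above, but via the DataSource API with "pushdown" as a reader option.
DataFrame accounts = sqlContext.read()
        .format("org.elasticsearch.spark.sql")
        .option("pushdown", "true")
        .option("es.nodes", "192.168.224.94")
        .option("es.port", "9200")
        .load("myindex/account")
        .select("name")
        .where("name = 'Name-801'");
```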
Here are the versions I am using:
```xml
<dependency> <!-- Spark dependency -->
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.0</version>
</dependency>
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark_2.10</artifactId>
  <version>2.2.0-rc1</version>
</dependency>
```