SQL query never gets translated to ES search query with pushdown enabled #681

Closed
@RobbieHer


Hi,
I am doing the following to fetch data from my ES instance:

SparkConf conf = new SparkConf().setAppName("Simple Application")
        .set("es.resource", "myindex/account")
        .set("es.nodes", "192.168.224.94")
        .set("es.port", "9200")
        .set("es.index.auto.create", "no")
        .set("es.nodes.discovery", "false")
        .set("pushdown", "true");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

DataFrame myEsDump = JavaEsSparkSQL.esDF(sqlContext);
myEsDump.registerTempTable("allAccounts");

DataFrame accounts = sqlContext.sql("SELECT name FROM allAccounts WHERE name = 'Name-801'");

This runs fine and gives me the record I want. However, it appears that this never makes an ES query. I have enabled slow logging for all queries, and I never see ES being queried. Why are all the ES documents being pulled in and the filter applied in the Spark layer? I thought that enabling pushdown would prevent this behavior.

Here are the versions that I am using:

<dependency> <!-- Spark dependency -->
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.0</version>
</dependency>
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark_2.10</artifactId>
  <version>2.2.0-rc1</version>
</dependency>
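
For comparison, here is a minimal sketch of the Spark DataSource route. It assumes that the connector's pushdown option is read as a DataSource option, which is how the connector documentation seems to describe it, and not from SparkConf via esDF. This is unverified against the versions above. The helper class EsReadOptions is hypothetical and uses only the JDK, so it compiles without Spark on the classpath. The commented lines show where the options would be handed to sqlContext.read().

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of elasticsearch-hadoop): gathers the
// options that would be passed to Spark's DataFrameReader in one place.
final class EsReadOptions {
    private EsReadOptions() {}

    static Map<String, String> build(String nodes, String port) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("es.nodes", nodes);
        opts.put("es.port", port);
        opts.put("es.nodes.discovery", "false");
        // Assumption: pushdown is honored when supplied as a DataSource option.
        opts.put("pushdown", "true");
        return Collections.unmodifiableMap(opts);
    }

    // Assumed usage with Spark 1.6 and the connector on the classpath
    // (left as a comment so this block compiles with the JDK alone):
    //
    //   DataFrame allAccounts = sqlContext.read()
    //           .format("org.elasticsearch.spark.sql")
    //           .options(EsReadOptions.build("192.168.224.94", "9200"))
    //           .load("myindex/account");
    //   allAccounts.registerTempTable("allAccounts");
}
```

If the assumption holds, running the same SELECT against a table registered this way should make the name filter show up in the ES slow log.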
