
SQL query never gets translated to ES search query with pushdown enabled #681

Closed

RobbieHer opened this issue Jan 29, 2016 · 4 comments

@RobbieHer
Hi,
I am doing the following to fetch data from my ES instance

```java
SparkConf conf = new SparkConf().setAppName("Simple Application")
        .set("es.resource", "myindex/account")
        .set("es.nodes", "192.168.224.94")
        .set("es.port", "9200")
        .set("es.index.auto.create", "no")
        .set("es.nodes.discovery", "false")
        .set("pushdown", "true");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

DataFrame myEsDump = JavaEsSparkSQL.esDF(sqlContext);
myEsDump.registerTempTable("allAccounts");

DataFrame accounts = sqlContext.sql("SELECT name FROM allAccounts WHERE name = 'Name-801'");
```

This runs fine and gives me the record I want. However, it appears that no ES search query is ever issued: I have enabled slow logging for all queries and I never see ES being queried. Why are all the ES documents being pulled in and the filter applied in the Spark layer? I thought that enabling pushdown would prevent that behavior.

Here are the versions that I am using

<dependency> <!-- Spark dependency -->
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark_2.10</artifactId>
    <version>2.2.0-rc1</version>
</dependency>
@costin

costin commented Jan 29, 2016

Added minor edits to your post for proper formatting

Push-down applies only to DataFrames registered through Spark's DataSources API, as explained in the docs. When SQL is applied to custom DataFrames that were not registered that way, Spark will not push any filters down.
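For illustration, here is a minimal sketch of loading the same index through the DataSources API so that pushdown can apply (reusing the index name, host, and `sqlContext` from the post above; this assumes a reachable ES cluster, so it is a sketch rather than a verified run):

```java
// Sketch: load the index via Spark's DataSources API so that
// elasticsearch-hadoop can translate filters into an ES query.
Map<String, String> options = new HashMap<String, String>();
options.put("es.nodes", "192.168.224.94");
options.put("es.port", "9200");
options.put("pushdown", "true");

DataFrame df = sqlContext.read()
        .format("org.elasticsearch.spark.sql")
        .options(options)
        .load("myindex/account");

// A filter expressed on this DataFrame is eligible for pushdown.
DataFrame result = df.filter(df.col("name").equalTo("Name-801"));
```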

@costin costin closed this as completed Jan 29, 2016
@RobbieHer

Thanks @costin .

Could you let me know what is wrong with the following set of steps? I registered the DataFrame through Spark's DataSources API, but I still don't see the query being executed on Elasticsearch.

```java
SparkConf conf = new SparkConf().setAppName("Simple Application");

Map<String, String> dataFrameOptions = new HashMap<String, String>();
dataFrameOptions.put("es.resource", "myindex/account");
dataFrameOptions.put("es.nodes", "192.168.224.94");
dataFrameOptions.put("es.port", "9200");
dataFrameOptions.put("es.index.auto.create", "no");
dataFrameOptions.put("es.nodes.discovery", "false");
dataFrameOptions.put("pushdown", "true");
dataFrameOptions.put("double.filtering", "false");

JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

DataFrame myEsDump = sqlContext.read()
        .format("org.elasticsearch.spark.sql")
        .options(dataFrameOptions)
        .load("myindex/account");
myEsDump.registerTempTable("allAccounts");

DataFrame accounts = sqlContext.sql("SELECT name FROM allAccounts WHERE name = 'Name-888'");
```

@costin

costin commented Feb 1, 2016

The Spark documentation should provide enough info. Notice that in the docs the DataFrames are manipulated as-is, without being registered as tables. In the end both approaches should work the same way, but a registered table might take a different code path.
Have you tried running the query directly on the DataFrame instead of going through a temporary table?
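As a sketch of what that would look like (reusing the `myEsDump` DataFrame loaded via the DataSources API in the snippet above; untested here since it needs a live ES cluster):

```java
// Sketch: express the query through the DataFrame API instead of SQL
// on a temporary table; the filter and projection below are eligible
// for pushdown to Elasticsearch.
DataFrame accounts = myEsDump
        .filter(myEsDump.col("name").equalTo("Name-888"))
        .select("name");
accounts.show();
```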

@RobbieHer

Thanks again! I captured the packets and saw that it does indeed generate ES queries. It may have been some abnormality with the ES loggers that kept the scan/scroll search queries from being logged.
