Unable to index with elasticsearch-spark on serverless elasticsearch #2222

RalphSchuurman opened this issue May 7, 2024 · 1 comment
RalphSchuurman commented May 7, 2024

What kind of issue is this?

  • Bug report.

Issue description

I received access to Elasticsearch serverless and would like to move over, but I am unable to get the elasticsearch-spark connector to work. I am using Databricks with the 13.3 LTS Runtime, Scala 2.12, and Spark 3.4.1. I chose org.elasticsearch:elasticsearch-spark-30_2.12:8.11.0 because calling the cluster with the elasticsearch-serverless library reports 8.11.0 as the version.

from elasticsearch_serverless import Elasticsearch

client = Elasticsearch('serverless-endpoint', api_key='xxx')  # endpoint and key redacted
client.info()

gives

ObjectApiResponse({'name': 'serverless', 'cluster_name': 'xxx', 'cluster_uuid': 'xxx', 'version': {'number': '8.11.0', 'build_flavor': 'serverless', 'build_type': 'docker', 'build_hash': '00000000', 'build_date': '2023-10-31', 'build_snapshot': False, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '8.11.0', 'minimum_index_compatibility_version': '8.11.0'}, 'tagline': 'You Know, for Search'})
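For reference, the info() response above can be used to tell a serverless deployment apart from a self-managed one: the version number reads 8.11.0, but the build_flavor field is what actually identifies serverless. A minimal sketch (the dict mirrors the relevant fields of the response shown above):

```python
# Sketch: inspect the info() payload to distinguish a serverless deployment.
# The values are copied from the response in this issue; field names follow
# the Elasticsearch cluster info API.
info = {
    "name": "serverless",
    "version": {"number": "8.11.0", "build_flavor": "serverless"},
}

# The version number alone looks like a regular 8.11.0 cluster; the flavor
# is the reliable signal.
is_serverless = info["version"]["build_flavor"] == "serverless"
print(is_serverless)  # True
```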

Steps to reproduce

Code:


endpoint = 'serverless-endpoint'
username = 'username'
password = 'password'
index_name = 'index'
(df.write.format("org.elasticsearch.spark.sql")
    .option("es.nodes", endpoint)
    .option("es.port", "443")
    .option("es.mapping.id", "Identifier")
    .option("es.net.ssl", "true")
    .option("es.nodes.wan.only", "true")
    .option("es.net.http.auth.user", username)
    .option("es.net.http.auth.pass", password)
    .option("es.field.read.empty.as.null", "true")
    .option("es.batch.write.retry.count", "5")
    .option("es.batch.write.retry.wait", "25")  # was misspelled "es.bath.write.retry.wait"
    .option("es.write.operation", "upsert")
    .mode("append")
    .save(index_name))

Stack trace:

Py4JJavaError: An error occurred while calling o609.save.
: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

Changing es.nodes.wan.only to false does not change the outcome.
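The "Cannot detect ES version" error is raised when the connector's initial version probe against the cluster root endpoint fails, so a quick sanity check is to issue that same GET from the Databricks driver outside of Spark. A hypothetical sketch (endpoint and credentials are placeholders; the network call is left commented out):

```python
# Sketch, not taken from the connector source: es-spark probes the cluster
# version via the root endpoint. Building and testing that request from the
# same driver helps isolate connectivity issues from connector issues.
endpoint = "serverless-endpoint"  # placeholder host, as in the repro above
url = f"https://{endpoint}:443/"

# Uncomment to run against a real cluster:
# import requests
# resp = requests.get(url, auth=(username, password), timeout=10)
# print(resp.json()["version"]["number"])

print(url)  # https://serverless-endpoint:443/
```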

Version Info

OS: Databricks with 13.3 LTS Runtime, Scala 2.12 and Spark 3.4.1
Hadoop/Spark: org.elasticsearch:elasticsearch-spark-30_2.12:8.11.0
ES: Elasticsearch Serverless

masseyke (Member) commented May 7, 2024

Hi @RalphSchuurman. Serverless Elasticsearch currently only supports a subset of full Elasticsearch functionality. Es-hadoop/spark is not supported, and there are no immediate plans to support it.
