Unable to index with elasticsearch-spark on serverless elasticsearch #2222

RalphSchuurman opened this issue May 7, 2024 · 1 comment
RalphSchuurman commented May 7, 2024

What kind of issue is this?

  • Bug report.

Issue description

I received access to Elasticsearch serverless and would like to move over, but I am unable to get the elasticsearch-spark connector to work. I am using Databricks with the 13.3 LTS Runtime, Scala 2.12, and Spark 3.4.1. I chose org.elasticsearch:elasticsearch-spark-30_2.12:8.11.0 because calling the cluster with the elasticsearch-serverless library reports 8.11.0 as the version.

from elasticsearch_serverless import Elasticsearch

client = Elasticsearch('serverless-endpoint', api_key='xxx')  # endpoint and key redacted
client.info()

gives

ObjectApiResponse({'name': 'serverless', 'cluster_name': 'xxx', 'cluster_uuid': 'xxx', 'version': {'number': '8.11.0', 'build_flavor': 'serverless', 'build_type': 'docker', 'build_hash': '00000000', 'build_date': '2023-10-31', 'build_snapshot': False, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '8.11.0', 'minimum_index_compatibility_version': '8.11.0'}, 'tagline': 'You Know, for Search'})
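For reference, the info() response above can be used to tell a serverless deployment apart from a self-managed one: the version number reads 8.11.0, but the build_flavor field is what actually identifies serverless. A minimal sketch (the dict mirrors the relevant fields of the response shown above):

```python
# Sketch: inspect the info() payload to distinguish a serverless deployment.
# The values are copied from the response in this issue; field names follow
# the Elasticsearch cluster info API.
info = {
    "name": "serverless",
    "version": {"number": "8.11.0", "build_flavor": "serverless"},
}

# The version number alone looks like a regular 8.11.0 cluster; the flavor
# is the reliable signal.
is_serverless = info["version"]["build_flavor"] == "serverless"
print(is_serverless)  # True
```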

Steps to reproduce

Code:


endpoint = 'serverless-endpoint'
username = 'username'
password = 'password'
index_name = 'index'
(df.write.format("org.elasticsearch.spark.sql")
    .option("es.nodes", endpoint)
    .option("es.port", "443")
    .option("es.mapping.id", "Identifier")
    .option("es.net.ssl", "true")
    .option("es.nodes.wan.only", "true")
    .option("es.net.http.auth.user", username)
    .option("es.net.http.auth.pass", password)
    .option("es.field.read.empty.as.null", "true")
    .option("es.batch.write.retry.count", "5")
    .option("es.batch.write.retry.wait", "25")  # was misspelled "es.bath.write.retry.wait"
    .option("es.write.operation", "upsert")
    .mode("append")
    .save(index_name))

Stack trace:

Py4JJavaError: An error occurred while calling o609.save.
: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

Changing es.nodes.wan.only to false does not change the outcome.
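The "Cannot detect ES version" error is raised when the connector's initial version probe against the cluster root endpoint fails, so a quick sanity check is to issue that same GET from the Databricks driver outside of Spark. A hypothetical sketch (endpoint and credentials are placeholders; the network call is left commented out):

```python
# Sketch, not taken from the connector source: es-spark probes the cluster
# version via the root endpoint. Building and testing that request from the
# same driver helps isolate connectivity issues from connector issues.
endpoint = "serverless-endpoint"  # placeholder host, as in the repro above
url = f"https://{endpoint}:443/"

# Uncomment to run against a real cluster:
# import requests
# resp = requests.get(url, auth=(username, password), timeout=10)
# print(resp.json()["version"]["number"])

print(url)  # https://serverless-endpoint:443/
```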

Version Info

OS: Databricks with 13.3 LTS Runtime, Scala 2.12 and Spark 3.4.1
Hadoop/Spark: org.elasticsearch:elasticsearch-spark-30_2.12:8.11.0
ES: Elasticsearch Serverless

masseyke (Member) commented May 7, 2024

Hi @RalphSchuurman. Serverless Elasticsearch currently only supports a subset of full Elasticsearch functionality. Es-hadoop/spark is not supported, and there are no immediate plans to support it.
