
how to use elasticsearch-spark to connect to elasticsearch server behind a proxy server #643

Closed
andrimirandi opened this issue Jan 5, 2016 · 5 comments

Comments

@andrimirandi

I have used elasticsearch-spark to connect to an Elasticsearch server without a problem, but I am having trouble connecting to an Elasticsearch server behind a proxy server.

My server configuration is as follows:

  • I have a single public IP address, which I put on my routing server (let's say it is x.x.x.x)
  • I have many nodes; one of them runs Elasticsearch and uses a local IP (192.168.0.*)

When I try to connect to x.x.x.x with elasticsearch-spark using this code:

val elasticIndex = "index/type"
val url = "x.x.x.x:9200"
val reader = sqlContext.read.
  format("org.elasticsearch.spark.sql").
  option("es.net.http.auth.user", "username").
  option("es.net.http.auth.pass", "password").
  option("es.nodes", url)

println(s"Loading: ${url} ...")
val df = reader.load(elasticIndex)
df.printSchema()
df.count()

printSchema works fine, but count returns a connection timeout error.

When I take a look at the logs, I find that the local address 192.168.0.* is still being used:

16/01/05 00:38:33 DEBUG ScalaEsRowRDD: Discovered Elasticsearch version [1.4.4]
16/01/05 00:38:33 DEBUG HttpMethodBase: Resorting to protocol version default close connection policy
16/01/05 00:38:33 DEBUG HttpMethodBase: Should NOT close connection, using HTTP/1.1
16/01/05 00:38:33 DEBUG HttpConnection: Releasing connection back to connection manager.
16/01/05 00:38:33 DEBUG DefaultHttpParams: Set parameter http.method.retry-handler = org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport$1@408711b5
16/01/05 00:38:33 DEBUG DefaultHttpParams: Set parameter http.connection-manager.timeout = 60000
16/01/05 00:38:33 DEBUG DefaultHttpParams: Set parameter http.socket.timeout = 60000
16/01/05 00:38:33 INFO CommonsHttpTransport: Using detected HTTP Auth credentials...
16/01/05 00:38:33 DEBUG DefaultHttpParams: Set parameter http.authentication.preemptive = true
16/01/05 00:38:33 DEBUG DefaultHttpParams: Set parameter http.tcp.nodelay = true
16/01/05 00:38:33 DEBUG HttpMethodDirector: Preemptively sending default basic credentials
16/01/05 00:38:33 DEBUG HttpMethodDirector: Authenticating with BASIC <any realm>@43.231.128.57:9200
16/01/05 00:38:33 DEBUG HttpMethodParams: Credential charset not configured, using HTTP element charset
16/01/05 00:38:33 DEBUG HttpConnection: Open connection to x.x.x.x:9200
16/01/05 00:38:33 DEBUG HttpMethodBase: Adding Host request header
16/01/05 00:38:33 DEBUG ScalaEsRowRDD: Nodes discovery enabled - found [192.168.0.5:9200]
16/01/05 00:38:33 DEBUG HttpMethodBase: Resorting to protocol version default close connection policy
16/01/05 00:38:33 DEBUG HttpMethodBase: Should NOT close connection, using HTTP/1.1
16/01/05 00:38:33 DEBUG HttpConnection: Releasing connection back to connection manager.

Am I missing something? Please help.

Thanks

@costin
Member

costin commented Jan 5, 2016

Handling your deployment is explained in the docs.

Next time, for questions please use the forum instead of the bug tracker. Thanks!
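
The relevant section is most likely the one on clusters whose nodes are not directly reachable, e.g. behind a NAT gateway or proxy. A minimal sketch of the original reader with node discovery disabled via es.nodes.wan.only (available in recent connector versions; check the configuration chapter for the exact option in your version), so that every request goes through the address in es.nodes instead of the discovered 192.168.0.* addresses:

val reader = sqlContext.read.
  format("org.elasticsearch.spark.sql").
  option("es.net.http.auth.user", "username").
  option("es.net.http.auth.pass", "password").
  option("es.nodes", "x.x.x.x:9200").
  option("es.nodes.wan.only", "true") // skip discovery; talk only to es.nodes, trading parallelism for reachability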

@costin costin closed this as completed Jan 5, 2016
@costin costin added the :Rest label Jan 5, 2016
@andrimirandi
Author

Thanks

@yogeshdarji

yogeshdarji commented Nov 18, 2016

Hello, I ran the same code as yours in my Docker container:

Code:

import org.apache.spark.SparkContext 
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
import org.elasticsearch.spark.sql._

val elasticIndex = "twitter"
val url = "172.17.X.X:9200"
val reader = sqlContext.read.format("org.elasticsearch.spark.sql").option("es.nodes",url)

println(s"Loading: ${url} ...")
val df = reader.load(elasticIndex)
df.printSchema()
df.count()

I get the error below while executing val df = reader.load(elasticIndex):

scala> val df = reader.load(elasticIndex)
java.lang.ClassNotFoundException: Failed to find data source: org.elasticsearch.spark.sql. Please find packages at http://spark-packages.org
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
  ... 58 elided
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.spark.sql.DefaultSource
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
  ... 61 more

Am I missing anything? Any help is appreciated!
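
For anyone else hitting this ClassNotFoundException: it usually means the elasticsearch-spark jar is not on the classpath at all. One way to supply it, assuming Spark 1.x with Scala 2.10 (which matches the ResolvedDataSource frames in the trace), is to pass the connector as a package when starting the shell; the coordinates and version below are illustrative and must match your Spark and Scala versions:

spark-shell --packages org.elasticsearch:elasticsearch-spark_2.10:2.4.0

If the jar is already on disk, --jars works as well.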

@ebuildy
Contributor

ebuildy commented Apr 7, 2017

Indeed, this is a very good question!

It took me a while to figure out:

If you build an uber jar with minimizeJar set to true, this class will not be included, hence the error, even if you follow the deployment recommendations.
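
A sketch of one workaround, assuming the uber jar is built with the maven-shade-plugin: Spark loads org.elasticsearch.spark.sql.DefaultSource reflectively by name, so the minimizer cannot see that it is used and strips it. Adding an include filter should exempt the connector from minimization (the artifact coordinates here are illustrative; match them to your build):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <minimizeJar>true</minimizeJar>
    <filters>
      <!-- DefaultSource is looked up by name at runtime,
           so keep all of the connector's classes -->
      <filter>
        <artifact>org.elasticsearch:elasticsearch-spark_2.10</artifact>
        <includes>
          <include>**</include>
        </includes>
      </filter>
    </filters>
  </configuration>
</plugin>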

@wangzboo

wangzboo commented Jul 6, 2020

Thank you
