
Exception org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest with Spark 1.5.1, ES 2.0 and v2.2.0-beta1 #603

Closed
amiorin opened this issue Nov 9, 2015 · 4 comments

amiorin commented Nov 9, 2015

https://gist.github.com/amiorin/d84dcdcbbdf29fa36079

It works with ES 1.7. ngrep shows that no 500 status is returned by ES 2.0 itself. Any idea?
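
The gist includes the connector logs; for reference, a minimal sketch of one way to turn that logging on from the spark-shell (assuming the log4j 1.x that ships with Spark 1.5):

import org.apache.log4j.{Level, Logger}

// es-hadoop logs its REST traffic under this package; TRACE level shows
// the raw requests and responses exchanged with the cluster.
Logger.getLogger("org.elasticsearch.hadoop.rest").setLevel(Level.TRACE)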

amiorin (Author) commented Nov 16, 2015

Am I the only one with this problem?

costin (Member) commented Nov 16, 2015

Is there anything special about your indices? Does anything show up in the ES 2.0 logs? From the gist (thanks for that, by the way, and for turning on logging) things are pretty clear: the 500s start appearing quite early across the various Spark tasks, all at various stages during the initial handshake.
A 500 indicates there's something wrong with ES itself. Is it running out of disk space or anything like that?

amiorin (Author) commented Nov 17, 2015

This is our smoke test.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._ // brings saveToEs() into scope on RDDs

// Point the connector at the cluster through our consul service name.
val conf = new SparkConf().setAppName("foo").setMaster("local[8]")
conf.set("es.index.auto.create", "true")
conf.set("es.nodes", "elasticsearch.service.bohr.consul")
conf.set("es.port", "9200")
val sc = new SparkContext(conf)

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")

// Index the two maps as documents into index "foo", type "bar".
sc.makeRDD(Seq(numbers, airports)).saveToEs("foo/bar")

On port 80 we have a reverse proxy that returns 500 if it cannot route a request. From the logs of the reverse proxy I can see that es-hadoop is making these requests:

192.168.88.80 - els1.node.bohr.consul HEAD /foo HTTP/1.1 500 - 2.504 ms
192.168.88.80 - els1.node.bohr.consul GET /_nodes/http HTTP/1.1 500 - 2.325 ms
192.168.88.80 - els1.node.bohr.consul GET /_nodes/http HTTP/1.1 500 - 2.200 ms
192.168.88.80 - els1.node.bohr.consul GET /_nodes/http HTTP/1.1 500 - 2.084 ms
192.168.88.80 - els1.node.bohr.consul HEAD /foo HTTP/1.1 500 - 1.971 ms
192.168.88.80 - els1.node.bohr.consul GET /_nodes/http HTTP/1.1 500 - 1.851 ms
192.168.88.80 - els1.node.bohr.consul HEAD /foo HTTP/1.1 500 - 1.725 ms
192.168.88.80 - els1.node.bohr.consul HEAD /foo HTTP/1.1 500 - 1.610 ms
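
Those GET /_nodes/http calls are the connector's node-discovery step: it asks the cluster which HTTP addresses the nodes publish and then connects to those directly. A quick way to see what the cluster advertises (a minimal sketch run from the driver, assuming the consul name resolves there):

import scala.io.Source

// Dump the per-node HTTP publish addresses; after discovery, es-hadoop
// talks to these instead of the address configured in es.nodes.
val nodesHttp = Source.fromURL(
  "http://elasticsearch.service.bohr.consul:9200/_nodes/http").mkString
println(nodesHttp)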

We use consul for service discovery. It could be that ES 2.0 is advertising els1.node.bohr.consul to clients, so requests aimed at elasticsearch.service.bohr.consul:9200 end up at els1.node.bohr.consul:80.

This is our ES 2.0 config:

# N4 settings
script.inline: on
script.indexed: on
indices.store.throttle.type: none
indices.memory.index_buffer_size: 50%
index.translog.flush_threshold_size: 1gb

cluster.name: es-bohr
node.name: "Insane Poincare"
node.data: true
bootstrap.mlockall: true
network.publish_host: els1.node.bohr.consul
discovery.zen.ping.unicast.hosts: ["els1.node.consul"]
http.cors.enabled: true
network.host: 172.16.0.104

I'm investigating the network settings.

amiorin (Author) commented Nov 17, 2015

I've removed network.publish_host: els1.node.bohr.consul and now it works. Without it, ES advertises its bind address (172.16.0.104) instead, so es-hadoop no longer ends up at the proxy on port 80.
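
If the publish_host had to stay, an alternative (a sketch based on the documented es-hadoop settings; es.nodes.wan.only is new in 2.2) would be to stop the connector from using the discovered addresses:

// Keep using the address configured in es.nodes instead of the node
// addresses returned by GET /_nodes/http during discovery.
conf.set("es.nodes.discovery", "false")
// Or, stricter (es-hadoop 2.2+): route every request through es.nodes only.
conf.set("es.nodes.wan.only", "true")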

amiorin closed this as completed Nov 17, 2015