
Thousands of SearchContextMissingException: No search context found for id #576

Closed
sjoerdmulder opened this issue Oct 15, 2015 · 6 comments


@sjoerdmulder

Version 2.1.1
We are using es-hadoop with Spark, but each time our job runs, the Elasticsearch logs are flooded with errors like the one below:

This causes the log files to grow by many gigabytes each day.

[2015-10-15 06:31:11,699][DEBUG][action.search.type       ] [es-007] [120600] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException [es-012][inet[/192.168.0.12:9300]][indices:data/read/search[phase/scan/scroll]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [120600]
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:537)
    at org.elasticsearch.search.SearchService.executeScan(SearchService.java:265)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:939)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:930)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
@costin
Member

costin commented Oct 15, 2015

Did this happen before 2.1.1 as well? It looks like the scan has already finished/been closed and its id is no longer available. Anything suspicious in the job itself? If you can replicate this on a small job, you can enable TRACE logging on the rest package and see what calls are being made.

What version of ES are you using? Does this occur on all Spark tasks or just on one?
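
For reference, a minimal sketch (not from this thread) of turning that logger up to TRACE programmatically, assuming the log4j 1.x API that Spark shipped with at the time; the same setting could also go into log4j.properties.

import org.apache.log4j.{Level, Logger}

// Place at the top of the Spark job's main(), before any es-hadoop call.
// es-hadoop's REST layer lives under org.elasticsearch.hadoop.rest; TRACE here
// logs the scan/scroll requests sent to the cluster.
Logger.getLogger("org.elasticsearch.hadoop.rest").setLevel(Level.TRACE)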

@sjoerdmulder
Author

Elasticsearch version is 1.7.2; it also happens with es-hadoop 2.1.0. It does not happen on all Spark jobs, but I have isolated the scenario 😄

import com.sksamuel.elastic4s.ElasticClient
import com.sksamuel.elastic4s.ElasticDsl._
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.common.settings.ImmutableSettings
import org.elasticsearch.spark._

object EsScanLogApp {
  def main(args: Array[String]): Unit = {
    val sparkConf: SparkConf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("EsScanLogApp")

    val settings = ImmutableSettings.builder.put("cluster.name", "test").build()
    val client = ElasticClient.remote(settings, "localhost", 9300)
    client.execute {
      deleteIndex("events-*")
    }.await
    // Depending on the number of documents you get more (or fewer) errors.
    client.execute {
      bulk(
        (0 to 20000).map(i => index into "events-2015.01.23" / "details" fields "foo" -> "bar")
          ++ (0 to 10).map(i => index into "events-2015.09.23" / "details" fields "foo" -> "bar")
      )
    }.await

    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)

    val indexName = "events-*/details"
    sc.esRDD(indexName).collect()

  }
}

This produces the following logs in Spark; could it be related that the mapping was discovered only for the second index (09.23) instead of the first (01.23)?

2015-10-15T21:27:56.952+0200 INFO  ScalaEsRDD.findPartitions() - Reading from [events-*/details]
2015-10-15T21:27:56.957+0200 INFO  ScalaEsRDD.findPartitions() - Discovered mapping {events-2015.09.23=[mappings=[details=[foo=STRING]]]} for [events-*/details]

@costin
Member

costin commented Oct 15, 2015

The error looks to be internal to ES and likely indicates an issue with the search state (whether that state is defined by the client or not is unclear).
I don't have the full logs, but it would be useful to see what REST calls are made before the exception occurs - that is likely a good indicator, especially since the REST logs show the id used by the scan and scroll and could offer insight into what triggers the exception.

Since you are deleting and creating new docs, you could trigger a refresh afterwards, before making the sc.esRDD call, and see whether it makes any difference.
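
A minimal sketch of such a refresh, issued against the plain _refresh REST endpoint so it does not depend on a particular elastic4s DSL version (the localhost:9200 address is an assumption based on the defaults):

import java.net.{HttpURLConnection, URL}

// Force a refresh of the freshly indexed data before sc.esRDD runs.
def refreshIndices(pattern: String = "events-*"): Unit = {
  val conn = new URL(s"http://localhost:9200/$pattern/_refresh")
    .openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("POST")
  conn.getResponseCode // send the request and wait for the reply
  conn.disconnect()
}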

Can you post your logs as a gist? Also consider turning the level to TRACE on org.elasticsearch.hadoop and org.elasticsearch.spark - there will be a lot of information, but it is very useful to us.

Cheers,

@sjoerdmulder
Author

Hi @costin

The deleting/indexing is only done to reproduce the issue; it doesn't need to be in the job.

Since this is a pretty simple test case, you could easily get the logs yourself; I currently don't have time to post them.

@costin
Member

costin commented Oct 27, 2015

@sjoerdmulder I managed to replicate the issue and track it down (thanks to @clintongormley) to 2.1.x not picking up the new scroll_id when iterating across the index. The issue does not appear in master (2.2.x).
The work for this in master was done as part of #537. I'll backport it to 2.1.x as well.
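
For context, a minimal sketch (not es-hadoop's actual code) of the scan/scroll contract the fix restores: each scroll response may carry a new scroll id, and that id has to replace the previous one for the next request. fetchScroll below is a hypothetical stand-in for a request to _search/scroll.

// Reusing a stale scroll id is what produces SearchContextMissingException.
def drainScroll(initialScrollId: String,
                fetchScroll: String => (String, Seq[String])): Seq[String] = {
  val hitsSoFar = scala.collection.mutable.ArrayBuffer.empty[String]
  var scrollId = initialScrollId
  var batch = Seq.empty[String]
  do {
    val (nextId, hits) = fetchScroll(scrollId) // new scroll id + hits of this batch
    hitsSoFar ++= hits
    scrollId = nextId // carry the updated id forward to the next request
    batch = hits
  } while (batch.nonEmpty)
  hitsSoFar.toSeq
}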

costin added a commit that referenced this issue Oct 27, 2015
Update scroll id between requests
Close scroll id on closing

backport from commit 278972f

relates #576 #537
@costin
Member

costin commented Oct 28, 2015

@sjoerdmulder Since the issue was detected (thanks for the test) and fixed, I'm closing it. Feel free to open another one if it keeps occurring.

Cheers,

costin closed this as completed Oct 28, 2015