
Thousands of SearchContextMissingException: No search context found for id #576

Closed
sjoerdmulder opened this issue Oct 15, 2015 · 6 comments


@sjoerdmulder

Version 2.1.1
We are using es-hadoop with Spark, but each time our job runs, the Elasticsearch logs are flooded with errors like the one below:

This causes the log files to grow by many gigabytes each day.

[2015-10-15 06:31:11,699][DEBUG][action.search.type       ] [es-007] [120600] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException [es-012][inet[/192.168.0.12:9300]][indices:data/read/search[phase/scan/scroll]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [120600]
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:537)
    at org.elasticsearch.search.SearchService.executeScan(SearchService.java:265)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:939)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:930)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
@costin
Member

costin commented Oct 15, 2015

Did this happen before 2.1.1 as well? It looks like the scan has already finished/been closed and its id is no longer available. Anything suspicious in the job itself? If you can replicate this on a small job, you can enable TRACE logging on the rest package and see what calls are being made.

What version of ES are you using? Does this occur on all Spark tasks or just on one?
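
For reference, a minimal sketch (not from this thread) of turning that logger up to TRACE programmatically, assuming the log4j 1.x API that Spark shipped with at the time; the same setting could also go into log4j.properties.

import org.apache.log4j.{Level, Logger}

// Place at the top of the Spark job's main(), before any es-hadoop call.
// es-hadoop's REST layer lives under org.elasticsearch.hadoop.rest; TRACE here
// logs the scan/scroll requests sent to the cluster.
Logger.getLogger("org.elasticsearch.hadoop.rest").setLevel(Level.TRACE)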

@sjoerdmulder
Author

Elasticsearch version is 1.7.2; it also happens with es-hadoop 2.1.0. It does not happen on all Spark jobs, but I have isolated the scenario 😄

import com.sksamuel.elastic4s.ElasticClient
import com.sksamuel.elastic4s.ElasticDsl._
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.common.settings.ImmutableSettings
import org.elasticsearch.spark._

object EsScanLogApp {
  def main(args: Array[String]): Unit = {
    val sparkConf: SparkConf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("EsScanLogApp")

    val settings = ImmutableSettings.builder.put("cluster.name", "test").build()
    val client = ElasticClient.remote(settings, "localhost", 9300)
    client.execute {
      deleteIndex("events-*")
    }.await
    // Depending on the number of documents you get more (or fewer) errors.
    client.execute {
      bulk(
        (0 to 20000).map(i => index into "events-2015.01.23" / "details" fields "foo" -> "bar")
          ++ (0 to 10).map(i => index into "events-2015.09.23" / "details" fields "foo" -> "bar")
      )
    }.await

    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)

    val indexName = "events-*/details"
    sc.esRDD(indexName).collect()

  }
}

This produces the following logs in Spark; could it be related that the mapping was discovered only for the second index (09.23) instead of the first (01.23)?

2015-10-15T21:27:56.952+0200 INFO  ScalaEsRDD.findPartitions() - Reading from [events-*/details]
2015-10-15T21:27:56.957+0200 INFO  ScalaEsRDD.findPartitions() - Discovered mapping {events-2015.09.23=[mappings=[details=[foo=STRING]]]} for [events-*/details]

@costin
Member

costin commented Oct 15, 2015

The error looks to be internal to ES and likely indicates an issue with the search state (whether that state is defined by the client or not is unclear).
I don't have the full logs, but it would be useful to see what REST calls are made before the exception occurs - that is likely a good indicator, especially since the REST logs show the id used by the scan and scroll and could offer insight into what triggers the exception.

Since you are deleting and creating new docs, you could trigger a refresh afterwards, before making the sc.esRDD call, and see whether it makes any difference.
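
A minimal sketch of such a refresh, issued against the plain _refresh REST endpoint so it does not depend on a particular elastic4s DSL version (the localhost:9200 address is an assumption based on the defaults):

import java.net.{HttpURLConnection, URL}

// Force a refresh of the freshly indexed data before sc.esRDD runs.
def refreshIndices(pattern: String = "events-*"): Unit = {
  val conn = new URL(s"http://localhost:9200/$pattern/_refresh")
    .openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("POST")
  conn.getResponseCode // send the request and wait for the reply
  conn.disconnect()
}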

Can you post your logs as a gist? Also consider turning the level to TRACE on org.elasticsearch.hadoop and org.elasticsearch.spark - there will be a lot of information, but it is very useful to us.

Cheers,

@sjoerdmulder
Author

Hi @costin

The deleting/indexing is only done to reproduce the issue; it doesn't need to be in the job.

Since this is a pretty simple test case, you could easily get the logs yourself; I currently don't have time to post them.

@costin
Member

costin commented Oct 27, 2015

@sjoerdmulder I managed to replicate the issue and track it down (thanks to @clintongormley) to 2.1.x not picking up the new scroll_id when iterating across the index. The issue does not appear in master (2.2.x).
The work for this in master was done as part of #537. I'll backport it to 2.1.x as well.
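
For context, a minimal sketch (not es-hadoop's actual code) of the scan/scroll contract the fix restores: each scroll response may carry a new scroll id, and that id has to replace the previous one for the next request. fetchScroll below is a hypothetical stand-in for a request to _search/scroll.

// Reusing a stale scroll id is what produces SearchContextMissingException.
def drainScroll(initialScrollId: String,
                fetchScroll: String => (String, Seq[String])): Seq[String] = {
  val hitsSoFar = scala.collection.mutable.ArrayBuffer.empty[String]
  var scrollId = initialScrollId
  var batch = Seq.empty[String]
  do {
    val (nextId, hits) = fetchScroll(scrollId) // new scroll id + hits of this batch
    hitsSoFar ++= hits
    scrollId = nextId // carry the updated id forward to the next request
    batch = hits
  } while (batch.nonEmpty)
  hitsSoFar.toSeq
}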

costin added a commit that referenced this issue Oct 27, 2015
Update scroll id between requests
Close scroll id on closing

backport from commit 278972f

relates #576 #537
@costin
Member

costin commented Oct 28, 2015

@sjoerdmulder Since the issue was detected (thanks for the test) and fixed, I'm closing it. Feel free to open another one if it keeps occurring.

Cheers,

costin closed this as completed Oct 28, 2015