Render REST errors in a structural way #10643

s1monw · 2015-04-17T10:04:20Z

This is a first cut, aiming at progress rather than perfection, for #3303.

It renders errors on the REST layer as structured JSON to give more insight into the error. For instance, a query against a missing index renders like this:

GET localhost:9200/foo1/_search/?q=~3&pretty=true
{
  "error" : {
    "type" : "index_missing_exception",
    "reason" : "[foo1] missing"
  },
  "status" : 404
}

or a search phase exception:

GET localhost:9200/_search?q=~3&pretty=true
{
  "error" : {
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "failed_shards" : [ {
      "shard" : 0,
      "index" : "foo",
      "node" : "0-75uocCQhyX4clLGbE6OQ",
      "reason" : {
        "type" : "search_parse_exception",
        "reason" : "Failed to parse source [{\"query\":{\"query_string\":{\"query\":\"~3\",\"lowercase_expanded_terms\":true,\"analyze_wildcard\":false}}}]",
        "caused_by" : {
          "type" : "query_parsing_exception",
          "reason" : "[foo] Failed to parse query [~3]",
          "caused_by" : {
            "type" : "parse_exception",
            "reason" : "Cannot parse '~3': Encountered \" <FUZZY_SLOP> \"~3 \"\" at line 1, column 0.\nWas expecting one of:\n    <NOT> ...\n    \"+\" ...\n    \"-\" ...\n    <BAREOPER> ...\n    \"(\" ...\n    \"*\" ...\n    <QUOTED> ...\n    <TERM> ...\n    <PREFIXTERM> ...\n    <WILDTERM> ...\n    <REGEXPTERM> ...\n    \"[\" ...\n    \"{\" ...\n    <NUMBER> ...\n    <TERM> ...\n    \"*\" ...\n    ",
            "caused_by" : {
              "type" : "parse_exception",
              "reason" : "Encountered \" <FUZZY_SLOP> \"~3 \"\" at line 1, column 0.\nWas expecting one of:\n    <NOT> ...\n    \"+\" ...\n    \"-\" ...\n    <BAREOPER> ...\n    \"(\" ...\n    \"*\" ...\n    <QUOTED> ...\n    <TERM> ...\n    <PREFIXTERM> ...\n    <WILDTERM> ...\n    <REGEXPTERM> ...\n    \"[\" ...\n    \"{\" ...\n    <NUMBER> ...\n    <TERM> ...\n    \"*\" ...\n    "
            }
          }
        }
      }
    }]
  },
  "status" : 400
}
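The nested `caused_by` objects above mirror the Java cause chain of the exception. A minimal sketch of that rendering, assuming a hand-rolled JSON writer; `ErrorRenderer` and its methods are hypothetical names for illustration, not the code from this PR:

```java
/**
 * Hypothetical sketch, not the code from this PR: renders a Throwable and its
 * cause chain as nested {"type", "reason", "caused_by"} JSON objects.
 * Assumes the cause chain is acyclic.
 */
final class ErrorRenderer {

    /** "QueryParsingException" -> "query_parsing_exception" (underscore before each capital). */
    static String toSnakeCase(String camel) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < camel.length(); i++) {
            char c = camel.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    sb.append('_');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Minimal JSON string quoting: escapes quotes, backslashes and newlines only. */
    static String quote(String s) {
        if (s == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default: sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Renders t, then recursively nests each cause under "caused_by". */
    static String render(Throwable t) {
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"type\":").append(quote(toSnakeCase(t.getClass().getSimpleName())));
        sb.append(",\"reason\":").append(quote(t.getMessage()));
        if (t.getCause() != null) {
            sb.append(",\"caused_by\":").append(render(t.getCause()));
        }
        return sb.append('}').toString();
    }
}
```

The escaping here is deliberately incomplete (no control characters other than `\n`); a real implementation would use a proper JSON library.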

rashidkpc · 2015-04-17T15:58:19Z

Feedback from a few minutes of fiddling: definitely an improvement! We can dig into the objects until we find a type that we know about, e.g. query_parsing_exception, and create our own error message from there.

I don't know much of the code but I'm guessing getMessage() is responsible for "[foo] Failed to parse query [~3]"? That's where things probably get a bit hectic?

De-duping the reason
For example, in this first request we're asking logstash-* a question, which means we're going to get a reason for every query_parsing_exception in every shard of every index that matches that pattern. The per-shard repetition isn't a problem, since all of those reasons will be the same. But since the index name is embedded in the reason string, each index produces a different message, even though they all failed for the same reason.

GET /logstash-*/_search/?q=~3&pretty=true

[...]
{
            "shard": 0,
            "index": "logstash-2015.04.09",
            "node": "C1kmGmoPTHmdfQIpIa_p0Q",
            "reason": {
               "type": "search_parse_exception",
               "reason": "Failed to parse source [{\"query\":{\"query_string\":{\"query\":\"~3\",\"lowercase_expanded_terms\":true,\"analyze_wildcard\":false}}}]",
               "caused_by": {
                  "type": "query_parsing_exception",
                  "reason": "[logstash-2015.04.09] Failed to parse query [~3]",
                  "caused_by": {
                     "type": "parse_exception",
                     "reason": "Cannot parse '~3': Encountered \" <FUZZY_SLOP> \"~3 \"\" at line 1, column 0.\nWas expecting one of:\n    <NOT> ...\n    \"+\" ...\n    \"-\" ...\n    <BAREOPER> ...\n    \"(\" ...\n    \"*\" ...\n    <QUOTED> ...\n    <TERM> ...\n    <PREFIXTERM> ...\n    <WILDTERM> ...\n    <REGEXPTERM> ...\n    \"[\" ...\n    \"{\" ...\n    <NUMBER> ...\n    <TERM> ...\n    \"*\" ...\n    ",
                     "caused_by": {
                        "type": "parse_exception",
                        "reason": "Encountered \" <FUZZY_SLOP> \"~3 \"\" at line 1, column 0.\nWas expecting one of:\n    <NOT> ...\n    \"+\" ...\n    \"-\" ...\n    <BAREOPER> ...\n    \"(\" ...\n    \"*\" ...\n    <QUOTED> ...\n    <TERM> ...\n    <PREFIXTERM> ...\n    <WILDTERM> ...\n    <REGEXPTERM> ...\n    \"[\" ...\n    \"{\" ...\n    <NUMBER> ...\n    <TERM> ...\n    \"*\" ...\n    "
                     }
                  }
               }
            }
         },
[...]

If we look at a sort failure, we have a much better scenario. The message will be the same for every index and every shard, so we can de-dupe them and tell the user "No mapping found for [foo] in order to sort on":


[...]
{
            "shard": 0,
            "index": "logstash-2015.04.09",
            "node": "C1kmGmoPTHmdfQIpIa_p0Q",
            "reason": {
               "type": "search_parse_exception",
               "reason": "Failed to parse source [{\"sort\":[{\"foo\":{}}]}]",
               "caused_by": {
                  "type": "search_parse_exception",
                  "reason": "No mapping found for [foo] in order to sort on"
               }
            }
         },
[...]
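The client-side de-duplication described here amounts to keeping the distinct reasons in first-seen order. A minimal sketch, where `ReasonDedup` is a hypothetical helper; note that it collapses the sort failures to one message but cannot collapse the query-parsing failures while the index name is embedded in each reason string:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical client-side helper: collapse identical per-shard reasons. */
final class ReasonDedup {

    /** Returns the distinct reasons, preserving the order in which they were first seen. */
    static List<String> distinctReasons(List<String> shardReasons) {
        // LinkedHashSet drops exact duplicates while keeping insertion order.
        return new ArrayList<>(new LinkedHashSet<>(shardReasons));
    }
}
```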

Figuring out which reason to use
We need to pick a simple error to show the user. As a human I'm pretty good at scanning that object and finding the subjectively "simple" reason, but we'd need a rule for doing the same programmatically.

From these two examples I would probably pick the "deepest" type we understand. For example, if we understand search_parse_exception we would dig into the sort error until we got to the deepest one: No mapping found for [foo] in order to sort on

In the query example, we'd say we still understand search_parse_exception, but we also know query_parsing_exception, and because it's deeper, we'll use its reason. This would depend on us blacklisting parse_exception and assuming that wherever a parse_exception could occur we should never show its reason, because it's simply too verbose.
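The "deepest type we understand" rule can be sketched as a single walk down the `caused_by` chain. This is a hypothetical illustration (`ReasonPicker`, `Level`, and the `known`/`blacklist` sets are assumptions, not an Elasticsearch or Kibana API):

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the "deepest understood type" rule. */
final class ReasonPicker {

    /** One level of a caused_by chain. */
    static final class Level {
        final String type;
        final String reason;

        Level(String type, String reason) {
            this.type = type;
            this.reason = reason;
        }
    }

    /**
     * Walks the chain from outermost to innermost and returns the reason of the
     * deepest level whose type is known and not blacklisted, or null if none is.
     */
    static String pickReason(List<Level> chain, Set<String> known, Set<String> blacklist) {
        String picked = null;
        for (Level level : chain) {
            if (known.contains(level.type) && !blacklist.contains(level.type)) {
                picked = level.reason; // deeper matches overwrite shallower ones
            }
        }
        return picked;
    }
}
```

With parse_exception blacklisted, the query chain above yields the query_parsing_exception reason, and the sort chain yields "No mapping found for [foo] in order to sort on".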

Ideally we could do this on the Elasticsearch side and float the "simple" error to the top somehow, but I don't know that my rule devised from 2 examples is going to apply globally 😄

s1monw · 2015-04-18T12:32:22Z

@rashidkpc thanks a lot for your feedback. I tried another iteration of this, including some possible solutions to your problems:

De-duping the reason
I added optional deduplication of shard failures that returns only one of the shard failures when there are multiple identical ones, like this:

GET /_search/?q=::&pretty=true&group_shard_failures=true
{
  "error" : {
    "human_error" : [ "Failed to parse query [::]" ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "failed_shards" : [ {
      "shard" : 0,
      "index" : "foo",
      "node" : "7iBpoq5uTWyhDxMtBLHkVw",
      "reason" : {
        "type" : "search_parse_exception",
        "reason" : "Failed to parse source [{\"query\":{\"query_string\":{\"query\":\"::\",\"lowercase_expanded_terms\":true,\"analyze_wildcard\":false}}}]",
        "caused_by" : {
          "type" : "query_parsing_exception",
          "reason" : "Failed to parse query [::]",
          "index" : "foo",
          "caused_by" : {
            "type" : "parse_exception",
            "reason" : "Cannot parse '::': Encountered \" \":\" \": \"\" at line 1, ... ",
            "caused_by" : {
              "type" : "parse_exception",
              "reason" : "Encountered \" \":\" \": \"\" at line 1,... "
            }
          }
        }
      }
    } ]
  },
  "status" : 400
}

I added a best-effort, human-readable error at the top level that uses all the unique lowest-level Elasticsearch-created exceptions (the first level we control). It looks like this:

GET /_search/?q=::&pretty=true&group_shard_failures=true
{
  "error" : {
    "human_error" : [ "Failed to parse query [::]" ],
    ....
  }
}

GET /_search/asf?pretty=true
{
  "error" : {
    "human_error" : [ "No feature for name [asf]" ],
    "type" : "elasticsearch_illegal_argument_exception",
    "reason" : "No feature for name [asf]"
  },
  "status" : 400
}

GET /bar/_search/?pretty=true
{
  "error" : {
    "human_error" : [ "no such index" ],
    "type" : "index_missing_exception",
    "reason" : "no such index",
    "index" : "bar"
  },
  "status" : 404
}

GET /_search?pretty=true&group_shard_failures=true -d '{"sort" : [{"foo":{}}]}'
{
  "error" : {
    "human_error" : [ "No mapping found for [foo] in order to sort on" ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "failed_shards" : [ {
      "shard" : 0,
      "index" : "foo",
      "node" : "7iBpoq5uTWyhDxMtBLHkVw",
      "reason" : {
        "type" : "search_parse_exception",
        "reason" : "Failed to parse source [{\"sort\" : [{\"foo\":{}}]}]",
        "caused_by" : {
          "type" : "search_parse_exception",
          "reason" : "No mapping found for [foo] in order to sort on"
        }
      }
    } ]
  },
  "status" : 400
}

These are all best-effort, simple heuristics, but I think they are an improvement over what we have today.
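The top-level human-readable error described above takes, for each shard failure, the lowest-level exception that Elasticsearch itself created, then keeps the unique ones. A minimal sketch of that heuristic; `RootCauses` is a hypothetical name, and the `ours` predicate stands in for whatever check identifies "an exception we control" (for example an `instanceof` test against Elasticsearch's own exception base class), which is an assumption here:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of the "lowest-level exception we control" heuristic. */
final class RootCauses {

    /** Returns the deepest throwable in t's cause chain accepted by `ours`, or t if none is. */
    static Throwable deepestOwned(Throwable t, Predicate<Throwable> ours) {
        Throwable result = t;
        // Identity set guards against (rare) cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (ours.test(cur)) {
                result = cur;
            }
        }
        return result;
    }

    /** Distinct "SimpleName: message" root causes across all shard failures, first-seen order. */
    static List<String> rootCauses(List<? extends Throwable> shardFailures, Predicate<Throwable> ours) {
        Set<String> unique = new LinkedHashSet<>();
        for (Throwable failure : shardFailures) {
            Throwable root = deepestOwned(failure, ours);
            unique.add(root.getClass().getSimpleName() + ": " + root.getMessage());
        }
        return new ArrayList<>(unique);
    }
}
```

Stopping at the deepest exception we own, rather than the deepest exception overall, is what keeps third-party messages such as the verbose parser output out of the summary.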

kimchy · 2015-04-22T20:51:32Z

src/main/java/org/elasticsearch/ElasticsearchException.java

+    }
+
+    protected static String getExceptionName(Throwable ex) {
+        return Strings.toUnderscoreCase(ex.getClass().getSimpleName());


Can we remove the "elasticsearch_" prefix if it exists? I think it will be cleaner. Down the road, we can also remove the specific ElasticsearchIllegalArgumentException and similar classes. They were added historically to get the correct status code, but now we also identify IllegalArgumentException and return the correct status code, so they are no longer needed.
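The suggested prefix removal can be sketched as a post-processing step on the underscore-cased class name. `ExceptionNames` is hypothetical, and its `toUnderscoreCase` is a simple stand-in for Elasticsearch's own `Strings.toUnderscoreCase`, whose exact behavior may differ:

```java
/** Hypothetical sketch of the prefix removal; not an Elasticsearch class. */
final class ExceptionNames {
    private static final String PREFIX = "elasticsearch_";

    /** Stand-in converter: "ElasticsearchIllegalArgumentException" -> "elasticsearch_illegal_argument_exception". */
    static String toUnderscoreCase(String camel) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < camel.length(); i++) {
            char c = camel.charAt(i);
            if (Character.isUpperCase(c) && i > 0) {
                sb.append('_');
            }
            sb.append(Character.toLowerCase(c));
        }
        return sb.toString();
    }

    /** Underscore-cases a simple class name and drops a leading "elasticsearch_" if present. */
    static String exceptionName(String simpleName) {
        String name = toUnderscoreCase(simpleName);
        return name.startsWith(PREFIX) ? name.substring(PREFIX.length()) : name;
    }

    static String exceptionName(Throwable ex) {
        return exceptionName(ex.getClass().getSimpleName());
    }
}
```

With the prefix stripped, ElasticsearchIllegalArgumentException and a plain IllegalArgumentException both render as illegal_argument_exception, which matches the later response examples.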

kimchy · 2015-04-22T20:59:58Z

LGTM except for the minor comment on the elasticsearch_ prefix, great stuff!

s1monw · 2015-04-22T21:00:15Z

w00t

s1monw · 2015-04-23T08:46:34Z

@kimchy @clintongormley @rashidkpc I think we are ready here; any more comments on the API? The changes I made are:

group_shard_failures defaults to true
human_error is cut over to root_cause including a type (see below)
elasticsearch_ prefix is removed...

GET /_search/asf?q=::&pretty=true
{
  "error" : {
    "root_cause" : [ {
      "type" : "illegal_argument_exception",
      "reason" : "No feature for name [asf]"
    } ],
    "type" : "illegal_argument_exception",
    "reason" : "No feature for name [asf]"
  },
  "status" : 400
}

Grouped failures are reflected in the response (see "grouped" : true). Use group_shard_failures=false in the request to get all the shard failures:

GET /_search?q=::&pretty=true
{
  "error" : {
    "root_cause" : [ {
      "type" : "search_phase_execution_exception",
      "reason" : "all shards failed"
    } ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [ {
      "shard" : 0,
      "index" : "foo",
      "node" : "SvSibBJsTGKcgfyIDchoyA",
      "reason" : {
        "type" : "search_parse_exception",
        "reason" : "Failed to parse source [{\"query\":{\"query_string\":{\"query\":\"::\",\"lowercase_expanded_terms\":true,\"analyze_wildcard\":false}}}]",
        "caused_by" : {
          "type" : "query_parsing_exception",
          "reason" : "Failed to parse query [::]",
          "index" : "foo",
          "caused_by" : {
            "type" : "parse_exception",
            "reason" : "Cannot parse '::': Encountered \" \":\" \": \"\" at line 1, column 0.\n..  ",
            "caused_by" : {
              "type" : "parse_exception",
              "reason" : "Encountered \" \":\" \": \"\" at line 1, column 0.\n..."
            }
          }
        }
      }
    } ]
  },
  "status" : 400
}

I think we are ready here... we can now improve stuff as we go...

clintongormley · 2015-04-23T10:21:20Z

@s1monw In your second example in #10643 (comment), surely the root_cause should be parse_exception, not search_phase_execution_exception?

An improvement in the second example (and not necessarily as part of this PR) might be to mark a particular exception in the tree as the one to use for the root cause, eg it may make more sense to stop at query_parsing_exception (with an improved reason) instead of going all the way down to the final parse_exception.

s1monw · 2015-04-23T13:23:12Z

An improvement in the second example (and not necessarily as part of this PR) might be to mark a particular exception in the tree as the one to use for the root cause, eg it may make more sense to stop at query_parsing_exception (with an improved reason) instead of going all the way down to the final parse_exception.

I guess it would make more sense to just not render the verbose part if verbose=false?

s1monw · 2015-04-23T13:25:01Z

@s1monw In your second example in #10643 (comment), surely the root_cause should be parse_exception, not search_phase_execution_exception?

Yeah, that is right! I fixed it; it now looks like this:

GET /_search?q=::&pretty=true

{
  "error" : {
    "root_cause" : [ {
      "type" : "query_parsing_exception",
      "reason" : "Failed to parse query [:::]",
      "index" : "foo"
    } ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [ {
      "shard" : 0,
      "index" : "foo",
      "node" : "CdPVI-Y-QHyBRctNFXVmSA",
      "reason" : {
        "type" : "search_parse_exception",
        "reason" : "Failed to parse source [{\"query\":{\"query_string\":{\"query\":\":::\",\"lowercase_expanded_terms\":true,\"analyze_wildcard\":false}}}]",
        "caused_by" : {
          "type" : "query_parsing_exception",
          "reason" : "Failed to parse query [:::]",
          "index" : "foo",
          "caused_by" : {
            "type" : "parse_exception",
            "reason" : "Cannot parse ':::': Encountered \" \":\" \": \"\" at line 1, column 0.\nWas expecting one...",
            "caused_by" : {
              "type" : "parse_exception",
              "reason" : "Encountered \" \":\" \": \"\" at line 1, column 0.\nWas expecting one... "
            }
          }
        }
      }
    } ]
  },
  "status" : 400
}

s1monw · 2015-04-23T19:58:55Z

Unless anybody objects, I'll push this tomorrow morning CEST.

clintongormley · 2015-04-23T20:06:39Z

+1

This commit adds support for structural errors / failures / exceptions on the Elasticsearch REST layer. Exceptions are rendered with at least a `type` and a `reason`, corresponding to the exception name and the message. Some exceptions, like the ones associated with an index or a shard, will have additional information about the index or shard the exception was triggered on. Each rendered response will also contain a list of root causes, which is a list of distinct shard-level errors returned for the request. Root causes are the lowest-level Elasticsearch exception found per shard response and are intended to be displayed to the user to indicate the source of the exception. Shard-level responses are grouped by default by their type and reason to reduce the number of duplicates returned. However, the same exception returned from different indices will not be grouped. Closes elastic#3303
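The grouping rule in the commit message (group by type and reason, but never across indices) can be sketched by keying on the triple (index, type, reason). `ShardFailureGrouping` and its `ShardFailure` class are hypothetical simplifications, not the code merged here:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the grouping rule; not the code merged in this PR. */
final class ShardFailureGrouping {

    /** Simplified shard failure: where it happened plus its type and reason. */
    static final class ShardFailure {
        final String index;
        final int shard;
        final String type;
        final String reason;

        ShardFailure(String index, int shard, String type, String reason) {
            this.index = index;
            this.shard = shard;
            this.type = type;
            this.reason = reason;
        }
    }

    /**
     * Keeps the first failure per (index, type, reason). Including the index in the
     * key means identical failures from different indices stay separate.
     */
    static List<ShardFailure> group(List<ShardFailure> failures) {
        Map<List<String>, ShardFailure> firstByKey = new LinkedHashMap<>();
        for (ShardFailure f : failures) {
            // Arrays.asList gives element-wise equals/hashCode, so it works as a composite key.
            List<String> key = Arrays.asList(f.index, f.type, f.reason);
            firstByKey.putIfAbsent(key, f);
        }
        return new ArrayList<>(firstByKey.values());
    }
}
```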

s1monw added v2.0.0-beta1 discuss WIP labels Apr 17, 2015

s1monw assigned rashidkpc Apr 18, 2015

kimchy reviewed Apr 22, 2015

s1monw added >enhancement and removed discuss WIP labels Apr 23, 2015

s1monw force-pushed the better_error_msg branch from af1f123 to 15d58d9 April 24, 2015 07:41

s1monw merged commit 15d58d9 into elastic:master Apr 24, 2015

s1monw added :Core/Infra/REST API REST infrastructure and utilities :Exceptions labels Apr 24, 2015

s1monw mentioned this pull request Apr 24, 2015

Cleaner query parse error feedback #7891 (closed)

clintongormley added the release highlight label Apr 25, 2015

karmi mentioned this pull request Apr 27, 2015

Provide error class for ConcurrentSnapshotExecutionException? elastic/elasticsearch-ruby#89 (closed)

rashidkpc mentioned this pull request May 6, 2015

ability to send invalid query_string elastic/kibana#3764 (closed)

clintongormley added >feature and removed >enhancement :Exceptions labels Jun 6, 2015

tbragin mentioned this pull request Jul 30, 2015

Take advantage of the structured errors in Elasticsearch 2.0 elastic/kibana#4536 (closed)

lextoumbourou mentioned this pull request May 1, 2016

Elasticsearch 2.x support? lextoumbourou/txes2#9 (closed)

rylnd mentioned this pull request Oct 19, 2020

EQL: Error responses do not include caused_by fields #63855 (closed)
