Render REST errors in a structural way #10643

Merged 1 commit on Apr 24, 2015

@s1monw
Contributor

s1monw commented Apr 17, 2015

This is a first cut aiming at progress rather than perfection for #3303

It renders errors on the REST layer as JSON to allow more insight into the error message, i.e.
a query against a missing index renders like this:

GET localhost:9200/foo1/_search/?q=~3&pretty=true
{
  "error" : {
    "type" : "index_missing_exception",
    "reason" : "[foo1] missing"
  },
  "status" : 404
}

or a search phase exception:

GET localhost:9200/_search?q=~3&pretty=true
{
  "error" : {
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "failed_shards" : [ {
      "shard" : 0,
      "index" : "foo",
      "node" : "0-75uocCQhyX4clLGbE6OQ",
      "reason" : {
        "type" : "search_parse_exception",
        "reason" : "Failed to parse source [{\"query\":{\"query_string\":{\"query\":\"~3\",\"lowercase_expanded_terms\":true,\"analyze_wildcard\":false}}}]",
        "caused_by" : {
          "type" : "query_parsing_exception",
          "reason" : "[foo] Failed to parse query [~3]",
          "caused_by" : {
            "type" : "parse_exception",
            "reason" : "Cannot parse '~3': Encountered \" <FUZZY_SLOP> \"~3 \"\" at line 1, column 0.\nWas expecting one of:\n    <NOT> ...\n    \"+\" ...\n    \"-\" ...\n    <BAREOPER> ...\n    \"(\" ...\n    \"*\" ...\n    <QUOTED> ...\n    <TERM> ...\n    <PREFIXTERM> ...\n    <WILDTERM> ...\n    <REGEXPTERM> ...\n    \"[\" ...\n    \"{\" ...\n    <NUMBER> ...\n    <TERM> ...\n    \"*\" ...\n    ",
            "caused_by" : {
              "type" : "parse_exception",
              "reason" : "Encountered \" <FUZZY_SLOP> \"~3 \"\" at line 1, column 0.\nWas expecting one of:\n    <NOT> ...\n    \"+\" ...\n    \"-\" ...\n    <BAREOPER> ...\n    \"(\" ...\n    \"*\" ...\n    <QUOTED> ...\n    <TERM> ...\n    <PREFIXTERM> ...\n    <WILDTERM> ...\n    <REGEXPTERM> ...\n    \"[\" ...\n    \"{\" ...\n    <NUMBER> ...\n    <TERM> ...\n    \"*\" ...\n    "
            }
          }
        }
      }
    }]
  },
  "status" : 400
}

@rashidkpc

Feedback after a few minutes of fiddling: definitely an improvement! We can dig into the objects until we find a type that we know about, e.g. query_parsing_exception, and create our own error message from there.

I don't know much of the code but I'm guessing getMessage() is responsible for "[foo] Failed to parse query [~3]"? That's where things probably get a bit hectic?

De-duping the reason
For example, in this first request we're asking logstash-* a question, which means we're going to get a reason for every query_parsing_exception in every shard of every index that matches that pattern. The per-shard repetition isn't a problem, since all of those reasons will be the same. But since the index name is embedded in the string, it will cause a different message for each index, even though they all failed for the same reason.

GET /logstash-*/_search/?q=~3&pretty=true

[...]
{
            "shard": 0,
            "index": "logstash-2015.04.09",
            "node": "C1kmGmoPTHmdfQIpIa_p0Q",
            "reason": {
               "type": "search_parse_exception",
               "reason": "Failed to parse source [{\"query\":{\"query_string\":{\"query\":\"~3\",\"lowercase_expanded_terms\":true,\"analyze_wildcard\":false}}}]",
               "caused_by": {
                  "type": "query_parsing_exception",
                  "reason": "[logstash-2015.04.09] Failed to parse query [~3]",
                  "caused_by": {
                     "type": "parse_exception",
                     "reason": "Cannot parse '~3': Encountered \" <FUZZY_SLOP> \"~3 \"\" at line 1, column 0.\nWas expecting one of:\n    <NOT> ...\n    \"+\" ...\n    \"-\" ...\n    <BAREOPER> ...\n    \"(\" ...\n    \"*\" ...\n    <QUOTED> ...\n    <TERM> ...\n    <PREFIXTERM> ...\n    <WILDTERM> ...\n    <REGEXPTERM> ...\n    \"[\" ...\n    \"{\" ...\n    <NUMBER> ...\n    <TERM> ...\n    \"*\" ...\n    ",
                     "caused_by": {
                        "type": "parse_exception",
                        "reason": "Encountered \" <FUZZY_SLOP> \"~3 \"\" at line 1, column 0.\nWas expecting one of:\n    <NOT> ...\n    \"+\" ...\n    \"-\" ...\n    <BAREOPER> ...\n    \"(\" ...\n    \"*\" ...\n    <QUOTED> ...\n    <TERM> ...\n    <PREFIXTERM> ...\n    <WILDTERM> ...\n    <REGEXPTERM> ...\n    \"[\" ...\n    \"{\" ...\n    <NUMBER> ...\n    <TERM> ...\n    \"*\" ...\n    "
                     }
                  }
               }
            }
         },
[...]

If we look at a sort failure, we have a much better scenario. The message will be the same for every index and every shard, so we can de-dupe them and tell the user "No mapping found for [foo] in order to sort on":


[...]
{
            "shard": 0,
            "index": "logstash-2015.04.09",
            "node": "C1kmGmoPTHmdfQIpIa_p0Q",
            "reason": {
               "type": "search_parse_exception",
               "reason": "Failed to parse source [{\"sort\":[{\"foo\":{}}]}]",
               "caused_by": {
                  "type": "search_parse_exception",
                  "reason": "No mapping found for [foo] in order to sort on"
               }
            }
         },
[...]
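
The de-duplication problem described above can be sketched on the client side. This is only an illustration in Python, not anything from the PR: the helper names `find_reason` and `unique_messages` are hypothetical, the second index name `logstash-2015.04.10` is made up, and the failure objects are abbreviated versions of the ones shown above.

```python
def find_reason(reason, wanted_type=None):
    """Walk the caused_by chain of one shard failure.

    With wanted_type, return the reason of the first exception of that type
    (or None if there is none); without it, return the innermost reason.
    """
    node = reason
    while True:
        if wanted_type is not None and node["type"] == wanted_type:
            return node["reason"]
        if "caused_by" not in node:
            return node["reason"] if wanted_type is None else None
        node = node["caused_by"]


def unique_messages(failed_shards, wanted_type=None):
    """Collect distinct messages across shard failures, in first-seen order."""
    messages = []
    for shard in failed_shards:
        message = find_reason(shard["reason"], wanted_type)
        if message is not None and message not in messages:
            messages.append(message)
    return messages


def sort_failure(index):
    # Abbreviated copy of the sort failure shown above.
    return {"shard": 0, "index": index, "reason": {
        "type": "search_parse_exception",
        "reason": "Failed to parse source [...]",
        "caused_by": {
            "type": "search_parse_exception",
            "reason": "No mapping found for [foo] in order to sort on"}}}


def query_failure(index):
    # Abbreviated copy of the query failure; the index name is embedded in the reason.
    return {"shard": 0, "index": index, "reason": {
        "type": "search_parse_exception",
        "reason": "Failed to parse source [...]",
        "caused_by": {
            "type": "query_parsing_exception",
            "reason": "[%s] Failed to parse query [~3]" % index}}}


# The second index name is hypothetical, added only to show the effect.
indices = ["logstash-2015.04.09", "logstash-2015.04.10"]

sort_messages = unique_messages([sort_failure(i) for i in indices])
query_messages = unique_messages(
    [query_failure(i) for i in indices], "query_parsing_exception")

print(sort_messages)   # one message for both indices
print(query_messages)  # one message per index, because the index name differs
```

With the index name embedded in the reason string, the query failures cannot be collapsed by plain string comparison, while the sort failures collapse to a single message.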

Figuring out which reason to use
We need to pick a simple error to show the user. As a human I'm pretty good at scanning that object and finding the subjectively "simple" reason, but we'd need a rule for doing the same programmatically.

From these two examples I would probably pick the "deepest" type we understand. For example, if we understand search_parse_exception we would dig into the sort error until we got to the deepest one: No mapping found for [foo] in order to sort on

In the query example, we'd say we still understand search_parse_exception, but we know query_parsing_exception also, and because it's deeper, we'll use its reason. This would depend on us blacklisting parse_exception and making the assumption that anywhere a parse_exception could occur we should never show its reason because it's simply too verbose.

Ideally we could do this on the Elasticsearch side and float the "simple" error to the top somehow, but I don't know that my rule devised from 2 examples is going to apply globally 😄
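
The "deepest type we understand" rule could look roughly like the following Python sketch. The set `KNOWN_TYPES` and the function `simplest_reason` are hypothetical names chosen for illustration; parse_exception is "blacklisted" simply by leaving it out of the known set. The example objects abbreviate the long reasons from the responses above.

```python
# Types the client knows how to present; an assumption for this sketch.
KNOWN_TYPES = {"search_parse_exception", "query_parsing_exception"}


def simplest_reason(reason, known_types=KNOWN_TYPES):
    """Return the reason of the deepest exception whose type is known.

    Walks the caused_by chain from the outside in and remembers the last
    known type it passes. Returns None if no type in the chain is known.
    """
    best = None
    node = reason
    while node is not None:
        if node.get("type") in known_types:
            best = node["reason"]
        node = node.get("caused_by")
    return best


sort_error = {
    "type": "search_parse_exception",
    "reason": "Failed to parse source [...]",  # abbreviated
    "caused_by": {
        "type": "search_parse_exception",
        "reason": "No mapping found for [foo] in order to sort on",
    },
}

query_error = {
    "type": "search_parse_exception",
    "reason": "Failed to parse source [...]",  # abbreviated
    "caused_by": {
        "type": "query_parsing_exception",
        "reason": "[foo] Failed to parse query [~3]",
        "caused_by": {
            "type": "parse_exception",
            "reason": "Cannot parse '~3': ...",  # abbreviated
            "caused_by": {
                "type": "parse_exception",
                "reason": "Encountered ...",  # abbreviated
            },
        },
    },
}

print(simplest_reason(sort_error))
print(simplest_reason(query_error))
```

For the sort error this picks the inner search_parse_exception, and for the query error it stops at query_parsing_exception because the parse_exception levels below it are not in the known set.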

@s1monw
Contributor Author

s1monw commented Apr 18, 2015

@rashidkpc thanks a lot for your feedback. I tried another iteration on this, including some possible solutions to your problems:

De-duping the reason
I added an optional deduplication for the shard failures that returns only one of the shard failures if there are multiple, like this:

GET /_search/?q=::&pretty=true&group_shard_failures=true
{
  "error" : {
    "human_error" : [ "Failed to parse query [::]" ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "failed_shards" : [ {
      "shard" : 0,
      "index" : "foo",
      "node" : "7iBpoq5uTWyhDxMtBLHkVw",
      "reason" : {
        "type" : "search_parse_exception",
        "reason" : "Failed to parse source [{\"query\":{\"query_string\":{\"query\":\"::\",\"lowercase_expanded_terms\":true,\"analyze_wildcard\":false}}}]",
        "caused_by" : {
          "type" : "query_parsing_exception",
          "reason" : "Failed to parse query [::]",
          "index" : "foo",
          "caused_by" : {
            "type" : "parse_exception",
            "reason" : "Cannot parse '::': Encountered \" \":\" \": \"\" at line 1, ... ",
            "caused_by" : {
              "type" : "parse_exception",
              "reason" : "Encountered \" \":\" \": \"\" at line 1,... "
            }
          }
        }
      }
    } ]
  },
  "status" : 400
}

I added a best effort human readable error on the top level that basically uses all the unique lowest level Elasticsearch created exceptions (the first level we control) which looks like this:

GET /_search/?q=::&pretty=true&group_shard_failures=true
{
  "error" : {
    "human_error" : [ "Failed to parse query [::]" ],
    ....
  }
}

GET /_search/asf?pretty=true
{
  "error" : {
    "human_error" : [ "No feature for name [asf]" ],
    "type" : "elasticsearch_illegal_argument_exception",
    "reason" : "No feature for name [asf]"
  },
  "status" : 400
}

GET /bar/_search/?pretty=true
{
  "error" : {
    "human_error" : [ "no such index" ],
    "type" : "index_missing_exception",
    "reason" : "no such index",
    "index" : "bar"
  },
  "status" : 404
}

GET /_search?pretty=true&group_shard_failures=true' -d '{"sort" : [{"foo":{}}]}'
{
  "error" : {
    "human_error" : [ "No mapping found for [foo] in order to sort on" ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "failed_shards" : [ {
      "shard" : 0,
      "index" : "foo",
      "node" : "7iBpoq5uTWyhDxMtBLHkVw",
      "reason" : {
        "type" : "search_parse_exception",
        "reason" : "Failed to parse source [{\"sort\" : [{\"foo\":{}}]}]",
        "caused_by" : {
          "type" : "search_parse_exception",
          "reason" : "No mapping found for [foo] in order to sort on"
        }
      }
    } ]
  },
  "status" : 400
}

Those are all best-effort, simple heuristics, but I think it's an improvement over what we have today.

}

protected static String getExceptionName(Throwable ex) {
    return Strings.toUnderscoreCase(ex.getClass().getSimpleName());
Member

Can we remove an "elasticsearch_" prefix if it exists? I think it will be cleaner. Down the road, we can also remove the specific ElasticsearchIllegalArgumentException and such; it was added historically to get the correct status code, but now we also identify IllegalArgumentException and return the correct status code, so those classes are no longer needed.

Contributor Author

++
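
The naming rule under discussion (class simple name to underscore case, with an optional "elasticsearch_" prefix stripped) can be illustrated with a small Python sketch. This is not the Java `Strings.toUnderscoreCase` implementation; `exception_name` is a hypothetical helper that only approximates the behaviour visible in the example responses.

```python
import re


def exception_name(class_name, prefix="elasticsearch_"):
    """Convert a CamelCase class name to underscore case and drop the prefix."""
    # Insert "_" at each boundary between a lowercase letter or digit and an
    # uppercase letter, then lowercase the whole name.
    underscored = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", class_name).lower()
    if underscored.startswith(prefix):
        underscored = underscored[len(prefix):]
    return underscored


print(exception_name("IndexMissingException"))
print(exception_name("ElasticsearchIllegalArgumentException"))
```

With the prefix removed, ElasticsearchIllegalArgumentException and a plain IllegalArgumentException map to the same type string, which is what makes the Elasticsearch-specific class redundant from the client's point of view.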

@kimchy
Member

kimchy commented Apr 22, 2015

LGTM except for the minor comment on the elasticsearch_ prefix, great stuff!

@s1monw
Contributor Author

s1monw commented Apr 22, 2015

w00t

@s1monw
Contributor Author

s1monw commented Apr 23, 2015

@kimchy @clintongormley @rashidkpc I think we are ready here. Any more comments on the API? The changes I made are:

  • group_shard_failures defaults to true
  • human_error is cut over to root_cause including a type (see below)
  • elasticsearch_ prefix is removed...
GET /_search/asf?q=::&pretty=true
{
  "error" : {
    "root_cause" : [ {
      "type" : "illegal_argument_exception",
      "reason" : "No feature for name [asf]"
    } ],
    "type" : "illegal_argument_exception",
    "reason" : "No feature for name [asf]"
  },
  "status" : 400
}
  • Grouped failures are reflected in the response (see "grouped" : true); use group_shard_failures=false in the request to get all the shard failures
GET /_search?q=::&pretty=true
{
  "error" : {
    "root_cause" : [ {
      "type" : "search_phase_execution_exception",
      "reason" : "all shards failed"
    } ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [ {
      "shard" : 0,
      "index" : "foo",
      "node" : "SvSibBJsTGKcgfyIDchoyA",
      "reason" : {
        "type" : "search_parse_exception",
        "reason" : "Failed to parse source [{\"query\":{\"query_string\":{\"query\":\"::\",\"lowercase_expanded_terms\":true,\"analyze_wildcard\":false}}}]",
        "caused_by" : {
          "type" : "query_parsing_exception",
          "reason" : "Failed to parse query [::]",
          "index" : "foo",
          "caused_by" : {
            "type" : "parse_exception",
            "reason" : "Cannot parse '::': Encountered \" \":\" \": \"\" at line 1, column 0.\n..  ",
            "caused_by" : {
              "type" : "parse_exception",
              "reason" : "Encountered \" \":\" \": \"\" at line 1, column 0.\n..."
            }
          }
        }
      }
    } ]
  },
  "status" : 400
}

I think we are ready here... we can now improve stuff as we go...

@clintongormley

@s1monw In your second example in #10643 (comment), surely the root_cause should be parse_exception, not search_phase_execution_exception?

An improvement in the second example (and not necessarily as part of this PR) might be to mark a particular exception in the tree as the one to use for the root cause, eg it may make more sense to stop at query_parsing_exception (with an improved reason) instead of going all the way down to the final parse_exception.

@s1monw
Contributor Author

s1monw commented Apr 23, 2015

> An improvement in the second example (and not necessarily as part of this PR) might be to mark a particular exception in the tree as the one to use for the root cause, eg it may make more sense to stop at query_parsing_exception (with an improved reason) instead of going all the way down to the final parse_exception.

I guess it would make more sense to just not render the verbose part if verbose=false?

@s1monw
Contributor Author

s1monw commented Apr 23, 2015

> @s1monw In your second example in #10643 (comment), surely the root_cause should be parse_exception, not search_phase_execution_exception?

yeah that is right! I fixed it, it now looks like this:

GET /_search?q=::&pretty=true

{
  "error" : {
    "root_cause" : [ {
      "type" : "query_parsing_exception",
      "reason" : "Failed to parse query [:::]",
      "index" : "foo"
    } ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [ {
      "shard" : 0,
      "index" : "foo",
      "node" : "CdPVI-Y-QHyBRctNFXVmSA",
      "reason" : {
        "type" : "search_parse_exception",
        "reason" : "Failed to parse source [{\"query\":{\"query_string\":{\"query\":\":::\",\"lowercase_expanded_terms\":true,\"analyze_wildcard\":false}}}]",
        "caused_by" : {
          "type" : "query_parsing_exception",
          "reason" : "Failed to parse query [:::]",
          "index" : "foo",
          "caused_by" : {
            "type" : "parse_exception",
            "reason" : "Cannot parse ':::': Encountered \" \":\" \": \"\" at line 1, column 0.\nWas expecting one...",
            "caused_by" : {
              "type" : "parse_exception",
              "reason" : "Encountered \" \":\" \": \"\" at line 1, column 0.\nWas expecting one... "
            }
          }
        }
      }
    } ]
  },
  "status" : 400
}
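
A client could turn a response of this shape into user-facing messages along these lines. This is a minimal Python sketch under assumptions: `user_messages` is a hypothetical helper, it falls back to the top-level error when no root_cause list is present, and the example bodies are trimmed copies of responses shown in this thread.

```python
def user_messages(body):
    """Return (status, messages) for a structured error response body."""
    error = body["error"]
    # Prefer the root_cause list; fall back to the top-level error itself.
    causes = error.get("root_cause") or [error]
    messages = []
    for cause in causes:
        text = cause["reason"]
        if "index" in cause:
            # Put the index name in front, since it is a separate field.
            text = "[{}] {}".format(cause["index"], text)
        if text not in messages:
            messages.append(text)
    return body["status"], messages


search_failure = {
    "error": {
        "root_cause": [{
            "type": "query_parsing_exception",
            "reason": "Failed to parse query [:::]",
            "index": "foo",
        }],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": True,
        "failed_shards": [],  # omitted here for brevity
    },
    "status": 400,
}

missing_index = {
    "error": {
        "type": "index_missing_exception",
        "reason": "no such index",
        "index": "bar",
    },
    "status": 404,
}

print(user_messages(search_failure))
print(user_messages(missing_index))
```

Because the index is carried in its own field rather than inside the reason string, the client can decide whether to show it and can compare reasons across indices directly.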

@s1monw
Contributor Author

s1monw commented Apr 23, 2015

unless anybody objects I'd push this tomorrow morning CEST

@clintongormley

+1

This commit adds support for structural errors / failures / exceptions
on the elasticsearch REST layer. Exceptions are rendered with at least
a `type` and a `reason` corresponding to the exception name and the message.
Some exceptions, like the ones associated with an index or a shard, will have
additional information about the index the exception was triggered on or the
shard, respectively.

Each rendered response will also contain a list of root causes, which is a list
of distinct shard level errors returned for the request. Root causes are the lowest
level elasticsearch exception found per shard response and are intended to be displayed
to the user to indicate the source of the exception.

Shard level responses are by default grouped by their type and reason to reduce the number
of duplicates returned. Yet, the same exception returned from different indices will not be
grouped.

Closes elastic#3303