DocumentMissingException is also thrown on update retries #6724

peschlowp · 2014-07-04T09:12:21Z

Closes Indexing: DocumentMissingException is uncaught if thrown during retry of update request #6355.
Contains the bugfix and an integration test demonstrating the bug and validating the fix.
The fix in TransportUpdateAction is a simple try-catch block calling the appropriate failure handler. The fix prevents DocumentMissingExceptions from ending up uncaught and thus makes sure that an HTTP response is actually sent to the client instead of not sending any at all.

s1monw · 2014-07-08T16:44:18Z

sorry for the delay - I will look into this soon. Did you sign the CLA so I can pull it in once ready?

peschlowp · 2014-07-08T19:00:47Z

Glad to hear that this is going to be reviewed! I recently signed a CLA for contributing some typo fixes to The Definitive Guide - if it is the same CLA, then I'm already done.

kimchy · 2014-07-08T19:03:18Z

src/main/java/org/elasticsearch/action/update/TransportUpdateAction.java

-                                        shardOperation(request, listener, retryCount + 1);
+                                        try {
+                                            shardOperation(request, listener, retryCount + 1);
+                                        } catch (DocumentMissingException e) {


this should just catch Throwable I think

Yeah, catching Throwable might be even better to handle other cases as well. I was restricting myself to fix just the error situation I observed :-)

While at it, we might also consider adding similar try-catch blocks to two other cases (UPSERT, DELETE) of the surrounding switch statement.

If that's the way to go, I can modify my changes accordingly and also add further tests validating behavior for those other error cases. (So far I only added a test for that DocumentMissingException occurring during update request retries.)

yes, make sense

s1monw · 2014-07-09T10:48:38Z

hey,

I really appreciate your testcase though but it's kind of very involved and should use our testing infrastructure. Maybe take a look at ElasticsearchIntegrationTest?

I also wonder if we should fix this in more places I can see 2 more places where we do run shardOperation in a thread. I think maybe we can implement a special Runnable like

public abstract class ActionRunnable implements Runnable {
  protected final ActionListener<?> listener 
  public ActionRunnable(ActionListener<?> listener ){
    this.listener = listener;
  } 
  public final void run() {
    try {
      doRun();
    } catch (Throwable t) {
        listener.onFailure(t);
    }
  }
  protected abstract void doRun();
}

peschlowp · 2014-07-09T12:05:45Z

I'm currently adding test cases for the two other places in TransportUpdateAction, to have the logic in place at all. I'm going to update the pull request once I'm done. It's not a big deal to add the Runnable implementation you provide so I'll do that as well.

Actually, I took a look at ElasticsearchIntegrationTest before starting to write the initial test, but unfortunately it uses a specific implementation of Engine, and I need another one to force the timing between requests. As an Elasticsearch codebase newbie, I didn't want to mess with that so I went for a self-contained solution. Still, I'd be glad to have these tests based on the existing infrastructure. Note that another constraint is that I need to access the Engine instance from the test code, and it is not easy to get the respective injector at that point (as it is the index' injector, not the node's). Do you have a good suggestion for fitting it into ElasticsearchIntegrationTest?

s1monw · 2014-07-09T19:46:58Z

@peschlowp if you want to use a different engine that is pretty simple you can create an index like this:

public void testX() {
  assertAcked(prepareCreate("idx").setSettings(ImmutableSettings.builder()
      .put(IndexEngineModule.EngineSettings.ENGINE_TYPE, YourEngineModule.class.getName());
  // go do your thing...
}

lemme know if it works

peschlowp · 2014-07-10T11:34:21Z

Pushed an update applying the try-catch fix to all three locations, for you to review and give me feedback on two open questions:

I turned my self-contained test class into an ElasticsearchIntegration test. However, while the former implementation was stable, now it seems that some of the random aspects of the test cluster interfere with the timing I'm faking for the tests. Like, some request doesn't reach internal engine even though it's supposed to. I'm not sure how difficult it will be for me to debug this and see what exactly is the problem there. Is there an option to disable some of the randomness so that we can at least assure the tests will be stable and green, then maybe investigate further (if needed at all)?
I was at a loss coming up with a test case for the "delete" case in TransportUpdateAction.shardOperation(), because for the error situation to happen the delete request needs to have a retry_on_conflict value set. Is this possible at all - via the API it doesn't seem to be? So do we need to handle this case at all in case of "delete" requests?

Note that I didn't create a separate ActionRunnable class yet, this can be added when the above points are resolved.

peschlowp · 2014-07-10T13:31:51Z

Sorry for the confusion, I introduced a timing bug in one of my test cases while refactoring it and blamed the integration test infrastructure for it. It is fixed now and the tests are running stable on my machine :-)

Also, I added the ActionRunnable @s1monw proposed.

Aside from the question if the retry logic is needed at all for delete requests (see point 2 in my last comment), I'm satisfied now. If your review also turns out satisfactory, I can squash the various commits to a single one that you can actually pull.

s1monw · 2014-07-10T20:34:26Z

src/test/java/org/elasticsearch/action/update/TransportUpdateActionTest.java

+ * Tests to demonstrate the bug discussed in issue #6355 and pull request #6724. The tests validate the fix for that bug
+ * and similar situations.
+ */
+@ClusterScope(scope = Scope.SUITE, numDataNodes = 1, numClientNodes = 0)


I don't think this is needed you can run with default scope

You are right. Seems I was overly cautious when turning it into an ElasticsearchIntegrationTest. But as long as I only use my own internal engine implementation within my test indexes, everything is fine. Going to remove the annotation line completely.

s1monw · 2014-07-10T20:36:44Z

I left a couple of cosmetic comments I think we are ready to squash good job!

Closes elastic#6355. Also contains an integration test demonstrating the bug and validating the fix.

peschlowp · 2014-07-11T08:38:46Z

OK, all done, including a few more cosmetic changes, and squashed. Feel free to refactor whatever you like after pulling, I probably haven't followed all conventions of the project everywhere.

Happy that I could contribute to Elasticsearch.

s1monw · 2014-07-15T12:56:16Z

thanks for the report and the fix! I will push this in a bit..

Throwables thrown on update retries are now caught and handled via the provided callback. This commit also contains an integration test demonstrating the bug and validating the fix. Closes #6355 Closes #6724

Throwables thrown on update retries are now caught and handled via the provided callback. This commit also contains an integration test demonstrating the bug and validating the fix. Closes elastic#6355 Closes elastic#6724

clintongormley added review labels Jul 4, 2014

s1monw self-assigned this Jul 8, 2014

kimchy reviewed Jul 8, 2014
View reviewed changes

clintongormley removed the adoptme label Jul 10, 2014

s1monw reviewed Jul 10, 2014
View reviewed changes

Throwables thrown on update retries are caught and handled

bcebd86

Closes elastic#6355. Also contains an integration test demonstrating the bug and validating the fix.

s1monw removed the review label Jul 15, 2014

s1monw closed this in 9742d08 Jul 15, 2014

s1monw removed the review label Jul 15, 2014

clintongormley added >bug v2.0.0-beta1 v1.3.0 v1.2.3 :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. labels Jun 7, 2015

DocumentMissingException is also thrown on update retries #6724

DocumentMissingException is also thrown on update retries #6724

Uh oh!

Conversation

peschlowp commented Jul 4, 2014

Uh oh!

s1monw commented Jul 8, 2014

Uh oh!

peschlowp commented Jul 8, 2014

Uh oh!

kimchy Jul 8, 2014

Choose a reason for hiding this comment

Uh oh!

peschlowp Jul 8, 2014

Choose a reason for hiding this comment

Uh oh!

peschlowp Jul 8, 2014

Choose a reason for hiding this comment

Uh oh!

kimchy Jul 9, 2014

Choose a reason for hiding this comment

Uh oh!

s1monw commented Jul 9, 2014

Uh oh!

peschlowp commented Jul 9, 2014

Uh oh!

s1monw commented Jul 9, 2014

Uh oh!

peschlowp commented Jul 10, 2014

Uh oh!

peschlowp commented Jul 10, 2014

Uh oh!

s1monw Jul 10, 2014

Choose a reason for hiding this comment

Uh oh!

peschlowp Jul 11, 2014

Choose a reason for hiding this comment

Uh oh!

s1monw commented Jul 10, 2014

Uh oh!

peschlowp commented Jul 11, 2014

Uh oh!

s1monw commented Jul 15, 2014

Uh oh!

Uh oh!