Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DocumentMissingException is also thrown on update retries #6724

Conversation

peschlowp
Copy link
Contributor

  • Closes Indexing: DocumentMissingException is uncaught if thrown during retry of update request #6355.
  • Contains the bugfix and an integration test demonstrating the bug and validating the fix.
  • The fix in TransportUpdateAction is a simple try-catch block calling the appropriate failure handler. The fix prevents DocumentMissingExceptions from ending up uncaught and thus makes sure that an HTTP response is actually sent to the client instead of not sending any at all.

@s1monw
Copy link
Contributor

s1monw commented Jul 8, 2014

sorry for the delay - I will look into this soon. Did you sign the CLA so I can pull it in once ready?

@s1monw s1monw self-assigned this Jul 8, 2014
@peschlowp
Copy link
Contributor Author

Glad to hear that this is going to be reviewed! I recently signed a CLA for contributing some typo fixes to The Definitive Guide - if it is the same CLA, then I'm already done.

shardOperation(request, listener, retryCount + 1);
try {
shardOperation(request, listener, retryCount + 1);
} catch (DocumentMissingException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should just catch Throwable I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, catching Throwable might be even better to handle other cases as well. I was restricting myself to fix just the error situation I observed :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While at it, we might also consider adding similar try-catch blocks to two other cases (UPSERT, DELETE) of the surrounding switch statement.

If that's the way to go, I can modify my changes accordingly and also add further tests validating behavior for those other error cases. (So far I only added a test for that DocumentMissingException occurring during update request retries.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, make sense

@s1monw
Copy link
Contributor

s1monw commented Jul 9, 2014

hey,

I really appreciate your testcase though but it's kind of very involved and should use our testing infrastructure. Maybe take a look at ElasticsearchIntegrationTest?

I also wonder if we should fix this in more places I can see 2 more places where we do run shardOperation in a thread. I think maybe we can implement a special Runnable like

public abstract class ActionRunnable implements Runnable {
  protected final ActionListener<?> listener 
  public ActionRunnable(ActionListener<?> listener ){
    this.listener = listener;
  } 
  public final void run() {
    try {
      doRun();
    } catch (Throwable t) {
        listener.onFailure(t);
    }
  }
  protected abstract void doRun();
}

@peschlowp
Copy link
Contributor Author

I'm currently adding test cases for the two other places in TransportUpdateAction, to have the logic in place at all. I'm going to update the pull request once I'm done. It's not a big deal to add the Runnable implementation you provide so I'll do that as well.

Actually, I took a look at ElasticsearchIntegrationTest before starting to write the initial test, but unfortunately it uses a specific implementation of Engine, and I need another one to force the timing between requests. As an Elasticsearch codebase newbie, I didn't want to mess with that so I went for a self-contained solution. Still, I'd be glad to have these tests based on the existing infrastructure. Note that another constraint is that I need to access the Engine instance from the test code, and it is not easy to get the respective injector at that point (as it is the index' injector, not the node's). Do you have a good suggestion for fitting it into ElasticsearchIntegrationTest?

@s1monw
Copy link
Contributor

s1monw commented Jul 9, 2014

@peschlowp if you want to use a different engine that is pretty simple you can create an index like this:

public void testX() {
  assertAcked(prepareCreate("idx").setSettings(ImmutableSettings.builder()
      .put(IndexEngineModule.EngineSettings.ENGINE_TYPE, YourEngineModule.class.getName());
  // go do your thing...
}

lemme know if it works

@peschlowp
Copy link
Contributor Author

Pushed an update applying the try-catch fix to all three locations, for you to review and give me feedback on two open questions:

  1. I turned my self-contained test class into an ElasticsearchIntegration test. However, while the former implementation was stable, now it seems that some of the random aspects of the test cluster interfere with the timing I'm faking for the tests. Like, some request doesn't reach internal engine even though it's supposed to. I'm not sure how difficult it will be for me to debug this and see what exactly is the problem there. Is there an option to disable some of the randomness so that we can at least assure the tests will be stable and green, then maybe investigate further (if needed at all)?
  2. I was at a loss coming up with a test case for the "delete" case in TransportUpdateAction.shardOperation(), because for the error situation to happen the delete request needs to have a retry_on_conflict value set. Is this possible at all - via the API it doesn't seem to be? So do we need to handle this case at all in case of "delete" requests?

Note that I didn't create a separate ActionRunnable class yet, this can be added when the above points are resolved.

@peschlowp
Copy link
Contributor Author

Sorry for the confusion, I introduced a timing bug in one of my test cases while refactoring it and blamed the integration test infrastructure for it. It is fixed now and the tests are running stable on my machine :-)

Also, I added the ActionRunnable @s1monw proposed.

Aside from the question if the retry logic is needed at all for delete requests (see point 2 in my last comment), I'm satisfied now. If your review also turns out satisfactory, I can squash the various commits to a single one that you can actually pull.

* Tests to demonstrate the bug discussed in issue #6355 and pull request #6724. The tests validate the fix for that bug
* and similar situations.
*/
@ClusterScope(scope = Scope.SUITE, numDataNodes = 1, numClientNodes = 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed you can run with default scope

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Seems I was overly cautious when turning it into an ElasticsearchIntegrationTest. But as long as I only use my own internal engine implementation within my test indexes, everything is fine. Going to remove the annotation line completely.

@s1monw
Copy link
Contributor

s1monw commented Jul 10, 2014

I left a couple of cosmetic comments I think we are ready to squash good job!

Closes elastic#6355.
Also contains an integration test demonstrating the bug and validating the fix.
@peschlowp
Copy link
Contributor Author

OK, all done, including a few more cosmetic changes, and squashed. Feel free to refactor whatever you like after pulling, I probably haven't followed all conventions of the project everywhere.

Happy that I could contribute to Elasticsearch.

@s1monw s1monw removed the review label Jul 15, 2014
@s1monw
Copy link
Contributor

s1monw commented Jul 15, 2014

thanks for the report and the fix! I will push this in a bit..

s1monw pushed a commit that referenced this pull request Jul 15, 2014
Throwables thrown on update retries are now caught and handled via
the provided callback. This commit also contains an integration test
demonstrating the bug and validating the fix.

Closes #6355
Closes #6724
@s1monw s1monw closed this in 9742d08 Jul 15, 2014
@s1monw s1monw removed the review label Jul 15, 2014
s1monw pushed a commit that referenced this pull request Jul 15, 2014
Throwables thrown on update retries are now caught and handled via
the provided callback. This commit also contains an integration test
demonstrating the bug and validating the fix.

Closes #6355
Closes #6724
@clintongormley clintongormley added >bug v2.0.0-beta1 v1.3.0 v1.2.3 :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. labels Jun 7, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
Throwables thrown on update retries are now caught and handled via
the provided callback. This commit also contains an integration test
demonstrating the bug and validating the fix.

Closes elastic#6355
Closes elastic#6724
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. v1.2.3 v1.3.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Indexing: DocumentMissingException is uncaught if thrown during retry of update request
4 participants