@tzulitai commented May 14, 2020

This PR adds an E2E test that verifies exactly-once semantics with failure recovery.
It is based on #109 so that this new E2E test is also run in Travis.


Please see the class-level docs of ExactlyOnceVerificationModule and ExactlyOnceE2E for the specifics of the app used for verification and of the verification scenario.
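The verification details live in those class docs and are not reproduced here. As a rough illustration of what an exactly-once check can look like, the sketch below assumes (hypothetically; this is not the ExactlyOnceE2E logic) that every output record carries a per-key counter that a function increments once per input message, and that output order is preserved per key. Under those assumptions, a repeated count means a message's effect was applied twice, and a skipped count means an update was lost. The class ExactlyOnceCheck and its Output type are illustrative names only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical exactly-once check; not the verification logic used by ExactlyOnceE2E. */
public final class ExactlyOnceCheck {

  /** One output record: the key and the per-key counter value it reported (assumed shape). */
  public static final class Output {
    final String key;
    final long count;

    public Output(String key, long count) {
      this.key = key;
      this.count = count;
    }
  }

  /**
   * Returns true iff, for every key, the reported counts are exactly 1, 2, ..., n in order,
   * and n matches the number of messages sent for that key.
   */
  public static boolean verify(List<Output> outputs, Map<String, Long> sentPerKey) {
    Map<String, Long> lastSeen = new HashMap<>();
    for (Output o : outputs) {
      long expected = lastSeen.getOrDefault(o.key, 0L) + 1;
      if (o.count != expected) {
        // A repeated count means a duplicated effect; a jump means a lost update.
        return false;
      }
      lastSeen.put(o.key, o.count);
    }
    // Every key must have reached its final count, and no unexpected key may appear.
    return lastSeen.equals(sentPerKey);
  }

  public static void main(String[] args) {
    Map<String, Long> sent = Map.of("a", 2L, "b", 1L);
    List<Output> clean = List.of(new Output("a", 1), new Output("b", 1), new Output("a", 2));
    List<Output> duplicated = List.of(new Output("a", 1), new Output("a", 1), new Output("b", 1));
    System.out.println(verify(clean, sent));      // true
    System.out.println(verify(duplicated, sent)); // false
  }
}
```

Checking the final count against the number of sent messages catches lost updates at the tail, which a purely incremental check would miss.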

Change log

  • 8321db2 refactors StatefulFunctionsAppContainers. While extending that class with the extra functionality this new E2E test requires, it became clear that the class was growing too big and bundling too many responsibilities (test runtime functionality and pre-test configuration). This commit refactors the class to use the builder pattern.
  • fcb0782 to 5ff971f enable checkpointing in apps run by StatefulFunctionsAppContainers.
  • 82867ea adds a utility method to restart specific workers at test runtime.
  • 8a5e5ca adds the function modules / app used for verification.
  • 302db5b adds the actual E2E verification scenario.
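A minimal sketch of the builder idea from 8321db2 combined with the checkpointing switch from fcb0782 to 5ff971f. The class AppContainersSpec, its methods, the app name and the checkpoint URI are hypothetical and do not reflect the actual StatefulFunctionsAppContainers API. Only the two configuration keys (execution.checkpointing.interval and state.checkpoints.dir) are standard Flink options; whether the fixture enables checkpointing through them is an assumption.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, builder-based test fixture spec. Names are illustrative only and are
 * not the StatefulFunctionsAppContainers API.
 */
public final class AppContainersSpec {
  private final String appName;
  private final int numWorkers;
  private final Map<String, String> flinkConf;

  private AppContainersSpec(Builder builder) {
    this.appName = builder.appName;
    this.numWorkers = builder.numWorkers;
    // Copy so later builder calls cannot mutate an already-built spec.
    this.flinkConf = new LinkedHashMap<>(builder.flinkConf);
  }

  public static Builder builder(String appName, int numWorkers) {
    return new Builder(appName, numWorkers);
  }

  public String appName() {
    return appName;
  }

  public int numWorkers() {
    return numWorkers;
  }

  /** Renders the collected options as flink-conf.yaml lines, in insertion order. */
  public String flinkConfYaml() {
    StringBuilder sb = new StringBuilder();
    flinkConf.forEach((k, v) -> sb.append(k).append(": ").append(v).append('\n'));
    return sb.toString();
  }

  public static final class Builder {
    private final String appName;
    private final int numWorkers;
    private final Map<String, String> flinkConf = new LinkedHashMap<>();

    private Builder(String appName, int numWorkers) {
      if (numWorkers < 1) {
        throw new IllegalArgumentException("numWorkers must be at least 1");
      }
      this.appName = appName;
      this.numWorkers = numWorkers;
    }

    /** Turns on periodic checkpoints using standard Flink configuration keys. */
    public Builder enableCheckpointing(Duration interval, String checkpointDirUri) {
      flinkConf.put("execution.checkpointing.interval", interval.toMillis() + "ms");
      flinkConf.put("state.checkpoints.dir", checkpointDirUri);
      return this;
    }

    public AppContainersSpec build() {
      return new AppContainersSpec(this);
    }
  }

  public static void main(String[] args) {
    AppContainersSpec spec =
        AppContainersSpec.builder("exactly-once-verification", 2)
            .enableCheckpointing(Duration.ofSeconds(1), "file:///checkpoint-dir")
            .build();
    System.out.print(spec.flinkConfYaml());
  }
}
```

Keeping pre-test configuration in an immutable spec that is built up front leaves the container class with only runtime concerns, which is the split of responsibilities the refactoring describes.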

Verifying

The Travis build should pass.

In the exposed master logs, you should see that the job fails due to a lost worker, is recovered and restarts.

@tzulitai force-pushed the FLINK-17516 branch 2 times, most recently from fe3bd90 to 289d11b, on May 14, 2020 10:09

@igalshilman left a comment


Thanks @tzulitai, the new test looks good to me.
I have left some comments for your consideration.

```java
worker.start();
}

private static File temporaryCheckpointDir() throws IOException {
```

I understand the need here, but I'm wondering whether creating temp directories can be avoided.
For example, can we create a separate named volume? Does Testcontainers support that?

@tzulitai (author) May 15, 2020

I thought about that, but Testcontainers does not support named volumes, only bind mounts. I guess that's because the docker-java API itself doesn't support named volumes, so Testcontainers, which is built on top of it, doesn't either.

However, even with named volumes, we would ideally still treat them as temporary resources that are deleted after every test run; we don't want to persist anything beyond the test lifecycle.

So, in that sense, there doesn't technically seem to be a difference between using named volumes and bind-mounting temporary host directories for what we need to do here?
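For reference, the temporary-host-directory approach described above can be sketched with plain java.nio: create the directory before the containers start, bind-mount it into the master and workers, and delete it recursively once the test finishes. The class name TemporaryCheckpointDir is hypothetical, and unlike the temporaryCheckpointDir() helper in the diff it hands out a Path rather than a File.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper: a host directory scoped to a single test run, intended to be
 * bind-mounted into the master and worker containers as the shared checkpoint directory.
 */
public final class TemporaryCheckpointDir implements AutoCloseable {
  private final Path dir;

  public TemporaryCheckpointDir() throws IOException {
    this.dir = Files.createTempDirectory("statefun-checkpoints-");
  }

  public Path path() {
    return dir;
  }

  /** Deletes the directory tree, deepest entries first, so nothing outlives the test. */
  @Override
  public void close() throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    List<Path> entries;
    try (Stream<Path> walk = Files.walk(dir)) {
      entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : entries) {
      Files.delete(p);
    }
  }

  public static void main(String[] args) throws IOException {
    Path used;
    try (TemporaryCheckpointDir checkpoints = new TemporaryCheckpointDir()) {
      used = checkpoints.path();
      // Simulate what a container would write through the bind mount.
      Path chk = Files.createDirectories(used.resolve("chk-1"));
      Files.createFile(chk.resolve("_metadata"));
    }
    System.out.println(Files.exists(used)); // false
  }
}
```

With Testcontainers, a host path like this could be mounted via GenericContainer#withFileSystemBind(hostPath, containerPath, BindMode.READ_WRITE); the container-side path is up to the test. Tying deletion to AutoCloseable makes the "temporary resource per test run" lifecycle explicit.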


The difference is subtle: a named volume (on OSX, and probably on Windows) is created on the same filesystem as the containers (in the Linux VM that runs the Docker engine), so things like sharing a Unix domain socket should work (AFAIK).
Also, it seems cleaner to let Docker handle the whole thing rather than touching the host file system directly.
But I understand the limitation, and wouldn't want to overcomplicate this!
So, good to merge!

@tzulitai (author)

@igalshilman I've addressed all comments except for the one about using Docker named volumes. Please let me know what you think; if there are no further objections, I'll merge this PR. Thank you!
