Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-11463][PROTOTYPE] Port Kafka E2E test to Java #7605

Merged
merged 12 commits into from Nov 15, 2019

Conversation

zentol
Copy link
Contributor

@zentol zentol commented Jan 30, 2019

What is the purpose of the change

This PR is a prototype for writing Java-based end-to-end tests.
It contains a port of the Kafka end-to-end test.

"Brief" change log

The first few commits (labeled as hotfixes) are cleanups and extensions around existing code.

[hotfix][build] Remove unnecessary version tag

Removes a redundant version tag.

[hotfix][tests] Remove forced step logging in AutoClosableProcess

The convenience methods for creating Processes forced the user to pass a description of what this Process is supposed to be doing.
This is rather tiresome to work against, the output doesn't look as nice as dedicated log messages would and is inconsistent for non-blocking operations.
As such I have removed the logging.

[hotfix][tests] Rework Process IO handling

Previously newly started AutoClosableProcesses always inherited IO from the parent process. This interacts oddly when forking VMs (which surefire does) and can lead to streams being corrupted.
This commits reworks the ACP creation and provides callers with a hook for processing the stdout/stderr streams of the process.

[hotfix][tests] Extend Tar wrapper to support strip argument

Simple extension of the Tar wrapper to support additional functionality. This is useful for archives which only contain a directory in the root.

[hotfix][tests] Add modified ExternalResource class

This commit adds a modified version of the jUnit ExternalResource. It is an interface instead of an abstract class and allows resources to behave differently in their after method depending on whether the test succeeded. This will be used by the FlinkDistribution to backup logs for failed tests.

[hotfix][tests] Backup logs for failed tests

Modifies the FlinkDistribution to implement our modified ExternalResource interface instead, and implements a backup logic for logs if a test fails.

[hotfix][tests] Support starting individual Job-/TaskManagers

Small extesion of the FlinkDistribution to expose jobmanager.sh start and taskmanager.sh start.

[hotfix][tests] Support job submissions

Extension of the FlinkDistribution to support running jars.

Now to the actual prototype:

[FLINK-11463][tests] Add utilities

Adds a number of utilities used in subsequent commits. Kept in a separate commit to make re-ordering of commits easier.

TestUtils#getResourceJar is used to creating absolute paths for files in a modules target directory, like copied dependencies.
OperatingSystemRestriction provides a common way to disable test based on the operating system. Will be commonly needed for E2E tests until we got rid of all shell-script and unix tool dependencies.
ParameterProperty is is used for retrieving and parsing parameters from system properties.
FactoryUtils#loadAndInvokeFactory is used by the core resource factory methods to load and process factories.

[FLINK-11465][tests] Add FlinkResource

Adds a common FlinkResource interface for interacting with Flink, and adds a LocalStandaloneFlinkResource implementation that sets up local standalone clusters.
The interfaces are intentionally kept bare-bones.

Tests can make use of a FlinkResource like this:

@Rule
public final FlinkResource flink = FlinkResource.get();

This will return one FlinkResource implementation. Which implementation is loaded depends on the available FlinkResourceFactory implementations (loaded via ServiceLoader) and the system properties that have been set.
As of right now this will always return a LocalStandaloneFlinkResource as it is the only existing implementation.

The LocalStandaloneFlinkResource is backed by a FlinkDistribution and is rather straight-forward.

[FLINK-11464][tests] Add DownloadCache

Adds a utility for downloading and caching artifacts.

The access/loading mechanism works identical to the FlinkResource.

3 implementations are provided in this PR. All of them support downloading arbitrary files, but they differ in their caching behavior.

  • LolCache - The default implementation. Does not cache anything.
  • PersistingDownloadCache - Intended default implementation for local setups. Will be used if the cache-dir property is set to a local directory to be used as a cache location. Identifies files by the hash of the URL, and encodes the hash and download-date into the file name. Supports time-based cleanup of cached entries by setting the cache-ttl property to a Period, e.g. P1D (one day), when running maven.
  • TravisDownloadCache - Implementation intended for Travis. Similar to the PersistingDownloadCache, but instead of time uses a build-number based TTL.

[FLINK-11466][tests] Add KafkaResource

Adds a common KafkaResource interface for interacting with Kafka, and adds a LocalStandaloneKafkaResource` implementation that downloads and sets up a local cluster with the bundled zookeeper. This should behave similarly like the kafka cluster setup in our bash end-to-end test (see kafka-common.sh).

The access/loading mechanism works identical to the FlinkResource.

[FLINK-11468][tests] Setup surefire execution

Sets up a separate surefire execution for end-to-end test in flink-end-to-end-tests.

  • e2e-test by be suffixed with ITCase
  • e2e-tests by be annotated with @Category(TravisGroupX.class) to have them run in a specific Travis job.
  • e2e-tests that rely on hadoop should also be marked as such. @Category({TravisGroupX.class, Hadoop.class})

The required changes to cron-master-e2e can be seen here: zentol@ead3ac4

[FLINK-11467][kafka][tests] Port Kafka Streaming E2E test

Actually ports the kakfa Streaming E2E test to Java, building on the previous commits. Effectively a 1:1 port of the test-streaming-kafka-common.sh.

Verifying this change

Individual utilities are not tested.

Successful travis build: https://travis-ci.org/zentol/flink/jobs/486896824
(note: it times out in the bash! e2e tests)

The kafka test can be executed by running mvn verify -Dcategories="org.apache.flink.tests.util.categories.TravisGroup1" -DdistDir=<path to flink-dist> in flink-streaming-kafka-driver.
All tests can be executed by running mvn verify -Dcategories="" -DdistDir=<path to flink-dist> in flink-end-to-end-tests

@zentol zentol force-pushed the e2e_infra branch 2 times, most recently from d4d91fe to bbf013e Compare January 31, 2019 11:52
@aljoscha aljoscha self-requested a review February 11, 2019 13:07
@tweise tweise changed the title [FLINK-11476][PROTOTYPE] Port Kafka E2E test to Java [FLINK-11463][PROTOTYPE] Port Kafka E2E test to Java Mar 15, 2019
Copy link
Contributor

@aljoscha aljoscha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this is a very good direction! 👍 I had some inline comments and now some comments here about the general idea.

TravisGroup Categories

Not sure if they are placeholders or not. I think we should not strictly tie tests to given travis groups but rather have thematic categories and then in the travis configuration we choose which categories of tests get executed together. Tying them to Travis group would hardcode our current test setup into the code, which makes it less future-proof/flexible, I think.

Distributions and concurrency

The pom tries to ensure that we don't run tests in parallel because the "distributions", Flink and Kafka so far, would have problems with being accessed by multiple tests.

I think instead we could create/copy a new directory for a distribution for each test to ensure isolation. This would also solve the problem of backing up and restoring the Flink configuration, which I think is prone to problems in the future. This would make things both more robust and potentially faster because we can exploit parallelism. Plus it would ensure that things can actually run in parallel without interfering, which is a nice addition.

What do you think?

@zentol
Copy link
Contributor Author

zentol commented Mar 27, 2019

I'd love to be able to run these tests in parallel (from a speed perspective alone), but I have doubts that this is really feasible due to port conflicts and the introduced load randomness that may mess with timeouts etc. .

@zentol
Copy link
Contributor Author

zentol commented Mar 27, 2019

TravisGroups and thematic groups are not mutually exclusive.

I agree that the TravisGroups are clunky, but this way we can easily add e2e tests without having to touch the travis-related files (in case a new theme is added) (which is often forgotten). Thematic groups are also less concise (pass the FQN of each to maven, with no way of shortening this).

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 18, 2019

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build

@zentol zentol merged commit 671029d into apache:master Nov 15, 2019
@zentol zentol deleted the e2e_infra branch November 21, 2019 13:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants