[FLINK-27518][tests] Refactor migration tests to support version update automatically #21736

gaoyunhaii · 2023-01-20T05:15:11Z

What is the purpose of the change

This PR refactors the state migration tests so that, when cutting a release branch, we only need to add the new version; the states of the previous version can then be generated automatically.

In general, there are two options:

1. Similar to the configuration document generator, we could have a single module that depends on all the modules containing migration tests and run the generation from that module.
2. Introduce tools to generate the states, and have each module configure the tools separately.

We finally chose option 2, because Maven has poor support for depending on the test classes of other modules: we could only use the test-jar, which does not support transitive dependencies and makes them hard to manage.

Since the generating methods were previously written as JUnit tests, some of them are bound to JUnit infrastructure such as @ClassRule and @Rule. To avoid the burden of rewriting the generating methods, we have to provide some minimal support for the JUnit test interfaces.

Besides the generation, the refactoring also makes each migration test use a dynamic version list, [start, FlinkVersion.last()], which frees us from manually changing the list for each version when cutting a branch.
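The dynamic version list can be illustrated with a small sketch. The `Version` enum below is a hypothetical stand-in for `FlinkVersion`, and `rangeOf` is an assumed helper name used only for illustration; the sketch assumes the enum constants are declared in release order.

```java
import java.util.ArrayList;
import java.util.List;

public class VersionRangeSketch {

    /** Hypothetical stand-in for FlinkVersion; constants must stay in release order. */
    enum Version {
        v1_15,
        v1_16,
        v1_17,
        v1_18;

        /** Returns every version from start up to and including end (assumed helper). */
        static List<Version> rangeOf(Version start, Version end) {
            List<Version> result = new ArrayList<>();
            for (Version v : values()) {
                if (v.ordinal() >= start.ordinal() && v.ordinal() <= end.ordinal()) {
                    result.add(v);
                }
            }
            return result;
        }

        /** Returns the last declared constant, i.e. the newest known version. */
        static Version last() {
            Version[] all = values();
            return all[all.length - 1];
        }
    }

    public static void main(String[] args) {
        // A test that starts at 1.16 picks up newer versions without being edited.
        System.out.println(Version.rangeOf(Version.v1_16, Version.last()));
    }
}
```

With a list derived this way, adding a new constant to the enum extends every test's version list without touching the tests themselves.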

Brief change log

Introduce a new framework of migration tests.
Introduce tools to scan the test classes of the configured module and generate snapshots.
Refactor existing tests based on the new framework.

Verifying this change

Manually verified the following process:

Add version 1.18 to FlinkVersion.
Generate the states automatically via mvn clean package -Pgenerate-snapshots -Dgenerate.version=1.17 -nsu -DskipRat -Dcheckstyle.skip -Drat.ignoreErrors=true -DspotlessFiles=ineffective -Dfast -DskipTests
Run the existing tests and verify that all tests, including the ones against 1.17, are executed successfully.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): no
The public API, i.e., is any changed class annotated with @Public(Evolving): no
The serializers: no
The runtime per-record code paths (performance sensitive): no
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
The S3 file system connector: no

Documentation

Does this pull request introduce a new feature? Yes
If yes, how is the feature documented? docs

gaoyunhaii · 2023-01-20T05:17:11Z

@XComp could you have a look at the PR? Thanks a lot!

flinkbot · 2023-01-20T05:21:21Z

CI report:

6db6455 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure re-run the last Azure build

XComp

Thanks @gaoyunhaii for providing this test data automation. I think, it's a good way to improve the release procedure. I didn't go through all the test classes. The points I made in StatefulJobSnapshotMigrationITCase seem to apply also in other test classes. Therefore, I will leave the comments like this for now and wait for your response and/or updates to the PR before going through the rest of the code.

flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java

...est-utils/src/main/java/org/apache/flink/test/migration/MigrationTestsSnapshotGenerator.java

XComp · 2023-01-20T16:20:25Z

...est-utils/src/main/java/org/apache/flink/test/migration/MigrationTestsSnapshotGenerator.java

+            String normalizedVersionName =
+                    "v" + versionMatcher.group(1) + "_" + versionMatcher.group(2);
+            FlinkVersion version = FlinkVersion.valueOf(normalizedVersionName);


We could move this logic into FlinkVersion providing a method valueOf(int majorVersion, int minorVersion).

I'll move it to a new constructor function for FlinkVersion

You created the constructor but didn't use it. I'm also not sure whether we should actually use a constructor here. It feels odd to use a constructor outside of the enum definition. I still would vote for providing a static method that returns the right enum value.

Yes indeed, that makes sense, I'll change it to a static method.
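A minimal sketch of the static lookup discussed in this thread, under stated assumptions: `Version` is a hypothetical stand-in for `FlinkVersion`, and `byMajorMinor` and `parse` are illustrative names, not the signatures that were actually merged. It keeps the `"v" + major + "_" + minor` naming from the snippet above but returns an `Optional` instead of letting `valueOf` throw.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VersionLookupSketch {

    /** Hypothetical stand-in for FlinkVersion. */
    enum Version {
        v1_16,
        v1_17,
        v1_18;

        /** Looks up the constant named v<major>_<minor>, if it exists (assumed helper). */
        static Optional<Version> byMajorMinor(int major, int minor) {
            String name = "v" + major + "_" + minor;
            for (Version v : values()) {
                if (v.name().equals(name)) {
                    return Optional.of(v);
                }
            }
            return Optional.empty();
        }
    }

    private static final Pattern VERSION_PATTERN = Pattern.compile("(\\d+)\\.(\\d+)");

    /** Parses a string such as "1.17" and delegates to the static lookup. */
    static Optional<Version> parse(String text) {
        Matcher matcher = VERSION_PATTERN.matcher(text.trim());
        if (!matcher.matches()) {
            return Optional.empty();
        }
        return Version.byMajorMinor(
                Integer.parseInt(matcher.group(1)), Integer.parseInt(matcher.group(2)));
    }

    public static void main(String[] args) {
        System.out.println(parse("1.17")); // prints "Optional[v1_17]"
        System.out.println(parse("2.0")); // prints "Optional.empty"
    }
}
```

Keeping the name normalization inside the enum means callers such as the snapshot generator never have to build the constant name themselves.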

...ts/src/test/java/org/apache/flink/test/checkpointing/StatefulJobSnapshotMigrationITCase.java

...test/java/org/apache/flink/test/checkpointing/StatefulJobWBroadcastStateMigrationITCase.java

...ts/src/test/java/org/apache/flink/test/checkpointing/StatefulJobSnapshotMigrationITCase.java

gaoyunhaii · 2023-02-02T08:24:46Z

Hi @XComp, sorry it took some time for the fixes. I have updated the PR; could you take another look? Thanks a lot for reviewing!

...tion-test-utils/src/main/java/org/apache/flink/test/migration/SnapshotGeneratorExecutor.java

XComp · 2023-02-07T15:33:54Z

...est-utils/src/main/java/org/apache/flink/test/migration/MigrationTestsSnapshotGenerator.java

+            String normalizedVersionName =
+                    "v" + versionMatcher.group(1) + "_" + versionMatcher.group(2);
+            FlinkVersion version = FlinkVersion.valueOf(normalizedVersionName);


You created the constructor but didn't use it. I'm also not sure whether we should actually use a constructor here. It feels odd to use a constructor outside of the enum definition. I still would vote for providing a static method that returns the right enum value.

XComp

I did another pass over the code but didn't manage to touch all the classes. But I thought it was already worth pushing my comments out.

...est-utils/src/main/java/org/apache/flink/test/migration/MigrationTestsSnapshotGenerator.java

flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java

...ts/src/test/java/org/apache/flink/test/checkpointing/StatefulJobSnapshotMigrationITCase.java

XComp · 2023-02-07T16:47:50Z

...st/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBaseMigrationTest.java

+ * <p>For regenerating the binary snapshot files run {@link #writeSnapshot(FlinkVersion)} on the
+ * corresponding Flink release-* branch.


Shouldn't we rather refer to the test data generation framework here?

I'll remove the outdated comments, since the generation will be done by the RM as a whole.

flink-core/pom.xml

...rc/test/scala/org/apache/flink/api/scala/migration/StatefulJobSavepointMigrationITCase.scala

.../test/java/org/apache/flink/test/state/operator/restore/unkeyed/ChainLengthIncreaseTest.java

.../test/java/org/apache/flink/test/state/operator/restore/unkeyed/ChainLengthDecreaseTest.java

...ts/src/test/java/org/apache/flink/test/checkpointing/StatefulJobSnapshotMigrationITCase.java

...-core/src/test/java/org/apache/flink/api/common/typeutils/TypeSerializerUpgradeTestBase.java

gaoyunhaii · 2023-02-14T14:42:51Z

Thanks @XComp for the review! I have updated the PR according to the comments. I'll further update the PR if #21736 (comment) is acceptable.

XComp · 2023-02-15T07:53:46Z

Thanks for letting me know, I'm gonna have another pass over it. Could you rebase the branch in the meantime?

gaoyunhaii · 2023-02-15T07:56:22Z

Ok, got that, I'll rebase the branch to the latest master.

XComp

Thanks, @gaoyunhaii and sorry for the late response. The 1.17 release keeps me busy. I did another pass and only found minor things when looking over the code change. Could you rebase this branch and check what's going on with the CI failures?

...-core/src/test/java/org/apache/flink/api/common/typeutils/TypeSerializerUpgradeTestBase.java

XComp · 2023-02-27T14:41:50Z

flink-test-utils-parent/flink-migration-test-utils/README.md

+
+```xml
+<profile>
+    <id>generate-snapshots</id>


I'd go for generate-migration-test-data. "data" is already plural - adding the plural to the "test" keyword doesn't add any value but is rather unusual, in my opinion.

gaoyunhaii · 2023-03-02T10:56:11Z

Hi @XComp, thanks a lot for the review! The CI failure occurs because we have changed the test versions to [xx, most recently published version] for all the migration tests: the latest version has now been increased to 1.18, but we have not generated the snapshots for 1.17 yet. I generated the snapshots with this tool.

For the formal process in the future, I think the generation might happen at the time of cutting the branch, before we add the new version tag. Do you think this would be reasonable?

XComp · 2023-03-02T13:49:26Z

For the formal process in the future, I think the generation might happen at the time of cutting the branch, before we add the new version tag. Do you think this would be reasonable?

Yeah, I guess that makes sense.

About this PR: Could you squash all the commits into reasonable chunks? Especially, the test data generation should be covered in a separate commit. I will do a final pass over it after that is done.

gaoyunhaii · 2023-03-03T07:42:48Z

"Especially, the test data generation should be covered in a separate commit."

Hi @XComp, I have some concerns about this point: for the formal process, if we want to split the generated files, it will still require a lot of manual operations, since there is no explicit mapping from the generated files to the migration test classes.

Do we think it is necessary to split the commits? It looks to me like these are auto-managed generated binary files, and developers do not seem to need to check their history (in fact, there will only be one log entry saying that the files were added).

If we still think it is necessary to split the commits, I think we might need to extend this functionality to support some kind of post-processing action that could execute external scripts to commit the files for each migration test class.

What do you think about this point?

XComp · 2023-03-04T06:21:52Z

Not sure whether I understand you correctly here: I didn't mean to push the generated test data in multiple commits (e.g. one commit per test class). I meant that we want to prepare this PR to have one commit for the refactoring of the test data generation and one commit for the generated data. Does that make sense? 🤔

On another note: I reconsidered your proposal for what the process should look like. I came to the conclusion that we shouldn't create the test data when creating the release branch. We still have to do it after the new minor release (in our current case 1.17.0) is published. The test data should be generated using the code version of the minor version's git tag (i.e. release-1.17.0). That's the baseline for migration tests. Therefore, it must be possible to control the return value of FlinkVersion.getMostRecentlyPublished(). Automatically deriving it from the enum by taking the element before the last element is not good enough. Do you agree?

gaoyunhaii · 2023-03-07T07:06:55Z

Hi @XComp

For the first point, sorry, that was indeed my misunderstanding; we are in fact in agreement.

For the second point, after some more thought it now looks to me that we indeed need to do the data generation on publishing instead of on cutting the branch, since new code is still merged during the period before the release is formally published.

For the migration test data generation, we in fact have a parameter to specify the target version so that the snapshot data ends up in the right location, so I think this step is OK.

The main issue is that the migration tests currently rely on mostRecentlyPublishedVersion() to list the versions to test against. Before 1.17 is published, there is no test data for release 1.17, and the versions should be [some start version, 1.16]; after 1.17 is published, the versions would become [some start version, 1.17].

It is not easy to distinguish the two cases without more information. The git tags might not always exist (for example, when users download the sources from the website) and are not easy to acquire from the Java code. Do you think it is OK for us to have a constant in FlinkVersion like

public static FlinkVersion MOSTLY_RECENT_PUBLISHED_VERSION = v1.17;

and have the RM update the variable on publishing, after generating the test data?

XComp · 2023-03-07T12:40:32Z

Yeah, one option would be to use a constant variable. The RM would need to update that one along generating the test data. A minor thing: The variable should be named MOST_RECENTLY_PUBLISHED_VERSION. Considering that this is only used by the migration test, we could even move this constant into MigrationTestsSnapshotGenerator.

Alternatively, we could write this information into a metadata file that's located in the flink-migration-test-utils sub-module's resources and update that file as part of the MigrationTestsSnapshotGenerator test run. The user has to specify the version as part of this run, anyway. Therefore, there's no need for an additional change of code manually. WDYT?

gaoyunhaii · 2023-03-08T07:40:23Z

Hi @XComp, the option of having the information in the resources directory also looks good to me. I'll update the PR and also squash the commits.

gaoyunhaii · 2023-03-09T09:41:36Z

Hi @XComp, sorry, when I tried to implement the logic for updating the file that records the most recently published version, I found that it is not easy to locate the file:

Java does not support updating resources directly.
In Maven the directory should be ${project.rootlocation} / flink-test-utils-parent/flink-migration-test-utils/src/main/resources, but there is no variable like ${project.rootlocation} and no easy way to acquire it.

Do you think it is also OK that we still keep the most recently published Flink version in a file, but update it with the bash scripts on publishing?

XComp · 2023-03-09T12:04:16Z

should be fine, I guess.
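As a rough sketch of the file-based approach agreed on here: the tests would read the most recently published version from a resource file and use it as the upper bound of the versions to test. Everything below is an assumption made for illustration, including the `Version` stand-in for `FlinkVersion`, the method name, and the file format (one enum constant name per file, with `#` comment lines allowed); the actual format used by the PR is not shown in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PublishedVersionSketch {

    /** Hypothetical stand-in for FlinkVersion. */
    enum Version {
        v1_16,
        v1_17,
        v1_18
    }

    /**
     * Reads the first non-empty, non-comment line and maps it to a version.
     * The file format is an assumption of this sketch.
     */
    static Version readMostRecentlyPublished(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                    return Version.valueOf(trimmed);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        throw new IllegalStateException("No version entry found");
    }

    public static void main(String[] args) {
        // In a real module the Reader would wrap a classpath resource stream;
        // a StringReader keeps this sketch self-contained.
        Reader fileContent = new StringReader("# updated on publishing\nv1_17\n");
        System.out.println(readMostRecentlyPublished(fileContent)); // prints "v1_17"
    }
}
```

Because the file is only read at test time, a release script can rewrite it on publishing without any change to the Java code.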

gaoyunhaii · 2023-03-14T17:29:14Z

Hi @XComp, sorry for the long delay; I have been busy recently. I have merged the previous commits except for the last change, which reads the latest published version from the file. I'll squash it as well after this piece gets reviewed.

XComp

Thanks, @gaoyunhaii . I added a few comments. PTAL

...est-utils/src/main/java/org/apache/flink/test/migration/MigrationTestsSnapshotGenerator.java

.../test/java/org/apache/flink/test/state/operator/restore/AbstractOperatorRestoreTestBase.java

...rc/test/scala/org/apache/flink/api/scala/migration/StatefulJobSavepointMigrationITCase.scala

...t/scala/org/apache/flink/api/scala/migration/StatefulJobWBroadcastStateMigrationITCase.scala

...igration-test-utils/src/main/java/org/apache/flink/test/migration/PublishedVersionUtils.java

XComp

...just additional thoughts after I went through generating the data manually for FLINK-31593.

.../test/java/org/apache/flink/test/state/operator/restore/AbstractOperatorRestoreTestBase.java

...fs-tests/src/test/java/org/apache/flink/hdfstests/ContinuousFileProcessingMigrationTest.java

...t-utils-parent/flink-migration-test-utils/src/main/resources/most_recently_published_version

XComp · 2023-04-24T04:12:43Z

Hi @gaoyunhaii , I'm wondering what the status of the PR is. I merged FLINK-31593 now which is causing the conflicts, I guess. Would it be possible to finalize this PR rather sooner than later to be able to close this issue?

gaoyunhaii · 2023-04-25T06:56:29Z

Hi @XComp, sorry for the long delay; I have been heavily occupied in the previous weeks. I'll update the PR today and will try to make it mergeable this week.

gaoyunhaii · 2023-04-29T23:53:25Z

Hi @XComp, very sorry for the long delay. I updated the PR according to the comments.

XComp · 2023-05-03T09:31:33Z

CI is still failing. I didn't do an entire pass over the PR because of that but rather focused on the open discussions.

gaoyunhaii · 2023-05-03T23:08:34Z

Hi @XComp, sorry for the earlier CI failure. The CI has now passed; could you have another look?

XComp

Thanks @gaoyunhaii . I have just a few nitty comments. But generally, it looks good. I also tried the test data generation and it generated files. I would suggest squashing everything together and rebasing the branch.

Just as a test run: Could you generate the data and commit them so that we would get a full CI run with the generated data? We wouldn't merge that data commit to master. This would be just so that we have a test run with the generated data. WDYT?

flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java

.../test/java/org/apache/flink/test/state/operator/restore/AbstractOperatorRestoreTestBase.java

flink-test-utils-parent/flink-migration-test-utils/README.md

gaoyunhaii · 2023-05-11T07:29:30Z

Thanks a lot @XComp for the review! I have updated the PR to address the remaining comments.

I also generated the test data for 1.18 and pushed it as PR #22567, so we can see whether the tests for this PR pass.

gaoyunhaii · 2023-05-11T11:29:49Z

Hi @XComp, the tests of both this PR and #22567 have passed 😀.

XComp

LGTM 👍 Great, this took quite some effort. Good job, @gaoyunhaii :-) I verified the CI build in #22567 as well (see related comment). I guess, we're good to go. Could you update the Create Flink Release wiki article accordingly as a follow-up?

gaoyunhaii · 2023-05-12T06:06:19Z

Thanks a lot @XComp for reviewing all the way through! I'll update the wiki today.

flinkbot added the component=TestInfrastructure label Jan 20, 2023

MartijnVisser requested a review from XComp January 20, 2023 08:10

gaoyunhaii force-pushed the format_migration_test branch from f7edc0b to c2c8cbd on January 20, 2023 08:50

XComp reviewed Jan 25, 2023

gaoyunhaii force-pushed the format_migration_test branch from c2c8cbd to 3759c53 on February 2, 2023 08:08

XComp reviewed Feb 7, 2023

XComp reviewed Feb 8, 2023

gaoyunhaii force-pushed the format_migration_test branch from 3759c53 to 649e590 on February 14, 2023 14:18

gaoyunhaii force-pushed the format_migration_test branch from 649e590 to 10e162c on February 16, 2023 08:07

XComp requested changes Feb 27, 2023

gaoyunhaii force-pushed the format_migration_test branch from 10e162c to 4d5e0e2 on March 2, 2023 10:51

gaoyunhaii force-pushed the format_migration_test branch from 4d5e0e2 to 92a8586 on March 14, 2023 17:25

XComp requested changes Mar 21, 2023

XComp requested changes Mar 24, 2023

gaoyunhaii force-pushed the format_migration_test branch from 92a8586 to 3425682 on April 29, 2023 23:49

XComp reviewed May 10, 2023

[FLINK-27518][tests] Introduce the state migration tests framework and refactor existing tests.

6db6455

gaoyunhaii mentioned this pull request May 11, 2023

[FLINK-27518][Check only] Check the migration test data generation works #22567

Closed

gaoyunhaii force-pushed the format_migration_test branch from 71b7e8d to 6db6455 on May 11, 2023 07:28

XComp approved these changes May 11, 2023

gaoyunhaii closed this in 82fb74e May 12, 2023
