Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-27518][tests] Refactor migration tests to support version update automatically #21736

Closed
wants to merge 1 commit into from

Conversation

gaoyunhaii
Copy link
Contributor

@gaoyunhaii gaoyunhaii commented Jan 20, 2023

What is the purpose of the change

This PR refactors the state migration tests so that when cutting branch, we need only add new version and could generates the states of stale version automatically.

In general, there are two options:

  1. Similar to the configuration document generator, we could have a module that depends on all the modules containing migration tests and run generation with this module.
  2. Introduce tools to generate states, and each module config the tools separately.

We finally choose the option 2. This is because Maven have a bad support for depending on the tests classes of other modules, we could only use the test-jar, which do not support transitive dependency and make it hard to manage these transitive dependencies.

Since previously the generating methods are written with JUnit tests, some of them are bounded with the JUnit infrastructures, like @ClassRule, @Rule. To avoid the burden of re-written the generating methods, we have to have some minimum support for JUnit tests interfaces.

Except for the generating, during the refactoring we also make each migration tests use a dynamic version lists: [start, FlinkVersion.last()], which free us from manually change the list on cutting branch for each version.

Brief change log

  • Introduce a new framework of migration tests.
  • Introduce tools to scan the test classes of the configured module and generating snapshots.
  • Refactor existing tests based on the new framework.

Verifying this change

Manually verified the process of

  • Add version 1.18 to FlinkVersion.
  • Generating states automatically via mvn clean package -Pgenerate-snapshots -Dgenerate.version=1.17 -nsu -DskipRat -Dcheckstyle.skip -Drat.ignoreErrors=true -DspotlessFiles=ineffective -Dfast -DskipTests
  • Run the existing tests and verified the tests including the ones against 1.17 are all executed successfully.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? Yes
  • If yes, how is the feature documented? docs

@gaoyunhaii
Copy link
Contributor Author

@XComp could you have a look at the PR? Very thanks!

@flinkbot
Copy link
Collaborator

flinkbot commented Jan 20, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

Copy link
Contributor

@XComp XComp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gaoyunhaii for providing this test data automation. I think, it's a good way to improve the release procedure. I didn't go through all the test classes. The points I made in StatefulJobSnapshotMigrationITCase seem to apply also in other test classes. Therefore, I will leave the comments like this for now and wait for your response and/or updates to the PR before going through the rest of the code.

Comment on lines 79 to 141
String normalizedVersionName =
"v" + versionMatcher.group(1) + "_" + versionMatcher.group(2);
FlinkVersion version = FlinkVersion.valueOf(normalizedVersionName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could move this logic into FlinkVersion providing a method valueOf(int majorVersion, int minorVersion).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move it to a new constructor function for FlinkVersion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You created the constructor but didn't use it. I'm also not sure whether we should actually use a constructor here. It feels odd to use a constructor outside of the enum definition. I still would vote for providing a static method that returns the right enum value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed, that makes sense, I'll change it to a static method.

@gaoyunhaii
Copy link
Contributor Author

Hi @XComp sorry it took some time for the fixes, I have update the PR, could you have another round of look? Very thanks for the revewing!

Comment on lines 79 to 141
String normalizedVersionName =
"v" + versionMatcher.group(1) + "_" + versionMatcher.group(2);
FlinkVersion version = FlinkVersion.valueOf(normalizedVersionName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You created the constructor but didn't use it. I'm also not sure whether we should actually use a constructor here. It feels odd to use a constructor outside of the enum definition. I still would vote for providing a static method that returns the right enum value.

Copy link
Contributor

@XComp XComp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did another pass over the code but didn't manage to touch all the classes. But I though it was already worth it to push my comments out.

Comment on lines 66 to 67
* <p>For regenerating the binary snapshot files run {@link #writeSnapshot(FlinkVersion)} on the
* corresponding Flink release-* branch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we rather refer to the test data generation framework here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove the outdated comments since generating will be done by the RM as a whole.

flink-core/pom.xml Outdated Show resolved Hide resolved
@gaoyunhaii
Copy link
Contributor Author

Thanks @XComp for the review! I have updated the PR according to the comments. I'll further update the PR if #21736 (comment) is acceptable.

@XComp
Copy link
Contributor

XComp commented Feb 15, 2023

Thanks for letting me know, I'm gonna have another pass over it. Could you rebase the branch in the meantime?

@gaoyunhaii
Copy link
Contributor Author

Ok, got that, I'll rebase the branch to the latest master.

Copy link
Contributor

@XComp XComp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @gaoyunhaii and sorry for the late response. The 1.17 release keeps me busy. I did another pass and only found minor things when looking over the code change. Could you rebase this branch and check what's going on with the CI failures?


```xml
<profile>
<id>generate-snapshots</id>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go for generate-migration-test-data. "data" is already plural - adding the plural to the "test" keyword doesn't add any value but is rather unusual, in my opinion.

@gaoyunhaii
Copy link
Contributor Author

Hi @XComp very thanks for the review! The CI failure is due to that now we have change the test versions to be [xx, recently published version] for all the migration tests, now the latest version has been increased to 1.18, but we have not generate the snapshots for 1.17 yet. I generated the snapshots with this tool.

For the formal process in the future, I think the generating might happen at the time of cutting branch, before we adding the new version tag. Do you think this would be reasonable?

@XComp
Copy link
Contributor

XComp commented Mar 2, 2023

For the formal process in the future, I think the generating might happen at the time of cutting branch, before we adding the new version tag. Do you think this would be reasonable?

Yeah, I guess that makes sense.

About this PR: Could you squash all the commits into reasonable chunks? Especially, the test data generation should be covered in a separate commit. I will do a final pass over it after that is done.

@gaoyunhaii
Copy link
Contributor Author

gaoyunhaii commented Mar 3, 2023

"Especially, the test data generation should be covered in a separate commit."

Hi @XComp I have some concern with this point: for the formal process, if we want to split the generated files, it will still require a lot of manual operations, since there is no explicit mappings from the generated files to the migration classes.

Do we think it is necessary to split the commits? Since now it looks to me that they are some kind of auto-managed generated binary files, and the developers seems not have requirements to check its history (In fact it will only have one log that the files are added).

If we still think it is necessary to split the commits, I think we might need to extend this functionality to support some kind of post-processing actions that could execute external scripts to commit files for each migrating test classes.

What do you think about this point?

@XComp
Copy link
Contributor

XComp commented Mar 4, 2023

Not sure, whether I understand you in the right way here: I didn't mean to push the generated test da in multiple commits (e.g. one commit per test class). I meant that we want to prepare this PR to have one commit for the refactoring of the test data generation and one commit for the generated data. Does that make sense?🤔

But on the other note: I reiterated on your proposal for how the process should look like. I came to the conclusion that we shouldn't create the test data when creating the release branch. We still have to do it after the new minor release (in our current case 1.17.0) is published. The test data should be generated using the code version of the minor version's git tag (i.e. release-1.17.0). That's the baseline for migration tests. Therefore, it must be possible to control the FlinkVersion.getMostRecentlyPublished()'s return value. Automatically deriving it from the enum taking the element before the last element is not good enough . Do you agree?

@gaoyunhaii
Copy link
Contributor Author

gaoyunhaii commented Mar 7, 2023

Hi @XComp

For the first point, sorry it is indeed my misunderstanding, and we are in fact consistent.

For the second point, after some more thoughts it now looks to me we indeed need to do the data generation on publishing instead of cutting branch, since there are still new codes merged during the period before formally published.

For migration test data generation, we in fact have a parameter to specify the target version so that the snapshots data could be located in the right location, thus I think this step is ok.

The main issue is that currently the migration tests relies on the mostRecentlyPublishedVersion() to list the versions to test against. Before 1.17 is published, there is no test data for release 1.17, and the versions should be [some start version, 1.16], after 1.17 is published, the versions would become [some start version, 1.17].

It looks its not easy to distinguish the two cases without more information. The git tags might not always exist (for example, users download sources from the website) and is not easy to acquire from the java code. Do you think it is ok for us to have a constant in FlinkVersion like

public static FlinkVersion MOSTLY_RECENT_PUBLISHED_VERSION = v1.17;

and the RM will update the variable on publishing after generating the test data ?

@XComp
Copy link
Contributor

XComp commented Mar 7, 2023

Yeah, one option would be to use a constant variable. The RM would need to update that one along generating the test data. A minor thing: The variable should be named MOST_RECENTLY_PUBLISHED_VERSION. Considering that this is only used by the migration test, we could even move this constant into MigrationTestsSnapshotGenerator.

Alternatively, we could write this information into a metadata file that's located in the flink-migration-test-utils sub-module's resources and update that file as part of the MigrationTestsSnapshotGenerator test run. The user has to specify the version as part of this run, anyway. Therefore, there's no need for an additional change of code manually. WDYT?

@gaoyunhaii
Copy link
Contributor Author

gaoyunhaii commented Mar 8, 2023

Hi @XComp the option of having the information in the resource direction also looks good to me, I'll update the PR and also squash the commits.

@gaoyunhaii
Copy link
Contributor Author

Hi @XComp sorry when I try to implement the logic of update the file that records the most-recently published file, I found that it is not easy to locate the file:

  1. Java does not support update the resources directly.
  2. In Maven the directory should be ${project.rootlocation} / flink-test-utils-parent/flink-migration-test-utils/src/main/resources, but there is no variables like ${project.rootlocation} and there is no easy way to acquire that.

Do you think it is also ok that we still leave the most-recently flink version in file, but we update it with the bash scripts on publishing?

@XComp
Copy link
Contributor

XComp commented Mar 9, 2023

should be fine, I guess.

@gaoyunhaii
Copy link
Contributor Author

Hi @XComp sorry for the long delay due to being bound recently, I have merged the previous commits except for the last change that reads latest published version from the file. I'll further squash it after this piece get reviewed.

Copy link
Contributor

@XComp XComp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @gaoyunhaii . I added a few comments. PTAL

Copy link
Contributor

@XComp XComp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...just additional thoughts after I went through generating the data manually for FLINK-31593.

@XComp
Copy link
Contributor

XComp commented Apr 24, 2023

Hi @gaoyunhaii , I'm wondering what the status of the PR is. I merged FLINK-31593 now which is causing the conflicts, I guess. Would it be possible to finalize this PR rather sooner than later to be able to close this issue?

@gaoyunhaii
Copy link
Contributor Author

Hi @XComp sorry for the long delay due to being heavily occupied in the previous weeks, I'll update the PR today and will try to make it being mergable in this week.

@gaoyunhaii
Copy link
Contributor Author

Hi @XComp Very sorry for the long delay, I updated the PR according to the comments.

@XComp
Copy link
Contributor

XComp commented May 3, 2023

CI is still failing. I didn't do an entire pass over the PR because of that but rather focused on the open discussions.

@gaoyunhaii
Copy link
Contributor Author

Hi @XComp sorry for failing the Ci previously, the CI is now passed, could you have another look?

Copy link
Contributor

@XComp XComp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gaoyunhaii . I have just a few nitty comments. But generally, it looks good. I also tried the test data generation and it generated files. I would suggest squashing everything together and rebasing the branch.

Just as a test run: Could you generate the data and commit them so that we would get a full CI run with the generated data? We wouldn't merge that data commit to master. This would be just so that we have a test run with the generated data. WDYT?

@gaoyunhaii
Copy link
Contributor Author

Very thanks @XComp for the review! I have update the PR to fix the remaining comments.

Also I generate the test data for 1.18 and push the PR to #22567, and we could see if the test for this PR could be passed.

@gaoyunhaii
Copy link
Contributor Author

Hi @XComp the tests of both this PR and #22567 has passed 😀.

Copy link
Contributor

@XComp XComp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 Great, this took quite some effort. Good job, @gaoyunhaii :-) I verified the CI build in #22567 as well (see related comment). I guess, we're good to go. Could you update the Create Flink Release wiki article accordingly as a follow-up?

@gaoyunhaii
Copy link
Contributor Author

Thanks @XComp a lot for the review all the way! I'll update the wiki today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants