
[FLINK-24394][test] Refactor BuiltInFunctions IT Tests #17341

Closed
wants to merge 1 commit into from


@matriv (Contributor) commented Sep 23, 2021

Add the possibility to test multiple Table API expressions
and/or SQL expressions as multiple table columns using just one
execution of the pipeline, to speed up the total test execution time.

Refactor existing tests of BuiltInFunctions to use the new approach
wherever possible.

What is the purpose of the change

Refactor BuiltInFunctionTestBase to allow testing of multiple expressions at once.

Brief change log

  • Add the possibility to test multiple Table API and SQL expressions within one execution by converting them into multiple table columns. This allows for faster overall test execution.
  • Refactor existing built-in function tests to use this new approach.
  • Change some of the existing tests so that both Table API and SQL paths are verified.
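The column-batching idea from the change log can be sketched in plain Java. This is a hypothetical illustration, not the actual `BuiltInFunctionTestBase` code: `ExprSpec`, `BatchedQuerySketch`, `buildQuery` and `mismatches` are made-up names, and the single pipeline execution is assumed to hand back one row as a `List<Object>`. It shows the core trick: N expressions become N projected columns of one `SELECT`, so one job run produces all N results, and each result is matched back to its spec by column index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical spec: one SQL expression under test and its expected value.
final class ExprSpec {
    final String sqlExpression;
    final Object expected;

    ExprSpec(String sqlExpression, Object expected) {
        this.sqlExpression = sqlExpression;
        this.expected = expected;
    }
}

final class BatchedQuerySketch {

    // N specs -> one query with N projected columns, so a single
    // pipeline execution evaluates every expression at once.
    static String buildQuery(List<ExprSpec> specs, String table) {
        final List<String> columns = new ArrayList<>();
        for (final ExprSpec spec : specs) {
            columns.add(spec.sqlExpression);
        }
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }

    // Column i of the single result row belongs to spec i.
    // Returns the indices of the specs whose actual value differs.
    static List<Integer> mismatches(List<ExprSpec> specs, List<Object> row) {
        if (specs.size() != row.size()) {
            throw new IllegalArgumentException(
                    "Expected " + specs.size() + " columns but got " + row.size());
        }
        final List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < specs.size(); i++) {
            if (!Objects.equals(specs.get(i).expected, row.get(i))) {
                failed.add(i);
            }
        }
        return failed;
    }
}
```

In this sketch every column is compared before anything is reported, so one wrong expression does not hide the results of the others in the same batch.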

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@flinkbot (Collaborator) commented Sep 23, 2021

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 1b6da15 (Thu Sep 23 18:54:41 UTC 2021)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot (Collaborator) commented Sep 23, 2021

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@matriv force-pushed the rf-builtin-fn-test branch 2 times, most recently from f2c827b to 1b6da15 on September 23, 2021 17:05
@matriv (Contributor, Author) commented Sep 24, 2021

@flinkbot run azure

@twalthr (Contributor) left a comment

Thanks for the PR @matriv. This is quite a big change for a hotfix and should have a corresponding JIRA issue, maybe as a subtask of your current issue. I added some feedback.

}

TestSpec testTableApiResult(
Expression[] expression, Object[] result, AbstractDataType<?>[] dataType) {

Contributor

nit: could we use List instead of arrays? I find Arrays.asList and soon List.of nicer than the verbose array syntax.

Contributor Author

I don't have any objection. I just chose arrays here to avoid the overhead of List objects, especially for the common case of only one element, but considering it's test code, Lists are nicer indeed.
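The trade-off discussed here can be sketched with simplified types: `String` stands in for `Expression` and `AbstractDataType<?>`, and `describe` is a hypothetical helper rather than the real `testTableApiResult`. The array overload simply delegates to the `List` overload, which shows that the callee barely changes while the call sites get shorter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper with both parameter styles (types simplified to String/Object).
final class ListParamsSketch {

    // Array style: call sites need new String[] {...} and new Object[] {...}.
    static List<String> describe(String[] expressions, Object[] results) {
        return describe(Arrays.asList(expressions), Arrays.asList(results));
    }

    // List style suggested in the review: call sites can use Arrays.asList(...).
    static List<String> describe(List<String> expressions, List<Object> results) {
        if (expressions.size() != results.size()) {
            throw new IllegalArgumentException("expressions and results must have the same size");
        }
        final List<String> pairs = new ArrayList<>();
        for (int i = 0; i < expressions.size(); i++) {
            pairs.add(expressions.get(i) + " -> " + results.get(i));
        }
        return pairs;
    }
}
```

One caveat when moving to `List.of` later: unlike `Arrays.asList`, it throws a `NullPointerException` for `null` elements, so an expected `null` result would still need `Arrays.asList` or another null-tolerant list.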

@@ -298,14 +298,52 @@ TestSpec testResult(
return testResult(expression, sqlExpression, result, dataType, dataType);
}

TestSpec testResult(TableTestSpecColumn... tableTestSpecColumns) {
int cols = tableTestSpecColumns.length;

Contributor

nit: we don't enforce using final in Flink, but if you check the core, you will see that most committers use final extensively to indicate (im)mutability in code. I would vote for having a consistent coding style within our team. At least we should adapt to the coding style of the class. When I read this and the following lines, it looks to me as if we would modify the reference in the for loop again, because all variables above are marked final.
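A minimal sketch of the style being asked for, using a hypothetical, simplified `Column` type rather than the real `TableTestSpecColumn`: locals that are never reassigned are marked `final`, so a reader can see at a glance that the loop only fills the arrays' elements and never swaps the references themselves.

```java
// Hypothetical, simplified stand-in for one tested column.
final class Column {
    final String sqlExpression;
    final Object result;

    Column(String sqlExpression, Object result) {
        this.sqlExpression = sqlExpression;
        this.result = result;
    }
}

final class FinalLocalsSketch {

    // Splits the columns into parallel arrays: [0] = expressions, [1] = results.
    static Object[][] split(Column... columns) {
        // final: neither the count nor the array references change below;
        // the loop only writes array elements.
        final int cols = columns.length;
        final String[] sqlExpressions = new String[cols];
        final Object[] results = new Object[cols];
        for (int i = 0; i < cols; i++) {
            sqlExpressions[i] = columns[i].sqlExpression;
            results[i] = columns[i].result;
        }
        return new Object[][] {sqlExpressions, results};
    }
}
```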

testItems.add(new TableApiResultTestItem(expression, result, tableApiDataType));
testItems.add(new SqlResultTestItem(sqlExpression, result, sqlDataType));
StringJoiner sj = new StringJoiner(",");
for (String sql : sqlExpression) {

Contributor

nit: use the Java streams API or String.join once we have lists
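For reference, the three joining styles look like this in plain Java. The class and method names are hypothetical; `StringJoiner`, `String.join` and `Collectors.joining` are standard JDK APIs (Java 8+). `String.join` is the shortest once the expressions are already a `List<String>`; the stream form earns its keep when each element must first be mapped, for example when extracting the SQL string from a spec object.

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.function.Function;
import java.util.stream.Collectors;

final class JoinSketch {

    // Loop with StringJoiner, the shape used in the reviewed diff.
    static String withStringJoiner(List<String> sqlExpressions) {
        final StringJoiner sj = new StringJoiner(",");
        for (final String sql : sqlExpressions) {
            sj.add(sql);
        }
        return sj.toString();
    }

    // String.join accepts any Iterable<? extends CharSequence>.
    static String withStringJoin(List<String> sqlExpressions) {
        return String.join(",", sqlExpressions);
    }

    // Streams: map each element to its SQL text first, then join.
    static <T> String withStream(List<T> specs, Function<? super T, String> toSql) {
        return specs.stream().map(toSql).collect(Collectors.joining(","));
    }
}
```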

String sqlExpression,
Object result,
AbstractDataType<?> tableApiDataType,
AbstractDataType<?> sqlQueryDataType) {

Contributor

If the Table API and SQL share the same result, shouldn't they in most cases also have the same data type?

Contributor Author

I kept this variant exposed as there are a couple of tests in JsonFunctionsITCase that make use of different return data types: https://github.com/apache/flink/pull/17341/files#diff-14f6e87fed00e28afef4a5e995126c430b4065661e3dc2f66d14f3741378f874R180

Contributor

TBH this is a very special case and hopefully does not happen elsewhere. We should not start having more of these special cases.

}

/** Helper POJO to store test parameters. */
public static class TableTestSpecColumn {

Contributor

Column sounds very internal to this class. Tests don't know columns but only test specs. How about ResultSpec to keep it short?

Contributor Author

Makes sense, thx, couldn't come up with a nice name.

// In this case, the return type is not null because we have a
// constant in the function invocation
BIGINT().notNull()));
of(

Contributor

Instead of using a static import in every test, we could simply provide static methods, called resultItem(...), in the base class.

Contributor Author

Sure, though I'd prefer the name resultSpec(), since resultItem is another, lower-level POJO within the class and could cause confusion.
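A minimal sketch of what the two suggestions amount to together: a short `ResultSpec` POJO plus a static `resultSpec(...)` factory in the shared base class, which every test class inherits, so no per-test static import is needed. All names other than `ResultSpec` and `resultSpec` are hypothetical, and the fields are simplified to a SQL string and an expected value; the real Flink base class carries more (Table API expressions, data types).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified base class shared by all built-in function tests.
abstract class FunctionTestBaseSketch {

    /** Helper POJO to store one expression under test and its expected result. */
    static final class ResultSpec {
        final String sqlExpression;
        final Object result;

        private ResultSpec(String sqlExpression, Object result) {
            this.sqlExpression = sqlExpression;
            this.result = result;
        }
    }

    // Static factory: subclasses call resultSpec(...) directly, no static import.
    static ResultSpec resultSpec(String sqlExpression, Object result) {
        return new ResultSpec(sqlExpression, result);
    }

    // Groups several specs so they can be evaluated in one pipeline run.
    static List<ResultSpec> testResult(ResultSpec... specs) {
        return Arrays.asList(specs);
    }
}

// A concrete test class picks up the factory through inheritance.
final class AbsFunctionTestSketch extends FunctionTestBaseSketch {
    static List<ResultSpec> specs() {
        return testResult(resultSpec("ABS(-1)", 1), resultSpec("ABS(2)", 2));
    }
}
```

Keeping the constructor private means tests can only create specs through `resultSpec(...)`, so the factory is the single, discoverable entry point.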

@matriv changed the title from [hotfix][test] Refactor BuiltInFunctions IT Tests to [FLINK-24394][test] Refactor BuiltInFunctions IT Tests on Sep 28, 2021
@matriv requested a review from twalthr on September 28, 2021 10:11

@twalthr (Contributor) left a comment

Thanks for the update. I added one last comment and will merge this once the build is green then.

@@ -150,7 +153,7 @@ private static void testResult(

assertEquals(
"Result of column [" + i + "] doesn't match.",

Contributor

Use "of spec" instead of "of column" here. BTW, I think it would be nice to also print the test item summary, to quickly know which one failed if there are many.

Contributor Author

@matriv, Sep 28, 2021

Something like:

"Logical type for spec [" + i + "] of test [" + testItem + "] doesn't match"

&

"Result for spec [" + i + "] of test [" + testItem + "] doesn't match"

looks good?

Contributor Author

Pushed this change, along with a fix regarding the relevant toString() implementation for TableAPI cases.
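The message format agreed on above can be sketched as a small assertion helper. `ResultMessageSketch`, `resultMessage` and `assertColumns` are hypothetical names, and the real test base uses JUnit's `assertEquals` rather than a hand-written loop. Because `testItem` is concatenated into the message, the output is only as useful as that object's `toString()`, which is why the `toString()` fix for the Table API case matters.

```java
import java.util.List;
import java.util.Objects;

final class ResultMessageSketch {

    // Message format proposed in the review thread.
    static String resultMessage(int i, Object testItem) {
        return "Result for spec [" + i + "] of test [" + testItem + "] doesn't match";
    }

    // Compares expected vs. actual column by column and fails on the first
    // mismatch, naming both the spec index and the test item.
    static void assertColumns(Object testItem, List<?> expected, List<?> actual) {
        if (expected.size() != actual.size()) {
            throw new AssertionError("Column count of test [" + testItem + "] doesn't match");
        }
        for (int i = 0; i < expected.size(); i++) {
            if (!Objects.equals(expected.get(i), actual.get(i))) {
                throw new AssertionError(resultMessage(i, testItem)
                        + ": expected <" + expected.get(i) + "> but was <" + actual.get(i) + ">");
            }
        }
    }
}
```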

@twalthr (Contributor) commented Sep 28, 2021

@flinkbot run azure

@matriv (Contributor, Author) commented Sep 28, 2021

@twalthr I guess this should go to 1.14 as well and possibly 1.13 to ease future backporting of relevant fixes?

@twalthr closed this in aba25f1 on Sep 29, 2021

@twalthr (Contributor) commented Sep 29, 2021

@matriv I haven't merged this to 1.14 yet. We can do this on demand.

@matriv deleted the rf-builtin-fn-test branch on October 6, 2021 09:10
niklassemmler pushed a commit to niklassemmler/flink that referenced this pull request Feb 3, 2022
…nFunctionTestBase

Add the possibility to test multiple Table API expressions
and/or SQL expressions as multiple table columns using just one
execution of the pipeline, to speed up the total test execution time.

Refactor existing tests of BuiltInFunctions to use the new approach
wherever possible.

This closes apache#17341.
4 participants