[FLINK-27246][TableSQL/Runtime][master] - Include WHILE blocks in split generated java methods #21393

Merged
merged 1 commit into from
Feb 6, 2023

Conversation

kristoffSC
Contributor

@kristoffSC kristoffSC commented Nov 24, 2022

What is the purpose of the change

This PR fixes the issue reported in https://issues.apache.org/jira/browse/FLINK-27246, which was caused by JavaCodeSplitter not rewriting "while" statements. As a result, the Java compiler was unable to compile the generated code because the methods were too big.

The proposed change enhances JavaCodeSplitter by replacing IfStatementRewriter.java with BlockStatementRewriter.java.

The new BlockStatementRewriter.java can rewrite both IF/ELSE statements and WHILE blocks, including nested statements.

The BlockStatementRewriter works in two steps. The first step extracts blocks from IF/ELSE branches and WHILE bodies into separate methods. This step is executed by the new class BlockStatementSplitter.

The second step groups statements (single-line statements, IF/ELSE and WHILE statements) and extracts each group into a new method. This step is executed by the new class BlockStatementGrouper.

An example of rewritten code:
BEFORE:

public class Example {

    int b;
    int c;

    public void myFun(int a) {

        int counter = 10;
        while (counter > 0) {
            int localA = a + 1000;
            System.out.println(localA);
            if (a > 0) {
                b = a * 2;
                c = b * 2;
                System.out.println(b);
            } else {
                b = a * 3;
                System.out.println(b);
            }
            counter--;
        }
    }
}

AFTER executing BlockStatementRewriter:

public class Example {

    int b;
    int c;

    public void myFun(int a) {

        int counter = 10;

        while (counter > 0) {
            myFun_rewriteGroup1(a);
            counter--;
        }
    }

    void myFun_rewriteGroup1(int a) {
        myFun_whileBody0_0(a);
        if (a > 0) {
            myFun_whileBody0_0_ifBody0(a);
        } else {
            myFun_whileBody0_0_ifBody1(a);
        }
    }

    void myFun_whileBody0_0(int a) {
        int localA = a + 1000;
        System.out.println(localA);
    }

    void myFun_whileBody0_0_ifBody1(int a) {
        b = a * 3;
        System.out.println(b);
    }

    void myFun_whileBody0_0_ifBody0(int a) {
        b = a * 2;
        c = b * 2;
        System.out.println(b);
    }
}
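
As a quick sanity check on the example above, the two shapes can be run side by side. This is a minimal sketch and not code from the PR: the harness class SplitEquivalenceDemo, the nested Before/After classes and the out list (which stands in for System.out.println so that results can be compared) are assumptions made for illustration. The method names inside After are taken from the rewritten example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical harness (not part of the PR): compares the BEFORE and AFTER shapes. */
final class SplitEquivalenceDemo {

    /** Original shape; values are collected in a list instead of being printed. */
    static final class Before {
        int b;
        int c;
        final List<Integer> out = new ArrayList<>();

        void myFun(int a) {
            int counter = 10;
            while (counter > 0) {
                int localA = a + 1000;
                out.add(localA);
                if (a > 0) {
                    b = a * 2;
                    c = b * 2;
                    out.add(b);
                } else {
                    b = a * 3;
                    out.add(b);
                }
                counter--;
            }
        }
    }

    /** Rewritten shape, using the method names from the AFTER example. */
    static final class After {
        int b;
        int c;
        final List<Integer> out = new ArrayList<>();

        void myFun(int a) {
            int counter = 10;
            while (counter > 0) {
                myFun_rewriteGroup1(a);
                counter--;
            }
        }

        void myFun_rewriteGroup1(int a) {
            myFun_whileBody0_0(a);
            if (a > 0) {
                myFun_whileBody0_0_ifBody0(a);
            } else {
                myFun_whileBody0_0_ifBody1(a);
            }
        }

        void myFun_whileBody0_0(int a) {
            int localA = a + 1000;
            out.add(localA);
        }

        void myFun_whileBody0_0_ifBody1(int a) {
            b = a * 3;
            out.add(b);
        }

        void myFun_whileBody0_0_ifBody0(int a) {
            b = a * 2;
            c = b * 2;
            out.add(b);
        }
    }

    /** Returns true when both shapes produce the same output and field values. */
    static boolean sameBehavior(int a) {
        Before before = new Before();
        After after = new After();
        before.myFun(a);
        after.myFun(a);
        return before.out.equals(after.out) && before.b == after.b && before.c == after.c;
    }

    public static void main(String[] args) {
        // Cover the ELSE branch (a <= 0) and the IF branch (a > 0).
        for (int a : new int[] {-5, 0, 7}) {
            System.out.println("a=" + a + " same=" + sameBehavior(a));
        }
    }
}
```

The check only covers this one example; it says nothing about the rewriter's behavior on other inputs.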

Brief change log

  • Include WHILE blocks in the Java code splitter logic.
  • Add BlockStatementRewriter, BlockStatementSplitter and BlockStatementGrouper classes.
  • Remove the IfStatementRewriter class.

Verifying this change

This change is already covered by existing tests, such as

  • flink-batch-sql-test
  • flink-tpcds-test
  • JUnit test: JavaCodeSplitterTest

This change added tests and can be verified as follows:

  • BlockStatementRewriterTest.java
  • BlockStatementSplitterTest.java
  • BlockStatementGrouperTest.java

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)

@flinkbot
Collaborator

flinkbot commented Nov 24, 2022

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@kristoffSC kristoffSC force-pushed the FLINK-27246_master branch 2 times, most recently from 71ce516 to 467d4af Compare November 24, 2022 20:46
@kristoffSC kristoffSC force-pushed the FLINK-27246_master branch 2 times, most recently from bd3171a to a4f9687 Compare December 30, 2022 12:01
@kristoffSC
Contributor Author

@flinkbot run azure

@kristoffSC kristoffSC force-pushed the FLINK-27246_master branch 2 times, most recently from cbbf3d4 to 09b95bc Compare January 5, 2023 12:08
@kristoffSC kristoffSC changed the title [Draft][FLINK-27246_master] - Split generated java methods - Work in progress [FLINK-27246][TableSQL/Runtime][master] - Include WHILE blocks in split generated java methods Jan 6, 2023
@kristoffSC kristoffSC marked this pull request as ready for review January 6, 2023 19:58
@kristoffSC
Contributor Author

kristoffSC commented Jan 30, 2023

@tsreaper

I've replied to/implemented changes from your last comments.

Regarding "I'd like to see more examples. Could you please provide examples and explain":

If we focus only on IF/ELSE statements without else if blocks (which are also supported in the proposed solution), I think the difference boils down to the one example below.

Having:

public void myFun1(int[] a, int[] b) throws RuntimeException {
    if (a[0] == 0) {
        a[11] = b[0];
        a[12] = b[0];
        if (a[2] == 0) {
            a[21] = 1;
            a[22] = 1;
        } else {
            a[23] = b[2];
            a[24] = b[2];
        }

        a[13] = b[0];
        a[14] = b[0];
    }
}

The original IfStatementRewriter will not extract:

            a[11] = b[0];
            a[12] = b[0];

and

            a[13] = b[0];
            a[14] = b[0];

into their own separate methods. They will be extracted together with the entire TRUE branch code block. In my proposed solution, the TRUE branch will be extracted, and the statements above will additionally be extracted into their own methods, like so:

    void myFun1_0_1(int[] a, int[] b) throws RuntimeException {
        a[11] = b[0];
        a[12] = b[0];
    }

    void myFun1_0(int[] a, int[] b) throws RuntimeException {
        a[13] = b[0];
        a[14] = b[0];
    }

A test with similar code is implemented in BlockStatementRewriterTest::testIfMultipleSingleLineStatementRewrite.

If you had a third-level IF/ELSE after a[21] = 1; a[22] = 1; the story would be the same. My proposal is able to extract such statements at every level.

Now you may say that this is not a big deal. With the example shown here, the gain may be hard to spot. However, for the production job (SQL query) that was causing FLINK-27246, I have an extracted method with 537 statements similar to those in the example above, and I'm not sure the original implementation could extract them. Plus, we have many additional methods containing 2 statements.

So even if this were the only extra thing (apart from supporting else if and while), I think that at scale the gain adds up.
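
To make the shape of the rewrite concrete, here is a hedged sketch of how myFun1 could look once the statements at every level are extracted. Only myFun1_0_1 and myFun1_0 come from the snippet above; the names myFun1_ifBody0 and myFun1_ifBody1, the class MyFun1SplitDemo and the sameResult helper are hypothetical. Extraction of the whole TRUE branch into its own method is left out for brevity, and the real rewriter's output may differ in naming and grouping.

```java
import java.util.Arrays;

/** Hypothetical demo class (not part of the PR). */
final class MyFun1SplitDemo {

    /** The original method from the example above. */
    static void myFun1(int[] a, int[] b) {
        if (a[0] == 0) {
            a[11] = b[0];
            a[12] = b[0];
            if (a[2] == 0) {
                a[21] = 1;
                a[22] = 1;
            } else {
                a[23] = b[2];
                a[24] = b[2];
            }
            a[13] = b[0];
            a[14] = b[0];
        }
    }

    /** Assumed split shape: each run of plain statements lives in its own method. */
    static void myFun1Split(int[] a, int[] b) {
        if (a[0] == 0) {
            myFun1_0_1(a, b);
            if (a[2] == 0) {
                myFun1_ifBody0(a, b); // hypothetical name
            } else {
                myFun1_ifBody1(a, b); // hypothetical name
            }
            myFun1_0(a, b);
        }
    }

    static void myFun1_0_1(int[] a, int[] b) {
        a[11] = b[0];
        a[12] = b[0];
    }

    static void myFun1_0(int[] a, int[] b) {
        a[13] = b[0];
        a[14] = b[0];
    }

    static void myFun1_ifBody0(int[] a, int[] b) {
        a[21] = 1;
        a[22] = 1;
    }

    static void myFun1_ifBody1(int[] a, int[] b) {
        a[23] = b[2];
        a[24] = b[2];
    }

    /** Runs both variants on identical inputs and compares the resulting arrays. */
    static boolean sameResult(int a0, int a2) {
        int[] b = {5, 6, 7};
        int[] first = new int[25];
        first[0] = a0;
        first[2] = a2;
        int[] second = first.clone();
        myFun1(first, b);
        myFun1Split(second, b);
        return Arrays.equals(first, second);
    }

    public static void main(String[] args) {
        System.out.println(sameResult(0, 0)); // outer TRUE, inner TRUE
        System.out.println(sameResult(0, 1)); // outer TRUE, inner ELSE
        System.out.println(sameResult(1, 0)); // outer condition false
    }
}
```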

Contributor

@tsreaper tsreaper left a comment

Thanks @kristoffSC for your effort in this long-running pull request. I don't have much to say now. You did a great job.

One last comment: Add a test to CodeSplitITCase to address why this issue is brought up. Maciej Bryński's comment in the original JIRA ticket might be useful.

@kristoffSC
Contributor Author

kristoffSC commented Feb 1, 2023

Hi @tsreaper
thanks for the kind words. I'm really happy that we are getting closer to merging this change.

I used Maciej Bryński's SQL example from the FLINK-27246 ticket to verify that my change solves the original problem.
I had a Flink job (Table API) running with the code splitter changes we have here.

I will add it to CodeSplitITCase in the following days.

@kristoffSC kristoffSC force-pushed the FLINK-27246_master branch 2 times, most recently from dd6e916 to c662b00 Compare February 3, 2023 08:25
…it generated java methods.

This PR fixes the issue reported in https://issues.apache.org/jira/browse/FLINK-27246, which was caused by JavaCodeSplitter not rewriting WHILE
statements, leaving the Java compiler unable to compile the code because the methods were too big.
The proposed change enhances JavaCodeSplitter by replacing IfStatementRewriter.java with BlockStatementRewriter.java. The new
BlockStatementRewriter.java can rewrite both IF/ELSE statements and WHILE blocks, including combinations of both and nested statements.

The BlockStatementRewriter works in two steps. The first step extracts blocks from IF/ELSE branches and WHILE bodies into separate methods.
This step is executed by the new class BlockStatementSplitter. The second step groups statements (single-line statements, IF/ELSE and WHILE
statements) and extracts each group into a new method. This step is executed by the new class BlockStatementGrouper.

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>
@kristoffSC
Contributor Author

Hi @tsreaper
I've implemented the test you requested. It's CodeSplitITCase::testManyAggregationsWithGroupBy.

CI/CD is green.

Is there anything else you would like me to do? If not, could you merge this PR?

P.S.
After merging this PR, I would like to create backport PRs to 1.15 and 1.16. Our client is on 1.14, which is no longer supported, and I'm not sure which version our client will upgrade to.

@tsreaper tsreaper merged commit af9a112 into apache:master Feb 6, 2023
@tsreaper
Contributor

tsreaper commented Feb 6, 2023

After merging this PR, I would like to create backport PRs to 1.15 and 1.16.

You're welcome to create backports. As 1.16.0 was released only a month or two ago, I guess 1.16.1 will come in the near future. However, 1.15 is an older version, and I can't tell when we will release its next version (or whether one will ever come).

@snuyanzin
Contributor

It looks like it generates methods with the same signature in some cases, e.g. org.apache.flink.table.planner.runtime.stream.sql.MatchRecognizeITCase#testUserDefinedFunctions,
and this leads to debug output for these tests.

@kristoffSC
Contributor Author

kristoffSC commented Feb 6, 2023

@snuyanzin

it looks like it generates methods with same signature for some cases e.g.

I will take a look at this one.

Was this not causing the CI build to fail?
I'm guessing not, since in case of a compilation error in the rewritten code, the planner uses the original one... hm.

@snuyanzin
Contributor

snuyanzin commented Feb 6, 2023

somehow build doesn't fail
however lots of output in tests with generated code
The output happens here

@snuyanzin
Contributor

btw, i created an issue for that
https://issues.apache.org/jira/browse/FLINK-30927

@kristoffSC
Contributor Author

I already have a fix and will provide a PR shortly.

kristoffSC added a commit to kristoffSC/flink that referenced this pull request Feb 6, 2023
…0927.

Rewritten code after code splitting could result in more than one method with the same signature and return type.
The issue happens whenever the split method has more than one IF/ELSE/WHILE statement in series (not tested).
The issue was caused by a bug in apache#21393.

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>
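
For readers unfamiliar with this failure mode, the sketch below illustrates how generated helper methods can end up with identical signatures when several blocks appear in series. It is an illustration only: the two naming schemes, the class GeneratedNameCollisionDemo and every method name in it are hypothetical. It does not reproduce the actual naming logic of BlockStatementRewriter or the actual fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration (not Flink code) of colliding generated method names. */
final class GeneratedNameCollisionDemo {

    /** Naive scheme: the name depends only on the parent method and the block kind. */
    static List<String> naiveNames(String method, List<String> blockKinds) {
        List<String> names = new ArrayList<>();
        for (String kind : blockKinds) {
            names.add(method + "_" + kind);
        }
        return names;
    }

    /** Indexed scheme: a running counter makes every generated name distinct. */
    static List<String> indexedNames(String method, List<String> blockKinds) {
        List<String> names = new ArrayList<>();
        int index = 0;
        for (String kind : blockKinds) {
            names.add(method + "_" + kind + index);
            index++;
        }
        return names;
    }

    /** Returns true if any name occurs more than once. */
    static boolean hasDuplicates(List<String> names) {
        Set<String> seen = new HashSet<>();
        for (String name : names) {
            if (!seen.add(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two IF blocks in series around a WHILE block, all inside the same method.
        List<String> blocks = Arrays.asList("ifBody", "whileBody", "ifBody");
        System.out.println(naiveNames("myFun", blocks) + " duplicates=" + hasDuplicates(naiveNames("myFun", blocks)));
        System.out.println(indexedNames("myFun", blocks) + " duplicates=" + hasDuplicates(indexedNames("myFun", blocks)));
    }
}
```

Because the extracted helpers share the same parameter list, two helpers with the same name would also share the same signature, which is the kind of clash the compiler error in FLINK-30927 describes.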
kristoffSC added a commit to kristoffSC/flink that referenced this pull request Feb 6, 2023
…0927 - rewritten code failed at compilation with "non-abstract methods have the same parameter types, declaring type and return type"

Rewritten code after code splitting could result in more than one method with the same signature and return type.
The issue happens whenever the split method has more than one IF/ELSE/WHILE statement in series (not tested).
The issue was caused by a bug in apache#21393.

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>
kristoffSC added a commit to kristoffSC/flink that referenced this pull request Feb 7, 2023
…it generated java methods.

This PR fixes the issue reported in https://issues.apache.org/jira/browse/FLINK-27246, which was caused by JavaCodeSplitter not rewriting WHILE
statements, leaving the Java compiler unable to compile the code because the methods were too big.
The proposed change enhances JavaCodeSplitter by replacing IfStatementRewriter.java with BlockStatementRewriter.java. The new
BlockStatementRewriter.java can rewrite both IF/ELSE statements and WHILE blocks, including combinations of both and nested statements.

The BlockStatementRewriter works in two steps. The first step extracts blocks from IF/ELSE branches and WHILE bodies into separate methods.
This step is executed by the new class BlockStatementSplitter. The second step groups statements (single-line statements, IF/ELSE and WHILE
statements) and extracts each group into a new method. This step is executed by the new class BlockStatementGrouper.

[FLINK-30927][TableSQL/Runtime][master][bugfix] - Bug fix for FLINK-30927 - rewritten code failed at compilation with "non-abstract methods have the same parameter types, declaring type and return type"

Rewritten code after code splitting could result in more than one method with the same signature and return type.
The issue happens whenever the split method has more than one IF/ELSE/WHILE statement in series (not tested).
The issue was caused by a bug in apache#21393.

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>
akkinenivijay pushed a commit to krisnaru/flink that referenced this pull request Feb 11, 2023
mohsenrezaeithe pushed a commit to mohsenrezaeithe/flink that referenced this pull request Feb 21, 2023