[FLINK-33792] Generate the same code for the same logic #23984

Closed
wants to merge 10 commits into from

Conversation

zoudan
Contributor

@zoudan zoudan commented Dec 22, 2023

What is the purpose of the change

This pull request ensures that we generate the same code for the same logic, which is a precondition for sharing generated classes between different jobs.

Brief change log

  • Add a name counter to each CodeGeneratorContext and use it when generating variable names.
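
To illustrate why a per-context counter matters, here is a minimal sketch, not Flink's actual implementation. `CounterContext` and `NameSketch` are hypothetical names introduced only for this example. With a process-wide counter, the suffix of a generated name depends on everything generated earlier in the JVM; with a context-scoped counter, it depends only on what that context has generated so far.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for a code generator context; not Flink's class.
final class CounterContext {
  private val nameCounter = new AtomicLong(0)
  def nextId(): Long = nameCounter.getAndIncrement()
}

object NameSketch {
  // Process-wide counter: suffixes depend on all earlier calls in this JVM.
  private val globalCounter = new AtomicLong(0)

  def globalName(name: String): String =
    s"$name$$${globalCounter.getAndIncrement()}"

  // Context-scoped counter: suffixes depend only on this context's history.
  def scopedName(ctx: CounterContext, name: String): String =
    s"$name$$${ctx.nextId()}"
}
```

Under these assumptions, two fresh contexts produce the same first name for the same input, while two calls to `globalName` never do, so identical logic can end up as different source text.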

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing

This change added tests and can be verified as follows:

  • Add a new test: CodeGenUtilsTest#testNewName
  • Add tests in existing class: HashCodeGeneratorTest#testHashWithIndependentNameCounter and ProjectionCodeGeneratorTest#testHashWithIndependentNameCounter

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (docs)

@flinkbot
Collaborator

flinkbot commented Dec 22, 2023

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

Member

@libenchao libenchao left a comment

@zoudan Thanks for the PR, it looks great. My main point is that we don't need to add a config for this; it can be done by default.

@@ -233,6 +233,16 @@ private TableConfigOptions() {}
.withDescription(
"Specifies a threshold where class members of generated code will be grouped into arrays by types.");

@Documentation.TableOption(execMode = Documentation.ExecMode.BATCH_STREAMING)
public static final ConfigOption<Boolean> INDEPENDENT_NAME_COUNTER_ENABLED =
Member

Why do we need this config? I think it's OK to change the behavior by default; there is no point in exposing it to users.

@@ -80,23 +80,25 @@ object HashCodeGenerator {
}
""".stripMargin

System.out.println(code)
Member

Remove it.

@zoudan
Contributor Author

zoudan commented Dec 25, 2023

@libenchao Thanks for reviewing. I have removed the configuration; please take a look when you have time.

@libenchao
Member

@zoudan The PR looks good to me, except that the CI is failing. Could you fix that?

@zoudan
Contributor Author

zoudan commented Dec 28, 2023

@flinkbot run azure

@zoudan
Contributor Author

zoudan commented Dec 29, 2023

@flinkbot run azure

@zoudan
Contributor Author

zoudan commented Dec 29, 2023

@libenchao CI has passed, please take a look when you have time.

Member

@libenchao libenchao left a comment

@zoudan Thanks for the update; using a 'parent context' is a great idea. I've left a few minor comments; the rest looks good to me.

@@ -124,14 +124,27 @@ object CodeGenUtils {

private val nameCounter = new AtomicLong

- def newName(name: String): String = {
- s"$name$$${nameCounter.getAndIncrement}"
+ def newName(context: CodeGeneratorContext = null, name: String): String = {
Member

This one has a default value for context, but newNames does not. Is there any rationale behind this?

Contributor Author

A parameter section with a *-parameter (varargs) is not allowed to have default arguments.

Member

My point is: do you really need a default value? IIUC, we want all call sites to pass a context.

Contributor Author

Got it, removed.
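
The constraint discussed in this thread can be sketched as follows. `ArgsContext` and `SignatureSketch` are hypothetical names used only for illustration, and the method bodies are a simplified assumption rather than the code in this PR. The commented-out signature is the shape that Scala 2 rejects; the two methods below it show the outcome of the thread, where the context is a required argument everywhere.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical context holding a name counter; not Flink's class.
final class ArgsContext {
  val nameCounter = new AtomicLong(0)
}

object SignatureSketch {
  // Rejected by the Scala 2 compiler, because the parameter section mixes
  // a default argument with a *-parameter:
  //   def newNames(context: ArgsContext = null, names: String*): Seq[String]

  // Accepted: no default value, every caller passes a context.
  def newName(context: ArgsContext, name: String): String =
    s"$name$$${context.nameCounter.getAndIncrement()}"

  def newNames(context: ArgsContext, names: String*): Seq[String] = {
    require(names.toSet.size == names.length, "Duplicated names")
    val newId = context.nameCounter.getAndIncrement()
    names.map(name => s"$name$$$newId")
  }
}
```

Dropping the default also makes it a compile error to forget the context, which fits the stated goal that all call sites pass one.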

require(names.toSet.size == names.length, "Duplicated names")
val newId = nameCounter.getAndIncrement
names.map(name => s"$name$$$newId")
if (context == null || context.getNameCounter == null) {
Member

I'm wondering whether newNames can be simplified to something like names.map(name => newName(context, name))?

Contributor Author

The returned names all share the same index, and I kept that behavior as it was before, so I did not call newName.
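
The difference between the two approaches in this thread can be seen in a small sketch. `SharedIndexSketch` and its method names are hypothetical; only the naming scheme (`name$index`) follows the hunks quoted in this review.

```scala
import java.util.concurrent.atomic.AtomicLong

object SharedIndexSketch {
  // Suggested simplification: one counter increment per name.
  def perName(counter: AtomicLong, names: Seq[String]): Seq[String] =
    names.map(name => s"$name$$${counter.getAndIncrement()}")

  // Existing behavior: one increment for the whole group, so all names
  // returned by a single call carry the same suffix.
  def sharedIndex(counter: AtomicLong, names: Seq[String]): Seq[String] = {
    val newId = counter.getAndIncrement()
    names.map(name => s"$name$$$newId")
  }
}
```

For the input `Seq("field", "isNull")` and a fresh counter, `perName` yields `field$0` and `isNull$1`, while `sharedIndex` yields `field$0` and `isNull$0`. The two are therefore not interchangeable without changing the generated names.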

Contributor

@lsyldliu lsyldliu left a comment

@zoudan Thanks for the contribution, I will take a look in the next few days.

@zoudan
Contributor Author

zoudan commented Jan 2, 2024

@flinkbot run azure

@zoudan
Contributor Author

zoudan commented Jan 3, 2024

@flinkbot run azure

Member

@libenchao libenchao left a comment

+1 from my side

Contributor

@lsyldliu lsyldliu left a comment

@zoudan Thanks for the contribution, LGTM overall. I just left some minor comments.

@@ -87,10 +90,11 @@ object LongHashJoinGenerator {
def genProjection(
tableConfig: ReadableConfig,
classLoader: ClassLoader,
- types: Array[LogicalType]): GeneratedProjection = {
+ types: Array[LogicalType],
+ parentCtx: CodeGeneratorContext): GeneratedProjection = {
Contributor

We can simplify the method signature to

  def genProjection(
      types: Array[LogicalType],
      parentCtx: CodeGeneratorContext): GeneratedProjection

Contributor Author

Sorry, I don't get your point. I think tableConfig and classLoader are needed in this method, and I only added the parentCtx parameter without changing anything else.

Contributor

I just mean that we can get the tableConfig and classLoader from parentCtx, but this isn't an important point; ignore it for now.
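
One way to picture the 'parent context' idea from this review is a child context that draws names from its parent's counter, so that nested generators (such as a projection generated inside a join operator) never reuse a suffix within the same outer class. This is a sketch under that assumption. `ParentAwareContext` is a hypothetical class, and how the real CodeGeneratorContext wires its parent is not shown in this conversation.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical context that delegates naming to its parent when one exists.
final class ParentAwareContext(parent: Option[ParentAwareContext] = None) {
  private val ownCounter = new AtomicLong(0)

  // Walk up to the root context and use its counter.
  def nameCounter: AtomicLong =
    parent.map(_.nameCounter).getOrElse(ownCounter)

  def newName(name: String): String =
    s"$name$$${nameCounter.getAndIncrement()}"
}
```

With a root context and a child created from it, names alternate across both without collisions: `root.newName("a")` gives `a$0`, a following `child.newName("b")` gives `b$1`, and `root.newName("c")` gives `c$2`.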

names.map(name => s"$name$$$newId")
if (context == null || context.getNameCounter == null) {
val newId = nameCounter.getAndIncrement
// Add an 'i' in the middle to distinguish from nameCounter in CodeGeneratorContext
Contributor

Why would there be conflicts here?

Contributor Author

There may be scenarios where it is not convenient to obtain the class-level CodeGeneratorContext, and there we can use the nameCounter in CodeGenUtils to generate new names. In those cases, names from the nameCounter in CodeGenUtils and from the one in CodeGeneratorContext may end up in the same class.
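
The conflict described here can be sketched as follows. Two independent counters both start at zero, so without a marker both could emit `x$0` inside the same generated class. The exact name format used in the PR is not visible in this conversation; the `$i` infix below is an assumption based on the code comment quoted in the hunk above, and `MixedCounterSketch` is a hypothetical name.

```scala
import java.util.concurrent.atomic.AtomicLong

object MixedCounterSketch {
  // Fallback counter shared by callers that have no context at hand.
  private val globalCounter = new AtomicLong(0)

  // ctxCounter stands in for a context-level counter (hypothetical parameter).
  def newName(ctxCounter: Option[AtomicLong], name: String): String =
    ctxCounter match {
      // Context available: plain suffix, e.g. x$0.
      case Some(counter) => s"$name$$${counter.getAndIncrement()}"
      // No context: assumed 'i' infix keeps these names in a separate
      // namespace, e.g. x$i0, so they cannot equal a context-based name.
      case None => s"$name$$i${globalCounter.getAndIncrement()}"
    }
}
```

Because a context-based suffix is always purely numeric and a fallback suffix always starts with `i`, the two families of names stay disjoint even when both counters hold the same value.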

@Override
@Nullable
public CodeGeneratorContext getCodeGeneratorContext() {
return null;
Contributor

@lsyldliu lsyldliu Jan 8, 2024

Can you add a comment explaining why this returns null?

Contributor Author

Changed it to non-null.

@zoudan
Contributor Author

zoudan commented Jan 9, 2024

@flinkbot run azure

@zoudan
Contributor Author

zoudan commented Jan 9, 2024

@flinkbot run azure

@zoudan
Contributor Author

zoudan commented Jan 9, 2024

@flinkbot run azure

@zoudan
Contributor Author

zoudan commented Jan 10, 2024

@flinkbot run azure

@zoudan
Contributor Author

zoudan commented Jan 11, 2024

@flinkbot run azure

@zoudan
Contributor Author

zoudan commented Jan 11, 2024

@flinkbot run azure

@zoudan
Contributor Author

zoudan commented Jan 12, 2024

@flinkbot run azure

@zoudan
Contributor Author

zoudan commented Jan 12, 2024

@lsyldliu I have updated my code, please take a look when you have time.

Contributor

@lsyldliu lsyldliu left a comment

LGTM

4 participants