Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-11830][table-planner-blink] Introduce CodeGeneratorContext to maintain reusable statements #7906

Closed
wants to merge 5 commits into from

Conversation

wuchong
Copy link
Member

@wuchong wuchong commented Mar 6, 2019

What is the purpose of the change

Introduce CodeGeneratorContext to maintain reusable statements for code generation.

The CodeGeneratorContext will keep all the reusable statements which will be the basic class for code generation. In the future, we will introduce FunctionCodeGeneration, AggregateCodeGeneration, etc... and they will depend on the CodeGeneratorContext to store reusable statements.

Brief change log

  • Introduce CodeGeneratorContext.
  • Introduce CompileUtils in table-runtime-blink to support compile code in runtime.
  • Introduce GeneratedClass in table-runtime-blink to wrap generated code and class name to support easy instantiation.
  • Copy TableConfig to table-planner-blink because CodeGeneratorContext depends on it.
  • Introduce CodeGenUtils to package utilities for codegen (will add various generateXXX in FLINK-11788).
  • Introduce TypeCheckUtils to check types.

Verifying this change

The CodeGeneratorContext will be covered in later FLINK-11788.

Added a CompileUtilsTest to test compile classes with cache.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@flinkbot
Copy link
Collaborator

flinkbot commented Mar 6, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ✅ 1. The [description] looks good.
  • ✅ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ✅ 4. The change fits into the overall [architecture].
  • ✅ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the effort, i have some comments

case _ => false
}

def needCloneRefForType(t: InternalType): Boolean = t match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is useless.

case InternalTypes.TIMESTAMP => boxedTypeTermForType(InternalTypes.LONG)

case InternalTypes.STRING => BINARY_STRING
case InternalTypes.BINARY => "byte[]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InternalTypes.BINARY is ArrayType, remove it, let ArrayType do.


case _: RowType => classOf[BaseRow].getCanonicalName
case _: DecimalType => throw new UnsupportedOperationException
case _: ArrayType => throw new UnsupportedOperationException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayType and MapType have been supported.

def boxedTypeTermForExternalType(t: TypeInformation[_]): String = t match {
// From PrimitiveArrayTypeInfo we would get class "int[]", scala reflections
// does not seem to like this, so we manually give the correct type here.
case INT_PRIMITIVE_ARRAY_TYPE_INFO => "int[]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try getTypeClass().getCanonicalName(), it will output byte[], seem it work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we can use getTypeClass().getCanonicalName() instead.
Not sure why hardcode "int[]" in the original code.

Thread.currentThread().getContextClassLoader)
references += objCopy

val clsName = Option(className).getOrElse(obj.getClass.getName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getCanonicalName is better?

val byteArray = InstantiationUtil.serializeObject(obj)
val objCopy: AnyRef = InstantiationUtil.deserializeObject(
byteArray,
Thread.currentThread().getContextClassLoader)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obj.getClass.getClassLoader?

private def addReusableObject_(obj: AnyRef, fieldTerm: String, fieldTypeTerm: String): Unit = {
val idx = references.length
// make a deep copy of the object
val byteArray = InstantiationUtil.serializeObject(obj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge code with addReferenceObj(obj: AnyRef, className: String = null)?

try {
compiler.cook(code);
} catch (Throwable t) {
// TODO: println pretty code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think println pretty code is a good idea, because it doesn't match the number of lines compiled incorrectly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I will remove the todo and consider it later if we need.

Copy link
Contributor

@KurtYoung KurtYoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most changes looks good to me, however i think there are some very error prone codes, and i opened some follow up issues to track these

* The context for code generator, maintaining various reusable statements that could be insert
* into different code sections in the final generated class.
*/
class CodeGeneratorContext(val tableConfig: TableConfig) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I create a follow up issue: FLINK-11831 to further separate the reused members for different codegen purpose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a child issue of "Finalize the Blink SQL merging efforts", we can do it in the last steps

*/
def className[T](implicit m: Manifest[T]): String = m.runtimeClass.getCanonicalName

def needCopyForType(t: InternalType): Boolean = t match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functions like this should better be a method of InternalType, otherwise there is no guarantee when someone add a new InternalType, he will check this functionality.
I will also open a follow up issue for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


def isMap(dataType: InternalType): Boolean = dataType.isInstanceOf[MapType]

def isComparable(dataType: InternalType): Boolean =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also very error prone

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@wuchong
Copy link
Member Author

wuchong commented Mar 6, 2019

Hi @JingsongLi @KurtYoung , thanks for reviewing.

I agree with your ideas and addressed the comments.

@KurtYoung
Copy link
Contributor

+1 to merge

@KurtYoung
Copy link
Contributor

@flinkbot approve all

@wuchong
Copy link
Member Author

wuchong commented Mar 6, 2019

Merging...

@asfgit asfgit closed this in ec3b36c Mar 6, 2019
HuangZhenQiu pushed a commit to HuangZhenQiu/flink that referenced this pull request Apr 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants