
[FLINK-35406] Use inner serializer when casting RAW type to BINARY or… #24818

Open: wants to merge 3 commits into base: master


@docete (Contributor) commented May 21, 2024

… STRING in cast rules

What is the purpose of the change

This pull request fixes the wrong behaviour when casting RAW to BINARY or STRING.
The code generated by RawToStringCastRule and RawToBinaryCastRule calls
BinaryRawValueData::toBytes and BinaryRawValueData::toObject to convert a
RawValueData into a byte array or a Java object. It passes the RawValueDataSerializer
to these methods, but they must be given the inner serializer instead.
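A minimal, self-contained sketch of why the serializer choice matters, under a simplified model: `Serializer`, `RawValue` and `RawValueSerializer` are hypothetical stand-ins for Flink's `TypeSerializer`, `BinaryRawValueData` and `RawValueDataSerializer`, not the real classes. Raw types stand in for generated code that the compiler cannot check against the element type, so passing the outer serializer compiles. At runtime, though, the outer serializer is asked to serialize the user's object itself. In this model that surfaces as a `ClassCastException`; the actual symptom in Flink may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of the bug. These classes are illustrative
// stand-ins, NOT Flink's TypeSerializer, BinaryRawValueData or RawValueDataSerializer.
public class RawCastSketch {

    /** Minimal serializer abstraction. */
    interface Serializer<T> {
        byte[] serialize(T value);
    }

    /** Inner serializer: turns the user's Java object into bytes. */
    static final class StringSerializer implements Serializer<String> {
        @Override
        public byte[] serialize(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** A RAW value that holds a Java object and materializes its bytes lazily. */
    static final class RawValue<T> {
        private final T object;
        private byte[] bytes;

        RawValue(T object) {
            this.object = object;
        }

        // The argument must be able to serialize T itself, i.e. the inner serializer.
        byte[] toBytes(Serializer<T> serializer) {
            if (bytes == null) {
                bytes = serializer.serialize(object);
            }
            return bytes;
        }
    }

    /** Outer serializer for RawValue<T>; it wraps the inner serializer. */
    static final class RawValueSerializer<T> implements Serializer<RawValue<T>> {
        private final Serializer<T> inner;

        RawValueSerializer(Serializer<T> inner) {
            this.inner = inner;
        }

        Serializer<T> getInnerSerializer() {
            return inner;
        }

        @Override
        public byte[] serialize(RawValue<T> value) {
            return value.toBytes(inner);
        }
    }

    // Buggy pattern: with raw types, passing the outer serializer compiles,
    // but its bridge method casts the String to RawValue and fails at runtime.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static byte[] buggyCast(RawValue value, RawValueSerializer serializer) {
        return value.toBytes(serializer);
    }

    // Fixed pattern: unwrap the inner serializer before converting.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static byte[] fixedCast(RawValue value, RawValueSerializer serializer) {
        return value.toBytes(serializer.getInnerSerializer());
    }

    public static void main(String[] args) {
        RawValueSerializer<String> serializer = new RawValueSerializer<>(new StringSerializer());

        byte[] ok = fixedCast(new RawValue<>("hello"), serializer);
        System.out.println(new String(ok, StandardCharsets.UTF_8)); // prints "hello"

        try {
            buggyCast(new RawValue<>("hello"), serializer);
        } catch (ClassCastException e) {
            System.out.println("outer serializer rejected the String: " + e.getClass().getSimpleName());
        }
    }
}
```

The fixed variant mirrors the one-line change in the generated code: the serializer reference stays the same, and only the argument handed to the conversion is unwrapped.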

Brief change log

  • 049fbd2 fix the wrong behaviour and add tests.

Verifying this change

This change is covered by new test cases in CastFunctionMiscITCase and CastFunctionMiscLegacyITCase.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot (Collaborator) commented May 21, 2024

CI report:

Bot commands: The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@twalthr (Contributor) left a comment

That is a nasty bug. Thanks for fixing it @docete. I just had a tiny code improvement suggestion. Otherwise LGTM.

@twalthr (Contributor) left a comment

One final question before merging.

    @@ -62,7 +62,7 @@ public boolean canFail(LogicalType inputLogicalType, LogicalType targetLogicalTy
     // new behavior
     isNull$290 = isNull$289;
     if (!isNull$290) {
    -byte[] deserializedByteArray$76 = result$289.toBytes(typeSerializer$292);
    +byte[] deserializedByteArray$76 = result$289.toBytes(typeSerializer$292.getInnerSerializer());
Contributor:

Why do we actually need the call to getInnerSerializer() at runtime? Shouldn't we simply use the inner serializer as typeSerializer$292 during code generation? Or does the implementation not allow this?

@docete (Contributor, Author) commented May 31, 2024


I think the RAW type should stay bound to RawValueDataSerializer, with getInnerSerializer() called in the serialization/deserialization phase. Following the same pattern also keeps the generated code clearer. See AbstractBinaryWriter::writeRawValue:

    public void writeRawValue(
            int pos, RawValueData<?> input, RawValueDataSerializer<?> serializer) {
        TypeSerializer innerSerializer = serializer.getInnerSerializer();
        // RawValueData only has one implementation which is BinaryRawValueData
        BinaryRawValueData rawValue = (BinaryRawValueData) input;
        rawValue.ensureMaterialized(innerSerializer);
        writeSegmentsToVarLenPart(
                pos, rawValue.getSegments(), rawValue.getOffset(), rawValue.getSizeInBytes());
    }

@docete (Contributor, Author) commented:

BTW, modifying the code generator would make things more complicated.
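The two options discussed in this thread can be pictured as code-generation templates. This is a hypothetical sketch: `unwrapAtRuntime`, `unwrapAtCodegen` and the term `innerSerializer$293` are illustrative names, not Flink's code-generation API. The first template reproduces the generated line from the diff; the second corresponds to the reviewer's alternative, which would presumably require the generator to unwrap the serializer itself and register the inner one as the referenced field.

```java
// Hypothetical sketch of the two options discussed in the review.
// Method names and parameters are illustrative, not Flink's codegen API.
public class RawCastCodegenSketch {

    // Option merged in the PR: the generated field still references the
    // RawValueDataSerializer bound to the RAW type, and the emitted
    // expression unwraps it each time it runs.
    static String unwrapAtRuntime(String resultTerm, String serializerTerm, String outTerm) {
        return "byte[] " + outTerm + " = " + resultTerm + ".toBytes("
                + serializerTerm + ".getInnerSerializer());";
    }

    // Reviewer's alternative: the generator registers the inner serializer
    // as the field, so the emitted expression passes it directly.
    static String unwrapAtCodegen(String resultTerm, String innerSerializerTerm, String outTerm) {
        return "byte[] " + outTerm + " = " + resultTerm + ".toBytes(" + innerSerializerTerm + ");";
    }

    public static void main(String[] args) {
        System.out.println(unwrapAtRuntime("result$289", "typeSerializer$292", "deserializedByteArray$76"));
        System.out.println(unwrapAtCodegen("result$289", "innerSerializer$293", "deserializedByteArray$76"));
    }
}
```

The first form keeps the generator unchanged apart from the emitted expression, in line with how AbstractBinaryWriter::writeRawValue receives the RawValueDataSerializer and unwraps it where it is used; the extra runtime work is one getter call each time the expression is evaluated.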

3 participants