Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-6864] Fix confusing "invalid POJO type" messages from TypeExtractor #4574

Closed
wants to merge 2 commits into from

Conversation

FangYongs
Copy link
Contributor

What is the purpose of the change

Fix confusing "invalid POJO type" messages from TypeExtractor

Brief change log

  • Improve log about pojo in TypeExtractor
  • Add notice in types_serialization.cmd about rules-for-pojo-types

Verifying this change

This change needs no testing

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@FangYongs
Copy link
Contributor Author

@tzulitai I create this PR to fix https://issues.apache.org/jira/browse/FLINK-6864, I think add logs instead of removing them will be better, what do you think? Thanks

@ggevay
Copy link
Contributor

ggevay commented Aug 23, 2017

I agree with @zjureel . Even though I know the POJO rules well, it still happened to me more than once that I accidentally had a non-POJO, and I only noticed this because of these log messages, so I think it would be problematic to completely remove them.

@greghogan
Copy link
Contributor

I also agree that these messages have been personally useful. I think this PR is an improvement but that we could do better to note and/or emphasize the performance implications of GenericType. I'm not sure that we even need to mention GenericType explicitly. I can see how the current message could confuse users but explaining this as having performance implications should clarify the reasoning for this being logged.

@FangYongs
Copy link
Contributor Author

@ggevay @greghogan Thank you for your suggestions, I tried to add some notice about performance loss in the log, what do you think? Thanks

@@ -115,6 +115,8 @@ conditions are fulfilled:
or have a public getter- and a setter- method that follows the Java beans
naming conventions for getters and setters.

Note that when a data type can't be recognized as a POJO type, it will be handled as GenericType.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"... it will be processed as a GenericType and serialized with Kryo"? Is it generally understood that use of Kryo can have a significant impact on performance? It's not just serialization which is slower. It looks like if the class does not implement NormalizedKey then objects are deserialized for each comparison.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to reply late, and thank you for your suggestions @greghogan
In fact, I have some doubts about what information to be provided to the users when the data can't be recognized as POJO type. Do you think data type and serialization are sufficient here? thanks

@@ -1900,7 +1900,8 @@ private boolean isValidPojoField(Field f, Class<?> clazz, ArrayList<Type> typeHi
ParameterizedType parameterizedType, TypeInformation<IN1> in1Type, TypeInformation<IN2> in2Type) {

if (!Modifier.isPublic(clazz.getModifiers())) {
LOG.info("Class " + clazz.getName() + " is not public, cannot treat it as a POJO type. Will be handled as GenericType");
LOG.info("Class " + clazz.getName() + " is not public, cannot treat it as a POJO type. " +
"Will be handled as GenericType, and may lose some serialization performance.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we said something like "... is not public and cannot be processed as a POJO type. Please read the Flink documentation on "Data Types & Serialization" for details on the impact to performance."

@greghogan
Copy link
Contributor

@zjureel am merging this ... thanks for the PR and edits!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants