-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18215. Enhance WritableName to be able to return aliases for classes that use serializers #4215
Conversation
💔 -1 overall
This message was automatically generated. |
eee1ea2
to
537dae4
Compare
💔 -1 overall
This message was automatically generated. |
537dae4
to
796d901
Compare
💔 -1 overall
This message was automatically generated. |
17aa709
to
52fc693
Compare
💔 -1 overall
This message was automatically generated. |
…lasses that use serializers
52fc693
to
adf3de0
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The production code change looks okay. However, the added unit tests don't fail if I take out the production code change.
@jojochuang thank you very much for taking a look. Are you sure you tested correctly? Adding back the
|
@jojochuang thabka again for looking at this. Do you think you could merge given the above test results? |
@jojochuang Any chance we can merge this? |
@jojochuang any chance we can merge this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the test. I tried it locally and it does reproduces the issue for me:
java.lang.ClassCastException: class org.apache.hadoop.io.TestSequenceFile$AnotherSimpleSerializable
at java.lang.Class.asSubclass(Class.java:3404)
at org.apache.hadoop.io.WritableName.getClass(WritableName.java:95)
at org.apache.hadoop.io.SequenceFile$Reader.getKeyClass(SequenceFile.java:2199)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:2133)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1982)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1931)
at org.apache.hadoop.io.TestSequenceFile.testSerializationUsingWritableNameAlias(TestSequenceFile.java:795)
Future suggestion: In case any PR gets stuck post getting a review for almost a year. And you see the person active. Good to drop him/her a mail reminding him/her. May be he is just missing your pings
Moreover Slack has limited audience, the dev lists have more, so good to give it a try if none of the options work
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/WritableName.java
Show resolved
Hide resolved
Thanks for looking and for the suggestions! Just wanted to ask one question about your review comment before I make the change. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general sounds fine to me. Do we need an extra validation that the class implements Serialization
?
The SequenceFile
code is pretty old, @slfan1989 / @tasanuma can you folks have a look as well, in case you find any reasons, why we shouldn't do this
Just want to clarify the use-case here (as a reminder and for the new reviewers):
This all worked when all key/value extend Writable, but Hadoop also supports Serialization framework. You can specify So when the same "old sequence file contains a key or value class that has been renamed" problem happens, if you are using Serialization you are out of luck. By default you'd get a ClassNotFoundException, and if you tried doing WritableName.addName, you'd get ClassCastException. The simplest fix for that seemed to be in WritableName which appeared to be IA.Private and have no real usages in the repo outside of SequenceFile. A one line change there was attractive. The risk here seems pretty low, at least for how SequenceFile uses this class. If we have concerns here, there are other possible more involved solutions we could discuss. For example, we could add something in SerializationFactory to add aliases. This would be more involved though because it'd require a slight refactor in SequenceFile and we'd have to make sure that new API worked for any other usages of SerializationFactory. That's why I chose the simple 1 liner approach, since it solves the problem with simplicity and minimal external impact. |
So these serializable classes don't need to implement Serializable interface. The only real check we could do is whether |
To me it sounds ok, regarding the one liner, if that is a strickt ask and if other people are convinced I am ok with that as well. But it isn't something where I am the only one approving it. I will play it safe, Utility classes does get used a lot, Even FileUtils, Utils are all annotated Evolving, but if you break it, you will break a lot of stuff. Something which is there more than 10 years.... The codes being impacted are around 2008-09, I am not sure who has direct involvement during development, like why it was done like that or so. @steveloughran / @goiri / @vinayakumarb anyone any context or feedback around this? |
Sounds good, thanks. The one liner is not strict, I was just giving the context so everyone understood the use-case/reasons. I'm happy to merge as is if you'd like, but we can also wait a few days to see if any of the people you pinged chime in. |
blocker with anything related to writable &c are "do you break any existing serializations, subclasses etc". we know that parquet and avro do that and we don't want to maintain the tradition. @omalley is the person to review the code |
public static synchronized Class<?> getClass(String name, Configuration conf | ||
) throws IOException { | ||
public static synchronized Class<?> getClass(String name, Configuration conf, | ||
boolean requireWritable) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd propose that we unconditionally remove the check for Writable in getClass. My thought is:
- Users can always enforce the constraint later if they want to.
- All uses of the method with Hadoop's code base don't want to limit the output.
- The check isn't consistent. (It is only applied for aliases, not natural class names.)
- Removing the check won't break any callers since they couldn't get non-Writables before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks for looking!
@ayushtkn if you're ok with that I can just revert my last commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go ahead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reverts commit 33d6d4c.
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 LGTM
Thanks everyone :) |
Is it possible to backport to branch-3 and/or branch-3.3? Not sure how that works in the hadoop project. I can submit follow-up PRs, but I imagine this will apply cleanly with cherry-pick. |
…lasses that use serializers (#4215)
I think branch-3 is mostly abandoned. I committed it to branch-3.3. It can be committed in my opinion to branch-3.3.5, but I'll leave that call up to the release manager. |
Thanks Steve and Owen for chimming in and helping over here. Thanx Bryan for the contribution, One year!!! |
…lasses that use serializers (apache#4215)
Description of PR
Very simple one-line change that allows users to optionally call addName for key/value classes that may not inherit Writable (instead relying on serialization).
How was this patch tested?
Added two tests:
This patch has also been deployed in our environment for 2 months.