Conversation
/** | ||
* Reset the internal state | ||
*/ | ||
public void reset() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a comment for what this and the two methods below are for and why we don't have to implement them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the comments.
/** ID to class name cache */ | ||
private static Map<Integer, String> ID_TO_CLASS_NAME = new HashMap(); | ||
/** Zookeeper */ | ||
private static ZooKeeperExt ZK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's generally not a nice design to have these static, but not sure if because of some details in kryo there is no other way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without static, we have to pass this information as a parameter. But there are so many places where the kryo objects are created, so it would not be pretty to delegate this information from BspService to all these places.
*/ | ||
public class GiraphClassResolver extends DefaultClassResolver { | ||
/** Length of the ZK sequence number */ | ||
private static final int SEQUENCE_NUMBER_LENGTH = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this number coming from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is coming from ZooKeeperExt, but was a private field. I changed it to public.
} | ||
|
||
for (String name : registeredList) { | ||
String className = name.substring(0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So when we create PERSISTENT_SEQUENTIAL node, zookeeper adds the sequential number to its file name? Maybe just add a comment stating that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* This number has to be high enough to not conflict with | ||
* explicity registered class IDs. | ||
* */ | ||
private static final int BASE_CLASS_ID = 10000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is having this number high potentially increasing the serialized size of our data? Eg if it was low we could use a single byte to write class. Any way to check what is the highest number registered manually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The highest I observed was ~250. So 1000 would be more than enough.
} | ||
|
||
@Override | ||
public Registration register(Registration registration) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this method differ from implementation in DefaultClassResolver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default implementation also handles the case when the class is non-explicitly registered, i.e. it is not assigned to an an integer. Since in this implementation all classes are assigned to integers agreed by all kryo instances, they all treated like explicitly assigned classes. This implementation throws an exception if it encounters a class that has not been assigned to an integer yet.
if (registration.getId() == NAME) { | ||
throw new IllegalStateException("Invalid registration ID"); | ||
} else { | ||
output.writeVarInt(registration.getId() + 2, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the offset 2 for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Class ID's are incremented by 2 when writing, because 0 is used for null, and 1 is used for non-explicitly registered classes, though non-explicit class registration is an error for this implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add this as a comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
private static int MIN_CLASS_ID = -1; | ||
|
||
/** Memoized class id*/ | ||
private int memoizedClassId = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this to speed up the case when single class is used very frequently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
* This class resolver assigns unique classIds for every class that was not | ||
* explicitly registered. It uses zookeeper for consistent mapping across all | ||
* nodes. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some more documentation the describes how this mechanism works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more documentation.
*/ | ||
public static void refreshCache() { | ||
try { | ||
ZK.createOnceExt(KRYO_REGISTERED_CLASS_PATH, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can avoid calling this every time you refresh the cache and, instead, call it once during initialization (e.g. in setZookeeperInfo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a boolean flag to avoid this repeated call.
"Interrupted while retrieving child nodes for " + | ||
KRYO_REGISTERED_CLASS_PATH, e); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some debug logging here that prints the returned list? With the usual pattern:
if (LOG.isDebugEnabled()) {
LOG.debug("...");
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
*/ | ||
public class GiraphClassResolver extends DefaultClassResolver { | ||
/** Base ID to start for class name assignments. | ||
* This number has to be high enough to not conflict with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a conflict something we can detect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could if this information was not kept private in the base class. The application will certainly fail if there is a conflict.
The custom class resolver allows kryo to always use the integer IDs for classes that are agreed by all nodes running the job. Default class resolver always writes the full class name of the first encountered class type to the stream, and then it assigns an integer for subsequent instances. These changes make the serialization faster by eliminating the need to write the full name for the first encountered class instance. Tests: |
No description provided.