-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-27276 Reduce reflection overhead in Filter deserialization #5488
Conversation
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Test failures are unrelated. |
hbase-common/src/main/java/org/apache/hadoop/hbase/util/ReflectedFunctionCache.java
Outdated
Show resolved
Hide resolved
private static <R> Set<Class<? extends R>> getSubclassesInPackage(ClassLoader classLoader, | ||
Class<R> baseClass) { | ||
try { | ||
return ClassPath.from(classLoader).getAllClasses().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this cause we load all classes even if it is not used by now? I used to use guava's ClassPath in a project but it performed differently when executing in IDE and in command line, finally I chose to use ClassPathScanningCandidateComponentProvider in spring for scanning classes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will cause us to load all of the matching ones, i.e. the ones i call load
on below after filtering to the correct package. So it will load all of the filters in org.apache.hadoop.hbase.filter
on startup.
I thought this was preferable because the number of classes is not large. Since I do it on startup, I don't need to worry about synchronization. I could only populate the cache as Filters are accessed, but then I need to handle synchronization. Would you prefer that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the code for ClassPath#getAllClasses
, it seems that it will build an exhaustive set of all classes on the classpath that are loadable. It doesn't actually load the classes. My understanding is that Stream
operations are lazy, so the only classes loaded should be those that materialize in the final collect
.
hbase-common/src/main/java/org/apache/hadoop/hbase/util/ReflectionUtils.java
Show resolved
Hide resolved
hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestComparatorSerialization.java
Outdated
Show resolved
Hide resolved
hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestComparatorSerialization.java
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice feature. Let's see how it does in practice.
hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
Show resolved
Hide resolved
private static <R> Set<Class<? extends R>> getSubclassesInPackage(ClassLoader classLoader, | ||
Class<R> baseClass) { | ||
try { | ||
return ClassPath.from(classLoader).getAllClasses().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the code for ClassPath#getAllClasses
, it seems that it will build an exhaustive set of all classes on the classpath that are loadable. It doesn't actually load the classes. My understanding is that Stream
operations are lazy, so the only classes loaded should be those that materialize in the final collect
.
// optimizing dynamically loaded classes. We can do it once we build for java9+, see the todo | ||
// in ReflectedFunctionCache | ||
private static final ReflectedFunctionCache<byte[], Filter> FILTERS = ReflectedFunctionCache | ||
.create(ProtobufUtil.class.getClassLoader(), Filter.class, byte[].class, PARSE_FROM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the content of the directory specified in hbase.dynamic.jars.dir
included in the classpath that is under the domain of this classloader? I think that if there are user-provided Filter classes in the path, we should load them. I guess that we cannot assume that they will be in the o.a.h.h.filter
package, so we'd have to relax our class selection criteria.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct we'd need to expand our package search, which may lead to increased start times.
I've had this branch hanging around for a long time, I'd like to get this shipped and then we can tackle custom filters in a follow up when I or someone has time
hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
Show resolved
Hide resolved
I deployed this to one of our prod servers, which is an extreme case that typically spends about 12% of time deserializing Filters. With the patch, it spends less than 5% time now. I also instrumented the initialization a bit:
So it's about 5ms to create one of these, but the ClassPath.from() call is quite expensive and will depend on the size of the classpath (could be worse on clients that do other things). I'm going to take a look at lazy loading these on demand. Barring that, I'll at the very least re-use one ClassPath object. |
I just pushed a commit which populates the cache on-demand, negating the need for any classpath traversal. I used our ConcurrentMapUtils.computeIfAbsent, which does a get followed by putIfAbsent. This means that under high concurrency it'd be possible to generate the function more than once, but this is ok -- it'll be GC'd. I added some timings to the code, logged with debug logging. I ran this in prod, and creating the functions only takes 0-1ms in this context. I think it's faster than the old approach because we don't actually have to load the class (pass So now we have no startup time problem, and extremely small upfront cost for the first time a filter/comparator is loaded. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
I think this is ready for merge, if anyone has any other feedback since the move to on-demand loading. Note I decided to use a ConcurrentHashMap rather than LoadingCache or Caffeine Cache, because we don't need any of the extra features of those. The total number of cached items is small and constant, with no eviction or expiration. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Duo Zhang <zhangduo@apache.org>
…serialization (apache#5488) Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Duo Zhang <zhangduo@apache.org>
Uses LambdaMetafactory to create fast functions for reflectively parsing filters and comparators. These are cached for the lifetime of the process.
On startup, a cache will be populated for all of the subclasses of Filter and Comparator in the same package. This covers all of our built-ins. For now, custom filters/comparators are not covered unless the user places them in the same package and are available at startup.
With java9+ we could support updating the cache over time from the dynamic classloader. We could possibly work around that with some language reflection now too, but this patch as-is is a huge optimization. So would rather tackle that in a follow-up.
This is covered by existing unit tests, but I updated them to specifically ensure that the new functionality is working. I also added unit tests for custom filters/comparators to verify the expected behaviour wrt this feature.
See benchmarks on the issue: https://issues.apache.org/jira/browse/HBASE-27276