-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-27313: Persist list of Hfiles names for which prefetch is done #4726
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good, just have some nit comments. There seems to be some UT failures, that requires addressing before this is ready to merge.
@@ -44,12 +50,15 @@ public final class PrefetchExecutor { | |||
|
|||
/** Futures for tracking block prefetch activity */ | |||
private static final Map<Path, Future<?>> prefetchFutures = new ConcurrentSkipListMap<>(); | |||
/** Set of files for which prefetch is completed */ | |||
public static HashMap<String, Boolean> prefetchCompleted = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do we really need a map here? It seems we never check the boolean value, but only if the map has an entry for the path of not. Could we use hashset or a list impl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review Wellington :)
Yes, my initial approach was to use a set. But I could find a protobuf support for HashSet.
Also, I wanted to maintain a unique set of filenames so I could avoid doing a duplicate check. So I chose a map implementation.
https://developers.google.com/protocol-buffers/docs/reference/java-generated#map-fields
@@ -79,6 +88,12 @@ public Thread newThread(Runnable r) { | |||
+ HConstants.HREGION_COMPACTIONDIR_NAME.replace(".", "\\.") + Path.SEPARATOR_CHAR + ")"); | |||
|
|||
public static void request(Path path, Runnable runnable) { | |||
if(!prefetchCompleted.isEmpty()){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: No need for this extra check, as prefetchCompleted.containsKey
would return false anyways if the map is empty.
@@ -93,6 +93,8 @@ public class CacheConfig { | |||
public static final String DROP_BEHIND_CACHE_COMPACTION_KEY = | |||
"hbase.hfile.drop.behind.compaction"; | |||
|
|||
public static String PREFETCH_PERSISTENCE_PATH_KEY = "hbase.prefetch.persist.filepath"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: So this is basically a path for a list of prefetched files. Can we call it hbase.prefetch.file-list.path
? That would be more intuitive, especially when we start to use it in BucketCache class, which already has a persistencePath
variable. We could then relabel BuckeCache.prefetchPersistencePath
to something like BuckeCache.prefetchedFileListPath
.
@@ -239,6 +242,8 @@ public class BucketCache implements BlockCache, HeapSize { | |||
/** In-memory bucket size */ | |||
private float memoryFactor; | |||
|
|||
private String prefetchPersistencePath; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, just need addressing the checkstyle issues.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest changes LGTM. Let's see if we can get a green run.
🎊 +1 overall
This message was automatically generated. |
Latest UT failures are unrelated, have tested the PR locally and have these two passing. |
@@ -93,6 +93,8 @@ public class CacheConfig { | |||
public static final String DROP_BEHIND_CACHE_COMPACTION_KEY = | |||
"hbase.hfile.drop.behind.compaction"; | |||
|
|||
public static final String PREFETCH_PERSISTENCE_PATH_KEY = "hbase.prefetch.file-list.path"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] it may be too late but we should not use -
, "hbase.prefetch.file.list.path"
Jira link: https://issues.apache.org/jira/browse/HBASE-27313
Summary:
Persist list of Hfiles names for which prefetch is completed when persist for in-memory bucket cache mapping state is enabled
We need to persist the Set, which contains a list of all the HFiles for which the prefetch has completed along with the In-memory state of the bucket cache so that when we restart and if any prefetch is scheduled again for the same HFile and we already have the in-memory state retrieved, we can skip the prefetch for it.