-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Making optimal usage of multiple segment cache locations #8038
Conversation
…egments to multiple segment cache locations
to me choosing the segment cache location with the max free size instead of round robin makes more sense. otherwise we can make the segment cache location selection strategy configurable and default to max free available. |
@@ -102,6 +105,8 @@ public SegmentLoaderLocalCacheManager( | |||
); | |||
} | |||
locations.sort(COMPARATOR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we are already trying to sort via the available free size,
The issue seems like the order is not updated after a segment is loaded.
what do you think about sorting the locations after a segment has been loaded ?
I think that would probably fix the issue in #7641 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nishantmonu51 , This probably makes sense. However, one case is when the segment cache location max sizes are skewed (one or few locations with way more availability than others). The sort strategy resorts to selecting the same location again and again until it's availability falls short of others. This will end up having more or less the same behaviour reported in #7641. Round-robin on the other hand will try to distribute the segments across multiple locations there by improving I/O if the locations are backed by different physical drives. However I'm not sure whether the round-robin strategy has any implications on query performance. Let me know your thoughts.
@dclim and others, let us know your thoughts.
I like the idea of making the segment cache location selection strategy configurable. |
Ah interesting - I thought I remembered the behavior used to select the least filled disk! Looks like a regression at some point. @sashidhar I do still think there's value in making the selector strategy configurable to something like round-robin for the reason you mentioned. An example - I was setting up a Druid cluster that had two volumes mounted (let's say they were each 10G and called /mnt and /mnt1). I was also using /mnt for other stuff - as a general scratch drive, storing intermediate indexing files, log files, etc. so I needed to reserve some space for this - let's say I reserved 2G. I had 8G left, so I set the size of the segment cache for /mnt to 8G. Now, what do I set the size of the segment cache for /mnt1 to? If I set it to 10G to fully utilize the volume and at a point in time have less than 2G of data, it would all be on /mnt1 and potentially wouldn't be maximizing the I/O throughput available. I could instead set it to 8G to be the same as /mnt and that would evenly distribute the segments, but I'd lose those 2G unnecessarily just to coax the algorithm to utilize both locations. A round-robin strategy (or one that selects the location that has the least bytes used in absolute terms instead of relative to the capacity) would have been what I wanted. |
@dclim , @nishantmonu51 Here's what I'm thinking. As discussed, the segment cache location selector strategy should be configurable. There could be 3 possible strategies currently.
Questions:
Other things to note:
@gianm FYI. |
This sounds like a PR which needs a proposal to me. |
I think, ideally in all cases, we want to minimize |
@jihoonson , @himanshug , thanks for your inputs. Should I raise a separate proposal PR or modify this PR to make it a proposal ? |
I think this kind of issue needs a proposal before writing code so that the author can avoid unnecessary work. However, in this case, I think you don’t have to write a proposal at this moment because you already raised this PR. But still, it would be worth to get design review from 3 or more committers. I added the label. Also please update the PR description accordingly once the design issue is resolved. |
Updated the description with the proposed algorithm and the alternatives discussed. Round-robin and least-bytes-used both seem reasonable. Please review the design. |
it doesn't hurt to make the strategy configurable , however I think "Least-Bytes-Used" should be default instead of "Round-Robin" .
write happens in 1 or very few threads so write throughput is not impacted and on the contrary it improves read throughput due to similar space utilization in each location which has significantly higher concurrency. many times users add new segment locations after the node has been in use for a while and already has some data and then restart the node, with "Round Robin" newly added location will likely stay underutilized . Round-Robin wouldn't solve #7641 in that case. |
Makes sense. It seems to me that the negative case I mentioned for Least-Bytes-Used might not be much of a concern. It makes sense for the Least-Bytes-Used to be the default for the write and read throughput reasons mentioned. |
@jihoonson , @himanshug , @dclim , @nishantmonu51 have you had a chance to review this ? |
hey @sashidhar - your proposal mentioned:
But I don't see that implemented (I only see the round-robin implementation). Am I missing something? You also said:
Which I agree with. Apologies if you were waiting on further confirmation before implementing the Least-Bytes-Used strategy. Between round-robin and Least-Bytes-Used, I would be okay if you just implemented the latter, as I think it would be the right option in the majority of cases, but I would also be okay if you implemented both and had a configuration parameter to select the strategy. |
Hi David,
Sorry for the unclear wording. It should have been "This PR will
introduce...". I was waiting for the design approval before implementing
Least-Bytes-Used strategy. I would like to implement both and make the
strategy configurable default being Least-Bytes-Used. I'll
resume working on the implementation.
Thanks,
Sashi
…On Thu, Jul 25, 2019, 5:34 AM David Lim ***@***.***> wrote:
hey @sashidhar <https://github.com/sashidhar> - your proposal mentioned:
This PR introduces an optional new Historical runtime property
druid.segmentCache.locationSelectorStrategy to make the segment cache
location selection strategy configurable. Possible values for the above
property - round-robin, least-bytes-used.
But I don't see that implemented (I only see the round-robin
implementation). Am I missing something? You also said:
It makes sense for the Least-Bytes-Used to be the default for the write
and read throughput reasons mentioned.
Which I agree with. Apologies if you were waiting on further confirmation
before implementing the Least-Bytes-Used strategy. Between round-robin and
Least-Bytes-Used, I would be okay if you just implemented the latter, as I
think it would be the right option in the majority of cases, but I would
also be okay if you implemented both and had a configuration parameter to
select the strategy.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8038>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AACGIELQDFBIGA3NCQLR3HLQBDUY3ANCNFSM4H6T5OQQ>
.
|
…rategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy
… least bytes used. Adding currSizeBytes() method in StorageLocation.
Implemented both the strategies and made the strategy configurable. However there is one implementation glitch due to which Here's the scenario. For example, assume strategy configured is least bytes used strategy and there are two locations - loc1 and loc2 each on different disks disk1 and disk2 respectively. loc1 has the least bytes used. The strategy picks loc1 and before |
* | ||
* @return The storage location to load the given segment into or null if no location has the capacity to store the given segment. | ||
*/ | ||
StorageLocation select(DataSegment dataSegment, String storageDirStr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented both the strategies and made the strategy configurable. However there is one implementation glitch due to which SegmentLoaderLocalCacheManagerTest.testRetrySuccessAtSecondLocation() is failing.
Here's the scenario. For example, assume strategy configured is least bytes used strategy and there are two locations - loc1 and loc2 each on different disks disk1 and disk2 respectively. loc1 has the least bytes used. The strategy picks loc1 and before SegmentLoaderLocalCacheManager loads a segment if disk1 fails or is not writable the segment loading fails. The strategy has no way (with my impl) to find if loc1 is bad, this results in the strategy picking loc1 every time failing all segment load attempts. What is a clean way to handle this ?
good catch, I think, for that reason the interface here should be something like..
Iterator<StorageLocation> getLocations(..)
so that caller can go through all of them like it does currently and caller should be responsible for calling the reserve(..)
method not the impls of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the method contract to return an iterator of StorageLocations as suggested. If the changes look good will add a few more tests.
@sashidhar thanks for the quick update! I'll finish my review once #8038 (comment) is addressed. Would you take a look? |
Addressed the comment please review. |
*/ | ||
private StorageLocation loadSegmentWithRetry(DataSegment segment, String storageDirStr) throws SegmentLoadingException | ||
{ | ||
for (StorageLocation loc : locations) { | ||
Iterator<StorageLocation> locationsIterator = strategy.getLocations(); | ||
int numLocationsToTry = this.locations.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numLocationsToTry
is not necessary now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! will fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#8038 (comment) javadocs u mean ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed numLocationsToTry and update java docs. Let me know if the description isn't clear or any change is required.
@Override | ||
public Iterator<StorageLocation> getLocations(DataSegment dataSegment, String storageDirStr) | ||
{ | ||
return cyclicIterator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, now I know what the round robin you want is. What I thought was, each caller will get an iterator with a different startIndex
which is changed in a round robin fashion. Okay, your implementation makes sense. Please add more details description about the behavior of this strategy especially when multiple threads use this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 after CI. Thank you @sashidhar!
Thanks a lot @jihoonson for your thorough and patient review. |
@dclim @himanshug @nishantmonu51 do you have more comments? |
* https://github.com/apache/incubator-druid/pull/8038#discussion_r325520829 of PR https://github | ||
* .com/apache/incubator-druid/pull/8038 for more details. | ||
*/ | ||
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "tier", defaultImpl = LeastBytesUsedStorageLocationSelectorStrategy.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is property = "tier"
here required, or is it copy/pasted from another location (like TierSelectorStrategy
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, should probably be "type"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dclim, let me know if it needs to be removed or changed to type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to type.
localStorageFolder1, loc1.getPath()); | ||
|
||
StorageLocation loc2 = locations.next(); | ||
Assert.assertEquals("The next element of the iterator should point to path local_storage_folder_1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assert message is wrong here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
localStorageFolder2, loc2.getPath()); | ||
|
||
StorageLocation loc3 = locations.next(); | ||
Assert.assertEquals("The next element of the iterator should point to path local_storage_folder_1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assert message is wrong here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
+1 after minor assert message change |
Thank you @sashidhar and @jihoonson for working through this |
Thanks @jihoonson, @himanshug, @dclim, @nishantmonu51 for your review and suggestions. |
Please add Release Notes label as this PR introduces a new Historical runtime property. |
@sashidhar oh yes I added. Thank you! |
* apache#7641 - Changing segment distribution algorithm to distribute segments to multiple segment cache locations * Fixing indentation * WIP * Adding interface for location strategy selection, least bytes used strategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy * fixing code style * Fixing test * Adding a method visible only for testing, fixing tests * 1. Changing the method contract to return an iterator of locations instead of a single best location. 2. Check style fixes * fixing the conditional statement * Added testSegmentDistributionUsingLeastBytesUsedStrategy, fixed testSegmentDistributionUsingRoundRobinStrategy * to trigger CI build * Add documentation for the selection strategy configuration * to re trigger CI build * updated docs as per review comments, made LeastBytesUsedStorageLocationSelectorStrategy.getLocations a synchronzied method, other minor fixes * In checkLocationConfigForNull method, using getLocations() to check for null instead of directly referring to the locations variable so that tests overriding getLocations() method do not fail * Implementing review comments. Added tests for StorageLocationSelectorStrategy * Checkstyle fixes * Adding java doc comments for StorageLocationSelectorStrategy interface * checkstyle * empty commit to retrigger build * Empty commit * Adding suppressions for words leastBytesUsed and roundRobin of ../docs/configuration/index.md file * Impl review comments including updating docs as suggested * Removing checkLocationConfigForNull(), @notempty annotation serves the purpose * Round robin iterator to keep track of the no. of iterations, impl review comments, added tests for round robin strategy * Fixing the round robin iterator * Removed numLocationsToTry, updated java docs * changing property attribute value from tier to type * Fixing assert messages
* apache#7641 - Changing segment distribution algorithm to distribute segments to multiple segment cache locations * Fixing indentation * WIP * Adding interface for location strategy selection, least bytes used strategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy * fixing code style * Fixing test * Adding a method visible only for testing, fixing tests * 1. Changing the method contract to return an iterator of locations instead of a single best location. 2. Check style fixes * fixing the conditional statement * Added testSegmentDistributionUsingLeastBytesUsedStrategy, fixed testSegmentDistributionUsingRoundRobinStrategy * to trigger CI build * Add documentation for the selection strategy configuration * to re trigger CI build * updated docs as per review comments, made LeastBytesUsedStorageLocationSelectorStrategy.getLocations a synchronzied method, other minor fixes * In checkLocationConfigForNull method, using getLocations() to check for null instead of directly referring to the locations variable so that tests overriding getLocations() method do not fail * Implementing review comments. Added tests for StorageLocationSelectorStrategy * Checkstyle fixes * Adding java doc comments for StorageLocationSelectorStrategy interface * checkstyle * empty commit to retrigger build * Empty commit * Adding suppressions for words leastBytesUsed and roundRobin of ../docs/configuration/index.md file * Impl review comments including updating docs as suggested * Removing checkLocationConfigForNull(), @notempty annotation serves the purpose * Round robin iterator to keep track of the no. of iterations, impl review comments, added tests for round robin strategy * Fixing the round robin iterator * Removed numLocationsToTry, updated java docs * changing property attribute value from tier to type * Fixing assert messages
Design proposal for #7641.
Description
Making optimal usage of multiple segment cache locations to distribute segments. See #7641 for more details.
Proposed Algorithm
Alternative Algorithms Considered
The following alternative algorithms have been discussed.
Least bytes used algorithm (or least filled disk) approach: This algorithm picks a location with the least bytes used. This to me seems reasonable in most cases. See Making optimal usage of multiple segment cache locations #8038 (comment). In practice the distribution of segment sizes are not very even for several reasons (an interval having less or more data, improperly tuned cluster etc). For example, segments sizes across intervals could be any where from 100MB to 1GB assuming most intervals with very close segment sizes, few intervals having outliers like say 100MB or 1GB. Let us consider we have 3 locations. If a location (location 1) loads a segment of size 1GB, the subsequent calls to load segments of lesser sizes will be distributed between locations 2 and 3 until both of them reach/cross 1 GB. This repeats every time a particular location loads a bigger size segment. This might not have optimal write throughput in such a scenario. However, I'm not sure how much of a problem is this.
Max free size algorithm: Choose the segment cache location with the max free size each time. This algorithm has a possible short coming as explained in Making optimal usage of multiple segment cache locations #8038 (comment).
New configuration
This PR introduces an optional new Historical runtime property druid.segmentCache.locationSelectorStrategy to make the segment cache location selection strategy configurable. Possible values for the above property - round-robin, least-bytes-used.
Test plan
Unit tests to be added.
Documentation
Documentation needs to be updated with the new property if the location selection strategy is made configurable and release notes for the same.