Suboptimal usage of multiple segment cache locations #7641

dclim · 2019-05-11T07:02:34Z

Affected Version

0.14.0-incubating

Description

There seems to be room for improvement in adding logic to distribute segments among multiple segment cache locations which could speed up overall I/O if the locations are backed by different physical drives. Right now, the algorithm seems to pick one location, fills it up to capacity, and then moves on to the next one.

An example test configuration:

druid.server.maxSize=35000000000
druid.segmentCache.locations=[{"path"\:"/mnt/var/druid/segment-cache","maxSize"\:"5000000000"}, {"path"\:"/mnt1/var/druid/segment-cache","maxSize"\:"10000000000"}, {"path"\:"/mnt2/var/druid/segment-cache","maxSize"\:"10000000000"}, {"path"\:"/mnt3/var/druid/segment-cache","maxSize"\:"10000000000"}]

Some point-in-time snapshots of disk usage:

t=1
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  3.9G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T   77M  1.7T   1% /mnt2
/dev/nvme3n1    1.7T   77M  1.7T   1% /mnt3

t=2
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  952M  1.7T   1% /mnt2
/dev/nvme3n1    1.7T   77M  1.7T   1% /mnt3

t=3
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  5.4G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T   77M  1.7T   1% /mnt3

t=4
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  9.2G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T  937M  1.7T   1% /mnt3

t=5
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  9.2G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T  5.1G  1.7T   1% /mnt3

t=6
/dev/nvme0n1    1.7T  859M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  9.2G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T  9.2G  1.7T   1% /mnt3

t=7 (steady state, all locations are 'full')
/dev/nvme0n1    1.7T  4.6G  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.3G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  9.2G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T  9.2G  1.7T   1% /mnt3

Additionally, I'm seeing some misleading WARN level logs when it attempts to use a location that is at capacity and fails, after which it moves onto the next location:

2019-05-11T06:45:07,615 INFO [ZkCoordinator] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment tpch_lineitem_1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z_2018-04-03T22:29:17.915Z_1
2019-05-11T06:45:07,615 WARN [ZkCoordinator] org.apache.druid.segment.loading.StorageLocation - Segment[tpch_lineitem_1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z_2018-04-03T22:29:17.915Z_1:548,961,952] too large for storage[/mnt1/var/druid/segment-cache:268,514,257]. Check your druid.segmentCache.locations maxSize param
2019-05-11T06:45:07,615 WARN [ZkCoordinator] org.apache.druid.segment.loading.StorageLocation - Segment[tpch_lineitem_1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z_2018-04-03T22:29:17.915Z_1:548,961,952] too large for storage[/mnt2/var/druid/segment-cache:237,382,486]. Check your druid.segmentCache.locations maxSize param
2019-05-11T06:45:07,615 INFO [ZkCoordinator] org.apache.druid.storage.s3.S3DataSegmentPuller - Pulling index at path[s3://imply-awstest-druid/a336020b-9c30-4ecd-bd15-4cd4433998c3/segments/tpch_lineitem/1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z/2018-04-03T22:29:17.915Z/1/index.zip] to outDir[/mnt3/var/druid/segment-cache/tpch_lineitem/1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z/2018-04-03T22:29:17.915Z/1]

The text was updated successfully, but these errors were encountered:

sashidhar · 2019-06-27T14:26:09Z

@dclim , I would like to work on this. I'll revert back with my understanding before making any changes.

sashidhar · 2019-07-01T09:50:31Z

@dclim

I want to understand if the proposed algorithm has any side effects like data locality. With the current algorithm all the segments of a particular interval might land up in the same location (correct me). With the new algorithm the segments of the same interval will be distributed among multiple locations. Does this have any effect on the query performance ?
SegmentLoaderLocalCacheManager.loadSegmentWithRetry(..)

private StorageLocation loadSegmentWithRetry(DataSegment segment, String storageDirStr) throws 
SegmentLoadingException
  {
    for (StorageLocation loc : **locations**) {
      if (loc.canHandle(segment)) {
        File storageDir = new File(loc.getPath(), storageDirStr);
        try {
          loadInLocationWithStartMarker(segment, storageDir);
          return loc;
        }
        catch (SegmentLoadingException e) {  ....  }
      }
    }
    throw new SegmentLoadingException("Failed to load segment %s in all locations.", segment.getId());
  }

Whenever loadSegmentWithRetry() is called, the for loop above picks the same location (first from the list initially) until the location's capacity is exhausted. Once a location's capacity is exhausted it picks another.

The implementation of the proposed algorithm will look something like below,

      ....
private StorageLocation loadSegmentWithRetry(DataSegment segment, String storageDirStr) throws 
SegmentLoadingException
  {
      StorageLocation nextLocation = **getNextLocation();**
      if (nextLocation.canHandle(segment)) {
           // load segment
           return nextLocation;
      }
      ...
      ...
}

private StorageLocation getNextLocation() {
 **// Loop through the **locations** in a round robin fashion.** 
    return nextLocation;
}

…egments to multiple segment cache locations

@notempty

* #7641 - Changing segment distribution algorithm to distribute segments to multiple segment cache locations * Fixing indentation * WIP * Adding interface for location strategy selection, least bytes used strategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy * fixing code style * Fixing test * Adding a method visible only for testing, fixing tests * 1. Changing the method contract to return an iterator of locations instead of a single best location. 2. Check style fixes * fixing the conditional statement * Added testSegmentDistributionUsingLeastBytesUsedStrategy, fixed testSegmentDistributionUsingRoundRobinStrategy * to trigger CI build * Add documentation for the selection strategy configuration * to re trigger CI build * updated docs as per review comments, made LeastBytesUsedStorageLocationSelectorStrategy.getLocations a synchronzied method, other minor fixes * In checkLocationConfigForNull method, using getLocations() to check for null instead of directly referring to the locations variable so that tests overriding getLocations() method do not fail * Implementing review comments. Added tests for StorageLocationSelectorStrategy * Checkstyle fixes * Adding java doc comments for StorageLocationSelectorStrategy interface * checkstyle * empty commit to retrigger build * Empty commit * Adding suppressions for words leastBytesUsed and roundRobin of ../docs/configuration/index.md file * Impl review comments including updating docs as suggested * Removing checkLocationConfigForNull(), @notempty annotation serves the purpose * Round robin iterator to keep track of the no. of iterations, impl review comments, added tests for round robin strategy * Fixing the round robin iterator * Removed numLocationsToTry, updated java docs * changing property attribute value from tier to type * Fixing assert messages

@notempty

* apache#7641 - Changing segment distribution algorithm to distribute segments to multiple segment cache locations * Fixing indentation * WIP * Adding interface for location strategy selection, least bytes used strategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy * fixing code style * Fixing test * Adding a method visible only for testing, fixing tests * 1. Changing the method contract to return an iterator of locations instead of a single best location. 2. Check style fixes * fixing the conditional statement * Added testSegmentDistributionUsingLeastBytesUsedStrategy, fixed testSegmentDistributionUsingRoundRobinStrategy * to trigger CI build * Add documentation for the selection strategy configuration * to re trigger CI build * updated docs as per review comments, made LeastBytesUsedStorageLocationSelectorStrategy.getLocations a synchronzied method, other minor fixes * In checkLocationConfigForNull method, using getLocations() to check for null instead of directly referring to the locations variable so that tests overriding getLocations() method do not fail * Implementing review comments. Added tests for StorageLocationSelectorStrategy * Checkstyle fixes * Adding java doc comments for StorageLocationSelectorStrategy interface * checkstyle * empty commit to retrigger build * Empty commit * Adding suppressions for words leastBytesUsed and roundRobin of ../docs/configuration/index.md file * Impl review comments including updating docs as suggested * Removing checkLocationConfigForNull(), @notempty annotation serves the purpose * Round robin iterator to keep track of the no. of iterations, impl review comments, added tests for round robin strategy * Fixing the round robin iterator * Removed numLocationsToTry, updated java docs * changing property attribute value from tier to type * Fixing assert messages

@notempty

* apache#7641 - Changing segment distribution algorithm to distribute segments to multiple segment cache locations * Fixing indentation * WIP * Adding interface for location strategy selection, least bytes used strategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy * fixing code style * Fixing test * Adding a method visible only for testing, fixing tests * 1. Changing the method contract to return an iterator of locations instead of a single best location. 2. Check style fixes * fixing the conditional statement * Added testSegmentDistributionUsingLeastBytesUsedStrategy, fixed testSegmentDistributionUsingRoundRobinStrategy * to trigger CI build * Add documentation for the selection strategy configuration * to re trigger CI build * updated docs as per review comments, made LeastBytesUsedStorageLocationSelectorStrategy.getLocations a synchronzied method, other minor fixes * In checkLocationConfigForNull method, using getLocations() to check for null instead of directly referring to the locations variable so that tests overriding getLocations() method do not fail * Implementing review comments. Added tests for StorageLocationSelectorStrategy * Checkstyle fixes * Adding java doc comments for StorageLocationSelectorStrategy interface * checkstyle * empty commit to retrigger build * Empty commit * Adding suppressions for words leastBytesUsed and roundRobin of ../docs/configuration/index.md file * Impl review comments including updating docs as suggested * Removing checkLocationConfigForNull(), @notempty annotation serves the purpose * Round robin iterator to keep track of the no. of iterations, impl review comments, added tests for round robin strategy * Fixing the round robin iterator * Removed numLocationsToTry, updated java docs * changing property attribute value from tier to type * Fixing assert messages

dclim added Performance Improvement Area - Segment Balancing/Coordination labels May 11, 2019

sashidhar pushed a commit to sashidhar/incubator-druid that referenced this issue Jul 6, 2019

apache#7641 - Changing segment distribution algorithm to distribute s…

a619112

…egments to multiple segment cache locations

sashidhar mentioned this issue Jul 6, 2019

Making optimal usage of multiple segment cache locations #8038

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suboptimal usage of multiple segment cache locations #7641

Suboptimal usage of multiple segment cache locations #7641

dclim commented May 11, 2019

sashidhar commented Jun 27, 2019

sashidhar commented Jul 1, 2019 •

edited

Loading

Suboptimal usage of multiple segment cache locations #7641

Suboptimal usage of multiple segment cache locations #7641

Comments

dclim commented May 11, 2019

Affected Version

Description

sashidhar commented Jun 27, 2019

sashidhar commented Jul 1, 2019 • edited Loading

sashidhar commented Jul 1, 2019 •

edited

Loading