[ISSUE-451][Improvement] Read HDFS data files with random sequence to distribute pressure (#452)

### What changes were proposed in this pull request?
[Improvement] Read HDFS data files with random sequence to distribute pressure #452

### Why are the changes needed?
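PR #396 added support for concurrently writing a single partition's data into multiple HDFS files. Now that a partition can span several files, the client side should read those files in random order rather than in a fixed sorted order, so that read pressure is distributed across the files.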
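
The effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: `ReadOrderDemo`, `firstFileCounts`, the file names, and the client count are hypothetical and not part of Uniffle. It counts which file each simulated client would open first under a sorted order versus a shuffled order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical demo class; not part of Uniffle.
class ReadOrderDemo {

    // Returns, per file name, how many simulated clients would open that file first.
    static Map<String, Integer> firstFileCounts(
            List<String> files, int clients, boolean shuffle, long seed) {
        Random random = new Random(seed); // seeded only to make the demo repeatable
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < clients; i++) {
            List<String> order = new ArrayList<>(files); // each client gets its own copy
            if (shuffle) {
                Collections.shuffle(order, random);
            } else {
                Collections.sort(order);
            }
            counts.merge(order.get(0), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative file names only.
        List<String> files = Arrays.asList("file_0", "file_1", "file_2", "file_3");
        System.out.println("sorted:   " + firstFileCounts(files, 1000, false, 42L));
        System.out.println("shuffled: " + firstFileCounts(files, 1000, true, 42L));
    }
}
```

With a sorted order every client starts on the same file, while a shuffled order spreads the first read across all files.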

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing unit tests (UTs).
zuston authored Jan 3, 2023
1 parent 8d59e2f commit 9572b84
Showing 1 changed file with 6 additions and 2 deletions.
@@ -18,8 +18,9 @@
 package org.apache.uniffle.storage.handler.impl;

 import java.io.FileNotFoundException;
-import java.util.Comparator;
+import java.util.Collections;
 import java.util.List;
+import java.util.stream.Collectors;

 import com.google.common.collect.Lists;
 import org.apache.hadoop.conf.Configuration;
@@ -143,7 +144,10 @@ protected void init(String fullShufflePath) {
         LOG.warn("Can't create ShuffleReaderHandler for " + filePrefix, e);
       }
     }
-    readHandlers.sort(Comparator.comparing(HdfsShuffleReadHandler::getFilePrefix));
+    Collections.shuffle(readHandlers);
+    LOG.info("Reading order of HDFS files with name prefix: {}",
+        readHandlers.stream().map(x -> x.filePrefix).collect(Collectors.toList())
+    );
   }
 }
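
The pattern in the second hunk, shuffling the handler list in place and then logging the resulting order, can be sketched in isolation as follows. `PrefixHandler` is a hypothetical stand-in for `HdfsShuffleReadHandler` that models only the file prefix, `ShuffleOrderSketch` and the prefix values are likewise assumptions, and `System.out` replaces the real logger to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for HdfsShuffleReadHandler; only the file prefix is modeled.
class PrefixHandler {
    final String filePrefix;

    PrefixHandler(String filePrefix) {
        this.filePrefix = filePrefix;
    }

    String getFilePrefix() {
        return filePrefix;
    }
}

// Hypothetical demo class; not part of Uniffle.
class ShuffleOrderSketch {

    // Shuffles the handlers in place and returns the prefix order for logging.
    // The list must be mutable, because Collections.shuffle reorders it in place.
    static List<String> shuffleAndDescribe(List<PrefixHandler> handlers) {
        Collections.shuffle(handlers);
        return handlers.stream()
                .map(PrefixHandler::getFilePrefix)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<PrefixHandler> handlers = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            handlers.add(new PrefixHandler("prefix_" + i)); // illustrative prefixes
        }
        System.out.println("Reading order of HDFS files with name prefix: "
                + shuffleAndDescribe(handlers));
    }
}
```

Shuffling only permutes the list, so every handler is still read exactly once; logging the order afterwards keeps a record of the sequence a given client actually used.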
