HDDS-3466. Improve PipelinePlacementPolicy performance. #9633

szetszwo · 2026-01-14T03:24:51Z

What changes were proposed in this pull request?

HDDS-3139 changed PipelinePlacementPolicy to sort the datanodes by the number of existing pipelines. It simply uses Stream.sorted(..) to sort the entire list. @sodonnel pointed out that this degrades performance, since sorting generally takes $$O(n \log n)$$ time: the change raised the PipelinePlacementPolicy running time from $$O(n)$$ to $$O(n \log n)$$.

The proposed solution here is to use bucket sort with bucket size == 1, so that each bucket holds the datanodes with one particular pipeline count. For example, a cluster may have 5,000 datanodes (elements), but the number of pipelines per datanode (which determines the buckets) is mostly less than 100. The running time of the bucket sort is then $$O(n \log b)$$, which is more efficient than the usual $$O(n \log n)$$ sorting, where $$n$$ is the number of elements and $$b$$ is the number of buckets.
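To make the $$O(n \log b)$$ bound concrete, here is a minimal sketch of the idea, not the code from this patch. It assumes the buckets are kept in a `TreeMap` keyed by the pipeline count, so each insertion costs $$O(\log b)$$. The names `BucketSortSketch` and `sortByKey` are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

/** Hypothetical illustration only; this is not the SortedList class from the patch. */
final class BucketSortSketch {
  private BucketSortSketch() { }

  /** Returns the elements in ascending key order; elements with equal keys keep their input order. */
  static <E> List<E> sortByKey(List<E> elements, ToIntFunction<E> key) {
    // One bucket per distinct key value; the TreeMap keeps the b buckets ordered by key.
    final TreeMap<Integer, List<E>> buckets = new TreeMap<>();
    for (E e : elements) {
      // Each map lookup or insertion costs O(log b), so this loop is O(n log b).
      buckets.computeIfAbsent(key.applyAsInt(e), k -> new ArrayList<>()).add(e);
    }
    // Concatenate the buckets in key order: O(n + b).
    final List<E> sorted = new ArrayList<>(elements.size());
    for (List<E> bucket : buckets.values()) {
      sorted.addAll(bucket);
    }
    return sorted;
  }
}
```

For datanodes, the key function would return the number of existing pipelines, so $$b$$ stays small even when $$n$$ is in the thousands.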

An alternative worth considering is counting sort. We may create a fixed array of buckets (say 100), where the $$k$$-th bucket holds the datanodes with $$k$$ pipelines. We need to handle the outlier datanodes whose number of pipelines exceeds the array size. The running time becomes essentially linear, provided that the number of outliers is small. This was actually my first implementation, but the code is much more complicated. It may not be worth doing in this case.
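For comparison, the counting-sort alternative might look roughly like the sketch below. This is an assumption-laden illustration, not the earlier implementation mentioned above: the names `CountingSortSketch`, `sortByCount` and `limit` are hypothetical, counts are assumed to be non-negative, and outliers are simply sorted with a comparison sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical illustration only; not code from the patch. */
final class CountingSortSketch {
  private CountingSortSketch() { }

  /** Sorts by a non-negative count using a fixed array of buckets plus an outlier list. */
  static <E> List<E> sortByCount(List<E> elements, ToIntFunction<E> count, int limit) {
    // buckets.get(k) holds the elements whose count is exactly k, for 0 <= k < limit.
    final List<List<E>> buckets = new ArrayList<>(limit);
    for (int k = 0; k < limit; k++) {
      buckets.add(new ArrayList<>());
    }
    final List<E> outliers = new ArrayList<>();
    for (E e : elements) {
      final int k = count.applyAsInt(e);
      if (k < 0) {
        throw new IllegalArgumentException("negative count: " + k);
      }
      if (k < limit) {
        buckets.get(k).add(e);   // O(1) per element
      } else {
        outliers.add(e);         // count does not fit in the fixed array
      }
    }
    // Outliers need a comparison sort: O(m log m) for m outliers.
    outliers.sort(Comparator.comparingInt(count));

    final List<E> sorted = new ArrayList<>(elements.size());
    for (List<E> bucket : buckets) {
      sorted.addAll(bucket);
    }
    sorted.addAll(outliers);     // every outlier count is >= limit, so they go last
    return sorted;
  }
}
```

The total cost is $$O(n + L + m \log m)$$ for array size $$L$$ and $$m$$ outliers, which is essentially linear when $$m$$ is small. The extra outlier path hints at why this variant is more complicated than the bucket-sort version.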

What is the link to the Apache JIRA?

HDDS-3466

How was this patch tested?

Added a new test.

adoroszlai

Thanks @szetszwo for the patch.

adoroszlai · 2026-01-14T17:11:37Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/SortedList.java

+    return true;
+  }
+
+  private E getOrRmove(String name, int index, BiFunction<List<E>, Integer, E> method) {


typo: getOrRmove -> getOrRemove

adoroszlai · 2026-01-14T17:56:03Z

hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/pipeline/TestSortedList.java

+    for (Element e : ordering) {
+      count++;
+      assertTrue(e.weight >= min);
+      min = e.weight;
+      assertTrue(contains.contains(e));
+    }


We can assert that both lists have e at the same position instead of just using contains(), but then value needs to sort numerically (e.g. by zero-padding the id):

diff --git hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/pipeline/TestSortedList.java hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/pipeline/TestSortedList.java
index 724e6facbb..4c49515f2f 100644
--- hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/pipeline/TestSortedList.java
+++ hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/pipeline/TestSortedList.java
@@ -17,13 +17,16 @@
 package org.apache.hadoop.hdds.scm.pipeline;
 
+import static org.assertj.core.api.Assertions.assertThat;
 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertSame;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
+import java.util.Iterator;
 import java.util.List;
 import java.util.Random;
 import org.junit.jupiter.api.Test;
@@ -37,7 +40,7 @@ public class TestSortedList {
   static class Element implements Comparable<Element> {
     private final int weight;
-    private final String value = "e" + ++id;
+    private final String value = "e" + String.format("%04d", ++id);
 
     Element(int weight) {
       this.weight = weight;
@@ -149,11 +152,14 @@ static void assertOrdering(List<Element> ordering, List<Element> contains) {
     int min = -1;
     int count = 0;
+    final Iterator<Element> actual = contains.iterator();
     for (Element e : ordering) {
-      count++;
-      assertTrue(e.weight >= min);
+      assertThat(e.weight).isGreaterThanOrEqualTo(min);
       min = e.weight;
-      assertTrue(contains.contains(e));
+      assertThat(contains).contains(e);
+      assertTrue(actual.hasNext());
+      assertSame(e, actual.next(), "[" + count + "]");
+      count++;
     }
     assertEquals(size, count, () -> ordering.getClass().getSimpleName() + " " + ordering);
   }
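The reason for the `%04d` padding in the suggestion above can be seen in isolation: strings compare character by character, so unpadded ids such as `e10` sort before `e2`. The sketch below is a standalone illustration; the class `PaddedIdSketch` and method `sortedIds` are hypothetical and are not part of TestSortedList.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not part of TestSortedList. */
final class PaddedIdSketch {
  private PaddedIdSketch() { }

  /** Builds the ids for 1..n, optionally zero-padded, and returns them in String order. */
  static List<String> sortedIds(int n, boolean padded) {
    final List<String> ids = new ArrayList<>();
    for (int id = n; id >= 1; id--) {
      // Same padding format as in the suggested diff: four digits with leading zeros.
      ids.add(padded ? "e" + String.format("%04d", id) : "e" + id);
    }
    // Natural String ordering, i.e. lexicographic comparison.
    Collections.sort(ids);
    return ids;
  }
}
```

With `n = 11`, the unpadded ids come out as `e1, e10, e11, e2, ...`, while the padded ids come out in numeric order from `e0001` to `e0011`. The padding keeps String order and numeric order in agreement for ids up to 9999.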

szetszwo · 2026-01-14T19:00:04Z

@adoroszlai , thanks for reviewing this! I just pushed a change to address your comments.

adoroszlai · 2026-01-14T20:15:32Z

Thanks @szetszwo for updating the patch.

szetszwo · 2026-01-14T20:37:43Z

@adoroszlai , thanks for reviewing this!

HDDS-3466. Improve pipeline creation performance.

36c285b

siddhantsangwan self-requested a review January 14, 2026 08:46

szetszwo requested a review from adoroszlai January 14, 2026 16:39

adoroszlai reviewed Jan 14, 2026

Address review comments

3e8353a

adoroszlai approved these changes Jan 14, 2026

szetszwo merged commit 104261c into apache:master Jan 14, 2026
44 checks passed
