GH-49104: [C++] Fix Segfault in SparseCSFIndex::Equals with mismatched dimensions #49105

AliRana30 · 2026-01-31T17:21:23Z

Rationale for This Change

The SparseCSFIndex::Equals method can crash when comparing two sparse indices that have a different number of dimensions. The method iterates over the indices() and indptr() vectors of the current object and accesses the corresponding elements in the other object without first verifying that both objects have matching vector sizes. This can lead to out-of-bounds access and a segmentation fault when the dimension counts differ.
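The failure mode can be illustrated without Arrow. The sketch below is a simplified stand-in, not the Arrow source: `FakeTensor`, `TensorVec`, `UnguardedEquals`, `MakeLevels`, and `ThrowsOnMismatch` are hypothetical names. It uses `.at()` in place of `operator[]` so that the out-of-range index surfaces as a `std::out_of_range` exception rather than undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for one per-dimension index tensor (not an Arrow type).
struct FakeTensor {
  std::vector<int> values;
  bool Equals(const FakeTensor& other) const { return values == other.values; }
};

using TensorVec = std::vector<std::shared_ptr<FakeTensor>>;

// Same shape as the unguarded loop: the bound comes from `lhs`, while `rhs`
// is indexed with the same counter. `.at()` throws on an out-of-range index.
bool UnguardedEquals(const TensorVec& lhs, const TensorVec& rhs) {
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    const std::shared_ptr<FakeTensor>& r = rhs.at(i);
    assert(lhs[i] && r);  // both entries must be non-null
    if (!lhs[i]->Equals(*r)) return false;
  }
  return true;
}

// Builds `n` identical one-element tensors.
TensorVec MakeLevels(std::size_t n) {
  TensorVec out;
  for (std::size_t i = 0; i < n; ++i) {
    auto t = std::make_shared<FakeTensor>();
    t->values = {1};
    out.push_back(t);
  }
  return out;
}

// True when comparing 3 levels against 2 indexes past the shorter vector.
bool ThrowsOnMismatch() {
  try {
    UnguardedEquals(MakeLevels(3), MakeLevels(2));
  } catch (const std::out_of_range&) {
    return true;
  }
  return false;
}
```

With these stand-ins, `ThrowsOnMismatch()` returns true. Swapping the operands keeps the loop in bounds, so in this sketch the out-of-range access only occurs when the longer vector is on the left-hand side.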

What Changes Are Included in This PR?

This change adds explicit size equality checks for the indices() and indptr() vectors at the beginning of the SparseCSFIndex::Equals method. If the dimensions do not match, the method now safely returns false instead of attempting invalid memory access.
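A minimal sketch of the guarded comparison, assuming simplified stand-in types rather than the real Arrow classes: `FakeTensor`, `FakeCSFIndex`, and `MakeIndex` are hypothetical, and the choice of `ndim` index levels with `ndim - 1` indptr levels is an assumption made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for one per-dimension tensor (not an Arrow type).
struct FakeTensor {
  std::vector<int> values;
  bool Equals(const FakeTensor& other) const { return values == other.values; }
};

// Hypothetical stand-in for SparseCSFIndex, keeping only the three members
// that the comparison reads.
struct FakeCSFIndex {
  std::vector<std::shared_ptr<FakeTensor>> indptr;
  std::vector<std::shared_ptr<FakeTensor>> indices;
  std::vector<std::int64_t> axis_order;

  bool Equals(const FakeCSFIndex& other) const {
    // Guard first: after these checks every index used below is valid for
    // both objects.
    if (indices.size() != other.indices.size()) return false;
    if (indptr.size() != other.indptr.size()) return false;
    for (std::size_t i = 0; i < indices.size(); ++i) {
      assert(indices[i] && other.indices[i]);
      if (!indices[i]->Equals(*other.indices[i])) return false;
    }
    for (std::size_t i = 0; i < indptr.size(); ++i) {
      assert(indptr[i] && other.indptr[i]);
      if (!indptr[i]->Equals(*other.indptr[i])) return false;
    }
    return axis_order == other.axis_order;
  }
};

// Builds an index with `ndim` index levels and `ndim - 1` indptr levels
// (an assumption of this sketch); requires ndim >= 1.
FakeCSFIndex MakeIndex(std::size_t ndim) {
  assert(ndim >= 1);
  FakeCSFIndex idx;
  for (std::size_t i = 0; i < ndim; ++i) {
    auto t = std::make_shared<FakeTensor>();
    t->values = {1};
    idx.indices.push_back(t);
    if (i + 1 < ndim) idx.indptr.push_back(std::make_shared<FakeTensor>());
    idx.axis_order.push_back(static_cast<std::int64_t>(i));
  }
  return idx;
}
```

In this sketch, comparing `MakeIndex(3)` with `MakeIndex(2)` returns false in either operand order, and no element is dereferenced before the sizes are known to match.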

Are These Changes Tested?

Yes. The fix has been validated through targeted reproduction of the crash scenario using mismatched dimension counts, ensuring the method behaves safely and deterministically.

Are There Any User-Facing Changes?

No. This change improves internal safety and robustness without altering public APIs or observable user behavior.

GitHub Issue: [C++] Segfault in SparseCSFIndex::Equals with mismatched dimensions #49104

github-actions · 2026-01-31T17:21:44Z

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

github-actions · 2026-01-31T19:26:05Z

⚠️ GitHub issue #49104 has been automatically assigned in GitHub to PR creator.

kou · 2026-02-01T12:39:46Z

Could you add a test for this case?

AliRana30 · 2026-02-01T16:22:07Z

@kou As requested, I have added a new test case, TestEqualityMismatchedDimensions, in cpp/src/arrow/sparse_tensor_test.cc.

Test details:
This test compares SparseCSFIndex objects with mismatched dimensions (for example, 1D vs 2D) to verify that Equals now safely returns false instead of causing a segfault.

Fix summary:
The fix adds explicit checks for the sizes of indices, indptr, and axis_order before attempting to iterate over them, ensuring safe comparison when dimensions do not match.

kou · 2026-02-02T13:33:36Z

Could you fix the lint failure?

AliRana30 · 2026-02-02T14:36:39Z

I have fixed the lint failures by reformatting SparseCSFIndex::Equals() to comply with the 90-character line limit and Arrow's style guide. All functionality remains unchanged.

You can take a look at it :)

… 1D)

The TEST(TestSparseCSFIndex, EqualsMismatchedDimensions) test created SparseCSFIndex objects with empty tensors (nullptr buffers, 0-length shape), causing segfaults during validation on ASAN/UBSAN and 'front() called on empty vector' errors on MSVC. The typed test TestEqualityMismatchedDimensions already properly validates the fix with valid CSF index structures.

AliRana30 · 2026-02-02T16:12:42Z

Note: Some packaging/JNI tests are failing due to Docker image naming on my fork. The core C++ tests should be passing.
Some builds are also failing due to Google Benchmark deprecation warnings in benchmark_util.h, which this PR does not modify. The core fix and its tests are complete.
I don't think these failures are an issue.

AliRana30 · 2026-02-03T15:22:15Z

@kou, could you have a look at this?

raulcd · 2026-02-03T15:29:59Z

cpp/src/arrow/sparse_tensor.cc

  for (int64_t i = 0; i < static_cast<int64_t>(indices().size()); ++i) {
-    if (!indices()[i]->Equals(*other.indices()[i])) return false;
+    if (!indices()[i]->Equals(*other.indices()[i])) {
+      return false;
+    }
  }
  for (int64_t i = 0; i < static_cast<int64_t>(indptr().size()); ++i) {
-    if (!indptr()[i]->Equals(*other.indptr()[i])) return false;
+    if (!indptr()[i]->Equals(*other.indptr()[i])) {
+      return false;
+    }


Why is this being changed?

Please revert this to make the PR more readable.

raulcd · 2026-02-03T15:30:33Z

cpp/src/arrow/sparse_tensor.cc

+  if (axis_order().size() != other.axis_order().size()) {
+    return false;
+  }


Could you explain why this is required?

AliRana30 · 2026-02-03T15:53:06Z

@raulcd This change fixes a segmentation fault that occurs when comparing SparseCSFIndex objects with mismatched dimensions.

Bug description:
The original implementation calls indices()[i]->Equals(...) without first verifying that both objects have the same number of dimensions. When comparing, for example, a 2D index with a 3D index:

indices().size() returns different values (e.g., 2 vs 3)
The loop iterates using the size of the first object
Accessing other.indices()[i] with mismatched sizes results in out-of-bounds access, leading to a segmentation fault

Fix:
Lines 408–416 add early size checks before any iteration:

if (indices().size() != other.indices().size()) return false;
if (indptr().size() != other.indptr().size()) return false;
if (axis_order().size() != other.axis_order().size()) return false;

These checks ensure that Equals safely returns false when dimensions do not match, preventing access to invalid memory.

Test coverage:
Added TestEqualityMismatchedDimensions (lines 1644–1661), which reproduces the original issue. Without this fix, the test would crash due to the segmentation fault.

raulcd · 2026-02-03T16:50:45Z

Hi @Alirana2829, thanks for the comment.
I wasn't asking for a summary of the overall change; I do understand what we are trying to solve with the PR. I am talking about the specific line changes I pointed out in the review comments. Those two specific changes do not seem to be required.

AliRana30 · 2026-02-03T19:09:29Z

@raulcd You're absolutely right - the axis_order size check was redundant. Since the final return axis_order() == other.axis_order() already uses vector equality (which checks size internally), that check was unnecessary.

I've removed it. The PR now only contains the essential size checks for indices() and indptr() that prevent the segfault from out-of-bounds access.
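The reasoning about `axis_order()` can be checked in isolation. A small sketch, assuming the axis order is held in a `std::vector<int64_t>`; the alias `AxisOrder` and the helpers `SameAxisOrder` and `CheckAxisOrderComparisons` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using AxisOrder = std::vector<std::int64_t>;

// std::vector's operator== is true only when both vectors have the same size
// and equal elements at every position, so no separate size check is needed.
bool SameAxisOrder(const AxisOrder& a, const AxisOrder& b) { return a == b; }

// Exercises the cases relevant to the review discussion.
void CheckAxisOrderComparisons() {
  assert(!SameAxisOrder({0, 1}, {0, 1, 2}));  // different length
  assert(!SameAxisOrder({0, 1}, {1, 0}));     // same length, different order
  assert(SameAxisOrder({0, 1, 2}, {0, 1, 2}));
}
```

Unlike indexing one vector with a bound taken from the other, the vector comparison never reads past the end of either operand.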

AliRana30 · 2026-02-03T19:10:03Z

@rok I've reverted the formatting changes. The loop bodies are back to single-line format.

The axis_order().size() check was unnecessary because vector equality operator already compares sizes. Keeping only the essential checks for indices() and indptr() that prevent segfault from out-of-bounds access.

rok

This looks ok, I would just add the one test.

rok · 2026-02-03T20:06:46Z

cpp/src/arrow/sparse_tensor_test.cc

+  ASSERT_FALSE(si_3D->Equals(*si_2D));
+}


Suggested change

-  ASSERT_FALSE(si_3D->Equals(*si_2D));
-}
+  ASSERT_FALSE(si_3D->Equals(*si_2D));
+  ASSERT_TRUE(si_2D->Equals(*si_2D));
+}

rok · 2026-02-03T20:25:25Z

cpp/src/arrow/sparse_tensor.cc

 bool SparseCSFIndex::Equals(const SparseCSFIndex& other) const {
+  if (indices().size() != other.indices().size()) {
+    return false;
+  }
+  if (indptr().size() != other.indptr().size()) {
+    return false;
+  }
+
  for (int64_t i = 0; i < static_cast<int64_t>(indices().size()); ++i) {
    if (!indices()[i]->Equals(*other.indices()[i])) return false;
  }
  for (int64_t i = 0; i < static_cast<int64_t>(indptr().size()); ++i) {
    if (!indptr()[i]->Equals(*other.indptr()[i])) return false;
  }
  return axis_order() == other.axis_order();
 }


How about just replacing this function?

Suggested change

+bool SparseCSFIndex::Equals(const SparseCSFIndex& other) const {
+  auto eq = [](const auto& a, const auto& b) { return a->Equals(*b); };
+  return axis_order() == other.axis_order()
+      && std::ranges::equal(indices(), other.indices(), eq)
+      && std::ranges::equal(indptr(), other.indptr(), eq);
+}
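`std::ranges::equal` is a C++20 facility declared in `<algorithm>`, and it returns false when the two ranges differ in length, which is why the suggestion needs no explicit size checks. The thread does not say whether the build allows C++20. As a hedged alternative, the sketch below swaps in the four-iterator `std::equal` overload from C++14, which has the same length behavior. `FakeTensor`, `TensorVec`, `AllLevelsEqual`, and `MakeLevels` are hypothetical stand-ins, not Arrow names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for one per-dimension tensor (not an Arrow type).
struct FakeTensor {
  std::vector<int> values;
  bool Equals(const FakeTensor& other) const { return values == other.values; }
};

using TensorVec = std::vector<std::shared_ptr<FakeTensor>>;

// Compares two vectors of tensor pointers element by element.
bool AllLevelsEqual(const TensorVec& a, const TensorVec& b) {
  auto eq = [](const std::shared_ptr<FakeTensor>& x,
               const std::shared_ptr<FakeTensor>& y) {
    assert(x && y);  // entries must be non-null
    return x->Equals(*y);
  };
  // The four-iterator overload returns false when the lengths differ, so the
  // predicate never sees an element that exists in only one vector.
  return std::equal(a.begin(), a.end(), b.begin(), b.end(), eq);
}

// Builds `n` one-element tensors that all hold `value`.
TensorVec MakeLevels(std::size_t n, int value) {
  TensorVec out;
  for (std::size_t i = 0; i < n; ++i) {
    auto t = std::make_shared<FakeTensor>();
    t->values = {value};
    out.push_back(t);
  }
  return out;
}
```

Either form folds the length check into the comparison itself, so it cannot be forgotten if another vector member is added later.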

github-actions bot added the Component: C++ and awaiting review labels Jan 31, 2026

AliRana30 changed the title ~~[C++] Fix Segfault in SparseCSFIndex::Equals with mismatched dimensions~~ GH-49104: [C++] Fix Segfault in SparseCSFIndex::Equals with mismatched dimensions Jan 31, 2026

AliRana30 requested review from assignUser, jonkeane, kou and raulcd as code owners February 1, 2026 09:25

AliRana30 force-pushed the fix-sparsecsfindex-equals-segfault branch from d60ea08 to 3e1cbd6 Compare February 1, 2026 13:50

Alirana2829 added 2 commits February 1, 2026 18:54

[C++] Fix Segfault in SparseCSFIndex::Equals with mismatched dimensions

daa5f73

[C++] Add regression test for SparseCSFIndex::Equals segfault

3e1cbd6

Fix Segfault in SparseCSFIndex::Equals with mismatched dimensions and…

fdf506e

… add test

Fix lint: Format SparseCSFIndex::Equals to comply with 90-char line l…

c87afdb

…imit and style guide

Alirana2829 added 2 commits February 2, 2026 20:35

Fix test: Use valid CSF index structures (2D vs 3D instead of invalid…

8dbc21b

… 1D)

raulcd reviewed Feb 3, 2026

github-actions bot added the awaiting changes label and removed the awaiting review label Feb 3, 2026

github-actions bot added the awaiting change review label and removed the awaiting changes label Feb 3, 2026

Revert formatting changes per maintainer feedback

76fe338

Keep only essential size checks. Maintainers requested reverting formatting changes to reduce diff noise and improve readability.

Remove redundant axis_order size check per review feedback

a6e5088

The axis_order().size() check was unnecessary because vector equality operator already compares sizes. Keeping only the essential checks for indices() and indptr() that prevent segfault from out-of-bounds access.

AliRana30 requested review from raulcd and rok February 3, 2026 19:25

rok requested changes Feb 3, 2026

rok reviewed Feb 3, 2026

