-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-49104: [C++] Fix Segfault in SparseCSFIndex::Equals with mismatched dimensions #49105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
GH-49104: [C++] Fix Segfault in SparseCSFIndex::Equals with mismatched dimensions #49105
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
|
|
Could you add a test for this case? |
d60ea08 to
3e1cbd6
Compare
|
@kou On your request, I have added a new test case, Test details: Fix summary: |
|
Could you fix the lint failure? |
…imit and style guide
|
I have fixed the lint failures by reformatting SparseCSFIndex::Equals() to comply with the 90-character line limit and Arrow's style guide. All functionality remains unchanged. You can have a check on it ): |
The TEST(TestSparseCSFIndex, EqualsMismatchedDimensions) test created SparseCSFIndex objects with empty tensors (nullptr buffers, 0-length shape), causing segfaults during validation on ASAN/UBSAN and 'front() called on empty vector' errors on MSVC. The typed test TestEqualityMismatchedDimensions already properly validates the fix with valid CSF index structures.
|
Note: Some packaging/JNI tests are failing due to Docker image naming with my fork. The core C++ tests should be passing. |
|
@kou can you have a look at this ?? |
cpp/src/arrow/sparse_tensor.cc
Outdated
| for (int64_t i = 0; i < static_cast<int64_t>(indices().size()); ++i) { | ||
| if (!indices()[i]->Equals(*other.indices()[i])) return false; | ||
| if (!indices()[i]->Equals(*other.indices()[i])) { | ||
| return false; | ||
| } | ||
| } | ||
| for (int64_t i = 0; i < static_cast<int64_t>(indptr().size()); ++i) { | ||
| if (!indptr()[i]->Equals(*other.indptr()[i])) return false; | ||
| if (!indptr()[i]->Equals(*other.indptr()[i])) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this being changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pease revert this to make the PR more readable.
cpp/src/arrow/sparse_tensor.cc
Outdated
| if (axis_order().size() != other.axis_order().size()) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why is this required?
|
@raulcd This change fixes a segmentation fault that occurs when comparing Bug description:
Fix: if (indices().size() != other.indices().size()) return false;
if (indptr().size() != other.indptr().size()) return false;
if (axis_order().size() != other.axis_order().size()) return false;These checks ensure that Equals safely returns false when dimensions do not match, preventing access to invalid memory. Test coverage: |
|
Hii @Alirana2829 thanks for the comment. |
Keep only essential size checks. Maintainers requested reverting formatting changes to reduce diff noise and improve readability.
|
@raulcd You're absolutely right - the axis_order size check was redundant. Since the final return axis_order() == other.axis_order() already uses vector equality (which checks size internally), that check was unnecessary. I've removed it. The PR now only contains the essential size checks for indices() and indptr() that prevent the segfault from out-of-bounds access. |
|
@rok I've reverted the formatting changes. The loop bodies are back to single-line format. |
The axis_order().size() check was unnecessary because vector equality operator already compares sizes. Keeping only the essential checks for indices() and indptr() that prevent segfault from out-of-bounds access.
rok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks ok, I would just add the one test.
| ASSERT_FALSE(si_3D->Equals(*si_2D)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ASSERT_FALSE(si_3D->Equals(*si_2D)); | |
| } | |
| ASSERT_FALSE(si_3D->Equals(*si_2D)); | |
| ASSERT_TRUE(si_2D->Equals(*si_2D)); | |
| } |
| bool SparseCSFIndex::Equals(const SparseCSFIndex& other) const { | ||
| if (indices().size() != other.indices().size()) { | ||
| return false; | ||
| } | ||
| if (indptr().size() != other.indptr().size()) { | ||
| return false; | ||
| } | ||
|
|
||
| for (int64_t i = 0; i < static_cast<int64_t>(indices().size()); ++i) { | ||
| if (!indices()[i]->Equals(*other.indices()[i])) return false; | ||
| } | ||
| for (int64_t i = 0; i < static_cast<int64_t>(indptr().size()); ++i) { | ||
| if (!indptr()[i]->Equals(*other.indptr()[i])) return false; | ||
| } | ||
| return axis_order() == other.axis_order(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just replacing this function?
| bool SparseCSFIndex::Equals(const SparseCSFIndex& other) const { | |
| auto eq = [](const auto& a, const auto& b) { return a->Equals(*b); }; | |
| return axis_order() == other.axis_order() | |
| && std::ranges::equal(indices(), other.indices(), eq) | |
| && std::ranges::equal(indptr(), other.indptr(), eq); | |
| } |
Rationale for This Change
The
SparseCSFIndex::Equalsmethod can crash when comparing two sparse indices that have a different number of dimensions. The method iterates over theindices()andindptr()vectors of the current object and accesses the corresponding elements in theotherobject without first verifying that both objects have matching vector sizes. This can lead to out-of-bounds access and a segmentation fault when the dimension counts differ.What Changes Are Included in This PR?
This change adds explicit size equality checks for the
indices()andindptr()vectors at the beginning of theSparseCSFIndex::Equalsmethod. If the dimensions do not match, the method now safely returnsfalseinstead of attempting invalid memory access.Are These Changes Tested?
Yes. The fix has been validated through targeted reproduction of the crash scenario using mismatched dimension counts, ensuring the method behaves safely and deterministically.
Are There Any User-Facing Changes?
No. This change improves internal safety and robustness without altering public APIs or observable user behavior.