Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] ArraySpan::FillFromScalar is unsafe #35581

Closed
bkietz opened this issue May 13, 2023 · 6 comments · Fixed by #36018
Closed

[C++] ArraySpan::FillFromScalar is unsafe #35581

bkietz opened this issue May 13, 2023 · 6 comments · Fixed by #36018

Comments

@bkietz
Copy link
Member

bkietz commented May 13, 2023

Describe the bug, including details regarding any error messages, version, and platform.

ArraySpan::FillFromScalar can store pointers into the structure itself (specifically, into ArraySpan::scratch_space) which produces an ArraySpan which easily be unsafely moved and copied:

ArraySpan span;
span.FillFromScalar(scalar);

UseSpan(span); // Fine; the original span is still alive.

return span; // Undefined Behavior; the returned copy views
             // a stack variable whose lifetime has ended.

The capability to view a Scalar as an ArraySpan can be preserved and made safer by restricting access to the span to an explicitly delineated scope:

ArraySpan::FillFromScalar(scalar, [&](ArraySpan span) {
  UseSpan(span); // Fine; the viewed scratch space is alive inside this scope.
});

This has the pleasant side effect of reducing the size of ArraySpan by 16 bytes.

Component(s)

C++

@felipecrv
Copy link
Contributor

ArraySpan span;
span.FillFromScalar(scalar);

UseSpan(span); // Fine; the original span is still alive.

return span; // Undefined Behavior; the returned copy views
             // a stack variable whose lifetime has ended.

Due to RVO, the returned ArraySpan is constructed at the caller's stack and this is not biting us, I guess. I think this is a matter of always using destination-passing style when dealing with ArraySpan and never copying or moving it.

Copy and Move constructors haven't been = delete;d in the class, I wonder if they could be still. I think we are allowed to do so as it's laveled EXPERIMENTAL.

felipecrv added a commit to felipecrv/arrow that referenced this issue May 18, 2023
Copying ArraySpan is not without issues (see apache#35581), and this change
seems good by itself.
@bkietz
Copy link
Member Author

bkietz commented May 18, 2023

This is not biting us currently because AFAICT the only place we use FillFromScalar is

void PromoteExecSpanScalars(ExecSpan* span) {
// In the "all scalar" case, we "promote" the scalars to ArraySpans of
// length 1, since the kernel implementations do not handle the all
// scalar case
for (int i = 0; i < span->num_values(); ++i) {
ExecValue* value = &span->values[i];
if (value->is_scalar()) {
value->array.FillFromScalar(*value->scalar);
value->scalar = nullptr;
}
}
}
where they are in a std::vector from which they are never moved. This is subtle too: as long as the original ArraySpan remains valid copies will be valid too, so

ArraySpan Thing::get(int i) { return vector_of_spans_[i]; }

Might not segfault if the Thing happens to be kept alive.

@felipecrv
Copy link
Contributor

Scary. And as I thought more about my proposal of deleting the copy ctor of ArraySpan, I realized that completely defeats the point of having a non-owning span type.

If we really care about small buffers being cheap, we should instead consider changing the Buffer class to have an optimization similar to the small-string optimization that std::string does. That can be made safe by correct implementation of the move/copy constructors of Buffer

felipecrv added a commit to felipecrv/arrow that referenced this issue May 23, 2023
Copying ArraySpan is not without issues (see apache#35581), and this change
seems good by itself.
@pitrou
Copy link
Member

pitrou commented May 24, 2023

Given that ArraySpan is non-owning, isn't it part of the API contract that you have to make sure the backing object does not fall out of scope?

We should probably make this clearer in the docstrings, though.

@bkietz
Copy link
Member Author

bkietz commented May 25, 2023

It's more serious than that since the ArraySpan does own the scratch space, which means that copies are viewing data in a scalar and also data in the original span (which may not exist anymore)

pitrou pushed a commit that referenced this issue May 30, 2023
### Rationale for this change

Copying ArraySpan is not without issues (see #35581), and this change seems good by itself as it makes it easier for the compiler to SROA [1] the whole `RunEndEncodedArraySpan` class.

[1] https://www.llvm.org/docs/Passes.html#sroa-scalar-replacement-of-aggregates

### What changes are included in this PR?
 
 - Make `array_span` private in `ree_util::RunEndEncodedArraySpan`
 - Allow construction of  `ree_util::RunEndEncodedArraySpan` with separate `offset` and `length`
 - Change `RunEndEncodedBuilder` to avoid a copy of the `ArraySpan` being iterated over
 - Prevent instantiation based on implicit conversion from `ArrayData` to `ArraySpan`
 
### Are these changes tested?

Yes, by the existing unit tests of the various uses of `RunEndEncodedArraySpan`.

### Are there any user-facing changes?

No meaningful changes other than some small tweaks in the interface of a recently added class.

* Closes: #35675

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@bkietz bkietz self-assigned this Jun 8, 2023
bkietz added a commit to bkietz/arrow that referenced this issue Jun 9, 2023
pitrou pushed a commit to bkietz/arrow that referenced this issue Jul 5, 2023
bkietz added a commit to bkietz/arrow that referenced this issue Jul 6, 2023
bkietz added a commit to bkietz/arrow that referenced this issue Jul 6, 2023
pitrou pushed a commit that referenced this issue Jul 11, 2023
ArraySpan contained scratch space inside itself for storing offsets when viewing a scalar as a length=1 array. This could lead to dangling pointers in copies of the ArraySpan since copies' pointers will always refer to the original's scratch space, which may have been destroyed.

This patch moves that scratch space into Scalar itself, which is more conformant to the contract of a view since the scratch space will be valid as long as the Scalar is alive, regardless of how ArraySpans are copied.
* Closes: #35581

Authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 13.0.0 milestone Jul 11, 2023
@raulcd
Copy link
Member

raulcd commented Jul 11, 2023

I am moving this to 14.0.0 as it is not marked as a blocker for the 13.0.0 release, let me know if it should be part of the release though.

@raulcd raulcd modified the milestones: 13.0.0, 14.0.0 Jul 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants