Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-35988: [C#] The C data interface implementation can leak on import #35996

Merged
merged 4 commits into from
Jun 14, 2023

Conversation

CurtHagenlocher
Copy link
Contributor

@CurtHagenlocher CurtHagenlocher commented Jun 8, 2023

What changes are included in this PR?

To ensure proper cleanup, immediately copies the contents of the C structure into the imported class for arrays and streams.
Relaxes the requirement when exporting that the target structure appear uninitialized.

Are these changes tested?

Existing tests pass. We don't as yet seem to have a good way to test for memory leaks so no new tests have been added.

Are there any user-facing changes?

No.

@github-actions
Copy link

github-actions bot commented Jun 8, 2023

⚠️ GitHub issue #35988 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Jun 8, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 8, 2023
@pitrou
Copy link
Member

pitrou commented Jun 8, 2023

Hmm, so there are two (unrelated?) changes here.

The first change (removing the cArray->release != null checks in export functions) looks fine.

The second change is IMHO the symptom of an implementation issue. ImportedArrowArray currently stores the given raw pointer CArrowArray* _cArray. Instead, it should probably store its own CArrowArray struct like the C++ importer does:

// A wrapper struct for an imported C ArrowArray.
// The ArrowArray is released on destruction.
struct ImportedArrayData {
struct ArrowArray array_;
ImportedArrayData() {
ArrowArrayMarkReleased(&array_); // Initially released
}
void Release() {
if (!ArrowArrayIsReleased(&array_)) {
ArrowArrayRelease(&array_);
DCHECK(ArrowArrayIsReleased(&array_));
}
}
~ImportedArrayData() { Release(); }
private:
ARROW_DISALLOW_COPY_AND_ASSIGN(ImportedArrayData);
};

Then, importing an array moves the user-supplied structure to the owned structure, like this:

import_ = std::make_shared<ImportedArrayData>();
c_struct_ = &import_->array_;
ArrowArrayMove(src, c_struct_);

Therefore, the imported array structure does not depend on the lifetime of the user-supplied pointer.

By the way, moving is cheap so you shouldn't fear any inefficiency here:

/// Move the C array from `src` to `dest`
///
/// Note `dest` must *not* point to a valid array already, otherwise there
/// will be a memory leak.
inline void ArrowArrayMove(struct ArrowArray* src, struct ArrowArray* dest) {
assert(dest != src);
assert(!ArrowArrayIsReleased(src));
memcpy(dest, src, sizeof(struct ArrowArray));
ArrowArrayMarkReleased(src);
}

@CurtHagenlocher
Copy link
Contributor Author

CurtHagenlocher commented Jun 8, 2023

The second change is IMHO the symptom of an implementation issue. ImportedArrowArray currently stores the given raw pointer CArrowArray* _cArray. Instead, it should probably store its own CArrowArray struct like the C++ importer does:

Ah, that's much nicer. You can tell I haven't been responsible for managing my own resources in a few years... .

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jun 8, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 8, 2023
@@ -50,10 +50,6 @@ public static unsafe void ExportType(IArrowType datatype, CArrowSchema* schema)
{
throw new ArgumentNullException(nameof(schema));
}
if (schema->release != null)
{
throw new ArgumentException("Cannot export schema to a struct that is already initialized.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, it looks like this change fixes an issue I ran into using a C# exported stream from an older version of C++ Arrow, where the schema struct used when importing the stream was uninitialized: https://github.com/apache/arrow/blob/apache-arrow-10.0.1/cpp/src/arrow/c/bridge.cc#L1782 (this has since been fixed)

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly the basic premise is that, on import, we let CArrowArray be in the garbage collector pool since it should be safe to destroy if we happen to lose all references to it (unlike an exported array which we assume will have external references)? That seems valid to me.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jun 13, 2023
@CurtHagenlocher
Copy link
Contributor Author

If I understand correctly the basic premise is that, on import, we let CArrowArray be in the garbage collector pool since it should be safe to destroy if we happen to lose all references to it (unlike an exported array which we assume will have external references)? That seems valid to me.

By embedding in the ImportedArrowArray, it's allocated as part of that object and will have a lifetime equal to it. This is exactly what we want, because it's the ImportedArrowArray which already manages the lifetime of the imported memory.

@pitrou
Copy link
Member

pitrou commented Jun 14, 2023

As far as I can understand, the code looks good now (but I'm not a C# developer).

Copy link
Contributor

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'll merge this soon, if I don't hear any objections.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Jun 14, 2023
@eerhardt eerhardt merged commit 1e2dfce into apache:main Jun 14, 2023
16 of 17 checks passed
@CurtHagenlocher CurtHagenlocher deleted the FixCSharpCApiLeaks branch June 15, 2023 14:19
@conbench-apache-arrow
Copy link

Conbench analyzed the 5 benchmark runs on commit 1e2dfceb.

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C#] The C data interface implementation can leak on import
5 participants