GH-37906: [Integration][C#] Implement C Data Interface integration testing for C# #37904

Merged: 5 commits, Oct 2, 2023

@pitrou pitrou commented Sep 27, 2023

Rationale for this change

All Arrow implementations supporting the C Data Interface should also implement integration testing, to ensure proper interoperability.

What changes are included in this PR?

  • Implement C Data Interface integration testing for C#
  • Fix bugs in the C Data Interface implementation for C#

Are these changes tested?

Yes, by construction.

Are there any user-facing changes?

No.

@github-actions

This comment was marked as resolved.

@pitrou pitrou changed the title from "[Integration][C#] Add C Data Interface integration testing for C#" to "GH-37906: [Integration][C#] Add C Data Interface integration testing for C#" Sep 27, 2023
@pitrou pitrou changed the title from "GH-37906: [Integration][C#] Add C Data Interface integration testing for C#" to "GH-37906: [Integration][C#] Implement C Data Interface integration testing for C#" Sep 27, 2023
@pitrou pitrou marked this pull request as ready for review September 27, 2023 14:10
@pitrou
Member Author

pitrou commented Sep 27, 2023

}

public class JsonSchema
{
    public List<JsonField> Fields { get; set; }
    public JsonMetadata Metadata { get; set; }

    /// <summary>
Member Author

@pitrou pitrou Sep 27, 2023

Note that all the code below is moved from IntegrationCommand.cs in order for it to be available to the integration testing harness.

@github-actions github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label Sep 27, 2023
Comment on lines 73 to 76
GC.Collect();
// XXX this doesn't seem to give stable and reliable measurements
var gcInfo = GC.GetGCMemoryInfo();
return gcInfo.PromotedBytes;
Member Author

I wasn't able to find a formula that produced reliable numbers. The numbers inevitably fluctuate (going up and down). I don't know whether the .NET GC only gives us approximate numbers, or whether some internal caches are interfering.

Contributor

Are we sure that all the tests are using the TestMemoryAllocator (which uses the managed heap)? I suspect some arrays may be allocated using the NativeMemoryAllocator, which does not use the managed heap.

Whether or not it's strictly required, I've developed a habit of doing something like:

for (int i = 0; i < 3; i++) {
    GC.Collect();
    GC.WaitForPendingFinalizers();
}

and this seems to sometimes make a difference.

Contributor

I also don't know what kind of effect pythonnet would have on this. There may be Python objects waiting for a Python GC that are keeping .NET objects alive.

Member Author

> Are we sure that all the tests are using the TestMemoryAllocator (which uses the managed heap)? I suspect some arrays may be allocated using the NativeMemoryAllocator, which does not use the managed heap.

Hmm, I'm still discovering the C# codebase, but this is what I found:

  1. the JSON reader in Apache.Arrow.IntegrationTest just uses the default memory allocator when calling ArrowBuffer.Builder.Build()
  2. the TestMemoryAllocator doesn't implement IOwnableAllocation, which is required by CArrowArrayExporter for all exported buffers

In any case, the problem I mention in the code comment above (unstable measurements) can be observed even on the Schema tests, which shouldn't invoke an external allocator.

> I also don't know what kind of effect pythonnet would have on this. There may be Python objects waiting for a Python GC that are keeping .NET objects alive.

That would seem surprising, but I can try a gc.collect() on the Python side as well...
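
For illustration only, the Python-side collection mentioned above could be sketched as follows. This is not code from the PR: `collect_fully` and `python_heap_growth` are hypothetical helper names, and `tracemalloc` only sees allocations made by the Python interpreter, so it says nothing about .NET managed memory or native buffers.

```python
import gc
import tracemalloc


def collect_fully(max_rounds=3):
    """Run the cyclic GC up to max_rounds times.

    Returns the total number of unreachable objects found, stopping
    early once a round finds nothing more to collect.
    """
    total = 0
    for _ in range(max_rounds):
        found = gc.collect()
        total += found
        if found == 0:
            break
    return total


def python_heap_growth(func, max_rounds=3):
    """Return how many traced Python bytes remain allocated after func().

    Hypothetical helper: collects before and after the call so that
    garbage awaiting collection is not counted as growth.
    """
    tracemalloc.start()
    try:
        collect_fully(max_rounds)
        before, _peak = tracemalloc.get_traced_memory()
        func()
        collect_fully(max_rounds)
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

A call such as `python_heap_growth(run_one_case)`, where `run_one_case` is a placeholder for whatever is being checked, would report growth on the Python side only; whether that helps with the .NET numbers discussed here is an open question.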

Member Author

Okay, unfortunately none of your suggestions work.

Contributor

My knowledge of the GC is less like science and more like voodoo. I'm also not familiar with this API. That said, PromotedBytes seems like it's probably not the right choice here and I would expect HeapSizeBytes to be the value that's actually wanted.

EDIT: Or for the total managed heap, GC.GetTotalMemory()

Member Author

Ah, but I also tried GC.GetTotalMemory(), HeapSizeBytes and also HeapSizeBytes - FragmentedBytes. None gave stable values.

@pitrou pitrou requested review from eerhardt and removed request for kou, raulcd and assignUser September 27, 2023 14:42
Member

@wjones127 wjones127 left a comment

These changes look good to me.

But looking at the test logs, I'm unsure where we are setting these tests to skip: https://github.com/apache/arrow/actions/runs/6327155294/job/17182356143?pr=37904#step:8:14519

I would have thought some of them would run.

@github-actions github-actions bot added the "awaiting merge" label and removed the "awaiting committer review" label Sep 27, 2023
@pitrou
Member Author

pitrou commented Sep 27, 2023

> But looking at the test logs, I'm unsure where we are setting these tests to skip: https://github.com/apache/arrow/actions/runs/6327155294/job/17182356143?pr=37904#step:8:14519

These are existing skips that also propagate to the C Data Interface testing:

file_objs = [
    generate_primitive_case([], name='primitive_no_batches'),
    generate_primitive_case([17, 20], name='primitive'),
    generate_primitive_case([0, 0, 0], name='primitive_zerolength'),
    generate_primitive_large_offsets_case([17, 20])
    .skip_tester('C#')
    .skip_tester('JS'),
    generate_null_case([10, 0]),
    generate_null_trivial_case([0, 0]),
    generate_decimal128_case(),
    generate_decimal256_case()
    .skip_tester('JS'),
    generate_datetime_case(),
    generate_duration_case()
    .skip_tester('C#')
    .skip_tester('JS'),  # TODO(ARROW-5239): Intervals + JS
    generate_interval_case()
    .skip_tester('C#')
    .skip_tester('JS'),  # TODO(ARROW-5239): Intervals + JS
    generate_month_day_nano_interval_case()
    .skip_tester('C#')
    .skip_tester('JS'),
    generate_map_case()
    .skip_tester('C#'),
    generate_non_canonical_map_case()
    .skip_tester('C#')
    .skip_tester('Java')  # TODO(ARROW-8715)
    # Canonical map names are restored on import, so the schemas are unequal
    .skip_format(SKIP_C_SCHEMA, 'C++'),
    generate_nested_case(),
    generate_recursive_nested_case(),
    generate_nested_large_offsets_case()
    .skip_tester('C#')
    .skip_tester('JS'),
    generate_unions_case(),
    generate_custom_metadata_case()
    .skip_tester('C#'),
    generate_duplicate_fieldnames_case()
    .skip_tester('C#')
    .skip_tester('JS'),
    generate_dictionary_case()
    .skip_tester('C#'),
    generate_dictionary_unsigned_case()
    .skip_tester('C#')
    .skip_tester('Java'),  # TODO(ARROW-9377)
    generate_nested_dictionary_case()
    .skip_tester('C#')
    .skip_tester('Java'),  # TODO(ARROW-7779)
    generate_run_end_encoded_case()
    .skip_tester('C#')
    .skip_tester('Java')
    .skip_tester('JS')
    .skip_tester('Rust'),
    generate_binary_view_case()
    .skip_tester('C++')
    .skip_tester('C#')
    .skip_tester('Go')
    .skip_tester('Java')
    .skip_tester('JS')
    .skip_tester('Rust'),
    generate_extension_case()
    .skip_tester('C#')
    # TODO: ensure the extension is registered in the C++ entrypoint
    .skip_format(SKIP_C_SCHEMA, 'C++')
    .skip_format(SKIP_C_ARRAY, 'C++'),
]
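
To make the propagation concrete, here is a simplified model of how chained skip declarations could feed a single skip decision that applies to every format, the C Data Interface included. It is a sketch, not the harness's actual code: the `Case` class, its `should_skip` method, and the string values given to `SKIP_C_SCHEMA` and `SKIP_C_ARRAY` are assumptions made for illustration; only the `skip_tester` / `skip_format` call shapes come from the list above.

```python
# Hypothetical stand-ins for the format constants used in the list above;
# the real values in the harness may differ.
SKIP_C_SCHEMA = "c_schema"
SKIP_C_ARRAY = "c_array"


class Case:
    """Simplified model of a generated test case with skip rules."""

    def __init__(self, name):
        self.name = name
        self.skipped_testers = set()   # testers skipped for every format
        self.skipped_formats = {}      # format -> testers skipped for it

    def skip_tester(self, tester):
        # Returning self is what allows the chained call style.
        self.skipped_testers.add(tester)
        return self

    def skip_format(self, fmt, tester):
        self.skipped_formats.setdefault(fmt, set()).add(tester)
        return self

    def should_skip(self, tester, fmt):
        # A tester-wide skip applies regardless of the format under test.
        return (tester in self.skipped_testers
                or tester in self.skipped_formats.get(fmt, set()))


duration = Case("duration").skip_tester("C#").skip_tester("JS")
extension = (Case("extension")
             .skip_tester("C#")
             .skip_format(SKIP_C_SCHEMA, "C++")
             .skip_format(SKIP_C_ARRAY, "C++"))
```

Under this model, a case marked `.skip_tester('C#')` is skipped for C# whatever the format, which is consistent with the existing skips also applying to the C Data Interface tests, while `.skip_format(...)` narrows a skip to one format for one tester.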

@CurtHagenlocher
Contributor

Neat! I'd actually started looking at implementing this yesterday and didn't see that there was a route through Python to test in this fashion, so I'd started working on a separate project to use AOT compilation and generate a native DLL for this purpose.

@pitrou
Member Author

pitrou commented Oct 2, 2023

@eerhardt Would you like to review this PR soon? Otherwise I'm inclined to merge, as it has already been approved by @CurtHagenlocher and @wjones127.

@pitrou pitrou merged commit 7667b81 into apache:main Oct 2, 2023
15 checks passed
@pitrou pitrou removed the "awaiting merge" label Oct 2, 2023
@pitrou pitrou deleted the csharp-cdata-integration branch October 2, 2023 16:08
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 7667b81.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
…ion testing for C# (apache#37904)

### Rationale for this change

All Arrow implementations supporting the C Data Interface should also implement integration testing, to ensure proper interoperability.

### What changes are included in this PR?

* Implement C Data Interface integration testing for C#
* Fix bugs in the C Data Interface implementation for C#

### Are these changes tested?

Yes, by construction.

### Are there any user-facing changes?

No.
* Closes: apache#37906

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024