Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C#][Integration] C# C Data interface does not release ArrowArray imports where a non-validity buffer is NULL #40898

Closed
paleolimbot opened this issue Mar 29, 2024 · 6 comments

Comments

@paleolimbot
Copy link
Member

paleolimbot commented Mar 29, 2024

Describe the bug, including details regarding any error messages, version, and platform.

In #39302, the integration tests where nanoarrow is producing an C# is consuming are reporting memory leaks. These leaks are based on allocations/frees reported by nanoarrow's special buffer allocator, which I instrumented to do some printing of the results. Those adventures are in a few gists ( https://gist.github.com/paleolimbot/257be29c926c3467b174fd83c3baf2b1 , https://gist.github.com/paleolimbot/8183f50a1c1425db4a744903ba3f905b ) as well as later in this thread, and the output that led me to this conclusion is below: basically, I instrumented the special allocator to print to stdout and did some scraping of the output. I noticed that no buffers are freed when csharp is responsible for releasing, but all the buffers are freed when nanoarrow is (and presumably other implementations too, since no memory leaks are reported for those).

This only happens for custom_metadata, extension, primitive_zerolength, recursive_nested, and union cases, and the patterns seems to be that something happens and then the release callback is just never called. This only happens when nanoarrow is exporting.

#>    exporting importing                 file batch within_batch_id         op
#> 1  nanoarrow        C# primitive_zerolength     0               1 reallocate
#> 2  nanoarrow        C# primitive_zerolength     0               2 reallocate
#> 3  nanoarrow        C# primitive_zerolength     0               3 reallocate
#> 4  nanoarrow        C# primitive_zerolength     0               4 reallocate
#> 5  nanoarrow nanoarrow primitive_zerolength     0               1 reallocate
#> 6  nanoarrow nanoarrow primitive_zerolength     0               2 reallocate
#> 7  nanoarrow nanoarrow primitive_zerolength     0               3 reallocate
#> 8  nanoarrow nanoarrow primitive_zerolength     0               4 reallocate
#> 9  nanoarrow nanoarrow primitive_zerolength     0               5 reallocate
#> 10 nanoarrow nanoarrow primitive_zerolength     0               6 reallocate
#> 11 nanoarrow nanoarrow primitive_zerolength     0               7 reallocate
#> 12 nanoarrow nanoarrow primitive_zerolength     0               8 reallocate
#> 13 nanoarrow nanoarrow primitive_zerolength     0              53       free
#> 14 nanoarrow nanoarrow primitive_zerolength     0              56       free
#> 15 nanoarrow nanoarrow primitive_zerolength     0              59       free
#> 16 nanoarrow nanoarrow primitive_zerolength     0              62       free
#> 17 nanoarrow nanoarrow primitive_zerolength     0             116       free
#> 18 nanoarrow nanoarrow primitive_zerolength     0             119       free
#> 19 nanoarrow nanoarrow primitive_zerolength     0             122       free
#> 20 nanoarrow nanoarrow primitive_zerolength     0             125       free

The commit that eliminated the leaks from nanoarrow's side was a bit of code that ensured that there were no NULL buffers that were exported (except validity buffers, which can be null).

Component(s)

C#, Integration

@CurtHagenlocher
Copy link
Contributor

It would not surprise me to find that there's logic like "if length > 0 then deallocate". I'm reasonably sure the C# implementation doesn't allocate any memory for a zero-length buffer and the interop code may have inherited this assumption. What's the easiest way to trigger the problem? Run the tests on the branch in your draft PR?

@paleolimbot
Copy link
Member Author

Got it! There are some more things I can check...I was just excited to get this far in my debugging 🙂

The reproducing right now is a little complicated since there are a few fixes in various PRs. From a checkout of https://github.com/paleolimbot/arrow/tree/archery-nanoarrow-integration :

git clone https://github.com/paleolimbot/arrow-nanoarrow.git nanoarrow
cd nanoarrow && git switch integration-fixes && cd ..

archery docker run \
  -e ARCHERY_INTEGRATION_WITH_NANOARROW=1 \
  -e ARROW_INTEGRATION_CPP=OFF \
  -e ARROW_INTEGRATION_JAVA=OFF \
  -e ARROW_INTEGRATION_JS=OFF \
  -e ARROW_INTEGRATION_GO=OFF \
  conda-integration

@paleolimbot
Copy link
Member Author

Hmm...upon inspecting the other files where there are leaks, I'm seeing that the pattern is more like: C# is failing to release anything once a very specific thing happens (but I am not sure what that very specific thing is and why it only happens when nanoarrow is exporting). I will try to set up a more minimal reproducer trying smaller parts of those files to see if I can narrow it down.

@paleolimbot paleolimbot changed the title [C#][Integration] C# C Data interface might not be releasing zero-length ArrowArray imports [C#][Integration] C# C Data interface might not be releasing some ArrowArray imports Mar 30, 2024
@paleolimbot
Copy link
Member Author

I managed to get a more minimal reproducer with a local build; however, I'm still working on how to step through that in a debugger to see where exactly things are going awry:

From the root arrow checkout of https://github.com/paleolimbot/arrow
plus git switch archery-nanoarrow-integration:

Make sure archery is installed:

pip install -e "dev/archery[all]"

From the csharp/ subdirectory run

dotnet build

From the arrow checkout, clone + build nanoarrow where the default Archery
options look for the integration test binary.

git clone https://github.com/paleolimbot/arrow-nanoarrow.git nanoarrow
cd nanoarrow && git switch integration-fixes
mkdir cdata && cd cdata && cmake .. -DNANOARROW_BUILD_INTEGRATION_TESTS=ON && cmake --build .
cd ../..

Run the integration test for exactly one file that is failing

python -m "archery.cli" integration \
    --with-nanoarrow=true \
    --with-csharp=true \
    --run-c-data \
    --match=primitive_zerolength

@paleolimbot
Copy link
Member Author

I am fairly sure that the problem is NULL buffers (not counting a missing validity buffer): after one is encountered, apparently csharp fails to release anything (?) (despite apparently computing equality without error). The change on the nanoarrow side that makes the integration tests pass is here: apache/arrow-nanoarrow@d455a1d

The bit of Python I ran to test that locally in details below.

from archery.integration import cdata
from archery.integration.tester_cpp import CppTester
from archery.integration.tester_csharp import CSharpTester
from archery.integration.tester_nanoarrow import NanoarrowTester

with open("primitive-empty.json", "w") as f:
    f.write("""{
  "schema": {
    "fields": [
      {"name": "binary_nullable", "type": {"name": "binary"}, "nullable": true, "children": []}
    ]
  },
  "batches": [
    {
      "count": 0,
      "columns": [
        {"name": "binary_nullable", "count": 0, "VALIDITY": [], "OFFSET": [0], "DATA": []}
      ]
    }
  ]
}""")

tester_na = NanoarrowTester()
tester_cs = CSharpTester()

exporter = tester_na.make_c_data_exporter()
importer = tester_cs.make_c_data_importer()

json_path = "primitive-empty.json"
ffi = cdata.ffi()

c_array_ptr = ffi.new("struct ArrowArray*")

with cdata.check_memory_released(exporter, importer):
    exporter.export_batch_from_json(json_path, 0, c_array_ptr)
    importer.import_batch_and_compare_to_json(json_path, 0, c_array_ptr)

@paleolimbot paleolimbot changed the title [C#][Integration] C# C Data interface might not be releasing some ArrowArray imports [C#][Integration] C# C Data interface does not release ArrowArray imports where a non-validity buffer is NULL Apr 3, 2024
CurtHagenlocher pushed a commit that referenced this issue Apr 7, 2024
…e Arrays (#41054)

### Rationale for this change

When implementing integration tests for nanoarrow, it was observed that C# never released arrays where `array->buffers[i]` was `NULL` (including any buffers of any recursive child arrays). This is allowed ( https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowArray.buffers ); however, every other implementation appears to allocate even for length zero buffers (including nanoarrow after apache/arrow-nanoarrow#399 ).

### What changes are included in this PR?

`AddMemory()` is replaced with `ArrowBuffer.Empty` if the length of the imported buffer would have been 0 bytes. For other buffers (or anywhere I saw dereferencing a buffer pointer), I added a `Debug.Assert` just to be sure.

### Are these changes tested?

I'm not sure what the best way to test them is! They won't be tested in the nanoarrow integration tests since at the point that they run, nanoarrow will no longer export arrays that would trigger this.

### Are there any user-facing changes?

No
* GitHub Issue: #40898

Authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
@CurtHagenlocher CurtHagenlocher added this to the 16.0.0 milestone Apr 7, 2024
@CurtHagenlocher
Copy link
Contributor

Issue resolved by pull request 41054
#41054

tolleybot pushed a commit to tmct/arrow that referenced this issue May 2, 2024
…terface Arrays (apache#41054)

### Rationale for this change

When implementing integration tests for nanoarrow, it was observed that C# never released arrays where `array->buffers[i]` was `NULL` (including any buffers of any recursive child arrays). This is allowed ( https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowArray.buffers ); however, every other implementation appears to allocate even for length zero buffers (including nanoarrow after apache/arrow-nanoarrow#399 ).

### What changes are included in this PR?

`AddMemory()` is replaced with `ArrowBuffer.Empty` if the length of the imported buffer would have been 0 bytes. For other buffers (or anywhere I saw dereferencing a buffer pointer), I added a `Debug.Assert` just to be sure.

### Are these changes tested?

I'm not sure what the best way to test them is! They won't be tested in the nanoarrow integration tests since at the point that they run, nanoarrow will no longer export arrays that would trigger this.

### Are there any user-facing changes?

No
* GitHub Issue: apache#40898

Authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
vibhatha pushed a commit to vibhatha/arrow that referenced this issue May 25, 2024
…terface Arrays (apache#41054)

### Rationale for this change

When implementing integration tests for nanoarrow, it was observed that C# never released arrays where `array->buffers[i]` was `NULL` (including any buffers of any recursive child arrays). This is allowed ( https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowArray.buffers ); however, every other implementation appears to allocate even for length zero buffers (including nanoarrow after apache/arrow-nanoarrow#399 ).

### What changes are included in this PR?

`AddMemory()` is replaced with `ArrowBuffer.Empty` if the length of the imported buffer would have been 0 bytes. For other buffers (or anywhere I saw dereferencing a buffer pointer), I added a `Debug.Assert` just to be sure.

### Are these changes tested?

I'm not sure what the best way to test them is! They won't be tested in the nanoarrow integration tests since at the point that they run, nanoarrow will no longer export arrays that would trigger this.

### Are there any user-facing changes?

No
* GitHub Issue: apache#40898

Authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants