Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Go] arrow.ipc.writer.compressBodyBuffers leaks memory during compression phase #14883

Closed
lquerel opened this issue Dec 8, 2022 · 16 comments · Fixed by #14892 or #14904
Closed

[Go] arrow.ipc.writer.compressBodyBuffers leaks memory during compression phase #14883

lquerel opened this issue Dec 8, 2022 · 16 comments · Fixed by #14892 or #14904
Assignees
Milestone

Comments

@lquerel
Copy link
Contributor

lquerel commented Dec 8, 2022

Describe the bug, including details regarding any error messages, version, and platform.

Platform: MacOS
Arrow version: Go v10.0.1
Error msg (sample):
checked_allocator.go:127: LEAK of 64 bytes FROM github.com/apache/arrow/go/v10/arrow/ipc.(*recordEncoder).visit line 497
checked_allocator.go:127: LEAK of 64 bytes FROM github.com/apache/arrow/go/v10/arrow/array.(*Int32Builder).newData line 792

All my unit tests work with the CheckedAllocator (the one used in the Arrow Go library) but when I added the zstd compression option to the IPC writer my tests turned red.

The issue is located in the file arrow.ipc.writer line 345 (v10.0.1).

	compress := func(idx int, codec compressor) error {
		if p.body[idx] == nil || p.body[idx].Len() == 0 {
			return nil
		}
		var buf bytes.Buffer
		buf.Grow(codec.MaxCompressedLen(p.body[idx].Len()) + arrow.Int64SizeBytes)
		if err := binary.Write(&buf, binary.LittleEndian, uint64(p.body[idx].Len())); err != nil {
			return err
		}
		codec.Reset(&buf)
		if _, err := codec.Write(p.body[idx].Bytes()); err != nil {
			return err
		}
		if err := codec.Close(); err != nil {
			return err
		}
		p.body[idx] = memory.NewBufferBytes(buf.Bytes()).   <-- line 345
		return nil
	}

The previous buffer is replaced by a new one but without:

  • releasing the previous one,
  • and specifying the current allocator to the new buffer.

Component(s)

Go

@zeroshade
Copy link
Member

looking into this now

@zeroshade
Copy link
Member

@lquerel Please try the branch on the referenced PR and confirm that it fixes your usage. I added a unit test in the PR to test for the memory leak and was able to replicate the CheckedAllocator finding the leak before I fixed it, but I want to be sure that it is accurate to your usage.

@lquerel
Copy link
Contributor Author

lquerel commented Dec 8, 2022

@zeroshade yes I will check that but the fact that the version number is in the import path doesn't simplify this kind of test. I'm back to the version 10.0.1 and your branch is v11 so a single replace statement in go.mod will not be enough. I need to replace all the import statement in the code. If you have a better option, please let me know.
I should be able to give an answer latter today.

@zeroshade
Copy link
Member

If entirely necessary, we could request on the mailing list to backport this to a 10.0.2 patch release. Or, depending on your timeframe, v11 would be being released around mid-january and you could upgrade to that then. Either of those could work

@lquerel
Copy link
Contributor Author

lquerel commented Dec 8, 2022

The sooner the better, so I will vote for a v10.0.2.

@lquerel
Copy link
Contributor Author

lquerel commented Dec 9, 2022

@zeroshade I tested your commit with my current code which doesn't contain the zstd compression configuration. This code works perfectly with v10.0.1 but doesn't work with your commit or even with v11 head. So, I don't think it's related to your fix.

I got the following error: arrow/ipc: unknown error while writing: runtime error: index out of range [1] with length 1

The error is issued from array.list line 157, higher is the stack the program was in ipc.writer line 674, i.e. in the logic to serialize an arrow map. This is probably related to this recent modification in #14780

- start, end = int64(a.offsets[i]), int64(a.offsets[i+1])
+ start, end = int64(a.offsets[i+a.data.offset]), int64(a.offsets[i+a.data.offset+1])

When the code panic:

  • len(a.offsets) = 1
  • i = 0
  • a.data.offset = 0

==> a.offsets[i+a.data.offset+1] is out of bounds.

I need your help to debug this one :)

@zeroshade
Copy link
Member

zeroshade commented Dec 9, 2022

Could you share the code you're using to hit that panic? Or potentially boil it down to a small test case I could debug?

EDIT: Looking at what you've provided, I have another question: was the length of the actual Map 0? Because len(a.offsets) should always equal a.data.length + 1

@lquerel
Copy link
Contributor Author

lquerel commented Dec 9, 2022

@zeroshade You can reproduce this bug with this commit f5/otel-arrow-adapter@cdcd886

The test is located in pkg\otel\arrow_record\producer_consumer_test.go func TestProducerConsumerMetrics

EDIT: I will check the map size in parallel.

@zeroshade
Copy link
Member

@lquerel I believe I figured out the cause of the issue. Gonna see if I can try to run your test with my fix to confirm.

@lquerel
Copy link
Contributor Author

lquerel commented Dec 9, 2022

@zeroshade you can uncomment the replace statement in my go.mod to point directly to you version of Arrow.

@lquerel
Copy link
Contributor Author

lquerel commented Dec 9, 2022

@zeroshade to answer your question:

  • len(a.offsets) = 1
  • len(a.data.length) = 0

@zeroshade
Copy link
Member

@lquerel Okay, two things:

  1. I have a fix for the issue with encoding an empty map, I've opened the PR GH-14883: [Go] Fix IPC encoding empty maps #14904 to fix it.
  2. You have an issue where you're doing extra releases on your builders in that branch. You create a record builder and then you grab the child builders and place them into various objects and then you call release which recursively calls release on your child objects which all release their builders. The issue is that calling release on the record builder will already call release on all of it's child builders and you never call retain on those child builders when you hand them off to your child objects. It's not a big problem, per se, but you can see it if you run your tests with -tags assert which turns on the debug asserts in the Arrow module and you'll find the "too many releases" errors. Just figured i'd point that out to you.

@lquerel
Copy link
Contributor Author

lquerel commented Dec 9, 2022

@zeroshade Great

Regarding the extra release, I will fix that. Thanks.

Will it be possible to have this 2 PRs in a v10.0.2?

@zeroshade
Copy link
Member

zeroshade commented Dec 9, 2022

I'll merge the two PRs and can you please send an email to the arrow dev mailing list requesting a v10.0.2 release incorporating these two PRs? Discussion about doing so will have to occur on the mailing list.

EDIT: I also can confirm that when I combine these two PRs in a local branch and then run that TestProducerConsumerMetrics test, it is successful. 😄

zeroshade added a commit that referenced this issue Dec 9, 2022
* Closes: #14883

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
@zeroshade zeroshade added this to the 11.0.0 milestone Dec 9, 2022
zeroshade added a commit that referenced this issue Dec 9, 2022
* Closes: #14883

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
@lquerel
Copy link
Contributor Author

lquerel commented Dec 9, 2022

@zeroshade Sure I will send an email to request a v10.0.2.

@lquerel
Copy link
Contributor Author

lquerel commented Dec 9, 2022

@zeroshade email sent.

zeroshade added a commit to zeroshade/arrow that referenced this issue Dec 15, 2022
* Closes: apache#14883

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
(cherry picked from commit cd238d7)
zeroshade added a commit to zeroshade/arrow that referenced this issue Dec 15, 2022
…pache#14892)

* Closes: apache#14883

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
(cherry picked from commit 3d9f60b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment