What's the best way to measure the size of arrow.Record in Golang? #38836

Closed
Yifeng-Sigma opened this issue Nov 21, 2023 · 2 comments · Fixed by #38839
Labels
Component: Go Type: usage Issue is a user question

Comments

@Yifeng-Sigma
Contributor

Yifeng-Sigma commented Nov 21, 2023

Describe the usage question you have. Please include as many useful details as possible.

I want to split and merge records based on their size, but I haven't found a reliable way to estimate that size.
I see two options:

	var bytes uint64
	for _, col := range rec.Columns() {
		for _, buffer := range col.Data().Buffers() {
			if buffer != nil {
				bytes += uint64(buffer.Cap())
			}
		}
	}

or

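// computeColumnSize returns the number of value bytes referenced by col.
func computeColumnSize(col arrow.Array) uint64 {
	switch col.DataType().(type) {
	case arrow.BinaryDataType:
		switch arr := col.(type) {
		case *array.String:
			// Last offset minus first offset of the (possibly sliced) array.
			return uint64(arr.ValueOffset(arr.Len()) - arr.ValueOffset(0))
		// ...
		}
	}
	// Fallback so the function compiles; handling for other types is elided.
	return 0
}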
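To make the offset arithmetic in the second option concrete, here is a minimal sketch that does not use the arrow module. `stringColumn` and `valueBytes` are hypothetical stand-ins, assuming the usual layout of a variable-length string column: one data buffer plus an offsets buffer with one more entry than there are values.

```go
package main

import "fmt"

// stringColumn is a hypothetical stand-in for a variable-length string array:
// values are stored back to back in data, and offsets[i] marks where value i starts.
type stringColumn struct {
	offsets []int32 // len(offsets) == number of values + 1
	data    []byte
}

// valueBytes returns the number of data bytes referenced by values [start, end).
func (c stringColumn) valueBytes(start, end int) uint64 {
	return uint64(c.offsets[end] - c.offsets[start])
}

func main() {
	// Three values: "arrow", "", "arrow".
	col := stringColumn{
		offsets: []int32{0, 5, 5, 10},
		data:    []byte("arrowarrow"),
	}
	fmt.Println(col.valueBytes(0, 3)) // 10: all three values
	fmt.Println(col.valueBytes(1, 3)) // 5: the last two values only
}
```

This counts only the value bytes. The offsets themselves and any validity bitmap are not included, so an estimate built this way would need to add those per column type.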

What is the recommended way to compute the size of an arrow.Record?

Component(s)

Go

@Yifeng-Sigma Yifeng-Sigma added the Type: usage Issue is a user question label Nov 21, 2023
@zeroshade
Member

My recommendation would be a variation on the first one:

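// calcSize returns the number of bytes in use by arr's buffers, including
// those of its children and, for dictionary-encoded arrays, its dictionary.
func calcSize(arr arrow.ArrayData) (sz uint64) {
    if arr == nil {
        return
    }

    for _, b := range arr.Buffers() {
        // Buffers may be nil, e.g. the validity bitmap of an array without nulls.
        if b != nil {
            sz += uint64(b.Len())
        }
    }
    for _, c := range arr.Children() {
        sz += calcSize(c)
    }
    // Only dictionary-encoded arrays carry a dictionary; checking the type
    // first avoids recursing into a nil dictionary.
    if arr.DataType().ID() == arrow.DICTIONARY {
        sz += calcSize(arr.Dictionary())
    }
    return
}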

It might also be reasonable to add this as a utility to the arrow library directly via a PR.
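The recursion above can be tried without the arrow module using stand-in types. In this sketch, `buffer`, `arrayData`, and `sizeInBytes` are hypothetical names that only mirror the shape of `memory.Buffer` and `arrow.ArrayData`; they are not the library's API. It also illustrates why summing `Len()` differs from summing `Cap()`: `Len()` counts bytes in use, while `Cap()` counts allocated capacity.

```go
package main

import "fmt"

// buffer and arrayData are hypothetical stand-ins for memory.Buffer and
// arrow.ArrayData, kept local so the sketch runs without the arrow module.
type buffer struct {
	bytes []byte
}

func (b *buffer) Len() int { return len(b.bytes) } // bytes in use
func (b *buffer) Cap() int { return cap(b.bytes) } // bytes allocated

type arrayData struct {
	buffers    []*buffer // entries may be nil, e.g. a missing validity bitmap
	children   []*arrayData
	dictionary *arrayData
}

// sizeInBytes sums the bytes in use across buffers, children, and dictionary.
func sizeInBytes(d *arrayData) uint64 {
	if d == nil {
		return 0
	}
	var sz uint64
	for _, b := range d.buffers {
		if b != nil {
			sz += uint64(b.Len())
		}
	}
	for _, c := range d.children {
		sz += sizeInBytes(c)
	}
	return sz + sizeInBytes(d.dictionary)
}

func main() {
	// A list-like parent with one child; each has a nil validity buffer.
	values := &arrayData{buffers: []*buffer{nil, &buffer{bytes: make([]byte, 16, 64)}}}
	list := &arrayData{
		buffers:  []*buffer{nil, &buffer{bytes: make([]byte, 12, 64)}},
		children: []*arrayData{values},
	}
	fmt.Println(sizeInBytes(list)) // 28: 12 bytes in the parent + 16 in the child
}
```

With `Cap()` instead of `Len()` the same tree would report 128 bytes, which reflects allocation rather than the data actually held. For a whole record, the per-column results would be summed over `rec.Columns()` as in the first snippet of the question.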

@Yifeng-Sigma
Contributor Author

Thanks, opened #38839

zeroshade added a commit that referenced this issue Nov 28, 2023
### Rationale for this change

Address #38836

### What changes are included in this PR?

Add a new function SizeInBytes() to calculate the size of ArrayData.

### Are these changes tested?

### Are there any user-facing changes?

No

* Closes: #38836

Lead-authored-by: Yifeng Wu <yifeng@sigmacomputing.com>
Co-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: Yifeng-Sigma <yifeng@sigmacomputing.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
@zeroshade zeroshade added this to the 15.0.0 milestone Nov 28, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024