Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect memory usage calculation for dictionary arrays #503

Closed
jhorstmann opened this issue Jun 28, 2021 · 3 comments · Fixed by #505
Closed

Incorrect memory usage calculation for dictionary arrays #503

jhorstmann opened this issue Jun 28, 2021 · 3 comments · Fixed by #505
Labels
arrow Changes to the arrow crate bug

Comments

@jhorstmann
Copy link
Contributor

Describe the bug
The memory usage calculation for dictionary arrays seems to report twice the actual used memory.

fn get_array_memory_size(&self) -> usize {
    self.data.get_array_memory_size()
        + self.keys.get_array_memory_size()
        + self.values.get_array_memory_size()
        + mem::size_of_val(self)
}

In the above code, self.keys and self.values are pointing to the same data that is in self.data and so would be counted twice.

Additional context

The memory usage calculation could actually work the same for all array types and could be implemented as a default method on the Array trait.

@jhorstmann jhorstmann added the bug label Jun 28, 2021
@jhorstmann
Copy link
Contributor Author

Implementations in FixedSizeListArray and UnionArray also seem to be counting child arrays twice.

@e-dard
Copy link
Contributor

e-dard commented Jun 28, 2021

@jhorstmann 👋 I recently noticed this discrepency whislt doing some testing for a project that uses Arrow dictionaries. Did you want to take this ticket otherwise I'm happy to fix it.

@jhorstmann
Copy link
Contributor Author

@e-dard I'm just about to open a PR :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants