
Optimize ArrowBytesViewMap #19961

@Dandandan

Description


Is your feature request related to a problem or challenge?

There seem to be some opportunities to optimize ArrowBytesViewMap with a bit more cleverness.

For example, in ClickBench query 5, more than 50% of CPU time is spent in intern:

[Image: CPU profile]

Much of that time goes to fetching and comparing bytes from the data buffers (append_value, get_value, memcmp, makeview, etc.).

[Image: CPU profile]

Describe the solution you'd like

We should be able to avoid re-creating views for every value and comparing against byte slices, by storing and comparing the views directly and by bypassing the overhead of the GenericByteViewBuilder methods.
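To illustrate what comparing views directly could look like, here is a minimal sketch, assuming the 16-byte view layout from the Arrow columnar format as arrow-rs stores it (a little-endian u128: length in bits 0..32; for values of at most 12 bytes the data is inlined and zero-padded; otherwise bits 32..64 hold a 4-byte prefix, bits 64..96 the buffer index and bits 96..128 the offset). The helpers encode_view, long_bytes and views_equal are hypothetical, and data buffers are modeled as Vec<Vec<u8>> rather than arrow Buffers.

```rust
/// Hypothetical helper: build a 16-byte view for `data` using the Arrow view
/// layout (length in bits 0..32, then inline bytes or prefix/buffer/offset).
fn encode_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    if data.len() <= 12 {
        // Short values are stored inline; unused bytes stay zero.
        bytes[4..4 + data.len()].copy_from_slice(data);
    } else {
        // Long values keep a 4-byte prefix plus the location of the full bytes.
        bytes[4..8].copy_from_slice(&data[..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// Hypothetical helper: resolve a long (> 12 byte) view to its bytes.
fn long_bytes(view: u128, buffers: &[Vec<u8>]) -> &[u8] {
    let len = view as u32 as usize;
    let buffer_index = (view >> 64) as u32 as usize;
    let offset = (view >> 96) as u32 as usize;
    &buffers[buffer_index][offset..offset + len]
}

/// Hypothetical helper: equality that works on the views first and only
/// touches the data buffers when two long values share length and prefix.
fn views_equal(a: u128, a_buffers: &[Vec<u8>], b: u128, b_buffers: &[Vec<u8>]) -> bool {
    let len = a as u32;
    if len != (b as u32) {
        return false; // different lengths can never be equal
    }
    if len <= 12 {
        // Inline values: length plus zero-padded data fill the whole view.
        return a == b;
    }
    // Long values: compare length and prefix (low 64 bits) before any memcmp.
    if (a as u64) != (b as u64) {
        return false;
    }
    long_bytes(a, a_buffers) == long_bytes(b, b_buffers)
}
```

Ordering the checks this way means most mismatches are rejected from the 16-byte views alone, and the buffers are only read for long values whose length and prefix already match.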

To do so, I think we need to:

  • Read the view buffer (and each view's buffer index) directly instead of iterating with values.iter()
  • Compare against the original view, and against the bytes in the referenced buffer only if needed
  • Update the new view with the new buffer index instead of creating it again
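The last step could look roughly like this: a minimal sketch, assuming the same arrow-rs style u128 view layout, in which a long value's bytes are copied once into the map's own buffer and only the buffer-index and offset fields of the incoming view are patched, leaving length and prefix untouched. ViewStore, with_location and append_view are hypothetical names, and starting a new buffer once the current one is full is omitted.

```rust
/// Hypothetical storage for the map's own copies of values, in view form.
struct ViewStore {
    /// One 16-byte view per stored value (arrow-rs style u128).
    views: Vec<u128>,
    /// Data buffers owned by the map; the last one is the one being filled.
    buffers: Vec<Vec<u8>>,
}

/// Keep length and prefix (low 64 bits); replace buffer index and offset.
fn with_location(view: u128, buffer_index: u32, offset: u32) -> u128 {
    (view & (u64::MAX as u128)) | ((buffer_index as u128) << 64) | ((offset as u128) << 96)
}

impl ViewStore {
    fn new() -> Self {
        ViewStore { views: Vec::new(), buffers: Vec::new() }
    }

    /// Store an input view (pointing into `src_buffers`) and return its index.
    fn append_view(&mut self, view: u128, src_buffers: &[Vec<u8>]) -> usize {
        let len = view as u32 as usize;
        let stored = if len <= 12 {
            // Inline value: the input view can be stored unchanged.
            view
        } else {
            // Long value: copy the bytes once into our own buffer...
            let src_index = (view >> 64) as u32 as usize;
            let src_offset = (view >> 96) as u32 as usize;
            let bytes = &src_buffers[src_index][src_offset..src_offset + len];
            if self.buffers.is_empty() {
                self.buffers.push(Vec::new());
            }
            let buffer_index = (self.buffers.len() - 1) as u32;
            let dst = self.buffers.last_mut().expect("at least one buffer");
            let offset = dst.len() as u32;
            dst.extend_from_slice(bytes);
            // ...then patch only the location fields of the existing view.
            with_location(view, buffer_index, offset)
        };
        self.views.push(stored);
        self.views.len() - 1
    }
}
```

Because the length and prefix are carried over from the input view, nothing is re-encoded from the raw bytes, which is the per-value work the issue proposes to avoid.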

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), performance (Make DataFusion faster)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
