
Optimize GroupBy.Sum and GroupBy.Mean for large DataFrames #7554

@dwalterskoetter

Description


I've analyzed the GroupBy.cs implementation and identified several ways to improve the performance of GroupBy.Sum and GroupBy.Mean on large DataFrames:

  • Compute each aggregate in a single pass over the group's rows.
  • Introduce typed accumulators for double and long to avoid boxing and speed up numeric work.
  • Pre-allocate result columns (PrimitiveDataFrameColumn) to avoid repeated resizing.
  • Reduce the overhead of delegates invoked during iteration.
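To illustrate the first three points, here is a minimal sketch in Java (chosen only for illustration; the real change would be C# in GroupBy.cs). It assumes rows have already been mapped to dense integer group ids and that values live in a primitive array. `GroupByAggregates`, `sumAndMean`, and the `valid` mask are hypothetical names and are not part of Microsoft.Data.Analysis.

```java
import java.util.Arrays;

/**
 * Sketch of a single-pass GroupBy Sum/Mean with typed (primitive) accumulators.
 * Hypothetical assumptions: each row has a dense group id in [0, groupCount),
 * values are stored in a double[], and missing values are flagged by an
 * optional boolean[] validity mask (null means "all rows valid").
 */
public final class GroupByAggregates {

    /** Result buffers allocated once, one slot per group. */
    public static final class Result {
        public final double[] sums;
        public final long[] counts;
        public final double[] means;

        Result(int groupCount) {
            this.sums = new double[groupCount];
            this.counts = new long[groupCount];
            this.means = new double[groupCount];
        }
    }

    private GroupByAggregates() {}

    public static Result sumAndMean(int[] groupIds, double[] values, boolean[] valid, int groupCount) {
        if (groupIds.length != values.length) {
            throw new IllegalArgumentException("groupIds and values must have the same length");
        }
        if (valid != null && valid.length != values.length) {
            throw new IllegalArgumentException("valid mask must match values length");
        }
        // Pre-allocate every result buffer, sized by group count, so nothing grows during the scan.
        Result result = new Result(groupCount);
        double[] sums = result.sums;
        long[] counts = result.counts;

        // Single pass: each row updates its group's primitive accumulators directly,
        // with no boxing and no per-row delegate invocation.
        for (int row = 0; row < values.length; row++) {
            if (valid != null && !valid[row]) {
                continue; // skip missing values
            }
            int g = groupIds[row];
            sums[g] += values[row];
            counts[g]++;
        }

        // Mean is derived from the same accumulators; no second scan of the rows.
        for (int g = 0; g < groupCount; g++) {
            result.means[g] = counts[g] == 0 ? Double.NaN : sums[g] / counts[g];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] groupIds = {0, 1, 0, 1, 0};
        double[] values = {1.0, 2.0, 3.0, 4.0, 5.0};
        Result r = sumAndMean(groupIds, values, null, 2);
        System.out.println(Arrays.toString(r.sums));  // prints [9.0, 6.0]
        System.out.println(Arrays.toString(r.means)); // prints [3.0, 3.0]
    }
}
```

Computing Sum and Mean from one shared pair of accumulators means a Mean request costs essentially the same as a Sum request. A `long[]` accumulator would be used the same way for integer columns.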

I noticed issue #6824 tracks DataFrame performance improvements. Would performance enhancements for GroupBy operations be welcome as part of this effort?

I can provide benchmarks and a PR if there's interest.
