I've analyzed the GroupBy.cs implementation and identified opportunities for performance improvements, including:
- Implement single-pass iteration over the group rows.
- Introduce typed accumulators for double and long to avoid boxing and improve numeric performance.
- Pre-allocate result columns (PrimitiveDataFrameColumn) to reduce repeated resizing.
- Optimize the delegates invoked during iteration to reduce per-row overhead.
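The sketch below illustrates the first two ideas. It is hypothetical and is not the current GroupBy.cs code. It makes a single pass over the rows and accumulates per-group sums in a `List<double>` and row counts in a `List<long>`, so values are never boxed to `object`. Plain arrays stand in for DataFrame columns, and the names `GroupSumSketch` and `SumAndCountByKey` are made up for this example. The result arrays are allocated once at their final size, which is the spirit of the pre-allocation idea.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: plain arrays stand in for DataFrame columns.
public static class GroupSumSketch
{
    // Groups rows by key in a single pass, accumulating per-group sums (double)
    // and row counts (long) in strongly typed lists, so no value is boxed.
    public static (TKey[] Keys, double[] Sums, long[] Counts) SumAndCountByKey<TKey>(
        IReadOnlyList<TKey> keys, IReadOnlyList<double> values) where TKey : notnull
    {
        if (keys.Count != values.Count)
            throw new ArgumentException("keys and values must have the same length.");

        // Maps each distinct key to its slot in the accumulator lists (first-seen order).
        var slotOf = new Dictionary<TKey, int>();
        var groupKeys = new List<TKey>();
        var sums = new List<double>();
        var counts = new List<long>();

        for (int row = 0; row < keys.Count; row++)
        {
            if (!slotOf.TryGetValue(keys[row], out int slot))
            {
                // First time this key is seen: open a new group slot.
                slot = groupKeys.Count;
                slotOf.Add(keys[row], slot);
                groupKeys.Add(keys[row]);
                sums.Add(0.0);
                counts.Add(0L);
            }

            // Typed accumulation: double and long arithmetic, no object round-trips.
            sums[slot] += values[row];
            counts[slot]++;
        }

        // Result arrays are allocated once, at their final size.
        return (groupKeys.ToArray(), sums.ToArray(), counts.ToArray());
    }
}
```

For example, keys `{ "a", "b", "a" }` with values `{ 1.0, 2.0, 3.0 }` yield group keys `{ "a", "b" }`, sums `{ 4.0, 2.0 }`, and counts `{ 2, 1 }`.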
I noticed issue #6824 tracks DataFrame performance improvements. Would performance enhancements for GroupBy operations be welcome as part of this effort?
I can provide benchmarks and a PR if there's interest.