feat(dataframe): expose sort and repartition #42

@andygrove

Description

Is your feature request related to a problem or challenge?

Two ordering / layout primitives are missing from the DataFrame API:

  • sort — no way to order a DataFrame today without dropping to SQL.
  • repartition — no way to control parallelism / partitioning of a
    DataFrame.

Describe the solution you'd like

sort. Two ergonomics options worth considering:

  1. SQL-string flavour matching filter(String) / proposed
    withColumn: df.sort("a ASC, b DESC NULLS FIRST"). Parsed via
    parse_sql_expr plus an ORDER BY shim, or via the SQL parser's
    parse_order_by. Cheapest to implement; no Java-side model.
  2. Typed: a SortExpr Java record (column, ascending, nullsFirst) and
    df.sort(SortExpr... exprs). Discoverable, IDE-friendly.

Suggest starting with (1) for consistency with filter, then layering
(2) on top if/when an Expr builder lands for joins.
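To make the layering concrete, here is a minimal sketch of how the typed option (2) could sit on top of the string option (1): a SortExpr record with the three fields named above, rendered into the ORDER BY-style string that df.sort(String) would accept. The SortExprs helper, the toSql method, and the choice to double-quote column names are assumptions for illustration, not existing API. Note that quoting makes the identifier case-sensitive.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical typed sort key, using the field names proposed in this issue.
record SortExpr(String column, boolean ascending, boolean nullsFirst) {
    SortExpr {
        Objects.requireNonNull(column, "column");
        if (column.isBlank()) {
            throw new IllegalArgumentException("column must not be blank");
        }
    }

    // Renders one key as an ORDER BY item, always spelling out direction
    // and null ordering so nothing depends on engine defaults.
    String toSql() {
        String quoted = "\"" + column.replace("\"", "\"\"") + "\"";
        return quoted
            + (ascending ? " ASC" : " DESC")
            + (nullsFirst ? " NULLS FIRST" : " NULLS LAST");
    }
}

// Hypothetical helper; in the real binding this logic would live behind
// df.sort(SortExpr... exprs) and delegate to df.sort(String).
final class SortExprs {
    private SortExprs() {}

    static String toOrderBy(SortExpr... exprs) {
        if (exprs.length == 0) {
            throw new IllegalArgumentException("at least one sort key is required");
        }
        return Arrays.stream(exprs)
            .map(SortExpr::toSql)
            .collect(Collectors.joining(", "));
    }
}
```

With this shape, SortExprs.toOrderBy(new SortExpr("a", true, false), new SortExpr("b", false, true)) yields "a" ASC NULLS LAST, "b" DESC NULLS FIRST (with the double quotes included), so the typed overload needs no extra native entry point.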

repartition. DataFusion's Partitioning enum has three variants:

  • RoundRobinBatch(usize) → df.repartitionRoundRobin(n)
  • Hash(Vec<Expr>, usize) → df.repartitionHash(int n, String... columns)
    (column-name flavour to start; expression variant later)
  • UnknownPartitioning(usize) — not user-facing.
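If a Java-side model of the partitioning turns out to be useful (for example, to validate arguments before crossing into native code), one possible shape is sketched below. PartitioningSpec and its nested records are hypothetical names, not part of the current bindings. The two df.repartition* methods proposed above could just as well take their arguments directly.

```java
import java.util.List;

// Hypothetical Java-side mirror of the two user-facing variants.
interface PartitioningSpec {
    int partitions();

    // Mirrors RoundRobinBatch(usize).
    record RoundRobin(int partitions) implements PartitioningSpec {
        public RoundRobin {
            requirePositive(partitions);
        }
    }

    // Mirrors Hash(Vec<Expr>, usize), restricted to plain column names.
    record Hash(int partitions, List<String> columns) implements PartitioningSpec {
        public Hash {
            requirePositive(partitions);
            if (columns.isEmpty()) {
                throw new IllegalArgumentException("hash partitioning needs at least one column");
            }
            columns = List.copyOf(columns); // defensive, immutable copy
        }
    }

    static PartitioningSpec roundRobin(int n) {
        return new RoundRobin(n);
    }

    static PartitioningSpec hash(int n, String... columns) {
        return new Hash(n, List.of(columns));
    }

    private static void requirePositive(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("partition count must be positive, got " + n);
        }
    }
}
```

Rejecting a non-positive partition count and an empty column list on the Java side keeps those failures as ordinary IllegalArgumentExceptions instead of errors surfaced from the native layer.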

Tests go in DataFrameTransformationsTest: a sort round-trip, plus a
partition-count assertion via collect_partitioned once that's wired
(or via plan inspection until then).

Describe alternatives you've considered

ORDER BY / DISTRIBUTE BY via SQL. Works but loses the lazy
DataFrame composition.

Additional context

Each feature carries a small Java-side design choice (sort-expression
shape, partitioning constructor shape); fine to land them as two
separate PRs under this issue if that's cleaner than one batched PR.
