Skip to content

New Sorting Algorithm #173

@ashvardanian

Description

@ashvardanian

Describe what you are looking for

For lexicographic sorting of strings, StringZilla uses a "hybrid-hybrid" approach with $O(n * log(n))$ and.

  1. Radix sort for first bytes exported into a continuous buffer for locality.
  2. IntroSort on partially ordered chunks to balance efficiency and worst-case performance.
    1. IntroSort begins with a QuickSort.
    2. If the recursion depth exceeds a certain threshold, it switches to a HeapSort.

That approach has several issues.

  • The current solution can't work for arrays over 4 billion entries.
  • It's also potentially illformed on 32-bit systems.
  • It doesn't use SIMD anywhere, not even for the Radix step.
  • It loses to hybrid algorithm prototype built with std::sort by a factor of 3!

So there is a lot we can improve, and there several links to keep in mind for anyone considering collaborating here.

  1. Sorting networks are marvelous structures and there is a collection of fast circuits by @bertdobbelaere we can implement with SIMD.
  2. Similar to TimSort, once sorted chunks are produced, one can merge them more efficiently using SIMD techniques implemented in SimSIMD for set intersections in v5.3 for SVE2, AVX-512, and AVX2.

This is not an inclusive list, and I'm open to other algorithmic ideas.

Can you contribute to the implementation?

  • I can contribute

Is your feature request specific to a certain interface?

It applies to everything

Contact Details

No response

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    coreWork on the algorithm design and implementationenhancementNew feature or requestgood first issueGood for newcomershelp wantedExtra attention is neededhugeLarge task potentially involving architectural or breaking changesperformancePerformance related discussion or suggestion

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions