Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specific comparators for strings #81

Closed
Morwenn opened this issue Jul 25, 2016 · 1 comment
Closed

Specific comparators for strings #81

Morwenn opened this issue Jul 25, 2016 · 1 comment

Comments

@Morwenn
Copy link
Owner

Morwenn commented Jul 25, 2016

Comparing strings by their ASCII values is often enough for programmer's purposes, but other kinds of string sorting are often needed for user interaction:

  • Natural sort for numbers, e.g. "I have 99 problems" should be "smaller" than "I have 100 problems".
  • Case-insensitive sort, e.g. "Whatever" and "whatever" should compare equal. This comparator should convert the letters at every comparison since the alternative using more memory can simply be realized with schwartz_adapter and a simple upper or lower projection.

Both might be needed at once, so a general string_compare could be provided, with a bitfield-like structure describing which properties should be taken into account. Not sure whether it's better to pass the value as a template parameter at compile-time, or as a parameter to the constructor at runtime. If it is passed at compile-time, it allows to have dedicated case_insensitive_compare and natural_compare specialization type aliases implementing more efficient algorithms. There might be way to force string_compare to accept runtime parameters when no compile-time parameter is explicitly given.

Concerning the names, natural_less may be better than natural_compare; morevoer, it might allow for a corresponding natural_greater if needed. Making them customization points might let the door open for other people to reimplement the functions for custom string types.

This comparator would only handle ASCII, not unicode since handling unicode would require more code than the library has in its current state.


This could be the birth of a new cpp-sort/comparators subdirectory and the linked cpp-sort/comparators.h general include file. It could be interesting to expose some of the comparators in detail such as projection_compare, since that specific one could potentially be used at more places in the library and thus might need some documentation (see issue #53). The *_less and *_greater comparators described in #18 can also be included in this module.

Morwenn added a commit that referenced this issue Aug 6, 2016
The default version works with any forward sequence of char, including
std::string, std::vector<char> or char[]. It might work with other types
of characters than char, but it isn't guaranteed since it uses
std::isdigit.
Morwenn added a commit that referenced this issue Aug 14, 2016
@Morwenn
Copy link
Owner Author

Morwenn commented Aug 14, 2016

The comparators aren't perfect, but the remaining problems are hardly addressable or are the topic of other issues:

  • natural_less only recognizes ASCII digits, but making it work with other kinds of digits is probably not doable with the standard library unless the numeration system is positional and the digits are one-character long and d1 < d2 also means that parse(d1) < parse(d2). I don't know enough about character encondings and numeration systems to be sure that it's worse the cost.
  • case_insensitive_less isn't performant because of the repeated calls to std::use_facet, but this can be addressed by issue Refining comparators and projections with type #83. Additional performance might be obtained for char locales thanks to a technique described in How to do case-insensitive string comparison by Matt Austern.

I decided to separate the comparators rather than combining them for the sake of simplicity. A case-insensitive natural sort could surely be performed by a mix of comparisons and projections if needed:

sort(sequence, natural_less, tolower);

@Morwenn Morwenn closed this as completed Aug 14, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant