Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort with collator #3115

Merged
merged 41 commits into from Aug 26, 2020
Merged

Sort with collator #3115

merged 41 commits into from Aug 26, 2020

Conversation

moisesbr-dw
Copy link
Contributor

@moisesbr-dw moisesbr-dw commented Jun 1, 2020

When intl extension is available, all alphabetical sorts are done using a collator. Otherwise, primitive PHP functions are called.

The collator is created using the locale given in $conf['lang']. It always uses case insensitive "natural" ordering in its collation. The fallback solution uses the primitive PHP functions that return almost the same results when the input is text with only [A-Za-z0-9] characters.

All those screens will be properly sorted when intl extension is configured in php.ini:

  • Sitemap (Index)
  • Search
    • Matching pagenames
    • Fulltext results [added]
    • namespace selector below search field
  • Quick search (when typing)
  • Backlinks
  • All media lists (including fullscreen)
    • Media backlinks (View tab in the 3rd column of Media Manager)
  • Internal link suggestion when editing a page
  • Administration
  • Access Control List Management
    • tree list of namespaces and pages
    • drop-down list with users and groups
    • rule list by page/namespace (main sort)
    • rule list by user/group (secondary sort) [added]
  • Authorization plugins
    • group sort in authplain [added]
  • List of all pages in Remote API [added]
  • OpenSearch suggestions
  • Command line tool wantedpages.php

Future maintenance (e.g. plugins) will be very easy. Alphabetical comparisons and sorts should be changed the following way:

  • from strcmp(), strcasecmp(), strnatcmp() and strnatcasecmp() to Sort::strcmp()
  • from sort() to Sort::sort()
  • from ksort() to Sort::ksort()
  • from asort(), natsort() and natcasesort() to Sort::asort() or Sort::asortFN()

Closes #2037 (collation part).

New file "sort.php" for sort functions.
Inclusion of "sort.php" in the bootstrap ("load.php").
Reimplementation of natsort() using collator.
Fix of search() function in "search.php".

Functions that call search() and now have the expected results:
Ajax.php:188, callMedians(): Return subnamespaces for the Mediamanager
Ajax.php:322, callIndex(): Return sub index for index view
Ajax.php:392, callLinkwiz(): List matching namespaces and pages for the link wizard
html.php:908, html_index(): Display page index
media.php:713, media_filelist(): List all files in a given Media namespace
media.php:1528, media_searchlist(): List all files found by the search request
media.php:1966, media_nstree(): Build a tree outline of available media namespaces
Remote\ApiCore.php:349, readNamespace(): List all pages in the given namespace (and below)
Remote\ApiCore.php:432, listAttachments(): List all media files

Screens fixed:
- Index
- All media lists (including fullscreen)
- Internal link suggestion when editing a page
Reimplementation of strcmp() using collator.
Fix of ft_pagesorter() function in "fulltext.php".
ft_pagesorter() is used by _ft_pageLookup(), which is used by ft_pageLookup().

Functions that call ft_pageLookup() and now have the expected results:
Action\Search.php:69, execute(): run the search
Ajax.php:50, callQsearch(): Searches for matching pagenames
Ajax.php:352, callLinkwiz(): List matching namespaces and pages for the link wizard

Screens fixed:
- Search (list of Matching pagenames)
- Quick search (when typing)
- Internal link suggestion when editing a page (when nothing was typed)
Reimplementation of sort() using collator.
Fix of ft_backlinks() and ft_mediause() function in "fulltext.php".

Functions that call ft_backlinks() and now have the expected results:
html.php:1085, html_backlinks(): display backlinks
Remote\ApiCore.php:456, listBackLinks(): Return a list of backlinks

Functions that call ft_mediause() and now have the expected results:
media.php:222, media_inuse(): Convenience function to check if a media file is still in use
media.php:1199, media_details(): Prints mediafile tags

Screens fixed:
- Backlinks
- List of pages that references a media file (View tab in the 3rd column of Media Manager)
Fix of callSuggestions() function in "Ajax.php".

Screens fixed:
- OpenSearch suggestions: /lib/exe/ajax.php?call=suggestions&q={query}
Reimplementation of ksort() using collator.
Fix of getAdditionalNamespacesFromResults() function in "Ui/Search.php".

Screens fixed:
- Search (namespace selector below search field)
Reimplementation of strcasecmp() using collator.
It uses the same implementation already made for strcmp().
Fix of menuSort() function in "Ui/Admin.php".

Screens fixed:
- Administration
Fix of sort_search_fulltext() function in "search.php".
This function does not seem to be used anywhere.
Name changed from natural_sort() to sort_filenames().
Fixed behavior without collator: page name sorting reflected in filename array.
Fix of search() function in "search.php".

Functions that call search() and now have the expected results:
Ajax.php:188, callMedians(): Return subnamespaces for the Mediamanager
Ajax.php:322, callIndex(): Return sub index for index view
Ajax.php:392, callLinkwiz(): List matching namespaces and pages for the link wizard
html.php:908, html_index(): Display page index
media.php:713, media_filelist(): List all files in a given Media namespace
media.php:1528, media_searchlist(): List all files found by the search request
media.php:1966, media_nstree(): Build a tree outline of available media namespaces
Remote\ApiCore.php:349, readNamespace(): List all pages in the given namespace (and below)
Remote\ApiCore.php:432, listAttachments(): List all media files

Screens fixed:
- Index
- All media lists (including fullscreen)
- Internal link suggestion when editing a page
Name changed from strcompare() to compare().

Fix of ft_pagesorter() function in "fulltext.php".
ft_pagesorter() is used by _ft_pageLookup(), which is used by ft_pageLookup().

Functions that call ft_pageLookup() and now have the expected results:
Action\Search.php:69, execute(): run the search
Ajax.php:50, callQsearch(): Searches for matching pagenames
Ajax.php:352, callLinkwiz(): List matching namespaces and pages for the link wizard

Screens fixed:
- Search (list of matching pagenames)
- Quick search (when typing)
- Internal link suggestion when editing a page (when nothing was typed)

Fix of menuSort() function in "Ui/Admin.php".

Screens fixed:
- Administration

Fix of sort_search_fulltext() function in "search.php".
This function does not seem to be used anywhere.
Name changed from sort_pages() to sort_pagenames().
Fixed behavior without collator: sort() with flags SORT_NATURAL and SORT_FLAG_CASE.

Fix of ft_backlinks() and ft_mediause() function in "fulltext.php".

Functions that call ft_backlinks() and now have the expected results:
html.php:1085, html_backlinks(): display backlinks
Remote\ApiCore.php:456, listBackLinks(): Return a list of backlinks

Functions that call ft_mediause() and now have the expected results:
media.php:222, media_inuse(): Convenience function to check if a media file is still in use
media.php:1199, media_details(): Prints mediafile tags

Screens fixed:
- Backlinks
- List of pages that references a media file (View tab in the 3rd column of Media Manager)

Fix of callSuggestions() function in "Ajax.php".

Screens fixed:
- OpenSearch suggestions: /lib/exe/ajax.php?call=suggestions&q={query}
Fixed behavior without collator: ksort() with flags SORT_NATURAL and SORT_FLAG_CASE.
Fix of getAdditionalNamespacesFromResults() function in "Ui/Search.php".

Screens fixed:
- Search (namespace selector below search field)
Fix of functions _sort_filenames_without_collator() and compare().
Updated documentation.
Better naming of functions to help future maintenance.
- compare() renamed to intl_strcmp()
- sort_pagenames() renamed to intl_sort()
- sort_keys() renamed to intl_ksort()
- new function intl_asort()
- sort_filenames() renamed to intl_asortFN()
Updated documentation.
All sortable lists:
- tree list of namespaces and pages
- drop-down list with users and groups
- rule list by page/namespace (main sort)
- [enhancement] rule list by user/group (secondary sort)
[enhancement] group sorting in authplain
As the $data sort order has been fixed in search() (see "search.php"), utf8_encodeFN() must not be used in the comparison.
getPagelist(), getAttachments() and getBackLinks() provided sorted results, but listPages() did not.
[bug fix] Added missing 'doc' keys in lines 145 and 149.
_ft_pageLookup() provided sorted results, but _ft_pageSearch() did not.
Copy link
Collaborator

@splitbrain splitbrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good except for my one comment about wrapping it in a class. @moisesbr-dw can you change that? Or shall I do it?

inc/sort.php Outdated Show resolved Hide resolved
@splitbrain
Copy link
Collaborator

Can we change the tests in a way that they use a dataProvider or load different test data from files, so we can setup test in various languages?

@moisesbr-dw
Copy link
Contributor Author

This looks good except for my one comment about wrapping it in a class. @moisesbr-dw can you change that? Or shall I do it?

Sorry @splitbrain , I'll be very busy in the next days.
I'm glad you've found the changes worth merging!

@moisesbr-dw
Copy link
Contributor Author

Can we change the tests in a way that they use a dataProvider or load different test data from files, so we can setup test in various languages?

Only function test_intl_strcmp() uses a dataProvider.
This could certainly be extended to the other tests.
Unfortunately I won't be able to change the tests as soon as I would want.

The proper use of data providers now make it much easier to add
addtional languages to the test
Now the use of the intl extension can be turned off, allowing for easy
testing of the fallback. The test now inherits from the collator test so
we avoid too much duplicate code
Changes for the collator-based sort: sort functions as static methods; better testing
Many minor details and use of Sort::xyz() instead of intl_xyz() in files outside the "inc" folder.
@moisesbr-dw
Copy link
Contributor Author

@splitbrain All tests are passing. Just to be sure, I'll pass through all screens listed in the first message before considering the PR done.

Collations for Portuguese and Spanish; examples for Portuguese; better comments
Correction for German collation; examples for German and Spanish; much better comments
@moisesbr-dw
Copy link
Contributor Author

Now I consider the test classes fully functional, testing everything in the Esperanto, German, Portuguese and Spanish collators.
I've written extensive comments to guide anyone interested in testing a collator for another language.

I still have to verify the screens as I've mentioned before.
I hope I'll manage to do that on the weekend.

@splitbrain could you please check the German data in the tests (last commit)?

@lmachucab could you please do the same with the Spanish data?

@moisesbr-dw
Copy link
Contributor Author

Screens tested.
From my part, all set and done, @splitbrain !

@splitbrain splitbrain merged commit d267a3c into dokuwiki:master Aug 26, 2020
@splitbrain
Copy link
Collaborator

Merged. Excellent work here! Thanks a lot!

@Klap-in
Copy link
Collaborator

Klap-in commented Aug 26, 2020

Thanks for this work!
Could you update the function names in the start post please? Then I could use that for documentation updates at other places.

@moisesbr-dw
Copy link
Contributor Author

Thanks for this work!
Could you update the function names in the start post please? Then I could use that for documentation updates at other places.

Done!

@moisesbr-dw
Copy link
Contributor Author

Maybe a devel:sort page could be inserted somewhere in the Development Manual.

@moisesbr-dw
Copy link
Contributor Author

@Klap-in please take a look at the files attached in this comment, as they provide valuable information.

@lmachucab
Copy link

I'll try to get to this when I have some extra time. Haven't really touched anything DW-related in a few weeks.

1 similar comment
@lmachucab
Copy link

I'll try to get to this when I have some extra time. Haven't really touched anything DW-related in a few weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Search/Collation limitations when using non-ASCII scripts, or extended Latin
4 participants