Add entity specific collation support (smw_sort), refs 2065 #2429

Merged (1 commit) on May 7, 2017


mwjames (Contributor) commented Apr 22, 2017

This PR is made in reference to: #2065, #1386

This PR addresses or contains:

  • Prototype showing how to extend SMW to support a flexible collation mapping for entities, controlled by $smwgEntityCollation.
  • $smwgEntityCollation should correspond to the $wgCategoryCollation setting (including the selected argument values), yet it is kept separate to allow better control over changes to the collation, sorting, and display of values.
  • As mentioned in Local-specific (ICU) sorting and collation #2065, a new field smw_sort is required that holds a collated sort value and no longer conflicts with the smw_sortkey field used for search and match.
  • smw_sortkey becomes a field that stores a literal value for searching, hence it will be renamed in a follow-up to reflect that intention.
  • This PR alters the DB schema and can therefore not be back-ported.
  • When changing the $smwgEntityCollation option, users are expected to apply the same caution as for the standard MediaWiki setting and to run either rebuildData.php or updateEntityCollation.php immediately after the change, so that the stored data are updated and correspond to the setting.
  • $smwgEntityCollation uses identity as its default setting and will produce the same sorting results as before this change.
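
As a minimal sketch of the settings described above, a LocalSettings.php fragment that keeps both collations aligned might look like the following. The value `numeric` is only an illustration (other values accepted by `$wgCategoryCollation` should be usable in the same way), and placing these lines after the point where SMW is enabled is an assumption:

```php
// LocalSettings.php (illustrative fragment, not part of this PR)

// MediaWiki's category collation; 'numeric' is chosen here only as an example.
$wgCategoryCollation = 'numeric';

// Keep SMW's entity collation in line with it. The default is 'identity',
// which reproduces the sorting behaviour from before this change.
$smwgEntityCollation = 'numeric';
```

After changing either value, rebuildData.php or updateEntityCollation.php needs to be run so that the stored smw_sort values match the new setting.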

Example

Output example of the ID_TABLE with a $wgCategoryCollation = 'uppercase'; setting.

image

Update

In case the table is updated and data exists but the smw_sort field is not yet present, update.php or setupStore.php will initiate a post-processing step in which the content of smw_sortkey is copied once to the smw_sort field during the setup. The message will be similar to:

Checking post creation activities ...
   Table smw_object_ids ...
   ... copying smw_sortkey to smw_sort ... done.
   ... done.

Depending on the size of the table, this process may take a moment. Any later update of the collation/sort field has to be done by running the updateEntityCollation.php script.

The updateEntityCollation.php maintenance script will output something like:

$ php maintenance/updateEntityCollation.php

[ Notes ]

- $smwgEntityCollation              uca-default-u-kn
- $wgCategoryCollation              numeric

The setting of $smwgEntityCollation and $wgCategoryCollation are different
and may result in an inconsitent sorting display for entities.

[ Update ]

- Selecting rows ...                1 to 100659
- Updating the smw_sort field ...     0% (313/100659)

Technical notes

  • SMW_SPARQL_QF_COLLATION (see DefaultSettings.php) allows replicating collation-specific information using swivt:sort
  • Collator is the interface to MediaWiki's Collation class
  • TableFieldUpdater is the only class responsible for updating the smw_sortkey and smw_sort fields
  • TableIntegrityExaminer will copy the content of smw_sortkey once, at the time smw_sort is added to the schema
  • TableBuilder::getProcessLog returns a list of activities that occurred during the table update and is used in TableIntegrityExaminer
  • SortPropertyValueResourceBuilder now handles the export of _SKEY resources
  • The updateEntityCollation.php script provides a tool for mass updates when the $smwgEntityCollation setting is changed

This PR includes:

  • Tests (unit/integration)
  • CI build passed

@mwjames mwjames added the enhancement Alters an existing functionality or behaviour label Apr 22, 2017
@mwjames mwjames added this to the SMW 3.0.0 milestone Apr 22, 2017
@mwjames mwjames force-pushed the smw-sort-field branch 4 times, most recently from c77a3f2 to 6a7e1c4 Compare May 7, 2017 05:18
@mwjames mwjames changed the title [WIP] Add smw_sort field, refs 2065 Add smw_sort field, refs 2065 May 7, 2017
@mwjames mwjames force-pushed the smw-sort-field branch 2 times, most recently from 0912926 to 8361482 Compare May 7, 2017 15:08

mwjames (Contributor Author) commented May 7, 2017

@kghbln Some testing guidelines:

  • Run update.php or setupStore.php and expect to see ... copying smw_sortkey to smw_sort ... done. only the first time the script is run
  • The smw_sort field should be visible in the smw_object_ids table with the same content as smw_sortkey
  • Change $smwgEntityCollation to something like numeric or uca-default-u-kn [0]
  • After the setting has changed, you are required to run updateEntityCollation.php, after which the content of smw_sort should have changed accordingly
  • Sorting in [1] should be different from [2] (using the default identity setting)

Other specific testing scenarios are available in:

  • f-0304.json for identity collation
  • f-0305.json for uppercase collation
  • f-0306.json for numeric collation
  • uca-* isn't explicitly tested because results depend on the installed ICU version and that varies with the available PHP version
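
To give a rough feeling for why the three scenarios above produce different orderings, the following plain-PHP sketch sorts a handful of made-up titles in three ways. It does not use MediaWiki's Collation classes or SMW's Collator; the comparisons (`sort` with `SORT_STRING`, `strtoupper` plus `strcmp`, and `strnatcasecmp`) are stand-ins that are only assumed to approximate the identity, uppercase, and numeric collations, and the titles are hypothetical:

```php
<?php
// Hypothetical page titles; not taken from the test fixtures.
$titles = [ 'item 10', 'Item 2', 'apple', 'Banana' ];

// "identity"-like: plain byte-wise comparison, so upper case sorts before lower case.
$identity = $titles;
sort( $identity, SORT_STRING );

// "uppercase"-like: compare the upper-cased strings, so case no longer matters.
$uppercase = $titles;
usort( $uppercase, function ( $a, $b ) {
	return strcmp( strtoupper( $a ), strtoupper( $b ) );
} );

// "numeric"-like: case-insensitive natural order, so 2 sorts before 10.
$numeric = $titles;
usort( $numeric, 'strnatcasecmp' );

echo implode( ', ', $identity ), "\n";  // Banana, Item 2, apple, item 10
echo implode( ', ', $uppercase ), "\n"; // apple, Banana, item 10, Item 2
echo implode( ', ', $numeric ), "\n";   // apple, Banana, Item 2, item 10
```

The real collations may differ in detail (especially the uca-* ones, which depend on the installed ICU version), so this is only meant to show the kind of difference to look for when comparing the sorted output.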

[0] https://www.mediawiki.org/wiki/Manual:$wgCategoryCollation
[1] https://sandbox.semantic-mediawiki.org/wiki/Issue/2429/NumericSorting
[2] https://cloud.githubusercontent.com/assets/1245473/25782835/abf402c8-338c-11e7-9e0b-940bae59503e.png

@mwjames mwjames changed the title Add smw_sort field, refs 2065 Add entity specific collation support (smw_sort), refs 2065 May 7, 2017
@mwjames mwjames merged commit 13c0689 into master May 7, 2017
@mwjames mwjames deleted the smw-sort-field branch May 7, 2017 16:35

mwjames (Contributor Author) commented May 7, 2017

Run update.php or setupStore.php and

I had to run Special:SemanticMediaWiki updatetables (as a replacement for update.php or setupStore.php); otherwise the sandbox would have been blocked on:

Original exception: [7e298e0712bc0dbc0186f5b2] /wiki/Issue/2429/NumericSorting DBQueryError from line 1054 of /var/www/htdocs/mw/02100/w/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading?
Query: SELECT smw_id,smw_sortkey,smw_sort FROM `02100_smw_object_ids` WHERE smw_title = 'Issue/2429/NumericSorting' AND smw_namespace = '0' AND smw_iw = '' AND smw_subobject = '' LIMIT 1
Function: SMWSql3SmwIds::getDatabaseIdAndSort
Error: 1054 Unknown column 'smw_sort' in 'field list' (localhost)

expect to see ... copying smw_sortkey to smw_sort ... done. only the first time the script is run

The output contained:

Checking post creation activities ...
   Table smw_object_ids ...
   ... copying smw_sortkey to smw_sort ... done.
   ... done.

@kghbln kghbln added the wikidocu missing Code changes (mostly features) what have not yet been documented label May 13, 2017

kghbln (Member) commented Nov 23, 2017

Documented:

kghbln (Member) commented Nov 24, 2017

@mwjames Code docu does not seem to have notes about the maintenance script

@kghbln kghbln removed the wikidocu missing Code changes (mostly features) what have not yet been documented label Nov 24, 2017

mwjames (Contributor Author) commented Nov 25, 2017

Code docu does not seem to have notes about the maintenance script

Yes, because updateEntityCollation.php has no class in SMW\Maintenance as it uses TableFieldUpdater directly to push related updates when running the script.

kghbln (Member) commented Nov 25, 2017

Thanks for the information. Just wanted to make sure! :)

kghbln (Member) commented Nov 25, 2017

@mwjames Another question: The default for $smwgEntityCollation is "identity", which differs from MW's $wgCategoryCollation default of "uppercase". So running "updateEntityCollation.php" should throw a note about this incongruent setting even though the sysadmin did not change anything in the configuration. My theory is that SMW was always different in this respect; however, if one chooses to make a change here, the sysadmin should probably use congruent settings for both configuration parameters?

mwjames (Contributor Author) commented Nov 25, 2017

The default for $smwgEntityCollation is "identity", which differs from MW's $wgCategoryCollation default of "uppercase".

Well, the default SMW sort mode has always been equivalent to "identity", therefore it should remain so, since we don't want to break existing queries that would otherwise return a different sort order. If users want to change the setting, then of course it is suggested to match both.

kghbln (Member) commented Nov 25, 2017

Well, the default SMW sort mode has always been equivalent to "identity" ...

That's what I have guessed. Thanks for confirming!

mwjames (Contributor Author) commented Jan 12, 2019

@kghbln @krabina

refs https://sourceforge.net/p/semediawiki/mailman/message/36448466/

Is that correct? This is what I am expecting when I use a German collation instead of the standard one.
However, it does not work in my wiki, as you can see here [4].
[1] https://www.mediawiki.org/wiki/Manual:$wgCategoryCollation
[2] https://www.semantic-mediawiki.org/wiki/Help:$smwgEntityCollation
[3] #2429
[4] https://www.verwaltungspreis.gv.at/index.php?title=Kategorie:Verwaltungsprojekte

Looking at [0], it shows ICU 4.2.1, which is a very old ICU version. It is necessary to use a more recent one to make sure you get something like the following (which uses ICU 57.1):

$wgCategoryCollation = 'uca-de-u-kn';
$smwgEntityCollation = 'uca-de-u-kn';

image

image

[0] https://www.verwaltungspreis.gv.at/Spezial:Version

mwjames (Contributor Author) commented Jan 12, 2019

It goes without saying that once you change one of the variables (such as the ICU version or the type of collation), you have to rerun the listed scripts.

krabina (Contributor) commented Jan 14, 2019

Thank you, I wasn't even aware of the outdated version. I will talk to my hosting provider about upgrading it and test it again with a more recent version.
