Adds script to remove duplicates, refs 2883 #2932

Merged
merged 1 commit into from Jan 6, 2018

Conversation

mwjames
Contributor

@mwjames mwjames commented Jan 6, 2018

This PR is made in reference to: #2883

This PR addresses or contains:

  • Adds the removeDuplicateEntities.php maintenance script, which removes from the entity table all duplicate entities that have no residual references (meaning those that are not referenced in any other table)

This PR includes:

  • Tests (unit/integration)
  • CI build passed

Fixes #

@mwjames mwjames added the new feature A new, or altered behaviour of an existing functionality that fundamentally impacts behaviour label Jan 6, 2018
@mwjames mwjames added this to the SMW 3.0.0 milestone Jan 6, 2018
@mwjames
Contributor Author

mwjames commented Jan 6, 2018

The script outputs something similar to the example below. The disposed section lists the entities that have been removed, and the untouched section lists those that persist because they are the ones with an active reference. If both entities carry active references (which should not happen, but can), then both remain and the user has to resolve the issue manually by inspecting the listed entity and purging (or saving) the page [0].

Found: 1 duplicates

.

Log

{
    "disposed": {
        "311800": {
            "smw_title": "_wpg",
            "smw_namespace": "102",
            "smw_iw": "",
            "smw_subobject": ""
        }
    },
    "untouched": {
        "189135": {
            "smw_title": "_wpg",
            "smw_namespace": "102",
            "smw_iw": "",
            "smw_subobject": ""
        }
    }
}

[0] https://www.semantic-mediawiki.org/wiki/Help:Duplicate_entities
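The disposed/untouched split described above can be illustrated with a minimal Python sketch. This is not the code of removeDuplicateEntities.php: the function name `remove_duplicates`, the in-memory `entities` dict, the `referenced_ids` set, and the extra "Unique" row are assumptions made for illustration. It also assumes that duplicates are entities sharing the four fields shown in the log, and it follows the literal PR description by disposing of every unreferenced member of a duplicate group.

```python
import json
from collections import defaultdict

# Fields assumed to identify an entity; taken from the keys shown in the log.
KEY_FIELDS = ("smw_title", "smw_namespace", "smw_iw", "smw_subobject")


def remove_duplicates(entities, referenced_ids):
    """Sort duplicate entities into 'disposed' and 'untouched'.

    entities:       dict mapping an entity ID to its key fields (hypothetical input)
    referenced_ids: set of IDs still referenced by some other table (hypothetical input)
    """
    # Group IDs that share the same identifying fields.
    groups = defaultdict(list)
    for entity_id, row in entities.items():
        groups[tuple(row[field] for field in KEY_FIELDS)].append(entity_id)

    log = {"disposed": {}, "untouched": {}}
    for ids in groups.values():
        if len(ids) < 2:
            continue  # only one entity with these fields, so not a duplicate
        for entity_id in ids:
            # Referenced duplicates stay; unreferenced ones are disposed of.
            # If every member is referenced, all of them remain untouched.
            section = "untouched" if entity_id in referenced_ids else "disposed"
            log[section][entity_id] = entities[entity_id]
    return log


wpg = {"smw_title": "_wpg", "smw_namespace": "102", "smw_iw": "", "smw_subobject": ""}
entities = {
    "189135": dict(wpg),
    "311800": dict(wpg),
    # Hypothetical non-duplicate row, added only to show that it is ignored.
    "400000": {"smw_title": "Unique", "smw_namespace": "0", "smw_iw": "", "smw_subobject": ""},
}
referenced_ids = {"189135"}

log = remove_duplicates(entities, referenced_ids)
print(json.dumps(log, indent=4))
```

With the two `_wpg` rows from the log above and only 189135 marked as referenced, the sketch reports 311800 as disposed and 189135 as untouched.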

@mwjames
Contributor Author

mwjames commented Jan 6, 2018

@kghbln Happy testing. (PS: Remember the [0] list is cached!)

[0] https://sandbox.semantic-mediawiki.org/w/index.php?title=Sp%C3%A9cial:SemanticMediaWiki&action=duplookup

@mwjames mwjames merged commit fcc1755 into master Jan 6, 2018
@mwjames mwjames deleted the rem-dups branch January 6, 2018 03:10
@kghbln kghbln added the wikidocu missing Code changes (mostly features) what have not yet been documented label Jan 6, 2018
@kghbln
Member

kghbln commented Jan 6, 2018

@mwjames I just ran the script (log). Interestingly, this log seems to provide some hints about the whereabouts of #2001 ("This_page_supports_semantic_in-text_annotations ..."). Since the log is rather lengthy, it might be an idea to allow logging to a file, similar to what rebuildData.php does.

@mwjames
Contributor Author

mwjames commented Jan 7, 2018

Since the log is rather lengthy, it might be an idea to allow logging to a file, similar to what rebuildData.php does.

Can you create a follow-up task on this?

@kghbln
Member

kghbln commented Jan 8, 2018

Can you create a follow-up task on this?

Done with #2946

@mwjames
Contributor Author

mwjames commented Jan 14, 2018

@kghbln FYI

whereabouts of #2001 ("This_page_supports_semantic_in-text_annotations ...").

Here is the analysis:

First, check:

image

Second, find the ID (use the full name as input; it will find the IDs) and the references where the ID is used.

image

With the result of:

{
    "21692": [],
    "21843": [],
    "21881": {
        "smw_di_wikipage": {
            "s_id": "3378"
        }
    }
}
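A lookup result of this shape can be read programmatically. The Python sketch below is only an illustration: the helper name `split_by_references` is hypothetical, and it assumes (as the result above suggests) that an empty list means the ID is not referenced anywhere, while a non-empty object maps a table name to the column and value that reference the ID.

```python
import json

# The lookup result quoted above, copied verbatim.
lookup_result = json.loads("""
{
    "21692": [],
    "21843": [],
    "21881": {
        "smw_di_wikipage": {
            "s_id": "3378"
        }
    }
}
""")


def split_by_references(result):
    """Separate IDs that are still referenced from IDs without any reference."""
    referenced = {entity_id: refs for entity_id, refs in result.items() if refs}
    unreferenced = [entity_id for entity_id, refs in result.items() if not refs]
    return referenced, unreferenced


referenced, unreferenced = split_by_references(lookup_result)

print("Unreferenced IDs:", unreferenced)
for entity_id, refs in referenced.items():
    for table, columns in refs.items():
        # Each entry names the table and the column/value pair pointing at the ID.
        print(f"ID {entity_id} is referenced in {table}: {columns}")
```

Here only 21881 is still referenced (from smw_di_wikipage, subject ID 3378), which is why 3378 is the next thing to inspect.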

Looking at 3378 will reveal:

image

After looking at Utilisateur:Lalquier, you will find:

{{#set:Description=This page supports semantic in-text annotations (e.g. "Is specified asWorld Heritage Site") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help page.This page supports semantic in-text annotations

I'd say, kick the text from the user page!

@kghbln
Member

kghbln commented Aug 7, 2018

I'd say, kick the text from the user page!

Yeah, done! This was driving me nuts.

@kghbln
Member

kghbln commented Jan 20, 2019

Documented

@kghbln kghbln removed the wikidocu missing Code changes (mostly features) what have not yet been documented label Jan 20, 2019