Report similarity on property labels #2244

Merged
merged 1 commit into from Feb 11, 2017

@mwjames mwjames commented Feb 11, 2017

This PR is made in reference to: #

This PR addresses or contains:

  • Adds Special:PropertyLabelSimilarity to list properties together with their calculated syntactic similarity

This PR includes:

  • Tests (unit/integration)
  • CI build passed

@mwjames mwjames added the new feature A new, or altered behaviour of an existing functionality that fundamentally impacts behaviour label Feb 11, 2017
@mwjames mwjames added this to the SMW 2.5.0 milestone Feb 11, 2017

mwjames commented Feb 11, 2017

It can be difficult to work out by hand which properties are most likely to cover the same concept (judging by the property label) [0]. Special:PropertyLabelSimilarity helps with that effort by calculating the syntactic similarity of labels and providing a list such as:

[screenshot of the Special:PropertyLabelSimilarity list]

The page only reports those similarities and does not provide a function for merging or deleting the reported properties. Instead, the user should use the information to create redirects [1] or delete duplicate properties.

Of course, two syntactically similar properties may turn out to be a false positive, since a similar label does not guarantee that both represent the same concept. To eliminate those cases, the smwgSimilarityLookupExemptionProperty setting contains the name of a property (owl:differentFrom by default) that is used to exempt properties from the report by declaring them to be different from each other.

Furthermore, to give the report a bit more flexibility, the threshold of the similarity distance can be freely adjusted.
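
To illustrate how threshold and exemption interact, here is a minimal Python sketch. It is not the code from this PR: the function names, the sample labels, and the use of difflib's ratio as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0-100 syntactic similarity score for two labels.

    Assumption: a ratio-based measure; the PR does not name its measure.
    """
    return round(SequenceMatcher(None, a, b).ratio() * 100, 2)


def report(labels, threshold, exempt=frozenset()):
    """List label pairs scoring at or above threshold, skipping exempt pairs."""
    rows = []
    for a, b in combinations(sorted(labels), 2):
        if frozenset((a, b)) in exempt:
            continue  # pair was declared different, so it is not reported
        score = similarity(a, b)
        if score >= threshold:
            rows.append((a, b, score))
    # Highest similarity first
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical property labels
labels = ["Has date", "Has dates", "Population", "Has data"]

# Hypothetical exemption, standing in for an owl:differentFrom declaration
exempt = {frozenset(("Has data", "Has date"))}

rows = report(labels, threshold=85, exempt=exempt)
rows_without_exemption = report(labels, threshold=85)
```

With the exemption in place, the "Has data" / "Has date" pair drops out of the report even though it scores above the threshold. Raising the threshold shortens the list further.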

@kghbln FYI

[0] https://www.semantic-mediawiki.org/wiki/Property_similarity
[1] https://www.semantic-mediawiki.org/wiki/Redirects

@mwjames mwjames merged commit cb3af89 into master Feb 11, 2017
@mwjames mwjames deleted the simi branch February 11, 2017 18:48

mwjames commented Feb 11, 2017

The Type ID checkbox helps with making a quick assumption about the nature of a property, because results sometimes reveal similar names but different types, as can be seen on [0]:

    {
        "property": [
            {
                "label": "Defaultvalue1",
                "type": "_txt"
            },
            {
                "label": "Defaultvalue4",
                "type": "_dat"
            }
        ],
        "similarity": 92.31
    },
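
As a rough cross-check of the number above, a ratio of the form 2 × matched characters / total characters gives the same value for these two labels. This is only a sketch in Python using difflib. Which measure the special page actually uses is not stated here, so the agreement should be read as consistent with such a ratio and nothing more.

```python
from difflib import SequenceMatcher

a, b = "Defaultvalue1", "Defaultvalue4"

# Count the characters that fall inside matching blocks ("Defaultvalue")
matcher = SequenceMatcher(None, a, b)
matched = sum(block.size for block in matcher.get_matching_blocks())

# 2 * matched / (len(a) + len(b)), expressed as a percentage
score = round(2 * matched / (len(a) + len(b)) * 100, 2)
print(matched, score)  # 12 92.31
```

Only the trailing digit differs, so the labels score high even though one property is of type _txt and the other of type _dat.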

What's more, the user has to select a limit to keep the computational effort down (the number of comparisons grows rapidly with the number of list members).
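
To get a feeling for why the limit matters, the sketch below counts the comparisons under the assumption that every label is compared with every other label exactly once. That assumption is mine and not taken from the implementation; the helper name is hypothetical.

```python
from math import comb


def pair_count(limit: int) -> int:
    """Number of unordered label pairs among `limit` properties."""
    return comb(limit, 2)


for limit in (50, 500, 5000):
    print(limit, pair_count(limit))
```

Under that assumption, limit=500 (as used in the sandbox link) already means 124750 comparisons, and a tenfold larger limit means roughly a hundredfold more.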

[0] https://sandbox.semantic-mediawiki.org/w/index.php?title=Sp%C3%A9cial%3APropertyLabelSimilarity&limit=500&offset=0&threshold=90&type=yes


kghbln commented Mar 26, 2017
