LabelsUsingCapitals metric #34

Closed
clange opened this issue Jun 18, 2014 · 1 comment

Comments

@clange
Contributor

clange commented Jun 18, 2014

Implement a metric LabelsUsingCapitals that identifies triples whose property is from a pre-configured list of label properties (a subset of the annotation properties from #32), and whose object uses a bad style of capitalisation.

We consider the following widely used label properties:

  • http://www.w3.org/2004/02/skos/core#altLabel
  • http://www.w3.org/2004/02/skos/core#hiddenLabel
  • http://www.w3.org/2004/02/skos/core#prefLabel
  • http://www.w3.org/2000/01/rdf-schema#label

For now, this list of properties can be hard-coded (maybe somehow shared with #32); we might think about a more extensible implementation later.
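A minimal Python sketch of such a hard-coded list, for illustration only. It assumes predicates are compared as plain IRI strings. The names LABEL_PROPERTIES and is_label_property are hypothetical and are not taken from the project's code:

```python
# Hypothetical constant: the four label properties listed in the issue.
LABEL_PROPERTIES = frozenset({
    "http://www.w3.org/2004/02/skos/core#altLabel",
    "http://www.w3.org/2004/02/skos/core#hiddenLabel",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
    "http://www.w3.org/2000/01/rdf-schema#label",
})


def is_label_property(predicate_iri: str) -> bool:
    """Return True if the predicate IRI is one of the hard-coded label properties."""
    return predicate_iri in LABEL_PROPERTIES


print(is_label_property("http://www.w3.org/2000/01/rdf-schema#label"))    # True
print(is_label_property("http://www.w3.org/2000/01/rdf-schema#comment"))  # False
```

An immutable frozenset gives constant-time membership checks. The #32 metric could import the same constant, which is one simple way to share the list.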

For now we define "bad" capitalisation as "camel case", for which we should design a regular expression to match such strings. Consider, e.g., a label "InterestingThing": this is a suitable name for a class/resource, but the label should rather be "interesting thing" or "Interesting Thing".
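One possible regular expression for "camel case" is sketched below in Python. It assumes a bad label is a single token with no whitespace, made of two or more humps. Each hump is a letter followed by lowercase letters or digits, and every hump after the first starts with an uppercase letter. The pattern CAMEL_CASE and the helper is_camel_case are hypothetical:

```python
import re

# Hypothetical camel-case pattern (an assumption, not the project's regex):
# the first hump is any letter plus one or more lowercase letters/digits,
# and each further hump is an uppercase letter plus one or more lowercase
# letters/digits. No whitespace is allowed anywhere.
CAMEL_CASE = re.compile(r"[A-Za-z][a-z0-9]+(?:[A-Z][a-z0-9]+)+")


def is_camel_case(label: str) -> bool:
    """Return True if the entire label is a camel-case token."""
    return CAMEL_CASE.fullmatch(label) is not None


print(is_camel_case("InterestingThing"))   # True
print(is_camel_case("interestingThing"))   # True
print(is_camel_case("Interesting Thing"))  # False
print(is_camel_case("interesting thing"))  # False
print(is_camel_case("Interesting"))        # False (single hump)
print(is_camel_case("ABC"))                # False (acronym, no lowercase run)
```

This heuristic has known edge cases. A surname such as "McDonald" is flagged, while "URLParser" or "iPhone" are not, so the pattern would need tuning against real data.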

E.g. a triple like the following should be matched:

<http://...> <http://www.w3.org/2000/01/rdf-schema#label> "InterestingThing" .
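The example triple can be checked mechanically. The Python sketch below combines a deliberately simplified regular expression for single N-Triples lines with a hypothetical camel-case pattern. The line pattern is an assumption for illustration only; a real implementation would use a proper RDF parser such as Apache Jena or rdflib. All names are made up for this sketch:

```python
import re

# The four label properties listed in the issue.
LABEL_PROPERTIES = frozenset({
    "http://www.w3.org/2004/02/skos/core#altLabel",
    "http://www.w3.org/2004/02/skos/core#hiddenLabel",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
    "http://www.w3.org/2000/01/rdf-schema#label",
})

# Hypothetical camel-case heuristic (an assumption, not the project's regex).
CAMEL_CASE = re.compile(r"[A-Za-z][a-z0-9]+(?:[A-Z][a-z0-9]+)+")

# Deliberately simplified N-Triples line pattern, NOT the full grammar:
# IRI subject, IRI predicate, a literal without escapes or embedded quotes,
# an optional language tag or datatype IRI, then the final " .".
TRIPLE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+"([^"\\]*)"'
    r'(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?\s*\.\s*$'
)


def matches_metric(line: str) -> bool:
    """Return True if the line is a label triple whose literal is camel case."""
    m = TRIPLE.match(line)
    if m is None:
        return False  # blank nodes, escaped literals, etc. are out of scope here
    _subject, predicate, literal = m.groups()
    return predicate in LABEL_PROPERTIES and CAMEL_CASE.fullmatch(literal) is not None


example = '<http://...> <http://www.w3.org/2000/01/rdf-schema#label> "InterestingThing" .'
print(matches_metric(example))  # True
```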

The metric value is defined as the ratio of labels with "bad capitalisation" to all labels (i.e. all triples having such properties).
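The ratio above can be computed in a single pass over the data. The minimal Python sketch below assumes triples arrive as (subject, predicate, object) string tuples, with the object already reduced to its lexical form. The function name is hypothetical. Returning 0.0 when there are no label triples is also an assumption, since the issue does not define that case:

```python
import re
from typing import Iterable, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
LABEL_PROPERTIES = frozenset({SKOS + "altLabel", SKOS + "hiddenLabel", SKOS + "prefLabel", RDFS_LABEL})

# Hypothetical camel-case heuristic (an assumption, not the project's regex).
CAMEL_CASE = re.compile(r"[A-Za-z][a-z0-9]+(?:[A-Z][a-z0-9]+)+")


def labels_using_capitals(triples: Iterable[Tuple[str, str, str]]) -> float:
    """Ratio of camel-case labels to all label triples (0.0 if there are none)."""
    total = 0  # label triples seen: the denominator
    bad = 0    # label triples whose literal is camel case: the numerator
    for _subject, predicate, obj in triples:
        if predicate not in LABEL_PROPERTIES:
            continue  # non-label triples do not count at all
        total += 1
        if CAMEL_CASE.fullmatch(obj):
            bad += 1
    return bad / total if total else 0.0


example = [
    ("ex:a", RDFS_LABEL, "InterestingThing"),            # bad
    ("ex:b", RDFS_LABEL, "Interesting Thing"),           # fine
    ("ex:c", SKOS + "prefLabel", "interesting thing"),   # fine
    ("ex:d", SKOS + "altLabel", "anotherThing"),         # bad
    ("ex:e", "http://www.w3.org/2000/01/rdf-schema#comment", "SomeComment"),  # ignored
]
print(labels_using_capitals(example))  # 0.5
```

Two of the four label triples are camel case, giving 0.5. The rdfs:comment triple is ignored even though its literal is camel case, because only label properties enter the ratio.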

Note: in the cleaning UI, triples that match this metric should be reported as non-critical errors.
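The issue does not specify how the cleaning UI receives these results. As a hypothetical illustration only, matched triples could be wrapped in problem records that carry a non-critical severity. A UI could then show them as warnings rather than blocking errors. Severity, Problem and report_non_critical are invented names:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Tuple


class Severity(Enum):
    """Hypothetical severity levels for the cleaning UI."""
    CRITICAL = "critical"
    NON_CRITICAL = "non-critical"


@dataclass(frozen=True)
class Problem:
    """One flagged triple plus the metric that flagged it."""
    subject: str
    predicate: str
    obj: str
    metric: str
    severity: Severity


def report_non_critical(matched: Iterable[Tuple[str, str, str]],
                        metric: str = "LabelsUsingCapitals") -> List[Problem]:
    """Wrap every triple matched by the metric as a non-critical problem."""
    return [Problem(s, p, o, metric, Severity.NON_CRITICAL) for s, p, o in matched]


problems = report_non_critical(
    [("http://...", "http://www.w3.org/2000/01/rdf-schema#label", "InterestingThing")]
)
print(problems[0].severity.value)  # non-critical
```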

(Background: D3.1 Table 20 on page 91)

clange added this to the July Deliverable milestone Jun 18, 2014
@muhammadaliqasmi
Contributor

LabelsUsingCapitals identifies triples whose property is from a pre-configured list of label properties, and whose object uses a bad style of capitalization. The list of widely used label properties is stored in ..src/main/resources/LabelPropertiesList.txt.
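The issue does not describe the format of LabelPropertiesList.txt. The Python sketch below shows one way to read it under these assumptions: one property IRI per line, blank lines and lines starting with # ignored, and optional angle brackets stripped. The function names are hypothetical:

```python
from pathlib import Path
from typing import FrozenSet


def parse_label_properties(text: str) -> FrozenSet[str]:
    """Parse one property IRI per line; skip blanks and '#' comment lines."""
    iris = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or whole-line comment (IRIs contain '#' but never start with it)
        if line.startswith("<") and line.endswith(">"):
            line = line[1:-1]  # tolerate N-Triples-style <...> brackets
        iris.add(line)
    return frozenset(iris)


def load_label_properties(path: str) -> FrozenSet[str]:
    """Read the list file (assumed UTF-8) and parse it."""
    return parse_label_properties(Path(path).read_text(encoding="utf-8"))


sample = """# label properties
http://www.w3.org/2004/02/skos/core#prefLabel

<http://www.w3.org/2000/01/rdf-schema#label>
"""
print(sorted(parse_label_properties(sample)))
```

Keeping the list in a resource file means new label properties could be added without code changes, which goes some way towards the more extensible implementation mentioned in the issue.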

metric value = number of label literals with bad capitalization / total number of label literals

Metric value range = [0, 1]
Best case = 0
Worst case = 1

--implemented in issue#34 branch
--issue#34 branch merged with master branch

clange modified the milestones: D3.2, D5.2 Jun 23, 2014