Consistent metric names #51

Open
nfriesen opened this issue Jul 24, 2014 · 2 comments

@nfriesen
Contributor

Looking over the metric implementations for cleanup purposes, I noticed some inconsistencies in metric names:

  1. In most cases the metric name reflects the corresponding quality problem, e.g. 'UndefinedClasses', 'DuplicateInstances' or 'ObsoleteConceptInOtology'. However, some metrics are named differently, e.g. LowUsageOfBlankNodes. Even if it is not possible for all metrics, in some cases it makes more sense for the name to reflect the quality problem, e.g. BlankNodeUsage.
  2. Some metric names are confusing because they don't reflect the metric definition. It would be helpful to either adapt their implementation (if exactly this metric is required for a use case) or rename them. This is the list of such metrics:
  • The ShortURIs metric actually computes the average URI length, so maybe AverageURILength?
  • The LowBlankNodeUsage metric actually computes the ratio of 'good' entities: the current implementation computes NoBlankNodesRatio, but it would make more sense to define BlankNodesRatio.
  • The UnstructuredData metric should probably be split into two metrics: UnstructuredData and DeadURIs.
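
To make the naming point concrete, here is a minimal Python sketch of what the proposed names would compute. This is not the project's implementation: the function names, the string-based triple representation, and the choice to count only distinct subjects and non-literal objects as entities are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Assumed toy representation: a triple is three strings. Blank nodes use
# the "_:" prefix, literals start with a double quote, anything else is a URI.
Triple = Tuple[str, str, str]


def is_blank_node(term: str) -> bool:
    return term.startswith("_:")


def unique_entities(triples: Iterable[Triple]) -> Set[str]:
    """Distinct subjects and non-literal objects (assumed entity definition)."""
    entities: Set[str] = set()
    for subject, _predicate, obj in triples:
        entities.add(subject)
        if not obj.startswith('"'):
            entities.add(obj)
    return entities


def blank_nodes_ratio(triples: Iterable[Triple]) -> float:
    """Share of entities that ARE blank nodes: the name states the problem."""
    entities = unique_entities(triples)
    if not entities:
        return 0.0
    blank = sum(1 for entity in entities if is_blank_node(entity))
    return blank / len(entities)


def no_blank_nodes_ratio(triples: Iterable[Triple]) -> float:
    """Share of 'good' entities, i.e. the complement of blank_nodes_ratio."""
    return 1.0 - blank_nodes_ratio(triples)


def average_uri_length(triples: Iterable[Triple]) -> float:
    """Mean character length of the distinct non-blank entities (URIs)."""
    uris = [e for e in unique_entities(triples) if not is_blank_node(e)]
    if not uris:
        return 0.0
    return sum(len(uri) for uri in uris) / len(uris)


# Hypothetical sample data, not taken from any real dataset.
SAMPLE = [
    ("http://ex.org/a", "http://ex.org/p", "_:b0"),
    ("_:b0", "http://ex.org/q", '"label"'),
    ("http://ex.org/a", "http://ex.org/p", "http://ex.org/bcd"),
]

if __name__ == "__main__":
    print(blank_nodes_ratio(SAMPLE))
    print(no_blank_nodes_ratio(SAMPLE))
    print(average_uri_length(SAMPLE))
```

Under these assumptions the two ratios always sum to 1, so the rename only changes which side of the problem the name describes, while average_uri_length returns a length rather than a "short or not" judgement.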

@jerdeb BTW, the test for the Dereferencability metric fails.

@jerdeb
Contributor

jerdeb commented Jul 24, 2014

Most of those are specific to EBI use cases and are therefore not generic.
They are marked as such. Also, a number of metrics need to be
reviewed after the deliverable. Keep this ticket open.


@nfriesen
Contributor Author

@jerdeb Please check the test for the Dereferencability metric; it fails.
