DO.utils

DO.utils is an R package primarily designed to support the operations of the Human Disease Ontology (DO; disease-ontology.org) but with a number of capabilities that will be useful to the broader scientific community for:

  1. Assessing Resource Use & Impact (Bibliometrics/Scientometrics)

    • A brief summary can be found in the Assessing Resource Use & Impact (Bibliometrics/Scientometrics) section.
    • For a detailed description refer to the "Assessing Resource Use: Obtaining Use Records" tutorial included with this package (vignette("obtain_use_records", package = "DO.utils")) or the peer-reviewed article:

    J. Allen Baron, Lynn M Schriml, Assessing resource use: a case study with the Human Disease Ontology, Database, Volume 2023, 2023, baad007. PMID:36856688, https://doi.org/10.1093/database/baad007.

  2. Simplifying common R tasks (see General Utilities section).

Operations specific to the use, analysis, maintenance, and improvement of the ontology itself are described briefly in the DO Improvement & Analysis section.

DO.utils is a work in progress. If you are interested in contributing, please reach out. Our goal is to work collaboratively to make functions as broadly useful as possible.

Installation

Installing Prerequisites

To use DO.utils, you must first install R from CRAN. Installing RStudio can also be useful but is not required. The devtools package is also required and can be obtained by executing install.packages("devtools") within R.

Installing DO.utils

DO.utils can be installed from GitHub or from a persistent, open-access repository hosted by Zenodo.

To install from GitHub, run devtools::install_github("DiseaseOntology/DO.utils") within R.

To install from Zenodo, first download DO.utils (DOI: 10.5281/zenodo.7467668) to your local machine. Then, within R run devtools::install_git(<local_path_to_DO.utils>), replacing <local_path_to_DO.utils> with the local path to DO.utils.
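
The installation steps above can be collected into a short script. This is a minimal sketch, not part of the package: the helper name install_do_utils() and its local_path argument are hypothetical, and only the install.packages(), devtools::install_github(), and devtools::install_git() calls come from the instructions above.

```r
# Hypothetical helper (not part of DO.utils) wrapping the documented steps.
install_do_utils <- function(local_path = NULL) {
  # devtools is a prerequisite; install it only if it is missing
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  if (is.null(local_path)) {
    # Default: install from GitHub
    devtools::install_github("DiseaseOntology/DO.utils")
  } else {
    # Otherwise: install from a copy downloaded from Zenodo
    devtools::install_git(local_path)
  }
}
```

Calling install_do_utils() with no arguments would install from GitHub; passing the path of a downloaded copy would install from that copy instead.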

Assessing Resource Use & Impact (Bibliometrics/Scientometrics)

DO.utils includes functions to assist in both assessing how a resource is used and in measuring the impact of that use. Most of these functions may be broadly useful to anyone trying to accomplish these tasks, while a much smaller number are specific to measuring the DO's impact.

Components that will be broadly useful to any resource can:

  1. Identify scientific publications that use a resource from:
    1. Citations of one or more article(s) published by the resource ("cited by"; citedby_pubmed() and citedby_scopus()).
    2. PubMed or PubMed Central (PMC) search results (search_pubmed() and search_pmc()).
    3. A MyNCBI collection (read_pubmed_txt()).
  2. Identify matching publication records in different record sets (must be formatted data.frames; see match_citations()).
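
Matching records across sets generally comes down to comparing shared publication identifiers. The sketch below illustrates that idea in base R only. It is not the implementation of match_citations(); the function name match_by_ids(), the column names (pmid, doi), and the example records are all made up for illustration.

```r
# Hypothetical illustration (not DO.utils code): match records in `x` to
# records in `ref` by trying each identifier column in priority order.
match_by_ids <- function(x, ref, id_cols = c("pmid", "doi")) {
  out <- rep(NA_integer_, nrow(x))
  for (col in id_cols) {
    if (!col %in% names(x) || !col %in% names(ref)) next
    # Skip rows already matched and rows missing this identifier
    # (otherwise an NA in `x` would match an NA in `ref`)
    todo <- is.na(out) & !is.na(x[[col]])
    out[todo] <- match(x[[col]][todo], ref[[col]])
  }
  out
}

# Made-up records: two sets that share some publications
cited_by <- data.frame(
  pmid = c("111", NA, "333"),
  doi  = c("10.1/a", "10.1/b", NA),
  stringsAsFactors = FALSE
)
search_res <- data.frame(
  pmid = c("333", "999", NA),
  doi  = c(NA, "10.1/z", "10.1/b"),
  stringsAsFactors = FALSE
)

# Position in `search_res` of each `cited_by` record (NA = no match)
matched <- match_by_ids(cited_by, search_res)
```

With these made-up records, the first record has no match, the second matches row 3 by DOI, and the third matches row 1 by PMID. Trying identifiers in a fixed priority order lets records that lack one identifier still be matched by another.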

For those interested in Bioconductor package download statistics, get_bioc_pkg_stats() may be useful, while other measures of impact are designed specifically with the DO in mind (e.g. count_alliance_records()).

DO Improvement & Analysis

DO.utils provides the following capabilities used for improvement and analysis:

  1. Git repository management, iterative execution across git repository tags, and SPARQL queries, implemented with wrappers (DOrepo(), owl_xml()) around the related pyDOID Python package.
  2. Automation of disease-ontology.org updates, including:
  3. Definition source URL validation.
  4. Prediction of mappings/cross-references between other resources & DO, via PyOBO/GILDA or approximate string matching.
  5. Simplified system installation of the OBO tool ROBOT.

General Utilities

DO.utils provides general utilities that make programming in R easier, for example those that assist with:

  • Type/content testing -- is_blank(), is_positive(), is_vctr_or_df(), all_duplicated()
  • Vector-to-scalar conversion -- collapse_to_string(), unique_if_invariant()
  • Data reduction -- collapse_col(), drop_blank()
  • Value replacement -- replace_null(), replace_blank()
  • Sorting (by a specified priority)
  • Dates -- cur_yr(), today_datestamp()
  • Temporary bug workarounds -- restore_names()
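
To give a sense of the kind of task these helpers address, the sketch below shows base R analogues for two of the categories above (value replacement and vector-to-scalar conversion). The functions sketch_blank_to_na() and sketch_collapse() are hypothetical stand-ins written for this example. They are not the DO.utils functions and may differ from them in arguments and behavior.

```r
# Hypothetical stand-ins (not DO.utils functions), base R only.

# Value replacement: turn empty or whitespace-only strings into NA
sketch_blank_to_na <- function(x) {
  x[!is.na(x) & grepl("^\\s*$", x)] <- NA_character_
  x
}

# Vector-to-scalar conversion: join the unique non-missing values
# into a single string
sketch_collapse <- function(x, delim = "|") {
  paste(unique(x[!is.na(x)]), collapse = delim)
}

# Made-up input with duplicates and blank entries
synonyms <- c("flu", "", "influenza", "flu", "  ")
cleaned <- sketch_blank_to_na(synonyms)
collapsed <- sketch_collapse(cleaned)
```

Here cleaned holds NA in place of the two blank entries and collapsed is the single string "flu|influenza".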

