Extract ontology terms referenced from PubMed abstracts as per the MEDLINE/PubMed Baseline Repository by using SciGraph against a set of ontologies.


Using OmniCorp requires the following open source tools:

  • Git
  • Maven
  • Scala and sbt
  • wget

On macOS, these can be installed using Homebrew by running the command: brew install git maven scala sbt wget.
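Before building, it can help to confirm that each of these tools is actually on the PATH. The following is a minimal Python sketch, not part of OmniCorp: `missing_tools` and `REQUIRED_TOOLS` are hypothetical names, and the executable names are assumptions (Maven is normally invoked as `mvn`).

```python
import shutil

# Executable names assumed for the tools listed above (Maven's CLI is `mvn`).
REQUIRED_TOOLS = ["git", "mvn", "scala", "sbt", "wget"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that `which` cannot find on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use the default `shutil.which` is enough.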

Setting up SciGraph

We need to use a specially modified version of SciGraph in order to carry out text annotations.

To install this version locally, run make SciGraph. This will download, compile and install the customized SciGraph we use.

You will then need to run make omnicorp-scigraph to generate the SciGraph instance for the ontologies specified in ontologies.ofn.
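The two make targets above have to run in that order, and the second is only useful if the first succeeded. A small Python sketch of that sequence follows, assuming it is run from the repository root; `run_make_targets` and `SCIGRAPH_TARGETS` are hypothetical names introduced for illustration, not part of OmniCorp.

```python
import subprocess

# Make targets named in the setup steps above, in the order they must run.
SCIGRAPH_TARGETS = ["SciGraph", "omnicorp-scigraph"]


def run_make_targets(targets=SCIGRAPH_TARGETS, runner=subprocess.run):
    """Run `make <target>` for each target, stopping at the first failure."""
    for target in targets:
        # check=True raises CalledProcessError if make exits non-zero,
        # so later targets are never attempted after a failed one.
        runner(["make", target], check=True)
```

Calling `run_make_targets()` is equivalent to running `make SciGraph` followed by `make omnicorp-scigraph` by hand.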


Extract ontology terms used in the COVID-19 Open Research Dataset (CORD) as tab-delimited files for further processing in COVID-KOP.

In order to generate OmniCORD output files, you should:

  1. Update the ROBOCORD_DATE variable in Makefile. You can look up the latest CORD-19 release date on their website.
  2. Download the CORD-19 dataset by running make robocord-download. This will automatically create a directory in the robocord-datas directory and download the CORD-19 dataset for $ROBOCORD_DATE into that directory.
  3. Uncompress the dataset by running make robocord-data.
  4. Test the extraction program by running make robocord-test. This will extract data from some articles in order to ensure that the program is working correctly. It will also create a directory in the robocord-outputs directory to store the results in. It's usually a good idea to clear the robocord-output directory after running the test and ensuring that the output files look correct.
  5. Use robocord.job to run all the jobs on a SLURM cluster. Any number of jobs can be specified, but values of around 4000 seem to work well. Example: sbatch --array=0-3999 robocord.job.
  6. Use RoboCORDManager to re-run any jobs that failed to complete. You can use the --dry-run option to see what jobs will be executed before they are run. Jobs are executed using the script, so modify that if necessary. Example: srun sbt "runMain org.renci.robocord.RoboCORDManager --job-size 20
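The job count in step 5 maps onto the SLURM array range with an off-by-one: array indices are inclusive and here start at 0, so 4000 jobs correspond to `--array=0-3999`. A minimal Python sketch of that mapping follows; `sbatch_array_command` is a hypothetical helper for illustration and is not part of OmniCorp.

```python
def sbatch_array_command(job_count, script="robocord.job"):
    """Build the sbatch command for `job_count` array tasks, indexed from 0."""
    if job_count < 1:
        raise ValueError("job_count must be at least 1")
    # SLURM array ranges are inclusive, so N tasks span indices 0..N-1.
    return ["sbatch", f"--array=0-{job_count - 1}", script]


print(" ".join(sbatch_array_command(4000)))  # sbatch --array=0-3999 robocord.job
```

The helper only builds the command; passing the returned list to `subprocess.run` on a machine with SLURM installed would submit it.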

Ontologies used

Currently, we look for terms from the following ontologies:

