This repository has been archived by the owner on Sep 24, 2019. It is now read-only.

Version 2 Plans

William Hayes edited this page Oct 22, 2015 · 2 revisions

Resource Generator V2

Goals:

  • RDF triplestore-based

  • Easy to customize

  • Optionally create BELFramework 3.0 Namespace, Annotation and Equivalence files

  • Provide a changelog

  • Allow use of close matches for equivalencing using synonyms (matching optionally restricted to domain/datasets)

  • Tests, specifically data access, format, and parsing tests, run with a test framework

  • Monitoring of generated datasets via statistic comparisons to detect pipeline processing issues
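The monitoring goal above could be approached roughly as follows. This is a minimal sketch, not the project's implementation: the function name `compare_stats`, the 10% threshold, and the dataset names and counts are all hypothetical and only illustrate comparing per-dataset term counts between two runs.

```python
def compare_stats(previous, current, max_drop=0.10):
    """Return warnings for counts that shrank by more than max_drop.

    previous and current map a statistic name to a count; max_drop is a
    fraction (0.10 means a drop of more than 10% triggers a warning).
    All names and thresholds here are illustrative assumptions.
    """
    warnings = []
    for name, old in previous.items():
        # A statistic missing from the current run is treated as zero.
        new = current.get(name, 0)
        if old > 0 and (old - new) / old > max_drop:
            warnings.append(f"{name}: {old} -> {new}")
    return warnings


# Hypothetical counts from the previous and the current pipeline run.
previous = {"hgnc_terms": 40000, "chebi_terms": 50000}
current = {"hgnc_terms": 39800, "chebi_terms": 30000}

for warning in compare_stats(previous, current):
    print("possible pipeline issue:", warning)
```

A small drop (here 0.5% for `hgnc_terms`) passes silently, while the 40% drop in `chebi_terms` is reported, which is the kind of signal that would suggest a parsing or download problem.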

Plans

  • Data Parsers will be distributable components that can be added/removed/enabled/disabled

  • Data Parsers will incorporate the following components:

    • configuration

      • allow registry of remote code for parsers or for pre-processed RDF

      • allow enabling/disabling parsers (via tags?)

      • make it easy to test new additional datasets (e.g. run single parser against pre-generated data)

    • optional: data freshness check (has data changed since last run?)

    • data access and localization

    • data parsing into RDF

    • logging

    • optional: data statistics generation

    • optional: tests

      • is data accessible?

      • has data format changed?

      • are current results significantly smaller than last set of results?

  • Pipeline framework will provide:

    • General configuration

    • Ability to pick up new Data Parsers and run them

    • Location to save original dataset downloads

    • Location to save generated RDF datasets

    • Loading of RDF datasets into the triplestore

    • RDF enhancements

      • Add transitive closure to exact matches (and optionally close matches that are synonym-based)

      • Add identifiers.org URIs to BEL Entities where possible

    • Changelog created by comparing current and previous Resources in the triplestore (via Named Graph comparisons)

    • Export from triplestore into BEL Namespace, Annotation and Equivalence resource files (optional)
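The "transitive closure to exact matches" enhancement listed above can be sketched in plain Python. This is only an illustration under stated assumptions: exact matches are treated as symmetric and transitive equivalences (as `skos:exactMatch` is), the function name `exact_match_closure` and the identifiers `A:1`, `B:1`, `C:1` are hypothetical, and in the planned pipeline this step would presumably run inside the triplestore rather than in memory.

```python
from collections import defaultdict
from itertools import combinations


def exact_match_closure(pairs):
    """Return every ordered pair implied by the given exact matches.

    Uses a union-find structure to group terms into equivalence
    classes, then emits all pairs within each class.
    """
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    # Merge the classes of each asserted match.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the members of each equivalence class.
    groups = defaultdict(set)
    for term in list(parent):
        groups[find(term)].add(term)

    # Emit both directions for every pair within a class.
    closure = set()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            closure.add((a, b))
            closure.add((b, a))
    return closure


# Hypothetical identifiers: A:1 matches B:1, and B:1 matches C:1.
asserted = [("A:1", "B:1"), ("B:1", "C:1")]
closure = exact_match_closure(asserted)
print(("A:1", "C:1") in closure)
```

The pair `("A:1", "C:1")` was never asserted directly but follows from the other two. Extending the closure to synonym-based close matches, as the plan suggests, would need extra care, since close matches are not generally transitive.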
