Use Case: Record and Discover Derived Products #2

rayi113 opened this Issue Jan 16, 2015 · 0 comments



rayi113 commented Jan 16, 2015

Use Case: Discover Derived Products

  • Contributors: Matt Jones (@mbjones), Alva Couch, Chris Jones (@csjx)
  • Similar to: #9

Goals and Summary

Through data processing, analysis, modeling, and visualization, researchers create derived products, including derived data sets, figures, tables, animations, and other artifacts. By recording provenance relationships among these derived and source products as citation links, we can preserve the dependency relationships needed to reproduce the science and enable discovery of data and products through those relationships. For example, with appropriate relationships (prov:wasGeneratedBy, prov:used), one can determine whether one product was derived from another and, by following the graph of such linkages, discover other analyses and products that were derived from the same source data sets.
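To make the graph-following idea concrete, here is a minimal sketch in plain Python. It assumes provenance is available as simple (subject, predicate, object) triples using prov:used and prov:wasGeneratedBy. All identifiers (run:clean, data:raw_sensor, and so on) and the helper function names are hypothetical and are not part of any DataONE API.

```python
from collections import defaultdict

# Hypothetical provenance triples: (subject, predicate, object).
TRIPLES = [
    ("run:clean", "prov:used", "data:raw_sensor"),
    ("data:cleaned", "prov:wasGeneratedBy", "run:clean"),
    ("run:plot", "prov:used", "data:cleaned"),
    ("fig:trend", "prov:wasGeneratedBy", "run:plot"),
    ("run:model", "prov:used", "data:raw_sensor"),
    ("data:forecast", "prov:wasGeneratedBy", "run:model"),
]


def build_source_map(triples):
    """Map each product to the inputs used by the activities that generated it."""
    used = defaultdict(set)          # activity -> inputs it used
    generated_by = defaultdict(set)  # product -> activities that generated it
    for subject, predicate, obj in triples:
        if predicate == "prov:used":
            used[subject].add(obj)
        elif predicate == "prov:wasGeneratedBy":
            generated_by[subject].add(obj)
    sources = {}
    for product, activities in generated_by.items():
        sources[product] = set()
        for activity in activities:
            sources[product] |= used.get(activity, set())
    return sources


def upstream(product, sources):
    """Return every direct or indirect source of a product."""
    found, stack = set(), [product]
    while stack:
        for src in sources.get(stack.pop(), ()):
            if src not in found:
                found.add(src)
                stack.append(src)
    return found


def is_derived_from(product, source, sources):
    """True if `source` appears anywhere upstream of `product`."""
    return source in upstream(product, sources)


def products_derived_from(source, sources):
    """Discover all products that share `source` as an ancestor."""
    return {p for p in sources if source in upstream(p, sources)}


SOURCES = build_source_map(TRIPLES)
```

With this sample data, is_derived_from("fig:trend", "data:raw_sensor", SOURCES) holds even though the figure only uses the cleaned data set directly, and products_derived_from("data:raw_sensor", SOURCES) also surfaces the otherwise unrelated forecast. A real deployment would more likely query an RDF store than walk in-memory tuples.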

Why is it important and to whom?

  • To reproduce science, researchers need the ability to follow data derivation chains
  • Because researchers tend to cite only the proximate data used in a study, these provenance relationships allow researchers to get credit for the impact of upstream source data in downstream synthetic analyses
  • In a complex workflow, an error may be introduced in raw products that were used to create a derived product. Data source citations allow one to trace from the source to its derived products and notify the appropriate researchers of the error.
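The error-notification point above can be sketched as a downstream walk over the provenance graph. This is only an illustration under stated assumptions: the derivation edges are collapsed into a hypothetical product-to-sources mapping, and the identifiers, owner names, and function names are invented for the example.

```python
from collections import defaultdict, deque

# Hypothetical derivation edges: product -> the sources it was derived from.
DERIVED_FROM = {
    "data:cleaned": ["data:raw_sensor"],
    "fig:trend": ["data:cleaned"],
    "data:forecast": ["data:raw_sensor", "data:climate_grid"],
    "table:summary": ["data:climate_grid"],
}

# Hypothetical mapping from each product to the researcher responsible for it.
OWNERS = {
    "data:cleaned": "alice",
    "fig:trend": "bob",
    "data:forecast": "alice",
    "table:summary": "carol",
}


def affected_products(bad_source, derived_from):
    """List every product downstream of `bad_source`, nearest first."""
    # Invert the edges so we can walk from a source to its products.
    downstream = defaultdict(list)
    for product, sources in derived_from.items():
        for source in sources:
            downstream[source].append(product)

    affected, seen = [], {bad_source}
    queue = deque([bad_source])
    while queue:
        node = queue.popleft()
        for product in downstream.get(node, []):
            if product not in seen:
                seen.add(product)
                affected.append(product)
                queue.append(product)
    return affected


def who_to_notify(bad_source, derived_from, owners):
    """Group the affected products by the researcher who should be told."""
    notices = defaultdict(list)
    for product in affected_products(bad_source, derived_from):
        notices[owners.get(product, "unknown")].append(product)
    return {owner: sorted(products) for owner, products in notices.items()}
```

For an error in data:raw_sensor, who_to_notify("data:raw_sensor", DERIVED_FROM, OWNERS) groups the cleaned data set and the forecast under alice and the trend figure under bob, while carol is left alone because her table depends only on the unaffected climate grid.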

Why hasn’t it been solved yet?

  • Provenance modeling languages have been in flux (e.g., PROV, OPM)
  • Few tools support capture of provenance information in a standard format
  • Data repositories usually lack provenance information, or record it only as natural-language text

Actionable Outcomes

An example diagram showing provenance relationships as envisioned by DataONE:

[Figure: Provenance trace in ProvONE]

Additional Information and Links
