
Use Case: citation referring to a data collection #4

rayi113 opened this issue Jan 16, 2015 · 1 comment



commented Jan 16, 2015

Use Case: citation referring to a data collection

  • Contributors: Joan Starr
  • Similar to:

Goals and Summary

A scientific database application hosting data sets for tests performed on engineering alloys has been enabled for data citation using DataCite DOIs. The collection consists of about 20,000 discrete data sets, of which approximately 10% have been assigned DOIs to date. A traditional scientific publication typically reports on hundreds of data sets. Citing each data set individually in the references section of the written publication is not practical. However, for transparency and reproducibility, the full range of data sets needs to be cited.

Why is it important and to whom?

Researchers wanting credit for their work (metrics), information professionals trying to describe objects, researchers looking for information

Why hasn’t it been solved yet?

At present, the recommendation is to specify a DOI range, as in the example below. This approach is far from satisfactory. The obvious problem is that it only works for sequential blocks of data sets. Further, and more importantly, data sets that are not quoted explicitly will not be registered by the Thomson Reuters Data Citation Index and other indexers, so citation metrics will not be accurate.

Example data citation from

32 M. Bruchhausen, F. de Haan, Test data for low cycle fatigue on material 14Cr 1W ODS at 650 °C and 750 °C, JRC Petten, 2013. to inclusive, v1.0, data set
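
The limitation of DOI ranges described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each data set has an integer suffix in its identifier; the function name `collapse_to_ranges` and the sample numbers are hypothetical and do not come from the database in question.

```python
def collapse_to_ranges(ids):
    """Group integer identifiers into inclusive (start, end) runs of consecutive values."""
    runs = []
    for n in sorted(set(ids)):
        if runs and n == runs[-1][1] + 1:
            # Extend the current run by one.
            runs[-1] = (runs[-1][0], n)
        else:
            # Start a new run at this identifier.
            runs.append((n, n))
    return runs


# Hypothetical identifier suffixes, not real DOIs.
sequential = collapse_to_ranges([101, 102, 103, 104])
scattered = collapse_to_ranges([101, 102, 107, 250, 251])

print(sequential)  # [(101, 104)]
print(scattered)   # [(101, 102), (107, 107), (250, 251)]
```

A sequential block collapses to a single range that fits in one reference, while a scattered selection needs one range per run. In both cases, the identifiers inside a range are never written out explicitly.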

One option to address this issue is to use data catalogs. A standard for describing data catalogs has been elucidated in the W3C DCAT schema. However, it’s not clear exactly how a catalog is turned into a practical citation.
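
As a rough sketch of the catalog idea, the snippet below builds a JSON-LD style record using the DCAT terms `dcat:Catalog`, `dcat:Dataset`, and `dcat:dataset` together with Dublin Core `dct:identifier` and `dct:title`. The function name `build_catalog`, the catalog title, and every DOI shown are hypothetical placeholders. How such a record would then become a citation is exactly the open question raised above, so this shows only the grouping step.

```python
import json


def build_catalog(catalog_id, title, dataset_dois):
    """Return a DCAT-style catalog record listing each member data set explicitly."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Catalog",
        "dct:identifier": catalog_id,
        "dct:title": title,
        # Members need not be sequential; each one is named individually.
        "dcat:dataset": [
            {"@type": "dcat:Dataset", "dct:identifier": doi}
            for doi in dataset_dois
        ],
    }


# All identifiers below are made-up examples, not registered DOIs.
catalog = build_catalog(
    "10.9999/example-catalog",
    "Hypothetical catalog of data sets cited in one publication",
    ["10.9999/example.0101", "10.9999/example.0107", "10.9999/example.0250"],
)
print(json.dumps(catalog, indent=2))
```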

Additional Information and Links

  • Contributor: Joan Starr, California Digital Library and DataCite
  • Source: This use case was presented to the DataCite Metadata Working Group by Tim Austin, European Commission Joint Research Centre Institute for Energy and Transport.


commented Jan 27, 2015

Actually, this is where larger data sets come in. The problem could be solved if the repository created an aggregate data set consisting of all data sets of that type. Its documentation cites all of its source data sets (just as a journal article or book cites all of its sources), and the aggregate data set as a whole is given a citation. In addition, the proposed RDA solution for identifying an exact subset of a larger data set is applied: the repository generates the subset specifier and includes it in the specific citation given to the user to put in their article. In this way, a single citation that itself references all of the included data can be used.

If the ESIP data citation guidelines are used, the citation would look like:

Doe, J. and R. Roe. 2001, updated daily. The FOO Gridded Time Series Data Set. Version 3.2. Unique exact subset specifier. The FOO Data Center. Accessed 1 May 2011.
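
A minimal sketch of how a repository might assemble such a citation string is shown below. The field order is copied from the example above rather than taken from the ESIP guidelines themselves, and the function name `format_citation` and its parameter names are hypothetical.

```python
def format_citation(authors, release, title, version, subset, archive, accessed):
    """Join citation fields into one string, ending each field with a single period."""
    parts = [
        authors,
        release,
        title,
        f"Version {version}",
        subset,  # the exact subset specifier generated by the repository
        archive,
        f"Accessed {accessed}",
    ]
    # Strip any trailing period first so fields never end in "..".
    return " ".join(p.rstrip(".") + "." for p in parts)


citation = format_citation(
    authors="Doe, J. and R. Roe",
    release="2001, updated daily",
    title="The FOO Gridded Time Series Data Set",
    version="3.2",
    subset="Unique exact subset specifier",
    archive="The FOO Data Center",
    accessed="1 May 2011",
)
print(citation)
```

With the placeholder values from the example, this reproduces the citation text given above; only the subset specifier would change from one user request to the next.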
