Use Case: citation referring to a data collection #4
Goals and Summary
At https://odin.jrc.ec.europa.eu a scientific database application hosting data sets for tests performed on engineering alloys has been enabled for data citation using DataCite DOIs. The collection consists of about 20,000 discrete data sets, of which approximately 10% have been assigned DOIs to date. A traditional scientific publication typically reports hundreds of data sets, so citing each one individually in the references section is not practical. For transparency and reproducibility, however, the full range of data sets needs to be cited.
Why is it important and to whom?
Researchers wanting credit for their work (metrics), information professionals trying to describe objects, and researchers looking for information.
Why hasn’t it been solved yet?
As in the example below, the present recommendation is to specify a DOI range. This approach is far from satisfactory. The obvious problem is that it only works for sequential blocks of data sets. More importantly, data sets that are not quoted explicitly will not be registered by the Thomson Reuters Data Citation Index and other indexers, and hence citation metrics will not be accurate.
Example data citation from http://dx.doi.org/10.1016/j.jnucmat.2013.09.059
[32] M. Bruchhausen, F. de Haan, Test data for low cycle fatigue on material 14Cr 1W ODS at 650 °C and 750 °C, JRC Petten, 2013. http://dx.doi.org/10.5290/1000021 to http://dx.doi.org/10.5290/1000033 inclusive, v1.0, data set
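To make the limitation concrete, here is a minimal sketch of what an indexer would have to do to recover the individual DOIs from a range citation like the one above. The function name `expand_doi_range` is hypothetical, and the sketch assumes that both DOIs share the prefix `10.5290` and that the suffixes are plain sequential integers, which is exactly the assumption that fails for non-contiguous data sets.

```python
from typing import List


def expand_doi_range(first: str, last: str) -> List[str]:
    """Expand an inclusive DOI range into individual DOIs.

    Assumes both DOIs have the form "<prefix>/<integer suffix>" and share
    the same prefix. Not every DOI in the range is guaranteed to exist.
    """
    prefix_first, _, suffix_first = first.partition("/")
    prefix_last, _, suffix_last = last.partition("/")
    if prefix_first != prefix_last:
        raise ValueError("DOIs in a range must share the same prefix")
    if not (suffix_first.isdigit() and suffix_last.isdigit()):
        raise ValueError("only purely numeric suffixes can be expanded")
    start, end = int(suffix_first), int(suffix_last)
    if start > end:
        raise ValueError("range start must not exceed range end")
    width = len(suffix_first)  # keep any zero padding of the suffix
    return [f"{prefix_first}/{n:0{width}d}" for n in range(start, end + 1)]


# The range quoted in the example citation (prefix 10.5290 assumed for both ends).
dois = expand_doi_range("10.5290/1000021", "10.5290/1000033")
for doi in dois:
    print(f"http://dx.doi.org/{doi}")
```

An indexer that does not perform this kind of expansion only sees the two endpoint DOIs, so the data sets in between receive no citation credit.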
One option to address this issue is to use data catalogs. A standard for describing data catalogs is set out in the W3C DCAT vocabulary (http://www.w3.org/TR/vocab-dcat/). However, it is not clear exactly how a catalog is turned into a practical citation.
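As a rough illustration of the catalog idea, the sketch below groups a set of DOIs into a DCAT catalog serialised as JSON-LD. It only uses the `dcat:Catalog`, `dcat:Dataset` and `dcat:dataset` terms plus Dublin Core `dct:title` and `dct:identifier`; the function name `build_catalog` and the catalog URI under `example.org` are assumptions made for this example. It does not answer the open question of how such a catalog becomes a citation.

```python
import json
from typing import Dict, List


def build_catalog(catalog_id: str, title: str, dois: List[str]) -> Dict:
    """Describe a group of DOI-identified data sets as a DCAT catalog (JSON-LD)."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": catalog_id,
        "@type": "dcat:Catalog",
        "dct:title": title,
        # One dcat:Dataset entry per data set, so none is left implicit.
        "dcat:dataset": [
            {
                "@id": f"http://dx.doi.org/{doi}",
                "@type": "dcat:Dataset",
                "dct:identifier": doi,
            }
            for doi in dois
        ],
    }


catalog = build_catalog(
    "http://example.org/catalog/lcf-14cr-1w-ods",  # hypothetical catalog URI
    "Test data for low cycle fatigue on material 14Cr 1W ODS",
    ["10.5290/1000021", "10.5290/1000022"],
)
print(json.dumps(catalog, indent=2))
```

Because every member data set is listed explicitly, a catalog of this kind could cover non-sequential DOIs, which a range citation cannot.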
Additional Information and Links
Actually, this is where larger data sets come in. The problem could be solved if the repository created an aggregate data set consisting of all data sets of that type. Its documentation would cite all of its source data sets (just as a journal article or book cites all of its sources), and the aggregate data set as a whole would be given a citation. In addition, the proposed RDA solution would be used to identify the exact subset of the larger data set, and that subset identifier would be included in the specific citation the repository gives the user to put in their article. In this way, a single citation that itself references all of the included data can be used.
If the ESIP data citation guidelines are used, the citation would look like:
Doe, J. and R. Roe. 2001, updated daily. The FOO Gridded Time Series Data Set. Version 3.2. Unique exact subset specifier. The FOO Data Center. http://dx.doi.org/10.xxxx/notfoo.547983. Accessed 1 May 2011.
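A repository could assemble a citation of this shape from its metadata at download time. The following is a minimal sketch that reproduces the element order of the example above; the function name `format_citation` and its parameter names are hypothetical, and the subset specifier is passed in as an opaque string because the form of the RDA subset identifier is not specified here.

```python
def format_citation(authors: str, release: str, title: str, version: str,
                    subset: str, archive: str, locator: str,
                    accessed: str) -> str:
    """Join citation elements in the order used by the example citation."""
    parts = [
        f"{authors} {release}.",   # authors (already ending in a period) and release date
        f"{title}.",
        f"Version {version}.",
        f"{subset}.",              # identifies the exact subset that was used
        f"{archive}.",
        f"{locator}.",             # DOI of the aggregate data set
        f"Accessed {accessed}.",
    ]
    return " ".join(parts)


citation = format_citation(
    authors="Doe, J. and R. Roe.",
    release="2001, updated daily",
    title="The FOO Gridded Time Series Data Set",
    version="3.2",
    subset="Unique exact subset specifier",
    archive="The FOO Data Center",
    locator="http://dx.doi.org/10.xxxx/notfoo.547983",
    accessed="1 May 2011",
)
print(citation)
```

Only the subset specifier and the access date change between users, so the DOI of the aggregate data set stays the single target that indexers need to count.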