add fields to support data papers #269

31 tasks done
mbjones opened this issue Apr 22, 2017 · 4 comments

@mbjones mbjones commented Apr 22, 2017

ESA and other societies are moving towards the publication of data papers that include more complete narratives about a data set and its importance and use. An ESA committee has recommended that a new data paper publication be created, and that the papers be built using EML and other related metadata standards. To support that, EML would need to be extended with descriptive fields that data papers need but that EML does not yet have. These would be optional to preserve backward compatibility. Below is a list of the fields required for a data paper, with notes on how each is handled in EML.

Proposed fields

  • Title
    • Descriptive title of data paper – 8-20 word title that describes the contents of the data paper, limited to 120 characters including spaces
    • /eml/dataset/title
  • Author(s)
    • First name, middle initial (if applicable), last name of each author of the data paper (EML); note the additional section later in the paper that provides author details
    • /eml/dataset/creator
  • Keywords
    • Up to 12 keywords or phrases that pertain to the data paper; terms possibly come from one or more controlled vocabularies such as the Global Change Master Directory (GCMD) (EML)
    • /eml/dataset/keywordSet
  • Location
    • Geographic coordinates (lat/long or decimal degrees, in WGS84 datum) of central location or bounding box that encompasses the study area and includes min and max X, Y coordinates (or “global”) or a table that includes multiple site locations such as 8 sites spread over a large region where a bounding box makes no sense (EML); note that EML supports these options presently and the input can be easily rendered on a map; additional maps and photos/figures may also be included in the Site Description.
    • /eml/dataset/coverage/geographicCoverage
  • Temporal coverage
    • Beginning year to ending year of the data described in the paper plus various alternatives for representing paleo-dates, etc. (EML); Note: EML supports a wide range of alternatives
    • /eml/dataset/coverage/temporalCoverage
  • Citation
    • /eml/dataset/referencePublication holds the canonical citation in addition to the dataset citation that is present in the various fields. This, for example, would be set to the Ecosphere citation when the data paper is published in Ecosphere.
    • Authors //citation/creator (Note: ensure this is consistent with /eml/dataset/creator)
    • Year //citation/pubDate (Note: ensure this is consistent with /eml/dataset/pubDate)
    • Title //citation/title
    • Journal //citation/article/journal
    • Journal number //citation/article/volume
    • Paper number (from Ecosphere) //citation/article/issue
    • DOI /eml/@packageId
    • EML currently stores these values in separate fields. Need to consider whether to add a new citation element to hold the proposed citation, and if so, whether that would be CitationType or text.
    • Note the DOI field in the citation. Need to decide if that goes in packageId
  • Abstract
    • one to two paragraphs (350 word maximum, with no references) that provide a summary of the data paper including type(s) of data collected, significance of the data, and potential applications of the data
    • /eml/dataset/abstract
  • Introduction /eml/dataset/introduction
    • one to many paragraphs that provide background and context for the data paper with appropriate references (e.g., project objectives, hypotheses being addressed, what is known about the pattern/process under study, how the data have been used to date including references, and could be used in the future); may include figure(s) and table(s); Note: an example of a long Introduction (from 2012 Data Paper) with five figures.
    • This section requires a new type of inline reference to embed citations, images, and tables in the text document. This may require an extension to TextType that lets the text point to the id of entities to be displayed inline. One way to handle this would be to add DocBook's inlinemediaobject to TextType (see issues in #275). Alternatively, these could be links in TextType that point to the id field of entities in the data set.
  • Site description
    • one or more paragraphs that describe relevant features of the site(s) where data were collected (e.g., countries, biome/ecosystem/habitat, vegetation types, soils, climate, elevation, etc.); may also include tables, map(s) and picture(s)
    • /eml/dataset/methods/sampling/studyExtent/description
  • Experimental or sampling design
    • description of the approaches (statistical and otherwise) used to design the study; may include figure(s) or table(s) that illustrate the design
    • /eml/dataset/methods/sampling/samplingDescription
  • Research methods
    • one or more paragraphs that describe methods used in acquiring, managing and processing the data (EML)
    • /eml/dataset/methods/methodStep
  • Data quality checks
    • one or more paragraphs that describe quality assurance/quality control (QA/QC) methods used to ensure data are high quality, including identification and treatment of missing data and “outliers”, authentication and verification procedures, etc.
    • /eml/dataset/methods/qualityControl
  • Data synopsis
    • one or more paragraphs as well as table(s) and figure(s) that summarize the data included in the data paper, including a summary of key findings if appropriate
    • Expanded definition of /eml/dataset/purpose to include the data synopsis (see sha c32cbe9)
  • Getting started /eml/dataset/gettingStarted
    • one or more paragraphs that describe the data package—i.e., the number and names of data files and whether any specialized software is available and/or may be necessary for analyzing or interpreting the data, possibly include high level description of data format, etc.
  • Acknowledgements /eml/dataset/acknowledgements
    • one or more sentences that acknowledge funders and other key contributors to the study (excluding the data paper authors)
  • Literature cited
    • bibliographic references to literature, software, methods, data and websites that were cited in the data paper
    • /eml/dataset/literatureCited
  • Literature citing the data
    • /eml/dataset/usageCitation
    • Same as issue #259, a usageCitation
    • bibliographic references to literature, software, data and websites that have used and cited data in the data paper; possibility for a dynamic data citation section (possibly a future enhancement); note that Wiley provides some pertinent guidelines at, but some additions may be required
  • Data package structure and content
    • Note: more specific information about data structure, data types, etc. would be associated with the complete metadata record (e.g., ISO19115, EML); pointer(s) to updated data (here or as a separate element); look at F1000 as an example for how to deal with versions/revisions
    • /eml/dataset/dataTable and related entity types
    • may require addition of some new entity types, including Matrix and Image
  • Author information section (possibly locate at end of article)
    • Author(s): First name, middle initial (if applicable), last name of each author of the data paper (EML)
    • ORCID
      • ORCID identifiers for all authors (EML)
      • /eml/dataset/creator/@userId
    • E-mail address
      • E-mail address of corresponding author (EML); note that the ScholarOne submission form requires that each author has a user profile in the system and that includes e-mail addresses; however, only the correspondent's e-mail is included with the published paper
      • /eml/dataset/creator/electronicMailAddress
    • Address(es)
      • Affiliate organization, address, city, state or province, zip code or postal code, country for each author
      • /eml/dataset/creator/address
  • Appendices
    • appendices may include software code, species lists, etc.
    • Most appendices would just be additional tables or figures in the entity list, so probably no need for a special new field, but we may need to differentiate the main data from supplemental data
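
As a rough illustration of how the mapping above could be used, here is a minimal Python sketch that checks which data paper fields are present in an EML document. The paths come from the list above, written relative to the root element. The sample document, the `missing_fields` helper, and the selection of fields checked are assumptions made for illustration; the sample is heavily abbreviated and is not schema-valid EML.

```python
import xml.etree.ElementTree as ET

# A subset of the proposed fields, with paths relative to the EML root element.
DATA_PAPER_PATHS = {
    "Title": "dataset/title",
    "Author(s)": "dataset/creator",
    "Keywords": "dataset/keywordSet",
    "Abstract": "dataset/abstract",
    "Introduction": "dataset/introduction",
    "Getting started": "dataset/gettingStarted",
    "Acknowledgements": "dataset/acknowledgements",
}

# Hypothetical, abbreviated document used only to exercise the check.
SAMPLE = """<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
                 packageId="example.1.1" system="example">
  <dataset>
    <title>Example data paper title</title>
    <creator><individualName><surName>Doe</surName></individualName></creator>
    <abstract><para>Short summary.</para></abstract>
  </dataset>
</eml:eml>"""


def missing_fields(xml_text):
    """Return the names of data paper fields with no matching element."""
    root = ET.fromstring(xml_text)
    return [name for name, path in DATA_PAPER_PATHS.items()
            if root.find(path) is None]


print(missing_fields(SAMPLE))
```

Because all of the proposed fields are optional in EML itself, a check like this would belong in a data paper submission workflow rather than in schema validation.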

Other fields that could be added:

  • Provenance information
  • Semantic types
@mbjones mbjones created this issue from a note in EML 2.2.0 Release (High priority) Apr 22, 2017
@mbjones mbjones added this to the EML2.2.0 milestone Apr 22, 2017
@mbjones mbjones self-assigned this Apr 22, 2017
@mbjones mbjones moved this from High priority to In progress in EML 2.2.0 Release Sep 9, 2017
mbjones added a commit that referenced this issue Feb 10, 2018
This includes a new `markdown` element in `txt:TextType` to support
Github flavored markdown.  And new elements for an introduction,
gettingStarted, and acknowledgements.  See issue #269 and #275.

@mbjones mbjones commented Feb 10, 2018

Added new fields for Introduction, Getting Started, and Acknowledgements. Also added a new markdown element. See SHA 54c7cd7 and issue #275.
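
A minimal Python sketch of what the new elements might look like when generated programmatically. The element names `introduction`, `gettingStarted`, `acknowledgements`, and `markdown` come from the commit message; the `add_markdown_section` helper, the sample text, and the assumption that `markdown` sits directly inside each section are illustrative only. The sketch does not check element order or anything else against the EML schema.

```python
import xml.etree.ElementTree as ET


def add_markdown_section(dataset, element_name, markdown_text):
    """Append <element_name><markdown>...</markdown></element_name> to dataset."""
    section = ET.SubElement(dataset, element_name)
    markdown = ET.SubElement(section, "markdown")
    markdown.text = markdown_text
    return section


# Hypothetical content for the three new data paper sections.
SECTIONS = [
    ("introduction", "## Background\n\nWhy these data were collected."),
    ("gettingStarted", "The package contains two CSV files."),
    ("acknowledgements", "We thank the field crew."),
]

dataset = ET.Element("dataset")
for name, text in SECTIONS:
    add_markdown_section(dataset, name, text)

print(ET.tostring(dataset, encoding="unicode"))
```

In a real document these elements would need to appear at the positions the schema allows within `dataset`, alongside the required elements such as `title` and `creator`.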

@mbjones mbjones commented Apr 25, 2018

Added support for inline citations in markdown and new citation fields in commit sha 3beb10b.

mbjones added a commit that referenced this issue Apr 25, 2018
These include a new ability to use Bibtex citation format both within
the `citation` element, and within a new `bibtex` element, to create
lists of refs using these in a literatureCited element (#300), as well
as in usageCitation (#259), and referencePublication (#277) elements.
All of this helps support data papers (#269), for which pandoc-style
citation keys can be used to cite these references in the text of
Markdown blocks in the EML document.  Added these features as
demonstrations in the eml-data-paper.xml sample document.
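
To illustrate how citation keys in Markdown text relate to BibTeX entries, here is a minimal Python sketch that lists keys cited in a Markdown block but not defined in a BibTeX string. The regular expressions are simplified approximations of BibTeX entry headers and pandoc-style `@key` citations, not full parsers, and the helper names, the BibTeX entry, and the Markdown text are all made up for illustration.

```python
import re

# Simplified patterns: "@type{key," for BibTeX entries, "@key" for citations.
BIBTEX_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
CITATION_KEY = re.compile(r"(?<![A-Za-z0-9])@([A-Za-z0-9_][A-Za-z0-9_:-]*)")


def bibtex_keys(bibtex_text):
    """Collect the entry keys defined in a BibTeX string."""
    return set(BIBTEX_KEY.findall(bibtex_text))


def cited_keys(markdown_text):
    """Collect the citation keys used in a Markdown string."""
    return set(CITATION_KEY.findall(markdown_text))


def unresolved_citations(markdown_text, bibtex_text):
    """Return cited keys that have no matching BibTeX entry, sorted."""
    return sorted(cited_keys(markdown_text) - bibtex_keys(bibtex_text))


# Hypothetical reference and text, not taken from any real document.
BIBTEX = """
@article{doe2012,
  author = {Doe, Jane},
  title = {A hypothetical data paper},
  journal = {Hypothetical Journal},
  year = {2012}
}
"""
MARKDOWN = "Earlier surveys [@doe2012] were extended by later work [@roe2015]."

print(unresolved_citations(MARKDOWN, BIBTEX))
```

The lookbehind in `CITATION_KEY` keeps e-mail addresses such as `user@example.org` from being read as citations; a production tool would more likely rely on pandoc itself to resolve keys.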
@mbjones mbjones closed this Apr 25, 2018

@mbjones mbjones commented Apr 25, 2018

Completed, assuming that Appendices are just added as additional entities within the EML. Open new tickets to deal with specific issues that arise in evaluating data paper field changes.

@mbjones mbjones commented Apr 25, 2018

See example document here:
