This repository acts as the central management point for a set of repositories that are used to generate digital object identifiers (DOIs) for datasets in the Neotoma Paleoecology Database.
DOIs are generated at the level of a dataset, which in Neotoma consists of all measurements of a given data type for a single collection unit at a site (e.g., all vertebrate fossils from a bone pile in a cave, or all fossil pollen samples from a lake core). Each DOI is associated with a landing page.
Linked repositories include:
- Neotoma Postgres Functions
- Neotoma Landing Pages
- Neotoma API
- Neotoma DOI Technical Paper
- Tilia API Endpoints
This project is currently under development. All participants are expected to follow the code of conduct for this project.
NOTE: The DataCite XML validation files in the `data/` folder (and its `include` subfolder) were obtained from the DataCite Schema repository on GitHub.
For any single dataset, the DOI provides access to three related elements:
- The live record (accessed from Neotoma via the various APIs)
- The frozen record (saved one week after dataset submission)
- The DOI metadata (posted to DataCite)
The live record exists as the set of relationships between elements in the database that are linked to the `datasets` table. As a result, the live record can change over time, for example when taxonomies or linked chronologies are updated.
The frozen record is generated one week after dataset submission and represents the state of the record at the time of upload. This version supports journal requirements for data submissions and aligns with data-management best practices. The frozen record is stored in the `doi` schema of the database as a Postgres `jsonb` value, along with the `datasetid`, the creation date, and the modification date (if necessary).
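The following minimal Python sketch illustrates why the frozen record stays fixed while the live record changes. It assumes the live record has already been retrieved as a Python dictionary (for example through one of the Neotoma APIs). The `FrozenRecord` class, the `freeze_record()` function, and the `datecreated`/`datemodified` field names are hypothetical illustrations, not code from this repository. Serializing the record to JSON text at freeze time yields a value suitable for a `jsonb` column, and later edits to the live dictionary do not alter it.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class FrozenRecord:
    """Hypothetical in-memory mirror of one row in the doi.frozen table."""
    datasetid: int
    record: str                            # JSON text destined for a jsonb column
    datecreated: datetime                  # hypothetical column name
    datemodified: Optional[datetime] = None  # set only if the snapshot is revised


def freeze_record(datasetid: int, live_record: dict[str, Any]) -> FrozenRecord:
    """Snapshot a live record by serializing it to JSON at this moment."""
    snapshot = json.dumps(live_record, sort_keys=True, default=str)
    return FrozenRecord(
        datasetid=datasetid,
        record=snapshot,
        datecreated=datetime.now(timezone.utc),
    )


# Usage: a later change to the live record does not alter the frozen snapshot.
live = {"datasetid": 1001, "taxa": ["Pinus"], "chronology": "v1"}  # example data
frozen = freeze_record(1001, live)
live["chronology"] = "v2"  # e.g., a linked chronology is revised
print(json.loads(frozen.record)["chronology"])  # prints: v1
```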
The DOI metadata is stored with DataCite and is generated by a script in this repository. When a new DOI is minted, the DOI and its associated `datasetid` are added to the `datasetdoi` table.
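The metadata script itself is not reproduced here. As a rough illustration of what a DataCite XML payload contains, the sketch below uses Python's standard-library `xml.etree.ElementTree` to build a record holding the mandatory properties of the DataCite Metadata Schema 4 (identifier, creator, title, publisher, publication year, and resource type). The function `build_datacite_xml()` and its inputs are hypothetical, and the DOI shown in the usage note is a placeholder. The repository's actual script may populate many more fields, and real output should be validated against the XSD files in the `data/` folder.

```python
import xml.etree.ElementTree as ET

DATACITE_NS = "http://datacite.org/schema/kernel-4"

# Serialize the DataCite namespace as the default (unprefixed) namespace.
# Note: this registry is global to ElementTree.
ET.register_namespace("", DATACITE_NS)


def _q(tag: str) -> str:
    """Qualify a tag name with the DataCite namespace."""
    return f"{{{DATACITE_NS}}}{tag}"


def build_datacite_xml(doi: str, creators: list[str], title: str,
                       publisher: str, year: int) -> str:
    """Return a minimal DataCite (kernel-4) XML document as a string."""
    resource = ET.Element(_q("resource"))
    ET.SubElement(resource, _q("identifier"), identifierType="DOI").text = doi

    creators_el = ET.SubElement(resource, _q("creators"))
    for name in creators:
        creator = ET.SubElement(creators_el, _q("creator"))
        ET.SubElement(creator, _q("creatorName")).text = name

    titles = ET.SubElement(resource, _q("titles"))
    ET.SubElement(titles, _q("title")).text = title
    ET.SubElement(resource, _q("publisher")).text = publisher
    ET.SubElement(resource, _q("publicationYear")).text = str(year)
    ET.SubElement(resource, _q("resourceType"),
                  resourceTypeGeneral="Dataset").text = "Dataset"

    return ET.tostring(resource, encoding="unicode")
```

For example, `build_datacite_xml("10.0000/example-doi", ["Doe, Jane"], "Example pollen dataset", "Neotoma Paleoecology Database", 2024)` returns a `<resource xmlns="http://datacite.org/schema/kernel-4">` document containing those values.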
The DOI minting workflow proceeds as follows:

1. A Neotoma data steward uploads a dataset to Neotoma (Tilia -> Tilia API -> NeotomaDB).
2. A cron job running in `data-dev` checks for all records created at least one week ago that do not yet have a "frozen" version (the query is in the `neotoma_doi` repository).
   - The script generates a frozen version of the dataset in the `doi.frozen` table in the database.
   - The function returns a list of aggregated `datasetid`s along with the contact information for each dataset's PI.
   - [not currently implemented] An email will be sent to each dataset PI with a listed email address. The email will confirm that a DOI or a set of DOIs has been reserved and that the PI has one week to review the relevant data. It will also indicate that certain metadata (ORCIDs, email, site notes or descriptions) would improve the usefulness of the data, and it will provide links to the Explorer and Landing Pages for the data record and a link to (?something?) to facilitate adding the required metadata.
3. The PI of record can contact the steward to update the metadata (or could a token be generated to allow the PI to make updates directly?).
4. The same cron job from step 2 identifies records where the `ndb.dataset` entry is older than 14 days, the dataset has an entry in `doi.frozen`, and there is no entry in `ndb.datasetdoi`. This assumes that PIs and stewards have had an opportunity to revise their datasets.
   - For each entry, `UPDATE` the frozen dataset using `doi.doifreeze()`.
   - For each entry, run the function `assign_doi()` to build the DataCite XML file and post the DOI metadata.
   - Send an email to each dataset PI indicating that the DOIs have been successfully minted.
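The two selection rules applied by the cron job (steps 2 and 4) can be sketched in plain Python. This is a minimal illustration that assumes each dataset is summarized by its submission time plus flags for whether it has a `doi.frozen` row and an `ndb.datasetdoi` row. The names `DatasetStatus`, `needs_freeze()`, `ready_to_mint()`, and `plan_cron_run()` are hypothetical. In this project the selection itself is performed by a database query, not by code like this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

FREEZE_AFTER = timedelta(days=7)   # step 2: "at least one week ago"
MINT_AFTER = timedelta(days=14)    # step 4: "older than 14 days"


@dataclass
class DatasetStatus:
    """Hypothetical summary of one dataset's state in the database."""
    datasetid: int
    submitted: datetime
    has_frozen: bool  # a row exists in doi.frozen
    has_doi: bool     # a row exists in ndb.datasetdoi


def needs_freeze(ds: DatasetStatus, now: datetime) -> bool:
    """Step 2: submitted at least a week ago and not yet frozen."""
    return now - ds.submitted >= FREEZE_AFTER and not ds.has_frozen


def ready_to_mint(ds: DatasetStatus, now: datetime) -> bool:
    """Step 4: older than 14 days, frozen, and no DOI minted yet."""
    return now - ds.submitted > MINT_AFTER and ds.has_frozen and not ds.has_doi


def plan_cron_run(datasets: list[DatasetStatus],
                  now: datetime) -> tuple[list[int], list[int]]:
    """Return (ids to freeze, ids to re-freeze and mint) for one cron run."""
    to_freeze = [d.datasetid for d in datasets if needs_freeze(d, now)]
    to_mint = [d.datasetid for d in datasets if ready_to_mint(d, now)]
    return to_freeze, to_mint


# Usage with example data:
now = datetime(2024, 6, 30)
datasets = [
    DatasetStatus(1, datetime(2024, 6, 1), has_frozen=True, has_doi=False),    # 29 days old
    DatasetStatus(2, datetime(2024, 6, 20), has_frozen=False, has_doi=False),  # 10 days old
    DatasetStatus(3, datetime(2024, 6, 28), has_frozen=False, has_doi=False),  # 2 days old
]
print(plan_cron_run(datasets, now))  # prints: ([2], [1])
```

Keeping the two predicates separate means a dataset frozen in one run only becomes eligible for minting in a later run, once it is older than 14 days and still lacks an `ndb.datasetdoi` entry.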
This work has been supported by grants from the National Science Foundation: NSF-1541002, NSF-1550855 and NSF-1550707.