Commit 2e7f7b8
Update data-curation.rst [skip ci]
danieljprice committed Jan 31, 2024 (1 parent: fb2df88)
Showing 1 changed file: docs/data-curation.rst (32 additions, 6 deletions)
Publishing the data from your phantom calculations
==================================================================
Recommended best practice for open science is that parameter files, initial conditions
and snapshots from calculations with phantom should be made publicly available on publication.

FAIR Principles
----------------
According to the `FAIR principles for scientific data management <https://ardc.edu.au/resource/fair-data/>`__, your data should be:

- Findable: e.g. with links to and from the paper publishing the simulations
- Accessible: available for free in a publicly accessible repository
- Interoperable: data are labelled and can be reused or converted
- Reusable: enough information is included to reproduce your simulations

Data curation
-------------
For calculations with phantom that have been published in a paper,
ideal practice is to upload the **entire calculation including .in and
.setup files, .ev files and all dump files in a public repository**.

See for example a dataset from Mentiplay et al. (2020) using figshare: `<https://doi.org/10.6084/m9.figshare.11595369.v1>`_

Or this example from Wurster, Bate & Price (2018) in the University of Exeter repository: `<https://doi.org/10.24378/exe.607>`_

However, repository size limits may prevent preserving the full dataset, in which case we recommend saving:

- .in files
- .setup files
- .ev files
- dump files used to create figures in your paper, with a link to splash or sarracen in the metadata for how to read/convert these files
- dump files containing initial conditions, if these are non-trivial
- metadata including link to your publication or arXiv preprint, link to the phantom code, code version information and labelling of data corresponding to simulations listed in your paper
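For the metadata item above, a minimal sketch of what to record might look like the following. The field names are illustrative rather than a required schema, and the placeholder values in angle brackets should be replaced with your own details; the code links are the current public repositories for phantom and sarracen::

    title:        Simulation data from <paper title>
    publication:  <DOI or arXiv link to your paper>
    code:         https://github.com/danieljprice/phantom
    code_version: <git tag or commit hash used for the runs>
    reader:       https://github.com/ttricco/sarracen
    contents:     <e.g. sim1/ corresponds to Model A in Table 1;
                   dump_00000 contains the initial conditions>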

Zenodo community
----------------
To facilitate better data sharing between phantom users, we have set up a Zenodo community:

https://zenodo.org/communities/phantom

Please join this community so that we can learn from each other and establish best-practice data curation.
Zenodo currently has a 50 GB limit on data size, which is sufficient for the recommended list of files to save above.

Archiving your data to Google Drive using rclone
------------------------------------------------
You can use rclone to copy data from a remote cluster or supercomputing facility to Google Drive. This is not recommended as a long term storage solution but can facilitate short-term data sharing between users.

Set this up by logging into your supercomputer and typing::

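The detailed setup steps are collapsed in the view above. As a rough sketch only (the remote name ``gdrive`` and the paths are placeholders, not from the original, and the interactive ``rclone config`` prompts vary by site), a typical workflow is::

    # one-off setup: create a Google Drive remote (interactive prompts)
    rclone config

    # verify the remote is reachable by listing top-level directories
    rclone lsd gdrive:

    # copy a simulation directory to Drive, showing transfer progress
    rclone copy /scratch/myproject/sim1 gdrive:phantom/sim1 --progress

``rclone copy`` skips files that are already present and unchanged at the destination, so it is safe to re-run after an interrupted transfer.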
