analysis/FAIR_data.Rmd

---
title: "FAIR data"
author: "Robert Schlegel"
date: '`r format(Sys.Date(), "%d %B %Y")`'
output: workflowr::wflow_html
editor_options:
  chunk_output_type: console
css: acid.css
csl: frontiers.csl
bibliography: FACE-IT.bib
---

```{r global_options, include = FALSE}
knitr::opts_chunk$set(fig.width = 8, fig.align = 'center',
                      echo = FALSE, warning = FALSE, message = FALSE, 
                      eval = TRUE, tidy = FALSE)

# Libraries
library(tidyverse) # The tidy dialect of R
library(kableExtra) # For formatting static tables
```

<center>
![](assets/FACE-IT_Logo_900.png){ width=70% }
</center>

# Overview

While some online data repositories (e.g. Zenodo) provide a DOI very quickly and conveniently (which generally makes them acceptable for project proposals etc.), many of these repositories do not ensure that the data undergo any quality control.

In the FAIR data scheme, Zenodo allows data to be Findable and Accessible. However, the findability (i.e. the search functionality) of Zenodo is not very sophisticated, meaning that most users won't find a dataset unless they have the link or already have a good idea of what they are looking for. The main issue with Zenodo lies in the Interoperability and Reusability of the data. Because Zenodo has no requirements for what can be uploaded, it is a "Wild West" situation where users never know exactly what they will have to work with.

As for PANGAEA, even though it takes much longer to get one's data hosted there, it has very strict requirements for data quality and formatting. The website has a sophisticated search platform, and there is an R package that allows data to be searched for and downloaded directly from R/RStudio. Part of the quality control is ensuring that all data are classified using pre-existing names and units, which helps users integrate existing datasets into their future projects. Without this, the Interoperability and Reusability (the I and R in FAIR) of the data are greatly diminished.
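
The search-and-download workflow from R could look roughly like the sketch below. It assumes that the R package in question is `pangaear` (from rOpenSci), which is not named in the text above, and the query string is purely illustrative. The chunk is not evaluated when the page is built because it needs an internet connection, and some datasets on PANGAEA may additionally require a login.

```{r pangaea-search-sketch, eval = FALSE, echo = TRUE}
# Assumption: the package meant in the text is pangaear; it is not named there
library(pangaear)

# Search PANGAEA; the query string is only an illustration
hits <- pg_search(query = "Kongsfjorden temperature", count = 5)

# Each hit carries a DOI and a citation
hits$citation

# Download the first hit by its DOI; pg_data() returns a list with one
# element per dataset found behind that DOI
dat <- pg_data(doi = hits$doi[1])

# The tabular data of the first dataset
head(dat[[1]]$data)
```

Because the column headers of the downloaded data follow PANGAEA's own parameter names and units, data from different hits tend to be much easier to combine than arbitrary uploads.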

In the context of the FACE-IT project specifically, a large amount of the budget was allocated to hosting data on PANGAEA, and support for uploading those data is reserved via WP1, which is why PANGAEA is the preferred platform. Without these two things I understand why Zenodo would be preferable. It is arguably the best option when one needs only to quickly generate a DOI for a given dataset and nothing more.

Looking at the Zenodo website, I do see that it is funded by Horizon 2020, so I see why this e-mail must seem a bit odd.

All of that being said, we are not absolutely required to host everything on PANGAEA. Other data hosting websites with some sort of institutional affiliation (e.g. NMDC, NPDC, SIOS, GEM) are also fine.

## Findable

## Accessible

## Interoperable
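
As a minimal sketch of why shared names and units matter for interoperability, the chunk below combines two made-up datasets that record the same quantity under different column names and units. All values, column names, and the target name `temp_degC` are hypothetical and serve only as illustration; they are not PANGAEA's actual vocabulary.

```{r interoperability-sketch, echo = TRUE}
library(dplyr)

# Hypothetical example data: same quantity, different column names and units
site_a <- tibble(date = as.Date(c("2020-07-01", "2020-07-02")),
                 sst_degC = c(4.2, 4.5))
site_b <- tibble(Date = as.Date(c("2020-07-01", "2020-07-02")),
                 temp_K = c(277.35, 277.65))

# Map each dataset onto one shared (hypothetical) column name and unit
site_a_std <- site_a %>%
  transmute(site = "A", date, temp_degC = sst_degC)
site_b_std <- site_b %>%
  transmute(site = "B", date = Date, temp_degC = temp_K - 273.15)

# Once names and units agree, combining the data is a single call
temp_all <- bind_rows(site_a_std, site_b_std)
temp_all
```

When a repository enforces standard names and units at upload, the two harmonising steps are no longer the user's job and only the final `bind_rows()` remains.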

## Reusable