# Archiving a Dataset

With **fairly**, we can archive datasets to a data repository and edit them remotely in a user account. Users prepare a dataset for archiving by editing its metadata, defining which files belong to it, and uploading them to a data repository. One of the goals of **fairly** is to *remove the need to prepare metadata and data separately for every repository to which a dataset will be archived*. This saves time and effort and lowers the barriers to practicing Open Science.
This tutorial demonstrates the workflow with the 4TU.ResearchData repository; the procedure for Zenodo is similar.

**Requirements:**

* A 4TU.ResearchData account
* A personal access token. See [configuring access token](https://fairly.readthedocs.io/en/latest/package/account-token.html) if you don't have one.
* Files to be archived. We will use a hypothetical case in this tutorial.

> For this tutorial, we assume that our goal is to archive in 4TU.ResearchData a dataset that we previously archived in Zenodo. As an example, we use the dataset [Quality and timing of crowd-based water level class observations](https://zenodo.org/records/3929547).

## 1. Download the Zenodo dataset

First, we need to download the [Quality and timing of crowd-based water level class observations](https://zenodo.org/records/3929547) dataset using its URL. If you already did this in the tutorial on *downloading datasets from Zenodo*, you can skip this step.

```python
import fairly

# Create a Zenodo client
zenodo = fairly.client("zenodo")

# Retrieve the dataset by its URL and store it locally in ./quality/
source_dataset = zenodo.get_dataset("https://zenodo.org/records/3929547")
source_dataset.store("./quality/")
```
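
Before editing, it can help to check what `store()` actually wrote to disk. The helper below is a hypothetical, standard-library-only sketch (`list_files` is not part of **fairly**) that lists every file under the download folder together with its size; which files **fairly** places there is not covered in this tutorial.

```python
from pathlib import Path


def list_files(root: str) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under `root`, sorted by path."""
    base = Path(root)
    if not base.is_dir():
        raise NotADirectoryError(f"{root!r} is not a directory")
    return sorted(
        (path.relative_to(base).as_posix(), path.stat().st_size)
        for path in base.rglob("*")  # walk all subfolders
        if path.is_file()  # skip the directories themselves
    )
```

For example, `list_files("./quality/")` returns a sorted list of `(path, size)` pairs for the downloaded dataset.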

## 2. Edit the metadata

Now we can load the downloaded dataset and edit its metadata. For example, we can add a few more *keywords* and edit the *license*.

```python
import fairly

# Load a previously downloaded dataset by passing its path
local_dataset = fairly.dataset("./quality/")

# Display the metadata
print(local_dataset.metadata)
```

```
{'access_type': 'open', 'authors': [Person({'fullname': 'Etter, Simon', 'institution': 'University of Zurich, Department of Geography', 'name': 'Simon', 'orcid_id': '0000-0002-7553-9102', 'surname': 'Etter'}), Person({'fullname': 'Strobl, Barbara', 'institution': 'University of Zurich, Department of Geography', 'name': 'Barbara', 'orcid_id': '0000-0001-5530-4632', 'surname': 'Strobl'}), Person({'fullname': 'Seibert, Jan', 'institution': 'University of Zurich, Department of Geography', 'name': 'Jan', 'orcid_id': '0000-0002-6314-2124', 'surname': 'Seibert'}), Person({'fullname': 'van Meerveld, Ilja (H.J.)', 'institution': 'University of Zurich, Department of Geography', 'name': 'Ilja (H.J.)', 'orcid_id': '0000-0002-7547-3270', 'surname': 'van Meerveld'})], 'description': '<p>This are the data and the R-scripts used for the manuscript &quot;Quality and timing of crowd-based water level class observations&quot; accepted for publication in the journal Hydrological Processes in July 2020 as ...
```

```python
# Edit keywords
local_dataset.metadata["keywords"] = ["CrowdWater", "Hydrology", "made by fairly"]

# Edit the license name to match what is required by 4TU.ResearchData
local_dataset.metadata["license"] = "CC BY 4.0"
```
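
The cell above replaces the keyword list outright. To keep whatever keywords the dataset already has and only append new ones, a small helper such as the hypothetical `merge_keywords` below could be used. The snippet runs against a plain dict standing in for `local_dataset.metadata`; that the real metadata object supports dict-style `.get()` is an assumption.

```python
def merge_keywords(existing, new):
    """Return `existing` followed by the entries of `new` that are not already
    present, comparing case-insensitively and preserving order."""
    merged = list(existing or [])
    seen = {keyword.casefold() for keyword in merged}
    for keyword in new:
        if keyword.casefold() not in seen:
            merged.append(keyword)
            seen.add(keyword.casefold())
    return merged


# Hypothetical stand-in for `local_dataset.metadata`
metadata = {"keywords": ["CrowdWater"]}
metadata["keywords"] = merge_keywords(
    metadata.get("keywords"), ["Hydrology", "crowdwater", "made by fairly"]
)
print(metadata["keywords"])  # ['CrowdWater', 'Hydrology', 'made by fairly']
```

The case-insensitive comparison avoids near-duplicates such as `CrowdWater` and `crowdwater`, which repositories usually display as separate keywords.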

## 3. Archive to 4TU.ResearchData
Now we can create a new dataset in a 4TU.ResearchData account and upload the local dataset to it. We assume that a **personal access token** has already been added to `~/.fairly/config.json`.
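
If the token is not configured yet, the following sketch shows one way to add it to that file with the standard library. It assumes the file maps repository ids to objects with a `token` key (for example `{"4tu": {"token": "..."}}`); check the [configuring access token](https://fairly.readthedocs.io/en/latest/package/account-token.html) guide for the exact layout **fairly** expects. `save_token` is a hypothetical helper, not part of **fairly**.

```python
import json
from pathlib import Path


def save_token(config_path: Path, repository_id: str, token: str) -> dict:
    """Store `token` for `repository_id` in a JSON config file and return the config.

    ASSUMPTION: the layout is {"<repository id>": {"token": "<token>"}, ...}.
    """
    config = {}
    if config_path.exists():
        # Keep settings already stored for other repositories
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault(repository_id, {})["token"] = token
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    config_path.chmod(0o600)  # the token is a secret: owner read/write only (POSIX)
    return config
```

For example, `save_token(Path.home() / ".fairly" / "config.json", "4tu", "<your-token>")`, using the same repository id that you pass to `upload`.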

```python
# Upload the local dataset to 4TU.ResearchData, reporting progress as files are sent
local_dataset.upload("4tu", notify=fairly.notify)
```

```
DataForUploadToZenodo.zip, 26765942/10485760
DataForUploadToZenodo.zip, 26765942/20971520
DataForUploadToZenodo.zip, 26765942/26765942

<fairly.dataset.remote.RemoteDataset at 0x7f62f5326200>
```
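
In the output above, each progress line appears to pair the file's total size (26,765,942 bytes) with the number of bytes uploaded so far, which grows in steps of 10,485,760 bytes (10 MiB). The sketch below reproduces that sequence. The 10 MiB chunk size is inferred from this output, not taken from **fairly**'s documentation, and `chunk_progress` is a hypothetical helper.

```python
def chunk_progress(total_size: int, chunk_size: int) -> list[int]:
    """Cumulative number of bytes sent after each chunk of a chunked upload."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Each chunk ends at a multiple of chunk_size; the last one is capped at total_size
    return [
        min(end, total_size)
        for end in range(chunk_size, total_size + chunk_size, chunk_size)
    ]


total, chunk = 26_765_942, 10 * 1024 * 1024  # sizes read from the output above
progress = chunk_progress(total, chunk)
print(len(progress), progress)  # 3 [10485760, 20971520, 26765942]
```

Three chunks match the three progress lines above; the last chunk is smaller because the file size is not a multiple of 10 MiB.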

> We could continue uploading files or editing the metadata in a similar way. For now, **publishing** the dataset should be done via the web interface of 4TU.ResearchData.