# Using the JupyterLab Extension

This tutorial shows how to use the JupyterLab extension to clone and create research datasets using the graphical interface of JupyterLab, and how to upload datasets to popular research data repositories.

If you have not done so, [install the full toolset](https://fairly.readthedocs.io/en/latest/installation.html).


## Start JupyterLab

Start JupyterLab with the **fairly** extension. This will open JupyterLab in your browser.

### Linux / macOS

From the terminal, run: 

```shell
jupyter lab
```

### Windows

You will use the shell terminal to start JupyterLab.

> IMPORTANT: For the following to work, Python must be in the PATH environment variable on Windows. If you are not sure that is the case, open the shell and type `python --version`. You should see the Python version on the screen. If you see something else, follow these steps to [add Python to the PATH on Windows](https://realpython.com/add-python-to-path/#how-to-add-python-to-path-on-windows).

In the shell, type the following and press `Enter`:
```shell
jupyter lab
```

JupyterLab should automatically start in your browser.

![jupyterlab](../img/start-jupyterlab.png)
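
If `jupyter lab` is not recognized as a command, it can help to check which executables are visible on the `PATH`. The following is a minimal sketch that uses only the Python standard library; the function name `find_on_path` is hypothetical and is not part of *fairly* or JupyterLab.

```python
import shutil


def find_on_path(commands=("python", "jupyter")):
    """Map each command name to its full path, or None if it is not on the PATH.

    Hypothetical helper for troubleshooting; not part of fairly or JupyterLab.
    """
    return {name: shutil.which(name) for name in commands}


if __name__ == "__main__":
    for name, location in find_on_path().items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

Run it with any Python interpreter you can start (on Windows, possibly through the `py` launcher). A command that is reported as not found needs its installation directory added to the `PATH`.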


## Part 1: Cloning Datasets

Public research datasets can be cloned (copied and downloaded) directly to an empty directory, using the dataset **URL** or **DOI**. We will use [this dataset](https://data.4tu.nl/articles/dataset/Earthquake_Precursors_detected_by_convolutional_neural_network/21588096) from 4TU.ResearchData as an example.


Using the JupyterLab interface, create a new directory called `workshop`. Note that the contents of your main directory will differ from those shown in the screenshot.

![create directory](../img/create-directory.png)

1. Inside the `workshop` directory, create a new directory called `clone`
2. Right-click on the left panel to open the context menu
3. Click on `Clone Dataset`
4. Copy and paste the URL of the example dataset into the dialog window
5. Click `Clone`

![clone](../img/clone1.png)

![clone-dialog](../img/clone2.png)

After a few seconds, you should see a list of files in JupyterLab. All the files except `manifest.yaml` belong to the dataset in the research data repository. The file `manifest.yaml` is created automatically by the Fairly Toolset, and it contains metadata from the research data repository, such as:

- Authors 
- Keywords
- License
- DOI
- Files in the dataset
- etc.


## Part 2: Create a Fairly Dataset from Scratch

Now, we will show you how you can create and prepare your own dataset using the JupyterLab extension of *fairly*.

   1. Create a new directory called `mydataset` inside the `workshop` directory.
   2. Inside `workshop/mydataset/`, open the context menu and click on `Create Fairly Dataset`
   3. Select `Zenodo` as template from the drop-down list.
   4. Click `Create`. A `manifest.yaml` file will be added to the `mydataset` directory

![create-dataset1](../img/create-dataset1.png)
![create-dataset2](../img/create-dataset2.png)


### Include Files in your Dataset

Add some folders and files to the `mydataset` directory. You can add files of your own, but be careful not to include anything that you want to keep confidential. Also consider the total size of the files you add: the larger they are, the longer the upload will take. Remember that, with the current Zenodo API, each file must be 100 MB or smaller; this will change in the future.

If you do not want to use files of your own, you can download and use the [dummy-data](https://drive.google.com/drive/folders/160N6MCmiKV3g-74idCgyyul9UdoPRO8T?usp=share_link).

After you have added some files and/or folders to `mydataset`, JupyterLab should look something like this:

![my-dataset](../img/my-dataset.png)
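
To check the per-file size limit mentioned above before uploading, a short script can list the files that exceed it. This is a minimal sketch using only the Python standard library. The helper name `find_large_files` is hypothetical, and the limit is assumed to mean 100 × 1000 × 1000 bytes, since the exact byte count is not stated here.

```python
from pathlib import Path

# Assumption: "100 MB" is interpreted as 100 * 1000 * 1000 bytes.
# Adjust this value if the repository counts megabytes differently.
SIZE_LIMIT_BYTES = 100 * 1000 * 1000


def find_large_files(directory, limit=SIZE_LIMIT_BYTES):
    """Return (relative_path, size_in_bytes) for every file larger than limit.

    Hypothetical helper; not part of fairly.
    """
    root = Path(directory)
    large = []
    for path in root.rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                large.append((path.relative_to(root).as_posix(), size))
    return sorted(large)


if __name__ == "__main__":
    dataset_dir = Path("workshop/mydataset")  # directory used in this tutorial
    if dataset_dir.is_dir():
        for name, size in find_large_files(dataset_dir):
            print(f"{name}: {size / 1_000_000:.1f} MB")
```

Any file that is printed would need to be split, compressed, or left out of the dataset before uploading to Zenodo.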


### Editing the Manifest

The `manifest.yaml` file contains several sections that describe the metadata of a dataset. Some of the sections and fields are compulsory (they are required by the research data repository); others are optional. In this example, you started a *fairly* dataset using the template for the Zenodo repository, but you could also use the template for 4TU.ResearchData.

However, if you are not sure which repository you will use to publish a dataset, use the `Default` template. This template contains the most common sections and fields for the repositories supported by the Fairly Toolset.

> Note that, regardless of which template you use to start a dataset, the `manifest.yaml` file is interoperable between data repositories, with very few exceptions. This means that you can use the same manifest file for various data repositories. The different templates are provided only as a guide to the metadata that each data repository accepts.

1. Open the `manifest.yaml` file using the context menu, or by double-clicking on the file

![open-metadata](../img/open-metadata.png)

2. Edit the dataset metadata in the `manifest.yaml` file, as follows. Here, we use only a small subset of the fields that Zenodo accepts.

    ```yaml
    metadata:
      type: dataset
      publication_date: "2023-03-22"
      title: "My Dataset"
      authors:
        - fullname: Your-Surname, Your-Name
          affiliation: Your institution
      description: A dataset from the Fairly Toolset workshop
      access_type: open
      license: CC0-1.0
      doi: ""
      prereserve_doi:
      # Set 'true' to reserve a DOI.
      keywords:
        - workshop
        - dummy data
      notes: ""
      related_identifiers: []
      communities: []
      grants: []
      subjects: []
      version: 1.0.0
      language: eng
    template: zenodo
    files:
      includes:
        - ARP1_.info
        - ARP1_d01.zip
        - my_code.py
        - Survey_AI.csv
        - wind-mill.jpg
      excludes: []
    ```

> The `includes` field must list the files you want to include in the dataset. They will be uploaded to the research data repository. The `excludes` field can be used to explicitly list the files you do not want to include in the dataset, for example, files that contain sensitive information.
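
Typing the `includes` list by hand is error-prone when a dataset has many files. The sketch below prints one list entry per file found in the dataset directory, ready to paste into `manifest.yaml`. It rests on assumptions: the helper name `list_dataset_files` is hypothetical, entries for files in subfolders are assumed to be paths relative to the dataset directory written with forward slashes, and the manifest itself and hidden files are assumed not to belong in the list.

```python
from pathlib import Path


def list_dataset_files(directory, skip=("manifest.yaml",)):
    """Return sorted relative paths of the files in a dataset directory.

    Hypothetical helper; not part of fairly. The manifest and any hidden
    files or folders (names starting with ".") are left out.
    """
    root = Path(directory)
    names = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        is_hidden = any(part.startswith(".") for part in relative.parts)
        if is_hidden or relative.as_posix() in skip:
            continue
        names.append(relative.as_posix())
    return sorted(names)


if __name__ == "__main__":
    dataset_dir = Path("workshop/mydataset")  # directory used in this tutorial
    if dataset_dir.is_dir():
        for name in list_dataset_files(dataset_dir):
            print(f"- {name}")
```

Review the printed list before pasting it: remove anything confidential, and quote names that contain characters with a special meaning in YAML.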

## Part 3: Upload Dataset to Zenodo

In the last part of this tutorial, we explain how to upload a dataset to an existing Zenodo account. If you do not have an account yet, you can [sign up on this webpage](https://zenodo.org/signup/).

### Create Personal Token

A personal token is how a data repository identifies a user. We need to set a token in order to create datasets in the repository and upload files to a specific account.

1. Sign in to Zenodo. 
2. In the top-right corner, click the drop-down arrow, then *Applications*.
3. In the *Personal access tokens* section, click the *New token* button.
4. Enter a name for your token, for example: `workshop`
5. For scopes, check all three boxes, and click *Create*
6. Copy the token (list of characters in red) to somewhere secure. You will only see the token once.
7. Under *Scopes*, check all three boxes once more. Then click *Save*

![token-zenodo](../img/zenodo-token.png)

### Configure Fairly for Uploads

Now, configure *fairly* to use your token.

**Windows**

1. Using the Windows File Explorer, go to `C:\Users\<Your-user-name>\`
2. Create a directory called `.fairly`
3. Inside `.fairly`, create a file called `config.json`. You may need to change the Explorer settings to show file extensions; otherwise the file may end up with a hidden extra extension (for example, `config.json.txt`), and *fairly* will not be able to read the token.
4. Copy the following text into this file, and add your token under **zenodo**

    ```json
    {
        "fairly": {
            "orcid_client_id": "APP-IELS3LR4OCLHLELC",
            "orcid_client_secret": "",
            "orcid_token": ""
        },
        "4tu": {
            "token": "<your-token>"
        },
        "zenodo": {
            "token": "<your-token>"
        }
    }
    ```
5. Save the changes to the file



**Linux/macOS**

1. In your user home directory `~/`, create a hidden directory called `.fairly`
2. Inside `~/.fairly`, create a file called `config.json`
3. Copy the following text into this file, and add your token under **zenodo**

    ```json
    {
        "fairly": {
            "orcid_client_id": "APP-IELS3LR4OCLHLELC",
            "orcid_client_secret": "",
            "orcid_token": ""
        },
        "4tu": {
            "token": "<your-token>"
        },
        "zenodo": {
            "token": "<your-token>"
        }
    }
    ```
4. Save the changes to the file
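
A misplaced or misnamed configuration file is easy to overlook, especially on Windows when file extensions are hidden. The following sketch checks that `config.json` exists in the `.fairly` directory of your home folder, that it contains valid JSON, and that a token has been filled in. It only inspects the file layout described in this tutorial; the helper name `check_fairly_config` is hypothetical and is not part of *fairly*.

```python
import json
from pathlib import Path


def check_fairly_config(home=None, repository="zenodo"):
    """Return a list of problems found with <home>/.fairly/config.json.

    Hypothetical helper; not part of fairly. An empty list means that
    no problem was detected.
    """
    home = Path(home) if home is not None else Path.home()
    config_dir = home / ".fairly"
    config_path = config_dir / "config.json"

    if not config_dir.is_dir():
        return [f"Directory not found: {config_dir}"]
    if not config_path.is_file():
        # Show what is there instead, e.g. a file saved as config.json.txt.
        found = sorted(p.name for p in config_dir.iterdir())
        return [f"File not found: {config_path} (directory contains: {found})"]

    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except ValueError as error:  # covers invalid JSON and decoding errors
        return [f"Cannot read {config_path} as JSON: {error}"]

    section = config.get(repository) if isinstance(config, dict) else None
    token = section.get("token", "") if isinstance(section, dict) else ""
    if not token or token == "<your-token>":
        return [f'No token set under "{repository}" in {config_path}']
    return []


if __name__ == "__main__":
    for problem in check_fairly_config() or ["config.json looks fine"]:
        print(problem)
```

The check never prints the token itself, so its output is safe to share when asking for help.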

### Upload Dataset

Go back to JupyterLab and navigate to the `mydataset` directory.

1. Right-click on the left panel, then click `Upload Dataset`
2. Select Zenodo from the drop-down list, and click `Continue`
3. Confirm that you want to upload the dataset to Zenodo by ticking the checkbox.
4. Click `OK`. The upload will take a moment to complete.
5. Go to your Zenodo account and click on `Upload`. The dataset `My Dataset` should be listed there.

![zenodo-upload](../img/zenodo-upload.png)

> Explore the dataset and notice that all the files and metadata you added in JupyterLab have been automatically added to the new dataset. You should also notice that the dataset is not **published**; this is on purpose. It gives you the opportunity to review the dataset before deciding to publish, and it prevents users from publishing a dataset by mistake.

> Note that, in the current version of the JupyterLab extension, repeating the steps to upload a dataset will create a new entry in the repository. In the future, we will develop the extension further so that it can update existing datasets and synchronize changes.