# Download and Upload with OSF

This tutorial shows how to download and upload cryo-EM datasets using the `datasets` module from `ioSPI`, which interacts with the [Open Science Framework (OSF)](https://osf.io/).

OSF is an initiative that aims to increase the openness, reproducibility and integrity of scientific research. Among other functionalities, it lets you upload scientific data, which can then be accessed through an Application Programming Interface (API).

``ioSPI`` offers two ways to upload and access cryo-EM data:
- to get started: the class `OSFProject`, which leverages the package `osfclient`;
- for finer control: the class `OSFUpload`, which follows the OSF API v2.

This tutorial introduces both options.

# Set-up

First, you will need to get set up with OSF.

- Create an account on https://osf.io/ and save the email address you use.
- On this account, create a personal token in [Settings](https://osf.io/settings/tokens) and save it.

The email address and the token will be needed to connect to different OSF projects.

We import the `datasets` module from `ioSPI`:

In [2]:
import sys
sys.path.append('../')

from ioSPI import datasets

# Getting Started

## Configure your credentials to access the OSF Project

Find the OSF project from which you wish to download your data. 

In this tutorial, we use a project called "cryoEM simulated", which contains simulated images of the human 80S ribosome. This project is on OSF at the URL "https://osf.io/7g42j/".

- Save the ID of the project of interest, which appears in the project's url.

In our case, the project ID is `7g42j`.
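Since the project ID is simply the first path segment of the project URL, it can also be extracted programmatically. The following is a minimal sketch using only the standard library; `project_id_from_url` is a hypothetical helper and not part of `ioSPI`.

```python
from urllib.parse import urlparse


def project_id_from_url(url: str) -> str:
    """Return the first path segment of an OSF project URL (hypothetical helper)."""
    path = urlparse(url).path
    # Drop empty segments produced by leading and trailing slashes.
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"No project ID found in URL: {url!r}")
    return segments[0]


print(project_id_from_url("https://osf.io/7g42j/"))
```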

- Create an object from the class `OSFProject` using:
  - your credentials from the set-up: email address and token,
  - the project ID that you just saved.

In [4]:
cryoem_simulated_project = datasets.OSFProject(
    username="ninamio78@gmail.com", 
    token="HBGGBOJcLYQfadEKIOyXJiLTum3ydXK4nGP3KmbkYUeBuYkZma9LPBSYennQn92gjP2NHn",
    project_id="xbr2m")

OSF config written to .osfcli.config!


You have successfully set up the configuration of the OSF project!
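A personal token typed directly into a notebook cell is easy to leak when the notebook is shared. One common alternative is to read the credentials from environment variables. The sketch below assumes two variables named `OSF_USERNAME` and `OSF_TOKEN`; these names, and the helper `load_osf_credentials`, are hypothetical and not defined by `ioSPI`.

```python
import os


def load_osf_credentials(env=None):
    """Read OSF credentials from environment variables.

    OSF_USERNAME and OSF_TOKEN are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("OSF_USERNAME", "OSF_TOKEN") if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return env["OSF_USERNAME"], env["OSF_TOKEN"]


# Demonstration with a plain dict standing in for os.environ.
username, token = load_osf_credentials(
    {"OSF_USERNAME": "me@example.org", "OSF_TOKEN": "secret"}
)
print(username)
```

The returned values could then be passed as the `username` and `token` arguments of `datasets.OSFProject`.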

## List Files on the OSF Project

Now you can list the files available on this OSF project. Note that this code can take a few minutes to run.

In [5]:
cryoem_simulated_project.ls()

Listing files from OSF project: xbr2m...
osfstorage/new_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt


We observe that this project contains many files, organized in different folders.
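When a listing is long, grouping the paths by folder makes it easier to scan. The sketch below assumes the printed paths have been collected into a Python list of strings (whether `ls()` also returns them is not shown here); `group_by_folder` is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_folder(paths):
    """Map each parent folder to the sorted file names it contains."""
    folders = defaultdict(list)
    for path in paths:
        p = PurePosixPath(path)
        folders[str(p.parent)].append(p.name)
    return {folder: sorted(names) for folder, names in folders.items()}


# Two remote paths that appear in this tutorial.
listing = [
    "osfstorage/new_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt",
    "osfstorage/randomrot1D_nodisorder/4v6x_randomrot_copy0_defocus3.0_yes_noise.txt",
]
for folder, names in group_by_folder(listing).items():
    print(folder, names)
```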

## Download Files from the OSF Project

We can download one of these files, e.g. choosing from the above list the following txt file:

- `osfstorage/randomrot1D_nodisorder/4v6x_randomrot_copy0_defocus3.0_yes_noise.txt`.


In [4]:
cryoem_simulated_project.download(
    remote_path="osfstorage/randomrot1D_nodisorder/4v6x_randomrot_copy0_defocus3.0_yes_noise.txt", 
    local_path="4v6x_randomrot_copy0_defocus3.0_yes_noise.txt")

Downloading osfstorage/randomrot1D_nodisorder/4v6x_randomrot_copy0_defocus3.0_yes_noise.txt to 4v6x_randomrot_copy0_defocus3.0_yes_noise.txt...
Done!


100%|██████████| 4.22k/4.22k [00:00<00:00, 19.1Mbytes/s]
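After a download, it can be worth confirming that the local file exists and is not empty before using it. The following is a minimal sketch with the standard library only; `downloaded_size` is a hypothetical helper, and the demonstration writes a small temporary file instead of contacting OSF.

```python
import tempfile
from pathlib import Path


def downloaded_size(local_path):
    """Return the size in bytes of a downloaded file; raise if missing or empty."""
    path = Path(local_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} was not downloaded")
    size = path.stat().st_size
    if size == 0:
        raise ValueError(f"{path} is empty")
    return size


# Demonstration on a temporary file standing in for the downloaded data.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / "demo.txt"
    demo_file.write_bytes(b"hello")
    size = downloaded_size(demo_file)
print(size)
```

In the tutorial, the same check would be applied to `4v6x_randomrot_copy0_defocus3.0_yes_noise.txt`.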


## Upload Files to an OSF Project

Importantly, OSF will not let you upload data to just any project: you need write access to it.

To test this functionality, you can create a new project through osf.io (https://osf.io/myprojects/) by clicking: `Create project`.

This will create a new project page, like the one we are using here.
- Save the project ID of the project you just created!

You should then create a new `my_project` object of the class `datasets.OSFProject` with the new project ID.

For the purpose of this tutorial, however, we will stay with our original project cryoEM simulated and use our object `cryoem_simulated_project`.

We re-upload the file that we just downloaded, renaming it by adding a `new_` prefix to its name.

In [None]:
cryoem_simulated_project.upload("osfstorage/new_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt", "4v6x_randomrot_copy0_defocus3.0_yes_noise.txt")


We then delete the re-uploaded copy so that the project is left unchanged.

In [6]:

cryoem_simulated_project.delete("osfstorage/new_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt")


Deleting osfstorage/new_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt in the project...
Done!


For finer control, we can instead use the ``OSFUpload`` class, which takes care of uploading data to ``osf.io``. We will first create a child node inside the parent node, which corresponds to our root directory. Then, we will upload the file to this child node. To start, let's create an instance of ``OSFUpload``, providing the personal token and the project ID.

In [None]:

osf = datasets.OSFUpload(token=cryoem_simulated_project.token, data_node_guid=cryoem_simulated_project.project_id)
print(osf.headers)
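For context, OSF API v2 is served under `https://api.osf.io/v2/` and authenticates personal access tokens with a bearer `Authorization` header. The sketch below prepares, without sending, a request for a node's metadata using only the standard library. That `osf.headers` contains a header of this form is an assumption; `build_node_request` and the token value are hypothetical.

```python
import urllib.request

API_BASE = "https://api.osf.io/v2"


def build_node_request(node_guid, token):
    """Prepare (but do not send) a GET request for a node's metadata."""
    return urllib.request.Request(
        f"{API_BASE}/nodes/{node_guid}/",
        headers={"Authorization": f"Bearer {token}"},
    )


# "my-token" is a placeholder, not a real credential.
request = build_node_request("xbr2m", "my-token")
print(request.full_url)
```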

Now we create a child node inside the parent node to represent the dataset with the 80S ribosome data. We will use the PDB ID as the name of this new node, and the function will return the node's ID. Since the child is also a node, it can be accessed separately from the parent node.

In [6]:
pdb_id = '4v6x'
child_guid = osf.write_child_node(parent_guid=cryoem_simulated_project.project_id, title=pdb_id)
print(child_guid)

8wz6g
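To give an idea of what creating a child node involves at the API level, the sketch below builds a JSON:API request body of the kind OSF API v2 expects when a node is created. How `write_child_node` actually constructs its request is not shown in this tutorial, so the helper `child_node_payload` and the `"data"` category are assumptions made for illustration.

```python
import json


def child_node_payload(title, category="data"):
    """Build a JSON:API body for creating a node (category is an assumed default)."""
    return {
        "data": {
            "type": "nodes",
            "attributes": {"title": title, "category": category},
        }
    }


body = json.dumps(child_node_payload("4v6x"))
print(body)
```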


Finally, we upload the files related to the 80S ribosome to the child node using the ``write_files`` function. Note that ``file_paths`` must be a list, so the file paths need to be wrapped in ``[]`` even when there is only one.

In [7]:
osf.write_files(child_guid, ["4v6x_randomrot_copy0_defocus3.0_yes_noise.txt"])

Uploaded 4v6x_randomrot_copy0_defocus3.0_yes_noise.txt 


True
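Because ``write_files`` expects a list, passing a bare string is an easy mistake to make. A small wrapper can normalize the argument before the call. This is a sketch only; `as_path_list` is a hypothetical helper, not part of `ioSPI`.

```python
import os


def as_path_list(file_paths):
    """Return file_paths as a list, wrapping a single path if needed."""
    if isinstance(file_paths, (str, os.PathLike)):
        return [file_paths]
    return list(file_paths)


print(as_path_list("4v6x_randomrot_copy0_defocus3.0_yes_noise.txt"))
```

The result could then be passed as the second argument of `osf.write_files`.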

We can now check that it was uploaded correctly using the function `read_structure_guid`, which returns an ID corresponding to the PDB ID passed as a parameter.

In [8]:
osf.read_structure_guid(pdb_id)

'ezh4k'

Congratulations! You have successfully downloaded and uploaded data from/to OSF.