# ioSPI
This tutorial walks through the basic pipeline of ``ioSPI``, a library that provides functionality for working with cryo-EM data.

## Particle Metadata

In the first part of the tutorial we show how to create a `.star` file using the `particle_metadata` module. This module formats particle metadata and writes it to `.star` files, following RELION conventions.

In [1]:
import os
import sys
import warnings

sys.path.append(os.path.dirname(os.getcwd()))
warnings.filterwarnings('ignore')

To create the `.star` file, we need to provide information about the experiment, such as the image pixel size and the image center shift. This information is passed as a list of values and a `Config` object.

In [5]:
from ioSPI import particle_metadata

class Config:
    """Class to instantiate the config object."""
    def __init__(self, ctf, shift):
        self.ctf = ctf
        self.shift = shift

In [6]:
data_list = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]]
config = Config(ctf=True, shift=True)

The names of the metadata fields for the `.star` file (in RELION convention) can be obtained by passing a `Config` object to the function `get_starfile_metadata_names`.

In [7]:
variable_names = particle_metadata.get_starfile_metadata_names(config)
print(variable_names)

['__rlnImageName', '__rlnAngleRot', '__rlnAngleTilt', '__rlnAnglePsi', '__rlnOriginX', '__rlnOriginY', '__rlnDefocusU', '__rlnDefocusV', '__rlnDefocusAngle', '__rlnVoltage', '__rlnImagePixelSize', '__rlnSphericalAberration', '__rlnAmplitudeContrast', '__rlnCtfBfactor']
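
Each inner list of `data_list` holds one value per name, in the same order. As a minimal sketch of that correspondence (plain Python with no ioSPI call; the names are copied from the printed output above), the values can be paired with their labels:

```python
# Metadata names copied from the printed output above.
names = [
    "__rlnImageName", "__rlnAngleRot", "__rlnAngleTilt", "__rlnAnglePsi",
    "__rlnOriginX", "__rlnOriginY", "__rlnDefocusU", "__rlnDefocusV",
    "__rlnDefocusAngle", "__rlnVoltage", "__rlnImagePixelSize",
    "__rlnSphericalAberration", "__rlnAmplitudeContrast", "__rlnCtfBfactor",
]

# One particle, written in the same order as the names.
particle = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

# zip() silently drops extra items, so check the lengths first.
if len(particle) != len(names):
    raise ValueError("expected one value per metadata name")

record = dict(zip(names, particle))
print(record["__rlnVoltage"])  # 10
```

A check like this catches a row with a missing or extra value before it reaches the formatting step.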


Given the list of values and the `Config` object, the function `format_metadata_for_writing_cryoem_convention` arranges the data into a dataframe, which is later saved as the `.star` file.

In [8]:
metadata_df = particle_metadata.format_metadata_for_writing_cryoem_convention(data_list=data_list, config=config)
metadata_df

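| | `__rlnImageName` | `__rlnAngleRot` | `__rlnAngleTilt` | `__rlnAnglePsi` | `__rlnOriginX` | `__rlnOriginY` | `__rlnDefocusU` | `__rlnDefocusV` | `__rlnDefocusAngle` | `__rlnVoltage` | `__rlnImagePixelSize` | `__rlnSphericalAberration` | `__rlnAmplitudeContrast` | `__rlnCtfBfactor` |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |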


After formatting the data, we can write it to disk with the function `write_metadata_to_starfile`, providing the metadata, the output directory, and the name of the `.star` file.

In [9]:
metadata_path = os.path.join(os.getcwd(), "data")
filename = "metadata.star"
particle_metadata.write_metadata_to_starfile(path=metadata_path, metadata=metadata_df, filename=filename)

Finally, we check that a file named `metadata.star` was created, using the function `check_star_file`, which raises an exception if the file is not found.

In [10]:
particle_metadata.check_star_file(os.path.join(metadata_path, filename))
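
The implementation of `check_star_file` is not shown in this notebook. As a rough sketch of the kind of check described above, the hypothetical helper `check_star_exists` below (not part of ioSPI) raises an exception when the path does not point to an existing `.star` file. It runs in a temporary directory so that it does not touch the tutorial's `data` folder.

```python
import os
import tempfile


def check_star_exists(path):
    """Hypothetical stand-in: raise if `path` is not an existing .star file."""
    if not path.endswith(".star"):
        raise ValueError(f"{path!r} does not have a .star extension")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no .star file found at {path!r}")


with tempfile.TemporaryDirectory() as tmp_dir:
    star_path = os.path.join(tmp_dir, "metadata.star")

    # The file has not been written yet, so the check must fail.
    try:
        check_star_exists(star_path)
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True

    # After writing the file, the same check passes silently.
    with open(star_path, "w") as handle:
        handle.write("data_\n")
    check_star_exists(star_path)
    found_after_write = True
```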

The file was created successfully, as we can see by printing its contents.

In [11]:
with open(os.path.join(metadata_path, filename)) as star_file:
    # The with block closes the file on exit, so no explicit close() is needed.
    print(star_file.read())

# Created by the starfile Python package (version 0.4.11) at 11:16:57 on 09/03/2022

data_

loop_
___rlnImageName #1
___rlnAngleRot #2
___rlnAngleTilt #3
___rlnAnglePsi #4
___rlnOriginX #5
___rlnOriginY #6
___rlnDefocusU #7
___rlnDefocusV #8
___rlnDefocusAngle #9
___rlnVoltage #10
___rlnImagePixelSize #11
___rlnSphericalAberration #12
___rlnAmplitudeContrast #13
___rlnCtfBfactor #14
1	2	3	4	5	6	7	8	9	10	11	12	13	14
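
To make the layout of this output concrete, here is a minimal sketch of a reader for a file with a single `loop_` block like the one printed above. The function `parse_simple_star` is hypothetical and deliberately limited: it ignores comments and the `data_` line, and it does not handle multiple data blocks or quoted values. For real files, a dedicated reader such as the `starfile` package named in the header comment is the better choice.

```python
def parse_simple_star(text):
    """Parse one `loop_` block and return (labels, rows) as lists of strings."""
    labels, rows = [], []
    in_loop = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and the data block header.
        if not line or line.startswith("#") or line.startswith("data_"):
            continue
        if line == "loop_":
            in_loop = True
            continue
        if not in_loop:
            continue
        if line.startswith("_") and not rows:
            # Label lines look like "___rlnVoltage #10": keep the name only.
            labels.append(line.split()[0])
        else:
            # Data lines hold whitespace-separated values, one per label.
            rows.append(line.split())
    return labels, rows


example = """data_

loop_
___rlnImageName #1
___rlnAngleRot #2
1\t2
"""
labels, rows = parse_simple_star(example)
print(labels)  # ['___rlnImageName', '___rlnAngleRot']
print(rows)    # [['1', '2']]
```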





## Datasets

In the second part of the tutorial we show how to manage cryo-EM datasets using the `datasets` module from `ioSPI`. This module builds on the Open Science Framework (OSF), a platform that aims to increase the openness, reproducibility, and integrity of scientific research. Among other things, OSF lets users upload scientific data, which can then be accessed through an Application Programming Interface (API). ``ioSPI`` offers functions for uploading and accessing cryo-EM data through the OSF API v2. This notebook is by no means a tutorial for the OSF API, but we introduce the basic concepts used here. For more information, see <https://developer.osf.io/>. The two main OSF concepts that we use are:

* GUID: Every file, project, and component on the OSF gets a Globally Unique ID (GUID). The GUID is the five characters after <https://osf.io/> in the web address. For instance, we provide cryo-EM datasets for different proteins under the GUID 24htr.
* Nodes: On OSF, projects and components are called nodes. A node can be either public or private, and it is where the data is stored. For example, <https://osf.io/24htr/> is a node containing cryo-EM data.
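
The relation between a web address, its GUID, and the corresponding API address can be sketched in a few lines. The helper names `guid_from_url` and `node_endpoint` are hypothetical. The endpoint pattern `https://api.osf.io/v2/nodes/<guid>/` follows the OSF API v2 documentation and should be checked there before relying on it. No network request is made.

```python
from urllib.parse import urlparse

API_ROOT = "https://api.osf.io/v2"


def guid_from_url(url):
    """Return the first path segment of an osf.io address, i.e. the GUID."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    if not segments:
        raise ValueError(f"no GUID found in {url!r}")
    return segments[0]


def node_endpoint(node_guid):
    """Build the API v2 address of a node from its GUID."""
    return f"{API_ROOT}/nodes/{node_guid}/"


example_guid = guid_from_url("https://osf.io/24htr/")
print(example_guid)                 # 24htr
print(node_endpoint(example_guid))  # https://api.osf.io/v2/nodes/24htr/
```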

In [12]:
from ioSPI import datasets

Before following this part of the tutorial, you need to create your own node for uploading cryo-EM data and generate an access token. The access token determines which actions its holder can perform. With both in place, we can instantiate an `OSFUpload` object, which accesses the OSF node through the OSF API, by providing the GUID of the node and the access token.

In [14]:
token = 'PLACE_YOUR_TOKEN_HERE'

guid = 'PLACE_YOUR_PARENT_GUID_HERE'

osf = datasets.OSFUpload(token=token, data_node_guid=guid)
print(osf.headers)
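
The contents of `osf.headers` are not shown in this notebook. OSF authenticates API requests with a personal access token sent as a bearer token, so headers of roughly the following shape are a reasonable guess. This is a sketch under that assumption, not ioSPI's actual code. The function `build_headers` and the environment variable name `OSF_TOKEN` are hypothetical; reading the token from the environment avoids storing it in the notebook.

```python
import os


def build_headers(access_token):
    """Return HTTP headers that carry an OSF personal access token."""
    return {"Authorization": f"Bearer {access_token}"}


# OSF_TOKEN is a hypothetical variable name; fall back to the placeholder.
example_token = os.environ.get("OSF_TOKEN", "PLACE_YOUR_TOKEN_HERE")
example_headers = build_headers(example_token)

# Print only the header names so the token itself never appears in the output.
print(list(example_headers))  # ['Authorization']
```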




Now we create a child node inside the parent node to represent the dataset with 80S ribosome data. We use the PDB ID as the name of this new node, and the function returns its GUID. Since the child is also a node, it can be accessed separately from the parent node.

In [15]:
pdb_id = '4v6x'
child_guid = osf.write_child_node(parent_guid=guid, title=pdb_id)
print(child_guid)

8yg32
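
What `write_child_node` sends to OSF is not shown in this notebook. As a sketch of the underlying idea, the OSF API v2 documentation describes creating a component by posting a JSON document to the `children` endpoint of the parent node. The helper `child_node_request` is hypothetical, the category `"data"` is an assumed choice, and the endpoint and payload shape should be verified against the OSF documentation. Nothing is sent over the network here.

```python
import json

API_ROOT = "https://api.osf.io/v2"


def child_node_request(parent_guid, title, category="data"):
    """Return (url, body) for a request that would create a child node."""
    url = f"{API_ROOT}/nodes/{parent_guid}/children/"
    body = {
        "data": {
            "type": "nodes",
            "attributes": {"title": title, "category": category},
        }
    }
    return url, body


# "PARENT" stands in for the GUID of your own parent node.
request_url, request_body = child_node_request("PARENT", "4v6x")
print(request_url)
print(json.dumps(request_body, indent=2))
```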


Having created a child node for the files related to the 80S ribosome, we can upload files to it using the function `write_files`.

In [16]:
# Create a list of the files to be uploaded
file_paths = [os.path.join(os.getcwd(), pdb_id + '.pdb')]

# Write the files to the child node
osf.write_files(dataset_guid=child_guid, file_paths=file_paths)

Uploaded c:\Users\Luis\Documents\GitHub\ioSPI\notebooks\4v6x.pdb 


True
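
How `write_files` performs the upload is not shown in this notebook. According to the OSF documentation, files are uploaded with a `PUT` request to the file storage service at `files.osf.io`, with the target file name passed as a query parameter. The sketch below only builds such an address; the helper `upload_url` is hypothetical, and the URL pattern is an assumption to verify against the OSF documentation.

```python
import os
from urllib.parse import urlencode

FILES_ROOT = "https://files.osf.io/v1/resources"


def upload_url(node_guid, file_path):
    """Build the address for uploading `file_path` to a node's OSF storage."""
    query = urlencode({"kind": "file", "name": os.path.basename(file_path)})
    return f"{FILES_ROOT}/{node_guid}/providers/osfstorage/?{query}"


print(upload_url("8yg32", "notebooks/4v6x.pdb"))
# https://files.osf.io/v1/resources/8yg32/providers/osfstorage/?kind=file&name=4v6x.pdb
```

Only the base name of the local path ends up in the address, so the directory layout on the uploading machine is not exposed on OSF.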

We can now check that the upload worked by confirming that the function `read_structure_guid` returns the GUID of the node matching the PDB ID passed as a parameter.

In [17]:
osf.read_structure_guid(pdb_id)

'8yg32'
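
A lookup of this kind amounts to searching the children of the parent node for a matching title. The sketch below shows the idea on a hand-written dictionary shaped like a JSON:API list response (`data`, `id`, `attributes`). The response content is illustrative, not real OSF output, and the helper `find_guid_by_title` is hypothetical rather than ioSPI's implementation.

```python
def find_guid_by_title(children_response, title):
    """Return the id of the first node whose title matches, or None."""
    for node in children_response.get("data", []):
        if node.get("attributes", {}).get("title") == title:
            return node["id"]
    return None


# Hand-written stand-in for an API response; "abc12" and "1xyz" are made up.
example_response = {
    "data": [
        {"id": "abc12", "type": "nodes", "attributes": {"title": "1xyz"}},
        {"id": "8yg32", "type": "nodes", "attributes": {"title": "4v6x"}},
    ]
}
print(find_guid_by_title(example_response, "4v6x"))  # 8yg32
```

Returning `None` for a missing title lets the caller decide whether an absent dataset is an error or a cue to create the node.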