# **<center>SPARC FAIR Codeathon 2022</center>**
<center>
<a href="https://sparc.science">
<img src="https://sparc.science/_nuxt/img/logo-sparc-wave-primary.8ed83a5.svg" alt="SPARC" width="150"/>
</a>
</center>
<center>
<a href="https://sparc.science/help/2022-sparc-fair-codeathon">
<img src="https://images.ctfassets.net/6bya4tyw8399/2qgsOmFnm7wYIfRrPrqbgx/ae3255858aa12bfcebb52e95c7cacffe/codeathon-graphic.png" alt="FAIR" width="75">
</a>
</center>

## <center>Mapping 2D **SPARC** data points to a 3D scaffold: a tutorial</center>


## **Introduction**
Welcome to the Quilted tutorial! We will demonstrate several features of the [**SPARC**](https://sparc.science/) project. The goal is to project the 2D locations of neurites in the rat stomach onto a 3D scaffold of the organ. Both the data points and the 3D scaffold are pulled from **SPARC** datasets. Because the data are [**FAIR**](https://www.nature.com/articles/sdata201618), we can combine three different datasets describing the spatial distribution of the vagal afferents and efferents.


## **Setting up a virtual environment**
We assume that you have already followed the instructions in the **Getting started** section of the [README](https://github.com/SPARC-FAIR-Codeathon/SPARC-Tutorial).
The first step is to create a virtual environment in which to run this tutorial. We will install the package needed to set up the virtual environment.
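
One possible way to create such an environment is Python's built-in `venv` module. The sketch below is only an illustration and may differ from the setup described in the README; the helper name `create_environment` and the folder name `sparc-tutorial-env` are assumptions made for this example.

```python
import venv
from pathlib import Path


def create_environment(env_dir, with_pip=True):
    """Create a virtual environment in env_dir and return its path.

    Hypothetical helper for illustration; it is not part of the tutorial code.
    """
    env_path = Path(env_dir)
    # venv.create builds the folder structure and writes pyvenv.cfg
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

Calling `create_environment('sparc-tutorial-env')` creates the folder. The environment can then be activated with `source sparc-tutorial-env/bin/activate` on Linux and macOS, or `sparc-tutorial-env\Scripts\activate` on Windows.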

## **Installing the dependencies**
This tutorial relies on several Python packages that have been developed as part of the **SPARC** project, and we will install them in order to complete the tutorial. A GUI application is also available for each of these packages; links to the setup instructions for each one will be provided.

In [9]:
!pip install pandas
!pip install openpyxl

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable
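
After installation, it can be useful to confirm that the packages are visible to the interpreter running the notebook. The following is a minimal sketch using only the standard library; the helper name `check_packages` is hypothetical.

```python
from importlib.util import find_spec


def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


# pandas reads the tables; openpyxl is the engine pandas uses for .xlsx files
status = check_packages(["pandas", "openpyxl"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If a package is reported as missing, re-running the corresponding `pip install` cell in the same environment is usually the first thing to try.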


## **Retrieving the data**
Now that all the dependencies have been installed, we will retrieve the data directly from the [**SPARC**](https://sparc.science) project website. 
We will be using three datasets:
 * [vagal afferents associated with the myenteric plexus of the rat stomach](https://sparc.science/datasets/10?type=dataset&datasetDetailsTab=files)
 * [vagal afferents within the longitudinal and circular muscle layers of the rat stomach](https://sparc.science/datasets/11?type=dataset&datasetDetailsTab=files)
 * [vagal efferents associated with the myenteric plexus of the rat stomach](https://sparc.science/datasets/12?type=dataset&datasetDetailsTab=files)
 
You can search through all of the **SPARC** datasets [here](https://sparc.science/data?type=dataset) or simply click on the links above to be redirected directly to the datasets. 

It is possible to download the entire dataset by clicking the purple ***Download full dataset*** button in the **Download Dataset** tab, or to select specific files and folders in the **Dataset Files** tab lower on the page. If you haven't used the links above, you can click the purple ***Get Dataset*** button on the left side of the screen or go directly to the ***Files*** tab. 

For this tutorial, we are only interested in the contents of the _derivative_ folder, which contains two .xlsx files: one with the data (IGLE_data.xlsx, IMA_analyzed_data.xlsx, or Efferent_data.xlsx, depending on the dataset) and a manifest (manifest.xlsx). Open the _derivative_ folder and select the .xlsx file containing the data by ticking the box in front of it. Download the file by clicking the **Download Selected Files and Folders** button at the bottom. You will then be prompted to select the location in which to save it. For each dataset, save the file in the _SPARC-tutorial_ folder. 
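
Before moving on, a quick check that the three data files ended up in the right place can save some confusion later. This is a minimal sketch, assuming the notebook is run from inside the _SPARC-tutorial_ folder; the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# file names taken from the three datasets described above
EXPECTED_FILES = ["IGLE_data.xlsx", "IMA_analyzed_data.xlsx", "Efferent_data.xlsx"]


def missing_files(folder, names=EXPECTED_FILES):
    """Return the names from `names` that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


# assumption: the current working directory is the SPARC-tutorial folder
missing = missing_files(".")
if missing:
    print("Still to download:", ", ".join(missing))
else:
    print("All three data files are in place.")
```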

### **Pennsieve**
[Pennsieve](https://app.pennsieve.io/) is the cloud-based solution for managing, analysing, and sharing scientific **SPARC** datasets.

### **Imports**
Here we import all of the dependencies that we will need to run the code correctly.

In [30]:
import pandas as pd

Let us start by loading the downloaded data into Python.
For this we are going to define some helper functions that rely on the pandas library. The main one, `load_data`, takes as arguments the name of the .xlsx file we wish to load, the names of the columns we want to keep, and the limits for the y and z directions. Its output is a DataFrame called df with the desired columns.

In [31]:
def get_position(percent, min_val, max_val):
    """ Converts the position from percentage to distance
    
    Input:
    percent -- float, percentage value.
    min_val -- float, minimum distance for conversion.
    max_val -- float, maximum distance for conversion.
    
    Return:
    converted_value -- float, converted value.
    
    """
    return percent / 100 * (max_val - min_val) + min_val 

def load_data(data_name, col_keeps, y_lims, z_lims):
    """ Loads the data from an .xlsx file
    
    Input:
    data_name -- str, name of the .xlsx file to read.
    col_keeps -- dict{str:str}, dictionary mapping the names of the columns
        to keep to their new, shorter names.
    y_lims -- list[float], limits for the y direction to convert back to mm,
        first element is the minimum and second is the maximum.
    z_lims -- list[float], limits for the z direction to convert back to mm,
        first element is the minimum and second is the maximum.
    
    Return:
    df -- DataFrame, data frame containing the desired data.
    
    """
    df = pd.read_excel(data_name)
    # keep only the columns listed in col_keeps and give them their short names
    keep = [col for col in df.columns if col in col_keeps]
    df = df[keep].rename(columns=col_keeps)
    # convert the percentages to distances
    df['y'] = get_position(df['%y'], y_lims[0], y_lims[1])
    df['z'] = get_position(df['%x'], z_lims[0], z_lims[1]) # x becomes z
    # complement of the y percentage (100 - %y)
    df['-%y'] = 100 - df['%y']
    # the area column is kept as read from the file (no unit conversion)
    return df

Now, let us set up some variables that we will need to prepare the data for mapping and plotting. In these datasets, the positions are given as percentages, so we need to define the reference limits used to convert them back to mm.

In [32]:
## set up the limits (in mm) for the z and y directions
z_lims = [0, 36.7]
y_lims = [4.6, 0]  # listed from 4.6 down to 0: 0 % maps to 4.6 and 100 % maps to 0

col_keeps = {'%x (distance from pylorus side)':'%x', '%y (distance from bottom)':'%y',
             'Average IGLE Area (um²)':'area', 'Area Of Innervation':'area', 
             'Neuron Area Of Innervation (um²) -Convex Hull':'area'}
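
To make the percentage-to-distance conversion concrete, here is a small worked example using the limits defined above. The function is repeated so that the snippet runs on its own, and the sample percentages (0, 50 and 100) are illustrative rather than values from the datasets.

```python
def get_position(percent, min_val, max_val):
    """Same formula as in the tutorial: percentage -> distance."""
    return percent / 100 * (max_val - min_val) + min_val


z_lims = [0, 36.7]
y_lims = [4.6, 0]

# a point halfway along the x axis lands in the middle of the z range
z_mid = get_position(50, z_lims[0], z_lims[1])      # 0.5 * 36.7 = 18.35

# y_lims is listed from 4.6 down to 0, so the y axis is flipped
y_at_0 = get_position(0, y_lims[0], y_lims[1])      # 4.6
y_at_100 = get_position(100, y_lims[0], y_lims[1])  # 0.0

print(z_mid, y_at_0, y_at_100)
```

Note that because `y_lims` is given in decreasing order, a larger `%y` value yields a smaller `y` coordinate.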

We can now load the locations of the nerves into DataFrames:

In [33]:
df_igle = load_data('IGLE_data.xlsx', col_keeps, y_lims, z_lims)
df_ima = load_data('IMA_analyzed_data.xlsx', col_keeps, y_lims, z_lims)
df_efferent = load_data('Efferent_data.xlsx', col_keeps, y_lims, z_lims) 
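
To see what the resulting DataFrames look like without downloading the files, the sketch below applies the same column selection, renaming and conversion that `load_data` performs to a small in-memory table. The numeric values and the extra `Notes` column are made up for illustration and do not come from the SPARC datasets.

```python
import pandas as pd

# illustrative values only; these are not taken from the SPARC datasets
raw = pd.DataFrame({
    '%x (distance from pylorus side)': [0.0, 50.0, 100.0],
    '%y (distance from bottom)': [0.0, 25.0, 100.0],
    'Average IGLE Area (um²)': [1000.0, 2000.0, 3000.0],
    'Notes': ['a', 'b', 'c'],  # hypothetical extra column that gets dropped
})

col_keeps = {'%x (distance from pylorus side)': '%x',
             '%y (distance from bottom)': '%y',
             'Average IGLE Area (um²)': 'area'}
z_lims = [0, 36.7]
y_lims = [4.6, 0]

# same steps as load_data, minus the file reading
df = raw[[c for c in raw.columns if c in col_keeps]].rename(columns=col_keeps)
df['y'] = df['%y'] / 100 * (y_lims[1] - y_lims[0]) + y_lims[0]
df['z'] = df['%x'] / 100 * (z_lims[1] - z_lims[0]) + z_lims[0]  # x becomes z
df['-%y'] = 100 - df['%y']

print(df.columns.tolist())
print(df)
```

On the real data, `df_igle.head()` gives a similar quick look at the first rows of a loaded dataset.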