# BioImagePy: experiment data processing

In this tutorial, we are going to write a pipeline for batch processing data stored in an `experiment`. **BioImagePy** comes with a class called `Pipeline`. This class runs on an `experiment`. We add a sequence of processes to the pipeline and run them on data selected by simple queries.

## The experiment

In this tutorial, we will use the same experiment as the one created in *tutorial1-experiment*. The goal of the experiment is explained in *tutorial1*. As a reminder, we have two populations of noisy images containing spots. In this tutorial, we are going to segment and count the spots, and then run a statistical test to conclude whether the two populations have significantly different numbers of spots.

In [None]:
# BioImagePy import and initialisation
import sys
sys.path.append("../bioimagepy") # change this path to the directory where you installed bioimagepy

from bioimagepy.config import ConfigAccess
from bioimagepy.process import ProcessAccess
ConfigAccess('../bioimagepy/config_sample.json') 

In [None]:
# in this cell we create the experiment, then import and annotate the data
from bioimagepy.experiment import Experiment

# create the experiment
my_experiment = Experiment()
my_experiment.create(name="myexperiment", 
                     author="Sylvain Prigent", 
                     uri="../userdata/") 

# import the data
my_experiment.import_dir(dir_uri='./synthetic_data/data/',
                         filter=r"\.tif$",  # raw string so that \. is passed to the regex unchanged
                         author='Sylvain Prigent', 
                         format='tif', 
                         date='2019-03-17', 
                         copy_data=True)

# tag population
my_experiment.tag_from_name("Population", ['population1', 'population2'] )
# tag ID
my_experiment.tag_using_seperator(tag="ID", separator="_", value_position=1) 

my_experiment.display()
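
To make the import filter and the two tagging calls more concrete, here is a minimal pure-Python sketch of the same idea applied to a list of file names. It does not use BioImagePy. The file names, the helper names `guess_tag_from_name` and `split_tag`, and the choice to drop the extension before splitting are all assumptions made for illustration; the real methods may behave differently.

```python
import os
import re

# Hypothetical file names; the real content of ./synthetic_data/data/ may differ.
file_names = [
    "population1_001.tif",
    "population1_002.tif",
    "population2_001.tif",
    "notes.txt",
]

# Keep only the names matching the same regular expression as the import filter.
tif_files = [name for name in file_names if re.search(r"\.tif$", name)]


def guess_tag_from_name(name, candidates):
    """Return the first candidate value contained in the file name, or None."""
    for value in candidates:
        if value in name:
            return value
    return None


def split_tag(name, separator, value_position):
    """Split the name (extension removed, an assumption) and return one part."""
    stem, _extension = os.path.splitext(name)
    return stem.split(separator)[value_position]


tags = {
    name: {
        "Population": guess_tag_from_name(name, ["population1", "population2"]),
        "ID": split_tag(name, "_", 1),
    }
    for name in tif_files
}

for name, name_tags in tags.items():
    print(name, name_tags)
```

Each image ends up with a `Population` tag and an `ID` tag, which is the kind of key/value annotation the queries later in this tutorial rely on.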

## The Pipeline class

The **BioImagePy** pipeline module contains the main class `Pipeline`. It is the only class we are going to use to process the data. Let's look at the module documentation:

In [None]:
import bioimagepy.pipeline as pipelinepy

help(pipelinepy)

### Initialize the pipeline 


In [None]:
from bioimagepy.pipeline import Pipeline
pipeline = Pipeline(my_experiment)


## Processing

In the rest of this tutorial, we are going to process the data in 3 steps:
1. Image deconvolution on each image to ease the spot segmentation
2. Auto-thresholding and particle analysis on the deconvolved images
3. Statistical testing with the Wilcoxon test to conclude whether the two populations have significantly different numbers of spots

These 3 steps are just one possible way to analyse the data, chosen to illustrate the use of BioImagePy. Many other processing workflows are possible for this dataset, but exploring them is not the purpose of this tutorial.

### Step 1: image deconvolution

To make the spots easier to segment, we chose to preprocess the data with a deconvolution algorithm. The selected algorithm is SVDeconv2D, implemented in C++. It is available in **BioImagePy** with the ID `svdeconv2d_v0.1.0`.

We thus add a process with the name `svdeconv2d_v0.1.0`. To set up the process, we need 3 methods:
1. `set_parameters` to set all the process parameters as a flat list of name, value pairs
2. `set_dataset_name` to set the name of the dataset where the outputs will be stored. This method is optional: if you do not call it, the process name is used as the default dataset name
3. `add_input` for each input of the process. Here we have only one input, `i`, the input image. We set it to the dataset `data` with an empty filter in order to process all the data in the raw dataset called `data`

Finally, we call the `run` method to run the process.
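
The "name, value" calling convention of `set_parameters` can be pictured with a small stand-alone sketch. The helper `pairs_to_dict` below is hypothetical and is not part of BioImagePy; it only shows how a flat argument list alternating names and values maps to a dictionary, using the parameter values chosen for the deconvolution step in this tutorial.

```python
def pairs_to_dict(*args):
    """Turn a flat (name, value, name, value, ...) argument list into a dict."""
    if len(args) % 2 != 0:
        raise ValueError("expected name, value pairs, got an odd number of arguments")
    # Even positions hold the names, odd positions hold the values.
    return dict(zip(args[::2], args[1::2]))


params = pairs_to_dict('sigma', 2, 'regularization', 2, 'weighting', 0.1, 'method', 'SV')
print(params)
```

Rejecting an odd number of arguments early is a design choice of this sketch: it catches a forgotten value before the process is run.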

In [None]:
# visualize the inputs and parameters of the svdeconv2d process
svdeconv2d = ProcessAccess().get('svdeconv2d_v0.1.0')
svdeconv2d.man()

In [None]:
# run the process
process1 = pipeline.add_process(svdeconv2d)
process1.set_parameters('sigma', 2, 'regularization', 2, 'weighting', 0.1, 'method', 'SV')
process1.set_dataset_name('deconv')
process1.add_input('i', 'data')
process1.run()

Let's visualize the results:

In [None]:
from bioimagepy.data import ProcessedData
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# get one processed data item and its origin raw data
processed_data = ProcessedData(my_experiment.get_data('deconv', 'Population=population1 AND ID=001')[0])
origin_data = processed_data.get_origin()

# plot the origin data
img=mpimg.imread(origin_data.metadata.uri)
imgplot = plt.imshow(img)
plt.show()

# plot the processed data
img=mpimg.imread(processed_data.metadata.uri)
imgplot = plt.imshow(img)
plt.show()
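
The query string passed to `get_data` in the cell above combines `tag=value` conditions with `AND`. As a rough illustration of this kind of selection, the sketch below filters a list of tagged items in pure Python. The functions `parse_query` and `select`, the item structure, and the file names are hypothetical; BioImagePy's actual query engine may support more than this.

```python
def parse_query(query):
    """Split 'A=1 AND B=2' into {'A': '1', 'B': '2'}; an empty query gives {}."""
    conditions = {}
    for part in query.split(" AND "):
        part = part.strip()
        if part:
            key, value = part.split("=", 1)
            conditions[key.strip()] = value.strip()
    return conditions


def select(items, query):
    """Return the items whose tags satisfy every condition of the query."""
    conditions = parse_query(query)
    return [
        item for item in items
        if all(item["tags"].get(key) == value for key, value in conditions.items())
    ]


# Hypothetical tagged items standing in for the data of an experiment.
items = [
    {"name": "population1_001.tif", "tags": {"Population": "population1", "ID": "001"}},
    {"name": "population1_002.tif", "tags": {"Population": "population1", "ID": "002"}},
    {"name": "population2_001.tif", "tags": {"Population": "population2", "ID": "001"}},
]

selected = select(items, "Population=population1 AND ID=001")
everything = select(items, "")  # an empty filter keeps all items

print([item["name"] for item in selected])
print(len(everything))
```

In this sketch an empty query yields no conditions, so every item matches. This mirrors the empty filter used to process the whole `data` dataset in step 1.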

### Step 2: Image segmentation

In this step, we apply an automatic threshold and a particle analysis to each image in order to obtain the number of spots it contains.

The selected algorithm is a Fiji macro that runs an auto-threshold and the Analyze Particles tool. It is available in **BioImagePy** with the ID `threshold_particles_v1.0.0`.

We can then run it the same way as the previous process:

In [None]:
# visualize the inputs and parameters of the threshold_particles process
threshold_particles = ProcessAccess().get('threshold_particles_v1.0.0')
threshold_particles.man()

In [None]:
# run the process
process2 = pipeline.add_process(threshold_particles)
process2.set_parameters('threshold', 'Default')
process2.set_dataset_name('particles')
process2.add_input('input', 'deconv')
process2.run()
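
For intuition about what this step computes, the sketch below thresholds a tiny hand-written image and counts its connected bright regions in pure Python. It is not the Fiji macro: the intermeans (IsoData-style) threshold and the 4-connected labelling are stand-ins chosen for illustration, and the pixel values and the function names `intermeans_threshold` and `count_particles` are made up for this example.

```python
from collections import deque


def intermeans_threshold(pixels, tol=0.5, max_iter=100):
    """Iterate t = midpoint of the mean below t and the mean above t."""
    t = sum(pixels) / len(pixels)
    for _ in range(max_iter):
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            break
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
    return t


def count_particles(image, threshold):
    """Count 4-connected groups of pixels strictly above the threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # New particle found: flood-fill it so it is counted only once.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and image[ny][nx] > threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Made-up 5x5 image: dark background with two bright spots.
image = [
    [10, 12, 11, 10, 10],
    [11, 200, 210, 10, 12],
    [10, 205, 12, 11, 10],
    [12, 10, 11, 190, 11],
    [10, 11, 10, 195, 10],
]

pixels = [p for row in image for p in row]
threshold = intermeans_threshold(pixels)
n_spots = count_particles(image, threshold)
print("threshold:", threshold, "spots:", n_spots)
```

The per-image spot count produced this way plays the role of the `count` result that the statistical test consumes in the next step.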

### Step 3: Statistical test

In the previous step, the segmentation algorithm calculated the number of spots in each image. We now need to perform a statistical test to determine whether the number of spots differs significantly between the two populations.

We will use the Wilcoxon test available in **BioImagePy** using the `wilcoxon.xml` wrapper. 


In [None]:
process3 = pipeline.add_process(ProcessAccess().get('Wilcoxon_v1.0.0'))
process3.set_dataset_name('wilcoxon')
process3.add_input('x', 'particles', 'Population=population1', 'count')
process3.add_input('y', 'particles', 'Population=population2', 'count')
process3.run()

With the code above, the Wilcoxon test has been performed between *population1* and *population2* using the `count` result stored in the `particles` dataset.

Because the `Wilcoxon_v1.0.0` process works on merged data, we do not need to manually merge the results into an array file: **BioImagePy** does it for us.

Now we can read the results of the Wilcoxon test:

In [None]:
wilcoxon_pvalue_file = ProcessedData(my_experiment.get_data('wilcoxon', 'name=p')[0]).metadata.uri

with open(wilcoxon_pvalue_file, 'r') as content_file:
    p = content_file.read().strip()  # drop any trailing newline
print('p-value=', p)
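
This tutorial does not show which variant of the test `Wilcoxon_v1.0.0` implements. Assuming it is the unpaired Wilcoxon rank-sum test (two independent samples), the sketch below shows how such a p-value can be computed in pure Python with the normal approximation and no tie correction. The spot counts and the function names `average_ranks` and `rank_sum_test` are made up for illustration and are not results of this experiment.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test, normal approximation, no tie correction."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # sum of the ranks of the first sample
    expected = n1 * (n1 + n2 + 1) / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / std
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal law
    return z, p


# Made-up spot counts for the two populations.
counts_pop1 = [12, 15, 11, 14, 13, 16, 12, 15]
counts_pop2 = [21, 19, 24, 22, 20, 23, 25, 18]

z, p_value = rank_sum_test(counts_pop1, counts_pop2)
print("z =", z, "p-value =", p_value)
```

A small p-value, below a significance level fixed in advance such as 0.05, indicates that the spot counts of the two populations are unlikely to come from the same distribution.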