<h1>OpenMSI Arrayed Analysis Tool</h1>
<h2>Introduction</h2>
Mass spectrometry imaging (MSI) enables the mass analysis of thousands of spatially defined samples and can be applied to high-throughput screening of, for example, enzyme activity or compound libraries. Here we present the OpenMSI Arrayed Analysis Tool, an iPython-based software tool for the analysis of spatially defined samples with MSI.

This tutorial notebook ([name notebook](link to notebook in repository)) demonstrates the basic features of the OpenMSI Arrayed Analysis Tool, with step-by-step guidance on how to run the iPython Notebook. A version without markdown is also available ([name notebook](link to notebook in repository)). To complete this tutorial, you need:
<ul>
<li>Jupyter/iPython. The OpenMSI Arrayed Analysis Tool requires Jupyter version 4.1 and Python version 2.7. Further information on iPython/Jupyter can be found at http://ipython.org/. </li>
<li>An OpenMSI account. Users need to get an OpenMSI account in order to use this tool. An OpenMSI account can be obtained through the OpenMSI team and NERSC (https://openmsi.nersc.gov/openmsi/client/omsiAccount) at no cost.</li>
</ul>

You don't have to provide an MSI data file. This tutorial uses the [name dataset](link to dataset in openmsi) MSI data set (as used in the [manuscript](link to paper)), which is publicly available. 

<h2>How to use this iPython/Jupyter Notebook</h2>
*An interactive demo for new users of iPython/Jupyter notebooks can be found at [Nature](http://www.nature.com/news/ipython-interactive-demo-7.21492)*

This tutorial contains two types of content: text and code. The content is placed in boxes called "cells". If you click around on this page, you'll see different cells highlighted. To execute a cell (regardless of its content), press SHIFT+ENTER on your keyboard or click the play button. If the cell contains text, the text is displayed directly. If the cell contains code, the code is executed. 

<h2>Loading Arrayed Analysis Tool</h2>
Execute the cell below to load the OpenMSI Arrayed Analysis Tool. 
<p>_--When successful, the message "Completed loading OpenMSI Arrayed Analysis Toolkit" will appear--_

In [1]:
#load the code. Since it's specialized ipython notebook code, use '%run' rather than 'import'
%run Arrayed_Analysis_Tools.ipy

Completed loading OpenMSI Arrayed Analysis Toolkit


<h2>Log into OpenMSI</h2>
Execute the cell below. When the cell is executed, you are asked to enter your NERSC/OpenMSI username. 
<p>After entering the username, you are asked to enter your NERSC/OpenMSI password. _--If login is successful, the message "Login appears to be successful!" will appear--_<p>
<p> __--Logging in is not required for this tutorial. However, to analyze your own files, logging into OpenMSI is required--__

In [2]:
#log into OpenMSI.nersc.gov
openMSIsession = login()

Login appears to be successful!


<h2>File and ion selection</h2>
After a successful login, executing the file selector cell displays a list of the user's available OpenMSI files. _--If you are not logged into OpenMSI, only the publicly available OpenMSI files will be displayed, including the file used in this tutorial--_
<ul>
<li> First, select the file (name file) by clicking on the file name.
<li> Then, enter the Experiment Index and Data Index of the data you want to analyze. For this tutorial, enter '0' for both indexes. 
<li> Next, provide the m/z values of the ions for analysis. Enter the m/z value in the 'Add an ion' box, and click the 'Add Ion' button. The value will appear in the box 'Select which ions you want to load'. For the tutorial, add the following m/z values: ...-... 
<li> Ions can be removed by first clicking on the m/z value in the 'Select which ions you want to load' box and then on the 'Remove Ion' button.
<li> Then, enter the integration range, that is, the amount above and below each entered m/z value over which the signal is integrated. Users can choose between 'absolute m/z values' or '% of m/z'. For this tutorial select 'absolute m/z values' and set it at ... .
<li> Last, a base image has to be generated from the given parameters. To do this, click on the 'Load Image!' button. _--After clicking on the 'Load Image!' button, the line "Loading image... " will appear. The progress of loading the separate ions will be displayed, in the form of "loading ion 1 of x. m/z = x"--_ ___--When loading has completed, the message "Image has been loaded." will appear--___  
</ul>
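
To make the two integration modes concrete, here is a minimal sketch of how an m/z window could be derived from either setting. The function name `integration_window` and its arguments are hypothetical and not part of the OpenMSI Arrayed Analysis Tool. The sketch assumes that '% of m/z' means the half-width of the window is that percentage of the ion's m/z value.

```python
def integration_window(mz, tolerance, mode="absolute"):
    """Return (low, high) m/z bounds for integrating around `mz`.

    Hypothetical helper for illustration only.
    mode="absolute": tolerance is given in m/z units.
    mode="percent":  tolerance is a percentage of mz (assumed meaning).
    """
    if mode == "absolute":
        half_width = tolerance
    elif mode == "percent":
        half_width = mz * tolerance / 100.0
    else:
        raise ValueError("mode must be 'absolute' or 'percent'")
    return (mz - half_width, mz + half_width)


# Window of +/- 0.5 m/z units around m/z 900
low, high = integration_window(900.0, 0.5)

# Window of +/- 0.05 % around m/z 800
pct_low, pct_high = integration_window(800.0, 0.05, mode="percent")
```

With an absolute tolerance every ion gets the same window width, whereas a percentage tolerance widens the window as m/z increases.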

In [3]:
if "openMSIsession" not in locals():
    openMSIsession=OpenMSIsession()
openMSIsession.imageLoader_with_dialogs() #once loaded the image will be stored in the "img" variable

Stored 'arrayed_analysis_default_filename' (unicode)
Loading image...
loading ion 1 of 4. m/z = 900.000000 +/- 0.500000
loading ion 2 of 4. m/z = 800.899624 +/- 0.500000
loading ion 3 of 4. m/z = 888.212000 +/- 0.500000
loading ion 4 of 4. m/z = 750.654600 +/- 0.500000
Image has been loaded.
Image has been saved in the global 'img' variable.


<h2>Display base image</h2>
To display the generated base image, execute the cell below. The base image will be displayed in a new window. The base image is the ion-intensity visualization of all selected ions and will be used for mask placement. 
<p> _--For this tutorial, this step is optional--_
<p>___--In order to continue with running the next cell in the iPython notebook, the base image figure window has to be closed--___


In [4]:
plt.imshow(img.baseImage,cmap='jet_r',clim=(0.0,np.amax(img.baseImage)/2)) #get rid of the /2 to see a wider range, or divide
                                                                           #by a bigger number if you want a narrower range
plt.colorbar()
putWindowOnTop()
plt.show()

The window should be open now. If you can't see it, check to see if it's behind another window


<h2>Placing trapezoidal mask</h2>
Executing the cell below generates a trapezoidal mask containing individual markers. The size of the trapezoid and the number of markers are determined by the number of rows and columns. For this tutorial, generate a trapezoid with # rows and # columns. When the cell is executed, the base image with the trapezoidal mask will be displayed in a new window.
<p> Then, place the trapezoidal mask roughly over the arrayed samples; the positions of the individual markers are optimized in the next cells. The mask can be moved by dragging the corner markers of the trapezoid (highlighted with red halos) to the preferred position. For this tutorial, place the trapezoidal mask by dragging the corner markers of the trapezoid to the corner samples. Since the top right corner doesn't contain samples, roughly position the top right marker so that the top row and right column markers align with the samples.
<p>__--To continue, the base image figure window has to be closed. The last coordinates of the mask will be stored.--__

In [5]:
#define spot centers as a trapezoid.
img.roughPosition_with_dialogs()

Number of columns? leave blank for default ("2") 
Number of rows? leave blank for default ("10") 
Stored 'arrayed_analysis_columns' (int)
Stored 'arrayed_analysis_rows' (int)
The window should be open now. If you can't see it, check to see if it's behind another window
new spot x and y locations have been saved.
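
One way to picture how four corner markers can define a full grid of markers is bilinear interpolation between the corners. The sketch below illustrates that idea under this assumption; it is not taken from the tool's source code, and `trapezoid_grid` and its arguments are hypothetical names.

```python
def trapezoid_grid(top_left, top_right, bottom_left, bottom_right, n_rows, n_cols):
    """Return n_rows lists of n_cols (x, y) marker positions.

    Positions are bilinearly interpolated between the four corner points.
    Hypothetical helper; the tool may place its markers differently.
    """
    def lerp(p, q, t):
        # Linear interpolation between points p and q, with t in [0, 1]
        return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)

    grid = []
    for r in range(n_rows):
        v = r / float(n_rows - 1) if n_rows > 1 else 0.0
        left = lerp(top_left, bottom_left, v)     # point on the left edge
        right = lerp(top_right, bottom_right, v)  # point on the right edge
        row = []
        for c in range(n_cols):
            u = c / float(n_cols - 1) if n_cols > 1 else 0.0
            row.append(lerp(left, right, u))
        grid.append(row)
    return grid


# 10 rows and 2 columns, the defaults offered by the dialog; corner
# coordinates are made-up pixel positions
markers = trapezoid_grid((10, 10), (40, 12), (8, 190), (42, 195), n_rows=10, n_cols=2)
```

Because every marker is derived from the corners, dragging a single corner shifts all markers proportionally, which is why a rough placement is followed by per-marker optimization.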


<h2>Automatic spot optimization</h2>
In the next cell, the Jupyter notebook will optimize the marker positions. For details on the optimization algorithm, see the method section in the [manuscript](link to paper). For this tutorial, perform the automatic spot optimization.
<ul>
<li> First, enter the integration radius for the individual markers in the mask. For the tutorial, put in x. 
<li> Then, enter the number of rounds of optimization. For this tutorial, put in x. 
<li> Next, enter how many pixels away from the current location the algorithm should search. For this tutorial, put in x.
<li> Then, if you don't want the markers to overlap after optimization, check the box. For this tutorial, check the box.
<li> Next, give weighting values for each ion. For this tutorial, put in 1 for all ions. 
<li> You can calculate the scores for the current marker locations by clicking on the 'Calculate scores for current spot locations' button. Optimal scores will be x. For this tutorial, you can skip this step.
<li> Then, enter the minimum score necessary to move a marker. For this tutorial, put in x. 
<li> Last, the marker positions can be optimized using the given parameters. To do this, click on the 'Optimize Spots!' button. _--After clicking on the button, the progress of the optimization will be displayed. When optimization is completed, the message "optimization routine completed. new spot x and y positions saved." will appear--_
</ul>

<p> _--Performing the spot optimization is optional. Individual markers can be positioned manually in the cell 'Displaying and finetuning optimized marker positioning'--_

In [6]:
#automagically optimize the spot centers to correspond to the actual spots on the image
img.optimizeSpots_with_dialogs()

Stored 'arrayed_analysis_radius' (float)
Stored 'arrayed_analysis_minScore' (float)
done with optimization round 1 of 3
done with optimization round 2 of 3
done with optimization round 3 of 3
optimization routine completed. new spot x and y positions saved.
Stored 'arrayed_analysis_radius' (float)
Stored 'arrayed_analysis_minScore' (float)


SpotOptimizationException: The optimization algorithm was unable to optimize a spot.This could be because there is no signal, because the ion weightingis all zeroes, the overlapDistance is too large in Distance mode, or, in Pixel Overlap mode, spots are overlapping so severely at thebeginning of this routine that it could not find a new location nomore than halfboxsize away that does /not/ overlap with another spot.

Stored 'arrayed_analysis_radius' (float)
Stored 'arrayed_analysis_minScore' (float)
done with optimization round 1 of 3
done with optimization round 2 of 3
done with optimization round 3 of 3
optimization routine completed. new spot x and y positions saved.
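
To give a feel for how the integration radius, search distance and minimum score interact, here is a toy local search for a single marker. It is a simplified sketch and not the tool's algorithm, which is described in the manuscript. All function names are hypothetical, ion weighting and overlap checks are left out, and the score is assumed to be the summed intensity inside the integration radius.

```python
def spot_score(image, cx, cy, radius):
    """Sum the intensities of all pixels within `radius` of (cx, cy)."""
    total = 0.0
    for y in range(len(image)):
        for x in range(len(image[0])):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                total += image[y][x]
    return total


def optimize_spot(image, cx, cy, radius, search_distance, min_score):
    """Move one marker to the best-scoring position within the search box.

    The marker only moves to a candidate that scores higher than the best
    position found so far and also reaches `min_score` (assumed meaning of
    the minimum score setting).
    """
    best = (cx, cy)
    best_score = spot_score(image, cx, cy, radius)
    for dy in range(-search_distance, search_distance + 1):
        for dx in range(-search_distance, search_distance + 1):
            x, y = cx + dx, cy + dy
            if not (0 <= x < len(image[0]) and 0 <= y < len(image)):
                continue  # candidate lies outside the image
            score = spot_score(image, x, y, radius)
            if score > best_score and score >= min_score:
                best, best_score = (x, y), score
    return best


# Toy 7x7 image with a bright 3x3 blob centred at x=4, y=3
image = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(3, 6):
        image[y][x] = 10

new_position = optimize_spot(image, cx=2, cy=2, radius=1, search_distance=2, min_score=1)
print(new_position)
```

In this toy case the marker starts next to the blob and ends up on its centre. A larger search distance lets a marker travel further per round, at the risk of jumping to a neighbouring spot.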


<h2>Displaying and finetuning optimized marker positioning</h2>
To view the optimized marker positioning, execute the cell below. The base image will be displayed in a new window. Individual markers can be moved by dragging them to the preferred position.
<p>If no automatic spot optimization was performed, the trapezoidal mask will be at the same position as before. Individual markers can still be moved by dragging them to the preferred position.
<p>___--In order to continue with running the next cell in the iPython notebook, the base image figure window has to be closed. The last coordinates of the mask will be stored--___

In [10]:
#check the positions of the spots and manually adjust them if need be
img.fineTunePosition(colormap='jet_r')

The window should be open now. If you can't see it, check to see if it's behind another window
new spot x and y locations have been saved.


<h2>Saving Arrayed Image</h2>
If needed, the arrayed image, including the coordinates of the spots, can be stored in a 'pickle' file. First, enter a file name between the quotes after filename=, replacing the example name. Then execute the cell to save the file. 
<p> _--For this tutorial, saving the mask position is optional--_

In [11]:
#Optional: Save the ArrayedImage into a pickle file.
import pickle
filename="bug_fix_pickle.arrayed_img"
with open(filename,"wb") as f: #the with-block closes the file once the dump is done
    pickle.dump(img, f)
print("Done saving.")

Done saving.


<h2>Loading saved mask position</h2>
Saved arrayed images can be loaded from a pickle file. Enter the name of the saved pickle file between the quotes after filename=, replacing the example name. Then, execute the cell to load the file. 
<p> _--For this tutorial, this step is optional--_

In [None]:
#Optional: Load an ArrayedImage from a pickle file. This way you can work off-line
import pickle
filename="filename.arrayed_img"
with open(filename,"rb") as f: #the with-block closes the file after loading
    img=pickle.load(f)
print(img)

<h2>Calculating the spot areas, and final inspection</h2>
Execute the cell below to calculate which pixels belong to which spot. The integration radius of the markers can be adjusted, but if you leave it as-is, the same number you used in the optimization stage will be used.
A visual representation of the marker size and positioning will be generated for visual inspection.
<p>_--The message "x spots generated. number of spots with N pixels:{x: x, x: x}" will appear--_

In [12]:
#You'll need to call this function. It returns a list of spots (where each spot is a list of pixels),
#which is also stored inside the object.
%store -r arrayed_analysis_radius
spots=img.generateSpotList(integrationRadius=arrayed_analysis_radius)
img.showMaskedImage(spotList=spots,alphaRows=True)
#this is the same integration radius that you set in the optimization step

20 spots generated. number of spots with N pixels:{71: 1, 111: 5, 112: 3, 113: 5, 114: 1, 115: 5}
The window should be open now. If you can't see it, check to see if it's behind another window
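
The pixel counts per spot can differ slightly from one spot to the next. A plausible reason, assuming that a pixel belongs to a spot when it lies within the integration radius of the spot centre, is that the count depends on where the centre falls on the pixel grid and on whether the spot is clipped by the image edge. The sketch below illustrates this; `pixels_in_spot` is a hypothetical helper, not a function of the tool.

```python
def pixels_in_spot(cx, cy, radius, width, height):
    """List the (x, y) pixels that lie within `radius` of (cx, cy).

    Pixels outside the image bounds (width x height) are skipped, so a spot
    near the edge ends up with fewer pixels.
    """
    r = int(radius) + 1  # generous bounding box around the centre
    pixels = []
    for y in range(int(cy) - r, int(cy) + r + 1):
        for x in range(int(cx) - r, int(cx) + r + 1):
            inside_image = 0 <= x < width and 0 <= y < height
            if inside_image and (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                pixels.append((x, y))
    return pixels


# Same radius, three different centres (made-up coordinates)
interior = pixels_in_spot(cx=20.0, cy=20.0, radius=6.0, width=100, height=100)
shifted = pixels_in_spot(cx=20.4, cy=20.3, radius=6.0, width=100, height=100)
edge = pixels_in_spot(cx=2.0, cy=20.0, radius=6.0, width=100, height=100)

print(len(interior), len(shifted), len(edge))
```

Comparing the three counts shows how the same radius can yield different numbers of pixels, so small differences between spots are expected rather than a sign of a problem.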


<h2>Exporting results into a tab-separated file</h2>
The next cell saves the results of the arrayed analysis tool as a table in a .tab file. By default, the file is named after the current date and time; to give the file a specific name, enter the name between the quotes after filename=.

The file is tab-separated and can be opened in Excel for further data analysis.

In [13]:
#Write results to a file
#if you don't pass it an explicit spotList it will use the spot set stored in the ArrayedImage
filename="bug_fix_pickle_results_file"
img.writeResultTable(fileName=filename)
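
A tab-separated file can also be read back with Python's standard `csv` module. The sketch below parses a small made-up sample from memory; the column names `spot`, `sum` and `num_pixels` are assumptions for illustration, and the actual layout written by writeResultTable may differ.

```python
import csv
import io

# Made-up stand-in for the contents of a results file; the real layout
# written by writeResultTable may differ.
sample = u"spot\tsum\tnum_pixels\nA01\t3251\t71\nA02\t1549\t113\n"

# delimiter="\t" tells the reader that columns are separated by tabs
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
rows = list(reader)

# Mean intensity per pixel for each spot
mean_per_pixel = {}
for row in rows:
    mean_per_pixel[row["spot"]] = float(row["sum"]) / int(row["num_pixels"])

print(mean_per_pixel)
```

To read an actual file, the `io.StringIO(sample)` object would be replaced by an opened file handle.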

<h2>Using pandas to perform programmatic data analysis</h2>
If you prefer pandas over Excel for data analysis, the resultsDataFrame method returns a pandas DataFrame with the same kind of information that the writeResultTable method writes to file.
In this example, we compute the first ion as a percentage of all loaded ions and plot those percentages, from small to large, using matplotlib. Note that Series.sort_values requires pandas 0.17 or later; with an older pandas the cell stops with the AttributeError shown at the end of the output below. 

In [14]:
df=img.resultsDataFrame(minPixelIntensity=0,alphaRows=True) #generate the dataframe
IPython.display.display(df)
sums_df=df.loc[:,(slice(None),'sum')] #get the sums from the dataframe
sums_df.columns = sums_df.columns.get_level_values(0) #name the columns to make indexing easier later
percentage_firstion=100.0*sums_df[img.ions[0]]/sums_df.sum(axis=1) #calculate the percentage
percentage_firstion=percentage_firstion.sort_values() #rank the data from low to high (needs pandas 0.17 or later)
plt.bar(range(len(percentage_firstion)),percentage_firstion,edgecolor='b') #define a bar chart
plt.xlabel('Spot rank') #set x axis label
plt.ylabel("% m/z={:.1f} of all loaded ions".format(img.ions[0])) #set y axis label
plt.xlim(0,len(percentage_firstion)-1) #set x axis range
plt.show()

ion,900.000000,900.000000,900.000000,900.000000,900.000000,900.000000,800.899624,800.899624,800.899624,800.899624,...,888.212000,888.212000,888.212000,888.212000,750.654600,750.654600,750.654600,750.654600,750.654600,750.654600
descriptor,sum,mean,median,min,max,num_pixels,sum,mean,median,min,...,median,min,max,num_pixels,sum,mean,median,min,max,num_pixels
A01,3251,45.788732,16,1,346,71,13540,190.704225,184.0,8,...,9,3,32,71,5324,74.985915,65,8,245,71
A02,1549,13.707965,9,3,155,113,10518,93.079646,84.0,6,...,6,1,16,113,7279,64.415929,40,4,215,113
B01,3800,33.333333,16,1,346,114,17140,150.350877,135.0,11,...,9,1,32,114,6849,60.078947,38,6,251,114
B02,1613,14.401786,9,1,151,112,11154,99.589286,75.5,6,...,6,1,21,112,5910,52.767857,33,4,171,112
C01,5797,52.225225,40,3,210,111,13627,122.765766,114.0,39,...,39,9,63,111,10235,92.207207,86,21,235,111
C02,5148,44.765217,39,22,169,115,15506,134.834783,120.0,32,...,32,22,55,115,13225,115.0,75,29,316,115
D01,6003,54.081081,45,19,181,111,24654,222.108108,200.0,57,...,35,19,66,111,12051,108.567568,89,35,349,111
D02,4364,38.619469,32,17,187,113,13504,119.504425,106.0,29,...,29,17,47,113,9699,85.831858,60,24,227,113
E01,5495,48.628319,42,19,210,113,28735,254.292035,251.0,80,...,34,21,71,113,15666,138.637168,112,42,374,113
E02,4186,37.711712,34,17,127,111,13332,120.108108,101.0,31,...,27,17,55,111,10901,98.207207,63,27,307,111


AttributeError: 'Series' object has no attribute 'sort_values'
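
The percentage calculation can be tried on a small, self-contained table without an OpenMSI image. The sketch below uses made-up numbers and assumes a recent pandas (0.17 or later, where `sort_values` is available). Only the two-level column layout, with the ion on the first level and the descriptor on the second, mirrors the output above. It uses `DataFrame.xs` instead of the `.loc` slicing in the cell above, as an alternative way to pick the 'sum' columns.

```python
import pandas as pd

# Made-up numbers for two spots and two ions (illustrative only)
columns = pd.MultiIndex.from_product(
    [[900.0, 800.9], ["sum", "num_pixels"]], names=["ion", "descriptor"]
)
df = pd.DataFrame(
    [[300.0, 100, 100.0, 100], [50.0, 110, 150.0, 110]],
    index=["A01", "A02"],
    columns=columns,
)

# Keep only the 'sum' columns; xs drops the descriptor level,
# leaving one column per ion
sums = df.xs("sum", axis=1, level="descriptor")

# First ion as a percentage of all ions, per spot
percentage = 100.0 * sums[900.0] / sums.sum(axis=1)

# sort_values returns a new Series ranked from low to high
ranked = percentage.sort_values()
print(ranked)
```

Assigning the result of `sort_values()` to a new name, rather than sorting in place, keeps the unsorted percentages available for later steps.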