<h1>OpenMSI Arrayed Analysis Tools</h1>
<h2>Introduction</h2>
Mass spectrometry imaging (MSI) enables the mass analysis of thousands of spatially defined samples and can be applied in high-throughput screening of, for example, enzyme activity or compound libraries. Here we present OpenMSI Arrayed Analysis Tools, a Python-based software tool for the analysis of spatially defined samples with MSI.

This tutorial notebook ([Tutorial_OpenMSI_Arrayed_Analysis_Tools](link to notebook in repository)) demonstrates the basic features of OpenMSI Arrayed Analysis Tools, with step-by-step guidance on how to run the Jupyter/iPython notebook. A standard version without the tutorial markdown is also available ([OpenMSI_Arrayed_Analysis_Tools](link to notebook in repository)), as is a version without markdown that showcases advanced capabilities ([Advanced_OpenMSI_Arrayed_Analyis_Tools](link to notebook in repository)). To complete this tutorial, you will need:
<ul>
<li>Jupyter/iPython. OpenMSI Arrayed Analysis Tools requires Jupyter version 4.1+ and Python version 2.7+ or 3.2+. Further information on iPython/Jupyter can be found at http://ipython.org/.</li>
<li>The following Python packages:<ul>
<li>numpy</li>
<li>matplotlib</li>
<li>requests</li>
<li>pandas</li>
<li>future (for Python 2.7)</li>
</ul></li>
<li>An OpenMSI account. You need an OpenMSI account to analyze your own data with this tool. An account can be obtained at no cost through the OpenMSI team and NERSC (https://openmsi.nersc.gov/openmsi/client/omsiAccount).</li>
</ul>

If the interactive dialogs do not appear, try starting Jupyter explicitly from the command line with the "jupyter notebook" command. Many shortcuts call the deprecated "ipython notebook" command instead.

You do not need to provide your own MSI data file. This tutorial uses the publicly available [name dataset](link to dataset in openmsi) MSI data set (as used in the [manuscript](link to paper)).

<h2>How to use this iPython/Jupyter Notebook</h2>
*An interactive demo for new users of iPython/Jupyter notebooks can be found at [Nature](http://www.nature.com/news/ipython-interactive-demo-7.21492)*

In this tutorial, there are two types of content: text and code. This content is placed in boxes called "cells". If you click around on this page, you'll see different cells highlighted. To execute a cell (regardless of its content), press SHIFT+ENTER or click the play button. If the cell contains text, the content is displayed directly; if it contains code, the code is executed.

<h2>Cell 1. Loading Arrayed Analysis Tool</h2>
Execute the cell below to load the OpenMSI Arrayed Analysis Tool. It should load in less than 30 seconds.
<p>_--When successful, the message "Completed loading OpenMSI Arrayed Analysis Toolkit" will appear--_

In [1]:
#load the code. Since it's specialized ipython notebook code, use '%run' rather than 'import'
%run omaat_lib.ipy

Completed loading OpenMSI Arrayed Analysis Toolkit


<h2>Cell 2. Log into OpenMSI</h2>
Execute the cell below. When the cell is executed, you are asked to enter your NERSC/OpenMSI username.
<p>After entering your username, you are asked to enter your NERSC/OpenMSI password. _--If login is successful, the message "Login appears to be successful!" will appear--_<p>
<p> __--Login is not required for this tutorial. However, logging into OpenMSI is required in order to analyze your own files--__

In [3]:
#log into OpenMSI.nersc.gov
openMSIsession = login()

Login appears to be successful!


<h2>Cell 3. File and ion selection</h2>
After a successful login, executing the file selector cell displays a list of the OpenMSI files available to you. _--If you are not logged into OpenMSI, only the publicly available OpenMSI files will be displayed, including the file used in this tutorial--_
<ul>
<li> First, select the file (name file) by clicking on the file name.
<li> Then, enter the Experiment Index and Data Index of the file you want to analyze. For this tutorial, enter '0' for both indices.
<li> Next, provide the m/z values of the ions to analyze. Enter an m/z value in the 'Add an ion' box and click the 'Add Ion' button. The value will appear in the 'Select which ions you want to load' box. For the tutorial, add the following m/z values: ...-...
<li> Ions can be removed by first clicking on the m/z value in the 'Select which ions you want to load' box and then on the 'Remove Ion' button.
<li> Then, enter the integration window: the intensity of each ion is integrated over +/- this amount around its m/z value. You can choose between 'absolute m/z values' and '% of m/z'. For this tutorial, select 'absolute m/z values' and set it to ... .
<li> Last, generate the base image from these parameters by clicking the 'Load Image!' button. _--After clicking the 'Load Image!' button, the line "Loading image... " will appear, followed by the progress of loading the individual ions in the form "loading ion 1 of x. m/z = x"--_ ___--When loading has completed, the message "Image has been loaded." will appear--___
</ul>
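The integration window set in the step above can be sketched as follows. This is a minimal illustration, not the toolkit's code: the helper `integration_window` is hypothetical, and we assume that '% of m/z' means the half-width of the window is that percentage of each ion's m/z value.

```python
def integration_window(mz, tolerance, mode="absolute"):
    """Return the (low, high) m/z bounds integrated for one ion (hypothetical helper).

    mode="absolute": tolerance is in m/z units, e.g. 0.5 -> mz +/- 0.5
    mode="percent":  tolerance is a percentage of mz, e.g. 0.05 -> mz +/- 0.05 %
    """
    if mode == "absolute":
        half_width = tolerance
    elif mode == "percent":
        half_width = mz * tolerance / 100.0
    else:
        raise ValueError("mode must be 'absolute' or 'percent'")
    return (mz - half_width, mz + half_width)


# The ions and the +/- 0.5 window from the example output of the next cell
for ion_mz in [1141.35, 1143.05, 1241.25]:
    low, high = integration_window(ion_mz, 0.5)
    print("m/z %.2f: integrate from %.2f to %.2f" % (ion_mz, low, high))
```

With a percentage window, heavier ions get proportionally wider windows, whereas an absolute window is the same width for every ion.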

In [4]:
if "openMSIsession" not in locals():
    openMSIsession=OpenMSIsession()
openMSIsession.imageLoader_with_dialogs() #once loaded the image will be stored in the "img" variable

Stored 'arrayed_analysis_default_filename' (unicode)
Loading image...
loading ion 1 of 3. m/z = 1141.350000 +/- 0.500000
Time to load ion: 0.166079998016 seconds
loading ion 2 of 3. m/z = 1143.050000 +/- 0.500000
Time to load ion: 0.160965919495 seconds
loading ion 3 of 3. m/z = 1241.250000 +/- 0.500000
Time to load ion: 0.169579029083 seconds
Image has been loaded.
Image has been saved in the global 'img' variable.


<h2>Cell 4. Display base image</h2>
In order to display the generated base image, execute the cell below. The base image will be displayed in a new window. The base image is the ion-intensity visualization of all selected ions and will be used for mask placement.
<p> _--For this tutorial, this step is optional--_
<p>___--In order to continue running the next cell in the iPython notebook, the base image figure window has to be closed--___


In [5]:
color_map="hot_r" #want to try a different color map? change it here, run the cell, and all functions below will use it
marker_color="blue"
Zzoom=1 #set this to a higher number to see a narrower range of values (useful if one pixel is way brighter than the rest)
plt.imshow(img.baseImage,cmap=color_map,clim=(0.0,np.amax(img.baseImage)/Zzoom))
plt.colorbar()
putWindowOnTop()
plt.show()

The window should be open now. If you can't see it, check to see if it's behind another window
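The `Zzoom` setting in the cell above only changes the upper color limit to `max / Zzoom`; pixels above that limit are drawn in the top color. The sketch below uses a hypothetical synthetic image and helper names of our own to show why raising `Zzoom` helps when one pixel is much brighter than the rest.

```python
import numpy as np

def color_limits(image, zoom=1.0):
    """Color limits as used in the cell above: (0, max(image) / zoom)."""
    return (0.0, np.amax(image) / zoom)

def saturated_fraction(image, zoom=1.0):
    """Fraction of pixels above the upper limit (drawn in the top color)."""
    upper = color_limits(image, zoom)[1]
    return float(np.mean(image > upper))

def background_level(image, background, zoom=1.0):
    """Where a background value falls on the color scale (0 = bottom, 1 = top)."""
    upper = color_limits(image, zoom)[1]
    return min(background / upper, 1.0)

# Hypothetical image: background of 1.0 with a single very bright pixel
demo_image = np.ones((10, 10))
demo_image[0, 0] = 100.0
for zoom in (1, 10, 50):
    print(zoom, color_limits(demo_image, zoom),
          saturated_fraction(demo_image, zoom), background_level(demo_image, 1.0, zoom))
```

At `Zzoom=1` the background sits at 1% of the color scale and is practically invisible; at 50 it sits halfway up the scale, at the cost of saturating the single bright pixel.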


<h2>Cell 5. Placing trapezoidal mask</h2>
Executing the cell below generates a trapezoidal mask containing individual markers. The size of the trapezoid and the number of markers are determined by the number of rows and columns. For this tutorial, generate a trapezoid with # rows and # columns. When the cell is executed, the base image with the trapezoidal mask will be displayed in a new window.
<p> Next, roughly place the trapezoidal mask over the arrayed samples; the positions of the individual markers will be optimized in the following cells. The mask can be moved by dragging the corner markers of the trapezoid (highlighted with red halos) to the preferred position. For this tutorial, drag the corner markers of the trapezoid to the corner samples. Since the top right corner does not contain a sample, position the top right marker so that the markers in the top row and the right column align with the samples.
<p>The Hexagonal Offset, which can be a decimal number, moves every other row that many spots to the right. If you want a traditional hexagonally tiled mask, set the Hexagonal Offset to 0.5 to move every other row to the right, or set it to -0.5 to move every other row to the left. 
<p>__--To continue, the base image figure window has to be closed. The last coordinates of the mask will be stored.--__
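To build intuition for the row, column, and Hexagonal Offset settings, here is a minimal sketch of how spot centers could be laid out inside a four-cornered mask. The bilinear interpolation between the corners, the choice to shift the odd-numbered rows, and the function name `trapezoid_spot_centers` are our assumptions for illustration; the toolkit's `roughPosition` method may place spots differently.

```python
import numpy as np

def trapezoid_spot_centers(corners, rows, columns, hex_offset=0.0):
    """Place rows x columns spot centers inside a four-cornered mask (hypothetical).

    corners: (top_left, top_right, bottom_left, bottom_right) as (x, y) pairs.
    hex_offset: rows 1, 3, 5, ... are shifted this many spot spacings to the
    right (negative values shift them to the left).
    Returns an array of shape (rows * columns, 2) holding (x, y) centers.
    """
    tl, tr, bl, br = (np.asarray(c, dtype=float) for c in corners)
    centers = []
    for r in range(rows):
        v = r / float(rows - 1) if rows > 1 else 0.0
        left = tl + v * (bl - tl)    # point on the left edge for this row
        right = tr + v * (br - tr)   # point on the right edge for this row
        step = (right - left) / (columns - 1) if columns > 1 else np.zeros(2)
        shift = hex_offset if r % 2 == 1 else 0.0
        for c in range(columns):
            centers.append(left + (c + shift) * step)
    return np.array(centers)


# Hypothetical corner positions, as you might drag them in the mask window
corners = [(10, 12), (90, 10), (12, 70), (88, 72)]
print(trapezoid_spot_centers(corners, rows=2, columns=3, hex_offset=0.5))
```

Because each row is interpolated between the left and right edges, a skewed (trapezoidal) array is handled with the same four corner points as a rectangular one.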

In [6]:
#define spot centers as a trapezoid.
#if you want to pass rows/columns as arguments, or choose the colormap, use the img.roughPosition() method instead.
img.roughPosition_with_dialogs(colormap=color_map,markercolor=marker_color)

Number of columns? leave blank for default ("2") 
Number of rows? leave blank for default ("2") 
Hexagonal Offset? This shifts every other line by this many spots. leave blank for default ("0.000000") 
Stored 'arrayed_analysis_columns' (int)
Stored 'arrayed_analysis_rows' (int)
Stored 'arrayed_analysis_offset' (float)
The window should be open now. If you can't see it, check to see if it's behind another window
new spot x and y locations have been saved.


<h2>Cell 6. Automatic spot optimization</h2>
In the next cell, the Jupyter notebook will optimize the marker positions. For details on the optimization algorithm, see the Methods section of the [manuscript](link to paper). For this tutorial, perform the automatic spot optimization.
<ul>
<li> First, enter the integration radius for the individual markers in the mask. For the tutorial, enter x.
<li> Then, enter the number of rounds of optimization. For this tutorial, enter x.
<li> Next, enter the search distance: how many pixels away from its current location the algorithm should search for a better marker position. For this tutorial, enter x.
<li> Then, if you do not want the markers to overlap after optimization, check the box. For this tutorial, check the box.
<li> Next, enter a weighting value for each ion. For this tutorial, enter 1 for all ions.
<li> You can calculate the scores for the current marker locations by clicking the 'Calculate scores for current spot locations' button. Optimal scores will be x. For this tutorial, you can skip this step.
<li> Then, enter the minimum score necessary to move a marker. For this tutorial, enter x.
<li> Last, optimize the marker positions with the given parameters by clicking the 'Optimize Spots!' button. _--After clicking the button, the progress of the optimization will be displayed. When optimization is complete, the message "optimization routine completed. new spot x and y positions saved." will appear--_
</ul>

<p> _--Performing the spot optimization is optional. Individual markers can also be positioned manually in Cell 7, 'Displaying and fine-tuning optimized marker positioning'--_
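The roles of the integration radius, number of rounds, search distance, ion weights, and minimum score can be illustrated with a simple greedy local search. This is a hypothetical sketch, not the algorithm described in the manuscript: each spot is scored as the weighted sum of ion intensities within the radius and moved to the best-scoring position within the search distance if that score reaches the minimum score. The overlap-prevention option is not implemented here.

```python
import numpy as np

def spot_score(ion_images, weights, x, y, radius):
    """Weighted sum of the intensities of all pixels within `radius` of (x, y)."""
    n_rows, n_cols = ion_images[0].shape
    yy, xx = np.mgrid[0:n_rows, 0:n_cols]
    inside = (xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2
    return sum(w * ion_image[inside].sum() for ion_image, w in zip(ion_images, weights))

def optimize_spots(ion_images, weights, centers, radius, rounds, search, min_score):
    """Greedy local search over integer offsets (hypothetical, for illustration)."""
    centers = [tuple(c) for c in centers]
    for _ in range(rounds):
        for i, (x, y) in enumerate(centers):
            best_score, best_x, best_y = spot_score(ion_images, weights, x, y, radius), x, y
            # try every position within +/- `search` pixels of the current center
            for dx in range(-search, search + 1):
                for dy in range(-search, search + 1):
                    s = spot_score(ion_images, weights, x + dx, y + dy, radius)
                    if s > best_score:
                        best_score, best_x, best_y = s, x + dx, y + dy
            if best_score >= min_score:  # only move spots that reach the minimum score
                centers[i] = (best_x, best_y)
    return centers


# Hypothetical single-ion image with a bright 3x3 spot centered at x=12, y=7
demo_ion = np.zeros((20, 20))
demo_ion[6:9, 11:14] = 10.0
print(optimize_spots([demo_ion], [1.0], [(10, 10)], radius=1, rounds=3, search=3, min_score=1.0))
```

Each round can move a marker by at most the search distance, which is why several rounds may be needed when the rough placement is far off.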

In [7]:
#automagically optimize the spot centers to correspond to the actual spots on the image
img.optimizeSpots_with_dialogs()

Stored 'arrayed_analysis_radius' (float)
Stored 'arrayed_analysis_minScore' (float)
done with optimization round 1 of 3
done with optimization round 2 of 3
done with optimization round 3 of 3
optimization routine completed. new spot x and y positions saved.


<h2>Cell 7. Displaying and fine-tuning optimized marker positioning</h2>
In order to view the optimized marker positions, execute the cell below. The base image will be displayed in a new window. Individual markers can be moved by dragging them to the preferred position.
<p>The radius of the circular spot markers in this tool is not necessarily the same as your actual integration radius, though we have tried our best to make it a reasonable approximation. Use the "Calculating the spot areas" cell below to see the shapes of your actual calculated spots.</p>
<p>If no automatic spot optimization was performed, the trapezoidal mask will remain at the position where you placed it. Individual markers can still be moved by dragging them to the preferred position.
<p>___--In order to continue running the next cell in the iPython notebook, the base image figure window has to be closed. The last coordinates of the mask will be stored--___

In [8]:
#check the positions of the spots and manually adjust them if need be
radius=arrayed_analysis_radius if ("arrayed_analysis_radius" in locals()) else 2
img.fineTunePosition(markerRadius=radius,colormap=color_map,markercolor=marker_color)

The window should be open now. If you can't see it, check to see if it's behind another window
new spot x and y locations have been saved.


<h2>Cell 8. Saving Arrayed Image</h2>
If needed, the arrayed image, including the coordinates of the spots, can be stored in a 'pickle' file. First, enter a file name between the quotes after filename=, replacing filename.arrayed_img. Then execute the cell to save the file.
<p> _--For this tutorial, saving the mask position is optional--_

In [9]:
#Optional: Save the ArrayedImage into a pickle file.
filename="filename.arrayed_img"
import pickle
with open(filename,"wb") as pickle_file: #the 'with' block closes the file when done
    pickle.dump(img, pickle_file)
print("Done saving.")

Done saving.


<h2>Cell 9. Loading saved mask position</h2>
Saved arrayed images can be loaded from a pickle file. Enter the name of the saved pickle file between the quotes after filename=, replacing filename.arrayed_img. Then execute the cell to load the file.
<p> _--For this tutorial, this step is optional--_

In [10]:
#Optional: Load an ArrayedImage from a pickle file. This way you can work off-line
filename="filename.arrayed_img"
import pickle
with open(filename,"rb") as pickle_file: #the 'with' block closes the file when done
    img=pickle.load(pickle_file)
print(img)

ArrayedImage based on 20120913_nimzyme.h5
Ions loaded: [1141.35, 1143.05, 1241.25]
# of spot locations defined: 4
# of spot pixel masks defined: None


<h2>Cell 10. Calculating the spot areas, and final inspection</h2>
Execute the cell below to calculate which pixels belong to which spot. The integration radius of the markers can be adjusted; if you leave it as-is, the radius you used in the optimization step will be used.
A visual representation of the marker sizes and positions will be generated for visual inspection.
<p>_--The message "x spots generated. number of spots with N pixels:{x: x, x: x}" will appear--_
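A minimal sketch of the pixel assignment and the "number of spots with N pixels" summary follows, assuming that a pixel belongs to a spot when it lies within the integration radius of the spot center (overlapping spots are not handled here). The helper names are hypothetical, and spot centers are given as (x, y) = (column, row). With this rule, a spot of radius 2 centered on a pixel contains 13 pixels, and spots cut off by the image edge contain fewer.

```python
from collections import Counter

def spot_pixels(center, radius, shape):
    """(row, col) pixels lying within `radius` of center=(x, y), clipped to the image."""
    x, y = center
    n_rows, n_cols = shape
    return [(r, c) for r in range(n_rows) for c in range(n_cols)
            if (c - x) ** 2 + (r - y) ** 2 <= radius ** 2]

def assign_spot_pixels(centers, radius, shape):
    """One pixel list per spot, plus a histogram {pixel count: number of spots}."""
    spot_list = [spot_pixels(center, radius, shape) for center in centers]
    return spot_list, Counter(len(pixels) for pixels in spot_list)


# Hypothetical 10x10 image: one spot in the interior, one on the left edge
demo_spot_list, size_histogram = assign_spot_pixels([(5, 5), (0, 5)], radius=2, shape=(10, 10))
print("%d spots generated. number of spots with N pixels:%s"
      % (len(demo_spot_list), dict(size_histogram)))
```

A histogram in which most spots share the same pixel count, with a few outliers, is a quick hint that some markers sit near an image edge or off-center.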

In [11]:
#You'll need to call this function. It returns a list of spots (where each spot is a list of pixels),
#which is also stored inside the object.
%store -r arrayed_analysis_radius
#this is the same integration radius that you set in the optimization step
spots=img.generateSpotList(integrationRadius=arrayed_analysis_radius)
img.showMaskedImage(spotList=spots,alphaRows=True)

4 spots generated. number of spots with N pixels:{11: 1, 13: 3}
The window should be open now. If you can't see it, check to see if it's behind another window


<h2>Cell 11. Exporting results into a comma-separated text (csv) file</h2>
The next cell saves the results of the arrayed analysis as a table in a .csv file. By default, the file is named after the current date and time; to give the file a specific name, enter it between the quotes after filename=.

The result is a comma-separated file that you can open in Excel for further data analysis.
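The spot names in the results table (A01, A02, B01, ...) appear to follow a spreadsheet-style convention when `alphaRows=True`: a letter for the row and a two-digit number for the column. The sketch below reproduces that naming and writes a small table with Python 3's `csv` module. The helpers and column names are hypothetical, and this is not the toolkit's `writeResultTable` implementation.

```python
import csv
import io
import string

def spot_label(row, col):
    """Spreadsheet-style spot name: (row 0, col 0) -> 'A01', (1, 1) -> 'B02'.

    Only 26 rows (A-Z) are supported by this simple version.
    """
    return "%s%02d" % (string.ascii_uppercase[row], col + 1)

def write_spot_table(records, fileobj):
    """Write a header line and then one CSV line per spot (records is a list of dicts)."""
    writer = csv.DictWriter(fileobj, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)


# Hypothetical two-spot table using 'sum' values from the results shown in Cell 12
records = [{"spot": spot_label(0, 0), "sum_1141.35": 35893.0},
           {"spot": spot_label(0, 1), "sum_1141.35": 44428.0}]
buffer = io.StringIO()  # for a real file, use open("results.csv", "w", newline="")
write_spot_table(records, buffer)
print(buffer.getvalue())
```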

In [12]:
#Write results to a file
#if you don't pass it an explicit spotList it will use the spot set stored in the ArrayedImage
filename=""  #.csv extension will be automatically added
img.writeResultTable(filename=filename,alphaRows=True)

<h2>Cell 12. Using pandas to perform programmatic data analysis</h2>
If you prefer to use pandas over Excel for data analysis, the resultsDataFrame method returns a pandas DataFrame with the same kind of information that the writeResultTable method writes to file.
In this example, we compute the first ion as a percentage of all ions loaded, and plot those percentages, from small to large, using matplotlib. 

In [13]:
from IPython.display import display
df=img.resultsDataFrame(minPixelIntensity=0,alphaRows=True) #generate the dataframe
display(df)

sums_df=df.xs('sum',axis=1,level=1) #get the 'sum' columns; they are then labeled by ion m/z only
percentage_firstion=100.0*sums_df[img.ions[0]]/sums_df.sum(axis=1) #calculate the percentage
percentage_firstion=percentage_firstion.sort_values() #rank the data from low to high (Series.sort() no longer exists in pandas)
plt.bar(range(len(percentage_firstion)),percentage_firstion,edgecolor='b') #define a bar chart
plt.xlabel('Spot rank') #set x axis label
plt.ylabel("% m/z={:.1f} of all loaded ions".format(img.ions[0])) #set y axis label
plt.xlim(-0.5,len(percentage_firstion)-0.5) #set x axis range so the first and last bars are fully visible
plt.show()

ion,1141.35,1141.35,1141.35,1141.35,1141.35,1141.35,1143.05,1143.05,1143.05,1143.05,1143.05,1143.05,1241.25,1241.25,1241.25,1241.25,1241.25,1241.25
descriptor,sum,mean,median,min,max,num_pixels,sum,mean,median,min,max,num_pixels,sum,mean,median,min,max,num_pixels
A01,35893.0,3263.0,3349.0,321.0,5566.0,11.0,4522.0,411.090909,408.0,16.0,724.0,11.0,111574.0,10143.090909,11977.0,1029.0,15032.0,11.0
A02,44428.0,3417.538462,2990.0,98.0,8148.0,13.0,5664.0,435.692308,370.0,5.0,1029.0,13.0,93948.0,7226.769231,8213.0,315.0,13905.0,13.0
B01,21354.0,1642.615385,1590.0,87.0,4008.0,13.0,2434.0,187.230769,179.0,5.0,539.0,13.0,136714.0,10516.461538,7009.0,1023.0,22521.0,13.0
B02,14681.0,1129.307692,855.0,10.0,3191.0,13.0,1564.0,120.307692,65.0,5.0,419.0,13.0,103822.0,7986.307692,10098.0,27.0,16508.0,13.0
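As a self-contained check of the percentage calculation in Cell 12, the sketch below copies the per-ion `sum` values from the table above into a small DataFrame and repeats the computation without an OpenMSI connection. The variable names are our own. With these numbers, the spots rank B02, B01, A01, A02 from low to high.

```python
import pandas as pd

# Per-ion 'sum' values copied from the results table above (one row per spot)
example_sums = pd.DataFrame(
    {1141.35: [35893.0, 44428.0, 21354.0, 14681.0],
     1143.05: [4522.0, 5664.0, 2434.0, 1564.0],
     1241.25: [111574.0, 93948.0, 136714.0, 103822.0]},
    index=["A01", "A02", "B01", "B02"])

# First ion as a percentage of the summed intensity of all loaded ions, per spot
first_ion_pct = 100.0 * example_sums[1141.35] / example_sums.sum(axis=1)
first_ion_pct = first_ion_pct.sort_values()  # rank from low to high
print(first_ion_pct.round(2))
```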




<h2>Viewing and manipulating spot spectra</h2>
You can request the average spectra for your spots from the OpenMSI server.
The toolkit can return the spectral data as a pandas DataFrame, which makes plotting straightforward.

Quite a lot of data needs to be transferred between the server and this notebook, so allow some time for this example to run.

In [14]:
spectra_df=openMSIsession.getSpotSpectra(img,verbose=True) #Loads the spectra from the OpenMSI server
                                                           #It's a lot of data, so save the resulting dataframe
                                                           #so that you don't have to run this method repeatedly.

Finished loading spectrum 1 out of 4
Finished loading spectrum 2 out of 4
Finished loading spectrum 3 out of 4
Finished loading spectrum 4 out of 4


In [15]:
A01_spectrum=spectra_df["A01"] # get only the spectrum for the spot at location A01
A01_spectrum.plot() #plot the entire spectrum for spot A01
plt.xlabel("m/z")
plt.ylabel("intensity")
plt.show()

In [16]:
A01_spectrum.loc[900:1950].plot() #plot only the m/z values between 900 and 1950 (label-based slice on the m/z index)
plt.xlabel("m/z")
plt.ylabel("intensity")
plt.show()

In [17]:
spectra_df.plot() # plot ALL the spectra that are loaded on top of each other.
                  #Depending on how many spots are in your image, this can be
                  #a LOT of data; on a computer with limited memory this might crash.
plt.xlabel("m/z")
plt.ylabel("intensity")
plt.show()
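The spectra DataFrame appears to be indexed by m/z with one column per spot (the plots above use the index as the m/z axis), so pandas label-based indexing selects m/z ranges directly. The sketch below uses a hypothetical synthetic DataFrame in place of the `getSpotSpectra` result to show `.loc` slicing, which includes both endpoints, and `idxmax` to find the m/z of the strongest peak in each spot.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for spectra_df: index = m/z (spacing 1.0), one column per spot
demo_mz = np.arange(800.0, 2001.0, 1.0)
demo_spectra = pd.DataFrame(
    {"A01": np.exp(-((demo_mz - 1141.0) / 2.0) ** 2),
     "A02": np.exp(-((demo_mz - 1241.0) / 2.0) ** 2)},
    index=pd.Index(demo_mz, name="m/z"))

mz_window = demo_spectra.loc[900:1950]  # label-based: includes both 900 and 1950
print(mz_window.shape)
print(mz_window.idxmax())               # m/z of the most intense point per spot
```

Slicing with `.loc` keeps the selection tied to m/z values rather than row positions, which matters because the spacing of a real spectrum need not be uniform.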

<h2>Cell 13. Uploading results to OpenMSI (Python 2 only)</h2>

You can save the results from the spot analysis to an OpenMSI HDF5 file and upload the file to OpenMSI to easily share results with others.

If you only want to save the ArrayedImage to an OpenMSI HDF5 file, run:

In [19]:
# Save the ArrayedImage to an OpenMSI file
# If BASTet is not importable, add your local BASTet checkout to the path first, e.g.:
# import sys
# sys.path.append('/path/to/BASTet/bastet')
save_filename = 'save_omaat2.h5'
omsi_out_file = img.saveToOpenMSIFile(filename=save_filename, spotSpectra=spectra_df)
# filename : If None, then a dialog will be shown to ask for the filename to use

We can now easily restore our analysis from the local file at any time via:

In [20]:
spotSpectra_restored, arrayedImage_restored = openMSIsession.restore_omaat_results(
                            filename=save_filename,
                            localFile=True)

To upload our ArrayedImage to OpenMSI we simply call:

In [21]:
openMSIsession.upload_omaat_results(filename=save_filename, machine='edison')
# filename: If we set the filename to None, then we'll ask the user for a filename via a dialog
# username: We can set the username or we'll ask for it
# session: The NERSC NEWT session to be used (not the OpenMSI session). We'll create it if not given
# machine: The NERSC machine we should use for the upload, e.g., 'cori' or 'edison'

Please enter your NERSC Password
Enter password for user "oruebel" 
········


(True, True, True, None)

You can also download the complete HDF5 file from OpenMSI via:

In [23]:
import os
remote_file = os.path.join(openMSIsession.username, os.path.basename(save_filename))
download_filename = openMSIsession.download_file(filename=remote_file)

We can now again restore our analysis from the downloaded file in the same way as before via:

In [24]:
spotSpectra_restored, arrayedImage_restored = openMSIsession.restore_omaat_results(
                            filename=download_filename,
                            localFile=True)

We can also restore our arrayed image and spot spectra data directly from the remote file stored on OpenMSI via:

In [25]:
#remote_file = os.path.join(openMSIsession.username, os.path.basename(save_filename))
#spotSpectra_restored, arrayedImage_restored = openMSIsession.restore_omaat_results(
#                            filename=remote_file,
#                            localFile=False)