NECTAR-MSI: NoisE CorrecTion AlgoRithm in MSI

With NECTAR you can obtain a de-noised list of compounds of interest from your .imzML mass spectrometry imaging (MSI) data.  
NECTAR applies baseline correction, determines signal and noise in the mean spectrum, and corrects for chemical (sinusoidal) noise. In addition, if your imaging data has a background area around the sample of interest (e.g., tissue), NECTAR applies imaging background noise correction.

Here we show the basic steps to obtain your final list of compounds of interest. More details are given in (ref to paper).

Run the next cell to load the necessary packages. (You need to install nectar-msi first; you can do this with: pip install nectar-msi)

In [None]:
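# Assumption: the package imports as nectar_msi (a hyphen is not valid in a Python module name), and the lowercase names used in the cells below are aliases of its modules.
from nectar_msi import (Readers as reader, Savers as saver, DataOperations as dataop, NoiseCorrection as noisecorrection,
                        PeakPicking as peakpicking, DatabaseMatching as databasematching, Plotting as plotting)
import pandas as pd  # used by the commented-out pd.read_csv lines below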

-NEEDS INPUT-

Define the path to your data file and the path where you want to save the outputs.


In [None]:
path_data = "..."
file = "example.imzML"  # Example dataset
path_outputs = "..."

polarity = "..." # To determine the adducts of interest for the HMDB database matching step (optional)
modality = "MALDI" # example of used modality

Run the next cell to read the .imzML data and the "tissue only" mean spectrum provided as an example. 
(We provide the mean spectrum of the tissue only, because creating the different mean spectra takes a long time.)

In [None]:
data = reader.read_imzml(path_data + file)
mean_tissue = reader.read_hdf5_spectrum(path_outputs + 'mean_tissue.hdf5')

If you want to create the mean spectrum yourself, you can run the next cell. 
After creating "total_mean" (mean spectrum of the whole imzML file), you can separate tissue and background using K-means (cluster number = 2). 

In [None]:
total_mean = dataop.get_mean_spectrum(data)
saver.save_spectrum_hdf5(path_outputs + 'total_mean.hdf5', total_mean) # save the mean spectrum in path_outputs

data_masked = dataop.background_subtraction(data, total_mean, path_outputs, n_clusters=2, show_plot=True)
saver.save_hdf5(path_outputs + 'example_masked.hdf5', data_masked) # save the masked datacube in path_outputs in .hdf5 format.
#saver.save_imzML(path_outputs + 'example_masked.imzML', data_masked) # It can be saved in imzML format as well.

# data_masked = reader.read_hdf5(path_outputs + 'example_masked.hdf5') # Reads the masked data in hdf5 format.
# data_masked = reader.read_imzML(path_outputs + 'example_masked.imzML') # Reads the masked data in imzML format.
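
As a rough illustration of the two-cluster idea behind the tissue/background separation, the sketch below splits pixels into two groups with a minimal 1-D K-means on a single per-pixel value. It is not NECTAR's implementation: the function name `two_means_1d`, the use of total intensity per pixel as the only feature, and the example values are all assumptions made for illustration.

```python
def two_means_1d(values, n_iter=50):
    """Split 1-D values into two clusters (0 = lower centre, 1 = higher centre)."""
    lo, hi = min(values), max(values)  # initial centres: the two extremes
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assign each value to the nearest of the two centres.
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        if not group0 or not group1:
            break  # all values ended up in one cluster
        new_lo = sum(group0) / len(group0)
        new_hi = sum(group1) / len(group1)
        if new_lo == lo and new_hi == hi:
            break  # centres stopped moving
        lo, hi = new_lo, new_hi
    return labels


# Hypothetical total intensity per pixel: low values mimic background, high values tissue.
tic_per_pixel = [1.0, 1.2, 0.9, 10.5, 11.0, 9.8, 1.1, 10.1]
labels = two_means_1d(tic_per_pixel)
print(labels)
```

Which label corresponds to tissue still has to be decided by looking at the resulting image, which is why the cluster number is asked for in the next step.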

-NEEDS INPUT-

You can now create the mean spectra for tissue and background. 
NECTAR will ask you for the cluster numbers corresponding to the tissue and background areas, so check them in the image created when running the previous cell (background_subtraction).

In [None]:
mean_tissue, mean_background = dataop.get_mean_spectrum_tissue_background(data_masked)

saver.save_spectrum_hdf5(path_outputs + 'mean_background.hdf5', mean_background) 
saver.save_spectrum_hdf5(path_outputs + 'mean_tissue.hdf5', mean_tissue)

# mean_tissue = reader.read_hdf5_spectrum(path_outputs + 'mean_tissue.hdf5')
# mean_background = reader.read_hdf5_spectrum(path_outputs + 'mean_background.hdf5')
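
Conceptually, the tissue and background mean spectra are channel-wise averages over the pixels of each cluster. The following is a minimal sketch of that averaging, assuming all pixels share one m/z axis; `mean_spectrum`, the toy spectra and the labels are hypothetical and do not reflect NECTAR's internal data structures.

```python
def mean_spectrum(spectra):
    """Average a list of equal-length intensity lists, channel by channel."""
    n = len(spectra)
    return [sum(channel) / n for channel in zip(*spectra)]


# Hypothetical data: 4 pixels x 3 m/z channels on a shared m/z axis.
pixel_spectra = [
    [0.0, 1.0, 0.0],  # background pixel
    [2.0, 1.0, 8.0],  # tissue pixel
    [0.0, 1.0, 2.0],  # background pixel
    [4.0, 1.0, 6.0],  # tissue pixel
]
pixel_labels = [0, 1, 0, 1]  # cluster label per pixel
tissue_label = 1             # chosen by inspecting the cluster image

tissue = [s for s, lab in zip(pixel_spectra, pixel_labels) if lab == tissue_label]
background = [s for s, lab in zip(pixel_spectra, pixel_labels) if lab != tissue_label]

mean_tissue_demo = mean_spectrum(tissue)
mean_background_demo = mean_spectrum(background)
print(mean_tissue_demo, mean_background_demo)
```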

The next cell applies baseline and chemical noise correction on the mean spectrum of interest (tissue only), and identifies spatial background noise peaks. 

In [None]:
mean_tissue_corrected = noisecorrection.noise_correction_with_chemical_noise(mean_tissue, plot_noise=True, plot_chemicalnoise=True)
#mean_tissue_corrected = noisecorrection.noise_correction(mean_tissue, plot_noise=True) # if your data does not have sinusoidal chemical noise you can determine signal and noise with the SigmaClipping function only.

saver.save_spectrum_hdf5(path_outputs + 'mean_tissue_corrected.hdf5', mean_tissue_corrected) # to save the corrected spectrum
#mean_tissue_corrected = reader.read_hdf5_spectrum(path_outputs + 'mean_tissue_corrected.hdf5') # to read the spectrum

# Determination of all peaks of interest above the noise level
list_of_fittings, full_list_of_fittings = peakpicking.peak_picking(mean_tissue_corrected, path_outputs, plot_peaks=True, save_tables=True,
                                                                   save_fitting=False)
#full_list_of_fittings = pd.read_csv(path_outputs + 'full_list_of_fittings.csv') # to reload a previously saved table

# Background noise correction
peaks_classification = noisecorrection.backgroundnoise.background_noise_imaging(data_masked, mean_tissue_corrected, full_list_of_fittings, path_outputs,
                                                                                save_plot_backgroundNoise=True)
#peaks_classification = pd.read_csv(path_outputs + 'Peaks_classification.csv') # to reload a previously saved table
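
To give an intuition for how sigma clipping can separate noise from signal, here is a minimal sketch of one-sided iterative sigma clipping on a list of intensities. It is an illustration under assumptions, not the code NECTAR runs: the function `sigma_clip_noise`, the 3-sigma threshold, the toy intensities and the S/N definition (peak height above the noise mean, divided by the noise standard deviation) are all chosen here for the example.

```python
import statistics


def sigma_clip_noise(intensities, n_sigma=3.0, max_iter=10):
    """Repeatedly discard values above mean + n_sigma * std.

    Returns the (mean, std) of the values that remain, taken as the noise level.
    """
    kept = list(intensities)
    for _ in range(max_iter):
        mu = statistics.fmean(kept)
        sd = statistics.pstdev(kept)
        clipped = [v for v in kept if v <= mu + n_sigma * sd]
        if len(clipped) == len(kept):
            break  # nothing was removed: converged
        kept = clipped
    return statistics.fmean(kept), statistics.pstdev(kept)


# Hypothetical spectrum: 20 noise channels around 1.0 plus two peaks.
intensities = [0.9, 1.1] * 10 + [50.0, 80.0]
noise_mean, noise_sd = sigma_clip_noise(intensities)

# Assumed S/N definition for this example only.
snr_peak = (50.0 - noise_mean) / noise_sd
print(noise_mean, noise_sd, snr_peak)
```

In this toy case the two peaks are removed one per iteration, after which the remaining channels define the noise level.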


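Final selection of peaks: the next cell keeps the peaks whose tissue/background S/N difference is at least 0 and whose tissue/background S/N ratio is at least 5. 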

In [None]:
final_list = peaks_classification.loc[(peaks_classification['diff[Tis/bak] S/N'] >= 0) &
                                      (peaks_classification['ratio[Tis/bak] S/N'] >= 5)]

#final_list.to_csv(path_outputs + 'Compounds_of_interest_final_list.csv', index=False) # to save the final list
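
The two thresholds used above can be illustrated on a few made-up peaks. In this sketch the difference is assumed to be the tissue S/N minus the background S/N and the ratio the tissue S/N divided by the background S/N; these definitions, the function `select_peaks` and all values are assumptions for illustration, and the background S/N is assumed to be positive.

```python
# Hypothetical peaks with S/N measured in the tissue and background regions.
peaks = [
    {"mz": 184.07, "sn_tissue": 60.0, "sn_background": 4.0},
    {"mz": 255.23, "sn_tissue": 12.0, "sn_background": 10.0},
    {"mz": 369.35, "sn_tissue": 3.0, "sn_background": 9.0},
]


def select_peaks(peaks, min_diff=0.0, min_ratio=5.0):
    """Return the m/z values of peaks that pass both thresholds."""
    selected = []
    for p in peaks:
        diff = p["sn_tissue"] - p["sn_background"]   # assumed definition
        ratio = p["sn_tissue"] / p["sn_background"]  # assumed definition
        if diff >= min_diff and ratio >= min_ratio:
            selected.append(p["mz"])
    return selected


selected_mz = select_peaks(peaks)
print(selected_mz)
```

Only the first peak passes: its S/N is 15 times higher in tissue than in background, whereas the other two are comparable to or weaker than the background.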

To create the single ion images for the final list of compounds of interest, run the next cell.

In [None]:
plotting.plot_sii_final_list(data, mean_tissue_corrected, final_list, path_outputs, save_fig=True)

The next cell allows you to save the reduced datacube in imzML, hdf5 or .mat format.

In [None]:
saver.save_final_DataCube(data_masked, final_list, path_outputs, save_imzml=True, save_hdf5=False, save_mat=False)

Optionally, run the next cell to match the final list against the HMDB database, using the polarity and modality defined above (ppm=30).

In [None]:
databasematching.database_matching(final_list, polarity, modality, path_outputs, ppm=30.)
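
The ppm argument sets how far an observed m/z may deviate from a theoretical one. The sketch below shows the usual ppm error formula and a toy match against protonated or deprotonated masses. It is not the HMDB matching code: `ppm_error`, `match_peak`, the polarity strings "positive"/"negative", the restriction to [M+H]+ and [M-H]- adducts, and the toy database are assumptions for illustration.

```python
PROTON_MASS = 1.007276  # Da


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_peak(observed_mz, neutral_masses, polarity, ppm=30.0):
    """Return names whose [M+H]+ (positive) or [M-H]- (negative) m/z lies within the ppm tolerance."""
    shift = PROTON_MASS if polarity == "positive" else -PROTON_MASS
    hits = []
    for name, mass in neutral_masses.items():
        theoretical = mass + shift
        if abs(ppm_error(observed_mz, theoretical)) <= ppm:
            hits.append(name)
    return hits


# Toy database of neutral monoisotopic masses (second entry is made up).
toy_database = {"glucose": 180.06339, "toy_compound_B": 180.10000}
hits = match_peak(181.0710, toy_database, "positive")
print(hits)
```

With a 30 ppm window at m/z 181 the accepted deviation is roughly 0.005, so the observed peak matches protonated glucose but not the second entry, which lies about 200 ppm away.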