### Jupyter notebook for feature demonstration of `metabolinks` Python module

#### Metabolinks GitHub home: https://github.com/aeferreira/metabolinks

Install metabolinks by running

`pip install metabolinks`

## Peak alignment for peak tables contained in MS-Excel sheets

This notebook demonstrates peak alignment, based on m/z proximity.

The **m/z tolerance (in ppm)** sets the maximum relative deviation allowed between m/z values from different peak lists for them to be considered the same peak in the resulting aligned peak table.
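To make the tolerance concrete, here is a minimal sketch of a ppm comparison between two m/z values. The helper names `ppm_deviation` and `same_peak` are hypothetical, and taking the deviation relative to the mean of the two values is an assumption; metabolinks may use a different reference value internally.

```python
def ppm_deviation(mz_a: float, mz_b: float) -> float:
    """Relative deviation between two m/z values, in parts per million.

    Assumption: the mean of the two values is used as the reference.
    """
    reference = (mz_a + mz_b) / 2.0
    return abs(mz_a - mz_b) / reference * 1e6


def same_peak(mz_a: float, mz_b: float, ppmtol: float = 1.0) -> bool:
    """True if the two m/z values fall within the ppm tolerance."""
    return ppm_deviation(mz_a, mz_b) <= ppmtol


# At m/z 500, a 1 ppm tolerance is an absolute window of about 0.0005.
print(same_peak(500.0000, 500.0004))  # True  (about 0.8 ppm apart)
print(same_peak(500.0000, 500.0010))  # False (about 2 ppm apart)
```

Because the tolerance is relative, the absolute window grows with m/z: 1 ppm is about 0.0001 at m/z 100 but about 0.001 at m/z 1000.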

Each peak list is a two-column table of m/z and signal values. Signal intensities are not used in the alignment; they are simply copied over to the final table, while the m/z values within each aligned group are averaged.
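The idea of grouping by m/z proximity, averaging m/z and copying signals, can be sketched in plain Python. This is not metabolinks' actual implementation: the greedy single-pass sweep over sorted m/z values is a technique chosen here for illustration, and `align_peak_lists`, `ppm_diff` and the toy data are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

Peak = Tuple[float, float]                      # (m/z, signal)
Row = Tuple[float, Dict[str, Optional[float]]]  # (averaged m/z, sample -> signal)


def ppm_diff(mz: float, ref: float) -> float:
    """Deviation of mz from ref, in ppm of ref."""
    return abs(mz - ref) / ref * 1e6


def align_peak_lists(peak_lists: Dict[str, List[Peak]], ppmtol: float = 1.0) -> List[Row]:
    """Greedy single-pass alignment of peak lists by m/z proximity (a sketch)."""
    # Pool all peaks, tagged with their sample, in increasing m/z order.
    pooled = sorted((mz, signal, sample)
                    for sample, peaks in peak_lists.items()
                    for mz, signal in peaks)
    groups: List[List[Tuple[float, float, str]]] = []
    for mz, signal, sample in pooled:
        current = groups[-1] if groups else None
        # Join the open group if close enough to its mean m/z and the sample
        # has no peak there yet; otherwise start a new group.
        if (current is not None
                and ppm_diff(mz, mean(p[0] for p in current)) <= ppmtol
                and all(p[2] != sample for p in current)):
            current.append((mz, signal, sample))
        else:
            groups.append([(mz, signal, sample)])
    rows: List[Row] = []
    for group in groups:
        signals: Dict[str, Optional[float]] = {s: None for s in peak_lists}
        for _, signal, sample in group:
            signals[sample] = signal                        # signals copied unchanged
        rows.append((mean(p[0] for p in group), signals))  # m/z averaged
    return rows


# Toy peak lists, made up for illustration.
peaks = {
    "s1": [(100.0, 10.0), (250.1, 5.0)],
    "s2": [(100.00005, 12.0), (300.2, 7.0)],
}
for mz, signals in align_peak_lists(peaks, ppmtol=1.0):
    print(f"{mz:.5f}", signals)
```

The two peaks near m/z 100 (0.5 ppm apart) merge into one row, while the others end up in rows of their own with `None` for the missing sample. A simple sweep like this can let a group's mean drift when many close peaks chain together; the library may handle such cases differently.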

Sample names and labels are copied over to the final table.

An alignment is performed for each Excel worksheet, so related peak tables should be placed in the same worksheet. In the provided example data, each Excel worksheet corresponds to a different sample extraction method.
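As an illustration of several peak tables sharing one worksheet, the sketch below splits rows read from a sheet into one peak list per sample. The layout is an assumption, not taken from the example file: tables placed side by side as column pairs (m/z, signal), the sample name above each m/z column, and empty cells (`None`) padding shorter tables. The function name and data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, signal)


def split_side_by_side(header: Sequence[str],
                       rows: Sequence[Sequence[Optional[float]]]) -> Dict[str, List[Peak]]:
    """Split one worksheet holding several peak tables placed side by side.

    Assumed (hypothetical) layout: columns come in (m/z, signal) pairs, the
    sample name sits above the m/z column of each pair, and empty cells
    (None) pad the end of shorter tables.
    """
    tables: Dict[str, List[Peak]] = {}
    for col in range(0, len(header), 2):
        peaks = [(row[col], row[col + 1]) for row in rows
                 if row[col] is not None and row[col + 1] is not None]
        tables[header[col]] = peaks
    return tables


# Toy worksheet with two samples, made up for illustration.
header = ["s1", "", "s2", ""]
rows = [
    [100.0, 10.0, 100.00005, 12.0],
    [250.1, 5.0, 300.2, 7.0],
    [400.3, 2.0, None, None],
]
tables = split_side_by_side(header, rows)
print({name: len(peaks) for name, peaks in tables.items()})  # {'s1': 3, 's2': 2}
```

Each worksheet would yield one such dictionary, and each dictionary is aligned independently of the other worksheets.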

### Functions and parameters

A single function does all the work: `align_spectra_in_excel()`

Parameters of this function are:

- The name of the Excel file the data is read from (the first parameter)
- `save_to_excel` the name of the Excel file the results are written to
- `ppmtol` the tolerance, in ppm, for the deviation between m/z values of the same peak in different tables. Default is 1 ppm.
- `min_samples` reproducibility threshold: the minimum number of occurrences of a given peak for it to be included in the aligned table. Default is 1.
- `sample_names` (optional) sample names to be assigned to samples (for downstream data analysis), overriding those found in the Excel file
- `labels` (optional) labels to be assigned to samples (for downstream data analysis)
- `header_row` the row number in the Excel file where the data begins, that is, the row at the top of the peak tables. Excel rows with smaller numbers are ignored and can contain comments and metadata. Open the provided example to see the top row and the format of the data.
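The `min_samples` threshold can be pictured as a filter applied after grouping: an aligned peak survives only if enough samples contribute a signal to it. This is a minimal sketch assuming aligned rows are stored as (m/z, sample-to-signal mapping) pairs with `None` for missing peaks; `filter_by_min_samples` and the toy rows are hypothetical, not part of metabolinks.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[float, Dict[str, Optional[float]]]  # (averaged m/z, sample -> signal)


def filter_by_min_samples(rows: List[Row], min_samples: int = 1) -> List[Row]:
    """Keep only aligned peaks observed in at least `min_samples` samples."""
    return [
        (mz, signals) for mz, signals in rows
        # Count samples that actually have a peak in this aligned group.
        if sum(signal is not None for signal in signals.values()) >= min_samples
    ]


# Toy aligned table, made up for illustration.
aligned_rows = [
    (100.000025, {"s1": 10.0, "s2": 12.0, "s3": 11.0}),
    (250.1, {"s1": 5.0, "s2": None, "s3": None}),
    (300.2, {"s1": None, "s2": 7.0, "s3": 6.5}),
]
kept = filter_by_min_samples(aligned_rows, min_samples=2)
print([mz for mz, _ in kept])  # [100.000025, 300.2]
```

With the default `min_samples=1`, every aligned peak is kept, since each group contains at least one peak.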


In [None]:
from metabolinks import align_spectra_in_excel, align
from metabolinks.peak_alignment import save_aligned_to_excel

In [None]:
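ppmtol = 1.0       # m/z tolerance, in ppm
min_samples = 4    # keep peaks found in at least 4 samples

in_fname = 'data_groups.xlsx'            # input workbook, one group per worksheet
save_to_excel = 'aligned_by_group.xlsx'  # output workbook for the aligned tables

header_row = 2     # Excel row at the top of the peak tables
sample_names = 1

# Align the peak tables of each worksheet separately;
# the result holds one aligned table per worksheet, keyed by sheet name
aligned = align_spectra_in_excel(in_fname, save_to_excel=save_to_excel,
                                 ppmtol=ppmtol, min_samples=min_samples,
                                 header_row=header_row, sample_names=sample_names)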

In [None]:
# Use each worksheet name as the label of its samples
for name, table in aligned.items():
    table.set_labels(name)

In [None]:
for name, a in aligned.items():
    print('--------', name, '----------------')
    print(a)

In [None]:
two_groups = list(aligned.values())  # one aligned table per worksheet

In [None]:
# Align the per-worksheet tables with each other into a single table
aligned_all = align(two_groups, min_samples=4, ppmtol=1.0)

In [None]:
# Write the combined table to a worksheet named 'all'
save_aligned_to_excel('all_aligned.xlsx', {'all': aligned_all})

In [None]:
# Also export the combined table as CSV, including the sample labels
aligned_all.to_csv('aligned_all.csv', sep=',', with_labels=True)