# Converting imzML files with i2nca and m2aia

With this notebook, imzML files can be converted into other imzML file types.
imzML files are distinguished along two independent properties:

- **processed** or **continuous**:
This parameter indicates whether the mass axis is shared by all spectra. In a continuous file, the shared axis is stored only once; in a processed file, each pixel has its own axis. The corresponding imzML accessions are `IMS:1000030` (continuous) and `IMS:1000031` (processed).

- **profile** or **centroid**:
This parameter indicates whether the spectra are stored as line spectra or whether each signal in the spectrum is assigned to a single centroid. A profile spectrum contains many more data points, but also shows the peak shapes. A centroided spectrum reduces each spectral peak to a single data point.
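The two distinctions can be illustrated with plain Python lists. This is a toy sketch only: the dictionaries below are not the imzML binary layout, and `naive_centroid` is a hypothetical, deliberately simplistic peak picker, not the method used by i2nca or m2aia.

```python
# Continuous: one shared mz axis, one intensity list per pixel.
continuous = {
    "mz": [100.0, 100.1, 100.2],
    "intensities": [[1, 2, 3], [4, 5, 6]],
}

# Processed: every pixel carries its own mz axis.
processed = [
    {"mz": [100.0, 100.1, 100.2], "intensity": [1, 2, 3]},
    {"mz": [100.0, 100.1, 100.3], "intensity": [4, 5, 6]},
]

# Number of mz values that have to be stored in each layout.
stored_mz_continuous = len(continuous["mz"])
stored_mz_processed = sum(len(pixel["mz"]) for pixel in processed)

# Toy profile spectrum: several data points sampled across two peaks.
profile_mz = [100.00, 100.01, 100.02, 100.03, 100.04, 200.00, 200.01, 200.02]
profile_int = [1.0, 5.0, 9.0, 4.0, 1.0, 2.0, 7.0, 3.0]


def naive_centroid(mzs, intensities):
    """Keep only strict local maxima (hypothetical, simplistic peak picker)."""
    peaks = []
    for i in range(1, len(mzs) - 1):
        if intensities[i - 1] < intensities[i] > intensities[i + 1]:
            peaks.append((mzs[i], intensities[i]))
    return peaks


# Each peak of the profile spectrum collapses to a single data point.
centroids = naive_centroid(profile_mz, profile_int)
```

In this toy case the eight profile data points are reduced to two centroids, and the processed layout stores twice as many mz values as the continuous one.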


# Imports

Before you can use the conversion tools, load the required tools and libraries.

PLEASE RUN THE CELL BELOW WITHOUT CHANGES

In [1]:
# import statements
import m2aia as m2
from i2nca import report_pp_to_cp_imzml, convert_pp_to_cp_imzml, report_pp_to_cp 
from i2nca.convtools.conv_tools import write_pp_to_cp_imzml

# Processed Profile to Continuous Profile

Some mass spectrometers record the data in a very precise manner. Even though the mass data is recorded as processed spectra, there is only very little technical variation between the mz data points in each spectrum. With the following conversion tool, we can test how large the standard deviation of each of these "pseudo-bins" is and how far the means of two neighbouring "pseudo-bins" are spaced apart.
An underlying assumption of this test is that each pixel starts and ends with nearly the same mz value and has the same number of data points.
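The pseudo-bin idea can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not i2nca's implementation: the function name `pseudo_bin_stats`, the toy data, and the "spread below half the spacing" criterion are all hypothetical choices made for this example.

```python
from statistics import mean, pstdev

# Toy processed-profile data: the mz axis of three pixels (hypothetical values).
pixels_mz = [
    [100.000, 100.010, 100.020],
    [100.001, 100.011, 100.021],
    [99.999, 100.009, 100.019],
]


def pseudo_bin_stats(pixels_mz):
    """Return per-bin mean, per-bin standard deviation and spacing of neighbouring means."""
    # The test only makes sense if every pixel has the same number of data points.
    if len({len(pixel) for pixel in pixels_mz}) != 1:
        raise ValueError("pixels differ in their number of data points")
    # The i-th pseudo-bin gathers the i-th mz value of every pixel.
    bins = list(zip(*pixels_mz))
    means = [mean(b) for b in bins]
    spreads = [pstdev(b) for b in bins]
    spacings = [upper - lower for lower, upper in zip(means, means[1:])]
    return means, spreads, spacings


means, spreads, spacings = pseudo_bin_stats(pixels_mz)

# Assumed acceptance criterion: the scatter inside a bin is small
# compared to the distance to the neighbouring bin.
well_separated = max(spreads) < min(spacings) / 2
```

If the spread of a pseudo-bin approaches the spacing between neighbouring means, data points from different pixels can no longer be assigned to a shared axis unambiguously.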

With the following code, we will create a pdf report that checks all our assumptions and plots them. This provides human-readable output on whether we should continue with the conversion.

## Loading a dataset

To start the conversion, you first need to load the imzML dataset. It needs to be loaded only once for all the following steps.

THE CELL BELOW NEEDS YOUR INPUT

Please change the following variable in order to load the dataset:
- `file_path`: Please provide a path to the imzML file on your machine. It must end with the full filename (`name` and `".imzML"`).\
   The `r"..."` notation lets you enter Windows-style paths without escaping each backslash.
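The effect of the raw-string notation, and a quick check that a path ends in `.imzML`, can be sketched as follows. The path is a made-up placeholder, and `pathlib.PureWindowsPath` is used here only so the example behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# Without the r prefix, "\t" is read as a tab character.
escaped_len = len("C:\temp")   # C, :, TAB, e, m, p
raw_len = len(r"C:\temp")      # C, :, \, t, e, m, p

# Hypothetical example path, written as a raw string.
file_path = r"C:\data\example\sample.imzML"

path = PureWindowsPath(file_path)

# The file name must end with the ".imzML" extension.
has_imzml_suffix = path.suffix.lower() == ".imzml"

# Dropping the extension yields a prefix that can be reused for output files.
output_prefix = str(path.with_suffix(""))
```

Here `output_prefix` is the same string as `file_path[:-6]`, because `".imzML"` is six characters long.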


In [3]:
file_path = r"C:\Users\Jannik\Documents\Uni\Master_Biochem\4_Semester\M2aia\data\minidata\metabolomics_full_mz_processed_profile.imzML"
I = m2.ImzMLReader(file_path)

In [6]:
md = I.GetMetaData()

{'(original imzML value) [IMS:1000042] max count of pixels x': '1362',
 '(original imzML value) [IMS:1000043] max count of pixels y': '661',
 '(original imzML value) max count of pixels z': '1',
 '[IMS:1000031] processed': 'true',
 '[IMS:1000042] max count of pixels x': '20',
 '[IMS:1000043] max count of pixels y': '1',
 '[IMS:1000044] max dimension x': '27240',
 '[IMS:1000045] max dimension y': '13220',
 '[IMS:1000046] pixel size x': '0.02',
 '[IMS:1000047] pixel size y': '0.02',
 '[IMS:1000053] absolute position offset x': '0',
 '[IMS:1000054] absolute position offset y': '0',
 '[IMS:1000080] universally unique identifier': '{96ECC4EE-124A-4C60-B508-149681281DBB}',
 '[IMS:1000091] ibd SHA-1': 'E1E8E02AAA7197281BC35970D8874B55DBE811C2',
 '[IMS:1000101] intensityArray.external data': 'true',
 '[IMS:1000101] mzArray.external data': 'true',
 '[IMS:1000401] scanSettings1.top down': 'true',
 '[IMS:1000410] scanSettings1.meandering': 'true',
 '[IMS:1000480] scanSettings1.horizontal line sca

## Test the conversion method

To check whether the conditions we assumed apply, we can test the parsed file. This test creates a pdf file with informative graphics.
Conveniently, this function also returns the mean mz values for which it tested the pseudo-bin arrangement, so we can reuse them later.

THE FOLLOWING CELL NEEDS YOUR INPUT

Please change the following variables in order to test the imzML file structure:

- `output_filepath`: Please provide a path on your machine. The output pdf file will be saved there.

- `coverage`: The coverage is a subsetting method for large datasets. A coverage of 0.3 means that a subset of pixels amounting to 30% of the full measurement is used. This allows faster computation for large datasets. For small datasets, the value should be 1.0.
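What a coverage value means in terms of pixel counts can be sketched with the standard library. The helper `subsample_pixels` is hypothetical and written for this illustration; how i2nca actually selects its subset (random, evenly spaced, or otherwise) is not specified here.

```python
import random


def subsample_pixels(n_pixels, coverage, seed=0):
    """Pick a random subset of pixel indices covering the given fraction."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be larger than 0 and at most 1")
    # Always keep at least one pixel, even for very small coverages.
    n_sample = max(1, round(n_pixels * coverage))
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return sorted(rng.sample(range(n_pixels), n_sample))


# 30% of a 20-pixel measurement.
subset = subsample_pixels(n_pixels=20, coverage=0.3)

# A coverage of 1.0 uses every pixel.
full = subsample_pixels(n_pixels=20, coverage=1.0)
```

With 20 pixels, a coverage of 0.3 selects 6 of them, while a coverage of 1.0 returns all 20 indices.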


In [11]:
output_filepath = file_path[:-6]  # strip the 6-character ".imzML" extension

coverage = 0.1 # value between 0 and 1

reference_mzs = report_pp_to_cp(I, output_filepath, coverage)

report generated at:  D:\data\Jannik\Files_for_minidata\metabo\metabolomics_small_mz_processed_profilecontrol_report_pp_to_cp.pdf


You can now check at the specified file location how the conversion tests look. If you approve of the data, continue with the following paragraph.

<!---
Not yet implemented:
Otherwise, check a sparse data averaging method (like a clustering of the subsample mz values) to get reference mz values.
--> 



## Convert the file

With our reference masses generated, we can create a new imzML file.


THE FOLLOWING CELL NEEDS YOUR INPUT

Please change the following variables in order to write the converted imzML file:

- `reference_mzs`: The previously generated set of reference mz values.

- `output_filepath`: A file path to the output imzML file. 


In [17]:
# do the writing
write_pp_to_cp_imzml(I, reference_mzs, output_filepath)

'D:\\data\\Jannik\\Files_for_minidata\\metabo\\metabolomics_small_mz_processed_profile_conv_output_cont_profile.imzML'
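Conceptually, going from processed profile to continuous profile means that every pixel gives up its own mz axis in favour of the shared reference axis, while the intensities are kept position by position. The sketch below shows that idea on plain lists. It assumes an index-wise mapping and uses a hypothetical helper `to_continuous`; it does not reproduce the internals of `write_pp_to_cp_imzml`.

```python
def to_continuous(pixels, reference_mzs):
    """Replace per-pixel mz axes by one shared axis (index-wise mapping assumed)."""
    intensities = []
    for pixel_mz, pixel_int in pixels:
        # The mapping only works if each pixel has one value per reference bin.
        if len(pixel_mz) != len(reference_mzs) or len(pixel_int) != len(reference_mzs):
            raise ValueError("pixel length does not match the reference axis")
        intensities.append(list(pixel_int))
    return {"mz": list(reference_mzs), "intensities": intensities}


# Toy processed data: (mz axis, intensities) per pixel, hypothetical values.
pixels = [
    ([100.001, 100.011], [5.0, 7.0]),
    ([99.999, 100.009], [6.0, 8.0]),
]
reference_mzs = [100.0, 100.01]

converted = to_continuous(pixels, reference_mzs)
```

The individual mz values of each pixel are discarded in this step, which is why the report should be checked before converting.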

# Workflow application

When doing this preprocessing in an automated way for multiple files, the steps mentioned above can be contracted into a single function call. Here are two functions that allow quick and easy access to this conversion.

- `report_pp_to_cp_imzml`: (file_path, output_path, coverage)\
This function directly creates the conversion report and uses the generated mz values to write the continuous profile file. It takes the following parameters:
  - `file_path`: File path of input imzML file
  - `output_path` *Optional Parameter*: If specified, the report and imzML file are created at this path. Otherwise, they are created at the same path as the input file.
  - `coverage` *Optional Parameter*: Uses a subsample according to the percentage provided. Defaults to 25% of the sample if omitted.

<br/>

- `convert_pp_to_cp_imzml`: (file_path, output_path, pixel_nr)\
   This function converts the file, but does not test the conversion assumptions or create a pdf report. It is meant for batch processes where a file has already been tested.
   - `file_path`: File path of input imzML file
   - `output_path` *Optional Parameter*: If specified, the imzML file is created at this path. Otherwise, it is created at the same path as the input file.
   - `pixel_nr` *Optional Parameter*: The subsample is created with the specified number of pixels. Defaults to 100 if not specified otherwise.


In [19]:
# examples for report_pp_to_cp_imzml

file_path = r"C:\path\to\...\file\location\...\filename.imzML"
output_path = r"C:\path\to\...\output\file\location\...\output_name"

coverage = 0.1

#report_pp_to_cp_imzml(file_path,output_path, coverage)
#report_pp_to_cp_imzml(file_path, coverage = 0.5)
#report_pp_to_cp_imzml(file_path)

In [21]:
# examples for convert_pp_to_cp_imzml

file_path = r"C:\path\to\...\file\location\...\filename.imzML"
output_path = r"C:\path\to\...\output\file\location\...\output_name"

#convert_pp_to_cp_imzml(file_path, output_path, pixel_nr = 150)
#convert_pp_to_cp_imzml(file_path, output_path = r"D:\path\to\file\testfile")
#convert_pp_to_cp_imzml(file_path)
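For several files, the calls above can be wrapped in a small loop. The sketch below is one possible way to do this: `batch_convert` is a hypothetical helper, and the conversion function is passed in as an argument so the loop can be tried out without any imzML data. In the notebook, `convert_pp_to_cp_imzml` (or `report_pp_to_cp_imzml`) would take the place of the stand-in `fake_convert`.

```python
from pathlib import Path


def batch_convert(input_paths, convert, output_dir=None):
    """Call `convert` once per file and collect failures instead of stopping."""
    failed = {}
    for path in input_paths:
        kwargs = {}
        if output_dir is not None:
            # Output name: chosen directory plus the input file name without extension.
            kwargs["output_path"] = str(Path(output_dir) / Path(path).stem)
        try:
            convert(str(path), **kwargs)
        except Exception as err:  # keep going with the remaining files
            failed[str(path)] = err
    return failed


# Stand-in for the real conversion function, for demonstration only.
calls = []


def fake_convert(file_path, output_path=None):
    if "broken" in file_path:
        raise ValueError("unreadable file")
    calls.append((file_path, output_path))


failed = batch_convert(["a.imzML", "broken.imzML"], fake_convert)
```

Collecting the failures in a dictionary keeps one unreadable file from aborting the whole batch; the returned mapping can be inspected afterwards.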