# Interactively guided file conversion #

This notebook guides you through converting Bruker data (and existing imzML files) into imzML files that can be used in the Cardinal workflow.

The supported input formats are:
* .imzML files
* .tsf files (by Bruker TimsTofFlex)

The supported output format is:
* imzML files


## Creating a processed profile imzML file ##
If you would like to create a processed profile imzML file, run the following code block.
You need to adjust the following parameters:
* `input_dir`: the input directory. In a normal Python string, replace each `\` character with `\\` (a raw string such as `r"D:\data"` also works, as long as it does not end in a backslash). This directory must contain both the `analysis.tsf` file and its binary companion `analysis.tsf_bin`.
* `output_dir`: optional location for the output, if you would like it stored somewhere other than the input location. Otherwise, it can be omitted.
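Before running the conversion, it can help to confirm that the path is written correctly and that the raw files are where the converter expects them. The sketch below shows that the escaped and raw-string spellings of a Windows path are identical, and checks a folder for the two TSF files. `missing_tsf_files` is a hypothetical helper (not part of `tsf_to_proc_prof_imzML`) and assumes Bruker's usual naming, `analysis.tsf` plus `analysis.tsf_bin`.

```python
from pathlib import Path

# Both spellings describe the same Windows path. Without escaping,
# "\t" in "\testdata" would silently become a tab character.
escaped = "D:\\wittej\\data\\testdata\\original"
raw = r"D:\wittej\data\testdata\original"
assert escaped == raw


def missing_tsf_files(input_dir):
    """Return the expected TSF files that are not present in input_dir.

    Hypothetical helper: assumes the raw data are named analysis.tsf and
    analysis.tsf_bin.
    """
    expected = ["analysis.tsf", "analysis.tsf_bin"]
    folder = Path(input_dir)
    return [name for name in expected if not (folder / name).is_file()]


# An empty list means both files were found in the folder.
print(missing_tsf_files(raw))
```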


In [None]:
from tsf_to_proc_prof_imzML import convert_tsf_to_proc_prof_imzml 

input_dir = "D:\\wittej\\data\\testdata\\original"  # folder that contains the .tsf data
output_dir = "D:\\wittej\\data\\testdata\\"  # optional: where the imzML file should be written

output_dir = convert_tsf_to_proc_prof_imzml(input_dir, output_dir)  # the function returns the location of the generated file

print(f"File generated at {output_dir}")

## Creating a continuous profile imzML file ##

We can also create an imzML file from the TSF data in which all pixels share the same mass axis. This file is smaller, since the mass axis only needs to be saved once.
To do this, we first want to test whether the measured masses are consistent across pixels. For a set of pixels, we check at which masses data points were recorded and how far apart neighboring points are (the `stepsize between pseudo-bin`), and then compare these positions across the pixels. The standard deviation serves as a measure of how much the data points scatter around each mass (the `std dev within pseudo-bin`).
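A minimal NumPy sketch of the two quantities, assuming the m/z values of several pixels have been collected into a 2-D array with one row per pixel and the same number of sorted points in every row, so that each column is one pseudo-bin. The function name `pseudo_bin_stats` and the second pixel's values are hypothetical; `tsf_check_spacing` may compute these differently internally.

```python
import numpy as np


def pseudo_bin_stats(mz_matrix):
    """Per-pseudo-bin statistics for a (n_pixels, n_points) array of m/z values.

    Assumes every pixel recorded the same number of points, sorted ascending,
    so that column j of the array is pseudo-bin j.
    """
    mz = np.asarray(mz_matrix, dtype=float)
    averaged = mz.mean(axis=0)   # average m/z of each pseudo-bin
    spread = mz.std(axis=0)      # std dev within each pseudo-bin (across pixels)
    steps = np.diff(averaged)    # step size between neighboring pseudo-bins
    return averaged, spread, steps


# Two pixels: the first uses the example masses from this notebook,
# the second is a hypothetical pixel with slightly jittered masses.
mz_matrix = [
    [249.08334, 249.16667, 249.25001],
    [249.08336, 249.16665, 249.25001],
]
averaged, spread, steps = pseudo_bin_stats(mz_matrix)
print(steps)   # roughly 0.0833 between neighboring pseudo-bins
print(spread)  # at most about 1e-5 within each pseudo-bin
```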

Let's check how these compare first:

In [1]:
from tsf_to_cont_prof_imzML import tsf_check_spacing, write_tsf_to_cont_prof_imzml

# here, we write down the path to our input data (and, optionally, where the converted file should be stored)
input_file = "D:\\wittej\\data\\Tonsille.d"
output_dir = "D:\\wittej\\data\\tonsille_conv"

# this line plots the comparison of the mass scattering and also returns an array of averaged masses
averaged_bins = tsf_check_spacing(input_file)


The plot produced by the cell above compares how widely the pseudo-bins are spread.
We define a pseudo-bin as the group of acquisition points, one per pixel, that fall at (nearly) the same position in the instrument's m/z range. For example, if one pixel records masses at {249.08334, 249.16667, 249.25001, ...}, the points around 249.16667 in all pixels form one pseudo-bin.
We then compare the step size between neighboring pseudo-bins with the spread of each pseudo-bin across different pixels (e.g. the masses around 249.16667 might vary by a small amount from pixel to pixel).

If the `stepsize between pseudo-bin` is significantly larger than the `std dev within pseudo-bin`, we can assume that the acquisition is reproducible enough between the pixels of a run for the data to be treated as continuous. We can then write the continuous file, using the averaged mass axis as the reference.
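"Significantly larger" can be turned into an explicit check. The sketch below is one conservative way to do it, not necessarily the rule used by the conversion module: it compares the smallest step between pseudo-bins with the largest spread within any pseudo-bin. The function name `looks_continuous` and the default `factor=10.0` are hypothetical choices.

```python
import numpy as np


def looks_continuous(steps, spread, factor=10.0):
    """Return True if the smallest step between pseudo-bins is more than
    `factor` times the largest spread within a pseudo-bin.

    `factor` is a hypothetical threshold for "significantly larger";
    choose it to suit your data.
    """
    return bool(np.min(steps) > factor * np.max(spread))


print(looks_continuous(steps=[0.0833, 0.0834], spread=[1e-5, 2e-5]))  # True
print(looks_continuous(steps=[0.0833, 0.0834], spread=[0.02, 0.05]))  # False
```

Comparing the worst case (smallest step against largest spread) errs on the side of caution: a single poorly reproduced pseudo-bin is enough to reject the continuous representation.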

In [None]:
from tsf_to_cont_prof_imzML import write_tsf_to_cont_prof_imzml
output_file = write_tsf_to_cont_prof_imzml(input_file, averaged_bins, output_dir=output_dir)


print(f"File generated at {output_file}")
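
To put the claim that a continuous file is smaller into numbers, the rough estimate below compares the spectral data of a processed and a continuous imzML file. In processed mode every spectrum stores its own m/z array, whereas in continuous mode a single m/z array is shared by all spectra. The pixel and point counts are made-up example values, and the byte widths (64-bit m/z, 32-bit intensities) are assumptions; the converter may use other data types. `ibd_data_bytes` is a hypothetical helper.

```python
def ibd_data_bytes(n_pixels, n_points, mz_bytes=8, intensity_bytes=4):
    """Approximate size of the spectral data in the .ibd file, in bytes.

    Ignores the 16-byte UUID at the start of the .ibd file and the .imzML
    XML metadata. Byte widths are assumptions, not read from any file.
    """
    # processed: each spectrum stores its own m/z array and intensity array
    processed = n_pixels * n_points * (mz_bytes + intensity_bytes)
    # continuous: one shared m/z array, plus one intensity array per spectrum
    continuous = n_points * mz_bytes + n_pixels * n_points * intensity_bytes
    return processed, continuous


processed, continuous = ibd_data_bytes(n_pixels=10_000, n_points=100_000)
print(processed, continuous)             # 12000000000 4000800000
print(round(processed / continuous, 2))  # 3.0
```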

In [None]:
from tsf_to_cont_prof_imzML import imzML_check_spacing, write_imzML_to_cont_prof_imzml

# the same spacing check can also be run on an existing (processed) imzML file
input_file = "D:\\wittej\\data\\minidata\\S042_Processed_imzML\\S042_Processed.imzML"
output_dir = "D:\\wittej\\data\\minidata\\S042_Processed_imzML\\"


averaged_bins = imzML_check_spacing(input_file)

In [None]:
# only run this cell if the spacing plot above looked good

output_file = write_imzML_to_cont_prof_imzml(input_file, averaged_bins,
                                             polarity="positive", pixel_size="50", output_dir=output_dir)


print(f"File generated at {output_file}")