# Creating QC reports with m2aia

With this notebook, quality control (QC) reports can be generated from an .imzML file.
There are three different QC tools available:

- agnostic_qc:
Takes only the .imzML file as input and generates an output file at the specified location.
The file is evaluated and different metrics are applied to it.

- calibrant_qc:
In addition to the imzML file, a file with the masses of calibrants is provided.
This QC tool evaluates how the spectra look around each calibrant and calculates metrics. In addition, the spectrum of each pixel is evaluated, and the distance between the nearest peak and the theoretical value is determined.

- region_qc:
Together with the imzML file, a region annotation is loaded. The QC calculates the metrics for each region and compares them.
If no region file is provided, an annotation file is generated automatically by grouping connected pixels.


# A very short introduction to Jupyter
If you are not familiar with Jupyter notebooks, they are a really cool way to make code accessible for people with little to no coding experience. Think of this as a nice text document that lets you send defined boxes of code to a happy place where they get taken care of.
Everything inside this document is organized in cells. Some cells contain a special kind of text (markdown) and some contain Python code.
All you need to do is "run" a cell to make the code inside do its funky stuff. You can usually do this by hitting a play button in the upper menu (it should say "Run Cell" or "Execute Cell") or by right-clicking a cell and choosing the "Run / Execute / ..." entry.

# Imports

Before you can start the QC tools, load the required tools and libraries.


PLEASE RUN THE CELL BELOW WITHOUT CHANGES

In [None]:
# import statements
import m2aia as m2
from i2nca import report_agnostic_qc, report_calibrant_qc, report_regions_qc

# Loading a dataset

To start a QC, you first need to load the imzML dataset. It needs to be loaded only once for all the QCs (this also makes the process quicker).

THE CELL BELOW NEEDS YOUR INPUT

Please change the following variable in order to load the dataset:
- `file_path`: Please provide a path to the imzML file on your machine. It must end with ".imzML"


In [None]:
file_path = r"C:\Users\Jannik\Documents\Uni\Master_Biochem\4_Semester\M2aia\data\exmpl_cont\conv_output_centroided.imzML"
I = m2.ImzMLReader(file_path)
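
Before loading, the path requirements stated above can be checked with a small helper. This is only a sketch: `check_imzml_path` is a hypothetical name and not part of m2aia or i2nca. It assumes the strict ".imzML" ending described in this notebook and that the binary `.ibd` file belonging to the imzML dataset sits next to it.

```python
from pathlib import Path


def check_imzml_path(path_str):
    """Return a list of problems found with an imzML path (empty if none).

    Hypothetical helper for illustration; not part of m2aia or i2nca.
    """
    path = Path(path_str)
    problems = []
    # The notebook requires the path to end with ".imzML"
    if path.suffix != ".imzML":
        problems.append(f"expected a path ending in '.imzML', got '{path.suffix}'")
    if not path.is_file():
        problems.append(f"file not found: {path}")
    elif not path.with_suffix(".ibd").is_file():
        # Assumption: the binary .ibd file is stored alongside the .imzML file
        problems.append(f"no .ibd file found next to {path.name}")
    return problems


# Usage with a placeholder path; replace it with your own file_path
for problem in check_imzml_path("example.imzML"):
    print("Problem:", problem)
```

An empty list means the path passed these basic checks; it does not guarantee that the file content is valid.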

# Agnostic QC

To make an agnostic QC, we can use the `report_agnostic_qc` function. Running this function will create a QC report as a pdf at a specified location. Inside the agnostic QC, different reports, graphs and images will be created for the dataset that do not assume anything about the type of experiment that was performed (hence the name "agnostic").

THE FOLLOWING CELL NEEDS YOUR INPUT

Please change the variable `output_filepath` to a path on your machine. The output pdf will be saved there.

In [None]:
output_filepath = r"C:\Users\Jannik\Documents\Uni\Master_Biochem\4_Semester\M2aia\data\exmpl_cont\kidney"

report_agnostic_qc(I,output_filepath)

# Calibrant QC

To compare the dataset against some calibrants (or other mz values that are of interest), the function `report_calibrant_qc` creates a report that compares the dataset with the reference mz values.
For this, some additional parameters need to be specified.

THE FOLLOWING CELL NEEDS YOUR INPUT

Please change the following variables in order to create a Calibrant QC:
- `output_filepath`: Please provide a path on your machine. The output pdf will be saved there.

- `calibrants_csv_file`: Please provide a file with the calibrant masses. Use a csv file with ";" as delimiter. The column with the mz values must be called "mz" and the column with an identifier must be called "name".

- `distance`: A parameter for the calculation of accuracy metrics. Only data points inside an interval of `distance` around any of the values in `calibrants_csv_file` are considered when determining the metrics.

- `accuracy`: A parameter for the cutoff that is applied to accuracy images. This effectively controls the range in which the colors are displayed and can be set separately from `distance` to allow control over the visual output.

- `coverage`: A subsetting method for large datasets. A coverage of 0.3 means that a subset of pixels amounting to 30% of the full measurement is used. This allows faster computation for large datasets. For small datasets, the value should be 1.0.
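
The calibrant file format described above (";" as delimiter, columns "name" and "mz") can be sketched as follows. The calibrant names and mz values are placeholders, not real calibrants, and the helper functions are hypothetical. Whether i2nca cares about column order or extra columns is not covered here.

```python
import csv
import io

# Placeholder entries for illustration only; use your own calibrants
example_rows = [
    {"name": "calibrant_A", "mz": 100.0},
    {"name": "calibrant_B", "mz": 250.5},
]


def write_calibrants(handle, rows):
    """Write rows as a ';'-delimited table with the columns 'name' and 'mz'."""
    writer = csv.DictWriter(
        handle, fieldnames=["name", "mz"], delimiter=";", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def read_calibrants(handle):
    """Read a calibrant table and check that the required columns exist."""
    reader = csv.DictReader(handle, delimiter=";")
    missing = {"name", "mz"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    return [(row["name"], float(row["mz"])) for row in reader]


# Round trip through an in-memory buffer instead of a file on disk
buffer = io.StringIO()
write_calibrants(buffer, example_rows)
buffer.seek(0)
calibrants = read_calibrants(buffer)
print(calibrants)
```

To create a real file, open it with `open(path, "w", newline="")` and pass the handle to `write_calibrants`.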


In [None]:
output_filepath = r"C:\Users\Jannik\Documents\Uni\Master_Biochem\4_Semester\M2aia\data\exmpl_cont\kidney"
calibrants_csv_file = r"C:\Users\Jannik\Documents\Uni\Master_Biochem\4_Semester\M2aia\data\exmpl_cont\calibrants_9AA.csv"

distance = 0.025 # interval in delta-mz
accuracy = 100  # interval in ppm
coverage = 1.0 # value between 0 and 1

# pass the variables defined above instead of hard-coded values
report_calibrant_qc(I, output_filepath, calibrants_csv_file, distance, accuracy, coverage)
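
Since `distance` is given in delta-mz and `accuracy` in ppm, it can help to convert between the two units. The sketch below uses the standard definition of a ppm mass error. How i2nca applies these parameters internally is not shown here, and the theoretical mz value is a placeholder.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million (standard definition)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def ppm_to_delta_mz(ppm, theoretical_mz):
    """Absolute mz difference that corresponds to a ppm value at a given mz."""
    return ppm * theoretical_mz / 1e6


theoretical = 500.0  # placeholder mz value, not a real calibrant
distance = 0.025     # interval in delta-mz, as in the cell above

# At mz 500, a window of 0.025 corresponds to about 50 ppm
print(ppm_error(theoretical + distance, theoretical))

# At mz 500, 100 ppm corresponds to an absolute difference of about 0.05
print(ppm_to_delta_mz(100, theoretical))
```

Because ppm is a relative unit, the same `distance` covers a larger ppm range at low mz values than at high ones.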

# Region QC

The `report_regions_qc` function compares regions of the dataset using different metrics.
For this, an additional annotation is required that separates the dataset into different regions. If none is provided, the QC report automatically annotates all pixels in the dataset by the connected object they belong to (neighbouring pixels are counted as one object).
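
The idea of grouping neighbouring pixels into one object can be illustrated with a small connected-component labelling. This is a sketch of the general technique and not the code i2nca uses. It assumes that only direct horizontal and vertical neighbours count as connected (4-connectivity); the function name and the pixel coordinates are made up.

```python
from collections import deque


def label_connected_pixels(pixels):
    """Assign a region number to each (x, y) pixel.

    Pixels that touch horizontally or vertically end up in the same region.
    Assumption: 4-connectivity; diagonal neighbours are not connected.
    """
    present = set(pixels)
    labels = {}
    region = 0
    for start in sorted(present):
        if start in labels:
            continue  # already reached from an earlier pixel
        region += 1
        labels[start] = region
        queue = deque([start])
        # Breadth-first search over all pixels reachable from the start pixel
        while queue:
            x, y = queue.popleft()
            for neighbour in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if neighbour in present and neighbour not in labels:
                    labels[neighbour] = region
                    queue.append(neighbour)
    return labels


# Two separate objects: three touching pixels and two touching pixels
example_pixels = [(0, 0), (1, 0), (1, 1), (5, 5), (5, 6)]
labels = label_connected_pixels(example_pixels)
print(labels)
```

With this input, the first three pixels share one region number and the last two share another.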

THE FOLLOWING CELL NEEDS YOUR INPUT

Please change the following variables in order to create a Region QC:
- `output_filepath`: Please provide a path on your machine. The output pdf will be saved there.

- `region_tsv_file`: Optionally provide a tsv file with annotations. The pixels need to be annotated with their x and y position and any keyword for their annotation. Pixels with the same keyword are grouped into one region. The column with the x position must be called "x", the column with the y position "y", and the column in which the groups are defined "annotation". \
If no regions are annotated, set this parameter to `False`. Then, an annotation will be created automatically.
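
A region annotation table with the columns "x", "y" and "annotation" could look like the sketch below, which also shows how pixels with the same keyword end up in one region. The keywords and coordinates are placeholders, and `group_pixels_by_annotation` is a hypothetical helper, not an i2nca function.

```python
import csv
import io
from collections import defaultdict

# Placeholder annotation table; tab-separated with the three required columns
example_tsv = (
    "x\ty\tannotation\n"
    "1\t1\tregion_A\n"
    "2\t1\tregion_A\n"
    "7\t3\tregion_B\n"
)


def group_pixels_by_annotation(handle):
    """Collect the (x, y) positions that belong to each annotation keyword."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {"x", "y", "annotation"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    regions = defaultdict(list)
    for row in reader:
        regions[row["annotation"]].append((int(row["x"]), int(row["y"])))
    return dict(regions)


regions = group_pixels_by_annotation(io.StringIO(example_tsv))
for keyword, positions in regions.items():
    print(keyword, positions)
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to the helper.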


In [None]:
output_filepath = r"C:\Users\Jannik\Documents\Uni\Master_Biochem\4_Semester\M2aia\data\exmpl_cont\kidney"
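# tsv file with the columns "x", "y" and "annotation"; set to False to generate the annotation automatically
region_tsv_file = r"C:\Users\Jannik\Documents\Uni\Master_Biochem\4_Semester\M2aia\data\exmpl_cont\calibrants_9AA.csv"

report_regions_qc(I, output_filepath, region_tsv_file)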