# TMA and Batch QC Metrics

This notebook runs QC checks at the per-TMA level and at the per-batch, per-cohort, or per-tissue level.

It has two parts, which can be run in any order depending on which QC effects are of interest.

In [None]:
import os
from toffy import qc_comp, qc_metrics_plots

## QC TMA Metrics

### 1. Select QC metrics and TMAs

Select any combination of the following three QC metrics:
1. `"Non-zero mean intensity"`
2. `"Total intensity"`
3. `"99.9% intensity value"`
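
As a rough illustration of what these three metric names describe, the sketch below computes them for a single channel image with NumPy. The helper `sketch_qc_metrics` is hypothetical and is not part of `toffy`. It assumes the mean is taken over non-zero pixels only and that the 99.9% value is the 99.9th percentile over all pixels, which may differ from how the library computes them.

```python
import numpy as np


def sketch_qc_metrics(img: np.ndarray) -> dict:
    """Hypothetical helper: summary statistics for one channel image."""
    nonzero = img[img > 0]
    return {
        # Mean over pixels with signal only (assumed definition)
        "Non-zero mean intensity": float(nonzero.mean()) if nonzero.size else 0.0,
        # Sum over every pixel in the image
        "Total intensity": float(img.sum()),
        # 99.9th percentile over all pixels (assumed definition)
        "99.9% intensity value": float(np.percentile(img, 99.9)),
    }


# Toy 4x4 image with two non-zero pixels
img = np.zeros((4, 4))
img[0, 0] = 2.0
img[1, 1] = 4.0
example_metrics = sketch_qc_metrics(img)
```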



In [None]:
qc_metrics = ["Non-zero mean intensity", "Total intensity", "99.9% intensity value"]

Specify the names of the relevant folders:
- `extracted_imgs_path`: The path to the FOV folders, whose names end with a row and column suffix (for example `R1C2`).
- `qc_tma_metrics_dir`: The path where the QC TMA metrics should be saved.


Here is what an example `extracted_imgs_path` directory may contain:

```sh
samples/
├── TONIC_TMA1_R1C1
├── TONIC_TMA1_R1C2
├── ...
├── TONIC_TMA2_R1C2
├── TONIC_TMA2_R7C10
└── ...

```

In [None]:
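# Folder containing one subfolder per FOV (named with row and column suffixes)
extracted_imgs_path = os.path.join(
    "D:\\extracted_images"
)

# Folder where the QC TMA metrics will be written
qc_tma_metrics_dir = os.path.join("./extracted/TONIC_Cohort/")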

Specify the TMAs of interest. The FOV folder names must contain the row and column numbers, for example `TONIC_TMA1_R1C2` or `TONIC_TMA2_R6C2`.

Every row/column FOV of TONIC TMA 1 is selected when its prefix, `TONIC_TMA1_`, is listed in the `tmas` variable.
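
The matching described above can be sketched in plain Python. The helper `select_tma_fovs` and the regular expression are illustrative assumptions, not the `toffy` implementation. The sketch assumes a folder is selected when its name contains the TMA substring and ends in an `R<row>C<col>` suffix.

```python
import re

# Assumed naming convention: folder names end with R<row>C<col>
FOV_PATTERN = re.compile(r"R(\d+)C(\d+)$")


def select_tma_fovs(folder_names, tma_prefix):
    """Hypothetical helper: map matching folder names to (row, col)."""
    selected = {}
    for name in folder_names:
        match = FOV_PATTERN.search(name)
        if tma_prefix in name and match:
            selected[name] = (int(match.group(1)), int(match.group(2)))
    return selected


folders = ["TONIC_TMA1_R1C1", "TONIC_TMA1_R1C2", "TONIC_TMA2_R1C2", "TONIC_TMA2_R7C10"]
tma1_fovs = select_tma_fovs(folders, "TONIC_TMA1_")
```

Keeping the trailing underscore in the substring matters in this sketch: `TONIC_TMA1` without it would also match a folder such as `TONIC_TMA10_R1C1`.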

In [None]:
# Change `tmas` to the list of TMA substrings you want to run QC on.
# range(1, 3) yields "TONIC_TMA1_" and "TONIC_TMA2_".
tmas = [f"TONIC_TMA{n}_" for n in range(1, 3)]

In [None]:
qc_tmas = qc_comp.QCTMA(extracted_imgs_path, qc_tma_metrics_dir, qc_metrics=qc_metrics)

In [None]:
qc_tmas.qc_tma_metrics(tmas)

You may want to exclude channels depending on their impact. The `channel_exclude` variable filters out those channels when creating the ranked QC metrics.

The following channels will always be excluded from the TMA Metrics ranking below:
- Au
- Fe
- Na
- Ta
- Noodle

In [None]:
channel_exclude = ["chan_39", "chan_45"]

### 2. Compute the channel-wise average rank of the QC TMA metrics for each TMA

In [None]:
qc_tmas.qc_tma_metrics_rank(tmas, channel_exclude=channel_exclude)

### 3. Plot the QC TMA Metrics

The plot below shows a heatmap of the TMA alongside a histogram.

The TMA QC metrics are first ranked across FOVs within each channel, and the ranks are then averaged across channels for each FOV.

Ideally, no region of the TMA has a consistently higher average rank than any other. A problem arises when, say, all FOVs in the upper left corner of the TMA are systematically brighter than the others.
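
To make the rank-then-average step concrete, here is a minimal pandas sketch on made-up numbers. The channel names (`CD3`, `CD8`), FOV labels, and values are illustrative assumptions only. This shows the general technique and is not the code `toffy` runs internally.

```python
import pandas as pd

# Toy QC metric table: rows are FOVs, columns are channels (values are made up)
toy_metrics = pd.DataFrame(
    {"CD3": [10.0, 20.0, 30.0], "CD8": [5.0, 1.0, 9.0]},
    index=["R1C1", "R1C2", "R2C1"],
)

# Step 1: rank the FOVs within each channel (1 = lowest value)
channel_ranks = toy_metrics.rank(axis=0)

# Step 2: average the ranks across channels to get one value per FOV
avg_rank = channel_ranks.mean(axis=1)
```

In this toy table `R2C1` has the highest value in both channels, so it ends up with the highest average rank. A cluster of neighbouring FOVs that all have high average ranks is the kind of spatial pattern to look for in the heatmap.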


These plots get saved in a `figures` subfolder within `qc_tma_metrics_dir`.

<div align="center">
    <img src="img/nb6_ex_avg_tma_rank.png" />
</div>


In [None]:
qc_metrics_plots.qc_tmas_metrics_plot(qc_tmas=qc_tmas, tmas=tmas, save_figure=True, dpi=300)

## QC Batch Effect Metrics

The second half of this notebook is dedicated to looking at batch effects across different cohorts and / or tissues.

Specify the names of the relevant folders:
- `cohort_data_dir`: The path to the FOV folders, whose names end with a tissue suffix.
- `qc_cohort_metrics_dir`: The path where the QC Batch Effect metrics should be saved.


Here is what an example `cohort_data_dir` directory may contain:

```sh
cohort/
├── TONIC_TMA1_colon_bottom
├── TONIC_TMA1_ln_bottom
├── ...
├── TONIC_TMA2_NKI_Tonsil1
├── TONIC_TMA2_NKI_Tonsil2
└── ...

```

In [None]:
cohort_data_dir = "./data/extracted/"
qc_cohort_metrics_dir = "./data/extracted/cohort_metrics"

### 1. Select Tissues / Batches of Interest

`tissues` is a list of the tissue suffixes for which to generate the QC batch effect metrics.

In [None]:
tissues = ["ln_top", "ln_bottom", "spleen_top", "spleen_bottom", "tonsil_top", "tonsil_bottom"]

### 2. Compute the QC Batch Effect metrics for the set of tissues.

In [None]:
qc_batch = qc_comp.QCBatchEffect(cohort_data_dir, qc_cohort_metrics_dir, qc_metrics)

In [None]:
qc_batch.batch_effect_qc_metrics(tissues=tissues)

You may want to exclude channels depending on their impact. The `channel_exclude` variable filters out those channels when creating the Batch Effect QC Metrics.

The following channels will always be excluded from the Batch Effect QC Metrics below:
- Au
- Fe
- Na
- Ta
- Noodle

If you only want to see a select few channels, you may set `channel_include` as a list of those desired channels, or leave it as `None` if you'd like to only exclude a few channels.
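
One way to picture how the two options could interact is sketched below. The helper `sketch_filter_channels` is hypothetical, and the order of operations (exclusions first, then the optional include list) is an assumption made for illustration. The actual behaviour of `batch_effect_filtering` may differ.

```python
# Channels the text above lists as always excluded
ALWAYS_EXCLUDED = ["Au", "Fe", "Na", "Ta", "Noodle"]


def sketch_filter_channels(channels, channel_include=None, channel_exclude=None):
    """Hypothetical helper: apply exclusions, then an optional include list."""
    excluded = set(ALWAYS_EXCLUDED) | set(channel_exclude or [])
    kept = [c for c in channels if c not in excluded]
    if channel_include is not None:
        # Keep only the explicitly requested channels (assumed semantics)
        wanted = set(channel_include)
        kept = [c for c in kept if c in wanted]
    return kept


all_channels = ["Au", "CD3", "PDL1", "chan_39", "CD8"]
only_excluded = sketch_filter_channels(all_channels, channel_exclude=["chan_39", "PDL1"])
only_included = sketch_filter_channels(all_channels, channel_include=["CD3"])
```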

In [None]:
channel_exclude = ["chan_39", "Biotin", "PDL1", "chan_45"]
channel_include = None

### 3. Filter the Batch Effect QC Metrics

In [None]:
qc_batch.batch_effect_filtering(tissues=tissues, channel_include=channel_include, channel_exclude=channel_exclude)

### 4. Batch Effect Violin / Swarm Plot

Below is an example of the Violin / Swarm plot. For each channel, the FOVs are drawn as points distributed vertically. We suggest using this plot for only a few channels at a time, as it can quickly become overwhelming.

<div align="center">
    <img src="img/nb6_ex_batch_effect_violin_swarm.png" />
</div>


In [None]:
qc_metrics_plots.qc_batch_effect_violin(qc_batch, tissues=tissues, save_figure=True)

### 5. Batch Effect Heatmap

The plot below is a heatmap of every FOV associated with a particular tissue, in this case `ln_bottom`. FOVs from different cohorts (such as TONIC and SPAIN) can be included as well, allowing comparison across cohorts.

The QC metrics used to generate this heatmap are normalized by dividing each value by its row average and then taking the `log2` of the result. A value of 1 therefore means the metric is 2 times the row average, and a value of -1 means it is half the row average.
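
The normalization can be checked on a small made-up array. The sketch below uses NumPy and toy values. Which quantity sits on the rows and which on the columns is not fixed here, so treat the layout as illustrative.

```python
import numpy as np

# Toy metric table (values are made up); each row is normalized independently
values = np.array([
    [2.0, 4.0, 6.0],
    [1.0, 1.0, 4.0],
])

# Divide each value by its row average, then take log2 of the ratio
row_means = values.mean(axis=1, keepdims=True)
log2_ratio = np.log2(values / row_means)
```

In the second row the average is 2, so the entry 4 maps to `log2(4 / 2) = 1` (twice the row average) and the entries of 1 map to `log2(1 / 2) = -1` (half the row average).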

These plots get saved in a `figures` subfolder within `qc_cohort_metrics_dir`.

<div align="center">
    <img src="img/nb6_ex_batch_effect_heatmap.png" />
</div>


In [None]:
qc_metrics_plots.qc_batch_effect_heatmap(qc_batch, tissues=tissues, save_figure=True)