# Median Spectra
A common way to reduce the complexity of spectral data is to start the analysis with median spectra. Median spectra are computed per image and annotated region by taking the median per channel across all pixels of the annotated region. For the HSI data of the HeiPorSPECTRAL dataset, this yields a 100-dimensional vector per image and annotated region. Since these vectors are commonly needed, they are precomputed for each dataset for easy access, and their usage is shown here.
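To make the definition concrete, here is a minimal sketch of how a median spectrum could be computed for a single annotated region. It uses only NumPy and random toy data; the names `cube` and `mask` are hypothetical and do not reflect how the precomputed tables are actually generated.

```python
import numpy as np

# Hypothetical toy data: a 4x5 image with 100 spectral channels
rng = np.random.default_rng(0)
cube = rng.random((4, 5, 100), dtype=np.float32)

# Hypothetical annotation mask: True for pixels that belong to the annotated region
mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 1:4] = True

# Boolean indexing selects the pixels of the region -> shape (n_pixels, 100)
region_pixels = cube[mask]

# Median and standard deviation per channel across all pixels of the region
median_spectrum = np.median(region_pixels, axis=0)
std_spectrum = np.std(region_pixels, axis=0)
n_pixels = region_pixels.shape[0]
```

The result is one 100-dimensional vector per region, which corresponds to the kind of entry stored in the spectrum columns of the table shown below.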

In [1]:
%load_ext autoreload
%autoreload 2

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from sklearn.decomposition import PCA

from htc import DataPath, add_std_fill, median_table, tivita_wavelengths

## Median Table

The central entry point for loading median spectra is the [`median_table()`](../htc/utils/helper_functions.py) function. If you already have the data paths for which you want the median spectra, you can simply pass a list of those paths:

In [2]:
df = median_table(
    paths=[
        DataPath.from_image_name("P086#2021_04_15_09_22_02@polygon#annotator1"),
        DataPath.from_image_name("P086#2021_04_15_09_22_02@polygon#annotator2"),
    ]
)
df.head()

Unnamed: 0,image_name,subject_name,timestamp,label_name,median_spectrum,std_spectrum,median_normalized_spectrum,std_normalized_spectrum,n_pixels,median_sto2,...,SW_Version,Fremdlichterkennung_Fremdlicht erkannt?,Fremdlichterkennung_PixelmitFremdlicht,Fremdlichterkennung_Breite LED Rot,Fremdlichterkennung_Breite LED Gruen,Fremdlichterkennung_Grenzwert Pixelanzahl,Fremdlichterkennung_Intensity Grenzwert,Aufnahme_Aufnahmemodus,camera_name,annotation_name
0,P086#2021_04_15_09_22_02,P086,2021_04_15_09_22_02,spleen,"[0.04814308, 0.045386504, 0.042081352, 0.03791...","[0.017346695, 0.013489561, 0.011369096, 0.0100...","[0.003749312, 0.0035218908, 0.0032769735, 0.00...","[0.001293977, 0.000981235, 0.0008071454, 0.000...",14370,0.438706,...,1.6.0.1,False,0.0,50.0,50.0,100.0,7.0,Reflektanz,0202-00118_correct-1,polygon#annotator1
1,P086#2021_04_15_09_22_02,P086,2021_04_15_09_22_02,spleen,"[0.04673159, 0.043991182, 0.040661428, 0.03656...","[0.018447485, 0.014333139, 0.01225138, 0.01095...","[0.0036932975, 0.0034690085, 0.0032145875, 0.0...","[0.0014224425, 0.0010762984, 0.00089630607, 0....",21437,0.424235,...,1.6.0.1,False,0.0,50.0,50.0,100.0,7.0,Reflektanz,0202-00118_correct-1,polygon#annotator2


The output is a table in which the `median_normalized_spectrum` column contains the precomputed median spectra. Each row holds a list with the 100 normalized reflectance values. It is often easier to work with the median spectra after stacking all vectors into a matrix.

In [3]:
median_spectra = np.stack(df["median_normalized_spectrum"])
median_spectra.shape

(2, 100)

Let's calculate the average median spectrum of our selection:

In [4]:
np.mean(median_spectra, axis=0).shape

(100,)

Instead of providing a list of all data paths, it is also possible to retrieve the median spectra for all images of a dataset via the name of the dataset.

In [5]:
df = median_table(dataset_name="HeiPorSPECTRAL")
df.head()

Unnamed: 0,image_name,subject_name,timestamp,label_index,label_name,median_spectrum,std_spectrum,median_normalized_spectrum,std_normalized_spectrum,n_pixels,...,SW_Version,Fremdlichterkennung_Fremdlicht erkannt?,Fremdlichterkennung_PixelmitFremdlicht,Fremdlichterkennung_Breite LED Rot,Fremdlichterkennung_Breite LED Gruen,Fremdlichterkennung_Grenzwert Pixelanzahl,Fremdlichterkennung_Intensity Grenzwert,Aufnahme_Aufnahmemodus,camera_name,annotation_name
0,P086#2021_04_15_11_38_04,P086,2021_04_15_11_38_04,0,stomach,"[0.298144, 0.2833822, 0.2614737, 0.23648293, 0...","[0.17259361, 0.16893886, 0.16671665, 0.1658238...","[0.0063450593, 0.006019631, 0.005552271, 0.005...","[0.001271573, 0.0011821607, 0.0011256082, 0.00...",34203,...,1.6.0.1,False,6.0,50.0,50.0,100.0,7.0,Reflektanz,0202-00118_correct-1,polygon#annotator1
1,P086#2021_04_15_11_38_04,P086,2021_04_15_11_38_04,0,stomach,"[0.28387108, 0.26972988, 0.24867523, 0.2245793...","[0.06143358, 0.05755748, 0.05417795, 0.0519581...","[0.006142241, 0.005821412, 0.005361867, 0.0048...","[0.0010590239, 0.0009730612, 0.00091588503, 0....",42426,...,1.6.0.1,False,6.0,50.0,50.0,100.0,7.0,Reflektanz,0202-00118_correct-1,polygon#annotator2
2,P086#2021_04_15_11_38_04,P086,2021_04_15_11_38_04,0,stomach,"[0.27330333, 0.259961, 0.23922777, 0.2155934, ...","[0.06579228, 0.061461516, 0.057752986, 0.05486...","[0.0060464954, 0.0057365242, 0.005281097, 0.00...","[0.001107502, 0.0010132897, 0.00094938924, 0.0...",51609,...,1.6.0.1,False,6.0,50.0,50.0,100.0,7.0,Reflektanz,0202-00118_correct-1,polygon#annotator3
3,P086#2021_04_15_11_38_26,P086,2021_04_15_11_38_26,0,stomach,"[0.2836912, 0.26937896, 0.2474714, 0.2232979, ...","[0.15054636, 0.14754522, 0.14422333, 0.1433670...","[0.0063025337, 0.005978748, 0.0055006156, 0.00...","[0.0011646454, 0.0010641138, 0.0010057053, 0.0...",40668,...,1.6.0.1,False,0.0,50.0,50.0,100.0,7.0,Reflektanz,0202-00118_correct-1,polygon#annotator1
4,P086#2021_04_15_11_38_26,P086,2021_04_15_11_38_26,0,stomach,"[0.26858327, 0.2554159, 0.23456676, 0.21172526...","[0.057310455, 0.05293087, 0.049723584, 0.04716...","[0.0060783997, 0.005771285, 0.005312352, 0.004...","[0.0010364179, 0.00093886565, 0.00088119856, 0...",49729,...,1.6.0.1,False,0.0,50.0,50.0,100.0,7.0,Reflektanz,0202-00118_correct-1,polygon#annotator2


In the case of the HeiPorSPECTRAL dataset, this includes all available annotations because no default annotation is set for this dataset (the `dataset_settings.json` file does not contain an `annotation_name_default` key). The [`median_table()`](../htc/utils/helper_functions.py) function has more options to customize the resulting table, such as restricting the output to specific labels (cf. documentation).

The table contains not only the median spectra but also all other available metadata for the images. This is useful to make selections based on the metadata or simply to explore the data.

In [6]:
df["situs"].unique()

array([1, 2, 3, 4])
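As a minimal sketch of such a metadata-based selection, the following example filters a small hypothetical stand-in table with plain pandas. The toy values are made up; only the column names `subject_name`, `label_name`, `situs` and `median_normalized_spectrum` are taken from the table above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the median table (the real table has many more columns)
toy = pd.DataFrame({
    "subject_name": ["P086", "P086", "P087", "P087"],
    "label_name": ["kidney", "spleen", "kidney", "stomach"],
    "situs": [1, 2, 1, 3],
    "median_normalized_spectrum": [np.full(100, v) for v in (0.1, 0.2, 0.3, 0.4)],
})

# Boolean masks can be combined to select rows based on the metadata
selection = toy[(toy["label_name"] == "kidney") & (toy["situs"] == 1)]

# Stack the selected spectra into a matrix of shape (n_rows, 100)
spectra = np.stack(selection["median_normalized_spectrum"].to_list())
```

The same pattern should carry over to the real table `df`, since it is an ordinary pandas data frame.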

## Hierarchical Aggregation
It is very common to select a subset of the median table and then visualize the resulting spectra. However, it is important to hierarchically aggregate the data first. Otherwise, images with more annotations or subjects with more images would contribute more strongly to the average. In the following, we compare the median spectra for the three different angles while aggregating first across the annotations and then across the images of each subject.
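To see why the order of aggregation matters, consider the following toy sketch. The data is hypothetical, and a single scalar stands in for the spectrum to keep the arithmetic easy to follow: subject `A` contributes three images, subject `B` only one.

```python
import pandas as pd

# Hypothetical toy data: subject A has three images, subject B only one
toy = pd.DataFrame({
    "subject_name": ["A", "A", "A", "B"],
    "image_name": ["A#1", "A#2", "A#3", "B#1"],
    "value": [1.0, 1.0, 1.0, 3.0],
})

# Naive: average across all rows; subject A dominates because it has more images
naive_mean = toy["value"].mean()

# Hierarchical: average per subject first, then across subjects
per_subject = toy.groupby("subject_name")["value"].mean()
hierarchical_mean = per_subject.mean()

print(naive_mean, hierarchical_mean)  # 1.5 2.0
```

With hierarchical aggregation, every subject carries the same weight regardless of how many images were recorded for it.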

In [7]:
# Average the median spectra across all annotators
dfg = df.groupby(["angle", "subject_name", "image_name"], as_index=False).agg(
    median_normalized_spectrum=pd.NamedAgg(
        column="median_normalized_spectrum",
        aggfunc=lambda x: np.mean(np.stack(x), axis=0),
    ),
)
# Average the median spectra across all images of one subject
dfg = dfg.groupby(["angle", "subject_name"], as_index=False).agg(
    median_normalized_spectrum=pd.NamedAgg(
        column="median_normalized_spectrum",
        aggfunc=lambda x: np.mean(np.stack(x), axis=0),
    ),
)
dfg

Unnamed: 0,angle,subject_name,median_normalized_spectrum
0,-25,P086,"[0.0055898433, 0.0052947137, 0.0048663896, 0.0..."
1,-25,P087,"[0.005636148, 0.0054836264, 0.0052139885, 0.00..."
2,-25,P088,"[0.0054585948, 0.005202231, 0.004885306, 0.004..."
3,-25,P089,"[0.0055314223, 0.005321091, 0.0050316732, 0.00..."
4,-25,P090,"[0.0054063853, 0.0050607696, 0.004734265, 0.00..."
5,-25,P091,"[0.005126051, 0.0049619856, 0.00480613, 0.0045..."
6,-25,P092,"[0.006394498, 0.006092998, 0.0057552275, 0.005..."
7,-25,P093,"[0.0056002075, 0.0054592527, 0.005240095, 0.00..."
8,-25,P094,"[0.004935036, 0.0046973573, 0.0044305148, 0.00..."
9,-25,P095,"[0.005486307, 0.0052765626, 0.0049359836, 0.00..."


We will use this table to report an averaged median spectrum per angle (averaged across all subjects). The shaded area indicates the standard deviation across subjects.

In [8]:
fig = go.Figure()
angle_colors = {
    -25: "#008B8B",
    0: "#8B008B",
    25: "#8FBC8F",
}

for angle in dfg["angle"].unique():
    dfa = dfg[dfg["angle"] == angle]

    mean = np.mean(np.stack(dfa["median_normalized_spectrum"]), axis=0)
    std = np.std(np.stack(dfa["median_normalized_spectrum"]), axis=0)

    fig = add_std_fill(
        fig,
        mid_line=mean,
        std_range=std,
        x=tivita_wavelengths(),
        linecolor=angle_colors[angle],
        label=str(angle),
    )

fig.update_layout(margin=dict(l=0, r=0, t=40, b=0))
fig.update_layout(height=350, width=600, template="plotly_white")
fig.update_layout(title_text="Median spectra for different angles", title_x=0.5)
fig.update_layout(legend_title_text="angle")
fig.update_yaxes(title="<b>normalized reflectance [a.u.]</b>")
fig.update_xaxes(title="<b>wavelength [nm]</b>")

## PCA
It is very common to visualize the high-dimensional median spectra with the help of PCA. This can be used to compare different groups or properties in the data (we have many examples of PCA plots in our papers). In general, we prefer PCA plots over non-linear visualization techniques like t-SNE or UMAP since PCA plots are much easier to interpret.

In [9]:
dfp = df[df.label_name == "kidney"].copy()  # For simplicity, we show only one label in the plot
dfp["angle"] = dfp["angle"].astype(str)  # Plotly should consider the angle as a categorical variable

# Compute the PCA
pca = PCA(n_components=2, random_state=1337)
X = np.stack(dfp["median_normalized_spectrum"])
X_pca = pca.fit_transform(X)
dfp["pca_1"] = X_pca[:, 0]
dfp["pca_2"] = X_pca[:, 1]

# Visualize the result
fig = px.scatter(dfp, x="pca_1", y="pca_2", color="angle", symbol="subject_name", hover_data=["situs", "repetition"])
fig.update_layout(title_text="PCA of kidney to visualize situs, angle and repetition", title_x=0.5)
fig.update_xaxes(title=f"<b>PC 1 [{pca.explained_variance_ratio_[0]:.2f}]</b>")
fig.update_yaxes(title=f"<b>PC 2 [{pca.explained_variance_ratio_[1]:.2f}]</b>")
fig.update_layout(template="plotly_white", height=600, width=800)
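The axis labels in the plot report the explained variance ratio of each principal component. As a rough sketch of what this number represents, the following example reproduces the core PCA computation with plain NumPy on hypothetical random data. It is not the scikit-learn implementation, and the sign of the projected components may differ from what `PCA.fit_transform()` returns.

```python
import numpy as np

# Hypothetical toy data: 50 samples with 100 features each
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 100))

# PCA operates on the mean-centered data
X_centered = X - X.mean(axis=0)

# Singular value decomposition; the rows of Vt are the principal axes
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance captured by each component and its share of the total variance
explained_variance = S**2 / (X.shape[0] - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()

# Projection of the data onto the first two principal components
X_pca = X_centered @ Vt[:2].T
```

A ratio close to 1 for the first two components would indicate that the two-dimensional plot preserves most of the variation in the spectra.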