# Example LOOCV Results

In this notebook we run the example script **example_loocv_script.py**, which, like the tutorial notebook, runs the GTM-decon engine in a leave-one-out manner on the example paired single-cell and bulk data. For more details, see either the tutorial notebook or the **example_loocv_script.py** source code.

In [None]:
!python3 example_loocv_script.py

### Loading Data and Imports

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import json
import os
from scipy.stats import pearsonr, spearmanr, linregress

In [None]:
batches = ['D1', 'D2', 'D3', 'D4', 'H1', 'H2', 'H3']

Gather GTM-decon inferred cell-type proportions for each LOOCV batch. 

In [None]:
# Collect one results table per held-out batch, then stack them row-wise.
frames = []

for batch in batches:
    results_path = os.path.join('tutorial_results', batch, 'gatheredResults.csv')
    frames.append(pd.read_csv(results_path, index_col=0))

GTM_proportions = pd.concat(frames)

In [None]:
GTM_proportions

### Load Proportions File

We use the single-cell proportions for each sample as a proxy for the ground-truth cell-type proportions.

In [None]:
proportions = pd.read_csv("../data/example_proportions.csv", index_col=0)

In [None]:
proportions
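Before comparing the two tables, it can help to confirm that they describe the same samples and cell types and to put them in the same order. The following is a minimal sketch on made-up data, not part of the example script: `inferred` and `truth` are hypothetical stand-ins for `GTM_proportions` and `proportions`, and the sample names and values are invented.

```python
import pandas as pd

# Toy stand-ins for GTM_proportions and proportions (all values are made up).
inferred = pd.DataFrame(
    {"alpha cell": [0.6, 0.3], "beta cell": [0.4, 0.7]},
    index=["sample_2", "sample_1"],
)
truth = pd.DataFrame(
    {"beta cell": [0.65, 0.35], "alpha cell": [0.35, 0.65]},
    index=["sample_1", "sample_2"],
)

# Samples or cell types that appear in one table but not the other.
missing_samples = truth.index.symmetric_difference(inferred.index)
missing_cell_types = truth.columns.symmetric_difference(inferred.columns)

# Reorder the inferred table so its rows and columns match the reference table.
inferred_aligned = inferred.loc[truth.index, truth.columns]
```

If either `missing_samples` or `missing_cell_types` is non-empty, the `.loc` reordering raises a `KeyError`, which is usually preferable to silently comparing mismatched rows.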

## Computing Spearman R
We can compute the Spearman correlation coefficient between our inferred cell-type proportions and the real single-cell proportions of the left-out batches (used as a proxy for the ground-truth cell-type proportions). One coefficient is computed per sample, across cell types.

In [None]:
spearman_results = []
for b in proportions.index:
    spearman_results.append(spearmanr(
        GTM_proportions.loc[b][proportions.columns],
        proportions.loc[b],
    )[0])
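Spearman correlation depends only on the rank order of the cell types within a sample, not on the proportion values themselves. As a small worked example on invented numbers (the `inferred` and `truth` series below are hypothetical, not taken from the example data), swapping the two most abundant cell types lowers the coefficient even though the values are close:

```python
import pandas as pd
from scipy.stats import spearmanr

cell_types = ["alpha cell", "beta cell", "delta cell", "gamma cell"]

# Made-up proportions for a single sample.
inferred = pd.Series([0.50, 0.30, 0.15, 0.05], index=cell_types)
truth = pd.Series([0.40, 0.45, 0.10, 0.05], index=cell_types)

# Ranks agree for delta and gamma but alpha and beta are swapped,
# so sum(d^2) = 2 and rho = 1 - 6 * 2 / (4 * (4**2 - 1)) = 0.8.
rho, p_value = spearmanr(inferred, truth)
```

With only a handful of cell types per sample, a single rank swap moves the coefficient noticeably, which is worth keeping in mind when reading the boxplot.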

In [None]:
sns.set(rc={'figure.figsize': (2, 3), 'figure.dpi': 100})
sns.boxplot(y=spearman_results)  # pass as y so the box is vertical, matching the y-axis label
plt.title('GTM-decon LOOCV SCC')
# plt.xlabel('')
plt.xticks([])
plt.ylabel('Spearman R')


## Plotting Cell-type specific $R^2$

To check for possible biases, we can evaluate the model separately for each cell type. Here we show these results for the six most common cell types in the example data, similar to Figure 2b.

In [None]:
cell_types_to_plot = sorted(['alpha cell', 'ductal cell', 'beta cell', 'gamma cell', 'acinar cell', 'delta cell'])

In [None]:
sns.set(rc={'figure.figsize': (3.5, 12),'figure.dpi': 100})
fig, axes = plt.subplots(6, 1)
plt.subplots_adjust(hspace=0.6, wspace=0.6)

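for index, celltype in enumerate(cell_types_to_plot):
    # Align inferred values to the ground-truth sample order before fitting
    x = GTM_proportions.loc[proportions.index, celltype]
    y = proportions[celltype]

    # linregress returns (slope, intercept, rvalue, pvalue, stderr)
    slope, intercept, r, _, _ = linregress(x, y)
    sns.scatterplot(
        x = x,
        y = y,
        ax = axes[index],
        s=5,
    )

    # Record the current limits so they can be restored after drawing the line
    ymin, ymax = axes[index].get_ylim()
    xmin, xmax = axes[index].get_xlim()
    # rvalue is Pearson's r, so square it to report R^2
    axes[index].axline((0, intercept), slope=slope, label=f'$R^2$={r**2:.3f}', color="#cc8963", lw=2, alpha=0.5)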
    axes[index].legend(framealpha=0, fontsize=8, loc='upper center', 
                    bbox_to_anchor = [0.5, 1.25])

    axes[index].set_ylim(ymin, ymax)
    axes[index].set_xlim(xmin, xmax)
    axes[index].set_ylabel(celltype)
    axes[index].set_xlabel('')
    
# x holds the GTM-decon estimates and y the single-cell proportions
fig.supylabel('Ground Truth Proportions')
fig.supxlabel('GTM-decon Inferred Proportions')
fig.suptitle('GTM-decon Cell-Type Specific $R^2$')

plt.tight_layout()
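For a simple linear regression with one predictor, the coefficient of determination $R^2$ equals the square of the `rvalue` (Pearson's r) returned by `linregress`. The sketch below checks this on made-up numbers by also computing $R^2$ from the residuals; the arrays `x` and `y` are invented for illustration and are not taken from the example data.

```python
import numpy as np
from scipy.stats import linregress

# Made-up inferred (x) and reference (y) proportions for one cell type.
x = np.array([0.10, 0.20, 0.30, 0.40, 0.50])
y = np.array([0.12, 0.18, 0.33, 0.37, 0.52])

fit = linregress(x, y)

# R^2 as the squared Pearson correlation reported by linregress.
r_squared = fit.rvalue ** 2

# The same quantity from its definition: 1 - SS_res / SS_tot.
predicted = fit.intercept + fit.slope * x
ss_res = np.sum((y - predicted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared_from_residuals = 1 - ss_res / ss_tot
```

The two values agree up to floating-point error, so reporting `rvalue ** 2` is enough when the fit is an ordinary least-squares line with an intercept.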