# Akoya Academy Webinar: PhenoImager Analysis
This notebook contains the libraries and steps needed to import the single-cell data from QuPath into Python. The files exported from QuPath are stored in the `data/` folder.
## Step-2: Evaluating signal quality
Step-2 evaluates signal quality using the top-20/bottom-10% ratio. For each marker in each sample, we compute the average expression of the 20 highest-expressing cells and the average expression of the bottom 10% of cells, and use the ratio top20/btm10 as a proxy for the signal-to-noise ratio (SNR). A ratio greater than 10 generally supports reliable analysis. More information can be found in this paper: [Multiplex Immunofluorescence and Multispectral Imaging: Forming the Basis of a Clinical Test Platform for Immuno-Oncology](https://www.frontiersin.org/articles/10.3389/fmolb.2021.674747/full)
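To make the arithmetic concrete, here is a minimal sketch of the ratio on synthetic intensities. The helper name `top20_btm10_ratio`, its parameters, and the input values are assumptions made up for illustration; they are not part of the notebook or of any library.

```python
import numpy as np

def top20_btm10_ratio(values, n_top=20, bottom_frac=0.10):
    """Hypothetical helper: mean of the n_top largest values divided by
    the mean of the lowest bottom_frac fraction of values."""
    values = np.sort(np.asarray(values, dtype=float))
    top_mean = values[-n_top:].mean()
    # keep at least one cell in the bottom group for very small inputs
    n_bottom = max(1, int(len(values) * bottom_frac))
    bottom_mean = values[:n_bottom].mean()
    if bottom_mean == 0:
        # zero background: report the ratio as infinite
        return np.inf
    return top_mean / bottom_mean

# Synthetic marker: 980 background cells at 1.0 and 20 bright cells at 50.0
intensities = np.concatenate([np.full(980, 1.0), np.full(20, 50.0)])
ratio = top20_btm10_ratio(intensities)
print(ratio)  # 50.0
```

Here the 20 brightest cells average 50.0 and the bottom 10% (100 cells) average 1.0, so the ratio is 50, comfortably above the threshold of 10.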


### Import necessary libraries

In [2]:
import anndata as ad
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns

sns.set(color_codes=True)

#ignore warnings
import warnings
warnings.filterwarnings('ignore')


In [7]:
# Function to compute the SNR proxy for each protein in each sample
def computeTop20Btm10(adata):
    '''
    Compute the ratio of the mean of the 20 highest-expressing cells to the
    mean of the bottom 10% of cells, for each protein in each sample.
    Input: AnnData object with an ImageID column in .obs
    Output: a DataFrame with ImageID, Protein, ratio of top20/btm10
    '''
    rows = []
    # for each sample
    for sID in adata.obs.ImageID.sort_values().unique():
        subAD = adata[adata.obs.ImageID == sID]
        for x in subAD.var_names:
            # sort once and reuse the result for both ends of the distribution
            aX = np.sort(subAD[:, x].X.flatten())
            # the 20 largest values in aX
            top20 = aX[-20:]
            # the bottom 10% of values in aX
            btm10 = aX[:int(len(aX) * 0.1)]
            # a bottom-10% mean of 0 yields inf
            top20btm10 = np.mean(top20) / np.mean(btm10)
            rows.append({'ImageID': sID, 'Protein': x, 'top20btm10': top20btm10})
    # DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
    return pd.DataFrame(rows, columns=['ImageID', 'Protein', 'top20btm10'])
        

In [8]:
pd.set_option('display.float_format', lambda x: '%.4f' % x)
computeTop20Btm10(ad.read_h5ad('data/adata.h5ad'))

| | ImageID | Protein | top20btm10 |
|---|---|---|---|
| 0 | NSCLC_S1 | CD8 | inf |
| 1 | NSCLC_S1 | CD4 | inf |
| 2 | NSCLC_S1 | CD3E | 3306.3506 |
| 3 | NSCLC_S1 | CD20 | 771.2482 |
| 4 | NSCLC_S1 | PanCK | 436.5684 |
| 5 | NSCLC_S1 | CD68 | 959.6335 |
| 6 | NSCLC_S2 | CD8 | inf |
| 7 | NSCLC_S2 | CD4 | 73434.0859 |
| 8 | NSCLC_S2 | CD3E | 1826.7528 |
| 9 | NSCLC_S2 | CD20 | 64.1688 |


As the table above shows, all markers have a high top20/btm10 ratio. A value of infinity means that the mean expression of the bottom 10% of cells is 0, i.e. the marker has no measurable background. While this step is optional, it is important to evaluate data quality before phenotyping.
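When there are many samples, it can help to flag markers programmatically instead of reading the table by eye. The sketch below assumes a DataFrame with the same columns as the output of `computeTop20Btm10`; the sample names and ratio values are invented for illustration and are not results from this dataset, and the column names `zero_background` and `passes` are hypothetical.

```python
import numpy as np
import pandas as pd

SNR_THRESHOLD = 10  # rule-of-thumb cutoff described in Step-2

# Invented example values, shaped like the computeTop20Btm10 output
snr = pd.DataFrame({
    'ImageID': ['sample_A', 'sample_A', 'sample_B'],
    'Protein': ['CD8', 'CD20', 'CD20'],
    'top20btm10': [np.inf, 771.2482, 6.5],
})

# inf means the bottom-10% mean was 0 (no measurable background)
snr['zero_background'] = np.isinf(snr['top20btm10'])
# inf compares greater than any finite threshold, so it passes
snr['passes'] = snr['top20btm10'] > SNR_THRESHOLD

# markers that fall below the threshold and may need a closer look
low = snr.loc[~snr['passes'], ['ImageID', 'Protein']]
print(low)
```

Keeping the infinite ratios in a separate `zero_background` column makes it easy to review them apart from markers with a finite ratio.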