## CRISPR gene dependency scores  *-- Standard Z-score, Robust Z-score, Ranking*


In this project, we use three approaches to identify synthetic lethal (SL) interactions. The main idea is to compare the A1 gene dependency scores. Here, we focus on three transformations of the gene dependency scores: the standard Z-score, the robust Z-score, and the ranking.

**Input**
- BROAD CRISPR gene effect data: crispr_broad_paralog.csv

**Output**
- CRISPR gene dependency scores:
    - crispr_broad_paralog_standardZscore.csv
    - crispr_broad_paralog_robustZscore.csv
    - crispr_broad_paralog_ranking.csv

In [1]:
## Import modules
import numpy as np
import pandas as pd

In [2]:
## Load the dataset 
crispr_broad = pd.read_csv('/Users/amy/Desktop/SyntheticLethalityProject/1_data_processing/04_paralog_genes/crispr_broad_paralog.csv', index_col = None)
crispr_broad[:2]

Unnamed: 0,BROAD_ID,SangerModelID,1,2,9,10,12,13,19,20,...,196403,196410,393046,196463,196483,196513,196527,196541,131034,327657
0,ACH-000001,SIDM00105,-0.102725,0.058246,0.11052,-0.12292,-0.150353,0.025719,0.181498,0.023061,...,-0.222214,0.044005,,-0.012041,-0.644718,0.060917,0.106389,0.110456,0.086526,0.133952
1,ACH-000004,SIDM00594,0.008878,-0.099297,0.227454,0.129266,-0.088875,0.222941,0.084131,0.055002,...,0.042162,-0.20925,0.041585,-0.129117,-1.565798,0.034261,-0.0408,0.258221,-0.081413,0.130143


**1. Standard Z-score**

$$\text{Standard }Z\text{-score} = \frac{X_i - \mu}{\sigma}$$

In [3]:
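## Z-score each gene (column) across cell lines; ddof=0 gives the population standard deviation,
## and pandas skips NaN values when computing mean() and std()
stand_z_gene_effect = crispr_broad.drop(['BROAD_ID', 'SangerModelID'], axis = 1).apply(lambda x: (x - x.mean()) / x.std(ddof=0))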
stand_z_gene_effect = pd.concat([crispr_broad[['BROAD_ID', 'SangerModelID']], stand_z_gene_effect], axis = 1)
stand_z_gene_effect.to_csv('/Users/amy/Desktop/SyntheticLethalityProject/2_data_analysis/02_CRISPR_gene_dependency_scores_processing/crispr_broad_paralog_standardZscore.csv', index = False)
stand_z_gene_effect[:2]

Unnamed: 0,BROAD_ID,SangerModelID,1,2,9,10,12,13,19,20,...,196403,196410,393046,196463,196483,196513,196527,196541,131034,327657
0,ACH-000001,SIDM00105,-0.537421,0.024031,0.018168,-1.401754,-1.594301,-0.977586,1.914906,0.422602,...,-0.764161,1.014185,,-0.079585,0.816686,0.268747,1.1263,0.314622,0.267682,0.521102
1,ACH-000004,SIDM00594,0.538675,-1.603198,1.272691,0.995278,-0.973359,0.822347,0.979121,0.705908,...,1.478348,-1.250806,1.642221,-1.131069,-2.337275,-0.005882,-0.35896,1.783857,-1.440998,0.48212
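
As a quick sanity check of the column-wise standardisation, the sketch below applies the same lambda to a small made-up table. The `toy` DataFrame and its gene names are hypothetical and are not part of the project data; the point is only that each standardised column should end up with mean 0 and population standard deviation 1, with missing entries left as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows are cell lines, columns are genes (not real values)
toy = pd.DataFrame({
    "geneA": [-1.0, 0.0, 1.0, 2.0],
    "geneB": [0.5, np.nan, -0.5, 0.0],
})

# Same column-wise standard Z-score as in the cell above (population SD, ddof=0)
toy_z = toy.apply(lambda x: (x - x.mean()) / x.std(ddof=0))

# Each column should now have mean ~0 and population SD ~1; the NaN entry stays NaN
print(toy_z)
print(toy_z.mean())
print(toy_z.std(ddof=0))
```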


**2. Robust Z-score**

$$\text{Robust }Z\text{-score} = \frac{0.6745\,(X_i - \tilde{X})}{MAD}$$

where

$$MAD = \text{Median}(|X_i - \tilde{X}|)$$
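
The code for this step multiplies the numerator by 0.6745. A common justification, assuming the scores of a gene are roughly normally distributed with mean $\mu$ and standard deviation $\sigma$ (an assumption made here for illustration, not something checked in this notebook), is that half of the values lie within one MAD of the median:

$$P(|X - \mu| \le MAD) = \tfrac{1}{2} \;\Rightarrow\; MAD = \sigma\,\Phi^{-1}(0.75) \approx 0.6745\,\sigma$$

so that

$$\frac{0.6745\,(X_i - \tilde{X})}{MAD} \approx \frac{X_i - \mu}{\sigma}$$

Under that assumption the scaled robust Z-score is on roughly the same scale as the standard Z-score, while being less sensitive to outliers.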

In [4]:
# Calculate the robust Z-score of the gene dependency values
def robust_Zscore(x):

    '''Robust Z-score of one gene (column): 0.6745 * (x - median) / MAD'''

    # Note: np.median propagates NaN, so a column containing any missing value
    # yields NaN for every cell line in that column
    median = np.median(x)
    MAD = np.median(np.abs(x - median))

    return (0.6745 * (x - median)) / MAD
    
    
robust_z_gene_effect = crispr_broad.drop(['BROAD_ID', 'SangerModelID'], axis = 1).apply(robust_Zscore)
robust_z_gene_effect = pd.concat([crispr_broad[['BROAD_ID', 'SangerModelID']], robust_z_gene_effect], axis = 1)
robust_z_gene_effect.to_csv('/Users/amy/Desktop/SyntheticLethalityProject/2_data_analysis/02_CRISPR_gene_dependency_scores_processing/crispr_broad_paralog_robustZscore.csv', index = False)
robust_z_gene_effect[:2]

Unnamed: 0,BROAD_ID,SangerModelID,1,2,9,10,12,13,19,20,...,196403,196410,393046,196463,196483,196513,196527,196541,131034,327657
0,ACH-000001,SIDM00105,-0.652094,0.0,-0.046453,-1.680939,-1.950109,-1.124576,2.132993,0.499161,...,-1.105391,1.032899,,-0.067963,0.846485,0.321882,1.314592,0.317297,0.33715,0.662487
1,ACH-000004,SIDM00594,0.621513,-1.893056,1.281874,1.207144,-1.214567,0.985801,1.07658,0.836073,...,2.115634,-1.462395,,-1.19574,-2.792801,0.0,-0.487554,2.043047,-1.465485,0.615543
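
In the output above, gene 393046 is NaN for ACH-000004 although its raw gene effect is 0.041585, which is consistent with `np.median` returning NaN whenever a column contains a missing value. If that is not the intended behaviour, one possible variant is sketched below. The function name `robust_zscore_nan_aware` and the toy Series are hypothetical; `np.nanmedian` is the NumPy function that ignores NaN.

```python
import numpy as np
import pandas as pd

def robust_zscore_nan_aware(x):
    """Robust Z-score that ignores missing values (hypothetical variant)."""
    median = np.nanmedian(x)                # median over the non-missing values only
    mad = np.nanmedian(np.abs(x - median))  # median absolute deviation, NaN ignored
    return 0.6745 * (x - median) / mad

# Hypothetical toy column with one missing value (not real data)
toy = pd.Series([0.1, np.nan, -0.3, 0.2, 0.0])
z = robust_zscore_nan_aware(toy)

# Only the missing entry stays NaN; the other entries get finite scores
print(z)
```

It would be applied the same way as the original function, for example `.apply(robust_zscore_nan_aware)` on the gene columns. A column whose MAD is zero would still produce infinite or NaN scores, so that case may need separate handling.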


**3. Ranking**

In [5]:
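# Rank cell lines within each gene (column); with ascending=False the highest gene effect score
# gets rank 1, so the most negative (most dependent) cell lines get the largest ranks
rank_gene_effect = crispr_broad.drop(['BROAD_ID', 'SangerModelID'], axis = 1).apply(lambda x: x.rank(ascending=False))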
rank_gene_effect = pd.concat([crispr_broad[['BROAD_ID', 'SangerModelID']], rank_gene_effect], axis = 1)
rank_gene_effect.to_csv('/Users/amy/Desktop/SyntheticLethalityProject/2_data_analysis/02_CRISPR_gene_dependency_scores_processing/crispr_broad_paralog_ranking.csv', index=False)
rank_gene_effect[:2]

Unnamed: 0,BROAD_ID,SangerModelID,1,2,9,10,12,13,19,20,...,196403,196410,393046,196463,196483,196513,196527,196541,131034,327657
0,ACH-000001,SIDM00105,519.0,352.0,367.0,651.0,664.0,601.0,17.0,220.0,...,594.0,91.0,,377.0,134.0,247.0,78.0,254.0,256.0,184.0
1,ACH-000004,SIDM00594,184.0,665.0,69.0,94.0,600.0,124.0,99.0,140.0,...,39.0,631.0,22.0,617.0,683.0,352.0,478.0,31.0,665.0,195.0
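
To make the direction of the ranking explicit, here is a small sketch on a made-up column. The `toy` Series and its cell line labels are hypothetical; the call to `rank` relies on the pandas defaults, where ties receive the average rank and NaN values stay NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical gene effect scores for five cell lines (not real data)
toy = pd.Series([-1.2, 0.3, np.nan, 0.3, -0.1],
                index=["line1", "line2", "line3", "line4", "line5"])

# Same call as in the cell above: the highest score gets rank 1
ranks_desc = toy.rank(ascending=False)
# line1 -> 4.0, line2 -> 1.5, line3 -> NaN, line4 -> 1.5, line5 -> 3.0
print(ranks_desc)

# For comparison: ascending=True would give rank 1 to the most negative score
ranks_asc = toy.rank(ascending=True)
# line1 -> 1.0, line2 -> 3.5, line3 -> NaN, line4 -> 3.5, line5 -> 2.0
print(ranks_asc)
```

With `ascending=False`, as used in this notebook, a large rank therefore corresponds to a strongly negative gene effect score.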
