## Unify CCL model IDs as BROAD IDs

In this step, **the cell line model IDs (either Sanger Model ID or BROAD ID) are unified to the BROAD ID**.

**Input**
- **Cell line model annotation (e.g., Sanger model ID, BROAD ID, etc.)**: model_list_20230307.csv (Version: 23Q2; https://depmap.org/portal/download/all/)
- Intermediate Sanger CNV data from the previous step: **cnv_sanger_entrezID.csv**
- Intermediate CRISPR gene effect data from the previous step: **crispr_broad_entrezID.csv**

**Output**
- Sanger CNV data with mapped BROAD ID: **cnv_sanger_entrezID_broadID.csv**
- CRISPR gene effect data with mapped BROAD ID: **crispr_broad_entrezID_broadID.csv**

In [1]:
## Import modules
import numpy as np
import pandas as pd

In [2]:
## Import model information for downstream data integration
model = pd.read_csv('/Users/amy/Desktop/SyntheticLethalityProject/sources/model_list_20230307.csv', index_col = None)

## Subset the data for useful columns
model_col = model[['model_id', 'BROAD_ID']]
model_col = model_col.rename(columns = {'model_id': 'SangerModelID', 'BROAD_ID': 'ModelID'})
model_col = model_col.dropna()
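
The merges in the later cells assume that each Sanger model ID pairs with a single BROAD ID. A minimal sketch of how that assumption could be checked is shown here on a toy table; `toy_model_col` and its IDs are made up for illustration and are not taken from model_list_20230307.csv.

```python
import pandas as pd

# Toy stand-in for model_col; these IDs are hypothetical.
toy_model_col = pd.DataFrame({
    'SangerModelID': ['SIDM_A', 'SIDM_B', 'SIDM_C', 'SIDM_C'],
    'ModelID': ['ACH-A', 'ACH-B', 'ACH-C', 'ACH-D'],
})

# keep=False flags every row that shares a key with another row
dup_sanger = toy_model_col[toy_model_col.duplicated('SangerModelID', keep=False)]
dup_broad = toy_model_col[toy_model_col.duplicated('ModelID', keep=False)]

print("Sanger IDs mapped to several BROAD IDs:", dup_sanger.SangerModelID.nunique())
print("BROAD IDs mapped to several Sanger IDs:", dup_broad.ModelID.nunique())
```

If either count is non-zero on the real table, a merge on that key would duplicate data rows, so it may be worth inspecting those entries before merging.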

In [3]:
## Previous Sanger CNV data and CRISPR gene effect data
cnv_sanger = pd.read_csv('/Users/amy/Desktop/SyntheticLethalityProject/1_data_processing/01_entrez_ID_mapping/cnv_sanger_entrezID.csv', index_col = None, low_memory=False)
crispr_broad = pd.read_csv('/Users/amy/Desktop/SyntheticLethalityProject/1_data_processing/01_entrez_ID_mapping/crispr_broad_entrezID.csv', index_col = None)

In [4]:
## Map the IDs onto each dataset: add SangerModelID to the CRISPR data
## and ModelID (BROAD ID) to the CNV data
crispr_broad = pd.merge(model_col, crispr_broad, on = ['ModelID'], how = 'right')
cnv_sanger = pd.merge(model_col, cnv_sanger, on = ['SangerModelID'], how = 'right')
## Rows without a match in the model list get NA in the newly added ID column,
## so drop those rows.
cnv_sanger = cnv_sanger.dropna(subset = ['ModelID'])
crispr_broad = crispr_broad.dropna(subset = ['SangerModelID'])

## Rename the ModelID column to BROAD_ID
cnv_sanger = cnv_sanger.rename(columns={'ModelID':'BROAD_ID'})
crispr_broad = crispr_broad.rename(columns={'ModelID':'BROAD_ID'})


## Overview after mapping to the model identifier list
## CNV data (shape[0] counts rows, so a cell line profiled by both sources is counted twice)
print("Number of cell lines in CNV data:", cnv_sanger.shape[0])
print("Number of unique cell lines from Sanger source:", cnv_sanger[cnv_sanger.source == 'Sanger'].BROAD_ID.unique().shape[0])
print("Number of unique cell lines from Broad source:", cnv_sanger[cnv_sanger.source == 'Broad'].BROAD_ID.unique().shape[0])

## CRISPR data
print("Number of cell lines in CRISPR data:", crispr_broad.shape[0])
print("Number of unique cell lines in CRISPR data:", crispr_broad.BROAD_ID.unique().shape[0])

Number of cell lines in CNV data: 1344
Number of unique cell lines from Sanger source: 1020
Number of unique cell lines from Broad source: 324
Number of cell lines in CRISPR data: 992
Number of unique cell lines in CRISPR data: 992
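
The right merge followed by `dropna` silently discards rows whose ID is missing from the model list. As a hedged sketch, the `indicator` and `validate` options of `pd.merge` can make that step explicit. The frames `toy_map` and `toy_cnv` below are hypothetical toy data, not the real inputs.

```python
import pandas as pd

# Hypothetical toy mapping and CNV tables
toy_map = pd.DataFrame({
    'SangerModelID': ['SIDM_A', 'SIDM_B'],
    'ModelID': ['ACH-A', 'ACH-B'],
})
toy_cnv = pd.DataFrame({
    'SangerModelID': ['SIDM_A', 'SIDM_A', 'SIDM_X'],
    'source': ['Broad', 'Sanger', 'Sanger'],
    '1': ['Neutral', 'Gain', 'Loss'],
})

# validate raises MergeError if a Sanger ID is repeated in the mapping table;
# indicator adds a _merge column recording where each row came from.
merged = pd.merge(toy_map, toy_cnv, on='SangerModelID', how='right',
                  validate='one_to_many', indicator=True)

unmapped = merged[merged['_merge'] == 'right_only']
mapped = merged[merged['_merge'] == 'both'].drop(columns='_merge')

print("Rows without a BROAD ID:", unmapped.shape[0])
print("Rows kept:", mapped.shape[0])
```

Listing `unmapped.SangerModelID.unique()` shows which models were dropped, which is the same set of rows that `dropna(subset = ['ModelID'])` removes.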


In [5]:
cnv_sanger[:2]

Unnamed: 0,SangerModelID,BROAD_ID,source,symbol,1,29974,2,144568,127550,53947,...,9183,55055,11130,7789,158586,79364,79699,7791,23140,26009
0,SIDM00499,ACH-000956,Broad,,Neutral,Neutral,Loss,Neutral,Neutral,Neutral,...,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,Gain,Neutral,Neutral
1,SIDM00499,ACH-000956,Sanger,,Neutral,Neutral,Gain,Gain,Neutral,Neutral,...,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral


In [6]:
crispr_broad[:2]

Unnamed: 0,SangerModelID,BROAD_ID,1,29974,2,144568,127550,53947,51146,8086,...,55055,11130,7789,158586,79364,440590,79699,7791,23140,26009
0,SIDM00105,ACH-000001,-0.102725,0.058595,0.058246,-0.041881,-0.088661,0.170335,-0.015254,-0.223691,...,-0.084055,-0.084184,0.131495,0.238702,0.201712,-0.250381,0.045612,0.044154,0.146801,-0.473583
1,SIDM00594,ACH-000004,0.008878,-0.077633,-0.099297,0.03012,-0.080334,-0.112404,0.298774,-0.125139,...,-0.066673,-0.443145,0.183618,0.058936,0.108711,0.056322,-0.355712,0.13531,0.200408,-0.07615


Save the files mapped to BROAD IDs for further processing.

In [7]:
## Sanger CNV
cnv_sanger.to_csv('/Users/amy/Desktop/SyntheticLethalityProject/1_data_processing/02_BROAD_ID_mapping/cnv_sanger_entrezID_broadID.csv', index=False)
## BROAD CRISPR gene effect
crispr_broad.to_csv('/Users/amy/Desktop/SyntheticLethalityProject/1_data_processing/02_BROAD_ID_mapping/crispr_broad_entrezID_broadID.csv', index=False)
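
Both files are written with `index=False`. The small sketch below, on a hypothetical one-row frame, illustrates what that setting affects when the file is read back with `index_col = None`, as the cells above do. It writes to an in-memory string rather than to disk.

```python
import io
import pandas as pd

# Hypothetical one-row frame with the same leading columns as the saved files
toy = pd.DataFrame({'SangerModelID': ['SIDM_A'], 'BROAD_ID': ['ACH-A'], '1': [-0.1]})

# to_csv() without a path returns the CSV text
with_index = pd.read_csv(io.StringIO(toy.to_csv()), index_col=None)
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)), index_col=None)

print(list(with_index.columns))     # ['Unnamed: 0', 'SangerModelID', 'BROAD_ID', '1']
print(list(without_index.columns))  # ['SangerModelID', 'BROAD_ID', '1']
```

Note that the Entrez ID column names come back as strings (`'1'`, not `1`) after a CSV round trip, which matters if later steps select gene columns by integer.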