## Download CERES metrics

We download CERES scores from the DepMap 20Q3 public release. The data are available at https://depmap.org/portal/download/.

> DepMap, Broad (2020): DepMap 20Q3 Public. figshare. Dataset doi:10.6084/m9.figshare.12931238.v1.

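The CERES score estimates gene dependencies from CRISPR-Cas9 screens while correcting for copy number: in amplified regions Cas9 cuts the genome many times, and the resulting DNA damage depletes cells whether or not the targeted gene is essential.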

> Robin M. Meyers, Jordan G. Bryan, James M. McFarland, Barbara A. Weir, ... David E. Root, William C. Hahn, Aviad Tsherniak. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nature Genetics 2017 October 49:1779–1784. doi:10.1038/ng.3984

In [1]:
import pathlib
import pandas as pd
from urllib.request import urlretrieve

In [2]:
figshare_base_url = "https://ndownloader.figshare.com/files/"

file_info = {
    "ceres": {"file_id": "24613292", "output_file": "ceres.csv"},
    "sample_id": {"file_id": "24613394", "output_file": "depmap_sample_info.csv"},
}

output_dir = pathlib.Path("data")
output_dir.mkdir(exist_ok=True)

In [3]:
for info in file_info.values():
    # figshare_base_url already ends with "/", so append the file ID directly
    download_url = f"{figshare_base_url}{info['file_id']}"
    output_file = output_dir / info["output_file"]

    # Save the remote file under data/
    urlretrieve(download_url, output_file)
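Re-running the loop above downloads every file again. A minimal re-runnable sketch, assuming the same `file_info` layout; `download_files`, its `fetch` parameter and the `overwrite` flag are hypothetical names introduced here. Passing `fetch` in as a parameter lets the logic be exercised without network access.

```python
import pathlib
from urllib.request import urlretrieve

FIGSHARE_BASE_URL = "https://ndownloader.figshare.com/files/"


def download_files(file_info, output_dir, fetch=urlretrieve, overwrite=False):
    """Download each figshare file in `file_info`, skipping files already on disk.

    `fetch(url, path)` defaults to urllib's urlretrieve. Returns the paths fetched.
    """
    output_dir = pathlib.Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    fetched = []
    for info in file_info.values():
        target = output_dir / info["output_file"]
        if target.exists() and not overwrite:
            continue  # keep the cached copy
        fetch(FIGSHARE_BASE_URL + info["file_id"], target)
        fetched.append(target)
    return fetched
```

Usage: `download_files(file_info, "data")` fetches only missing files; pass `overwrite=True` to force a fresh copy.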

## Preview downloaded data

In [4]:
# Load CERES scores
output_file = file_info["ceres"]["output_file"]
df = pd.read_csv(output_dir / output_file)

print(df.shape)
df.head(3)

(789, 18120)


| | DepMap_ID | A1BG (1) | A1CF (29974) | A2M (2) | A2ML1 (144568) | A3GALT2 (127550) | A4GALT (53947) | A4GNT (51146) | AAAS (8086) | AACS (65985) | ... | ZWILCH (55055) | ZWINT (11130) | ZXDA (7789) | ZXDB (158586) | ZXDC (79364) | ZYG11A (440590) | ZYG11B (79699) | ZYX (7791) | ZZEF1 (23140) | ZZZ3 (26009) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ACH-000004 | 0.181037 | 0.088391 | -0.198152 | -0.017571 | 0.042043 | -0.192351 | 0.353854 | -0.4422 | 0.290573 | ... | -0.12464 | -0.46713 | | | 0.255396 | 0.23999 | -0.408827 | 0.290927 | 0.224651 | -0.135151 |
| 1 | ACH-000005 | -0.09025 | 0.239763 | 0.189079 | 0.160173 | -0.190621 | -0.335628 | 0.248029 | -0.583917 | -0.066306 | ... | -0.199184 | -0.411868 | -0.160226 | -0.066143 | 0.223034 | -0.076115 | -0.102586 | 0.073836 | 0.026112 | -0.250565 |
| 2 | ACH-000007 | 0.071568 | 0.075337 | -0.062467 | 0.157945 | 0.102321 | 0.141652 | 0.067589 | -0.474084 | -0.005594 | ... | -0.094879 | -0.277592 | -0.051759 | 0.119408 | 0.213447 | -0.010327 | -0.351529 | 0.085902 | -0.392298 | -0.440062 |
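Each gene column header combines a gene symbol with its Entrez gene ID, as in `A1BG (1)`. A sketch that indexes the matrix by `DepMap_ID` and splits the headers into symbol and Entrez ID, assuming every gene column follows the `SYMBOL (ID)` pattern; `parse_gene_columns` and `tidy_ceres` are hypothetical helper names.

```python
import re

import pandas as pd

# Assumed header format: "<gene symbol> (<Entrez ID>)", e.g. "A1BG (1)".
GENE_COLUMN = re.compile(r"^(?P<symbol>\S+) \((?P<entrez_id>\d+)\)$")


def parse_gene_columns(columns):
    """Return one row per column with its original header, symbol and Entrez ID."""
    records = []
    for column in columns:
        match = GENE_COLUMN.match(column)
        if match is None:
            raise ValueError(f"unexpected gene column header: {column!r}")
        records.append(
            {"column": column, "symbol": match["symbol"], "entrez_id": int(match["entrez_id"])}
        )
    return pd.DataFrame(records)


def tidy_ceres(ceres_df):
    """Index scores by cell line and relabel gene columns with bare symbols."""
    scores = ceres_df.set_index("DepMap_ID")
    gene_info = parse_gene_columns(scores.columns)
    scores.columns = pd.Index(gene_info["symbol"], name="gene")
    return scores, gene_info
```

Usage: `scores, gene_info = tidy_ceres(df)`. Then `scores.isna().sum()` counts missing scores per gene, such as the empty `ZXDA` and `ZXDB` entries for ACH-000004, and `gene_info["symbol"].is_unique` checks that relabeling by symbol did not create duplicate columns.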


In [5]:
# Load sample metadata (one row per DepMap cell line)
output_file = file_info["sample_id"]["output_file"]
sample_df = pd.read_csv(output_dir / output_file)

print(sample_df.shape)
sample_df.head(3)

(1804, 25)


| | DepMap_ID | stripped_cell_line_name | CCLE_Name | Alias | COSMICID | sex | source | Achilles_n_replicates | cell_line_NNMD | culture_type | ... | primary_disease | Subtype | age | Sanger_Model_ID | WTSI_Master_Cell_ID | depmap_public_comments | lineage | lineage_subtype | lineage_sub_subtype | lineage_molecular_subtype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ACH-000001 | NIHOVCAR3 | NIHOVCAR3_OVARY | OVCAR3 | 905933.0 | Female | ATCC | | | | ... | Ovarian Cancer | Adenocarcinoma, high grade serous | 60.0 | SIDM00105 | 2201.0 | | ovary | ovary_adenocarcinoma | high_grade_serous | |
| 1 | ACH-000002 | HL60 | HL60_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE | | 905938.0 | Female | ATCC | | | | ... | Leukemia | Acute Myelogenous Leukemia (AML), M3 (Promyelo... | 35.0 | SIDM00829 | 55.0 | | blood | AML | M3 | |
| 2 | ACH-000003 | CACO2 | CACO2_LARGE_INTESTINE | CACO2, CaCo-2 | | Male | ATCC | | | | ... | Colon/Colorectal Cancer | Adenocarcinoma | | SIDM00891 | | | colorectal | colorectal_adenocarcinoma | | |
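The sample table covers 1804 cell lines while the CERES matrix covers 789, and both carry a `DepMap_ID` column, so the scores can be annotated with metadata through a left join. A minimal sketch, assuming `DepMap_ID` is unique in both tables (`validate="one_to_one"` makes pandas raise `MergeError` otherwise); `annotate_scores` and its default metadata columns are illustrative choices.

```python
import pandas as pd


def annotate_scores(ceres_df, sample_df, meta_columns=("stripped_cell_line_name", "lineage")):
    """Attach selected sample_info columns to each CERES row via DepMap_ID.

    Returns the annotated frame and the number of CERES rows with no metadata match.
    """
    meta = sample_df[["DepMap_ID", *meta_columns]]
    merged = ceres_df.merge(
        meta,
        on="DepMap_ID",
        how="left",              # keep every screened cell line
        validate="one_to_one",   # each DepMap_ID should appear once per table
        indicator=True,          # adds "_merge" to flag unmatched rows
    )
    unmatched = int((merged["_merge"] == "left_only").sum())
    return merged.drop(columns="_merge"), unmatched
```

Usage: `annotated, n_unmatched = annotate_scores(df, sample_df)`, then `annotated.groupby("lineage").size()` counts screened cell lines per lineage.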
