# Chapter 1: Set up data

Before running the chapters that follow, download several input datasets from their sources into the data directory. A few of them also need minor preparation. Follow the steps below.

## Get data

### 1. Download the Cancer Cell Line Encyclopedia (CCLE) datasets from the CCLE web site

Watch [this quick video](https://www.youtube.com/watch?v=iTFm3n_JRw8&feature=youtu.be) or follow these steps:

1. Go [here](https://portals.broadinstitute.org/ccle_legacy/toa/termsOfAccess/20/) to start creating a CCLE account.
2. Click "Accept" and enter your account information.
3. Go to the "Browse" tab and select "Data".
4. At the bottom of the page, beneath "Publication specific data files", click "Show available data".
5. Click "CCLE_MUT_EXPR_RPPA_OncoGPS.zip" to download it.
6. Unzip "CCLE_MUT_EXPR_RPPA_OncoGPS.zip" and move the 4 files inside it to the onco-gps-paper-analysis/data directory.
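
If you prefer to unzip from Python, the sketch below extracts every file in an archive directly into the data directory and discards any folder structure inside the archive. `unzip_into` is a hypothetical helper (it is not part of CCAL), and the possibility that the archive wraps its files in a folder or includes macOS `__MACOSX` metadata is our assumption; compare the extracted files with what the CCLE site lists.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def unzip_into(zip_path, data_dir):
    """Extract every file in zip_path directly into data_dir, dropping any
    folders inside the archive. Returns the extracted file names.

    Hypothetical helper; not part of CCAL.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.infolist():
            if member.is_dir():
                continue
            # Zip member names always use '/' as the separator
            name = PurePosixPath(member.filename).name
            # Skip macOS metadata entries such as __MACOSX/... and ._file
            if member.filename.startswith('__MACOSX/') or name.startswith('._'):
                continue
            with zf.open(member) as src, open(data_dir / name, 'wb') as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(name)
    return extracted
```

For example, `unzip_into('CCLE_MUT_EXPR_RPPA_OncoGPS.zip', 'onco-gps-paper-analysis/data')`, run from the directory that contains both.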

### 2. Download the gene dependencies (RNAi) dataset from the Achilles project web site

Watch [this quick video](https://www.youtube.com/watch?v=wj0cJC9-XYw&feature=youtu.be) or follow these steps:

1. Go [here](https://portals.broadinstitute.org/achilles/users/sign_up) to create a Project Achilles account.
2. Confirm your email address by clicking the link in the confirmation email that Project Achilles sends you.
3. Go [here](https://portals.broadinstitute.org/achilles/datasets/15/download) and click the "ExpandedGeneZSolsCleaned.csv" link to download the gene dependencies dataset.
4. Move the downloaded "ExpandedGeneZSolsCleaned.csv" file to the onco-gps-paper-analysis/data directory.

The "Unzip and prepare datasets" cell below converts this dataset to .gct format and saves it as achilles__gene_x_ccle_cellline.gct.
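
The sketch below shows what the CSV-to-.gct conversion involves. It assumes that `ccal.write_gct` writes the standard tab-delimited GCT 1.2 layout: a `#1.2` version line, a line with the row and column counts, a header beginning `Name` and `Description`, and then one line per row. `write_gct` here is a hypothetical stdlib stand-in, not CCAL's implementation. Repeating the row name in the Description column and writing missing values as empty cells are choices made for this sketch.

```python
import math


def write_gct(path, row_names, column_names, values):
    """Write a matrix as a GCT 1.2 file (hypothetical stand-in for ccal.write_gct).

    values is a list of rows, one per row name. None or NaN is written as an
    empty cell, and the Description column repeats the row name.
    """
    with open(path, 'w') as f:
        f.write('#1.2\n')
        f.write('{}\t{}\n'.format(len(row_names), len(column_names)))
        f.write('\t'.join(['Name', 'Description'] + list(column_names)) + '\n')
        for name, row in zip(row_names, values):
            cells = ['' if v is None or (isinstance(v, float) and math.isnan(v))
                     else str(v) for v in row]
            f.write('\t'.join([name, name] + cells) + '\n')
```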

### 3. Download the CTRP v2 datasets from the Broad Institute

1. Go to ftp://caftpd.nci.nih.gov/pub/OCG-DCC/CTD2/Broad/CTRPv2.2_2015_pub_CancerDisc_5_1210/
2. Click "CTRPv2.2_2015_pub_CancerDisc_5_1210.zip" to download the CTRP dataset.
3. Move the downloaded "CTRPv2.2_2015_pub_CancerDisc_5_1210.zip" file to the onco-gps-paper-analysis/data directory.
4. Run the cells in the "Prepare data" section below to unzip and prepare the CTRP dataset for analysis.
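
Before you unzip the archive, you can check that it contains the three CTRP files that the "Unzip and prepare datasets" cell reads. The sketch below compares base names only, so it also works if the archive stores the files in a subfolder. `missing_ctrp_files` is a hypothetical helper, and the expected file names are taken from the paths used in that cell.

```python
import zipfile
from pathlib import PurePosixPath

# File names read by the "Unzip and prepare datasets" cell
EXPECTED_CTRP_FILES = {
    'v22.data.auc_sensitivities.txt',
    'v22.meta.per_compound.txt',
    'v22.meta.per_cell_line.txt',
}


def missing_ctrp_files(zip_path, expected=EXPECTED_CTRP_FILES):
    """Return the expected file names not found anywhere in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        present = {PurePosixPath(n).name for n in zf.namelist()}
    return sorted(set(expected) - present)
```

An empty list means all three files are present.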

## Prepare data

Run the cells below.

### 1. Set up notebook and import [CCAL](https://github.com/KwatME/ccal)

```python
from notebook_environment import *

%load_ext autoreload
%autoreload 2
%matplotlib inline
```



### 2. Unzip and prepare datasets

```python
# Convert the Achilles gene dependencies dataset from .csv to .gct
df = pd.read_csv('../data/ExpandedGeneZSolsCleaned.csv', index_col=0)
ccal.write_gct(df, '../data/achilles__gene_x_ccle_cellline.gct')

# Unzip the CTRP and gene set datasets
ccal.unzip('../data/CTRPv2.2_2015_pub_CancerDisc_5_1210.zip')
ccal.unzip('../data/gene_set__gene_set_x_ccle_cellline.gct.zip')

# Read compound data
auc = pd.read_table('../data/v22.data.auc_sensitivities.txt')
print(auc.shape)

cpd = pd.read_table('../data/v22.meta.per_compound.txt', index_col=0)
print(cpd.shape)

ccl = pd.read_table('../data/v22.meta.per_cell_line.txt', index_col=0)
print(ccl.shape)

# Make dicts for faster ID-to-name lookup
cpd_d = cpd['cpd_name'].to_dict()
ccl_d = ccl['ccl_name'].to_dict()

# Make an empty compound-x-cellline matrix
compound_x_cellline = pd.DataFrame(
    index=sorted(set(cpd['cpd_name'])), columns=sorted(set(ccl['ccl_name'])))
print(compound_x_cellline.shape)

# Populate the compound-x-cellline matrix; when a compound-cellline pair
# has more than one AUC, keep the smallest
for _, (i_cpd, i_ccl, a) in auc.iterrows():

    # Get compound and cellline names
    cpd_n = cpd_d[i_cpd]
    ccl_n = ccl_d[i_ccl]

    # Get current AUC
    a_ = compound_x_cellline.loc[cpd_n, ccl_n]

    # If the current AUC is not set, set it with this AUC
    if pd.isnull(a_):
        compound_x_cellline.loc[cpd_n, ccl_n] = a

    # If this AUC is smaller than the current AUC, replace it
    elif a < a_:
        print('Updating AUC of compound {} on cellline {}: {:.3f} ==> {:.3f}'.
              format(cpd_n, ccl_n, a_, a))
        compound_x_cellline.loc[cpd_n, ccl_n] = a

# Read CCLE cellline annotations
ccle_annotations = pd.read_table(
    '../data/CCLE_sample_info_file_2012-10-18.txt', index_col=0)

# Rename CTRP celllines to match CCLE cellline names
columns = list(compound_x_cellline.columns)

for i, ccl_n in enumerate(compound_x_cellline.columns):

    # Match the CCLE name prefix before the first '_' (case-insensitive)
    matches = [
        ccle_n for ccle_n in ccle_annotations.index
        if ccl_n.lower() == ccle_n.lower().split('_')[0]
    ]

    # If nothing matched, fall back to substring matching
    if len(matches) == 0:
        print('0 match: {}; matching substring ...'.format(ccl_n))

        for ccle_n in ccle_annotations.index:
            if ccl_n.lower() in ccle_n.lower():
                print('\t{} ==> {}.'.format(ccl_n, ccle_n))
                matches.append(ccle_n)

    # Rename only on a unique match; otherwise keep the CTRP name
    if len(matches) == 1:
        print('{} ==> {}.'.format(ccl_n, matches[0]))
        columns[i] = matches[0]

    elif len(matches) == 0:
        print('0 match: {}; keeping CTRP name'.format(ccl_n))

    else:
        print('{} matches: {} ==> {}; keeping CTRP name'.format(
            len(matches), ccl_n, matches))

# Update with CCLE cellline names
compound_x_cellline.columns = columns

# Write .gct file
ccal.write_gct(compound_x_cellline,
               '../data/ctd2__compound_x_ccle_cellline.gct')

compound_x_cellline
```

```
(260496, 3)
(481, 9)
(664, 8)
(481, 645)
```
Updating AUC of compound ML311 on cellline CCFSTTG1: 15.000 ==> 14.352
Updating AUC of compound ML311 on cellline NCIH460: 13.004 ==> 12.245
Updating AUC of compound ML311 on cellline IGROV1: 13.147 ==> 12.092
Updating AUC of compound ML311 on cellline OAW28: 13.205 ==> 11.837
Updating AUC of compound ML311 on cellline ASPC1: 13.483 ==> 13.200
Updating AUC of compound ML311 on cellline CAL51: 12.028 ==> 11.655
Updating AUC of compound ML311 on cellline NCIH1869: 14.149 ==> 13.606
Updating AUC of compound ML311 on cellline LOXIMVI: 13.386 ==> 12.332
Updating AUC of compound ML311 on cellline MKN74: 13.969 ==> 12.234
Updating AUC of compound zebularine on cellline CCFSTTG1: 14.999 ==> 14.634
Updating AUC of compound zebularine on cellline A549: 14.507 ==> 13.312
Updating AUC of compound zebularine on cellline NCIH460: 14.581 ==> 12.742
Updating AUC of compound zebularine on cellline ASPC1: 15.699 ==> 13.944
Updating AUC of compound zebularine on c

Updating AUC of compound paclitaxel on cellline ASPC1: 9.750 ==> 9.291
Updating AUC of compound paclitaxel on cellline CAL51: 9.707 ==> 5.014
Updating AUC of compound paclitaxel on cellline A375: 11.572 ==> 7.637
Updating AUC of compound paclitaxel on cellline LOXIMVI: 14.997 ==> 7.762
Updating AUC of compound paclitaxel on cellline MKN74: 9.152 ==> 6.965
Updating AUC of compound hyperforin on cellline CCFSTTG1: 15.447 ==> 14.839
Updating AUC of compound hyperforin on cellline NCIH1299: 15.580 ==> 14.640
Updating AUC of compound hyperforin on cellline A375: 18.677 ==> 14.182
Updating AUC of compound hyperforin on cellline NCIH1869: 14.920 ==> 14.371
Updating AUC of compound hyperforin on cellline MKN74: 14.038 ==> 12.211
Updating AUC of compound brefeldin A on cellline CCFSTTG1: 14.679 ==> 12.948
Updating AUC of compound brefeldin A on cellline OAW28: 9.371 ==> 7.765
Updating AUC of compound brefeldin A on cellline SUIT2: 9.336 ==> 9.333
Updating AUC of compound brefeldin A on cellline

Updating AUC of compound PX-12 on cellline CCFSTTG1: 15.116 ==> 14.770
Updating AUC of compound PX-12 on cellline A549: 14.621 ==> 14.587
Updating AUC of compound PX-12 on cellline NCIH520: 14.270 ==> 13.087
Updating AUC of compound PX-12 on cellline IGROV1: 14.234 ==> 13.212
Updating AUC of compound PX-12 on cellline OAW28: 14.310 ==> 12.950
Updating AUC of compound PX-12 on cellline LOXIMVI: 14.278 ==> 13.498
Updating AUC of compound PX-12 on cellline KE39: 14.126 ==> 13.446
Updating AUC of compound PX-12 on cellline MKN74: 14.744 ==> 14.205
Updating AUC of compound PD318088 on cellline CCFSTTG1: 14.840 ==> 14.451
Updating AUC of compound PD318088 on cellline A549: 11.127 ==> 9.559
Updating AUC of compound PD318088 on cellline NCIH460: 14.742 ==> 14.265
Updating AUC of compound PD318088 on cellline IGROV1: 11.728 ==> 9.794
Updating AUC of compound PD318088 on cellline OAW28: 13.666 ==> 11.608
Updating AUC of compound PD318088 on cellline CAL51: 14.591 ==> 13.512
Updating AUC of compo

Updating AUC of compound fulvestrant on cellline SUIT2: 14.637 ==> 14.606
Updating AUC of compound BRD-A86708339 on cellline CCFSTTG1: 15.157 ==> 13.818
Updating AUC of compound BRD-A86708339 on cellline NCIH1869: 11.099 ==> 9.681
Updating AUC of compound BRD-A86708339 on cellline SKUT1: 10.673 ==> 8.740
Updating AUC of compound CID-5951923 on cellline NCIH1299: 15.854 ==> 15.721
Updating AUC of compound CID-5951923 on cellline NCIH460: 14.711 ==> 14.060
Updating AUC of compound CID-5951923 on cellline ASPC1: 14.094 ==> 13.453
Updating AUC of compound CID-5951923 on cellline CAL51: 16.163 ==> 15.276
Updating AUC of compound CID-5951923 on cellline NCIH1869: 14.176 ==> 13.611
Updating AUC of compound CID-5951923 on cellline MKN74: 14.649 ==> 14.581
Updating AUC of compound CID-5951923 on cellline SKUT1: 14.646 ==> 14.587
Updating AUC of compound FQI-1 on cellline NCIH460: 14.086 ==> 13.842
Updating AUC of compound FQI-1 on cellline IGROV1: 13.281 ==> 12.697
Updating AUC of compound FQI-

Updating AUC of compound QS-11 on cellline AGS: 16.464 ==> 14.627
Updating AUC of compound AT-406 on cellline CCFSTTG1: 14.762 ==> 14.484
Updating AUC of compound AT-406 on cellline A549: 14.299 ==> 13.299
Updating AUC of compound AT-406 on cellline NCIH520: 14.112 ==> 11.507
Updating AUC of compound AT-406 on cellline IGROV1: 16.035 ==> 14.843
Updating AUC of compound AT-406 on cellline LOXIMVI: 15.256 ==> 11.978
Updating AUC of compound SU11274 on cellline A549: 14.070 ==> 13.611
Updating AUC of compound SU11274 on cellline IGROV1: 14.074 ==> 12.604
Updating AUC of compound SU11274 on cellline OAW28: 14.266 ==> 14.230
Updating AUC of compound SU11274 on cellline ASPC1: 16.544 ==> 15.136
Updating AUC of compound SU11274 on cellline SUIT2: 15.025 ==> 14.042
Updating AUC of compound SU11274 on cellline CAL51: 13.336 ==> 12.741
Updating AUC of compound SU11274 on cellline A375: 15.000 ==> 13.861
Updating AUC of compound SU11274 on cellline NCIH1869: 15.000 ==> 14.302
Updating AUC of comp

Updating AUC of compound GSK1059615 on cellline OAW28: 11.180 ==> 10.260
Updating AUC of compound GSK1059615 on cellline NCIH1869: 11.295 ==> 10.489
Updating AUC of compound narciclasine on cellline CCFSTTG1: 11.376 ==> 10.191
Updating AUC of compound narciclasine on cellline NCIH460: 9.691 ==> 9.513
Updating AUC of compound narciclasine on cellline NCIH520: 6.082 ==> 5.831
Updating AUC of compound narciclasine on cellline IGROV1: 8.225 ==> 7.324
Updating AUC of compound narciclasine on cellline CAL51: 6.596 ==> 6.438
Updating AUC of compound narciclasine on cellline NCIH1869: 10.225 ==> 9.034
Updating AUC of compound narciclasine on cellline KE39: 11.160 ==> 9.613
Updating AUC of compound narciclasine on cellline MKN74: 10.097 ==> 8.734
Updating AUC of compound AM-580 on cellline CCFSTTG1: 14.905 ==> 14.807
Updating AUC of compound AM-580 on cellline NCIH460: 15.317 ==> 15.175
Updating AUC of compound AM-580 on cellline CAL51: 15.659 ==> 14.019
Updating AUC of compound AM-580 on celll

Updating AUC of compound BRD-K11533227 on cellline NCIH460: 14.256 ==> 13.092
Updating AUC of compound BRD-K11533227 on cellline OAW28: 14.788 ==> 14.615
Updating AUC of compound BRD-K11533227 on cellline ASPC1: 14.910 ==> 13.576
Updating AUC of compound BRD-K11533227 on cellline NCIH1869: 14.937 ==> 14.459
Updating AUC of compound BRD-K11533227 on cellline DU145: 14.761 ==> 14.473
Updating AUC of compound BRD-K11533227 on cellline AGS: 17.971 ==> 14.396
Updating AUC of compound BRD-K11533227 on cellline KE39: 14.679 ==> 14.400
Updating AUC of compound BRD-K11533227 on cellline MKN74: 16.101 ==> 13.012
Updating AUC of compound PF-184 on cellline CCFSTTG1: 14.584 ==> 14.156
Updating AUC of compound PF-184 on cellline A549: 13.250 ==> 12.833
Updating AUC of compound PF-184 on cellline NCIH460: 12.950 ==> 11.792
Updating AUC of compound PF-184 on cellline NCIH520: 13.162 ==> 13.160
Updating AUC of compound PF-184 on cellline IGROV1: 12.988 ==> 12.444
Updating AUC of compound PF-184 on cel

Updating AUC of compound PRIMA-1 on cellline IGROV1: 14.590 ==> 12.537
Updating AUC of compound PRIMA-1 on cellline OAW28: 14.626 ==> 11.977
Updating AUC of compound PRIMA-1 on cellline NCIH1869: 14.266 ==> 14.044
Updating AUC of compound PRIMA-1 on cellline LOXIMVI: 14.784 ==> 12.017
Updating AUC of compound PRIMA-1 on cellline KE39: 14.074 ==> 12.750
Updating AUC of compound PRIMA-1 on cellline MKN74: 14.829 ==> 13.200
Updating AUC of compound phloretin on cellline NCIH520: 14.130 ==> 13.370
Updating AUC of compound phloretin on cellline IGROV1: 14.381 ==> 13.284
Updating AUC of compound phloretin on cellline OAW28: 14.484 ==> 12.067
Updating AUC of compound phloretin on cellline ASPC1: 14.441 ==> 14.335
Updating AUC of compound phloretin on cellline SUIT2: 14.689 ==> 14.499
Updating AUC of compound phloretin on cellline NCIH1869: 14.324 ==> 14.287
Updating AUC of compound phloretin on cellline LOXIMVI: 14.486 ==> 14.416
Updating AUC of compound phloretin on cellline DU145: 13.250 ==

Updating AUC of compound BRD-K19103580 on cellline A549: 14.641 ==> 13.837
Updating AUC of compound BRD-K19103580 on cellline IGROV1: 15.156 ==> 14.777
Updating AUC of compound BRD-K19103580 on cellline OAW28: 14.439 ==> 13.782
Updating AUC of compound BRD-K19103580 on cellline ASPC1: 14.525 ==> 13.854
Updating AUC of compound BRD-K19103580 on cellline CAL51: 15.265 ==> 14.880
Updating AUC of compound BRD-K19103580 on cellline SKUT1: 14.630 ==> 14.587
Updating AUC of compound gossypol on cellline A549: 14.065 ==> 13.844
Updating AUC of compound gossypol on cellline IGROV1: 13.789 ==> 12.809
Updating AUC of compound gossypol on cellline OAW28: 13.450 ==> 12.659
Updating AUC of compound gossypol on cellline ASPC1: 12.778 ==> 12.711
Updating AUC of compound gossypol on cellline SUIT2: 14.059 ==> 13.059
Updating AUC of compound prochlorperazine on cellline CCFSTTG1: 16.898 ==> 14.710
Updating AUC of compound prochlorperazine on cellline A549: 14.155 ==> 13.523
Updating AUC of compound proc

Updating AUC of compound BRD8899 on cellline NCIH460: 15.235 ==> 15.102
Updating AUC of compound BRD8899 on cellline IGROV1: 15.000 ==> 14.562
Updating AUC of compound BRD8899 on cellline ASPC1: 14.741 ==> 14.209
Updating AUC of compound BRD8899 on cellline CAL51: 15.858 ==> 14.931
Updating AUC of compound CR-1-31B on cellline CCFSTTG1: 9.680 ==> 7.883
Updating AUC of compound CR-1-31B on cellline A549: 8.079 ==> 7.582
Updating AUC of compound CR-1-31B on cellline NCIH460: 11.246 ==> 10.306
Updating AUC of compound CR-1-31B on cellline IGROV1: 6.821 ==> 6.532
Updating AUC of compound CR-1-31B on cellline A375: 8.367 ==> 5.731
Updating AUC of compound CR-1-31B on cellline KE39: 10.317 ==> 9.315
Updating AUC of compound CR-1-31B on cellline SKUT1: 9.147 ==> 7.994
Updating AUC of compound 1S,3R-RSL-3 on cellline CCFSTTG1: 10.675 ==> 10.527
Updating AUC of compound 1S,3R-RSL-3 on cellline NCIH460: 14.746 ==> 14.022
Updating AUC of compound 1S,3R-RSL-3 on cellline NCIH520: 11.586 ==> 11.141

Updating AUC of compound axitinib on cellline NCIH460: 14.505 ==> 13.773
Updating AUC of compound axitinib on cellline IGROV1: 13.488 ==> 9.813
Updating AUC of compound axitinib on cellline OAW28: 14.022 ==> 13.803
Updating AUC of compound axitinib on cellline ASPC1: 13.210 ==> 11.070
Updating AUC of compound axitinib on cellline SUIT2: 14.451 ==> 14.413
Updating AUC of compound axitinib on cellline CAL51: 12.932 ==> 11.579
Updating AUC of compound axitinib on cellline NCIH1869: 13.803 ==> 11.917
Updating AUC of compound axitinib on cellline LOXIMVI: 12.633 ==> 12.436
Updating AUC of compound axitinib on cellline MKN74: 14.180 ==> 13.926
Updating AUC of compound axitinib on cellline SKUT1: 12.734 ==> 12.614
Updating AUC of compound KX2-391 on cellline CCFSTTG1: 11.464 ==> 10.585
Updating AUC of compound KX2-391 on cellline A549: 9.572 ==> 9.382
Updating AUC of compound KX2-391 on cellline NCIH460: 11.476 ==> 11.256
Updating AUC of compound KX2-391 on cellline IGROV1: 9.944 ==> 7.458
Up

Updating AUC of compound FQI-2 on cellline NCIH460: 14.187 ==> 13.880
Updating AUC of compound FQI-2 on cellline NCIH520: 9.701 ==> 9.589
Updating AUC of compound FQI-2 on cellline IGROV1: 12.845 ==> 12.033
Updating AUC of compound FQI-2 on cellline OAW28: 14.269 ==> 11.028
Updating AUC of compound FQI-2 on cellline ASPC1: 14.300 ==> 13.169
Updating AUC of compound FQI-2 on cellline CAL51: 12.482 ==> 10.529
Updating AUC of compound FQI-2 on cellline A375: 15.084 ==> 12.711
Updating AUC of compound FQI-2 on cellline NCIH1869: 16.523 ==> 14.181
Updating AUC of compound FQI-2 on cellline LOXIMVI: 12.673 ==> 12.185
Updating AUC of compound FQI-2 on cellline MKN74: 13.193 ==> 12.862
Updating AUC of compound FQI-2 on cellline SKUT1: 13.039 ==> 13.036
Updating AUC of compound dacarbazine on cellline A549: 14.999 ==> 14.190
Updating AUC of compound dacarbazine on cellline IGROV1: 14.568 ==> 14.467
Updating AUC of compound dacarbazine on cellline OAW28: 14.852 ==> 14.799
Updating AUC of compoun

Updating AUC of compound LY-2183240 on cellline CCFSTTG1: 13.857 ==> 13.413
Updating AUC of compound LY-2183240 on cellline A549: 13.668 ==> 13.593
Updating AUC of compound LY-2183240 on cellline IGROV1: 13.036 ==> 12.149
Updating AUC of compound LY-2183240 on cellline OAW28: 13.912 ==> 12.196
Updating AUC of compound LY-2183240 on cellline ASPC1: 13.298 ==> 11.631
Updating AUC of compound LY-2183240 on cellline CAL51: 12.784 ==> 12.229
Updating AUC of compound LY-2183240 on cellline NCIH1869: 14.427 ==> 13.099
Updating AUC of compound LY-2183240 on cellline LOXIMVI: 12.318 ==> 11.907
Updating AUC of compound LY-2183240 on cellline KE39: 13.605 ==> 13.400
Updating AUC of compound LY-2183240 on cellline MKN74: 12.516 ==> 12.359
Updating AUC of compound nakiterpiosin on cellline CCFSTTG1: 13.078 ==> 12.859
Updating AUC of compound nakiterpiosin on cellline A549: 11.540 ==> 11.170
Updating AUC of compound nakiterpiosin on cellline NCIH460: 12.401 ==> 11.882
Updating AUC of compound nakite

Updating AUC of compound betulinic acid on cellline CCFSTTG1: 17.579 ==> 17.180
Updating AUC of compound betulinic acid on cellline IGROV1: 15.498 ==> 15.046
Updating AUC of compound betulinic acid on cellline OAW28: 17.411 ==> 16.151
Updating AUC of compound betulinic acid on cellline ASPC1: 18.348 ==> 14.987
Updating AUC of compound betulinic acid on cellline CAL51: 15.851 ==> 15.000
Updating AUC of compound betulinic acid on cellline NCIH1869: 17.448 ==> 15.316
Updating AUC of compound betulinic acid on cellline LOXIMVI: 16.965 ==> 15.000
Updating AUC of compound betulinic acid on cellline AGS: 15.661 ==> 15.000
Updating AUC of compound betulinic acid on cellline SKUT1: 15.134 ==> 14.627
Updating AUC of compound BRD-K45681478 on cellline CCFSTTG1: 12.887 ==> 12.188
Updating AUC of compound BRD-K45681478 on cellline NCIH460: 15.022 ==> 13.199
Updating AUC of compound BRD-K45681478 on cellline IGROV1: 13.982 ==> 13.397
Updating AUC of compound BRD-K45681478 on cellline OAW28: 14.460 =

Updating AUC of compound NVP-TAE684 on cellline NCIH1299: 14.224 ==> 12.111
Updating AUC of compound NVP-TAE684 on cellline A549: 9.980 ==> 9.856
Updating AUC of compound NVP-TAE684 on cellline NCIH460: 13.012 ==> 12.915
Updating AUC of compound NVP-TAE684 on cellline IGROV1: 10.756 ==> 8.890
Updating AUC of compound NVP-TAE684 on cellline ASPC1: 9.347 ==> 9.246
Updating AUC of compound NVP-TAE684 on cellline CAL51: 9.460 ==> 9.067
Updating AUC of compound NVP-TAE684 on cellline NCIH1869: 11.458 ==> 10.463
Updating AUC of compound NVP-TAE684 on cellline LOXIMVI: 10.757 ==> 9.905
Updating AUC of compound NVP-TAE684 on cellline DU145: 11.136 ==> 11.115
Updating AUC of compound NVP-TAE684 on cellline SKUT1: 12.211 ==> 11.484
Updating AUC of compound necrostatin-7 on cellline CCFSTTG1: 15.000 ==> 14.423
Updating AUC of compound necrostatin-7 on cellline NCIH1299: 16.675 ==> 14.856
Updating AUC of compound necrostatin-7 on cellline OAW28: 14.639 ==> 14.151
Updating AUC of compound necrostat

Updating AUC of compound tivozanib on cellline A549: 13.400 ==> 12.251
Updating AUC of compound tivozanib on cellline IGROV1: 13.538 ==> 11.249
Updating AUC of compound tivozanib on cellline SUIT2: 14.491 ==> 13.358
Updating AUC of compound tivozanib on cellline A375: 14.066 ==> 13.936
Updating AUC of compound tivozanib on cellline SKUT1: 13.375 ==> 12.364
Updating AUC of compound SRT-1720 on cellline NCIH1299: 14.815 ==> 14.525
Updating AUC of compound SRT-1720 on cellline A549: 14.347 ==> 13.353
Updating AUC of compound SRT-1720 on cellline NCIH460: 14.790 ==> 14.373
Updating AUC of compound SRT-1720 on cellline IGROV1: 15.000 ==> 13.623
Updating AUC of compound SRT-1720 on cellline ASPC1: 16.147 ==> 15.431
Updating AUC of compound SRT-1720 on cellline NCIH1869: 16.662 ==> 14.686
Updating AUC of compound SRT-1720 on cellline LOXIMVI: 15.902 ==> 13.913
Updating AUC of compound SRT-1720 on cellline DU145: 15.489 ==> 15.324
Updating AUC of compound SRT-1720 on cellline AGS: 15.453 ==> 1

Updating AUC of compound ABT-737 on cellline CCFSTTG1: 13.631 ==> 13.541
Updating AUC of compound ABT-737 on cellline NCIH520: 13.848 ==> 13.842
Updating AUC of compound ABT-737 on cellline SUIT2: 14.713 ==> 14.611
Updating AUC of compound ABT-737 on cellline CAL51: 14.702 ==> 14.053
Updating AUC of compound ABT-737 on cellline NCIH1869: 13.976 ==> 12.525
Updating AUC of compound ABT-737 on cellline DU145: 14.341 ==> 13.998
Updating AUC of compound ABT-737 on cellline SKUT1: 14.712 ==> 14.687
Updating AUC of compound PLX-4032 on cellline OAW28: 15.600 ==> 15.000
Updating AUC of compound PLX-4032 on cellline ASPC1: 14.888 ==> 11.685
Updating AUC of compound PLX-4032 on cellline CAL51: 15.231 ==> 15.104
Updating AUC of compound PLX-4032 on cellline A375: 15.797 ==> 11.367
Updating AUC of compound PLX-4032 on cellline NCIH1869: 15.511 ==> 13.489
Updating AUC of compound PLX-4032 on cellline MKN74: 15.763 ==> 14.247
Updating AUC of compound PLX-4032 on cellline SKUT1: 16.519 ==> 14.435
Upd

Updating AUC of compound SB-225002 on cellline CCFSTTG1: 13.806 ==> 13.409
Updating AUC of compound SB-225002 on cellline A549: 13.564 ==> 12.784
Updating AUC of compound SB-225002 on cellline NCIH460: 14.337 ==> 13.268
Updating AUC of compound SB-225002 on cellline OAW28: 13.535 ==> 12.406
Updating AUC of compound SB-225002 on cellline ASPC1: 14.570 ==> 14.354
Updating AUC of compound SB-225002 on cellline SUIT2: 15.000 ==> 14.639
Updating AUC of compound SB-225002 on cellline CAL51: 12.285 ==> 12.236
Updating AUC of compound SB-225002 on cellline A375: 13.746 ==> 12.704
Updating AUC of compound SB-225002 on cellline NCIH1869: 13.594 ==> 12.647
Updating AUC of compound SB-225002 on cellline MKN74: 12.874 ==> 12.438
Updating AUC of compound CAY10594 on cellline CCFSTTG1: 14.873 ==> 14.773
Updating AUC of compound CAY10594 on cellline NCIH1299: 15.132 ==> 14.752
Updating AUC of compound CAY10594 on cellline NCIH460: 14.140 ==> 14.108
Updating AUC of compound CAY10594 on cellline CAL51: 

Updating AUC of compound BRD-K64610608 on cellline NCIH1299: 15.986 ==> 15.796
Updating AUC of compound BRD-K64610608 on cellline NCIH460: 14.694 ==> 13.714
Updating AUC of compound BRD-K64610608 on cellline SUIT2: 15.729 ==> 15.000
Updating AUC of compound BRD-K64610608 on cellline CAL51: 15.293 ==> 14.607
Updating AUC of compound BRD-K64610608 on cellline A375: 17.416 ==> 15.628
Updating AUC of compound BRD-K64610608 on cellline NCIH1869: 14.856 ==> 14.221
Updating AUC of compound BRD-K64610608 on cellline DU145: 14.747 ==> 14.636
Updating AUC of compound BRD-K64610608 on cellline MKN74: 15.365 ==> 13.244
Updating AUC of compound BRD-K64610608 on cellline SKUT1: 15.675 ==> 14.915
Updating AUC of compound tretinoin on cellline CCFSTTG1: 14.810 ==> 14.611
Updating AUC of compound tretinoin on cellline A549: 14.541 ==> 13.760
Updating AUC of compound tretinoin on cellline NCIH460: 15.377 ==> 14.492
Updating AUC of compound tretinoin on cellline NCIH520: 14.399 ==> 13.283
Updating AUC of

Updating AUC of compound GDC-0879 on cellline A549: 14.903 ==> 14.808
Updating AUC of compound GDC-0879 on cellline A549: 14.808 ==> 14.703
Updating AUC of compound GDC-0879 on cellline NCIH520: 14.907 ==> 14.568
Updating AUC of compound GDC-0879 on cellline IGROV1: 14.971 ==> 14.429
Updating AUC of compound GDC-0879 on cellline OAW28: 18.047 ==> 15.148
Updating AUC of compound GDC-0879 on cellline ASPC1: 14.283 ==> 14.115
Updating AUC of compound GDC-0879 on cellline CAL51: 16.117 ==> 14.775
Updating AUC of compound GDC-0879 on cellline A375: 14.189 ==> 10.612
Updating AUC of compound GDC-0879 on cellline NCIH1869: 16.786 ==> 15.382
Updating AUC of compound GDC-0879 on cellline LOXIMVI: 13.283 ==> 12.617
Updating AUC of compound pevonedistat on cellline CCFSTTG1: 17.900 ==> 16.551
Updating AUC of compound pevonedistat on cellline NCIH460: 12.669 ==> 11.531
Updating AUC of compound pevonedistat on cellline OAW28: 17.885 ==> 13.992
Updating AUC of compound pevonedistat on cellline ASPC1

Updating AUC of compound Ch-55 on cellline CCFSTTG1: 14.046 ==> 14.009
Updating AUC of compound Ch-55 on cellline NCIH460: 14.986 ==> 12.827
Updating AUC of compound Ch-55 on cellline IGROV1: 13.911 ==> 13.011
Updating AUC of compound Ch-55 on cellline OAW28: 15.580 ==> 13.226
Updating AUC of compound Ch-55 on cellline CAL51: 15.000 ==> 14.716
Updating AUC of compound Ch-55 on cellline LOXIMVI: 14.685 ==> 13.801
Updating AUC of compound N9-isopropylolomoucine on cellline CCFSTTG1: 14.220 ==> 14.091
Updating AUC of compound N9-isopropylolomoucine on cellline A549: 13.162 ==> 12.717
Updating AUC of compound N9-isopropylolomoucine on cellline NCIH460: 13.885 ==> 13.494
Updating AUC of compound N9-isopropylolomoucine on cellline IGROV1: 13.218 ==> 12.589
Updating AUC of compound N9-isopropylolomoucine on cellline OAW28: 13.749 ==> 12.952
Updating AUC of compound N9-isopropylolomoucine on cellline ASPC1: 14.050 ==> 13.026
Updating AUC of compound N9-isopropylolomoucine on cellline LOXIMVI: 

Updating AUC of compound omacetaxine mepesuccinate on cellline CCFSTTG1: 8.050 ==> 7.203
Updating AUC of compound omacetaxine mepesuccinate on cellline IGROV1: 7.993 ==> 6.960
Updating AUC of compound omacetaxine mepesuccinate on cellline LOXIMVI: 4.516 ==> 4.307
Updating AUC of compound YM-155 on cellline CCFSTTG1: 13.458 ==> 12.854
Updating AUC of compound YM-155 on cellline OAW28: 11.017 ==> 10.549
Updating AUC of compound YM-155 on cellline ASPC1: 10.619 ==> 10.050
Updating AUC of compound YM-155 on cellline CAL51: 9.685 ==> 8.967
Updating AUC of compound YM-155 on cellline A375: 11.185 ==> 11.068
Updating AUC of compound YM-155 on cellline NCIH1869: 8.413 ==> 8.158
Updating AUC of compound YM-155 on cellline KE39: 11.139 ==> 11.090
Updating AUC of compound A-804598 on cellline CCFSTTG1: 14.726 ==> 14.623
Updating AUC of compound A-804598 on cellline ASPC1: 14.541 ==> 14.212
Updating AUC of compound A-804598 on cellline LOXIMVI: 15.034 ==> 13.931
Updating AUC of compound EX-527 on 

Updating AUC of compound PF-543 on cellline ASPC1: 14.321 ==> 13.833
Updating AUC of compound PF-543 on cellline CAL51: 14.279 ==> 14.207
Updating AUC of compound PF-543 on cellline A375: 14.863 ==> 13.813
Updating AUC of compound PF-543 on cellline NCIH1869: 16.917 ==> 14.277
Updating AUC of compound PF-543 on cellline AGS: 14.288 ==> 14.148
Updating AUC of compound PF-543 on cellline MKN74: 14.946 ==> 13.749
Updating AUC of compound PF-543 on cellline SKUT1: 14.489 ==> 14.427
Updating AUC of compound BRD-K80183349 on cellline CCFSTTG1: 15.274 ==> 14.034
Updating AUC of compound BRD-K80183349 on cellline A549: 14.633 ==> 14.428
Updating AUC of compound BRD-K80183349 on cellline A549: 14.428 ==> 13.959
Updating AUC of compound BRD-K80183349 on cellline NCIH460: 15.290 ==> 14.724
Updating AUC of compound BRD-K80183349 on cellline IGROV1: 13.950 ==> 12.723
Updating AUC of compound BRD-K80183349 on cellline OAW28: 13.872 ==> 13.454
Updating AUC of compound BRD-K80183349 on cellline SUIT2:

Updating AUC of compound PL-DI on cellline NCIH1869: 11.714 ==> 11.696
Updating AUC of compound PL-DI on cellline LOXIMVI: 10.673 ==> 10.006
Updating AUC of compound PL-DI on cellline MKN74: 11.367 ==> 10.876
Updating AUC of compound skepinone-L on cellline CCFSTTG1: 14.807 ==> 14.362
Updating AUC of compound skepinone-L on cellline ASPC1: 14.806 ==> 14.420
Updating AUC of compound skepinone-L on cellline NCIH1869: 14.565 ==> 14.546
Updating AUC of compound skepinone-L on cellline SKUT1: 13.820 ==> 13.421
Updating AUC of compound nelarabine on cellline CCFSTTG1: 15.527 ==> 14.890
Updating AUC of compound nelarabine on cellline NCIH1299: 15.544 ==> 15.248
Updating AUC of compound nelarabine on cellline A549: 14.374 ==> 13.655
Updating AUC of compound nelarabine on cellline NCIH460: 15.011 ==> 13.261
Updating AUC of compound nelarabine on cellline ASPC1: 15.809 ==> 14.176
Updating AUC of compound nelarabine on cellline CAL51: 15.246 ==> 14.902
Updating AUC of compound nelarabine on celll

Updating AUC of compound momelotinib on cellline LOXIMVI: 12.140 ==> 11.913
Updating AUC of compound momelotinib on cellline DU145: 12.750 ==> 12.374
Updating AUC of compound momelotinib on cellline MKN74: 12.496 ==> 11.711
Updating AUC of compound momelotinib on cellline SKUT1: 13.240 ==> 13.129
Updating AUC of compound istradefylline on cellline CCFSTTG1: 16.305 ==> 14.732
Updating AUC of compound istradefylline on cellline ASPC1: 15.742 ==> 14.283
Updating AUC of compound istradefylline on cellline NCIH1869: 14.481 ==> 14.275
Updating AUC of compound fingolimod on cellline NCIH460: 12.981 ==> 12.836
Updating AUC of compound fingolimod on cellline IGROV1: 12.607 ==> 11.923
Updating AUC of compound fingolimod on cellline NCIH1869: 11.965 ==> 11.941
Updating AUC of compound fingolimod on cellline SKUT1: 13.155 ==> 12.632
Updating AUC of compound bortezomib on cellline CCFSTTG1: 13.104 ==> 12.744
Updating AUC of compound bortezomib on cellline NCIH1299: 13.193 ==> 11.994
Updating AUC of

Updating AUC of compound GSK461364 on cellline A549: 6.725 ==> 4.181
Updating AUC of compound GSK461364 on cellline NCIH460: 9.742 ==> 8.819
Updating AUC of compound GSK461364 on cellline IGROV1: 7.396 ==> 6.293
Updating AUC of compound GSK461364 on cellline OAW28: 12.427 ==> 8.454
Updating AUC of compound GSK461364 on cellline CAL51: 5.836 ==> 3.334
Updating AUC of compound GSK461364 on cellline A375: 11.852 ==> 5.731
Updating AUC of compound GSK461364 on cellline NCIH1869: 11.197 ==> 11.101
Updating AUC of compound GSK461364 on cellline LOXIMVI: 6.277 ==> 5.862
Updating AUC of compound GSK461364 on cellline KE39: 12.227 ==> 10.315
Updating AUC of compound GSK461364 on cellline MKN74: 9.813 ==> 6.785
Updating AUC of compound bexarotene on cellline CCFSTTG1: 15.203 ==> 15.145
Updating AUC of compound bexarotene on cellline A549: 14.992 ==> 12.288
Updating AUC of compound bexarotene on cellline NCIH1299: 16.665 ==> 15.055
Updating AUC of compound bexarotene on cellline OAW28: 16.266 ==>

Updating AUC of compound VAF-347 on cellline CCFSTTG1: 15.000 ==> 14.893
Updating AUC of compound VAF-347 on cellline NCIH1869: 14.879 ==> 13.386
Updating AUC of compound VAF-347 on cellline LOXIMVI: 15.065 ==> 14.442
Updating AUC of compound VAF-347 on cellline MKN74: 15.448 ==> 14.062
Updating AUC of compound VAF-347 on cellline SKUT1: 15.000 ==> 14.848
Updating AUC of compound pifithrin-mu on cellline A549: 15.495 ==> 14.473
Updating AUC of compound pifithrin-mu on cellline NCIH460: 15.344 ==> 14.878
Updating AUC of compound pifithrin-mu on cellline NCIH520: 14.758 ==> 12.998
Updating AUC of compound pifithrin-mu on cellline IGROV1: 16.408 ==> 14.457
Updating AUC of compound pifithrin-mu on cellline OAW28: 15.486 ==> 13.497
Updating AUC of compound pifithrin-mu on cellline ASPC1: 15.918 ==> 14.994
Updating AUC of compound pifithrin-mu on cellline NCIH1869: 14.523 ==> 14.460
Updating AUC of compound pifithrin-mu on cellline LOXIMVI: 14.868 ==> 14.453
...

| Compound | 22RV1_PROSTATE | 2313287_STOMACH | 253J_URINARY_TRACT | 253JBV_URINARY_TRACT | 42MGBA_CENTRAL_NERVOUS_SYSTEM | 5637_URINARY_TRACT | 639V_URINARY_TRACT | 647V_URINARY_TRACT | 769P_KIDNEY | 786O_KIDNEY | ... | WM983B_SKIN | YAPC_PANCREAS | YD10B_UPPER_AERODIGESTIVE_TRACT | YD15_SALIVARY_GLAND | YD38_UPPER_AERODIGESTIVE_TRACT | YD8_UPPER_AERODIGESTIVE_TRACT | YH13_CENTRAL_NERVOUS_SYSTEM | YKG1_CENTRAL_NERVOUS_SYSTEM | ZR751_BREAST | ZR7530_BREAST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16-beta-bromoandrosterone | 14.565 | 12.668 | 14.839 | 14.586 | 14.917 | 14.14 | 15.446 | 14.585 | 14.252 | 14.749 | ... | 14.607 | 15.524 | 16.285 | 16.173 | 15 | 14.468 | 15.064 |  | 15.597 | 13.764 |
| 1S,3R-RSL-3 | 9.2997 | 10.231 | 7.5951 | 15 | 6.2174 | 7.7638 | 13.301 | 8.0628 | 7.5991 | 5.9814 | ... | 11.351 | 13.108 | 11.267 | 11.708 | 9.6433 | 5.7127 | 7.0035 | 10.699 | 12.43 | 6.2003 |
| 3-Cl-AHPC | 10.259 | 10.418 | 11.077 | 10.379 | 8.9201 | 9.3751 | 10.052 | 9.3562 | 9.5917 | 10.551 | ... | 11.452 | 12.207 | 9.949 | 9.5841 | 11.884 | 12.84 | 11.865 | 13.934 | 14.273 | 11.769 |
| 968 | 14.376 | 15.11 |  | 13.36 |  | 14.045 | 15.448 |  | 14.915 |  | ... |  |  | 16.233 | 15.618 | 15.383 |  |  |  |  |  |
| A-804598 |  |  |  |  | 14.147 |  |  | 15.312 |  | 14.498 | ... | 14.278 | 15 |  |  |  | 14.548 | 14.678 |  | 15 | 14.074 |
| AA-COCF3 | 12.043 | 9.3923 | 12.875 | 12.806 | 14.543 | 12.344 | 12.671 | 15.486 | 13.28 | 14.793 | ... | 13.122 | 15.286 | 16.419 | 15 | 13.769 | 12.527 | 15.328 | 10.381 | 15 | 11.939 |
| ABT-199 |  |  |  |  | 15.172 |  |  | 15.126 |  | 14.608 | ... | 13.041 | 15.729 |  |  |  | 14.539 | 15.654 | 15.115 | 15 | 14.683 |
| ABT-737 | 12.965 | 14.623 | 13.252 | 13.602 | 14.98 | 12.746 | 15.739 |  | 13.42 | 12.96 | ... | 13.442 | 15 | 14.638 | 14.698 | 14.806 |  |  | 14.809 | 14.476 | 14.405 |
| AC55649 | 13.845 | 13.779 | 14.604 | 14.805 |  | 13.304 | 15.679 | 15 | 14.751 | 14.746 | ... | 14.228 | 15 | 13.788 | 15.787 | 15 | 14.144 | 14.474 | 14.691 | 15.366 | 14.76 |
| AGK-2 | 15.266 | 16.757 | 15.176 | 17.642 | 10.679 | 15 | 14.703 | 15 | 15.958 | 14.633 | ... | 13.522 | 15.572 | 13.647 | 14.184 | 15.121 | 14.326 | 16.055 | 16.273 | 15 | 14.676 |
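Every update in the log above replaces the stored AUC with a smaller one, which suggests that when a compound and cell line pair appears more than once in the CTRP data, the lowest AUC is kept. Assuming that is the rule (it is inferred from the log, not taken from this notebook's code), the deduplication and the pivot into a compound x cell line table can be sketched in plain Python. `keep_min_auc` and `to_matrix` are hypothetical helpers, and the sample records reuse two pairs from the log.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def keep_min_auc(records: Iterable[Tuple[str, str, float]]) -> Dict[Pair, float]:
    """Collapse (compound, cell line, AUC) records, keeping the lowest AUC per pair.

    Hypothetical helper (assumed rule): a stored AUC is only ever replaced
    by a smaller one, and each replacement is logged like the output above.
    """
    best: Dict[Pair, float] = {}
    for compound, cellline, auc in records:
        key = (compound, cellline)
        if key not in best:
            best[key] = auc
        elif auc < best[key]:
            print('Updating AUC of compound {} on cellline {}: {:.3f} ==> {:.3f}'.format(
                compound, cellline, best[key], auc))
            best[key] = auc
    return best


def to_matrix(best: Dict[Pair, float]) -> Dict[str, Dict[str, float]]:
    """Pivot {(compound, cell line): AUC} into {compound: {cell line: AUC}}."""
    matrix: Dict[str, Dict[str, float]] = {}
    for (compound, cellline), auc in best.items():
        matrix.setdefault(compound, {})[cellline] = auc
    return matrix


# Two pairs taken from the log above, each measured twice.
records = [
    ('bortezomib', 'NCIH1299', 13.193),
    ('bortezomib', 'NCIH1299', 11.994),
    ('GSK461364', 'A549', 6.725),
    ('GSK461364', 'A549', 4.181),
]
matrix = to_matrix(keep_min_auc(records))
print(matrix)
```

Cells with no measurement simply have no key in the inner dictionary, which corresponds to the empty cells in the table above.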


### 3. Check that all datasets exist

If the cell below raises an error, obtain the file it reports as missing (by downloading it or regenerating it with the steps above), then run the cell again.

```python
import os

# Stop with an error that names the first expected file missing from ../data/.
for fn in [
        'gene_x_kras_isogenic_and_imortalized_celllines.gct',
        'mutation__gene_x_ccle_cellline.gct',
        'rpkm__gene_x_ccle_cellline.gct',
        'gene_set__gene_set_x_ccle_cellline.gct',
        'regulator__gene_set_x_ccle_cellline.gct',
        'rppa__protein_x_ccle_cellline.gct',
        'achilles__gene_x_ccle_cellline.gct',
        'ctd2__compound_x_ccle_cellline.gct',
        'annotation__feature_x_ccle_cellline.gct',
]:
    assert fn in os.listdir('../data/'), 'Missing {}!'.format(fn)
```
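The check cell stops at the first missing file, so with several files missing it has to be rerun once per file. A variant that reports every missing file in one pass could look like the sketch below. `find_missing` is a hypothetical helper, the file list is copied from the check cell, and a nonexistent data directory is treated as every file being missing.

```python
import os
from typing import Iterable, List

REQUIRED = [
    'gene_x_kras_isogenic_and_imortalized_celllines.gct',
    'mutation__gene_x_ccle_cellline.gct',
    'rpkm__gene_x_ccle_cellline.gct',
    'gene_set__gene_set_x_ccle_cellline.gct',
    'regulator__gene_set_x_ccle_cellline.gct',
    'rppa__protein_x_ccle_cellline.gct',
    'achilles__gene_x_ccle_cellline.gct',
    'ctd2__compound_x_ccle_cellline.gct',
    'annotation__feature_x_ccle_cellline.gct',
]


def find_missing(data_dir: str, filenames: Iterable[str]) -> List[str]:
    """Return the expected file names that are not present in data_dir."""
    # List the directory once instead of once per file.
    present = set(os.listdir(data_dir)) if os.path.isdir(data_dir) else set()
    return [fn for fn in filenames if fn not in present]


missing = find_missing('../data/', REQUIRED)
for fn in missing:
    print('Missing {}!'.format(fn))
```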

### 4. Make the CCLE data object
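Most entries in this object are read straight from GCT files, but "Primary Site" is derived from the categorical `Site Primary` annotation row with `ccal.make_membership_df_from_categorical_series`. Judging from its name and the entry's `'binary'` data type, that function presumably turns one label per cell line into a category x cell line table of 0s and 1s. The pandas sketch below shows that kind of one-hot transformation; it is an assumption about the function's behavior, not its implementation, and `membership_df` and the site labels are illustrative.

```python
import pandas as pd


def membership_df(series: pd.Series) -> pd.DataFrame:
    """One-hot encode a categorical Series indexed by cell line.

    Hypothetical stand-in: returns a category x cell line DataFrame with 1
    where the cell line carries that label and 0 elsewhere (a missing label
    gives a column of all 0s).
    """
    return pd.get_dummies(series, dtype=int).T


# Illustrative labels for four cell lines named in this chapter.
sites = pd.Series(
    ['prostate', 'stomach', 'breast', 'breast'],
    index=['22RV1_PROSTATE', '2313287_STOMACH', 'ZR751_BREAST', 'ZR7530_BREAST'],
)
print(membership_df(sites))
```

Each cell line column sums to 1 because every labeled cell line belongs to exactly one primary site.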

```python
import gzip
import pickle

# Make the CCLE data object used in coming chapters.
# `ccal` must already be imported (it is not imported in this cell).
# Each entry pairs a feature x cell line DataFrame ('df') with its 'emphasis'
# and 'data_type' settings. The two 'low'-emphasis datasets are those where lower
# values mean stronger gene dependency (Achilles) or drug sensitivity (CTD^2 AUC).
ccle = {
    'Mutation': {
        'df': ccal.read_gct('../data/mutation__gene_x_ccle_cellline.gct'),
        'emphasis': 'high',
        'data_type': 'binary'
    },
    'Gene Expression': {
        'df': ccal.read_gct('../data/rpkm__gene_x_ccle_cellline.gct'),
        'emphasis': 'high',
        'data_type': 'continuous'
    },
    'Gene Set': {
        'df': ccal.read_gct('../data/gene_set__gene_set_x_ccle_cellline.gct'),
        'emphasis': 'high',
        'data_type': 'continuous'
    },
    'Regulator Gene Set': {
        'df': ccal.read_gct('../data/regulator__gene_set_x_ccle_cellline.gct'),
        'emphasis': 'high',
        'data_type': 'continuous'
    },
    'Protein Expression': {
        'df': ccal.read_gct('../data/rppa__protein_x_ccle_cellline.gct'),
        'emphasis': 'high',
        'data_type': 'continuous'
    },
    'Gene Dependency (Achilles)': {
        'df': ccal.read_gct('../data/achilles__gene_x_ccle_cellline.gct'),
        'emphasis': 'low',
        'data_type': 'continuous'
    },
    'Drug Sensitivity (CTD^2)': {
        'df': ccal.read_gct('../data/ctd2__compound_x_ccle_cellline.gct'),
        'emphasis': 'low',
        'data_type': 'continuous'
    },
    'Primary Site': {
        # Turn the categorical 'Site Primary' annotation into a binary membership table.
        'df': ccal.make_membership_df_from_categorical_series(
            ccal.read_gct('../data/annotation__feature_x_ccle_cellline.gct')
            .loc['Site Primary']),
        'emphasis': 'high',
        'data_type': 'binary'
    }
}

# Save the object as a gzip-compressed pickle for the coming chapters.
with gzip.open('../data/ccle.pickle.gz', 'wb') as f:
    pickle.dump(ccle, f)
```
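Later chapters need this object back in memory. Reading it mirrors the write that saves `ccle.pickle.gz`: open the file with `gzip.open` in binary read mode and call `pickle.load`. A minimal sketch follows; `save_ccle` and `load_ccle` are hypothetical helper names, and a toy dictionary stands in for the real object so the round trip can run anywhere.

```python
import gzip
import os
import pickle
import tempfile


def save_ccle(obj, path):
    """Write obj as a gzip-compressed pickle, like the cell that saves ccle."""
    with gzip.open(path, 'wb') as f:
        pickle.dump(obj, f)


def load_ccle(path):
    """Read a gzip-compressed pickle back. Only unpickle files you trust."""
    with gzip.open(path, 'rb') as f:
        return pickle.load(f)


# Round-trip a toy stand-in for the real object through a temporary file.
toy = {'Mutation': {'df': None, 'emphasis': 'high', 'data_type': 'binary'}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'ccle.pickle.gz')
    save_ccle(toy, path)
    restored = load_ccle(path)
print(restored == toy)
```

With the real file, `ccle = load_ccle('../data/ccle.pickle.gz')` would restore the object; pandas must be importable, because the pickled DataFrames are rebuilt as pandas objects.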

### Go to the [next chapter (2)](2%20Generate%20oncogenic-activation%20signature.ipynb)