<a href="https://colab.research.google.com/github/cmap/lincs-workshop-2020/blob/main/notebooks/data_access/cmapBQ_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# cmapBQ Tutorial

Tutorial notebook for the `cmapBQ` package. `cmapBQ` supports targeted retrieval of gene expression data from the resources provided by the Broad Institute and the LINCS Project.


Documentation is available on [Read the Docs](https://cmapbq.readthedocs.io).

Source code is available on [GitHub](https://github.com/cmap/cmapBQ/).

## Setup

### Package installation

The `cmapBQ` package can be installed with `pip` using the command below.

In [1]:
!pip -q install --upgrade cmapBQ


### Standard Imports

In [2]:
import os
import pandas as pd
import numpy as np
import seaborn as sns
import requests

import matplotlib.pyplot as plt

#%load_ext google.colab.data_table #For Google Colab

### Credentials Setup and Package imports

To access BigQuery, a service account JSON credentials file must be obtained. The cell below downloads a demo credentials file from S3. Running the `cmapBQ.config.setup_credentials(credentials_path)` function then points the toolkit to that file; to use your own Google account, supply the path to your own service account key instead.

More information about service accounts is available here: [Getting started with authentication](https://cloud.google.com/docs/authentication/getting-started)
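
Before pointing the toolkit at a key file, it can help to confirm that the file looks like a service account key. The sketch below is a minimal check, assuming the standard fields of a Google service account JSON key; the helper name `check_service_account_file` is hypothetical and not part of `cmapBQ`.

```python
import json

# Fields that a Google service account key file normally contains.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_file(path):
    """Return the set of expected keys that are missing from a JSON key file."""
    with open(path) as f:
        info = json.load(f)
    return REQUIRED_KEYS - set(info)


# Usage (the path is a placeholder):
# missing = check_service_account_file('/content/YOUR_JSON_KEY.json')
# if missing:
#     print('Credentials file is missing keys:', sorted(missing))
```

An empty result only means the file has the expected shape; it does not confirm that the account has access to the BigQuery tables.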

In [3]:

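import requests

# URL of the demo service account credentials
url = 'https://s3.amazonaws.com/data.clue.io/api/bq_creds/BQ-demo-credentials.json'

response = requests.get(url)
response.raise_for_status()  # stop early if the download failed

# '/content' is the Colab working directory; change the path when running elsewhere
credentials_filepath = '/content/BQ-demo-credentials.json'

with open(credentials_filepath, 'w') as f:
    f.write(response.text)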



Point `cmapBQ` to the credentials file:

In [4]:
import cmapBQ.query as cmap_query
import cmapBQ.config as cmap_config

# To use your own key, uncomment the next line and edit the path
#credentials_filepath='/content/YOUR_JSON_KEY.json'

# Set up credentials and create the BigQuery client
cmap_config.setup_credentials(credentials_filepath)
bq_client = cmap_config.get_bq_client()

# Metadata Overview

![](https://raw.githubusercontent.com/cmap/lincs-workshop-2020/main/assets/BQ_metadata_schema.jpg)

## BigQuery Table Information

### The data hosted on BigQuery is organized in the following tables

<div style="font-size: 10pt;line-height:18px;font-weight:normal">
    
**compoundinfo:** <br> Metadata for all unique compounds included in the data release. Each row contains information about a unique compound such as MoA, target, etc. 
    
**instinfo:**  <br> Sample-level metadata. Each row describes one replicate, including experimental parameters such as timepoint and dose.

**siginfo:**  <br> Signature-level (replicate-collapsed, Level 5) metadata. Includes experimental parameters such as timepoint and dose, as well as bioactivity metrics such as `tas` for [Transcriptional Activity Score](https://clue.io/connectopedia/signature_quality_metrics) and `cc_q75` for replicate correlation.

**L1000 Level3:**  <br> Gene expression (GEX, Level 2) values are normalized to invariant gene set curves and quantile normalized across each plate. Here, the data from each perturbagen treatment is referred to as a profile, experiment, or instance. Values for 11,350 additional genes not directly measured in the L1000 assay are inferred from the normalized values of the 978 landmark genes.

    
**L1000 Level4:**  <br> Z-scores for each gene based on Level 3 with respect to the entire plate population. This comparison of profiles to their appropriate population control generates a list of differentially expressed genes.

**L1000 Level5:** <br> Replicate-collapsed z-score vectors based on Level 4. Replicate collapse generates one differential expression vector, which we term a signature. Connectivity analyses are performed on signatures.
    
**geneinfo:** <br> Metadata for the genes included in the data release. Each row contains mappings between gene_symbol, ensembl_id, and gene_id, as well as information about gene_type.

**cellinfo:** <br> Metadata for cell lines included in the data release. Each row contains information such as cell_iname, ccle_name, or cell_lineage.

**genetic_pertinfo**: <br> Metadata for genetic perturbagens, such as type ['oe', 'sh', 'xpr'] and the relevant gene_id and ensembl_id.
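
To make the relationship between the three data levels concrete, the sketch below z-scores a toy Level 3 matrix against its plate and then averages replicates into one signature. It is an illustration under simplifying assumptions: it uses a plain mean and standard deviation and an unweighted average, whereas the production pipeline may use robust statistics and weighted collapsing. The function names and toy values are hypothetical.

```python
import pandas as pd


def plate_zscore(level3):
    """Z-score each gene (row) against all samples (columns) on one plate."""
    mean = level3.mean(axis=1)
    std = level3.std(axis=1, ddof=0)
    return level3.sub(mean, axis=0).div(std, axis=0)


def collapse_replicates(level4, replicate_cols):
    """Average the z-scores of replicate samples into a single signature."""
    return level4[replicate_cols].mean(axis=1)


# Toy plate: 2 genes x 4 samples (values are made up).
toy_level3 = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 12.0, 12.0]],
    index=["GENE_A", "GENE_B"],
    columns=["s1", "s2", "s3", "s4"],
)

toy_level4 = plate_zscore(toy_level3)                           # Level 4 analogue
toy_signature = collapse_replicates(toy_level4, ["s1", "s2"])   # Level 5 analogue
```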


# Data Access

## Get Table Schema Information

In [5]:
cmap_query.list_tables()

cellinfo: cmap-big-table.cmap_lincs_public_views.cellinfo
compoundinfo: cmap-big-table.cmap_lincs_public_views.compoundinfo
geneinfo: cmap-big-table.cmap_lincs_public_views.geneinfo
genetic_pertinfo: cmap-big-table.cmap_lincs_public_views.genetic_pertinfo
instinfo: cmap-big-table.cmap_lincs_public_views.instinfo
level3: cmap-big-table.cmap_lincs_public_views.L1000_Level3
level4: cmap-big-table.cmap_lincs_public_views.L1000_Level4
level5: cmap-big-table.cmap_lincs_public_views.L1000_Level5
siginfo: cmap-big-table.cmap_lincs_public_views.siginfo


In [6]:
cmap_query.get_table_info(bq_client, 'cmap-big-table.cmap_lincs_public_views.compoundinfo') 

Unnamed: 0,column_name,data_type
0,pert_id,STRING
1,cmap_name,STRING
2,target,STRING
3,moa,STRING
4,canonical_smiles,STRING
5,inchi_key,STRING
6,compound_aliases,STRING


In [7]:
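# Default configuration and its entry for the compoundinfo table
config = cmap_config.get_default_config()
compoundinfo_table = config.tables.compoundinfo

# Count the unique compounds (pert_id) annotated with each mechanism of action (MoA)
QUERY = ( 'SELECT moa, ' 
'COUNT(DISTINCT(pert_id)) AS count ' 
'FROM `cmap-big-table.cmap_lincs_public_views.compoundinfo` ' 
'GROUP BY moa')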

cmap_query.run_query(bq_client, QUERY).result().to_dataframe()

Unnamed: 0,moa,count
0,,31262
1,CAR agonist,2
2,ALK inhibitor,7
3,Akt inhibitor,13
4,BCL inhibitor,11
...,...,...
653,Telomerase reverse transcriptase expression in...,1
654,Gonadotropin releasing factor hormone receptor...,2
655,Gonadotropin releasing factor hormone receptor...,1
656,"Precursor for food preservatives, plasticizers...",1
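
The first row of the result has an empty `moa`, which is how compounds without an annotated mechanism of action are grouped. The same aggregation can be reproduced locally with pandas, which may be convenient once a table has already been downloaded. This is a sketch on made-up rows; the `pert_id` values are placeholders, and `dropna=False` requires pandas 1.1 or later.

```python
import pandas as pd

# Toy stand-in for a few compoundinfo rows; values are illustrative only.
toy_compounds = pd.DataFrame({
    "pert_id": ["BRD-1", "BRD-1", "BRD-2", "BRD-3", "BRD-4"],
    "moa": ["ALK inhibitor", "ALK inhibitor", "ALK inhibitor", None, "Akt inhibitor"],
})

# Equivalent of: SELECT moa, COUNT(DISTINCT pert_id) AS count ... GROUP BY moa
# dropna=False keeps the group of compounds with no MoA annotation.
moa_counts = (
    toy_compounds.groupby("moa", dropna=False)["pert_id"]
    .nunique()
    .reset_index(name="count")
)
```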


## Raw SQL Queries

The `cmapBQ.query.list_tables()` function displays the addresses of the default tables for use in SQL queries.

In [8]:
import cmapBQ.query as cmap_query

cmap_query.list_tables()

cellinfo: cmap-big-table.cmap_lincs_public_views.cellinfo
compoundinfo: cmap-big-table.cmap_lincs_public_views.compoundinfo
geneinfo: cmap-big-table.cmap_lincs_public_views.geneinfo
genetic_pertinfo: cmap-big-table.cmap_lincs_public_views.genetic_pertinfo
instinfo: cmap-big-table.cmap_lincs_public_views.instinfo
level3: cmap-big-table.cmap_lincs_public_views.L1000_Level3
level4: cmap-big-table.cmap_lincs_public_views.L1000_Level4
level5: cmap-big-table.cmap_lincs_public_views.L1000_Level5
siginfo: cmap-big-table.cmap_lincs_public_views.siginfo


Raw SQL queries can be run on the public datasets as shown below. The syntax follows Google BigQuery standard SQL, documented here: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax

Runs BigQuery query job.

    cmapBQ.query.run_query(client, query)
  
    Parameters
            client – BigQuery client object
            query – Query to run as a string
    Returns
        QueryJob object
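
Because `run_query` returns a job rather than a table, a small wrapper can keep notebooks tidy. The sketch below assumes only what is documented above: the function is called as `run_query(client, query)` and the returned job supports `.result().to_dataframe()`. The wrapper name `query_to_dataframe` is hypothetical.

```python
def query_to_dataframe(run_query, client, query):
    """Run a query, wait for it to finish, and return the rows as a DataFrame.

    `run_query` is expected to behave like cmapBQ.query.run_query: called as
    run_query(client, query), it returns a job exposing .result().to_dataframe().
    """
    job = run_query(client, query)
    return job.result().to_dataframe()


# Usage, with the names defined earlier in this notebook:
# df = query_to_dataframe(cmap_query.run_query, bq_client, "SELECT ...")
```

Passing the query function in as an argument keeps the helper independent of any particular client, so it can be exercised with a stub when no credentials are available.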



### Example SQL Query 

In [9]:
## This query may take up to a minute
query = "SELECT COUNT(DISTINCT(sig_id)) as num_level5_sigs FROM cmap-big-table.cmap_lincs_public_views.siginfo"


# run_query returns a QueryJob, so result() and to_dataframe() are needed to get a DataFrame.
cmap_query.run_query(query=query, client=bq_client).result().to_dataframe()

Unnamed: 0,num_level5_sigs
0,1202656


## Functions -- Data Preview

In [10]:
cmap_query.list_cmap_compounds(bq_client)

Unnamed: 0,cmap_name
0,L-theanine
1,L-citrulline
2,BRD-A18795974
3,BRD-A27924917
4,BRD-A35931254
...,...
33622,TAS-301
33623,goserelin-acetate
33624,triptorelin
33625,T-98475


In [11]:
cmap_query.list_cmap_targets(bq_client)

Unnamed: 0,target,count
0,,31262
1,NR1I3,3
2,ACVR1,3
3,AKT3,7
4,AKT1,10
...,...,...
886,WASL,1
887,EIF2S1,2
888,MTTP,1
889,HSD3B2,1


In [12]:
cmap_query.list_cmap_moas(bq_client)

Unnamed: 0,moa,count
0,,31262
1,CAR agonist,2
2,ALK inhibitor,7
3,Akt inhibitor,13
4,BCL inhibitor,11
...,...,...
653,Telomerase reverse transcriptase expression in...,1
654,Gonadotropin releasing factor hormone receptor...,2
655,Gonadotropin releasing factor hormone receptor...,1
656,"Precursor for food preservatives, plasticizers...",1


## Functions -- Data Retrieval

### cmap_cell

Query cellinfo table

    cmapBQ.query.cmap_cell(client, cell_iname=None, cell_alias=None, ccle_name=None, primary_disease=None, cell_lineage=None, cell_type=None, table=None, verbose=False)

    Parameters
            client – Bigquery Client
            cell_iname – List of cell_inames
            cell_alias – List of cell aliases
            ccle_name – List of ccle_names
            primary_disease – List of primary_diseases
            cell_lineage – List of cell_lineages
            cell_type – List of cell_types
            table – table to query. This by default points to the cellinfo table and normally should not be changed.
            verbose – Print query and table address.
    Returns
        Pandas DataFrame



In [13]:
cell_lineage = 'skin'
core_cell_lines = ['A375', 'A549', 'HCC515', 'HEPG2', 'MCF7', 'PC3', 'VCAP', 'HT29', 'HA1E']

cell_table = cmap_query.cmap_cell(
    bq_client,
    cell_iname = core_cell_lines, 
    primary_disease=None,
#    cell_lineage=cell_lineage,
    verbose=False
)

cell_table.head(10)

Unnamed: 0,cell_iname,cellosaurus_id,donor_age,donor_age_death,donor_disease_age_onset,doubling_time,growth_medium,provider_catalog_id,feature_id,cell_type,donor_ethnicity,donor_sex,donor_tumor_phase,cell_lineage,primary_disease,subtype,provider_name,growth_pattern,ccle_name,cell_alias
0,HCC515,CVCL_5136,,,,,,,,tumor,Unknown,F,Unknown,lung,lung cancer,carcinoma,,adherent,HCC515_LUNG,HCC0515
1,HA1E,,,,,60.0,MEM-ALPHA (Invitrogen A1049001) supplemented w...,,,normal,Unknown,Unknown,Unknown,kidney,normal kidney sample,normal kidney sample,,unknown,HA1E_KIDNEY,
2,A549,CVCL_0023,58.0,,,48.0,F-12K ATCC catalog # 3-24,CCL-185,c-4,tumor,Caucasian,M,Primary,lung,lung cancer,non small cell carcinoma,ATCC,adherent,A549_LUNG,A 549
3,A375,CVCL_0132,54.0,,,36.0,DMEM Invitrogen catalog # 11995-65,CRL-1619,c-127,tumor,Unknown,F,Metastatic,skin,skin cancer,melanoma,ATCC,adherent,A375_SKIN,A 375|A-375
4,HT29,CVCL_0320,44.0,,,36.0,McCoy's 5A Invitrogen catalog # 166-82,HTB-38,c-272,tumor,Caucasian,F,Primary,large_intestine,colon cancer,adenocarcinoma,ATCC,adherent,HT29_LARGE_INTESTINE,HT 29
5,HEPG2,CVCL_0027,15.0,,,84.0,EMEM ATCC catalog # 3-23,HB-8065,,tumor,Caucasian,M,Primary,liver,liver cancer,carcinoma,ATCC,adherent,HEPG2_LIVER,Hep G2|HEP-G2
6,MCF7,CVCL_0031,40.0,,,72.0,EMEM ATCC catalog # 3-23,HTB-22,c-438,tumor,Caucasian,F,Metastatic,breast,breast cancer,adenocarcinoma,ATCC,adherent,MCF7_BREAST,IBMF-7
7,PC3,CVCL_0035,62.0,,,72.0,F-12K ATCC catalog # 3-24,CRL-1435,c-214,tumor,Caucasian,M,Metastatic,prostate,prostate cancer,adenocarcinoma,ATCC,mix,PC3_PROSTATE,PC.3|PC-3
8,VCAP,CVCL_2235,,,,220.0,DMEM ATCC catalog # 3-22,,,tumor,Caucasian,M,Metastatic,prostate,prostate cancer,adenocarcinoma,ATCC,adherent,VCAP_PROSTATE,Vcap
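
All nine requested cell lines came back here, but with longer lists a misspelled name could go unnoticed if the query simply returns the rows that match (an assumption about the behavior, not something the output above demonstrates). A minimal sketch of a check, using a hypothetical helper and a toy table:

```python
import pandas as pd


def missing_values(requested, table, column):
    """Return the requested values that do not appear in table[column], in order."""
    present = set(table[column])
    return [value for value in requested if value not in present]


# Toy stand-in for a cmap_cell result; 'HELA_TYPO' is a deliberately bad name.
toy_cells = pd.DataFrame({"cell_iname": ["A375", "A549", "MCF7"]})
not_found = missing_values(["A375", "A549", "HELA_TYPO"], toy_cells, "cell_iname")
```

With the real result this would be called as `missing_values(core_cell_lines, cell_table, 'cell_iname')`.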


### cmap_genes

**Query geneinfo table. Geneinfo contains information about genes including ids, symbols, types, ensembl_ids, etc.**

    cmapBQ.query.cmap_genes(client, gene_id=None, gene_symbol=None, ensembl_id=None, gene_title=None, gene_type=None, feature_space='landmark', src=None, table=None, verbose=False)

    Parameters
          client – Bigquery Client
          gene_id – list of gene_ids
          gene_symbol – list of gene_symbols
          ensembl_id – list of ensembl_ids
          gene_title – list of gene_titles
          gene_type – list of gene_types
          feature_space –
                Common featurespaces to extract. ‘rid’ overrides selection
                Choices: [‘landmark’, ‘bing’, ‘aig’]
                landmark: 978 landmark genes
                bing: Best-inferred set of 10,174 genes
                aig: All inferred genes including 12,328 genes
                Default is landmark.
          src – list of gene sources
          table – table to query. This by default points to the geneinfo table and normally should not be changed.
          verbose – Print query and table address.
    Returns
          Pandas DataFrame

In [15]:
#small sample of genes
gene_symbol_list = ['EGFR', 'NR3C1', 'MDM2']
gene_id_list = [1956, 2908, 4193] 

gene_table = cmap_query.cmap_genes(
    bq_client, 
    #gene_id=gene_id_list, 
    gene_symbol=gene_symbol_list, 
    #feature_space='landmark', 
    feature_space='aig',
    #verbose=True
  )

gene_table

Unnamed: 0,gene_id,gene_symbol,ensembl_id,gene_title,gene_type,src,feature_space
0,1956,EGFR,ENSG00000146648,epidermal growth factor receptor,protein-coding,NCBI,landmark
1,2908,NR3C1,ENSG00000113580,nuclear receptor subfamily 3 group C member 1,protein-coding,NCBI,landmark
2,4193,MDM2,ENSG00000135679,MDM2 proto-oncogene,protein-coding,NCBI,best inferred


### cmap_genetic_perts

**Query genetic_pertinfo table**


    cmapBQ.query.cmap_genetic_perts(client, pert_id=None, cmap_name=None, gene_id=None, gene_title=None, ensemble_id=None, table=None, verbose=False)

    Parameters
            client – Bigquery Client
            pert_id – List of pert_ids
            cmap_name – List of cmap_names
            gene_id – List of type INTEGER corresponding to gene_ids
            gene_title – List of gene_titles
            ensemble_id – List of Ensembl IDs
            table – table to query. This by default points to the genetic_pertinfo table and normally should not be changed.
            verbose – Print query and table address.
    Returns
        Pandas DataFrame

In [16]:
#small sample of genes
gene_symbol_list = ['EGFR', 'NR3C1', 'MDM2']
gene_id_list = [1956, 2908, 4193] 

genetic_perts_table = cmap_query.cmap_genetic_perts(bq_client,
    pert_id=None,
    cmap_name=None,
    gene_id=gene_id_list,
    gene_title=None,
    verbose=True
)

genetic_perts_table.sample(10)

Table: 
 cmap-big-table.cmap_lincs_public_views.genetic_pertinfo
Query:
 SELECT * FROM cmap-big-table.cmap_lincs_public_views.genetic_pertinfo WHERE gene_id in UNNEST([1956, 2908, 4193])


Unnamed: 0,pert_id,cmap_name,pert_type,gene_id,gene_title,ensembl_id,gene_type,feature_space
44,EGFR_L858R,EGFR,trt_oe,1956,epidermal growth factor receptor,ENSG00000146648,protein-coding,landmark
95,TRCN0000355728,MDM2,trt_sh,4193,MDM2 proto-oncogene,ENSG00000135679,protein-coding,best inferred
30,BRDN0000554094,EGFR,trt_oe,1956,epidermal growth factor receptor,ENSG00000146648,protein-coding,landmark
96,TRCN0000003380,MDM2,trt_sh,4193,MDM2 proto-oncogene,ENSG00000135679,protein-coding,best inferred
82,2908_R23K,NR3C1,trt_oe,2908,nuclear receptor subfamily 3 group C member 1,ENSG00000113580,protein-coding,landmark
89,CGS001-2908,NR3C1,trt_sh.cgs,2908,nuclear receptor subfamily 3 group C member 1,ENSG00000113580,protein-coding,landmark
2,BRDN0000465000,EGFR,trt_oe,1956,epidermal growth factor receptor,ENSG00000146648,protein-coding,landmark
70,TRCN0000121067,EGFR,trt_sh,1956,epidermal growth factor receptor,ENSG00000146648,protein-coding,landmark
25,BRDN0000553938,EGFR,trt_oe,1956,epidermal growth factor receptor,ENSG00000146648,protein-coding,landmark
74,BRDN0001054761,EGFR,trt_xpr,1956,epidermal growth factor receptor,ENSG00000146648,protein-coding,landmark
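
The verbose output shows the filter expressed as `gene_id in UNNEST([...])`. When writing raw SQL by hand, a condition of the same shape can be assembled from a Python list. The helper below is a hypothetical sketch and not necessarily how `cmapBQ` builds its queries; for untrusted input, BigQuery query parameters are the safer choice.

```python
def sql_literal(value):
    """Render a Python int or str as a BigQuery literal."""
    if isinstance(value, int):
        return str(value)
    # Escape backslashes and single quotes inside string literals.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def in_unnest_clause(column, values):
    """Build a `column in UNNEST([...])` condition from a list of values."""
    items = ", ".join(sql_literal(v) for v in values)
    return f"{column} in UNNEST([{items}])"


condition = in_unnest_clause("gene_id", [1956, 2908, 4193])
query = (
    "SELECT * FROM `cmap-big-table.cmap_lincs_public_views.genetic_pertinfo` "
    f"WHERE {condition}"
)
```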


### cmap_compounds



**Query the compoundinfo table by providing lists of compounds, MoAs, targets, etc. Multiple conditions are combined with the ‘AND’ operator.**


    cmapBQ.query.cmap_compounds(client, pert_id=None, cmap_name=None, moa=None, target=None, compound_aliases=None, limit=None, verbose=False)

    Parameters
            client – BigQuery Client
            pert_id – List of pert_ids
            cmap_name – List of cmap_names
            target – List of targets
            moa – List of MoAs
            compound_aliases – List of compound aliases
            limit – Maximum number of rows to return
            verbose – Print query and table address.
    Returns
        Pandas Dataframe matching queries

In [17]:
# Mechanism of action to look up
moa = 'MDM inhibitor'

compound_table = cmap_query.cmap_compounds(
    bq_client,
    pert_id=None,
    cmap_name=None,
    moa=moa,
    target=None,
    compound_aliases=None,
    limit=None,
    verbose=False
)

compound_table
## Do we need to be able to query by canonical smiles or inchi_keys? 

Unnamed: 0,pert_id,cmap_name,target,moa,canonical_smiles,inchi_key,compound_aliases
0,BRD-K84987553,MDM2-inhibitor,MDM2,MDM inhibitor,OB(O)c1ccc(cc1)C(=O)/C=C/c2ccc(I)cc2,BYMGWCQXSPGCMW-XCVCLJGOSA-N,MDM-2-INHIBITOR
1,BRD-A12230535,nutlin-3,MDM2,MDM inhibitor,COc1ccc(C2=NC(C(N2C(=O)N2CCNC(=O)C2)c2ccc(Cl)c...,BDUHCSBCVGXTJM-UHFFFAOYSA-N,NUTLIN-3A
2,BRD-K00317371,RITA,MDM2,MDM inhibitor,OCc1ccc(s1)-c1ccc(o1)-c1ccc(CO)s1,KZENBFUSKMWCJF-UHFFFAOYSA-N,rita
3,BRD-K64925568,AMG-232,MDM2,MDM inhibitor,CC(C)[C@@H](CS(=O)(=O)C(C)C)N1[C@@H]([C@H](C[C...,DRLCSJFKKILATL-YWCVFVGNSA-N,
4,BRD-K17349619,HLI-373,MDM2,MDM inhibitor,CN(C)CCCNc1c2ccccc2n(C)c2nc(=O)n(C)c(=O)c12,LNRUPMPQQGPSQT-UHFFFAOYSA-N,
5,BRD-K65924316,serdemetan,MDM2,MDM inhibitor,C(Cc1c[nH]c2ccccc12)Nc1ccc(Nc2ccncc2)cc1,CEGSUKYESLWKJP-UHFFFAOYSA-N,
6,BRD-K60219430,serdemetan,MDM2,MDM inhibitor,C(Cc1c[nH]c2ccccc12)Nc1cccc(Nc2ccncc2)c1,JCKLHFMOFAYQHE-UHFFFAOYSA-N,
7,BRD-K93095519,SJ-172550,MDM4,MDM inhibitor,CCOc1cc(cc(Cl)c1OCC(=O)OC)C=C1C(=O)N(N=C/1C)c1...,RKKFQJXGAQWHBZ-YVLHZVERSA-N,
8,BRD-A16035238,SAR405838,MDM2,MDM inhibitor,CC(C)(C)CC1NC(C(c2cccc(Cl)c2F)C11C(=O)Nc2cc(Cl...,IDKAKZRYYDCJDU-UHFFFAOYSA-N,
9,BRD-K73255294,nutlin-3,MDM2,MDM inhibitor,COc1ccc(C2=N[C@@H]([C@@H](N2C(=O)N2CCNC(=O)C2)...,BDUHCSBCVGXTJM-IZLXSDGUSA-N,


In [18]:
compound_table.cmap_name.unique()

array(['MDM2-inhibitor', 'nutlin-3', 'RITA', 'AMG-232', 'HLI-373',
       'serdemetan', 'SJ-172550', 'SAR405838'], dtype=object)
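
Ten rows collapse to eight unique names because some compounds are registered under more than one `pert_id`. A short pandas sketch for finding those names, using the `pert_id` and `cmap_name` pairs copied from the result above:

```python
import pandas as pd

# pert_id / cmap_name pairs copied from the MDM inhibitor result above.
compounds = pd.DataFrame({
    "pert_id": [
        "BRD-K84987553", "BRD-A12230535", "BRD-K00317371", "BRD-K64925568",
        "BRD-K17349619", "BRD-K65924316", "BRD-K60219430", "BRD-K93095519",
        "BRD-A16035238", "BRD-K73255294",
    ],
    "cmap_name": [
        "MDM2-inhibitor", "nutlin-3", "RITA", "AMG-232", "HLI-373",
        "serdemetan", "serdemetan", "SJ-172550", "SAR405838", "nutlin-3",
    ],
})

# Count distinct pert_ids per name and keep names with more than one.
ids_per_name = compounds.groupby("cmap_name")["pert_id"].nunique()
multi_id_names = sorted(ids_per_name[ids_per_name > 1].index)
```

Here `multi_id_names` contains `nutlin-3` and `serdemetan`, whose rows in the table above differ in their SMILES strings and InChIKeys.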

### cmap_profiles

**Query per-sample metadata (instinfo), which corresponds to Level 3 and Level 4 data. Multiple conditions are combined with the AND operator.**

    cmapBQ.query.cmap_profiles(client, sample_id=None, pert_id=None, cmap_name=None, cell_iname=None, build_name=None, return_fields='priority', limit=None, table=None, verbose=False)
    
    Parameters
            client – Bigquery client
            sample_id – list of sample_ids
            pert_id – list of pert_ids
            cmap_name – list of cmap_names
            cell_iname – list of cell names
            build_name – list of builds
            return_fields – [‘priority’, ‘all’]
            limit – Maximum number of rows to return
            table – table to query. This by default points to the instinfo table and normally should not be changed.
            verbose – Print query and table address.
    Returns
        Pandas Dataframe



In [None]:
list_of_sample_ids = [
  ''
]

list_of_cmap_names = [
    'afatinib',
    'dacomitinib', 
    'dovitinib',
    'erlotinib',
    'gefitinib'
]

instinfo_table = cmap_query.cmap_profiles(
    bq_client,
    sample_id=None,
    return_fields='all', 
    cmap_name=list_of_cmap_names 
)

instinfo_table.head(10)

### cmap_sig

**Query level 5 metadata table**

    cmapBQ.query.cmap_sig(client, sig_id=None, pert_id=None, cmap_name=None, cell_iname=None, build_name=None, return_fields='priority', limit=None, table=None, verbose=False)

    Parameters
            client – Bigquery Client
            sig_id – list of sig_ids
            pert_id – list of pert_ids
            cmap_name – list of cmap_name, formerly pert_iname
            cell_iname – list of cell names
            build_name – list of builds
            return_fields – [‘priority’, ‘all’]
            limit – Maximum number of rows to return
            table – table to query. This by default points to the level 5 siginfo table and normally should not be changed.
            verbose – Print query and table address.
    Returns
        Pandas Dataframe



In [None]:
list_of_sig_ids = [
  ''
]

list_of_cmap_names = [
    'afatinib',
    'dacomitinib', 
    'dovitinib',
    'erlotinib',
    'gefitinib'
]


siginfo_table = cmap_query.cmap_sig(
    bq_client,
    sig_id = None, 
    cell_iname = core_cell_lines, 
    cmap_name = list_of_cmap_names,
    return_fields='priority'
)
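
The siginfo table carries the bioactivity metrics `tas` and `cc_q75` described in the table overview, so a common next step is to keep only signatures above some activity threshold. The sketch below is illustrative: the function name, the toy `sig_id` values, and the 0.2 cutoffs are all assumptions, and the real table may need `return_fields='all'` for both columns to be present.

```python
import pandas as pd


def filter_active_signatures(siginfo, min_tas=0.2, min_cc_q75=0.2):
    """Keep rows whose `tas` and `cc_q75` both meet the given minimums."""
    missing = {"tas", "cc_q75"} - set(siginfo.columns)
    if missing:
        raise KeyError(f"siginfo is missing columns: {sorted(missing)}")
    keep = (siginfo["tas"] >= min_tas) & (siginfo["cc_q75"] >= min_cc_q75)
    return siginfo[keep]


# Toy stand-in for a cmap_sig result; ids and values are made up.
toy_siginfo = pd.DataFrame({
    "sig_id": ["SIG_A", "SIG_B", "SIG_C"],
    "tas": [0.5, 0.1, 0.3],
    "cc_q75": [0.4, 0.6, 0.1],
})
active = filter_active_signatures(toy_siginfo)
```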



### cmap_matrix

**Query for numerical data for signature-gene level data.**

    cmapBQ.query.cmap_matrix(client, data_level='level5', feature_space='landmark', rid=None, cid=None, verbose=False, chunk_size=1000, table=None, limit=1000)

    Parameters
            client – Bigquery Client
            data_level – Data level requested. IDs from siginfo file correspond to ‘level5’. Ids from instinfo are available in ‘level3’ and ‘level4’. Choices are [‘level5’, ‘level4’, ‘level3’]
            feature_space – Common feature spaces to extract; ‘rid’ overrides the selection. Choices are [‘landmark’, ‘bing’, ‘aig’]
            rid – Row ids
            cid – Column ids
            verbose – Print query and table address.
            chunk_size – Runs queries in stages to avoid query character limit. Default 1,000
            table – Table address to query. Overrides ‘data_level’ parameter. Generally should not be used.
    Returns
        GCToo object
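
The `chunk_size` parameter exists because a query listing many ids can exceed the query character limit. The idea of splitting an id list into stages can be sketched as follows; this is an illustration of the concept with a hypothetical helper, not the package's internal implementation.

```python
def chunked(ids, chunk_size=1000):
    """Yield successive slices of `ids`, each with at most `chunk_size` items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(ids), chunk_size):
        yield ids[start:start + chunk_size]


# 2,500 placeholder ids split into stages of at most 1,000.
example_ids = [f"id_{i}" for i in range(2500)]
stage_sizes = [len(chunk) for chunk in chunked(example_ids, chunk_size=1000)]
```

Each stage would then be queried separately and the partial results combined.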



In [None]:
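# Level 5 columns are signatures, so the column ids come from siginfo_table.
list_of_sig_ids = list(siginfo_table.sample(10)['sig_id'])
# Sample ids from instinfo_table apply to 'level3' and 'level4' instead.
list_of_sample_ids = list(instinfo_table.sample(10)['sample_id'])

data = cmap_query.cmap_matrix(
    bq_client,
    cid=list_of_sig_ids,
    rid=None,
    feature_space='landmark',
    data_level='level5'
)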

data.data_df