In [1]:
import pandas as pd
import drug2cell as d2c

Load the human targets ChEMBL data frame created in the initial parsing notebook.

In [2]:
original = pd.read_pickle("chembl_30_merged_genesymbols_humans.pkl")

Drug2cell's filtering functions allow for subsetting the pchembl threshold for each category of a column of choice. We'll be using the `target_class` column, and basing our values on https://druggablegenome.net/ProteinFam

In [3]:
#pChEMBL is -log10() as per https://chembl.gitbook.io/chembl-interface-documentation/frequently-asked-questions/chembl-data-questions#what-is-pchembl
thresholds_dict={
    'none':7.53, #30nM
    'NHR':7, #100nM
    'GPCR':7, #100nM
    'Ion Channel':5, #10uM
    'Kinase':6, #1uM
}

We'll add some more criteria to the filtering. For a comprehensive list of available options, consult the documentation.

In [4]:
filtered_df = d2c.chembl.filter_activities(
    dataframe=original,
    drug_max_phase=4,
    assay_type='F',
    add_drug_mechanism=True,
    remove_inactive=True,
    include_active=True,
    pchembl_target_column="target_class",
    pchembl_threshold=thresholds_dict
)
print(filtered_df.shape)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)


(38619, 55)


Now that we have our data frame subset to the drugs and targets of interest, we can convert them into a dictionary that can  be used by drug2cell. The exact form distributed with the package was created like so:

In [5]:
chembldict = d2c.chembl.create_drug_dictionary(
    filtered_df,
    drug_grouping='ATC_level'
)
chembldict['B03']

{'CHEMBL1622|FOLIC ACID': ['HSD17B10'],
 'CHEMBL1963684|PEGINESATIDE ACETATE': ['EPOR'],
 'CHEMBL3707314|METHOXY POLYETHYLENE GLYCOL-EPOETIN BETA': ['EPOR'],
 'CHEMBL3039545|LUSPATERCEPT': ['TGFB1', 'TGFB3', 'TGFB2'],
 'CHEMBL1201566|DARBEPOETIN ALFA': ['EPOR'],
 'CHEMBL2109092|EPOETIN BETA': ['EPOR'],
 'CHEMBL1705709|SODIUM FEREDETATE': ['NFE2L2']}

This results in a nested dictionary structure - a dictionary of categories, holding dictionaries of drugs, holding lists of targets. Drug2cell knows how to operate with this sort of structure as well as its normal groups:targets dictionary, but you need to specify `nested=True` in the scoring/enrichment/overrepresentation functions whenever you pass this structure.