In [None]:
%load_ext autoreload
%autoreload 2
import mpld3
mpld3.enable_notebook()

from package.cc import ChemicalChecker
import os

os.environ['CC_CONFIG'] = 'config.json'
cc_local = ChemicalChecker()

We will start by creating the space objects that will help us connect with the data to create the visualizations:

In [None]:
# Mechanism of Action (B1)
MoA = cc_local.get_signature('char4', 'full', 'B1.001')
# Therapeutic Areas (E1)
ATC = cc_local.get_signature('char4', 'full', 'E1.001')
# Side effects (E3)
side_effects = cc_local.get_signature('char4', 'full', 'E3.001')

These objects allow us to explore the data more easily. Following the naming used in packages such as sklearn, they have a 'fit' method that trains the instance and generates all the data needed for the visualizations. Unfortunately, this process is data-intensive and computationally very expensive, as it involves performing a Fisher's exact test for each feature of each molecule. In a big space such as B4, this means about 2.9 billion tests (631027 molecules by 4635 features), not to mention the memory needed to store the results. For these reasons, the code is designed ad hoc to be run in our HPC facilities.

Nonetheless, it is possible to generate the visualizations using preprocessed data. To generate molecule visualizations, we just need to run the 'predict' method with a query molecule. The query can be given as an InChIKey, a SMILES string or a molecule name. Here we will analyse Atenolol, a beta-blocking agent used to treat high blood pressure and heart-associated chest pain.
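To give a feeling for what each of these tests involves, here is a minimal sketch of a one-sided Fisher's exact test for enrichment, written with the standard library only. It is not the code used by the package: the function name `fisher_exact_greater` and the layout of the 2x2 table (neighbours versus the rest of the space, with versus without the feature) are assumptions made for illustration.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration only):
      a: neighbours with the feature     b: neighbours without it
      c: other molecules with it         d: other molecules without it
    Returns the probability of seeing at least `a` neighbours with the
    feature if neighbours were drawn at random (hypergeometric tail).
    """
    n_neigh = a + b            # size of the neighbourhood
    n_feat = a + c             # molecules carrying the feature
    total = a + b + c + d      # all molecules in the space
    k_max = min(n_neigh, n_feat)
    tail = sum(
        comb(n_feat, k) * comb(total - n_feat, n_neigh - k)
        for k in range(a, k_max + 1)
    )
    return tail / comb(total, n_neigh)

# Number of tests needed for the B4 space mentioned above
n_tests_b4 = 631027 * 4635   # 2,924,810,145 tests

# Toy example: 3 of 4 neighbours have the feature, 1 of 4 other molecules has it
p_value = fisher_exact_greater(3, 1, 1, 3)
```

Even if a single test is cheap, repeating it roughly 2.9 billion times and keeping every result is what makes the 'fit' step impractical outside an HPC environment.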

In [None]:
%matplotlib inline

# Mechanism of action
_, df = MoA.predict('Atenolol')
df

Calling this method generates a dataframe containing all the features enriched within the molecule's neighbourhood, each with a confidence score (a p-value transformed so that it ranges from 0 to 1). It also produces an interactive figure showing the areas where these features are positively enriched, plus the position of the molecule and its neighbours. The legend is interactive, allowing you to choose which areas to display. You can also use the controls below the figure to pan, zoom in and out, and return to the original view. As shown, this molecule lies in a region that is rich in beta-blocking agents like itself. We can also take a look at the therapeutic areas and side effects that are enriched within its neighbourhood.

In [None]:
# Therapeutic areas
_, df = ATC.predict('Atenolol')
df

In [None]:
# Side effects
_, df = side_effects.predict('Atenolol')
df

The tool also allows us to analyse molecules that are not present in the repository. Let's take a look at AMG 337, a highly selective inhibitor of the MET receptor.
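This time the query is given as an InChIKey rather than as a name. As a rough illustration of how an InChIKey can be told apart from a SMILES string or a name, the sketch below checks the fixed layout of a standard InChIKey (14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen and one final letter). The helper `looks_like_inchikey` is hypothetical and is not part of the package, which may resolve queries in a different way.

```python
import re

# Layout of a standard InChIKey: 14 letters, 10 letters, 1 letter
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

def looks_like_inchikey(query):
    """Return True if the whole query has the shape of an InChIKey."""
    return INCHIKEY_RE.fullmatch(query.strip()) is not None

is_key = looks_like_inchikey('WKIALOXOAZXGMG-UHFFFAOYSA-N')   # InChIKey used for AMG 337
is_name = looks_like_inchikey('Atenolol')                     # a molecule name
```

A check like this only validates the format; it does not tell us whether the key corresponds to a molecule in the repository.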

In [None]:
AMG337 = 'WKIALOXOAZXGMG-UHFFFAOYSA-N'

# Mechanism of action
_, df = MoA.predict(AMG337)
df

The MET receptor is, indeed, a hepatocyte growth factor receptor and a tyrosine kinase. Let's take a look at the therapeutic areas and side effects enriched within the surroundings of this molecule.

In [None]:
# Therapeutic areas
_, df = ATC.predict(AMG337)
df

The analysis correctly classifies AMG 337 as a protein kinase inhibitor. Protein kinase inhibitors are classified as antineoplastic and immunomodulating agents according to the ATC classification. We can see that the ATC hierarchy is preserved in the plot, as the area enriched in protein kinase inhibitors is enclosed within that of antineoplastic agents, which in turn belong to the antineoplastic and immunomodulating agents.
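The nesting seen in the plot mirrors the way ATC codes are built: each level of the hierarchy extends the code of its parent, so 'L' (antineoplastic and immunomodulating agents) is a prefix of 'L01' (antineoplastic agents). The sketch below lists the ancestors of a code from its prefixes. The function `atc_ancestry` is hypothetical and assumes plain ATC code strings; the package may store therapeutic areas differently.

```python
# Lengths of the five ATC levels within a full 7-character code
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)

def atc_ancestry(code):
    """Return the ATC levels contained in `code`, from broadest to narrowest."""
    code = code.strip().upper()
    return [code[:n] for n in ATC_LEVEL_LENGTHS if n <= len(code)]

# Atenolol has the ATC code C07AB03
atenolol_levels = atc_ancestry('C07AB03')
# Antineoplastic agents (L01) sit under antineoplastic and immunomodulating agents (L)
antineoplastic_levels = atc_ancestry('L01')
```

An area enriched in a narrower level is therefore expected to fall inside the areas of all its prefixes, which is the pattern observed for AMG 337.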

In [None]:
# Side effects
_, df = side_effects.predict(AMG337)
df