## The 12 tasks in the Tox21 dataset and their implications for cells

[Link to the Dataset](https://paperswithcode.com/dataset/tox21-1)

**NR-AR:** This task measures the ability of a chemical to activate the androgen receptor, a nuclear hormone receptor involved in regulating male sexual development.

**NR-AR-LBD:** This task measures the ability of a chemical to activate the ligand-binding domain of the androgen receptor.

**NR-AhR:** This task measures the ability of a chemical to activate the aryl hydrocarbon receptor, a ligand-activated transcription factor involved in regulating the metabolism of xenobiotics.

**NR-Aromatase:** This task measures the ability of a chemical to inhibit aromatase, an enzyme involved in the biosynthesis of estrogen.

**NR-ER:** This task measures the ability of a chemical to activate the estrogen receptor, a nuclear hormone receptor involved in regulating female sexual development.

**NR-ER-LBD:** This task measures the ability of a chemical to activate the ligand-binding domain of the estrogen receptor.

**NR-PPAR-gamma:** This task measures the ability of a chemical to activate peroxisome proliferator-activated receptor gamma, a nuclear hormone receptor involved in regulating glucose and lipid metabolism.

**SR-ARE:** This task measures the ability of a chemical to activate the antioxidant response element (ARE) pathway, in which a DNA regulatory element controls genes that respond to cellular oxidative stress.

**SR-ATAD5:** This task measures the ability of a chemical to induce genotoxic stress, detected through ATAD5, a protein involved in DNA repair and cell cycle control.

**SR-HSE:** This task measures the ability of a chemical to activate the heat shock response element (HSE) pathway, in which a DNA regulatory element controls genes that respond to cellular stress.

**SR-MMP:** This task measures the ability of a chemical to disrupt the mitochondrial membrane potential (MMP), an indicator of mitochondrial function and cellular energy production.

**SR-p53:** This task measures the ability of a chemical to activate the p53 tumor suppressor protein, a transcription factor involved in regulating cell cycle arrest and apoptosis.
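
The prefix of each task name identifies its assay panel: `NR-` for nuclear receptor signalling and `SR-` for stress response. A minimal sketch of splitting the twelve names by panel follows; the helper name `split_by_panel` is made up for this illustration and is not part of DeepChem.

```python
# The 12 Tox21 task names as listed above.
TOX21_TASKS = [
    "NR-AR", "NR-AR-LBD", "NR-AhR", "NR-Aromatase", "NR-ER", "NR-ER-LBD",
    "NR-PPAR-gamma", "SR-ARE", "SR-ATAD5", "SR-HSE", "SR-MMP", "SR-p53",
]

def split_by_panel(tasks):
    """Group task names by their panel prefix (the text before the first hyphen)."""
    panels = {}
    for task in tasks:
        prefix = task.split("-", 1)[0]
        panels.setdefault(prefix, []).append(task)
    return panels

panels = split_by_panel(TOX21_TASKS)
print(len(panels["NR"]), "nuclear receptor tasks")
print(len(panels["SR"]), "stress response tasks")
```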


## Installing dependencies

In [None]:
!pip install deepchem

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting deepchem
  Downloading deepchem-2.7.1-py3-none-any.whl (693 kB)
Collecting scipy<1.9
  Downloading scipy-1.8.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (41.6 MB)
Collecting rdkit
  Downloading rdkit-2022.9.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (29.4 MB)
Installing collected packages: scipy, rdkit, deepchem
  Attempting uninstall: scipy
    Found existing installation: scipy 1.10.1
    Uninstalling scipy-1.10.1:
      Successfully uninstalled scipy-1.10.1
Successfully installed deepchem

In [None]:
!pip install pubchempy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pubchempy
  Downloading PubChemPy-1.0.4.tar.gz (29 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pubchempy
  Building wheel for pubchempy (setup.py) ... done
  Created wheel for pubchempy: filename=PubChemPy-1.0.4-py3-none-any.whl size=13834 sha256=4421c83ce7f092d5de23ec2fed21ddee1f57bf61e1b430b664740077a0010da5
  Stored in directory: /root/.cache/pip/wheels/b0/8c/ba/3b00b89931153bf5a4eaa8e73bd1b0319a879cc45175326854
Successfully built pubchempy
Installing collected packages: pubchempy
Successfully installed pubchempy-1.0.4


## Training DeepChem on Tox21

In [None]:
import deepchem as dc

tox21_tasks, tox21_datasets, transformers = dc.molnet.load_tox21(featurizer='GraphConv')
train_dataset, valid_dataset, test_dataset = tox21_datasets

model = dc.models.GraphConvModel(n_tasks=len(tox21_tasks), mode='classification', dropout=0.2)



In [None]:
print('Number of training samples:', len(train_dataset))
print('Number of test samples:', len(test_dataset))
print('Number of features:', train_dataset.get_data_shape())
print('Number of tasks:', len(tox21_tasks))
print('Task names:', tox21_tasks)
print('Toxicity distribution in the training set:', train_dataset.y.mean(axis=0))
print('Toxicity distribution in the test set:', test_dataset.y.mean(axis=0))

Number of training samples: 6264
Number of test samples: 784
Number of features: ()
Number of tasks: 12
Task names: ['NR-AR', 'NR-AR-LBD', 'NR-AhR', 'NR-Aromatase', 'NR-ER', 'NR-ER-LBD', 'NR-PPAR-gamma', 'SR-ARE', 'SR-ATAD5', 'SR-HSE', 'SR-MMP', 'SR-p53']
Toxicity distribution in the training set: [0.04007024 0.03081098 0.09402937 0.03320562 0.10344828 0.04789272
 0.0210728  0.11462324 0.03128991 0.04485951 0.11350575 0.0440613 ]
Toxicity distribution in the test set: [0.03443878 0.02423469 0.11734694 0.05994898 0.08928571 0.02678571
 0.02806122 0.1505102  0.04209184 0.05994898 0.12244898 0.09183673]
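
The means printed above are all at or below roughly 15%, so every task is heavily imbalanced. One common way to quantify this is the number of inactive samples per active sample. The sketch below recomputes that ratio from the training-set values, copied and rounded from the output; it is an illustration and not part of the original pipeline.

```python
# Training-set means of y per task, copied from the output above and rounded.
# Assumption: these are treated as approximate positive rates; entries with
# missing labels may also count as 0 in y, which would lower the means.
train_positive_rate = {
    "NR-AR": 0.0401, "NR-AR-LBD": 0.0308, "NR-AhR": 0.0940,
    "NR-Aromatase": 0.0332, "NR-ER": 0.1034, "NR-ER-LBD": 0.0479,
    "NR-PPAR-gamma": 0.0211, "SR-ARE": 0.1146, "SR-ATAD5": 0.0313,
    "SR-HSE": 0.0449, "SR-MMP": 0.1135, "SR-p53": 0.0441,
}

def negatives_per_positive(rate):
    """Return how many inactive samples there are for each active one."""
    return (1.0 - rate) / rate

imbalance = {task: negatives_per_positive(r) for task, r in train_positive_rate.items()}
most_imbalanced = max(imbalance, key=imbalance.get)

# Print the tasks from most to least imbalanced.
for task, ratio in sorted(imbalance.items(), key=lambda kv: -kv[1]):
    print(f"{task:14s} {ratio:5.1f} inactive per active")
```

Ratios like these are one reason ROC AUC is preferred over plain accuracy for this dataset: predicting "inactive" for every compound would already be right most of the time.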


In [None]:
model.fit(train_dataset, nb_epoch=50)

0.60320068359375

In [None]:
metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
print('Test set ROC AUC:', model.evaluate(test_dataset, [metric], transformers)['roc_auc_score'])

Test set ROC AUC: 0.6982155567135009
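
ROC AUC can be read as the probability that a randomly chosen active compound receives a higher score than a randomly chosen inactive one. The sketch below computes it that way in plain Python for a single task, using toy labels and scores made up for illustration. It is not how DeepChem or scikit-learn implement the metric internally.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly.

    Ties between an active and an inactive score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one sample of each class")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Toy example: two actives and three inactives with made-up scores.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

A value of 0.5 corresponds to random ranking, which puts the roughly 0.70 reported above in context.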


## Testing on a single ingredient

In [None]:
import pubchempy as pcp
import numpy as np
import pandas as pd
from rdkit import Chem

In [None]:
ingredient_name = 'water'
result = pcp.get_compounds(ingredient_name, 'name')[0]
compound = Chem.MolFromSmiles(result.canonical_smiles)
print('CID:', result.cid)
print('Name:', result.iupac_name)
print('SMILES:', result.canonical_smiles)

CID: 962
Name: oxidane
SMILES: O


In [None]:
molecule = Chem.MolFromSmiles(result.canonical_smiles)
featurizer = dc.feat.graph_features.ConvMolFeaturizer()
features = featurizer([molecule])
toxicity = model.predict_on_batch(np.array(features))

In [None]:
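# The last axis holds (inactive, active) class probabilities, so index 0 selects the
# inactive class: this prints the mean inactive-class probability across the 12 tasks.
print("Mean inactive-class probability (%):", toxicity.mean(axis=1)[0][0] * 100)

Mean inactive-class probability (%): 58.28824043273926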
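
For a classification `GraphConvModel`, `predict_on_batch` returns an array of shape `(n_molecules, n_tasks, n_classes)`, where class index 0 is the inactive label and class index 1 is the active label. The sketch below uses a small made-up array in place of real model output to show how to pull out the active-class probabilities before averaging over tasks.

```python
import numpy as np

# Stand-in for model.predict_on_batch(...): 1 molecule, 3 tasks, 2 classes.
# The numbers are invented; each (inactive, active) pair sums to 1.
preds = np.array([[[0.9, 0.1],
                   [0.6, 0.4],
                   [0.3, 0.7]]])

active_prob = preds[:, :, 1]                 # shape (1, 3): P(active) per task
mean_active = active_prob.mean(axis=1)       # shape (1,): average over tasks
mean_inactive = preds[:, :, 0].mean(axis=1)  # the quantity obtained with index 0

print("Mean active probability (%):", mean_active[0] * 100)
print("Mean inactive probability (%):", mean_inactive[0] * 100)
```

The two means always sum to 1, so a high value taken from index 0 indicates low predicted activity.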


## Experimentation

### References:
1. Huang, R., Xia, M., Nguyen, D.T., Zhao, T., Sakamuru, S., Zhao, J., Shahane, S.A., Rossoshek, A., Zhu, H., Austin, C.P., et al. (2016). Tox21 Challenge to Build Predictive Models of Nuclear Receptor and Stress Response Pathways as Mediated by Exposure to Environmental Chemicals and Drugs. Front Environ Sci 3, 85. https://doi.org/10.3389/fenvs.2015.00085

2. Ruuskanen, J., & Michelsen, O. (2020). Machine learning in ecotoxicology: predictions of acute toxicity towards aquatic organisms. Environmental Science: Processes & Impacts, 22(9), 1929-1943. https://doi.org/10.1039/D0EM00207F

3. Fourches, D., Muratov, E., Tropsha, A. (2015). Trust, but verify: On the importance of chemical structure curation in cheminformatics and QSAR modeling research. Journal of Chemical Information and Modeling, 55(10), 1894-1901. https://doi.org/10.1021/acs.jcim.5b00227

### Aquatic
NR-Aromatase, NR-ER, NR-ER-LBD, and NR-PPAR-gamma are endocrine-related targets that are known to be important for the endocrine system of aquatic organisms. SR-ARE reports activation of the antioxidant response element pathway, which is important for the response of organisms to oxidative stress. SR-ATAD5 reports on DNA damage and repair, which is important for maintaining genomic stability in aquatic organisms. SR-HSE reports activation of the heat shock response, which is important for the response of organisms to thermal stress. SR-MMP reports disruption of the mitochondrial membrane potential, which is important for cellular energy production in aquatic organisms.

In [None]:
# Names must match the hyphenated entries in tox21_tasks for the index lookup to work.
aquatic_task = ['NR-Aromatase', 'NR-ER', 'NR-ER-LBD', 'NR-PPAR-gamma', 'SR-ARE', 'SR-ATAD5', 'SR-HSE', 'SR-MMP']

### Environment
Tasks related to the androgen receptor (NR-AR and NR-AR-LBD) are excluded since this receptor is primarily involved in regulating physiological functions in humans rather than environmental toxicity.

**NR-Aromatase:** This enzyme is involved in the production of estrogens, which are important hormones in animals.

**NR-ER:** This receptor is activated by estrogens, and it plays a role in a variety of biological processes, including cell growth, differentiation, and apoptosis.

**NR-ER-LBD:** This is the ligand-binding domain of the estrogen receptor, the part that binds estrogens and thereby triggers the receptor's regulation of gene expression.

**NR-PPAR-gamma:** This receptor is activated by fatty acids, and it plays a role in the regulation of metabolism, inflammation, and cell growth.

**SR-ARE:** The antioxidant response element is a DNA regulatory sequence that controls genes involved in the response to oxidative stress. The SR prefix stands for stress response.

**SR-ATAD5:** This protein is involved in DNA repair, and it is important for maintaining genomic stability.

**SR-HSE:** The heat shock response element is a DNA regulatory sequence that controls heat shock genes, and it is important for the response of organisms to thermal stress.

**SR-MMP:** The mitochondrial membrane potential drives cellular energy production, and its loss is a sign of mitochondrial damage.


The disruption of any of these processes may have a negative impact on soil and land. For example, the disruption of the ER pathway may lead to an increase in the growth of weeds and invasive plants. The disruption of the PPAR-gamma pathway may lead to an increase in the production of reactive oxygen species, which can damage soil and plant tissues. The disruption of the ARE pathway may lead to an increase in the susceptibility of plants to stress. The disruption of the ATAD5 pathway may lead to an increase in the mutation rate, which can lead to the development of genetic diseases. The disruption of the HSE pathway may lead to an increase in the susceptibility of plants to heat stress. The disruption of the mitochondrial membrane potential may impair energy production in soil organisms and plant tissues.

In [None]:
# Names must match the hyphenated entries in tox21_tasks for the index lookup to work.
environment_task = ['NR-Aromatase', 'NR-ER', 'NR-ER-LBD', 'NR-PPAR-gamma', 'SR-ARE', 'SR-ATAD5', 'SR-HSE', 'SR-MMP']

### Human
This group adds the androgen receptor tasks (NR-AR and NR-AR-LBD) to the eight tasks used above.

In [None]:
human_task = ['NR-AR', 'NR-AR-LBD', 'NR-Aromatase', 'NR-ER', 'NR-ER-LBD', 'NR-PPAR-gamma', 'SR-ARE', 'SR-ATAD5', 'SR-HSE', 'SR-MMP']

### Testing for 1 ingredient

In [None]:
import deepchem as dc
from deepchem.models import GraphConvModel
import pubchempy as pcp
import numpy as np
import pandas as pd
from rdkit import Chem

In [None]:
ingredient_name = 'sodium lauryl sulphate'
result = pcp.get_compounds(ingredient_name, 'name')[0]
compound = Chem.MolFromSmiles(result.canonical_smiles)

In [None]:
tox21_tasks, tox21_datasets, transformers = dc.molnet.load_tox21(featurizer='GraphConv')
train_dataset, valid_dataset, test_dataset = tox21_datasets
model = GraphConvModel(len(tox21_tasks), mode='classification')
model.fit(train_dataset, nb_epoch=50)



0.28897264480590823

In [None]:
metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
print('Test set ROC AUC:', model.evaluate(test_dataset, [metric], transformers)['roc_auc_score'])

Test set ROC AUC: 0.7121732398362588


In [None]:
def predict_toxicity(compound):
    featurizer = dc.feat.graph_features.ConvMolFeaturizer()
    features = featurizer([compound])
    # Run the model once; the result for one molecule has shape (n_tasks, 2).
    preds = model.predict_on_batch(features)[0]
    environment_preds = preds[environment_task_indices]
    aquatic_preds = preds[aquatic_task_indices]
    human_preds = preds[human_task_indices]
    return environment_preds, aquatic_preds, human_preds

In [None]:
print('Task names:', tox21_tasks)

Task names: ['NR-AR', 'NR-AR-LBD', 'NR-AhR', 'NR-Aromatase', 'NR-ER', 'NR-ER-LBD', 'NR-PPAR-gamma', 'SR-ARE', 'SR-ATAD5', 'SR-HSE', 'SR-MMP', 'SR-p53']


In [None]:
human_task = ['NR-AR', 'NR-AR-LBD', 'NR-Aromatase', 'NR-ER', 'NR-ER-LBD']
environment_task = ['NR-AhR', 'NR-PPAR-gamma', 'SR-ARE', 'SR-ATAD5', 'SR-HSE', 'SR-MMP', 'SR-p53']
aquatic_task = ['NR-Aromatase', 'NR-ER', 'NR-ER-LBD', 'NR-PPAR-gamma', 'SR-ARE', 'SR-ATAD5', 'SR-HSE', 'SR-MMP']

In [None]:
environment_task_indices = [tox21_tasks.index(task) for task in environment_task]
aquatic_task_indices = [tox21_tasks.index(task) for task in aquatic_task]
human_task_indices = [tox21_tasks.index(task) for task in human_task]
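
`list.index` raises `ValueError` on the first missing name, which happens if a group uses underscores (`NR_ER`) while the dataset uses hyphens (`NR-ER`). A small sketch of a lookup that reports all unknown names at once is shown below. The helper `task_indices` is hypothetical and not part of DeepChem.

```python
TOX21_TASKS = [
    "NR-AR", "NR-AR-LBD", "NR-AhR", "NR-Aromatase", "NR-ER", "NR-ER-LBD",
    "NR-PPAR-gamma", "SR-ARE", "SR-ATAD5", "SR-HSE", "SR-MMP", "SR-p53",
]

def task_indices(group, all_tasks):
    """Map task names to their positions in all_tasks, rejecting unknown names."""
    position = {name: i for i, name in enumerate(all_tasks)}
    unknown = [name for name in group if name not in position]
    if unknown:
        raise ValueError(f"Unknown task names: {unknown}")
    return [position[name] for name in group]

human_task = ["NR-AR", "NR-AR-LBD", "NR-Aromatase", "NR-ER", "NR-ER-LBD"]
human_task_indices = task_indices(human_task, TOX21_TASKS)
print(human_task_indices)
```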

In [None]:
environment_preds, aquatic_preds, human_preds = predict_toxicity(compound)
print('Environment toxicity predictions:', environment_preds)
print('Aquatic toxicity predictions:', aquatic_preds)
print('Human toxicity predictions:', human_preds)

Environment toxicity predictions: [[9.7210050e-01 2.7899554e-02]
 [9.9986506e-01 1.3510328e-04]
 [9.8973483e-01 1.0265206e-02]
 [9.9742776e-01 2.5722699e-03]
 [9.8038578e-01 1.9614194e-02]
 [9.9314499e-01 6.8550431e-03]
 [9.9979335e-01 2.0664126e-04]]
Aquatic toxicity predictions: [[9.9078655e-01 9.2133377e-03]
 [9.9559152e-01 4.4085332e-03]
 [9.9901003e-01 9.8994607e-04]
 [9.9986506e-01 1.3510328e-04]
 [9.8973483e-01 1.0265206e-02]
 [9.9742776e-01 2.5722699e-03]
 [9.8038578e-01 1.9614194e-02]
 [9.9314499e-01 6.8550431e-03]]
Human toxicity predictions: [[9.9822330e-01 1.7767993e-03]
 [9.9944508e-01 5.5490050e-04]
 [9.9078655e-01 9.2133377e-03]
 [9.9559152e-01 4.4085332e-03]
 [9.9901003e-01 9.8994607e-04]]


In [None]:
environment_preds.mean(axis=0)

array([0.99035037, 0.00964972], dtype=float32)

In [None]:
human_preds.mean(axis=0)

array([0.9966113, 0.0033887], dtype=float32)

In [None]:
aquatic_preds.mean(axis=0)

array([0.99324334, 0.0067567 ], dtype=float32)

### Testing for a product

In [None]:
import requests

PUBCHEM_HOME = "https://pubchem.ncbi.nlm.nih.gov/"

def get_compound_url(compound_name):
    """Return the PubChem page URL for a compound name, or the PubChem home page on failure."""
    # Look up the PubChem CID for the given compound name via PUG REST.
    url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{compound_name}/cids/JSON"
    response = requests.get(url)
    if response.status_code != 200:
        return PUBCHEM_HOME
    data = response.json()
    if "IdentifierList" not in data:
        return PUBCHEM_HOME
    # Several CIDs can match one name; use the first.
    cid = data["IdentifierList"]["CID"][0]
    return f"https://pubchem.ncbi.nlm.nih.gov/compound/{cid}"

In [None]:
get_compound_url("cetyl alcohol")

'https://pubchem.ncbi.nlm.nih.gov/compound/2682'
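
Ingredient names often contain spaces or parentheses (for example `fragrance (parfum)`), so it is safer to percent-encode the name before placing it in the URL path. A minimal sketch using only the standard library follows. `build_cid_lookup_url` is an illustrative helper name, and the endpoint is the same PUG REST path used above; whether PubChem resolves a given name is a separate question that this sketch does not address.

```python
from urllib.parse import quote

PUG_NAME_ENDPOINT = (
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{name}/cids/JSON"
)

def build_cid_lookup_url(compound_name):
    """Return the PUG REST name-lookup URL with the name percent-encoded."""
    # safe="" encodes every reserved character, including spaces and parentheses.
    return PUG_NAME_ENDPOINT.format(name=quote(compound_name.strip(), safe=""))

print(build_cid_lookup_url("cetyl alcohol"))
print(build_cid_lookup_url("fragrance (parfum)"))
```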

In [None]:
!unzip tox_pred.zip 

Archive:  tox_pred.zip
   creating: content/tox_pred/
  inflating: content/tox_pred/ckpt-4.index  
  inflating: content/tox_pred/ckpt-1.index  
  inflating: content/tox_pred/ckpt-3.index  
  inflating: content/tox_pred/ckpt-3.data-00000-of-00001  
  inflating: content/tox_pred/ckpt-1.data-00000-of-00001  
  inflating: content/tox_pred/ckpt-2.data-00000-of-00001  
  inflating: content/tox_pred/checkpoint  
  inflating: content/tox_pred/ckpt-4.data-00000-of-00001  
  inflating: content/tox_pred/ckpt-2.index  


In [None]:
cetaphil = """Water, Glycerin, Propylene glycol, Glyceryl stearate, Cetyl alcohol, Stearyl alcohol, Dimethicone, Cyclomethicone""".lower()
ings = cetaphil.split(", ")

In [None]:
import deepchem as dc
from deepchem.models import GraphConvModel
import pubchempy as pcp
import numpy as np
import pandas as pd
from rdkit import Chem



In [None]:
# don't run
tox21_tasks, tox21_datasets, transformers = dc.molnet.load_tox21(featurizer='GraphConv')
train_dataset, valid_dataset, test_dataset = tox21_datasets
model = GraphConvModel(len(tox21_tasks), mode='classification', model_dir='tox_pred')
model.fit(train_dataset, nb_epoch=50)



0.2629950332641602

In [None]:
tox21_tasks = ['NR-AR',
 'NR-AR-LBD',
 'NR-AhR',
 'NR-Aromatase',
 'NR-ER',
 'NR-ER-LBD',
 'NR-PPAR-gamma',
 'SR-ARE',
 'SR-ATAD5',
 'SR-HSE',
 'SR-MMP',
 'SR-p53']

In [None]:
### for using later
model = dc.models.GraphConvModel(len(tox21_tasks), mode='classification', model_dir='content/tox_pred')
model.restore()

In [None]:
human_task = ['NR-AR', 'NR-AR-LBD', 'NR-Aromatase', 'NR-ER', 'NR-ER-LBD']
environment_task = ['NR-AhR', 'NR-PPAR-gamma', 'SR-ARE', 'SR-ATAD5', 'SR-HSE', 'SR-MMP', 'SR-p53']
aquatic_task = ['NR-Aromatase', 'NR-ER', 'NR-ER-LBD', 'NR-PPAR-gamma', 'SR-ARE', 'SR-ATAD5', 'SR-HSE', 'SR-MMP']

In [None]:
environment_task_indices = [tox21_tasks.index(task) for task in environment_task]
aquatic_task_indices = [tox21_tasks.index(task) for task in aquatic_task]
human_task_indices = [tox21_tasks.index(task) for task in human_task]

In [None]:
def predict_toxicity(compound):
    featurizer = dc.feat.graph_features.ConvMolFeaturizer()
    features = featurizer([compound])
    composite_preds =  model.predict_on_batch(features)[0]
    environment_preds = composite_preds[environment_task_indices]
    aquatic_preds = composite_preds[aquatic_task_indices]
    human_preds = composite_preds[human_task_indices]
    return environment_preds, aquatic_preds, human_preds, composite_preds

In [None]:
# [ing.lower() for ing in ["Water", "Sodium Laureth Sulfate", "Glycol Distearate", "Cocamidopropyl Betaine", "Sodium Chloride", "Fragrance (Parfum)", "Glycerin", "Dimethicone", "Dimethiconol", "Acrylates/Beheneth-25 Methacrylate Copolymer", "Styrene/Acrylates Copolymer", "Guar Hydroxypropyltrimonium Chloride", "Citric Acid", "Tetrasodium Edta", "Amodimethicone", "Dmdm Hydantoin", "Peg-45M", "Tea-Dodecylbenzenesulfonate, Cocamide Mea", "Lysine Hcl", "Arginine", "Peg-9M", "Cetrimonium Chloride", "Ppg-9", "Propylene Glycol", "Methylchloroisothiazolinone", "Methylisothiazolinone", "Mica", "Titanium Dioxide"]]
list_of_ingredients = set(ings)
compounds = {}
exemptions = set(['water', 'sodium chloride', 'potassium chloride', 'magnesium chloride', 'calcium chloride', 'sodium hydroxide', 'potassium hydroxide', 'ammonium hydroxide', 'hydrochloric acid', 'sulfuric acid', 'nitric acid', 'acetic acid', 'citric acid', 'lactic acid', 'benzoic acid', 'salicylic acid', 'urea', 'glycerin', 'propylene glycol', 'ethanol', 'isopropyl alcohol', 'hexylene glycol', 'butylene glycol', 'propanediol', 'polyethylene glycol (PEG)', 'sorbitol', 'xylitol', 'sucralose', 'saccharin', 'aspartame', 'titanium dioxide', 'iron oxide'])
ingredients = list(list_of_ingredients - exemptions)
exempts = list(list_of_ingredients - set(ingredients))
print(exempts)

['propylene glycol', 'glycerin', 'water']
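
Set difference only matches when the strings are identical, so stray whitespace or capital letters (as in `'polyethylene glycol (PEG)'`) silently prevent an exemption from applying. The sketch below normalises both sides before partitioning. The function names and the shortened exemption list are illustrative only.

```python
def normalise(name):
    """Lowercase and collapse whitespace so equivalent spellings compare equal."""
    return " ".join(name.lower().split())

def partition_ingredients(raw_ingredients, exemptions):
    """Split ingredients into (to_score, exempt) lists, preserving input order."""
    exempt_names = {normalise(e) for e in exemptions}
    to_score, exempt = [], []
    seen = set()
    for raw in raw_ingredients:
        name = normalise(raw)
        if not name or name in seen:
            continue  # skip blanks and duplicates
        seen.add(name)
        (exempt if name in exempt_names else to_score).append(name)
    return to_score, exempt

# Shortened exemption list for the example.
exemptions = ["water", "Glycerin", "propylene glycol"]
product = "Water, Glycerin,  Propylene glycol, Cetyl alcohol, Dimethicone"
to_score, exempt = partition_ingredients(product.split(","), exemptions)
print(to_score)
print(exempt)
```

Unlike a set, the two lists keep the label order, which makes the later summary easier to compare with the ingredient list on the packaging.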


In [None]:
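for ingredient_name in ingredients:
  try:
    # Take the first PubChem match for the ingredient name.
    result = pcp.get_compounds(ingredient_name, 'name')[0]
    compound = Chem.MolFromSmiles(result.canonical_smiles)
    if compound is None:
      raise ValueError("RDKit could not parse the SMILES string")
    compounds[compound] = ingredient_name
  except Exception as exc:
    # No PubChem match, a network error, or an unparsable structure: skip this ingredient.
    print("Left:", ingredient_name, f"({exc})")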

In [None]:
compounds

{<rdkit.Chem.rdchem.Mol at 0x7f8f1c1d1660>: 'stearyl alcohol',
 <rdkit.Chem.rdchem.Mol at 0x7f8f1c1d1430>: 'cyclomethicone',
 <rdkit.Chem.rdchem.Mol at 0x7f8f1c1d1580>: 'dimethicone',
 <rdkit.Chem.rdchem.Mol at 0x7f8f1c1d14a0>: 'glyceryl stearate',
 <rdkit.Chem.rdchem.Mol at 0x7f8f1c1d13c0>: 'cetyl alcohol'}

In [None]:
summary = {}
overall = 0
ingredient_count = len(compounds)
aqua_tot, hum_tot, env_tot = 0, 0, 0
for compound, name in compounds.items():
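  environment_preds, aquatic_preds, human_preds, composite_preds = predict_toxicity(compound)
  # Each *_preds array has shape (n_tasks, 2). After averaging over tasks, index 0 is the
  # mean probability of the inactive class, so higher scores mean less predicted activity.
  env, aqua, hum, comp = environment_preds.mean(axis=0)[0], aquatic_preds.mean(axis=0)[0], human_preds.mean(axis=0)[0], composite_preds.mean(axis=0)[0]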
  overall += comp
  aqua_tot += aqua
  env_tot += env
  #hum_tot += hum
  summary[name.capitalize()] = [aqua, env, get_compound_url(name)]#, hum]
summary["Overall"] = overall / ingredient_count
summary["Aquatic"] = aqua_tot / ingredient_count
#summary["Human"] = hum_tot / ingredient_count
summary["Environment"] = env_tot / ingredient_count

for ing in exempts:
  summary[ing.capitalize()] = [0, 0, get_compound_url(ing)]


In [None]:
from pprint import pprint
print("""Scheme------------
     Name : [Aquatic, Environment, URL]
     Three separate entities: Aquatic, Environment, Overall
     """)
pprint(summary)

Scheme------------
     Name : [Aquatic, Environment, URL]
     Three separate entities: Aquatic, Environment, Overall
     
{'Aquatic': 0.8653411030769348,
 'Cetyl alcohol': [0.78995514,
                   0.8417698,
                   'https://pubchem.ncbi.nlm.nih.gov/compound/2682'],
 'Cyclomethicone': [0.90578717,
                    0.994877,
                    'https://pubchem.ncbi.nlm.nih.gov/compound/10913'],
 'Dimethicone': [0.90537465,
                 0.97879666,
                 'https://pubchem.ncbi.nlm.nih.gov/compound/24764'],
 'Environment': 0.9197543621063232,
 'Glycerin': [0, 0, 'https://pubchem.ncbi.nlm.nih.gov/compound/753'],
 'Glyceryl stearate': [0.9360534,
                       0.9419845,
                       'https://pubchem.ncbi.nlm.nih.gov/compound/24699'],
 'Overall': 0.8996385216712952,
 'Propylene glycol': [0, 0, 'https://pubchem.ncbi.nlm.nih.gov/compound/1030'],
 'Stearyl alcohol': [0.78953516,
                     0.8413439,
                     'http
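
The product-level `Aquatic` value is the plain mean of the per-ingredient aquatic scores over the scored (non-exempt) ingredients. The sketch below recomputes it from the values printed above. Because those scores are taken from class index 0 (the inactive class), `1 - score` gives the corresponding mean active-class probability.

```python
from statistics import mean

# Per-ingredient aquatic scores copied from the printed summary above.
aquatic_scores = {
    "Cetyl alcohol": 0.78995514,
    "Cyclomethicone": 0.90578717,
    "Dimethicone": 0.90537465,
    "Glyceryl stearate": 0.9360534,
    "Stearyl alcohol": 0.78953516,
}

product_aquatic = mean(list(aquatic_scores.values()))
# The scores come from class index 0, so the complement is the active-class mean.
product_aquatic_active = 1.0 - product_aquatic

print(f"Mean inactive-class score: {product_aquatic:.4f}")
print(f"Mean active-class score:   {product_aquatic_active:.4f}")
```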

In [None]:
print(dc.__version__)

2.7.1


In [None]:
!zip -r /content/tox_pred.zip /content/tox_pred

  adding: content/tox_pred/ (stored 0%)
  adding: content/tox_pred/ckpt-4.index (deflated 81%)
  adding: content/tox_pred/ckpt-1.index (deflated 81%)
  adding: content/tox_pred/ckpt-3.index (deflated 81%)
  adding: content/tox_pred/ckpt-3.data-00000-of-00001 (deflated 45%)
  adding: content/tox_pred/ckpt-1.data-00000-of-00001 (deflated 46%)
  adding: content/tox_pred/ckpt-2.data-00000-of-00001 (deflated 45%)
  adding: content/tox_pred/checkpoint (deflated 69%)
  adding: content/tox_pred/ckpt-4.data-00000-of-00001 (deflated 45%)
  adding: content/tox_pred/ckpt-2.index (deflated 81%)


In [None]:
from google.colab import files
files.download("/content/tox_pred.zip")

### Human

In [None]:
tasks, datasets, transformers = dc.molnet.load_hiv(featurizer='GraphConv')



In [None]:
train_dataset, valid_dataset, test_dataset = datasets
model_h = GraphConvModel(len(tasks), mode='classification')
model_h.fit(train_dataset)
metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)
scores = model_h.evaluate(test_dataset, [metric], transformers)
print(scores)



{'mean-roc_auc_score': 0.7344116340601403}


In [None]:
featurizer = dc.feat.graph_features.ConvMolFeaturizer()
cetaphil_on_humans = {}
for compound, name in compounds.items():
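  features = featurizer([compound])
  # `model` is the Tox21 model restored earlier, not the HIV-trained `model_h`.
  # Index 0 after averaging over tasks is the mean inactive-class probability.
  human_skin_pred = model.predict_on_batch(features)[0].mean(axis=0)[0]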
  cetaphil_on_humans[name.capitalize()] = human_skin_pred

In [None]:
pprint(cetaphil_on_humans)

{'Butylparaben': 0.6054374,
 'Cetyl alcohol': 0.8463711,
 'Methylparaben': 0.8001178,
 'Propylparaben': 0.66281444,
 'Sodium lauryl sulfate': 0.94492215,
 'Stearyl alcohol': 0.84578824}
