# Caco-2 (Cell Effective Permeability), Wang et al.

### Dataset Description: The human colon epithelial cancer cell line Caco-2 is used as an in vitro model of human intestinal tissue. The experimentally measured rate at which a drug passes through Caco-2 cells approximates the rate at which the drug permeates human intestinal tissue.

### Task Description: Regression. Given a drug SMILES string, predict the Caco-2 cell effective permeability.

### Dataset Statistics: 906 drugs.

### Metric: MAE (mean absolute error; lower is better)
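
As a minimal sketch of the metric, the mean absolute error averages the absolute differences between measured and predicted permeability values. The function name `mean_absolute_error` and the sample numbers below are illustrative assumptions, not values from the dataset.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical measured vs. predicted permeability values (illustrative only).
measured = [-5.0, -4.5, -6.0, -5.5]
predicted = [-5.25, -4.5, -5.5, -5.0]
mae = mean_absolute_error(measured, predicted)
```

A lower value means the predictions sit closer to the measurements, which is why the leaderboard is sorted in ascending order of MAE.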

## Leaderboard

| Rank | Model                       | Contact        | Link          | #Params   | MAE           |
|------|-----------------------------|----------------|---------------|-----------|---------------|
| 1    | MapLight                    | Jim Notwell    | GitHub, Paper | N/A       | 0.276 ± 0.005 |
| 2    | BaseBoosting                | Andrew Li      | GitHub, Paper | 365,713   | 0.285 ± 0.005 |
| 3    | MolMapNet-D                 | Shen Wan Xiang | GitHub, Paper | 407,617   | 0.287 ± 0.005 |
| 4    | MapLight + GNN              | Jim Notwell    | GitHub, Paper | N/A       | 0.287 ± 0.005 |
| 5    | XGBoost                     | Andrew Li      | GitHub, Paper | 12        | 0.289 ± 0.011 |
| 6    | Basic ML                    | Nilavo Boral   | GitHub, Paper | N/A       | 0.321 ± 0.005 |
| 7    | Chemprop-RDKit              | Kyle Swanson   | GitHub, Paper | N/A       | 0.330 ± 0.024 |
| 8    | Euclia ML model             | Euclia         | GitHub, Paper | 50        | 0.341 ± 0.004 |
| 9    | Chemprop                    | Kyle Swanson   | GitHub, Paper | N/A       | 0.344 ± 0.015 |
| 10   | RDKit2D + MLP (DeepPurpose) | Kexin Huang    | GitHub, Paper | 633,409   | 0.393 ± 0.024 |
| 11   | AttentiveFP                 | Kexin Huang    | GitHub, Paper | 300,806   | 0.401 ± 0.032 |
| 12   | CNN (DeepPurpose)           | Kexin Huang    | GitHub, Paper | 226,625   | 0.446 ± 0.036 |
| 13   | ContextPred                 | Kexin Huang    | GitHub, Paper | 2,067,053 | 0.502 ± 0.036 |
| 14   | NeuralFP                    | Kexin Huang    | GitHub, Paper | 480,193   | 0.530 ± 0.102 |
| 15   | AttrMasking                 | Kexin Huang    | GitHub, Paper | 2,067,053 | 0.546 ± 0.052 |
| 16   | GCN                         | Kexin Huang    | GitHub, Paper | 191,810   | 0.599 ± 0.104 |
| 17   | Morgan + MLP (DeepPurpose)  | Kexin Huang    | GitHub, Paper | 1,477,185 | 0.908 ± 0.060 |
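
Each MAE entry in the table is reported as a mean ± standard deviation. A minimal sketch of how such an entry could be produced from several independent runs follows. The per-seed values are hypothetical, and the use of the population standard deviation is an assumption, since the table does not state which estimator was used.

```python
from statistics import mean, pstdev


def summarize_runs(maes, digits=3):
    """Format per-run MAE values as 'mean ± std', rounded to `digits` places."""
    if not maes:
        raise ValueError("need at least one run")
    return f"{mean(maes):.{digits}f} ± {pstdev(maes):.{digits}f}"


# Hypothetical MAE values from five runs with different random seeds.
per_seed_mae = [0.34, 0.35, 0.36, 0.35, 0.35]
summary = summarize_runs(per_seed_mae)
```

Swapping `pstdev` for `statistics.stdev` would give the sample standard deviation instead, which is slightly larger for a small number of runs.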

In [5]:
import pandas as pd
from deepmol.pipeline import Pipeline

# load the saved pipeline for trial 54 and inspect the attributes of its final (model) step
pipeline = Pipeline.load('caco/trial_54')
pipeline.steps[-1][1].model.__dict__

{'model_dir_is_temp': True,
 'model_dir': '/tmp/tmpmdw6xezf',
 'model': AttentiveFP(
   (model): AttentiveFPPredictor(
     (gnn): AttentiveFPGNN(
       (init_context): GetContext(
         (project_node): Sequential(
           (0): Linear(in_features=33, out_features=100, bias=True)
           (1): LeakyReLU(negative_slope=0.01)
         )
         (project_edge1): Sequential(
           (0): Linear(in_features=44, out_features=100, bias=True)
           (1): LeakyReLU(negative_slope=0.01)
         )
         (project_edge2): Sequential(
           (0): Dropout(p=0.0, inplace=False)
           (1): Linear(in_features=200, out_features=1, bias=True)
           (2): LeakyReLU(negative_slope=0.01)
         )
         (attentive_gru): AttentiveGRU1(
           (edge_transform): Sequential(
             (0): Dropout(p=0.0, inplace=False)
             (1): Linear(in_features=100, out_features=100, bias=True)
           )
           (gru): GRUCell(100, 100)
         )
       )
       (gnn_

In [4]:
pipeline.steps

[('standardizer',
  <deepmol.base.transformer.PassThroughTransformer at 0x7f528362d4b0>),
 ('featurizer',
  <deepmol.compound_featurization.deepchem_featurizers.MolGraphConvFeat at 0x7f528362d990>),
 ('model', <deepmol.models.deepchem_models.DeepChemModel at 0x7f2ee24f2230>)]

In [2]:
# read the per-trial test-set results (the file has no header row)
results = pd.read_csv('caco/tdc_test_set_results.txt', sep=',', header=None, dtype={0: int, 1: float, 2: float})
# name the columns
results.columns = ['trial_id', 'mean', 'std']
results

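|   | trial_id | mean  | std   |
|---|----------|-------|-------|
| 0 | 0        | 0.458 | 0.004 |
| 1 | 1        | 0.572 | 0.031 |
| 2 | 2        | 0.618 | 0.062 |
| 3 | 4        | 0.464 | 0.008 |
| 4 | 5        | 1.365 | 0.214 |
| 5 | 6        | 0.449 | 0.021 |
| 6 | 8        | 0.596 | 0.028 |
| 7 | 9        | 1.085 | 0.051 |
| 8 | 10       | 0.37  | 0.018 |
| 9 | 11       | 0.368 | 0.029 |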


In [3]:
# order results by mean (std in case of tie)
results = results.sort_values(by=['mean', 'std'], ascending=True)
results

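|    | trial_id | mean  | std   |
|----|----------|-------|-------|
| 43 | 77       | 0.348 | 0.004 |
| 48 | 90       | 0.35  | 0.02  |
| 38 | 70       | 0.353 | 0.006 |
| 10 | 12       | 0.36  | 0.013 |
| 17 | 32       | 0.361 | 0.025 |
| 14 | 22       | 0.363 | 0.032 |
| 37 | 66       | 0.368 | 0.009 |
| 9  | 11       | 0.368 | 0.029 |
| 8  | 10       | 0.37  | 0.018 |
| 18 | 33       | 0.381 | 0.014 |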
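
The two-key ordering can be illustrated without pandas: sorting on `(mean, std)` tuples reproduces what `sort_values(by=['mean', 'std'])` does when two trials share the same mean. The rows below are copied from the results shown above; the variable names are illustrative.

```python
# (trial_id, mean, std) rows taken from the results shown above
rows = [
    (11, 0.368, 0.029),
    (10, 0.370, 0.018),
    (66, 0.368, 0.009),
    (77, 0.348, 0.004),
]

# sort by mean first; the std breaks ties between equal means
ranked = sorted(rows, key=lambda row: (row[1], row[2]))
ordered_ids = [trial_id for trial_id, _, _ in ranked]
```

Trials 66 and 11 share a mean of 0.368, so the lower std of trial 66 places it first, matching the order in the sorted output.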


In [4]:
# load the best trial's pipeline (its mean MAE of 0.348 would rank #10 on the leaderboard above)
best_trial_id = int(results.iloc[0]['trial_id'])
pipeline = Pipeline.load(f"caco/trial_{best_trial_id}/")

[21:06:34] Initializing Normalizer
2023-11-11 21:06:34.588336: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2023-11-11 21:06:34.588369: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: JOAOPC
2023-11-11 21:06:34.588387: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: JOAOPC
2023-11-11 21:06:34.588490: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: 525.147.5
2023-11-11 21:06:34.588511: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: 525.147.5
2023-11-11 21:06:34.588516: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:309] kernel version seems to match DSO: 525.147.5
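
The "rank #10" note in the cell above can be checked against the leaderboard with a short sketch that finds where the best trial's mean MAE would fall among the listed entries. The MAE list is copied from the leaderboard table; the helper name `leaderboard_rank` is an assumption made for illustration.

```python
from bisect import bisect_left

# MAE means copied from the leaderboard table, already in ascending order
leaderboard_mae = [
    0.276, 0.285, 0.287, 0.287, 0.289, 0.321, 0.330, 0.341, 0.344,
    0.393, 0.401, 0.446, 0.502, 0.530, 0.546, 0.599, 0.908,
]


def leaderboard_rank(mae, board):
    """1-based position a new MAE would take in an ascending, sorted board."""
    return bisect_left(board, mae) + 1


best_trial_mae = 0.348  # mean MAE of trial 77, the top row after sorting
rank = leaderboard_rank(best_trial_mae, leaderboard_mae)
```

Nine entries have a lower MAE than 0.348, so the trial would slot in between Chemprop (0.344) and RDKit2D + MLP (0.393).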


In [5]:
pipeline.steps

[('standardizer',
  <deepmol.base.transformer.PassThroughTransformer at 0x7f5a71cebfa0>),
 ('padder', <deepmol.base.transformer.DatasetTransformer at 0x7f590dbe3b20>),
 ('model', <deepmol.models.deepchem_models.DeepChemModel at 0x7f59dabe15d0>)]
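
Assuming `pipeline.steps` is a list of `(name, object)` tuples, as the output above suggests, a small helper can give a more readable summary than the default object representations. The helper `describe_steps` and the stand-in classes are hypothetical: they only mimic the structure shown and are not part of DeepMol.

```python
def describe_steps(steps):
    """Return (step name, class name) pairs for a list of (name, object) steps."""
    return [(name, type(obj).__name__) for name, obj in steps]


# Stand-in classes that only mimic the class names in the output above.
class PassThroughTransformer: ...
class DatasetTransformer: ...
class DeepChemModel: ...


demo_steps = [
    ("standardizer", PassThroughTransformer()),
    ("padder", DatasetTransformer()),
    ("model", DeepChemModel()),
]
step_summary = describe_steps(demo_steps)
```

With a loaded pipeline, `describe_steps(pipeline.steps)` would be the equivalent call.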