## Prepare your data set (SMILES)

In [1]:
# Cyclosporine, Number of substructures: 11
# %PPB: 90%
pep1 = "CC[C@H]1C(=O)N(CC(=O)N([C@H](C(=O)N[C@H](C(=O)N([C@H](C(=O)N[C@H](C(=O)N[C@@H](C(=O)N([C@H](C(=O)N([C@H](C(=O)N([C@H](C(=O)N([C@H](C(=O)N1)[C@@H]([C@H](C)C/C=C/C)O)C)C(C)C)C)CC(C)C)C)CC(C)C)C)C)C)CC(C)C)C)C(C)C)CC(C)C)C)C"

# Octreotide, Number of substructures: 6 
# %PPB: 65%
pep2 = "C[C@H]([C@H]1C(=O)N[C@@H](CSSC[C@@H](C(=O)N[C@H](C(=O)N[C@@H](C(=O)N[C@H](C(=O)N1)CCCCN)CC2=CNC3=CC=CC=C32)CC4=CC=CC=C4)NC(=O)[C@@H](CC5=CC=CC=C5)N)C(=O)N[C@H](CO)[C@@H](C)O)O"

smiles_list = [pep1, pep2]

## Calculate the predicted %PPB value
+ We are unable to disclose the trained weights of the prediction model.
+ **You need to supply the model's trained weights file yourself, for example "model_weight/model.npz".**
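
Because the weights are not distributed, a missing file is a likely first failure. The following is a minimal sketch of a check that could run before prediction. `require_weights` is a hypothetical helper written for illustration and is not part of CycPeptPPB.

```python
import os


def require_weights(weight_path):
    """Return the absolute path of the weights file, or raise if it is missing.

    Hypothetical helper for illustration; not part of CycPeptPPB.
    """
    if not os.path.isfile(weight_path):
        raise FileNotFoundError(
            f"Trained weights not found at {os.path.abspath(weight_path)!r}; "
            "place your own weights file there before predicting."
        )
    return os.path.abspath(weight_path)
```

Relative paths are resolved against the current working directory. This notebook changes into `src` before predicting, so "model_weight/model.npz" is looked up inside `src`.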

In the paper, we used descriptors computed with MOE. Because MOE is commercial software, CycPeptPPB instead uses descriptors computed with RDKit.

Therefore, the predicted values for the two example compounds shown here differ from the values reported in the paper.

You can change the variables ***use_augmentation*** and ***use_CyclicConv*** to specify the model to use:
+ *use_augmentation*=False, *use_CyclicConv*=False → Baseline model (1DCNN)  
+ *use_augmentation*=False, *use_CyclicConv*=True  → CycPeptPPB model 1 (CyclicConv)
+ *use_augmentation*=True, *use_CyclicConv*=False  → CycPeptPPB model 2 (Augmentated 1DCNN)
+ *use_augmentation*=True, *use_CyclicConv*=True   → CycPeptPPB model 3 (Augmentated CyclicConv)
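
The four combinations above can be sketched as a lookup table keyed on the two flags. `model_name` is a hypothetical helper for illustration only. The labels are copied from the list above, and how `get_output.predict` selects the model internally is not shown here.

```python
def model_name(use_augmentation, use_CyclicConv):
    """Map the two flags to the model label (hypothetical helper, illustration only)."""
    names = {
        (False, False): "Baseline model (1DCNN)",
        (False, True): "CycPeptPPB model 1 (CyclicConv)",
        (True, False): "CycPeptPPB model 2 (Augmentated 1DCNN)",
        (True, True): "CycPeptPPB model 3 (Augmentated CyclicConv)",
    }
    return names[(bool(use_augmentation), bool(use_CyclicConv))]


print(model_name(True, False))  # prints: CycPeptPPB model 2 (Augmentated 1DCNN)
```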

In [2]:
import os
os.chdir('src')

import draw_saliency_2Dmol
import get_output

In [3]:
use_augmentation = True
use_CyclicConv = False
weight_path = "model_weight/model.npz"

pred_PPB = get_output.predict(smiles_list, use_augmentation=use_augmentation, use_CyclicConv=use_CyclicConv, weight_path=weight_path)[0]


The prediction model you have selected is: CycPeptPPB model 2 (Augmentated 1DCNN).




In [4]:
print(pred_PPB)

[94.36, 60.99]
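
To put these numbers in context, the predictions can be compared with the %PPB values noted in the comments of the input cell (90% for cyclosporine, 65% for octreotide). The sketch below assumes those comments are reference values. The variable names `reference_PPB`, `names`, and `abs_errors` are introduced here for illustration.

```python
# %PPB values from the comments in the input cell (assumed to be reference values)
reference_PPB = [90.0, 65.0]
# Predictions printed above for CycPeptPPB model 2
pred_PPB = [94.36, 60.99]
names = ["Cyclosporine", "Octreotide"]

# Absolute difference between prediction and reference, in percentage points
abs_errors = [round(abs(p - r), 2) for p, r in zip(pred_PPB, reference_PPB)]

for name, p, r, e in zip(names, pred_PPB, reference_PPB, abs_errors):
    print(f"{name}: predicted {p:.2f}%, reference {r:.2f}%, absolute error {e:.2f}")
```

With only two compounds this is an illustration of the output format, not an assessment of model accuracy.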


## Contributions (Saliency Score) of each substructure to PPB rate prediction
+ The Saliency Score can be calculated only when CyclicConv is not used (Baseline model and CycPeptPPB model 2).

In [6]:
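if not use_CyclicConv:
    from IPython.display import display_pdf
    import cairosvg

    # Number of the saliency figure to display
    i = 1

    path = 'saliency_figures/' + 'input_' + str(i)

    # Convert the SVG saliency figure to a PDF next to it
    cairosvg.svg2pdf(url=path, write_to=path+'.pdf')

    # Render the PDF inline in the notebook
    with open(path+'.pdf',"rb") as f:
        display_pdf(f.read(),raw=True)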