How to predict DTI of new compounds with trained protein models? #5

Bigrock-dd · 2020-11-23T01:07:23Z

Thanks!

Bigrock-dd · 2020-11-23T01:40:21Z

Another question: you proposed three network options in your article. Where are the files for the other two networks?
Looking forward to your reply!

tuncadogan · 2020-11-29T19:17:37Z

Hi, thank you for your interest, and sorry for the late reply. I cannot see your original question at the moment, but I believe you asked how you can produce predictions with a DEEPScreen model you trained.

The answer: We have not prepared our ready-to-use models yet; however, you can easily do this by modifying the "train_val_test_dict.json" file of the target protein that you are interested in. In this file, there are ChEMBL ids of the compounds that are in the training set, validation set, and the hold-out test set, including their labels as either 0 or 1 (meaning inactive or active).

Let's say you wish to produce predictions for 100 compounds that are not known to interact with the target of interest. Include their IDs in the test dataset split of that target inside the "train_val_test_dict.json" file. To place them in the test dataset (and not in train or validation), add them after the expression "test": [ in the file. Also give them fake labels of 0 or 1, in the same way as the compounds already included in that file; it does not matter which label you give them.
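The edit described above can also be scripted. The following is a minimal sketch, not part of DEEPScreen itself: it assumes that each split in "train_val_test_dict.json" is a list of [compound_id, label] pairs (the thread only confirms the "test" key and the 0/1 labels), and the helper names add_prediction_compounds and update_split_file are hypothetical.

```python
import json
from pathlib import Path


def add_prediction_compounds(splits, compound_ids, fake_label=0):
    """Return a copy of `splits` with new compounds appended to the test split.

    Assumption: each split is a list of [compound_id, label] pairs.
    The label is only a placeholder; it does not affect the predictions.
    """
    updated = {name: list(entries) for name, entries in splits.items()}
    known = {entry[0] for entries in updated.values() for entry in entries}
    for cid in compound_ids:
        if cid not in known:  # skip compounds already present in any split
            updated.setdefault("test", []).append([cid, fake_label])
            known.add(cid)
    return updated


def update_split_file(json_path, compound_ids, fake_label=0):
    """Rewrite the split file in place after adding the new compounds."""
    path = Path(json_path)
    splits = json.loads(path.read_text())
    updated = add_prediction_compounds(splits, compound_ids, fake_label)
    path.write_text(json.dumps(updated))
```

Skipping compounds that already appear in any split is a design choice of this sketch: such compounds already carry a real label, and adding them to the test split again would duplicate them. Keeping a backup of the original file before running update_split_file is advisable.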

After that, train/test the model using the desired hyperparameters. When the process is finished, check the prediction results in the "best_val_test_predictions-....txt" file. There you can see the prediction results of the compounds you added to the test dataset (in the right-most column).

The only downside is that the reported predictive performance of this model will change according to the concordance between your fake labels and the prediction results of these compounds, so do not trust these performance results. If you wish to have a reliable performance calculation for your model, please first train/test the model without the compounds that you wish to produce predictions for. Check the performance results of that original model and, if you are satisfied with them, re-train/test your model, this time with your compound additions.

Two important things:

2D images of your prediction compounds should be in: "target_training_datasets/chembl_id_of_your_target/imgs/"
The compound IDs you added to the "train_val_test_dict.json" file should be the same as the filenames of the image files of these compounds (e.g. compound ID: "CHEMBL123" and image filename: "CHEMBL123.png").
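Both requirements can be checked before starting a training run. The sketch below is an illustration under the naming rule stated above ("<compound id>.png" inside the target's imgs folder); the function name missing_images is hypothetical and not part of the DEEPScreen code.

```python
from pathlib import Path


def missing_images(compound_ids, img_dir):
    """Return the compound IDs that have no '<id>.png' file in `img_dir`."""
    img_dir = Path(img_dir)
    return [cid for cid in compound_ids if not (img_dir / f"{cid}.png").is_file()]
```

For example, calling missing_images(["CHEMBL123"], "target_training_datasets/chembl_id_of_your_target/imgs/") returns an empty list when "CHEMBL123.png" is present in that folder, and ["CHEMBL123"] otherwise.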

Since these compound images are generated with a specific library (RDKit), the same library and parameters should be used to generate the images of your new compounds. If those compounds are already in ChEMBL, you may easily find them in this file containing nearly 409K ChEMBL compounds: https://drive.google.com/file/d/1E7ZpLN_fMdXmPJPP7WH3IPWPceleP_3a/view?usp=sharing

If not, please follow the instructions that we explained in our paper to construct the images of your compounds of interest. Please let me know if you have further questions.

Answer to your second question: We presented those 3 architectures in the first version of the DEEPScreen tool. This is the new and better version (re-coded using modern frameworks and with GPU support). In this one, there is only one CNN architecture. However, its predictive performance is on par with the old ones.

Bigrock-dd · 2020-11-30T01:06:15Z

Thank you very much for your patient answer! Very helpful to me!
In the first version of DEEPScreen, a compressed package file was damaged and the code could not be reproduced. If possible, can you upload the compressed package file named chembl_23_chemreps.txt.zip again? Thanks!

tuncadogan · 2020-11-30T08:43:48Z

No problem at all, glad that it helped.

Yes, we could not retrieve that damaged package, and after that we moved to the new version of DEEPScreen. This means that even if you have all the files, you cannot directly use those pre-trained models (since the optimal threshold information is missing). But you can train your own model by following the instructions in the old version of our repository.

To find these instructions and the file you are looking for, please switch to the master branch in our repository (the new and default version of DEEPScreen is the "PyTorch" branch). In the master branch, you can find the file at this path: "DEEPScreen/trainingFiles/chembl_23_chemreps.txt.zip"

ljh433 · 2020-12-09T13:30:45Z

Hi, I want to construct the images of my compounds of interest. Do you have the code to construct the images?

tuncadogan · 2020-12-10T18:31:04Z

Hi, we currently do not have a ready-to-use module inside the platform for molecule image drawing. However, we are using RDKit for this, as explained here:

https://www.rdkit.org/docs/GettingStartedInPython.html#drawing-molecules

Here is a snippet showing the settings we used for molecule drawing, with a simple example SMILES:

>>> from rdkit import Chem
>>> from rdkit.Chem.Draw import rdMolDraw2D
>>> IMG_SIZE = 200  # image width and height in pixels
>>> smiles = "CCc1nc(N)nc(N)c1-c1ccc(Cl)cc1"
>>> mol = Chem.MolFromSmiles(smiles)
>>> d = rdMolDraw2D.MolDraw2DCairo(IMG_SIZE, IMG_SIZE)
>>> d.drawOptions().bondLineWidth = 1
>>> d.DrawMolecule(mol)
>>> d.FinishDrawing()
>>> d.WriteDrawingText('comp_id_2.png')  # writes the PNG to disk

Also, please find the pre-computed 2-D images of all compounds in ChEMBL v27 here:

https://drive.google.com/file/d/16T8NI1Umf8A0qeLu90Akbx3ic-vdAbUO/view?usp=sharing

and, pre-computed 2-D images of all compounds in DrugBank v5.1.7 here:

https://drive.google.com/file/d/11vSqg1SgX7y25TbX4EzNOjWNkSFVZzek/view?usp=sharing
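For compounds that are not covered by these archives, the drawing settings shown in the snippet above can be wrapped in a small batch helper. This is only a sketch under stated assumptions: the function names image_path and draw_compounds, and the idea of passing a dictionary that maps compound IDs to SMILES, are illustrative and not part of DEEPScreen. The drawing options are limited to the ones shown in this thread (200x200 pixels, bondLineWidth of 1).

```python
from pathlib import Path


def image_path(img_dir, compound_id):
    """Build the image path so that the filename matches the compound ID."""
    return Path(img_dir) / f"{compound_id}.png"


def draw_compounds(smiles_by_id, img_dir, img_size=200):
    """Draw one PNG per compound; return the IDs whose SMILES could not be parsed."""
    # Imported here so the path helper above stays usable without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem.Draw import rdMolDraw2D

    Path(img_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for compound_id, smiles in smiles_by_id.items():
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:  # RDKit returns None for SMILES it cannot parse
            failed.append(compound_id)
            continue
        drawer = rdMolDraw2D.MolDraw2DCairo(img_size, img_size)
        drawer.drawOptions().bondLineWidth = 1
        drawer.DrawMolecule(mol)
        drawer.FinishDrawing()
        drawer.WriteDrawingText(str(image_path(img_dir, compound_id)))
    return failed
```

Naming each file after its compound ID keeps the images consistent with the IDs listed in "train_val_test_dict.json". The returned list makes it easy to spot compounds that need a corrected SMILES before training.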

ljh433 · 2020-12-11T07:38:52Z

Thank you very much for your patient answer!

Bigrock-dd · 2021-01-11T02:52:48Z

Excuse me again. Does this model only require a labeled dataset for each protein, rather than protein information as input?

tuncadogan · 2021-01-11T09:21:56Z

Yes, proteins are used as labels in DEEPScreen, and the actual inputs of the models are the 2-D images of the compounds. We train an individual classifier for each target protein, which allows us to optimize the model parameters specifically for that protein.

Bigrock-dd · 2021-01-13T02:09:46Z

Thank you very much for your patient answer!

tuncadogan closed this as completed Dec 2, 2020

tuncadogan reopened this Dec 2, 2020
