Questions about the quality of generated molecules using "sample_for_pocket.py" #1

Closed
jack-cadd opened this issue Jun 27, 2024 · 9 comments

Comments

@jack-cadd

jack-cadd commented Jun 27, 2024

Hello, first of all, thank you very much for making such valuable research publicly available. I have some questions about applying molecular generation to new proteins:
I used the "sample_for_pocket.py" script and the "last.ckpt" model you provided to generate molecules for the CDK6 protein 4aua in the test set, using the default.yaml configuration file. However, the generated molecular structures seem to have some issues: most of them consist purely of saturated carbon atoms.
Could you confirm whether the generated molecules are stored in the test_outputs/generated.pt file? If so, are there any additional parameters that need to be considered during sampling?

commands:
python sample_for_pocket.py --protein_path data/test_set/CDK6_HUMAN_1_312_0/4aua_A_rec.pdb --ligand_path data/test_set/CDK6_HUMAN_1_312_0/4aua_A_rec_4aua_4au_lig_it2_tt_docked_7.sdf --ckpt_path checkpoints/last.ckpt --num_samples 100 --batch_size 10
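
The symptom described above (outputs made up purely of saturated carbon atoms) can be counted rather than judged by eye. The following is only a minimal sketch, assuming the generated molecules have already been converted to SMILES strings; how to extract them from test_outputs/generated.pt is not covered in this thread. The helper names are hypothetical, and the check is a plain string heuristic, not a chemistry toolkit call.

```python
import re
from typing import Iterable

# Heuristic (an assumption, not part of the MolCRAFT codebase): a SMILES string
# made only of aliphatic 'C', ring-closure digits, '%' and branch parentheses
# contains no heteroatoms, no double/triple bonds, and no aromatic atoms.
_SATURATED_CARBON_ONLY = re.compile(r"[C0-9()%]+")


def is_saturated_carbon_only(smiles: str) -> bool:
    """Return True if the SMILES looks like a pure saturated hydrocarbon."""
    return _SATURATED_CARBON_ONLY.fullmatch(smiles) is not None


def saturated_carbon_fraction(smiles_list: Iterable[str]) -> float:
    """Fraction of SMILES strings flagged by is_saturated_carbon_only."""
    smiles = list(smiles_list)
    if not smiles:
        return 0.0
    flagged = sum(is_saturated_carbon_only(s) for s in smiles)
    return flagged / len(smiles)
```

A fraction close to 1.0 over the 100 samples would match the problem reported here. SMILES written with explicit single bonds or bracket atoms are not flagged, so the count is a lower bound.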

@jack-cadd jack-cadd closed this as not planned Jun 27, 2024
@jack-cadd jack-cadd reopened this Jun 27, 2024
@Atomu2014
Collaborator

Hi,

I've tested this pocket via our temporary demo page: https://a0d841047576540970.gradio.live/. Everything looks good.

This is the reference:
image

This is one generated:
image

And the metrics:
image

Please try this demo on any pocket of interest, and tell me whether the model generates normal outputs.

I think sample_for_pocket.py contains some bugs, and I'll update it soon.

Thanks,
Yanru

@jack-cadd
Author

Thank you so much, I will try that. I'm looking forward to the update of the sampling script.

@jack-cadd
Author

I have tried this demo on 4AUA, and the results look very nice (especially O=C1NC=C2C=CC=CC2=C1NC3=CC(CNC(C4CSC(C(N)=[NH2+])C4)=O)=CC=C3; I can't paste a picture). If the output poses are generated by the model directly rather than by docking, that suggests the model captures the key interactions well.
Another small question: is the checkpoint in the demo the same as the public one (last.ckpt)?

Thanks,
Chenglong

@Atomu2014
Collaborator


Yes, the output pose is directly generated without redocking. As you can see in our paper, our model achieves the best Vina Score without redocking. We believe that Vina Score can better reflect the quality of generative models.

Yes, exactly the same checkpoint.

Thanks,
Yanru

@qky18
Contributor

qky18 commented Jun 28, 2024

Hi Chenglong, thanks again for experimenting with our codebase!
Another quick fix might be to use the train_bfn.py script instead of sample_for_pocket.py, since we generated all our samples with train_bfn.py.
All you need to do is (1) slightly modify line 70 of train_bfn.py (https://github.com/AlgoMole/MolCRAFT/blob/master/train_bfn.py#L70) to:

debug_set_val = torch.utils.data.Subset(val_set, [87] * 100)
# here 100 is the number of samples, so --num_samples should be 1
# alternatively, set the list to [87] and pass --num_samples 100

since CDK6_HUMAN_1_312_0/4aua_A_rec_4aua_4au_lig_it2_tt_docked_7_pocket10.pdb corresponds to pocket id = 87 in the test set. (You can check the molecule property _Name to see if it matches CDK6_HUMAN_1_312_0/4aua_A_rec_4aua_4au_lig_it2_tt_docked_7.sdf.)

And (2) run train_bfn.py with debug mode turned on, for example:

python train_bfn.py --test_only --debug --num_samples 1 --batch_size 10 --no_wandb --ckpt_path ./checkpoints/last.ckpt
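
The index lookup in step (1) can also be done programmatically instead of hard-coding 87. The sketch below is an illustration only: it assumes you have already collected the `_Name` property of each test-set molecule into a plain list of strings (how to do that depends on the dataset class and is not shown in this thread), and both helper names are hypothetical.

```python
from typing import List, Sequence


def find_pocket_index(names: Sequence[str], target: str) -> int:
    """Return the position of the first name that ends with `target`.

    `names` is assumed to hold one ligand file name per test-set entry,
    in dataset order (hypothetical input, see the lead-in).
    """
    for idx, name in enumerate(names):
        if name.endswith(target):
            return idx
    raise ValueError(f"no entry ends with {target!r}")


def repeated_indices(idx: int, copies: int) -> List[int]:
    """Build an index list that selects the same entry `copies` times."""
    return [idx] * copies
```

The list returned by `repeated_indices(idx, 100)` plays the same role as `[87] * 100` in the snippet above and would be passed as the second argument to `torch.utils.data.Subset`.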

@Atomu2014
Collaborator


Hi, I've tested train_bfn.py, and it works well. You can follow qky18's comment and try train_bfn.py until I fix sample_for_pocket.py.

Thanks,
Yanru

@jack-cadd
Author

Oh, thanks for your solutions, @qky18 @Atomu2014. I will try that.

@Atomu2014
Collaborator


Hi, sample_for_pocket_v2.py and app.py have been updated. Follow the instructions in the README to see how to sample and how to host our demo locally.

Thanks,
Yanru

@jack-cadd
Author

jack-cadd commented Jul 4, 2024


Thank you, Yanru. I noticed this update and have applied it to my own target. I think this work is a breakthrough in the area of pocket-based molecular generation. Congratulations!!
