Want to know the detailed preparation process of dataset #18

Open
G1NO3 opened this issue Mar 31, 2024 · 2 comments

Comments


G1NO3 commented Mar 31, 2024

Hi, I'd like to use GFlowNet for a different protein pocket. I have a dataset of SMILES strings and docking scores, but I'm unsure about the remaining dataset preparation steps. For example, if you curated the output of the BRICS algorithm, how did you handle blocks that do not appear in the block dictionary? Also, do you have a script for generating "jbonds" and "stem_idx"? I'd appreciate it if you could share one. Thanks!
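To make the "blocks missing from the dictionary" question concrete, here is a minimal sketch of one possible way to triage a fragmented dataset. It is not the maintainers' procedure, which is not described in this thread. It assumes the fragments have already been computed and canonicalized elsewhere (for example with a BRICS implementation); the function name `split_by_block_coverage` and the toy fragment strings are hypothetical and do not reflect real BRICS output.

```python
from collections import Counter


def split_by_block_coverage(fragmented, block_vocab):
    """Separate molecules whose fragments are all in the block dictionary.

    fragmented:  dict mapping a molecule SMILES to a list of fragment SMILES
                 (assumed to be canonicalized beforehand)
    block_vocab: set of fragment SMILES present in the block dictionary
    Returns (covered, uncovered, missing), where `missing` counts in how many
    molecules each absent fragment occurs.
    """
    covered, uncovered = {}, {}
    missing = Counter()
    for smiles, frags in fragmented.items():
        absent = [f for f in frags if f not in block_vocab]
        if absent:
            uncovered[smiles] = absent
            # Count each absent fragment once per molecule.
            missing.update(set(absent))
        else:
            covered[smiles] = frags
    return covered, uncovered, missing


# Toy data only: these strings are placeholders, not real BRICS fragments.
block_vocab = {"c1ccncc1", "c1ccccc1", "C"}
fragmented = {
    "Cc1ccccc1": ["C", "c1ccccc1"],
    "Cc1ccncc1": ["C", "c1ccncc1"],
    "OC1CCCCC1": ["O", "C1CCCCC1"],
    "Oc1ccccc1": ["O", "c1ccccc1"],
}
covered, uncovered, missing = split_by_block_coverage(fragmented, block_vocab)

# Most frequent missing fragments first: candidates to add to the dictionary
# or to review manually; molecules in `uncovered` could otherwise be dropped.
for frag, count in missing.most_common():
    print(frag, count)
```

Whether to extend the dictionary with frequent missing fragments or to discard the affected molecules is a judgment call that this sketch leaves open.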

Collaborator

bengioe commented Apr 1, 2024

As per @MKorablyov's answer in #9, this involved some manual intervention from a chemist. I'm afraid the details are lost to time, but I'll try digging...

Author

G1NO3 commented Apr 8, 2024

Thanks! I followed the steps in #9 and built a block dictionary myself. My next question is how to determine block_r. It seems to be more than just the "reaction sites" of a block. For example, mols 9 and 10 are both pyridine, but block_r[9] is [0,1,2,4,5] and block_r[10] is [1,0,2,4,5]. Does the order matter?
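The thread does not answer the ordering question. The sketch below only illustrates why order could matter, under an explicitly assumed convention: the first entry of a block's `block_r` list is the atom that bonds to the existing molecule, and the remaining entries become new open stems. The class `ToyBlockMol` and the layout of `stems` and `jbonds` are hypothetical bookkeeping for illustration, not the repository's actual code.

```python
class ToyBlockMol:
    """Toy bookkeeping for a molecule assembled from blocks (illustrative only)."""

    def __init__(self, block_r):
        self.block_r = block_r  # block type -> ordered list of attachment atoms
        self.blockidxs = []     # block type of each placed block
        self.stems = []         # open sites: [placed_block, atom_in_block]
        self.jbonds = []        # bonds: [block_a, block_b, atom_a, atom_b]

    def add_block(self, block_idx, stem_idx=None):
        r = self.block_r[block_idx]
        new = len(self.blockidxs)
        if stem_idx is None:
            if new != 0:
                raise ValueError("stem_idx is required after the first block")
            self.blockidxs.append(block_idx)
            # First block: every listed atom stays open for later growth.
            self.stems.extend([new, atom] for atom in r)
            return
        self.blockidxs.append(block_idx)
        parent_block, parent_atom = self.stems.pop(stem_idx)
        # ASSUMPTION: r[0] is the atom on the new block that forms the bond.
        self.jbonds.append([parent_block, new, parent_atom, r[0]])
        # The remaining attachment atoms of the new block become open stems.
        self.stems.extend([new, atom] for atom in r[1:])


# Values taken from the question above; both entries describe pyridine.
block_r = {9: [0, 1, 2, 4, 5], 10: [1, 0, 2, 4, 5]}

mol_a = ToyBlockMol(block_r)
mol_a.add_block(9)
mol_a.add_block(9, stem_idx=2)   # new pyridine attaches through its atom 0

mol_b = ToyBlockMol(block_r)
mol_b.add_block(9)
mol_b.add_block(10, stem_idx=2)  # new pyridine attaches through its atom 1

print(mol_a.jbonds, mol_b.jbonds)
```

Under this assumed convention, blocks 9 and 10 would differ only in which ring atom connects to the parent, which is one possible reason for listing the same fragment twice with a different first entry.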
