Conditional Generation based on Subgraph #15

Closed
twidatalla opened this issue Mar 1, 2023 · 3 comments

@twidatalla

Hello,

I work in drug discovery and am very interested in applying this model to generate drug molecules that contain a predefined motif, as you demonstrate in Appendix E. Would you be able to share a code example of the node and edge feature masking for motif-preserving generation?

Wouldn't this require retraining the model on molecules with the given motif, so that the noise model preserves the motif during diffusion? I can see how masking (disallowing) transitions for the motif's nodes and edges during denoising, while letting everything else denoise as usual, would produce structures that extend the motif. Or maybe I'm thinking about this wrong and no retraining is needed?

Best,
Talal

@cvignac
Owner

cvignac commented Mar 2, 2023

Hello Talal,

The code is based on an old version of the files, but in sample_zs_given_zt we add something like this:

        if self.cfg.scaffold_extension.use:
            # scaffold extension mask operation
            # Build the scaffold graph from its SMILES string and convert it to dense tensors.
            graph_scaffold = self.graph_from_scaffold(scaffold_smile='C1C=CNC2=CC=CC=C21')
            dense_data_scaffold, node_mask_scaffold = utils.to_dense(graph_scaffold.x, graph_scaffold.edge_index,
                                                                     graph_scaffold.edge_attr, graph_scaffold.batch)
            X_scaffold, E_scaffold = dense_data_scaffold.X, dense_data_scaffold.E
            n_nodes_scaffold = X_scaffold.shape[1]

            # argmax(-1) turns the dense scaffold features into class indices, which then
            # overwrite the first n_nodes_scaffold nodes and the edges between them.
            sampled_s.X[:, :n_nodes_scaffold] = X_scaffold.argmax(-1)
            sampled_s.E[:, :n_nodes_scaffold, :n_nodes_scaffold] = E_scaffold.argmax(-1)
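
A minimal, framework-free sketch of the overwrite step, for readers who want to see it in isolation. It is not the repository's code: it works on a single graph stored as plain Python lists rather than batched tensors, and `clamp_scaffold` and all class indices are hypothetical names and values chosen for illustration. It assumes, as in the snippet above, that the scaffold occupies the first node positions.

```python
import random


def clamp_scaffold(X, E, X_scaffold, E_scaffold):
    """Overwrite the first k nodes and the k x k edge block with the scaffold.

    X: list of n node-class indices; E: n x n list of edge-class indices.
    Returns new lists; the inputs are left untouched.
    """
    k = len(X_scaffold)
    if k > len(X):
        raise ValueError("scaffold has more nodes than the sampled graph")
    X_out = list(X_scaffold) + list(X[k:])
    E_out = [row[:] for row in E]
    for i in range(k):
        for j in range(k):
            E_out[i][j] = E_scaffold[i][j]
    return X_out, E_out


# Toy example: a 5-node sample and a 3-node scaffold (illustrative class indices).
rng = random.Random(0)
n = 5
X = [rng.randrange(4) for _ in range(n)]
E = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        E[i][j] = E[j][i] = rng.randrange(3)

X_scaffold = [1, 1, 2]
E_scaffold = [[0, 1, 0], [1, 0, 2], [0, 2, 0]]
X_new, E_new = clamp_scaffold(X, E, X_scaffold, E_scaffold)
```

Only the scaffold nodes and the edges among them are fixed. Edges between scaffold nodes and the remaining nodes are left as sampled, which is what lets the model attach new atoms to the scaffold. Since the overwrite is applied to the sample at sampling time, the trained model is used as-is.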

It would probably work better if we preserved the motif during diffusion in training, as was done in https://arxiv.org/abs/2210.05274 for 3D point clouds. As you can see in the figures, the results of our method are not great. We wanted to showcase that substructure conditioning is possible, but we didn't spend much time on it.

Another option that does not involve retraining is to adapt the approach proposed in RePaint to graphs:
https://arxiv.org/abs/2201.09865 and http://arxiv.org/abs/2302.01217
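
A rough sketch of what a RePaint-style step could look like for discrete node labels, under assumptions that do not come from this thread: a uniform-transition noise model, a linear `alpha_bars` schedule, and a `dummy_reverse_step` that merely stands in for the trained denoiser. All function names here are hypothetical. The idea shown is that, at each reverse step, the known (scaffold) positions are drawn from the forward process applied to the clean scaffold, while the other positions come from the model.

```python
import random


def noise_known(x0, alpha_bar, num_classes, rng):
    """Sample q(x_t | x_0) assuming uniform transitions: keep each clean
    class with probability alpha_bar, otherwise draw a class uniformly."""
    return [c if rng.random() < alpha_bar else rng.randrange(num_classes)
            for c in x0]


def repaint_step(x_t, x0_known, known_mask, alpha_bar_prev, num_classes,
                 reverse_step, rng):
    """One reverse step t -> t-1 with RePaint-style mixing."""
    x_unknown = reverse_step(x_t)  # model sample for every node
    x_known = noise_known(x0_known, alpha_bar_prev, num_classes, rng)
    # Known (scaffold) positions come from the forward process,
    # all other positions from the model.
    return [k if m else u for k, u, m in zip(x_known, x_unknown, known_mask)]


rng = random.Random(0)
num_classes, n, T = 4, 6, 10
scaffold = [1, 1, 2]  # illustrative class indices
# Padding values at unknown positions are discarded by the mask.
x0_known = scaffold + [0] * (n - len(scaffold))
known_mask = [True] * len(scaffold) + [False] * (n - len(scaffold))
# Assumed schedule: alpha_bar_0 = 1 (clean) down to alpha_bar_T = 0 (pure noise).
alpha_bars = [1.0 - t / T for t in range(T + 1)]


def dummy_reverse_step(x_t):
    """Hypothetical stand-in for the trained denoiser's sampling step."""
    return [rng.randrange(num_classes) for _ in x_t]


x = [rng.randrange(num_classes) for _ in range(n)]  # x_T: pure noise
for t in range(T, 0, -1):
    x = repaint_step(x, x0_known, known_mask, alpha_bars[t - 1],
                     num_classes, dummy_reverse_step, rng)
```

In this sketch the final step uses `alpha_bars[0] = 1`, so the scaffold positions end up exactly equal to the clean scaffold. The same mixing would presumably be applied to the edge matrix, treating an edge as known when both of its endpoints belong to the scaffold. RePaint additionally resamples by jumping back and forth in time to better harmonize the known and generated parts; that is omitted here for brevity.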

Best,
Clement

@twidatalla
Author

Hi Clement,

Thank you for the response. I didn't realize that substructure conditioning wasn't one of your focuses, so thank you for referencing the other projects.

Looking at the script, I can understand how your approach works, so thank you for that as well. I can see why other approaches may be better. If it suits your interests, I would recommend doing more work on this task in the future, because it is quite relevant for drug design and a needed tool. Often we have an idea of the interactions or motifs desired with a target and want to generate compounds from that, or we already have a compound and want to generate different components.

Excellent paper!
Talal

@xinyangATK

xinyangATK commented Aug 27, 2023

Hi Clement,

I have recently been working in drug discovery, especially small-molecule generation, and I found the 'substructure conditioned generation' in Appendix E. Thank you for sharing the script that shows how it works, but I am still having some difficulty reproducing it with DiGress, especially self.graph_from_scaffold. Could you share more detailed instructions or code? That would really help me.

Thanks!
Xinyang
