First, I really enjoyed reading the paper. Amazing work!
I have a question about the number of building blocks used to generate small molecules. Appendix A.3 of the paper states that there are 105 unique building blocks in total (after accounting for different attachment points) and that they were obtained by the process suggested in the JT-VAE paper (Jin et al., 2020). However, the JT-VAE paper reports a vocabulary size of $|\chi|=780$, obtained from the same ZINC dataset. My understanding is that the two vocabularies are built the same way. If that is correct, why do the numbers of building blocks differ? What am I missing? If they are not the same, could you please explain the difference?
Thank you so much for your help!
The building blocks in the two papers are not the same, but they are quite similar. In both cases we represent molecules as junction trees, which means there are no cycles in the graph connecting the building blocks. Ours are obtained by BRICS decomposition followed by Bemis-Murcko decomposition. Finally, a chemist curated our set of building blocks. In the end, I think our building blocks turned out slightly smaller and more rigid than those of JT-VAE, and they worked better for us in practice.
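As a rough illustration of the "no cycles" constraint, the sketch below checks whether a set of building blocks joined by bonds forms a tree: n blocks need exactly n - 1 connecting bonds, and no bond may close a loop. It uses a plain union-find over block indices. The function name and the (block, block) bond representation are hypothetical and are not taken from the repository.

```python
def is_junction_tree(num_blocks: int, bonds: list[tuple[int, int]]) -> bool:
    """Return True if the blocks joined by `bonds` form a single tree.

    Hypothetical helper: blocks are numbered 0..num_blocks-1 and each bond
    is a pair of block indices.
    """
    # A tree on n nodes has exactly n - 1 edges.
    if num_blocks < 1 or len(bonds) != num_blocks - 1:
        return False

    parent = list(range(num_blocks))

    def find(x: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # this bond would close a cycle between blocks
        parent[root_a] = root_b

    # n - 1 edges and no cycle implies the graph is connected.
    return True


print(is_junction_tree(3, [(0, 1), (1, 2)]))          # True
print(is_junction_tree(3, [(0, 1), (1, 2), (2, 0)]))  # False
```

Note that this only constrains how blocks connect to each other; an individual block may still contain a ring.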
Thank you. After performing BRICS decomposition followed by Bemis-Murcko decomposition on the 250k-SMILES dataset, I get 8962 unique building blocks. Could you say a bit more about the curation process? How did you narrow this down to 105 building blocks?
Also, how did you determine the attachment points (block_r in data/blocks_PDB_105.json)?
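To make the phrase "105 unique building blocks (after accounting for different attachment points)" concrete, here is a minimal stdlib sketch that counts a vocabulary both ways: by distinct fragment SMILES, and by distinct (SMILES, attachment points) pairs. It assumes each block is a (SMILES, attachment-index list) pair, loosely mirroring the `block_r` field mentioned above. The function name and toy data are hypothetical; the real file's layout and the authors' counting rule are not confirmed here.

```python
from collections.abc import Iterable, Sequence


def count_unique_blocks(blocks: Iterable[tuple[str, Sequence[int]]]) -> tuple[int, int]:
    """Count distinct fragments and distinct (fragment, attachment points) pairs.

    Assumes each SMILES string is already canonical, so equal molecules
    compare equal as strings, and that the order of attachment indices
    does not matter (they are sorted before comparison).
    """
    unique_smiles: set[str] = set()
    unique_blocks: set[tuple[str, tuple[int, ...]]] = set()
    for smi, attach in blocks:
        unique_smiles.add(smi)
        # The same fragment with a different attachment set counts as a new block.
        unique_blocks.add((smi, tuple(sorted(attach))))
    return len(unique_smiles), len(unique_blocks)


# Hypothetical toy vocabulary, not taken from blocks_PDB_105.json.
toy_blocks = [
    ("C", [0]),
    ("C", [0, 0]),          # carbon with one vs. two attachment points
    ("c1ccccc1", [0]),
    ("c1ccccc1", [0, 3]),
    ("c1ccccc1", [3, 0]),   # same attachment set as the previous entry
]
print(count_unique_blocks(toy_blocks))  # (2, 4)
```

Under these assumptions, a vocabulary with only a few dozen distinct fragments can still yield a larger block count once each attachment pattern is listed separately.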