Lists of PDB Ids for training and testing #5

Closed
jadolfbr opened this issue Jun 14, 2022 · 4 comments

Comments

@jadolfbr

jadolfbr commented Jun 14, 2022

Hello,

I am wondering if you have a list of PDB Ids + chains used to train and validate the model. These would be useful for comparisons to other methods and general benchmarking. I could not find these listed in the supplemental.

The list of proteins/chains would be the 25k training clusters, 402 monomer backbones, and then the last set of 690 monomers, 732 homomers (with less than 2000 residues), and 98 heteromers described in the paper.

Would it be possible to provide these? Thanks!

@dauparas
Owner

Hello,

Yes, we are working on making the training/validation/test data for the PDB biounits that were used to train ProteinMPNN publicly available. I will let you know when it is ready!

@jadolfbr
Author

Sounds great! Thanks!

@Z-MU-Z

Z-MU-Z commented Apr 12, 2023

Thanks for the interesting work! I have been waiting for this dataset split for a long time. How is it progressing? Thank you very much.

@dauparas
Owner

See valid and test clusters here: https://github.com/dauparas/ProteinMPNN/tree/main/training
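
For anyone who wants per-chain lists rather than cluster IDs, here is a rough sketch of how cluster lists could be turned into train/valid/test chain lists. The thread does not describe the file layout, so everything about the format is an assumption: a CSV index with one row per chain, a chain ID column and a cluster column (the names `CHAINID` and `CLUSTER` are hypothetical defaults), and plain-text cluster lists with one cluster ID per line. Check the files in the linked directory and adjust accordingly.

```python
import csv
import io


def read_cluster_ids(text):
    """Parse cluster IDs, one per line, ignoring blank lines.

    Assumption: the cluster list is plain text with one ID per line.
    """
    return {line.strip() for line in text.splitlines() if line.strip()}


def split_chains(index_csv_text, valid_clusters, test_clusters,
                 chain_col="CHAINID", cluster_col="CLUSTER"):
    """Assign each chain to train/valid/test by its cluster ID.

    Assumption: the index is a CSV with a header row; the column names
    passed as chain_col and cluster_col are hypothetical defaults.
    Any chain whose cluster is in neither list is treated as training.
    """
    splits = {"train": [], "valid": [], "test": []}
    for row in csv.DictReader(io.StringIO(index_csv_text)):
        cluster = row[cluster_col].strip()
        if cluster in test_clusters:
            name = "test"
        elif cluster in valid_clusters:
            name = "valid"
        else:
            name = "train"
        splits[name].append(row[chain_col].strip())
    return splits


if __name__ == "__main__":
    # Toy data only; these IDs are made up for illustration.
    toy_index = "CHAINID,CLUSTER\n1abc_A,10\n2xyz_B,20\n3def_A,30\n"
    valid = read_cluster_ids("20\n")
    test = read_cluster_ids("30\n")
    print(split_chains(toy_index, valid, test))
```

With real files, the text arguments would come from reading the index and the two cluster lists from disk. Splitting by cluster rather than by individual chain keeps similar sequences on the same side of the split, which matters when comparing against other methods.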
