Is there any alignment files to download? #113

Zhang690683220 · 2022-06-06T19:09:16Z

Hi,

We're trying to reproduce the training process. However, the alignment step seems to take an extremely long time.

We used 128 nodes to align 128 mmCIF files (one file per node), but the entire job took 13 hours to finish.

I'm wondering if there is a tar file of precomputed alignments for all the mmCIF files that we could download. That would help a lot.

Thanks

gahdritz · 2022-06-06T21:42:08Z

There will be in approximately one week, when we release our full training data. Stay tuned.

llwx593 · 2022-07-15T05:20:21Z

Hi,
I'd like to know whether the alignment files for the full training set have been published.
I can't seem to find them.
Thanks

gahdritz · 2022-07-15T13:46:08Z

Yes they have. See the RODA link in the README.

llwx593 · 2022-07-20T08:02:35Z

Hi,
Thank you very much for the training data. I have a few follow-up questions:

RODA contains two directories: one holds the alignments for the PDB dataset, and the other holds the alignments for the uniclust30 self-distillation dataset. Is my understanding correct?
Are the PDB alignments the same as what scripts/precompute_alignments.py would produce from pdb_mmcif/mmcif_files? If so, should I set mmcif_dir to pdb_mmcif/mmcif_files, alignment_dir to RODA_PATH/pdb/, and template_mmcif_dir to pdb_mmcif/mmcif_files?
Thanks

gahdritz · 2022-07-20T15:42:18Z

Correct. You can also use the distillation data at the same time, via the --distillation... flags together with the predicted structures uploaded to RODA.

llwx593 · 2022-07-21T09:33:45Z

Thank you for your answer. I tried to run the training script using the settings above, but I got a StopIteration exception. The following figure shows where the exception is raised (in openfold/data/data_modules.py):
[image: screenshot of the exception location in openfold/data/data_modules.py]
The exception is thrown right after the value of "flag1" becomes 1. I have already checked that RODA_PATH/pdb/ looks normal. I don't know what else could cause this error.

gahdritz · 2022-07-21T18:46:38Z

Could you print out "self.probabilities" for me?

llwx593 · 2022-07-22T01:06:21Z

The value of "self.probabilities" is 1.

The file structure of the RODA data is:
RODAPATH
--pdb
----101m_A
------a3m
--------bfd_uniclust_hits.a3m, mgnify_hits.a3m, uniref90_hits.a3
------hhr
--------pdb70_hits.hhr
----uniclust30

llwx593 · 2022-07-22T03:12:51Z

I found that some chains in RODAPATH/pdb do not exist in pdb_mmcif/mmcif_files/. This leads to a KeyError when the cache is queried with a chain ID from RODAPATH/pdb. Maybe my mmcif_files/ is different from yours. Can I simply delete these nonexistent chains? I'll try to see whether training runs normally without them. If it does, will it affect the accuracy?
Thanks.
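
For anyone hitting the same KeyError, here is a minimal sketch for listing the chain directories that have no matching mmCIF file. It is not part of OpenFold. It assumes chain directories are named <pdb_id>_<chain_id> (as in 101m_A above) and that the mmCIF files are named <pdb_id>.cif; the function name chains_missing_mmcif is made up for this example.

```python
from pathlib import Path


def chains_missing_mmcif(alignment_dir, mmcif_dir):
    """Return chain directory names whose PDB ID has no .cif file in mmcif_dir."""
    # Assumption: mmCIF files are named <pdb_id>.cif.
    available = {p.stem.lower() for p in Path(mmcif_dir).glob("*.cif")}
    missing = []
    for chain_dir in sorted(Path(alignment_dir).iterdir()):
        if not chain_dir.is_dir():
            continue
        # Assumption: chain directories are named <pdb_id>_<chain_id>, e.g. 101m_A.
        pdb_id = chain_dir.name.rsplit("_", 1)[0].lower()
        if pdb_id not in available:
            missing.append(chain_dir.name)
    return missing
```

Calling chains_missing_mmcif("RODAPATH/pdb", "pdb_mmcif/mmcif_files") returns the names to skip or investigate. It only reports them and does not delete anything.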

gahdritz · 2022-07-22T12:15:47Z

I see now. Since the RODA data is supposed to be generally applicable, it has a slightly different format than that expected by the OF dataloaders. For OF's sake, you should flatten the intermediate a3m and hhr directories, putting all .a3m and .hhr files directly in directories corresponding to the individual chains. So e.g.

alignment_dir/
---101m_A
------msa_1.a3m
------msa_2.a3m
------template_hits.hhr
---next_chain
------ etc.

If you also want to use the distillation set in uniclust30/, you should similarly flatten the file format directories.
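
The flattening described above could be sketched roughly as follows. This is an illustrative helper, not an OpenFold script: the function names are hypothetical, and it assumes every chain directory contains only the a3m/ and hhr/ subdirectories shown earlier in the thread. It moves files in place, so it is safer to try it on a copy of the data first.

```python
import shutil
from pathlib import Path


def flatten_chain_dir(chain_dir):
    """Move .a3m/.hhr files from the a3m/ and hhr/ subdirectories up into chain_dir."""
    chain_dir = Path(chain_dir)
    moved = []
    for sub in ("a3m", "hhr"):
        sub_dir = chain_dir / sub
        if not sub_dir.is_dir():
            continue  # already flat, or this format is absent for the chain
        for f in sorted(sub_dir.iterdir()):
            if f.suffix in (".a3m", ".hhr"):
                target = chain_dir / f.name
                if target.exists():
                    # Refuse to overwrite an existing file of the same name.
                    raise FileExistsError(target)
                shutil.move(str(f), str(target))
                moved.append(target.name)
        # Remove the subdirectory only if nothing is left in it.
        if not any(sub_dir.iterdir()):
            sub_dir.rmdir()
    return moved


def flatten_alignment_dir(alignment_dir):
    """Flatten every chain directory under alignment_dir; return the number of files moved."""
    count = 0
    for chain_dir in sorted(Path(alignment_dir).iterdir()):
        if chain_dir.is_dir():
            count += len(flatten_chain_dir(chain_dir))
    return count
```

Running flatten_alignment_dir on the pdb/ directory (and separately on uniclust30/, if the distillation set is used) should leave each chain directory with its .a3m and .hhr files directly inside it.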

llwx593 · 2022-07-29T01:25:08Z

Thanks for your reply. I tried skipping the nonexistent chain IDs and found that training ran normally, even though I didn't flatten the data.

gahdritz · 2022-07-29T03:31:50Z

It's important that you flatten the data, or the model is going to run with empty MSAs and templates. It doesn't know how to read un-flattened data like you have.
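
One rough way to spot this before training is to list the chain directories that have no .a3m or .hhr file directly inside them. The sketch below only checks the directory layout discussed in this thread; it does not reproduce what the dataloader actually does, and the function name unflattened_chains is hypothetical.

```python
from pathlib import Path


def unflattened_chains(alignment_dir):
    """Return names of chain directories with no .a3m/.hhr file directly inside them."""
    bad = []
    for chain_dir in sorted(Path(alignment_dir).iterdir()):
        if not chain_dir.is_dir():
            continue
        # Only files at the top level of the chain directory count here.
        has_files = any(
            f.is_file() and f.suffix in (".a3m", ".hhr")
            for f in chain_dir.iterdir()
        )
        if not has_files:
            bad.append(chain_dir.name)
    return bad
```

An empty list means every chain directory has at least one alignment or template-hit file at the top level. A non-empty list points to chains that would likely be read with empty MSAs and templates.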

llwx593 · 2022-07-29T03:40:15Z

Oh, so I was probably running with empty MSAs and templates. I will flatten the data. Thanks.

gahdritz closed this as completed Jun 6, 2022

NZ99 mentioned this issue Jul 28, 2022

Training Runtime Error: StopIteration #132

Open
