pdb_assembly.json does not agree with train_multi_label.json #114
Comments
Check the mmCIF of 7l89; you'll see that the H and L chains have no valid sequences.
I see. Then why are H and L in pdb_assembly.json?
@henrywotton The PDB assembly data is taken from the website (see https://github.com/dptech-corp/Uni-Fold/blob/main/scripts/get_pdb_assembly.py), so we cannot filter the chains there. Refer to line 59 of Uni-Fold/scripts/get_pdb_assembly.py at commit ea1a169.
I see. With the current pdb_assembly.json, the code looks up the corresponding labels in train_multi_label.json and raises a key-not-found error. I have filtered pdb_assembly.json myself to solve the issue. Would you like me to upload the filtered version of pdb_assembly.json?
Hi, just in case anyone else also has the same issue, I have uploaded the Cheers
Thanks for the contribution, but I think run-time filtering of the PDB assemblies would be better. This is done in the Uni-Fold multimer dataset in #119.
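For readers who want a feel for what run-time filtering could look like, here is a minimal sketch. It is not the code from #119. It assumes that train_multi_label.json maps a representative chain ID such as `7l8d_B` to a list of chain IDs of the form `<pdb_id>_<chain>`, and that each pdb_assembly.json entry has parallel `chains` and `opers` lists as in the 7l89 example. The function names and the toy label data are hypothetical.

```python
def collect_labeled_chains(multi_label):
    """Gather every chain ID that appears as a key or inside a value list.

    Assumption: values are lists of "<pdb_id>_<chain>" strings.
    """
    labeled = set(multi_label)
    for members in multi_label.values():
        labeled.update(members)
    return labeled


def filter_assembly(pdb_id, assembly, labeled_chains):
    """Return a copy of one assembly entry without unlabeled chains.

    'chains' and 'opers' are filtered in parallel; 'symbol' and 'stoi'
    are copied unchanged and may no longer describe the filtered entry.
    """
    kept = [
        (chain, oper)
        for chain, oper in zip(assembly["chains"], assembly["opers"])
        if f"{pdb_id}_{chain}" in labeled_chains
    ]
    filtered = dict(assembly)
    filtered["chains"] = [chain for chain, _ in kept]
    filtered["opers"] = [oper for _, oper in kept]
    return filtered


# Toy label data (hypothetical split of chains between the two keys).
toy_labels = {
    "7l8d_B": ["7l89_A", "7l89_C", "7l89_E"],
    "7l87_C": ["7l89_B", "7l89_D", "7l89_F"],
}
# Assembly entry for 7l89 as quoted in this issue.
entry_7l89 = {
    "symbol": "C1",
    "stoi": ["A3", "B3", "C2"],
    "chains": ["F", "D", "C", "E", "B", "A", "H", "L"],
    "opers": ["I", "I", "I", "I", "I", "I", "I", "I"],
}

filtered_7l89 = filter_assembly(
    "7l89", entry_7l89, collect_labeled_chains(toy_labels)
)
print(filtered_7l89["chains"])
```

Filtering at load time leaves pdb_assembly.json itself untouched, so the file can still be regenerated by the script without reapplying a manual fix.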
Hi,
There are some entries in pdb_assembly.json that contain chains which are not listed in train_multi_label.json. As a result, the programme reports a key-not-found error. For example, in pdb_assembly.json, 7l89 has:
{'symbol': 'C1', 'stoi': ['A3', 'B3', 'C2'], 'chains': ['F', 'D', 'C', 'E', 'B', 'A', 'H', 'L'], 'opers': ['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']}
but in the train_multi_label.json dictionary, only 7l8d_B and 7l87_C have chains A, B, C, D, E, and F from 7l89 in their values. There are no records for 7l89 H or L in train_multi_label.json.

I've added some extra checks to dataset.py myself and the programme now works, but I suppose it shouldn't be like this? I believe either pdb_assembly.json or train_multi_label.json is incorrect?
Cheers
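The mismatch described above can be checked with a short script. This is only a sketch: it assumes train_multi_label.json maps a representative chain ID to a list of `<pdb_id>_<chain>` strings and that pdb_assembly.json maps a PDB ID to an entry with a `chains` list. The function name and the toy data are hypothetical; with the real files, the two dictionaries would be read with `json.load`.

```python
def find_unlabeled_chains(pdb_assembly, multi_label):
    """Map each PDB ID to the assembly chains that have no label record."""
    # Assumption: a chain counts as labeled if "<pdb_id>_<chain>" appears
    # either as a key or inside any value list of the label dictionary.
    labeled = set(multi_label)
    for members in multi_label.values():
        labeled.update(members)

    missing = {}
    for pdb_id, assembly in pdb_assembly.items():
        absent = [
            chain
            for chain in assembly["chains"]
            if f"{pdb_id}_{chain}" not in labeled
        ]
        if absent:
            missing[pdb_id] = absent
    return missing


# Toy data mirroring the 7l89 case (the split of chains is hypothetical).
sample_assembly = {
    "7l89": {
        "symbol": "C1",
        "stoi": ["A3", "B3", "C2"],
        "chains": ["F", "D", "C", "E", "B", "A", "H", "L"],
        "opers": ["I"] * 8,
    }
}
sample_labels = {
    "7l8d_B": ["7l89_A", "7l89_C", "7l89_E"],
    "7l87_C": ["7l89_B", "7l89_D", "7l89_F"],
}

unlabeled = find_unlabeled_chains(sample_assembly, sample_labels)
print(unlabeled)
```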