pdb_assembly.json does not agree with train_multi_label.json #114

Closed
dingquanyu opened this issue Apr 16, 2023 · 6 comments

@dingquanyu

Hi,

Some entries in pdb_assembly.json contain chains that are not listed in train_multi_label.json, so the programme raises a key-not-found error. For example, in pdb_assembly.json, 7l89 has: {'symbol': 'C1', 'stoi': ['A3', 'B3', 'C2'], 'chains': ['F', 'D', 'C', 'E', 'B', 'A', 'H', 'L'], 'opers': ['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']}
but in the train_multi_label.json dictionary, only 7l8d_B and 7l87_C have chains A, B, C, D, E, and F from 7l89 in their values. There are no records for 7l89 H or L in train_multi_label.json.

I've added some extra checks to dataset.py myself and now the programme works, but I suppose it shouldn't be like this? I believe either pdb_assembly.json or train_multi_label.json is incorrect?
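
For illustration, a check along the following lines could list the affected chains. This is only a sketch, not the check added to dataset.py: it assumes that each value in train_multi_label.json is a list of "<pdb_id>_<chain>" strings, the function name `find_missing_chains` is hypothetical, and the way chains A to F are split between the two keys in the toy data is made up.

```python
def find_missing_chains(pdb_assembly, multi_label):
    """Return {pdb_id: [chains that appear in no multi_label value list]}."""
    # Assumption: every value in multi_label is a list of "<pdb_id>_<chain>" labels.
    known = {label for labels in multi_label.values() for label in labels}
    missing = {}
    for pdb_id, entry in pdb_assembly.items():
        absent = [c for c in entry["chains"] if f"{pdb_id}_{c}" not in known]
        if absent:
            missing[pdb_id] = absent
    return missing


# Toy data: the 7l89 entry is the one quoted above; the label lists are invented.
pdb_assembly = {
    "7l89": {
        "symbol": "C1",
        "stoi": ["A3", "B3", "C2"],
        "chains": ["F", "D", "C", "E", "B", "A", "H", "L"],
        "opers": ["I"] * 8,
    }
}
multi_label = {
    "7l8d_B": ["7l89_A", "7l89_B", "7l89_C"],
    "7l87_C": ["7l89_D", "7l89_E", "7l89_F"],
}

print(find_missing_chains(pdb_assembly, multi_label))
```

With the real files, the two dictionaries would be loaded with `json.load` instead of being written inline.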

Cheers

@ZiyaoLi
Member

ZiyaoLi commented May 9, 2023

Check the mmCIF of 7l89; you'll see that the H and L chains have no valid sequences.

@dingquanyu
Author

I see. Then why are H and L in pdb_assembly.json?

@guolinke
Member

guolinke commented May 9, 2023

@henrywotton the PDB assembly data is downloaded from the RCSB website by this script (https://github.com/dptech-corp/Uni-Fold/blob/main/scripts/get_pdb_assembly.py), so we cannot filter the chains.

refer to this line:

url = f"https://data.rcsb.org/rest/v1/core/assembly/{name}/1"

@dingquanyu
Author

I see. With the current pdb_assembly.json, the code searches for the corresponding labels in train_multi_label.json and raises a key-not-found error. I have filtered pdb_assembly.json myself to solve the issue. Would you like me to upload the filtered version of pdb_assembly.json?
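
An offline filter of this kind might look roughly like the sketch below. It is not the script actually used for the uploaded file. It assumes the same "<pdb_id>_<chain>" label format in train_multi_label.json, the name `filter_assembly` is hypothetical, and the 'symbol' and 'stoi' fields are copied unchanged, so they may no longer match the reduced chain list.

```python
import json


def filter_assembly(pdb_assembly, multi_label):
    """Return a copy of pdb_assembly keeping only chains that have a label."""
    # Assumption: every value in multi_label is a list of "<pdb_id>_<chain>" labels.
    known = {label for labels in multi_label.values() for label in labels}
    filtered = {}
    for pdb_id, entry in pdb_assembly.items():
        # Filter chains and opers together so the two lists stay aligned.
        kept = [
            (chain, oper)
            for chain, oper in zip(entry["chains"], entry["opers"])
            if f"{pdb_id}_{chain}" in known
        ]
        if not kept:
            continue  # design choice: drop assemblies with no usable chain
        new_entry = dict(entry)  # shallow copy; the input is left untouched
        new_entry["chains"] = [chain for chain, _ in kept]
        new_entry["opers"] = [oper for _, oper in kept]
        filtered[pdb_id] = new_entry
    return filtered


# Toy data: the 7l89 entry is from this thread; the label lists are invented.
assembly = {
    "7l89": {
        "symbol": "C1",
        "stoi": ["A3", "B3", "C2"],
        "chains": ["F", "D", "C", "E", "B", "A", "H", "L"],
        "opers": ["I"] * 8,
    }
}
labels = {
    "7l8d_B": ["7l89_A", "7l89_B", "7l89_C"],
    "7l87_C": ["7l89_D", "7l89_E", "7l89_F"],
}

filtered = filter_assembly(assembly, labels)
print(json.dumps(filtered, indent=2))
```

To produce a file, the result would be written with `json.dump` to a new path rather than over the original pdb_assembly.json.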

@dingquanyu
Author

Hi,

Just in case anyone else has the same issue, I have filtered pdb_assembly.json myself and uploaded it to ownCloud here. It solved the error for me; please let me know if it also works for you.

Cheers

@ZiyaoLi
Member

ZiyaoLi commented May 11, 2023

Thanks for the contribution, though I think run-time filtering of pdb_assemblies would be better. This is done in the Uni-Fold multimer dataset in #119.
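
As a rough idea of what run-time filtering means here, the sketch below resolves chains when an assembly is loaded and skips those without a label, leaving pdb_assembly.json itself unchanged. It is not the code from #119. The helper names `build_inverse` and `resolve_chains` are hypothetical, and the label format ("<pdb_id>_<chain>" strings in each value list) is an assumption.

```python
def build_inverse(multi_label):
    """Map each "<pdb_id>_<chain>" label to the key whose list contains it."""
    return {label: key for key, labels in multi_label.items() for label in labels}


def resolve_chains(pdb_id, entry, inverse):
    """Return (chain, oper, label_key) for chains that have a label; skip the rest."""
    resolved = []
    for chain, oper in zip(entry["chains"], entry["opers"]):
        # dict.get avoids the KeyError that a plain inverse[...] lookup raises.
        key = inverse.get(f"{pdb_id}_{chain}")
        if key is None:
            continue  # e.g. 7l89 H and L, which have no label
        resolved.append((chain, oper, key))
    return resolved


# Toy data: the 7l89 entry is from this thread; the label lists are invented.
entry = {
    "symbol": "C1",
    "stoi": ["A3", "B3", "C2"],
    "chains": ["F", "D", "C", "E", "B", "A", "H", "L"],
    "opers": ["I"] * 8,
}
inverse = build_inverse({
    "7l8d_B": ["7l89_A", "7l89_B", "7l89_C"],
    "7l87_C": ["7l89_D", "7l89_E", "7l89_F"],
})

resolved = resolve_chains("7l89", entry, inverse)
print(resolved)
```

Compared with editing the JSON file once, this keeps the downloaded assembly data as is and tolerates any later mismatch between the two files.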

ZiyaoLi closed this as completed May 11, 2023