
An error in train_ligand_binding_model #6

Open
yanbosmu opened this issue Jul 25, 2024 · 13 comments

@yanbosmu

Training the VAE (the first step) was successful, but an error occurred when running train_ligand_binding_model.

polygon train_ligand_binding_model --uniprot_id Q9Y572O --binding_db_path /home/yanbosmu/Bioinfo/polygon/data/output.csv --output_path /home/yanbosmu/Bioinfo/polygon/data/Q9Y572_ligand_binding.pkl
Traceback (most recent call last):
File "/home/yanbosmu/mambaforge/bin/polygon", line 8, in <module>
sys.exit(main())
File "/home/yanbosmu/mambaforge/lib/python3.10/site-packages/polygon/run.py", line 849, in main
r = train_ligand_binding_model_main(args)
File "/home/yanbosmu/mambaforge/lib/python3.10/site-packages/polygon/run.py", line 810, in train_ligand_binding_model_main
train_ligand_binding_model( args.uniprot_id,
File "/home/yanbosmu/mambaforge/lib/python3.10/site-packages/polygon/utils/train_ligand_binding_model.py", line 17, in train_ligand_binding_model
binddb = pd.read_csv(binding_db_path, sep="\t",header=0,low_memory=False,error_bad_lines=False)
TypeError: read_csv() got an unexpected keyword argument 'error_bad_lines'

GPT said this is because pandas 2.2 no longer supports the error_bad_lines argument, so I deleted it in train_ligand_binding_model.py.
But then I got the new error listed below.

polygon train_ligand_binding_model --uniprot_id Q9Y572O --binding_db_path /home/yanbosmu/Bioinfo/polygon/data/output.csv --output_path /home/yanbosmu/Bioinfo/polygon/data/Q9Y572_ligand_binding.pkl
Traceback (most recent call last):
File "/home/yanbosmu/mambaforge/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
return self._engine.get_loc(casted_key)
File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'UniProt (SwissProt) Primary ID of Target Chain'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/yanbosmu/mambaforge/bin/polygon", line 8, in <module>
sys.exit(main())
File "/home/yanbosmu/mambaforge/lib/python3.10/site-packages/polygon/run.py", line 849, in main
r = train_ligand_binding_model_main(args)
File "/home/yanbosmu/mambaforge/lib/python3.10/site-packages/polygon/run.py", line 810, in train_ligand_binding_model_main
train_ligand_binding_model( args.uniprot_id,
File "/home/yanbosmu/mambaforge/lib/python3.10/site-packages/polygon/utils/train_ligand_binding_model.py", line 20, in train_ligand_binding_model
d = binddb[binddb['UniProt (SwissProt) Primary ID of Target Chain']==target_unit_pro_id]
File "/home/yanbosmu/mambaforge/lib/python3.10/site-packages/pandas/core/frame.py", line 4102, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/yanbosmu/mambaforge/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
raise KeyError(key) from err

Any ideas or solutions for this?
Is it because I'm using the newest pandas version?
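For reference, a minimal sketch of a pandas-version-safe fix: since pandas 1.3 the error_bad_lines keyword is replaced by on_bad_lines, and it was removed entirely in pandas 2.0. The function name here is illustrative, not POLYGON's actual code:

```python
# Sketch of a pandas-2.x-compatible replacement for the read_csv call in
# train_ligand_binding_model.py. error_bad_lines=False was deprecated in
# pandas 1.3 and removed in 2.0; on_bad_lines replaces it.
import pandas as pd

def read_binding_db(binding_db_path):
    # on_bad_lines="skip" drops malformed rows (wrong field count)
    # silently; use on_bad_lines="warn" to keep the "Skipping line ..."
    # warnings the old defaults produced.
    return pd.read_csv(binding_db_path, sep="\t", header=0,
                       low_memory=False, on_bad_lines="skip")
```

Patching the installed site-packages copy works, but pinning pandas to a pre-2.0 version (as discussed below in the thread) avoids editing the package at all.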

@Feriolet

Hi

I can't test your command right now, but I think there are a couple of reasons why yours is not working:

  1. For your first problem, as you mentioned, your pandas version may be too new for POLYGON's script, which targets Python 3.8 (the version I'm using right now).

  2. For your second problem, the KeyError may be caused by your UniProt ID. I tried to find your UniProt ID (Q9Y572O) online, but it does not exist. Try double-checking your UniProt ID.
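As a quick way to tell a bad ID apart from a badly parsed file, one could check the column and the ID before filtering. This is a hypothetical helper, not part of POLYGON; the column name is the one from the traceback:

```python
# Hypothetical sanity check: verify the BindingDB table has the expected
# UniProt column and that the requested ID actually appears in it.
import pandas as pd

UNIPROT_COL = "UniProt (SwissProt) Primary ID of Target Chain"

def check_uniprot_id(binddb: pd.DataFrame, uniprot_id: str) -> int:
    if UNIPROT_COL not in binddb.columns:
        # Usually means the file was parsed with the wrong separator,
        # so the whole header collapsed into one column.
        raise KeyError(f"column {UNIPROT_COL!r} missing from the table")
    n = int((binddb[UNIPROT_COL] == uniprot_id).sum())
    if n == 0:
        raise ValueError(f"no BindingDB rows for UniProt ID {uniprot_id!r}")
    return n
```

Run against the loaded table, this distinguishes the two failure modes seen later in this thread: a missing column (wrong separator) versus zero matching rows (wrong or absent ID).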

@bpmunson
Owner

As Feriolet mentioned, the uniprot ID "Q9Y572O" does not seem to be valid. What protein target are you attempting to train a model for?

This issue does highlight that POLYGON should be more graceful when invalid IDs are used.

Best,
Brenton

@yanbosmu
Author

Thank you for your advice.
I used the correct UniProt ID "Q9Y572", reinstalled pandas 1.2.0, and also switched to Python 3.9.
The TypeError: read_csv() got an unexpected keyword argument 'error_bad_lines' no longer occurs.
But I still get:

File "/home/yanbosmu/mambaforge/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
return self._engine.get_loc(casted_key)
File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'UniProt (SwissProt) Primary ID of Target Chain'

I believe something is wrong with my BindingDB file.
I downloaded the TSV file from the BindingDB website and then converted it to CSV.

Can you share the CSV file used in the tutorial? That would help me find the cause.
Thank you so much!

@yanbosmu
Author

(/home/yanbosmu/your_path/polygonfinal) 20:15:44yanbosmu@Yanbosmu-PC:~/Bioinfo/polygonfinal/polygon$ polygon train_ligand_binding_model --uniprot_id Q9Y572 --binding_db_path /home/yanbosmu/Bioinfo/polygon/data/outputxx.csv --output_path /home/yanbosmu/Bioinfo/polygon/data/Q9Y572_ligand_binding.pkl
Traceback (most recent call last):
File "/home/yanbosmu/your_path/polygonfinal/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2898, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'UniProt (SwissProt) Primary ID of Target Chain'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/yanbosmu/your_path/polygonfinal/bin/polygon", line 8, in <module>
sys.exit(main())
File "/home/yanbosmu/your_path/polygonfinal/lib/python3.9/site-packages/polygon/run.py", line 849, in main
r = train_ligand_binding_model_main(args)
File "/home/yanbosmu/your_path/polygonfinal/lib/python3.9/site-packages/polygon/run.py", line 810, in train_ligand_binding_model_main
train_ligand_binding_model( args.uniprot_id,
File "/home/yanbosmu/your_path/polygonfinal/lib/python3.9/site-packages/polygon/utils/train_ligand_binding_model.py", line 20, in train_ligand_binding_model
d = binddb[binddb['UniProt (SwissProt) Primary ID of Target Chain']==target_unit_pro_id]
File "/home/yanbosmu/your_path/polygonfinal/lib/python3.9/site-packages/pandas/core/frame.py", line 2906, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/yanbosmu/your_path/polygonfinal/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2900, in get_loc
raise KeyError(key) from err
KeyError: 'UniProt (SwissProt) Primary ID of Target Chain'

@Feriolet

Feriolet commented Jul 26, 2024

Can I ask which BindingDB file you downloaded? You do not need to convert the TSV to CSV, because the script splits the file on tabs (i.e., it expects TSV).

I used the BindingDB_All_202407.tsv file from the BindingDB website.

If you still get the KeyError, it may mean that BindingDB does not have a strong ligand that binds your Q9Y572 protein.

However, that would be odd, because searching the BindingDB website does find ligands binding to that UniProt ID when filtered by IC50 <= 1000 nM.
(Sorry for the bad image)
[screenshot: BindingDB search results for Q9Y572]
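A small illustration of why a comma-separated conversion breaks the script, which always splits on tabs. pandas is used only to demonstrate the parsing; the header is the column name from the traceback:

```python
# Reading a comma-separated file with sep="\t" collapses the whole
# header into a single column, so the expected column is never found
# and any lookup on it raises the KeyError seen above.
import io
import pandas as pd

csv_text = ("UniProt (SwissProt) Primary ID of Target Chain,Ligand SMILES\n"
            "Q9Y572,CCO\n")
df = pd.read_csv(io.StringIO(csv_text), sep="\t")
print(df.columns.tolist())  # one merged column, not two
print("UniProt (SwissProt) Primary ID of Target Chain" in df.columns)  # False
```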

@yanbosmu
Author


Thank you so much!
Yes, it was because of the BindingDB file: I had only downloaded the Q9Y572-related ligands from the website. After using the BindingDB_All_202407.tsv you mentioned, it works just fine!

@DM0815

DM0815 commented Aug 12, 2024


Excuse me, I used another protein ID for model generation. The code runs, but it did not generate any .pkl file. Did you run into this? During the run it prints warnings like: "Skipping line 2874651: expected 194 fields, saw 266\nSkipping line 2874652: expected 194 fields, saw 266\n". Do you have any suggestions? Thanks.

@yanbosmu
Author

Just ignore those warnings. I also saw those errors but still got the .pkl file.

@Feriolet

Yep, I also ignored those warnings and still got the .pkl result.

@DM0815

DM0815 commented Aug 15, 2024

@Feriolet Dear all, I solved that problem by revising the script to use my protein ID. But in the last step, "Use the chemical embedding to design polypharmacology compounds", I ran into another error:

File "/POLYGON/lib/python3.9/site-packages/torch/nn/utils/rnn.py", line 482, in pack_sequence
return pack_padded_sequence(pad_sequence(sequences), lengths, enforce_sorted=enforce_sorted)
File "/POLYGON/lib/python3.9/site-packages/torch/nn/utils/rnn.py", line 397, in pad_sequence
return torch._C._nn.pad_sequence(sequences, batch_first, padding_value)
RuntimeError: received an empty list of sequences.

I was wondering if you have had the same problem and how you solved it. Thanks.

@Feriolet

Feriolet commented Aug 16, 2024

Can you send the full error message? The log you sent is only from the torch package, not from polygon. The error indicates that an empty list of sequences was passed to the pack_sequence function. Please see the documentation here: https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.pack_sequence.html

The error can be reproduced by the following:

>>> from torch.nn.utils.rnn import pack_sequence
>>> pack_sequence(torch.tensor([]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/miniforge3/envs/envname/lib/python3.9/site-packages/torch/nn/utils/rnn.py", line 484, in pack_sequence
    return pack_padded_sequence(pad_sequence(sequences), lengths, enforce_sorted=enforce_sorted)
  File "/Users/user/miniforge3/envs/envname/lib/python3.9/site-packages/torch/nn/utils/rnn.py", line 398, in pad_sequence
    return torch._C._nn.pad_sequence(sequences, batch_first, padding_value)
RuntimeError: received an empty list of sequences
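Conversely, a non-empty list of tensors packs fine, so a guard before the call makes the failure mode explicit. This is a defensive sketch under the assumption that the empty list comes from the SMILES targets, not POLYGON's actual code:

```python
# Guard sketch: pack_sequence needs at least one tensor, so fail early
# with a pointer to the likely cause instead of the opaque RuntimeError.
import torch
from torch.nn.utils.rnn import pack_sequence

seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]  # stand-in for encoded SMILES
if not seqs:
    raise ValueError("no input sequences -- were any SMILES targets loaded?")
# enforce_sorted=True expects sequences in decreasing length order
packed = pack_sequence(seqs, enforce_sorted=True)
```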

@DM0815

DM0815 commented Aug 17, 2024

Yes, I'm very confused. The full error message is as follows:
2024-08-17 16:22:55,376 [DEBUG ] Making scoring function,
fpscores.pkl.gz
fpscores.pkl.gz
[16:22:55] Explicit valence for atom # 0 N, 4, is greater than permitted
/home/dm/anaconda3/envs/py3.9/lib/python3.9/site-packages/polygon/utils/utils.py:82: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
model.load_state_dict(torch.load(model_definition, map_location="cpu"))
Traceback (most recent call last):
File "/home/dm/anaconda3/envs/py3.9/bin/polygon", line 8, in <module>
sys.exit(main())
File "/home/dm/anaconda3/envs/py3.9/lib/python3.9/site-packages/polygon/run.py", line 841, in main
generate_main(args)
File "/home/dm/anaconda3/envs/py3.9/lib/python3.9/site-packages/polygon/run.py", line 658, in generate_main
scoring_function = build_scoring_function(
File "/home/dm/anaconda3/envs/py3.9/lib/python3.9/site-packages/polygon/utils/utils.py", line 293, in build_scoring_function
scorers[name] = LatentDistance( smiles_targets=smiles_targets,
File "/home/dm/anaconda3/envs/py3.9/lib/python3.9/site-packages/polygon/utils/custom_scoring_fcn.py", line 154, in __init__
self.z_targets = self.model.encode(self.x_targets)
File "/home/dm/anaconda3/envs/py3.9/lib/python3.9/site-packages/polygon/vae/vae_model.py", line 179, in encode
z, kl_loss, mu = self.forward_encoder(x, return_mu=True)
File "/home/dm/anaconda3/envs/py3.9/lib/python3.9/site-packages/polygon/vae/vae_model.py", line 225, in forward_encoder
x = nn.utils.rnn.pack_sequence(x)
File "/home/dm/anaconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/utils/rnn.py", line 482, in pack_sequence
return pack_padded_sequence(pad_sequence(sequences), lengths, enforce_sorted=enforce_sorted)
File "/home/dm/anaconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/utils/rnn.py", line 397, in pad_sequence
return torch._C._nn.pad_sequence(sequences, batch_first, padding_value)
RuntimeError: received an empty list of sequences

I am not sure which step is wrong. In scoring_definition.csv, should the .pkl file and the .smi file be matched? I think my files match well. Could the pre-trained model cause this error?
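Since "received an empty list of sequences" at the encoder suggests the list of SMILES targets was empty, a pre-flight check on the .smi file referenced by scoring_definition.csv can rule that out. This is a hypothetical stdlib-only check, not POLYGON code:

```python
# Pre-flight check: confirm the SMILES target file is non-empty before
# starting a run, since an empty list reaching the VAE encoder produces
# the "received an empty list of sequences" RuntimeError.
from pathlib import Path

def load_smiles(path):
    lines = [l.strip() for l in Path(path).read_text().splitlines() if l.strip()]
    if not lines:
        raise ValueError(f"{path} contains no SMILES -- the encoder "
                         "would receive an empty list of sequences")
    return lines
```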

@Feriolet

Yes, the paths you put in scoring_definition.csv should match the corresponding target of your interest. I am not using POLYGON anymore, so I can't try to reproduce your error. My wild guess is that there is no potent ligand for your target in the BindingDB .tsv file. Can you check on the BindingDB website whether a potent ligand actually exists for it?
