Empty dataframe output when generating interaction fingerprints #87

Closed
shinoxide opened this issue Oct 23, 2022 · 7 comments

@shinoxide

Thank you for your efforts. ProLIF is quite easy to install.
The issue I have is that I keep getting an empty dataframe when I generate the fingerprints and read them into a pandas dataframe. My protein and ligands are okay; I confirmed the interactions elsewhere. Please kindly help me out. See the image, thank you.

[Image: prolif_issue_095335]

@cbouy
Member

cbouy commented Oct 23, 2022

Hi @shinoxide ,

Does your protein PDB file have explicit hydrogens on all atoms? It's a requirement for ProLIF to work correctly. If not, you can use webservers like PypKa or PROPKA (also available as command line tools) to add them.
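A quick way to check the first point before running ProLIF is to count the hydrogen atoms in the PDB file. The sketch below is a stdlib-only heuristic and not part of ProLIF or MDAnalysis: `count_hydrogens` is a hypothetical helper that reads the element column (columns 77-78) of ATOM/HETATM records and falls back to the atom name when that column is blank.

```python
def count_hydrogens(pdb_text):
    """Return (n_hydrogens, n_atoms) for the ATOM/HETATM records of a PDB string."""
    n_h = 0
    n_atoms = 0
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        n_atoms += 1
        # The element symbol is stored in columns 77-78 of the PDB format.
        element = line[76:78].strip().upper()
        if not element:
            # Heuristic fallback (assumption): take the first letter of the
            # atom name in columns 13-16, skipping leading digits ("1HB" -> "H").
            # This can misclassify atoms such as mercury ("HG") in HETATM records.
            name = line[12:16].strip()
            element = name.lstrip("0123456789")[:1].upper()
        if element in ("H", "D"):
            n_h += 1
    return n_h, n_atoms


# Usage (file name taken from this thread):
# with open("myprot2.pdb") as fh:
#     n_h, n_atoms = count_hydrogens(fh.read())
#     print(n_h, "hydrogens out of", n_atoms, "atoms")
```

A count of zero, or far fewer hydrogens than heavy atoms, suggests the structure still needs to be protonated.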

Another possible explanation would be that your protein PDB file has CONECT records for some of the bonds but not all, in which case you might want to load the PDB file with:

import MDAnalysis as mda
prot = mda.Universe("myprot2.pdb", guess_bonds=True)

Tell me if this works.

Best,
Cédric

@shinoxide
Author

Hi Cédric,

This is great! It worked! The "guess_bonds=True" worked for me. See picture. Thank you very much 😊 I'm grateful.
[Image: prolif_solution_011410]

Adeshina

@shinoxide shinoxide reopened this Oct 24, 2022
@shinoxide
Author

shinoxide commented Oct 24, 2022

Hi Cédric

Thanks again for your response. I have 2 questions please.

  1. Can I very much rely on the accuracy of guess_bonds?
  2. Do you know tools I can use to capture CONECT records for all bonds?

Thank you.

@cbouy
Member

cbouy commented Oct 24, 2022

  1. Can I very much rely on the accuracy of guess_bonds?

MDAnalysis uses the same algorithm as VMD and many other tools for guessing bonds based on 3D coordinates and elements, so although it's not going to be completely failure-proof, I'd say it's pretty safe.
Also, when you run plf.Molecule.from_mda(prot), it actually converts the full protein into an RDKit molecule, and RDKit will complain if it sees atoms with incorrect valences. So if there's a problem during bond guessing, it will very likely result in an error on the RDKit side and you will know immediately.
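To give an idea of what "guessing bonds based on 3D coordinates and elements" means, here is a minimal pure-Python sketch of a distance-cutoff heuristic of this kind: two atoms are bonded when their distance is below a factor times the sum of their radii. The function name, the radii table, and the default values of `fudge_factor` and `lower_bound` are assumptions chosen for illustration; they are not copied from the MDAnalysis source.

```python
import math
from itertools import combinations

# Illustrative radii in angstroms (assumed values, not MDAnalysis's own table).
RADII = {"H": 1.0, "C": 1.5, "N": 1.4, "O": 1.3, "S": 1.9}


def guess_bonds_sketch(atoms, fudge_factor=0.55, lower_bound=0.1):
    """Guess bonds from elements and coordinates.

    atoms: list of (element, (x, y, z)) tuples.
    Returns a list of (i, j) index pairs with i < j.
    """
    bonds = []
    for (i, (el_i, xyz_i)), (j, (el_j, xyz_j)) in combinations(enumerate(atoms), 2):
        distance = math.dist(xyz_i, xyz_j)
        cutoff = fudge_factor * (RADII[el_i] + RADII[el_j])
        # lower_bound discards overlapping atoms (e.g. duplicated coordinates).
        if lower_bound < distance < cutoff:
            bonds.append((i, j))
    return bonds


# A water molecule: both O-H pairs fall under the cutoff, the H-H pair does not.
water = [
    ("O", (0.00, 0.00, 0.0)),
    ("H", (0.96, 0.00, 0.0)),
    ("H", (-0.24, 0.93, 0.0)),
]
print(guess_bonds_sketch(water))
```

Because the criterion depends only on distances, distorted geometries or missing atoms are the cases where such a heuristic is most likely to produce wrong bonds, which is where the RDKit valence check helps.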

  2. Do you know tools I can use to capture CONECT records for all bonds?

I'm not sure I understand the question. Do you mean extracting all the lines that start with CONECT in your PDB file?
If so, on a Linux/Mac shell, the command grep 'CONECT ' myprot2.pdb should do the trick.
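If a shell is not available, roughly the same check can be done in Python, and it can also report how many atoms are actually covered by CONECT records. This is a stdlib-only sketch; `conect_coverage` is a hypothetical helper, and it assumes plain integer serial numbers in the standard fixed-width columns.

```python
def conect_coverage(pdb_text):
    """Return (n_conect_lines, n_atoms_with_conect, n_atoms) for a PDB string."""
    serials = set()
    connected = set()
    n_conect = 0
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            # Atom serial number: columns 7-11.
            serials.add(int(line[6:11]))
        elif line.startswith("CONECT"):
            n_conect += 1
            # Columns 7-11 hold the atom serial, then up to four bonded
            # serials follow in 5-character fields (columns 12-31).
            fields = [line[k:k + 5].strip() for k in range(6, 31, 5)]
            connected.update(int(f) for f in fields if f)
    return n_conect, len(serials & connected), len(serials)


# Usage (file name taken from this thread):
# with open("myprot2.pdb") as fh:
#     print(conect_coverage(fh.read()))
```

If only a handful of atoms appear in CONECT records while the file has thousands of atoms, the bond information is partial, which matches the situation where guess_bonds=True helped.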

@shinoxide
Author

Thank you very much for your response.
For my second question... You said earlier that it is possible that my protein PDB file doesn't have CONECT records for all bonds and indeed I have only two lines starting with 'CONECT' in the file. So I thought there might be ways to generate complete CONECT records.

@cbouy
Member

cbouy commented Oct 25, 2022

MDAnalysis and RDKit can both read and write PDB files, and both can write CONECT records. VMD, and probably PyMOL or Avogadro, can do that as well.
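For reference, this is roughly the shape of the records those tools write. The sketch below formats CONECT lines from a known bond list using the fixed-width layout of the PDB format (atom serial in columns 7-11, up to four bonded serials per line). `format_conect` is a hypothetical helper written for illustration, not the API of any of the tools above.

```python
def format_conect(bonds):
    """Format CONECT records from an iterable of (serial_a, serial_b) bonds.

    Returns a list of lines, one or more per atom, sorted by atom serial.
    """
    neighbors = {}
    for a, b in bonds:
        # Each bond is listed from both atoms' point of view.
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    lines = []
    for serial in sorted(neighbors):
        bonded = sorted(neighbors[serial])
        # A CONECT line holds at most four bonded atoms; extra ones go on
        # additional lines that repeat the same atom serial.
        for k in range(0, len(bonded), 4):
            chunk = bonded[k:k + 4]
            lines.append("CONECT%5d" % serial + "".join("%5d" % s for s in chunk))
    return lines


for line in format_conect([(1, 2), (1, 3)]):
    print(line)
```

In practice it is safer to let one of the tools above write the file, since they also take care of atom ordering and serial numbering.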

@shinoxide
Author

Thank you very much for your support.
