Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new helper functions to PDBManager #322

Merged
merged 14 commits into from May 25, 2023
Merged

Add new helper functions to PDBManager #322

merged 14 commits into from May 25, 2023

Conversation

amorehead
Copy link
Contributor

@amorehead amorehead commented May 24, 2023

What does this implement/fix? Explain your changes

Adds some new helper functions to the PDBManager to allow one to work more easily with heterogeneous protein complexes.

What testing did you do to verify the changes in this PR?

I evaluated these functions in a local copy of this repository.

Pull Request Checklist

  • Added a note about the modification or contribution to the ./CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./graphein/tests/* directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under ./notebooks/ (if applicable)
  • Ran python -m py.test tests/ and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., python -m py.test tests/protein/test_graphs.py)
  • Checked for style issues by running black . and isort .

Copy link
Owner

@a-r-j a-r-j left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (less changelog)!

assert (
len(filtered_pdb) > 0
), "Filtered DataFrame must contain atoms."
if "ATOM" in key and len(filtered_pdb) == 0:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be being a little pedantic here but I see two possible edge cases with this:

  1. Sometimes protein atoms are stored as HETATMs (typically modified residues but this kind of bad practice does happen as an abuse of the PDB format to suit some niche way to store structure data)
  2. Similarly, what if the desired selection is actually the HETATM data? Protein-nucleic acid complexes or protein-peptide complexes may store the ligand as a HETATM

Copy link
Contributor Author

@amorehead amorehead May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. Points 1 and 2 didn't come to my mind when I originally implemented this, and I kept expanding on it until now. In light of these points, I think it makes more sense to avoid skipping such DataFrames altogether. It will then be the user's responsibility to "vet" the exported PDB files for these kinds of edge cases in their selected PDBs. Better yet, we can simply issue a warning to users that no "standard" atoms were found post-filtering. However, we would then still export the PDB complex as requested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified this logic to only issue a warning in such an edge case now.

@codecov-commenter
Copy link

codecov-commenter commented May 25, 2023

Codecov Report

Patch coverage: 39.83% and project coverage change: +3.47 🎉

Comparison is base (8123f42) 40.27% compared to head (c5d7d6e) 43.74%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #322      +/-   ##
==========================================
+ Coverage   40.27%   43.74%   +3.47%     
==========================================
  Files          48      113      +65     
  Lines        2811     7829    +5018     
==========================================
+ Hits         1132     3425    +2293     
- Misses       1679     4404    +2725     
Impacted Files Coverage Δ
graphein/ml/datasets/foldcomp_dataset.py 0.00% <0.00%> (ø)
graphein/ml/diffusion.py 0.00% <0.00%> (ø)
graphein/ml/metrics/__init__.py 0.00% <0.00%> (ø)
graphein/ml/metrics/gdt.py 0.00% <0.00%> (ø)
graphein/ml/metrics/tm_score.py 0.00% <0.00%> (ø)
graphein/ppi/graph_metadata.py 0.00% <0.00%> (ø)
graphein/ppi/visualisation.py 0.00% <0.00%> (ø)
graphein/protein/analysis.py 0.00% <0.00%> (ø)
graphein/protein/features/utils.py 27.77% <0.00%> (ø)
graphein/protein/folding_utils.py 0.00% <0.00%> (ø)
... and 95 more

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@amorehead
Copy link
Contributor Author

Feel free to merge this PR if and when you are ready to.

@a-r-j a-r-j merged commit 371ce9a into a-r-j:master May 25, 2023
11 of 12 checks passed
@sonarcloud
Copy link

sonarcloud bot commented May 25, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants