Fix and enhance geometric node features #305

Merged: 28 commits, Jul 31, 2023

Conversation

anton-bushuiev
Contributor

@anton-bushuiev anton-bushuiev commented Apr 16, 2023

Reference Issues/PRs

  1. Fixes On rgroup_df=compute_rgroup_dataframe(remove_insertions(raw_pdb_df)) #304
  2. Fixes the case when beta-carbons or side chains are missing in non-glycine residues (for example in H:CYS:104 in 3SE8).
  3. Fixes the data types of geometric feature vectors. They are currently of dtype object, which breaks, for example, their direct conversion to PyTorch tensors when converting to PyG data.
  4. Implements virtual beta-carbon vectors (used recently in ProteinMPNN and RFdiffusion). In the figure below, real beta-carbon vectors are shown in green and virtual ones in red.
    [Figure: real (green) and virtual (red) beta-carbon vectors]
  5. Makes minor enhancements to the visualization of PyG data.
  6. Fixes add_k_nn_edges for the case when some residues were dropped beforehand (e.g. when some alt_locs are removed). The code of add_k_nn_edges assumes that the dataframe has a contiguous index, which does not hold if some residues are dropped. For example, the following line constructs contiguous positional indices (outgoing), which are then used to index the dataframe:
    outgoing = np.repeat(np.array(range(len(G.graph["pdb_df"]))), k)
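The index problem described in the last item can be reproduced on a toy dataframe. The sketch below is illustrative only: the dataframe contents and the two workarounds (positional lookup with iloc, or resetting the index) are assumptions made for the example, not necessarily the fix applied in this PR.

```python
import numpy as np
import pandas as pd

# Toy stand-in for G.graph["pdb_df"] (hypothetical data): five residues,
# one of which is dropped, as happens when alt_locs are removed.
pdb_df = pd.DataFrame(
    {"node_id": ["A:MET:1", "A:GLY:2", "A:CYS:3", "A:SER:4", "A:LYS:5"]}
)
pdb_df = pdb_df.drop(index=2)  # index labels are now 0, 1, 3, 4

k = 2
# Positional indices 0..n-1, each repeated k times, as in the quoted line.
outgoing = np.repeat(np.arange(len(pdb_df)), k)

# Label-based lookup fails: label 2 no longer exists in the index.
try:
    pdb_df.loc[outgoing, "node_id"]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Workaround 1: positional lookup, which ignores the index labels.
by_position = pdb_df["node_id"].iloc[outgoing].tolist()

# Workaround 2: restore a contiguous index before building edges.
by_reset = pdb_df.reset_index(drop=True).loc[outgoing, "node_id"].tolist()
```

Both workarounds select the same rows; which one is preferable depends on whether other code relies on the original index labels.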

I have also added the functionality to test whether a warning is raised by loguru according to Delgan/loguru#59.
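For readers unfamiliar with virtual beta-carbons, here is a minimal sketch of how one can be placed from backbone atoms alone. It uses the fixed coefficients found in the ProteinMPNN code. Whether this PR uses the same constants is an assumption, and the function names and example coordinates are hypothetical. Returning plain float64 arrays also avoids the object dtype mentioned in the list above.

```python
import numpy as np


def virtual_c_beta(n, ca, c):
    """Place an idealised C-beta from N, CA and C coordinates of shape (..., 3).

    The coefficients are those used in the ProteinMPNN code (assumed here).
    """
    n, ca, c = (np.asarray(a, dtype=np.float64) for a in (n, ca, c))
    b = ca - n                # N -> CA bond vector
    c_vec = c - ca            # CA -> C bond vector
    a = np.cross(b, c_vec)    # normal to the backbone plane
    return -0.58273431 * a + 0.56802827 * b - 0.54067466 * c_vec + ca


def virtual_c_beta_vector(n, ca, c):
    """Unit vector pointing from CA towards the virtual C-beta."""
    ca = np.asarray(ca, dtype=np.float64)
    v = virtual_c_beta(n, ca, c) - ca
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


# Hypothetical, roughly ideal backbone geometry for a single residue.
n_xyz = [-0.525, 1.363, 0.0]
ca_xyz = [0.0, 0.0, 0.0]
c_xyz = [1.526, 0.0, 0.0]

cb = virtual_c_beta(n_xyz, ca_xyz, c_xyz)
unit = virtual_c_beta_vector(n_xyz, ca_xyz, c_xyz)
```

Because only backbone atoms are needed, the same construction works for glycine and for residues whose side chains are missing from the structure.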

What does this implement/fix? Explain your changes

What testing did you do to verify the changes in this PR?

Pull Request Checklist

  • Added a note about the modification or contribution to the ./CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./graphein/tests/* directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under ./notebooks/ (if applicable)
  • Ran python -m py.test tests/ and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., python -m py.test tests/protein/test_graphs.py)
  • Checked for style issues by running black . and isort .

@codecov-commenter

codecov-commenter commented Apr 16, 2023

Codecov Report

Patch coverage: 40.40% and project coverage change: +3.83 🎉

Comparison is base (8123f42) 40.27% compared to head (6190f9a) 44.10%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #305      +/-   ##
==========================================
+ Coverage   40.27%   44.10%   +3.83%     
==========================================
  Files          48      113      +65     
  Lines        2811     7817    +5006     
==========================================
+ Hits         1132     3448    +2316     
- Misses       1679     4369    +2690     
Impacted Files Coverage Δ
graphein/ml/datasets/foldcomp_dataset.py 0.00% <0.00%> (ø)
graphein/ml/diffusion.py 0.00% <0.00%> (ø)
graphein/ml/metrics/__init__.py 0.00% <0.00%> (ø)
graphein/ml/metrics/gdt.py 0.00% <0.00%> (ø)
graphein/ml/metrics/tm_score.py 0.00% <0.00%> (ø)
graphein/ppi/graph_metadata.py 0.00% <0.00%> (ø)
graphein/ppi/visualisation.py 0.00% <0.00%> (ø)
graphein/protein/analysis.py 0.00% <0.00%> (ø)
graphein/protein/features/utils.py 27.77% <0.00%> (ø)
graphein/protein/folding_utils.py 0.00% <0.00%> (ø)
... and 95 more


Owner

@a-r-j a-r-j left a comment

LGTM! Thanks for another great contribution :) Can you please add a note to the changelog (also for #301)?

@anton-bushuiev anton-bushuiev changed the title from "Bugs in geometric node features" to "Fix and enhance geometric node features" Apr 16, 2023
@@ -115,13 +116,17 @@ def plot_pyg_data(
d["coords"] = x.coords[i]
if node_colour_tensor is not None:
d["colour"] = float(node_colour_tensor[i])
if hasattr(x, "c_beta_vector"):
Owner

This is a really nice idea. I think it would be better to generalise this and provide these as default arguments.

E.g.:

for node_vec in node_vector_features:
    if hasattr(x, node_vec):
        d[node_vec] = getattr(x, node_vec)[i]
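A self-contained sketch of the generalisation suggested above, using a single loop variable throughout. The helper name collect_node_vectors, the default feature tuple, and the stand-in data object are hypothetical and only illustrate the pattern; they are not part of plot_pyg_data.

```python
from types import SimpleNamespace


def collect_node_vectors(x, i, node_vector_features=("c_beta_vector",)):
    """Copy every requested per-node vector feature present on x for node i."""
    d = {}
    for node_vec in node_vector_features:
        # Features that are not set on the data object are skipped.
        if hasattr(x, node_vec):
            d[node_vec] = getattr(x, node_vec)[i]
    return d


# Stand-in for a PyG data object with one vector feature for two nodes.
x = SimpleNamespace(c_beta_vector=[[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])

node_attrs = collect_node_vectors(x, 1, ("c_beta_vector", "missing_feature"))
```

Passing the feature names as a default argument keeps the current behaviour while letting callers plot additional vector features without touching the function body.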

@@ -135,3 +140,12 @@ def plot_pyg_data(
colour_nodes_by if node_colour_tensor is None else "colour",
colour_edges_by if edge_colour_tensor is None else "colour",
)
if hasattr(x, "c_beta_vector"):
Owner

As above

@sonarcloud

sonarcloud bot commented Apr 29, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 3 (rating A)

Coverage: no information
Duplication: 0.0%

@anton-bushuiev
Contributor Author

Hi, @a-r-j! Can you take another look at this? It would be nice if you could merge this.

@sonarcloud

sonarcloud bot commented Jul 31, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 5 (rating A)

Coverage: no information
Duplication: 0.0%

@a-r-j
Owner

a-r-j commented Jul 31, 2023

Hey @anton-bushuiev sorry for being slow on this. Recently started a new job & this PR slipped through the cracks. Appreciate the contribution, as always.

@a-r-j a-r-j merged commit 7c99e57 into a-r-j:master Jul 31, 2023
12 of 13 checks passed
@anton-bushuiev anton-bushuiev deleted the insertions branch August 2, 2023 14:34