Fix graph sequence (atomistic graphs in initialise_graph_with_metadata had duplicated residues) #268

Merged: 15 commits into a-r-j:master on Feb 21, 2023

kamurani
Contributor

Fixes #267

This ensures that atomistic-level graphs have the same sequence (for each chain `c` in `chain_ids`) as graphs created with CA or residue granularity.

Tested with an assertion (comparing sequences from residue and atomistic graphs).
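
The duplication described above can be illustrated with a toy example. This is only a sketch in plain Python: the `ATOMS` records, the one-letter residue names, and both helper functions are hypothetical stand-ins, not Graphein code. It shows why concatenating `residue_name` over every atom row repeats each residue, and why restricting to CA atoms yields one letter per residue.

```python
# Toy atom records: (chain_id, residue_number, residue_name, atom_name).
# Hypothetical data; one-letter residue names are assumed for brevity.
ATOMS = [
    ("A", 1, "M", "N"), ("A", 1, "M", "CA"), ("A", 1, "M", "C"),
    ("A", 2, "K", "N"), ("A", 2, "K", "CA"), ("A", 2, "K", "C"),
    ("B", 1, "G", "N"), ("B", 1, "G", "CA"), ("B", 1, "G", "C"),
]


def sequence_all_atoms(atoms, chain):
    """One letter per atom row, so every residue is repeated."""
    return "".join(res for ch, _, res, _ in atoms if ch == chain)


def sequence_ca_only(atoms, chain):
    """One letter per residue, taken from its CA atom."""
    return "".join(
        res for ch, _, res, atom in atoms if ch == chain and atom == "CA"
    )


for chain in ("A", "B"):
    print(chain, sequence_all_atoms(ATOMS, chain), sequence_ca_only(ATOMS, chain))
```

An assertion of the kind mentioned above would then compare the CA-only sequence of an atomistic graph against the sequence of a residue-level graph for each chain.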

@kamurani
Contributor Author

@a-r-j whereabouts should I put the tests that I came up with?

@a-r-j
Owner

a-r-j commented Feb 19, 2023

> @a-r-j whereabouts should I put the tests that I came up with?

https://github.com/a-r-j/graphein/blob/master/tests/protein/test_graphs.py

@@ -493,6 +493,13 @@ def initialise_graph_with_metadata(
            sequence = protein_df.loc[protein_df["chain_id"] == c][
                "residue_name"
            ].str.cat()
        elif granularity == "atom":
            protein_df = protein_df.loc[protein_df["atom_name"] == "CA"]
Owner

Hmm, is keeping the CA subset of protein_df not problematic downstream?

Do we not want to store this in a temporary variable, if at all? Can we not just add the atom_name selection into L499, where we select by chain ID?
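
A minimal sketch of that suggestion, assuming a toy `protein_df` with the three columns mentioned in this thread (the data and the `chain_sequence` helper are hypothetical, not the merged Graphein code): the chain and atom-name conditions are combined into one boolean mask, so the sequence is read from a temporary selection and `protein_df` itself is never reassigned.

```python
import pandas as pd

# Hypothetical atom-level DataFrame with two atoms per residue.
protein_df = pd.DataFrame(
    {
        "chain_id": ["A", "A", "A", "A", "B", "B"],
        "residue_name": ["M", "M", "K", "K", "G", "G"],
        "atom_name": ["N", "CA", "N", "CA", "N", "CA"],
    }
)


def chain_sequence(df, chain):
    """Concatenate residue names for one chain, using CA rows only."""
    # Both conditions live in a single mask; df is not rebound or modified.
    mask = (df["chain_id"] == chain) & (df["atom_name"] == "CA")
    return df.loc[mask, "residue_name"].str.cat()


sequences = {c: chain_sequence(protein_df, c) for c in ["A", "B"]}
print(sequences)
print(len(protein_df))  # the full atom table is still available downstream
```

Because the selection is only ever a local intermediate, any later code that receives `protein_df` still sees every atom row.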

Contributor Author

Yep, 100%. My bad, I forgot that graph construction only receives a reference to the DataFrame, so the CA-only subset would carry over downstream.

@codecov-commenter

codecov-commenter commented Feb 19, 2023

Codecov Report

Base: 40.27% // Head: 45.48% // Increases project coverage by +5.21% 🎉

Coverage data is based on head (5d0098e) compared to base (8123f42).
Patch coverage: 49.54% of modified lines in pull request are covered.

❗ Current head 5d0098e differs from pull request most recent head a74f38e. Consider uploading reports for the commit a74f38e to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #268      +/-   ##
==========================================
+ Coverage   40.27%   45.48%   +5.21%     
==========================================
  Files          48      111      +63     
  Lines        2811     7128    +4317     
==========================================
+ Hits         1132     3242    +2110     
- Misses       1679     3886    +2207     
Impacted Files                          Coverage Δ
graphein/ml/diffusion.py                0.00% <0.00%> (ø)
graphein/ml/metrics/__init__.py         0.00% <0.00%> (ø)
graphein/ml/metrics/gdt.py              0.00% <0.00%> (ø)
graphein/ml/metrics/tm_score.py         0.00% <0.00%> (ø)
graphein/ppi/graph_metadata.py          0.00% <0.00%> (ø)
graphein/ppi/visualisation.py           0.00% <0.00%> (ø)
graphein/protein/analysis.py            0.00% <0.00%> (ø)
graphein/protein/features/utils.py      27.77% <0.00%> (ø)
graphein/protein/folding_utils.py       0.00% <0.00%> (ø)
graphein/protein/tensor/data.py         30.37% <ø> (ø)
... and 107 more

@kamurani kamurani marked this pull request as ready for review February 21, 2023 03:48
@sonarcloud

sonarcloud bot commented Feb 21, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No coverage information
0.0% duplication

@a-r-j a-r-j merged commit acaef3a into a-r-j:master Feb 21, 2023
a-r-j added a commit that referenced this pull request Apr 22, 2023
…st PDB model, and merging-in latest updates from `master` (#309)

* Fix graph sequence (atomistic graphs in `initialise_graph_with_metadata` had duplicated residues)  (#268)

* Fix param name typo in function docstring

* fix: atomistic graph only has sequence residues for CA atom in `initialise_graph_with_metadata`

* fix: avoid changing dataframe when extracting rows

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add: test sequence feature in graphs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix graph sequence feature (#268)

* fix matplotlib deprecation

* fix test bug

* change build to ubuntu-latest

* remove unecessary selection

---------

Co-authored-by: Cam <73625486+cimranm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>

* Add dataset splits functionality and add new documentation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve merge conflicts with remote

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused test

* Address lingering SonarCloud concerns

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add deposition date parsing

* remove pdb.py

* add chain extraction util

* add chain writing method

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* After fixing merge conflicts, add more filters and add time-based splits

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix up SonarCloud concerns

* Improve verbiage surrounding PDB resolutions

* Simplify code and improve variable names

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Track names of splits in df_splits

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix column naming during merging of DataFrame splits

* add additional properties

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor clustering to allow file caching and overwriting

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add description to assert statements

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add extra documentation around clustering function, and address small formatting issues

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add method to write selection to CSV

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve from_fasta documentation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Enable code reuse for length filters

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Minor documentation changes to FASTA write-out function

* Add ability to perform most API calls for a subset of splits

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update .gitignore

* Fix missing download call, and add more documentation to download functions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix small bug when merging different splits together

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug in length filtering functions, fix print bugs in utils, and add ability to write-out PDB files after selecting a subset of chains to include in them

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix string formatting

* Update PDB write-out logic and documentation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add PDB download workaround for PDBs that can no longer be downloaded

* Make exception more specific

* Add TQDM for data split exporting

* Add improved error message for non standard node funcs #274 (#275)

* Add improved error message for non standard node funcs #274

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* clean up unused files and move docs from root (#276)

* clean up unused files and move docs from root

* remove setup.cfg

* prelim path support #269 (#277)

* prelim path support #269

* fix import error

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Switch to miniconda for build (#278)

* switch to miniconda for build

* update docker build

* switch to checkout v3

* Improve altloc handling (#263)

* Fix bug in `add_k_nn_edges`.

`kneighbors_graph(X=dist_mat, ...)` is wrong since `X` may not be a distance matrix. This leads to wrong results which may be similar to correct ones.

* Extend `add_k_nn_edges`.

* Add types to docstring

* Update changelog

* Add `kind_name` argument

* Test `filter_distmat`

* Set default value of `long_interaction_threshold` to 0

* Fix filtering bug in `add_k_nn_edges`

* Test `add_k_nn_edges`

* Refactor with `add_edge`

* Fix bug for empty `edges_to_excl`

* Improve `convert_nx_to_pyg`

* Fix bug in `plot_pyg_data`

* Test `convert_nx_to_pyg` on multimers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update `CHANGELOG.md`

* Fix version in `CHANGELOG.md`

* Handle corner cases

* Handle NaNs in coordinatess

* Add PyG install to CI

* typo in CI config

* bump torch versions in CI

* make pyg-related tests conditional pyg installation

* Try fixing graph attributes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo and extend amino acid 3to1, 1to3 mappings

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Adapt imports of amino acid codes

* add semicolon to version

* remove wildcard version number for pyyaml

* fix typo

* fix additonal typos

* Extend aggregation to vectors

* Implement `aggregate_feature_over_residues`

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add docstring and aggregation type

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import literal from typing extensions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add missing `median` in exception message

* Fix `nullcontext`

* fix dataset test

* fix division by zero errors in edge colouring

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changlelog

* Separate and improve `remove_alt_locs`

Removal of alt_locs is separated from removal of insertions. Additionally, alt_locs with higher occupancies are now kept.

* Test `remove_alt_locs`

* Rename test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Set `insertions=True` by default

* Make `alt_locs` configurable (TODO `include` case)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use typing_extensions literal for 3.7 compatibility

* use typing extensions literal for 3.7 compatibility

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve hbond donor/acceptor assignment robustnness

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* replace trailing ":" in insertions

* fix test and hbond granularity inference

* Add altloc identifer to node ID

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix test

* fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* actually fix test

* update changelog

* Fix typo

---------

Co-authored-by: Arian Jamasb <arjamasb@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Df processing #216 (#222)

* docstrings and df processing funcs #216

* dcstrings

* add test

* lint test

* fix test

* fix typo in test

* Update changelog

* fix typo in test

* fix broken test

* fix broken test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add hetatm removal to test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use atomic granularity

* fix syntax error

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bugs in test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix test

* typo

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Minor patch `convert_nx_to_pyg` #280 (#281)

* nx_to_pyg bug fix #280

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>

* changes for 1.6.0 (#279)

* changes for 1.6.0

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Enable PDBManager root to be set to an arbitrary location

* add initial tests

* update changelog

* add tutorial notebook

* Allow all chains in a complex to be exported together

* add module-level import

* Remove old, unused PDBManager prototype file

* add parsing & checks for unavailable PDB structures

* fix download checker

* actually fix download checker

* add availability filter

* FoldComp ML Datasets (#284)

* add foldcomp dataset util

* clean up

* add import warnings

* add foldcomp dataset extra dependencies

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* exclude foldcomp from notebook tests. download too big :(

* update changelog

* add lightning datamodule wrapper

* add transform functionality

* docs: add new module to API reference

* update notebook

* fix: fix paths issue on setup

* add foldcomp dataset tutorial to docs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add stage param to setup

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Default to export model 1's chains only in PDBManager, and clean-up notebook and utilities

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add tutorial nblink

* add tutorial to datasets sections

* mv pdb data to ml API

* rm pyg dataset import

* rm unused code

* fix annotation

* add MMTF download format

* refactor dependency utils

* refactor graphein.utils.utils.import_message

* refactor graphein.protein.utils.is_tool

* update .gitignore

* ignore cif too

* ignore cif too

* ignore foldcomp files

* catch straggling erroneous imports

* ignore mol2

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update folding utils

* add max batch option

* add foldcomp utils

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add notebook updates [WIP]

* move manager class into graphein.ml

* remove datasets init

* fix import util refactor I didn't catch

* add PDBmanager to __init__

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix oligomeric filtering

* update notebook

* fix dataset init

* fix protein.coord renaming in tensor module

* add try/except to pyg-related datasets

* add try/except to pyg-related datasets

* add mmseqs to CI build

* rollback dssp install to conda

* ignore pdb manager notebook in minimal tests

* fix code smell

* fix metrics

* shorten line lengths

* add minimum scipy version

* remove python 3.7 from CI

* Add Torch 2.0.0 to CI

* add note about multiple split strategies

* add torch cluster install to CI

* update dockerfile to torch 2.0

* switch docker pytorch 1.13 for VMD python version conflict

* switch out torchtyping for jaxtyping

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update tensor shape syntax for jaxtyping

* remove torch-dependent tests from minimal install testing

* update test ignores

* install dssp from apt, rather than conda in docker

* update typing extensions version

* Update citation (#287)

* update citation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Support MMTF & rename pdb_path to path throughout (#293)

* rename pdb_path to path throughout

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* install from biopandas bleeding edge

* fix bleeding edge biopandas install

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update to bleeding edge biopandas

* [pre-commit.ci] pre-commit autoupdate (#294)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/psf/black: 23.1.0 → 23.3.0](psf/black@23.1.0...23.3.0)

* pin pandas to <2.0.0

* Bump AF2 version

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>

* update path in notebooks

* Add missing import #296 (#297)

* update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Prep for 1.7.0 release (#292)

* update version string

* update readme

* update doc version

* update changelog

* Add autopublish workflow (#298)

* Add autopublish workflow

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update version for 1.7.0

* update workflow version

* remove rogue print statement (#302)

* Consistent conversion to undirected graphs (#301)

* Fix `convert_nx_to_pyg` to return undirected graph

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix symmetrization of edges of different kinds

* Clean

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix case when `edge_index` is not desired

* Test directed/undirected conversion consistency

* Update contributors

* Update CHANGELOG.md

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add graphein install to tutorial notebook #306

* Tensor fixes (#307)

* add PSW to nonstandard residues

* improve insertion and non-standard residue handling

* refactor chain selection

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused verbosity arg

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix chain selection in tests

* fix chain selection in tutorial notebook

* fix notebook chain selection

* fix chain selection typehint

* Update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add NLW as a nonstandard residue

* Export only first model of each downloaded PDB file, and typecast model_id column to str to avoid to_pdb() errors

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Track split names for edge cases in dataset splitting

* Add fix for scenario where downloaded PDB files do not contain ATOMs for an entry's listed chains

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Cam <73625486+kamurani@users.noreply.github.com>
Co-authored-by: Cam <73625486+cimranm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>
Co-authored-by: Anton Bushuiev <67932762+anton-bushuiev@users.noreply.github.com>
Co-authored-by: Ryan Greenhalgh <35999546+rg314@users.noreply.github.com>
a-r-j added a commit that referenced this pull request Apr 28, 2023
…st PDB model, and merging-in latest updates from `master` (#311)

* add PDB manager #270

* add download method

* add clustering utilities

* `PDBManager` - Bug fixes, adding necessary changes to export only first PDB model, and merging-in latest updates from `master` (#309)


* add lightning datamodule wrapper

* add transform functionality

* docs: add new module to API reference

* update notebook

* fix: fix paths issue on setup

* add foldcomp dataset tutorial to docs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add stage param to setup

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Default to export model 1's chains only in PDBManager, and clean-up notebook and utilities

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add tutorial nblink

* add tutorial to datasets sections

* mv pdb data to ml API

* rm pyg dataset import

* rm unused code

* fix annotation

* add MMTF download format

* refactor dependency utils

* refactor graphein.utils.utils.import_message

* refactor graphein.protein.utils.is_tool

* update .gitignore

* ignore cif too

* ignore cif too

* ignore foldcomp files

* catch straggling erroneous imports

* ignore mol2

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update folding utils

* add max batch option

* add foldcomp utils

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add notebook updates [WIP]

* move manager class into graphein.ml

* remove datasets init

* fix import util refactor I didn't catch

* add PDBManager to __init__

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix oligomeric filtering

* update notebook

* fix dataset init

* fix protein.coord renaming in tensor module

* add try/except to pyg-related datasets

* add try/except to pyg-related datasets

* add mmseqs to CI build

* rollback dssp install to conda

* ignore pdb manager notebook in minimal tests

* fix code smell

* fix metrics

* shorten line lengths

* add minimum scipy version

* remove python 3.7 from CI

* Add Torch 2.0.0 to CI

* add note about multiple split strategies

* add torch cluster install to CI

* update dockerfile to torch 2.0

* switch docker pytorch 1.13 for VMD python version conflict

* switch out torchtyping for jaxtyping

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update tensor shape syntax for jaxtyping

* remove torch-dependent tests from minimal install testing

* update test ignores

* install dssp from apt, rather than conda in docker

* update typing extensions version

* Update citation (#287)

* update citation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Support MMTF & rename pdb_path to path throughout (#293)

* rename pdb_path to path throughout

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* install from biopandas bleeding edge

* fix bleeding edge biopandas install

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update to bleeding edge biopandas

* [pre-commit.ci] pre-commit autoupdate (#294)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/psf/black: 23.1.0 → 23.3.0](psf/black@23.1.0...23.3.0)

* pin pandas to <2.0.0

* Bump AF2 version

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>

* update path in notebooks

* Add missing import #296 (#297)

* update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Prep for 1.7.0 release (#292)

* update version string

* update readme

* update doc version

* update changelog

* Add autopublish workflow (#298)

* Add autopublish workflow

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update version for 1.7.0

* update workflow version

* remove rogue print statement (#302)

* Consistent conversion to undirected graphs (#301)

* Fix `convert_nx_to_pyg` to return undirected graph

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix symmetrization of edges of different kinds

* Clean

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix case when `edge_index` is not desired

* Test directed/undirected conversion consistency

* Update contributors

* Update CHANGELOG.md

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add graphein install to tutorial notebook #306

* Tensor fixes (#307)

* add PSW to nonstandard residues

* improve insertion and non-standard residue handling

* refactor chain selection

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused verbosity arg

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix chain selection in tests

* fix chain selection in tutorial notebook

* fix notebook chain selection

* fix chain selection typehint

* Update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add NLW as a nonstandard residue

* Export only first model of each downloaded PDB file, and typecast model_id column to str to avoid to_pdb() errors

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Track split names for edge cases in dataset splitting

* Add fix for scenario where downloaded PDB files do not contain ATOMs for an entry's listed chains

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Cam <73625486+kamurani@users.noreply.github.com>
Co-authored-by: Cam <73625486+cimranm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>
Co-authored-by: Anton Bushuiev <67932762+anton-bushuiev@users.noreply.github.com>
Co-authored-by: Ryan Greenhalgh <35999546+rg314@users.noreply.github.com>

* Add structure format parameter to allow mmtf manipulation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changelog

---------

Co-authored-by: Alex Morehead <acmwhb@missouri.edu>
Co-authored-by: Cam <73625486+kamurani@users.noreply.github.com>
Co-authored-by: Cam <73625486+cimranm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Anton Bushuiev <67932762+anton-bushuiev@users.noreply.github.com>
Co-authored-by: Ryan Greenhalgh <35999546+rg314@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

graph feature sequence_{chain_id} contains duplicate residues for atomistic graphs
3 participants