Df processing #216 #222

Merged
merged 28 commits into from Mar 12, 2023

Conversation

a-r-j
Owner

@a-r-j a-r-j commented Oct 24, 2022

Reference Issues/PRs

#216

What does this implement/fix? Explain your changes

Ensures provided df_processing_funcs are used.
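
As a rough illustration of what "using the provided functions" means, the sketch below applies a user-supplied list of DataFrame processing functions in order. It is not Graphein's implementation: the helper `apply_df_processing_funcs`, the two example filters, and the column names `record_name` and `element_symbol` are assumptions made for this example only.

```python
from typing import Callable, Iterable, Optional

import pandas as pd

# A processing function takes a DataFrame and returns a DataFrame.
DfFunc = Callable[[pd.DataFrame], pd.DataFrame]


def apply_df_processing_funcs(
    df: pd.DataFrame, funcs: Optional[Iterable[DfFunc]] = None
) -> pd.DataFrame:
    """Hypothetical helper: apply each supplied function to df, in order."""
    if funcs is None:
        # Nothing supplied, so the dataframe is returned unchanged.
        return df
    for func in funcs:
        df = func(df)
    return df


def drop_hetatm(df: pd.DataFrame) -> pd.DataFrame:
    """Example filter (assumed column name): remove HETATM records."""
    return df.loc[df["record_name"] != "HETATM"].reset_index(drop=True)


def drop_hydrogens(df: pd.DataFrame) -> pd.DataFrame:
    """Example filter (assumed column name): remove hydrogen atoms."""
    return df.loc[df["element_symbol"] != "H"].reset_index(drop=True)


# Toy atom table standing in for a parsed structure.
atoms = pd.DataFrame(
    {
        "record_name": ["ATOM", "ATOM", "HETATM"],
        "element_symbol": ["C", "H", "O"],
    }
)
processed = apply_df_processing_funcs(atoms, [drop_hetatm, drop_hydrogens])
```

With the toy table above, only the single `ATOM` carbon row should remain in `processed`. The order of the list matters whenever one function depends on rows or columns that another removes.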

What testing did you do to verify the changes in this PR?

Test pending

Pull Request Checklist

  • Added a note about the modification or contribution to the ./CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./graphein/tests/* directories (if applicable)
  • Modified documentation in the corresponding Jupyter Notebook under ./notebooks/ (if applicable)
  • Ran python -m py.test tests/ and made sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., python -m py.test tests/protein/test_graphs.py)
  • Checked for style issues by running black . and isort .

@a-r-j a-r-j changed the title Df processing Df processing #216 Nov 2, 2022
@a-r-j a-r-j added the 1 - Priority P1 High Priority label Nov 2, 2022

sonarcloud bot commented Dec 8, 2022

SonarCloud Quality Gate passed.

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
1 Code Smell (rating A)

No coverage information
0.0% duplication

sonarcloud bot commented Jan 25, 2023

SonarCloud Quality Gate passed.

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
1 Code Smell (rating A)

No coverage information
0.0% duplication

sonarcloud bot commented Mar 11, 2023

SonarCloud Quality Gate passed.

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
1 Code Smell (rating A)

No coverage information
0.0% duplication

@a-r-j a-r-j merged commit b308a58 into master Mar 12, 2023
@a-r-j a-r-j deleted the df_processing branch March 30, 2023 21:46
a-r-j added a commit that referenced this pull request Apr 22, 2023
…st PDB model, and merging-in latest updates from `master` (#309)

* Fix graph sequence (atomistic graphs in `initialise_graph_with_metadata` had duplicated residues)  (#268)

* Fix param name typo in function docstring

* fix: atomistic graph only has sequence residues for CA atom in `initialise_graph_with_metadata`

* fix: avoid changing dataframe when extracting rows

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add: test sequence feature in graphs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix graph sequence feature (#268)

* fix matplotlib deprecation

* fix test bug

* change build to ubuntu-latest

* remove unnecessary selection

---------

Co-authored-by: Cam <73625486+cimranm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>

* Add dataset splits functionality and add new documentation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve merge conflicts with remote

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused test

* Address lingering SonarCloud concerns

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add deposition date parsing

* remove pdb.py

* add chain extraction util

* add chain writing method

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* After fixing merge conflicts, add more filters and add time-based splits

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix up SonarCloud concerns

* Improve verbiage surrounding PDB resolutions

* Simplify code and improve variable names

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Track names of splits in df_splits

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix column naming during merging of DataFrame splits

* add additional properties

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor clustering to allow file caching and overwriting

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add description to assert statements

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add extra documentation around clustering function, and address small formatting issues

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add method to write selection to CSV

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve from_fasta documentation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Enable code reuse for length filters

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Minor documentation changes to FASTA write-out function

* Add ability to perform most API calls for a subset of splits

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update .gitignore

* Fix missing download call, and add more documentation to download functions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix small bug when merging different splits together

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug in length filtering functions, fix print bugs in utils, and add ability to write-out PDB files after selecting a subset of chains to include in them

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix string formatting

* Update PDB write-out logic and documentation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add PDB download workaround for PDBs that can no longer be downloaded

* Make exception more specific

* Add TQDM for data split exporting

* Add improved error message for non standard node funcs #274 (#275)

* Add improved error message for non standard node funcs #274

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* clean up unused files and move docs from root (#276)

* clean up unused files and move docs from root

* remove setup.cfg

* prelim path support #269 (#277)

* prelim path support #269

* fix import error

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Switch to miniconda for build (#278)

* switch to miniconda for build

* update docker build

* switch to checkout v3

* Improve altloc handling (#263)

* Fix bug in `add_k_nn_edges`.

`kneighbors_graph(X=dist_mat, ...)` is wrong since `X` may not be a distance matrix. This leads to wrong results which may be similar to correct ones.

* Extend `add_k_nn_edges`.

* Add types to docstring

* Update changelog

* Add `kind_name` argument

* Test `filter_distmat`

* Set default value of `long_interaction_threshold` to 0

* Fix filtering bug in `add_k_nn_edges`

* Test `add_k_nn_edges`

* Refactor with `add_edge`

* Fix bug for empty `edges_to_excl`

* Improve `convert_nx_to_pyg`

* Fix bug in `plot_pyg_data`

* Test `convert_nx_to_pyg` on multimers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update `CHANGELOG.md`

* Fix version in `CHANGELOG.md`

* Handle corner cases

* Handle NaNs in coordinates

* Add PyG install to CI

* typo in CI config

* bump torch versions in CI

* make pyg-related tests conditional on pyg installation

* Try fixing graph attributes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo and extend amino acid 3to1, 1to3 mappings

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Adapt imports of amino acid codes

* add semicolon to version

* remove wildcard version number for pyyaml

* fix typo

* fix additional typos

* Extend aggregation to vectors

* Implement `aggregate_feature_over_residues`

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add docstring and aggregation type

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import literal from typing extensions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add missing `median` in exception message

* Fix `nullcontext`

* fix dataset test

* fix division by zero errors in edge colouring

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changelog

* Separate and improve `remove_alt_locs`

Removal of alt_locs is separated from removal of insertions. Additionally, alt_locs with higher occupancies are now kept.

* Test `remove_alt_locs`

* Rename test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Set `insertions=True` by default

* Make `alt_locs` configurable (TODO `include` case)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use typing_extensions literal for 3.7 compatibility

* use typing extensions literal for 3.7 compatibility

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve hbond donor/acceptor assignment robustness

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* replace trailing ":" in insertions

* fix test and hbond granularity inference

* Add altloc identifier to node ID

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix test

* fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* actually fix test

* update changelog

* Fix typo

---------

Co-authored-by: Arian Jamasb <arjamasb@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Df processing #216 (#222)

* docstrings and df processing funcs #216

* docstrings

* add test

* lint test

* fix test

* fix typo in test

* Update changelog

* fix typo in test

* fix broken test

* fix broken test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add hetatm removal to test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use atomic granularity

* fix syntax error

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bugs in test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix test

* typo

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Minor patch `convert_nx_to_pyg` #280 (#281)

* nx_to_pyg bug fix #280

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>

* changes for 1.6.0 (#279)

* changes for 1.6.0

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Enable PDBManager root to be set to an arbitrary location

* add initial tests

* update changelog

* add tutorial notebook

* Allow all chains in a complex to be exported together

* add module-level import

* Remove old, unused PDBManager prototype file

* add parsing & checks for unavailable PDB structures

* fix download checker

* actually fix download checker

* add availability filter

* FoldComp ML Datasets (#284)

* add foldcomp dataset util

* clean up

* add import warnings

* add foldcomp dataset extra dependencies

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* exclude foldcomp from notebook tests. download too big :(

* update changelog

* add lightning datamodule wrapper

* add transform functionality

* docs: add new module to API reference

* update notebook

* fix: fix paths issue on setup

* add foldcomp dataset tutorial to docs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add stage param to setup

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Default to export model 1's chains only in PDBManager, and clean-up notebook and utilities

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add tutorial nblink

* add tutorial to datasets sections

* mv pdb data to ml API

* rm pyg dataset import

* rm unused code

* fix annotation

* add MMTF download format

* refactor dependency utils

* refactor graphein.utils.utils.import_message

* refactor graphein.protein.utils.is_tool

* update .gitignore

* ignore cif too

* ignore cif too

* ignore foldcomp files

* catch straggling erroneous imports

* ignore mol2

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update folding utils

* add max batch option

* add foldcomp utils

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add notebook updates [WIP]

* move manager class into graphein.ml

* remove datasets init

* fix import util refactor I didn't catch

* add PDBmanager to __init__

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix oligomeric filtering

* update notebook

* fix dataset init

* fix protein.coord renaming in tensor module

* add try/except to pyg-related datasets

* add try/except to pyg-related datasets

* add mmseqs to CI build

* rollback dssp install to conda

* ignore pdb manager notebook in minimal tests

* fix code smell

* fix metrics

* shorten line lengths

* add minimum scipy version

* remove python 3.7 from CI

* Add Torch 2.0.0 to CI

* add note about multiple split strategies

* add torch cluster install to CI

* update dockerfile to torch 2.0

* switch docker pytorch 1.13 for VMD python version conflict

* switch out torchtyping for jaxtyping

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update tensor shape syntax for jaxtyping

* remove torch-dependent tests from minimal install testing

* update test ignores

* install dssp from apt, rather than conda in docker

* update typing extensions version

* Update citation (#287)

* update citation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Support MMTF & rename pdb_path to path throughout (#293)

* rename pdb_path to path throughout

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* install from biopandas bleeding edge

* fix bleeding edge biopandas install

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update to bleeding edge biopandas

* [pre-commit.ci] pre-commit autoupdate (#294)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/psf/black: 23.1.0 → 23.3.0](psf/black@23.1.0...23.3.0)

* pin pandas to <2.0.0

* Bump AF2 version

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>

* update path in notebooks

* Add missing import #296 (#297)

* update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Prep for 1.7.0 release (#292)

* update version string

* update readme

* update doc version

* update changelog

* Add autopublish workflow (#298)

* Add autopublish workflow

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update version for 1.7.0

* update workflow version

* remove rogue print statement (#302)

* Consistent conversion to undirected graphs (#301)

* Fix `convert_nx_to_pyg` to return undirected graph

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix symmetrization of edges of different kinds

* Clean

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix case when `edge_index` is not desired

* Test directed/undirected conversion consistency

* Update contributors

* Update CHANGELOG.md

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add graphein install to tutorial notebook #306

* Tensor fixes (#307)

* add PSW to nonstandard residues

* improve insertion and non-standard residue handling

* refactor chain selection

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused verbosity arg

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix chain selection in tests

* fix chain selection in tutorial notebook

* fix notebook chain selection

* fix chain selection typehint

* Update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add NLW as a nonstandard residue

* Export only first model of each downloaded PDB file, and typecast model_id column to str to avoid to_pdb() errors

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Track split names for edge cases in dataset splitting

* Add fix for scenario where downloaded PDB files do not contain ATOMs for an entry's listed chains

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Cam <73625486+kamurani@users.noreply.github.com>
Co-authored-by: Cam <73625486+cimranm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>
Co-authored-by: Anton Bushuiev <67932762+anton-bushuiev@users.noreply.github.com>
Co-authored-by: Ryan Greenhalgh <35999546+rg314@users.noreply.github.com>
a-r-j added a commit that referenced this pull request Apr 28, 2023
…st PDB model, and merging-in latest updates from `master` (#311)

* add PDB manager #270

* add download method

* add clustering utilities

* `PDBManager` - Bug fixes, adding necessary changes to export only first PDB model, and merging-in latest updates from `master` (#309)

* Fix graph sequence (atomistic graphs in `initialise_graph_with_metadata` had duplicated residues)  (#268)

* Fix param name typo in function docstring

* fix: atomistic graph only has sequence residues for CA atom in `initialise_graph_with_metadata`

* fix: avoid changing dataframe when extracting rows

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add: test sequence feature in graphs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix graph sequence feature (#268)

* fix matplotlib deprecation

* fix test bug

* change build to ubuntu-latest

* remove unnecessary selection

---------

Co-authored-by: Cam <73625486+cimranm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>

* Add dataset splits functionality and add new documentation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve merge conflicts with remote

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused test

* Address lingering SonarCloud concerns

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add deposition date parsing

* remove pdb.py

* add chain extraction util

* add chain writing method

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* After fixing merge conflicts, add more filters and add time-based splits

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix up SonarCloud concerns

* Improve verbiage surrounding PDB resolutions

* Simplify code and improve variable names

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Track names of splits in df_splits

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix column naming during merging of DataFrame splits

* add additional properties

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor clustering to allow file caching and overwriting

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add description to assert statements

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add extra documentation around clustering function, and address small formatting issues

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add method to write selection to CSV

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve from_fasta documentation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Enable code reuse for length filters

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Minor documentation changes to FASTA write-out function

* Add ability to perform most API calls for a subset of splits

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update .gitignore

* Fix missing download call, and add more documentation to download functions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix small bug when merging different splits together

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug in length filtering functions, fix print bugs in utils, and add ability to write-out PDB files after selecting a subset of chains to include in them

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix string formatting

* Update PDB write-out logic and documentation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add PDB download workaround for PDBs that can no longer be downloaded

* Make exception more specific

* Add TQDM for data split exporting

* Add improved error message for non standard node funcs #274 (#275)

* Add improved error message for non standard node funcs #274

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* clean up unused files and move docs from root (#276)

* clean up unused files and move docs from root

* remove setup.cfg

* prelim path support #269 (#277)

* prelim path support #269

* fix import error

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Switch to miniconda for build (#278)

* switch to miniconda for build

* update docker build

* switch to checkout v3

* Improve altloc handling (#263)

* Fix bug in `add_k_nn_edges`.

`kneighbors_graph(X=dist_mat, ...)` is wrong since `X` may not be a distance matrix. This leads to wrong results which may be similar to correct ones.

* Extend `add_k_nn_edges`.

* Add types to docstring

* Update changelog

* Add `kind_name` argument

* Test `filter_distmat`

* Set default value of `long_interaction_threshold` to 0

* Fix filtering bug in `add_k_nn_edges`

* Test `add_k_nn_edges`

* Refactor with `add_edge`

* Fix bug for empty `edges_to_excl`

* Improve `convert_nx_to_pyg`

* Fix bug in `plot_pyg_data`

* Test `convert_nx_to_pyg` on multimers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update `CHANGELOG.md`

* Fix version in `CHANGELOG.md`

* Handle corner cases

* Handle NaNs in coordinates

* Add PyG install to CI

* typo in CI config

* bump torch versions in CI

* make pyg-related tests conditional on pyg installation

* Try fixing graph attributes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo and extend amino acid 3to1, 1to3 mappings

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Adapt imports of amino acid codes

* add semicolon to version

* remove wildcard version number for pyyaml

* fix typo

* fix additional typos

* Extend aggregation to vectors

* Implement `aggregate_feature_over_residues`

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add docstring and aggregation type

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import literal from typing extensions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add missing `median` in exception message

* Fix `nullcontext`

* fix dataset test

* fix division by zero errors in edge colouring

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changelog

* Separate and improve `remove_alt_locs`

Removal of alt_locs is separated from removal of insertions. Additionally, alt_locs with higher occupancies are now kept.

* Test `remove_alt_locs`

* Rename test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Set `insertions=True` by default

* Make `alt_locs` configurable (TODO `include` case)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use typing_extensions literal for 3.7 compatibility

* use typing extensions literal for 3.7 compatibility

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve hbond donor/acceptor assignment robustness

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* replace trailing ":" in insertions

* fix test and hbond granularity inference

* Add altloc identifier to node ID

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix test

* fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* actually fix test

* update changelog

* Fix typo

---------

Co-authored-by: Arian Jamasb <arjamasb@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Df processing #216 (#222)

* docstrings and df processing funcs #216

* docstrings

* add test

* lint test

* fix test

* fix typo in test

* Update changelog

* fix typo in test

* fix broken test

* fix broken test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add hetatm removal to test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use atomic granularity

* fix syntax error

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bugs in test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix test

* typo

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Minor patch `convert_nx_to_pyg` #280 (#281)

* nx_to_pyg bug fix #280

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>

* changes for 1.6.0 (#279)

* changes for 1.6.0

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Enable PDBManager root to be set to an arbitrary location

* add initial tests

* update changelog

* add tutorial notebook

* Allow all chains in a complex to be exported together

* add module-level import

* Remove old, unused PDBManager prototype file

* add parsing & checks for unavailable PDB structures

* fix download checker

* actually fix download checker

* add availability filter

* FoldComp ML Datasets (#284)

* add foldcomp dataset util

* clean up

* add import warnings

* add foldcomp dataset extra dependencies

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* exclude foldcomp from notebook tests. download too big :(

* update changelog

* add lightning datamodule wrapper

* add transform functionality

* docs: add new module to API reference

* update notebook

* fix: fix paths issue on setup

* add foldcomp dataset tutorial to docs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add stage param to setup

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Default to export model 1's chains only in PDBManager, and clean-up notebook and utilities

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add tutorial nblink

* add tutorial to datasets sections

* mv pdb data to ml API

* rm pyg dataset import

* rm unused code

* fix annotation

* add MMTF download format

* refactor dependency utils

* refactor graphein.utils.utils.import_message

* refactor graphein.protein.utils.is_tool

* update .gitignore

* ignore cif too

* ignore cif too

* ignore foldcomp files

* catch straggling erroneous imports

* ignore mol2

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update folding utils

* add max batch option

* add foldcomp utils

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add notebook updates [WIP]

* move manager class into graphein.ml

* remove datasets init

* fix import util refactor I didn't catch

* add PDBManager to __init__

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix oligomeric filtering

* update notebook

* fix dataset init

* fix protein.coord renaming in tensor module

* add try/except to pyg-related datasets

* add try/except to pyg-related datasets

* add mmseqs to CI build

* rollback dssp install to conda

* ignore pdb manager notebook in minimal tests

* fix code smell

* fix metrics

* shorten line lengths

* add minimum scipy version

* remove python 3.7 from CI

* Add Torch 2.0.0 to CI

* add note about multiple split strategies

* add torch cluster install to CI

* update dockerfile to torch 2.0

* switch docker pytorch 1.13 for VMD python version conflict

* switch out torchtyping for jaxtyping

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update tensor shape syntax for jaxtyping

* remove torch-dependent tests from minimal install testing

* update test ignores

* install dssp from apt, rather than conda in docker

* update typing extensions version

* Update citation (#287)

* update citation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Support MMTF & rename pdb_path to path throughout (#293)

* rename pdb_path to path throughout

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* install from biopandas bleeding edge

* fix bleeding edge biopandas install

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update to bleeding edge biopandas

* [pre-commit.ci] pre-commit autoupdate (#294)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/psf/black: 23.1.0 → 23.3.0](psf/black@23.1.0...23.3.0)

* pin pandas to <2.0.0

* Bump AF2 version

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>

* update path in notebooks

* Add missing import #296 (#297)

* update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Prep for 1.7.0 release (#292)

* update version string

* update readme

* update doc version

* update changelog

* Add autopublish workflow (#298)

* Add autopublish workflow

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update version for 1.7.0

* update workflow version

* remove rogue print statement (#302)

* Consistent conversion to undirected graphs (#301)

* Fix `convert_nx_to_pyg` to return undirected graph

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix symmetrization of edges of different kinds

* Clean

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix case when `edge_index` is not desired

* Test directed/undirected conversion consistency

* Update contributors

* Update CHANGELOG.md

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add graphein install to tutorial notebook #306

* Tensor fixes (#307)

* add PSW to nonstandard residues

* improve insertion and non-standard residue handling

* refactor chain selection

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused verbosity arg

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix chain selection in tests

* fix chain selection in tutorial notebook

* fix notebook chain selection

* fix chain selection typehint

* Update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add NLW as a nonstandard residue

* Export only first model of each downloaded PDB file, and typecast model_id column to str to avoid to_pdb() errors

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Track split names for edge cases in dataset splitting

* Add fix for scenario where downloaded PDB files do not contain ATOMs for an entry's listed chains

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Cam <73625486+kamurani@users.noreply.github.com>
Co-authored-by: Cam <73625486+cimranm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>
Co-authored-by: Anton Bushuiev <67932762+anton-bushuiev@users.noreply.github.com>
Co-authored-by: Ryan Greenhalgh <35999546+rg314@users.noreply.github.com>

* Add structure format parameter to allow mmtf manipulation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changelog

---------

Co-authored-by: Alex Morehead <acmwhb@missouri.edu>
Co-authored-by: Cam <73625486+kamurani@users.noreply.github.com>
Co-authored-by: Cam <73625486+cimranm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Anton Bushuiev <67932762+anton-bushuiev@users.noreply.github.com>
Co-authored-by: Ryan Greenhalgh <35999546+rg314@users.noreply.github.com>
Labels
1 - Priority P1 High Priority

1 participant