Development of PDBManager Class (WIP) #272

Merged 131 commits on Mar 31, 2023
Commits
5578157
add PDB manager #270
a-r-j Feb 24, 2023
8c177e1
add download method
a-r-j Feb 24, 2023
1d2bb0b
add clustering utilities
a-r-j Feb 24, 2023
6a10c9d
Add dataset splits functionality and add new documentation
amorehead Feb 25, 2023
3012ffe
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 25, 2023
d304a06
Resolve merge conflicts with remote
amorehead Feb 25, 2023
228946f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 25, 2023
a2f1424
Remove unused test
amorehead Feb 25, 2023
a6d2137
Address lingering SonarCloud concerns
amorehead Feb 25, 2023
052801b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 25, 2023
685a2e6
add deposition date parsing
a-r-j Feb 26, 2023
abeef32
remove pdb.py
a-r-j Feb 26, 2023
5412622
add chain extraction util
a-r-j Feb 26, 2023
e4ffe3d
add chain writing method
a-r-j Feb 26, 2023
ae8a246
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 26, 2023
f520f98
After fixing merge conflicts, add more filters and add time-based splits
amorehead Feb 26, 2023
22122b3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 26, 2023
c4d8de4
Fix up SonarCloud concerns
amorehead Feb 26, 2023
67bc788
Improve verbiage surrounding PDB resolutions
amorehead Feb 27, 2023
3b42dab
Simplify code and improve variable names
amorehead Feb 27, 2023
9ed7171
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 27, 2023
1878f78
Track names of splits in df_splits
amorehead Feb 27, 2023
67524a8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 27, 2023
d252d2e
Fix column naming during merging of DataFrame splits
amorehead Feb 27, 2023
81e4c23
add additional properties
a-r-j Feb 27, 2023
279ad3e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 27, 2023
1baf309
refactor clustering to allow file caching and overwriting
a-r-j Feb 27, 2023
e27c11c
Merge branch 'pdb_manager' of https://github.com/amorehead/graphein i…
a-r-j Feb 27, 2023
408ab6a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 27, 2023
f9e8d8d
add description to assert statements
a-r-j Feb 27, 2023
bf36dc2
Merge branch 'pdb_manager' of https://github.com/amorehead/graphein i…
a-r-j Feb 27, 2023
af2818b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 27, 2023
b6c52d8
Add extra documentation around clustering function, and address small…
amorehead Feb 27, 2023
407a80c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 27, 2023
f6e4e40
add method to write selection to CSV
a-r-j Feb 27, 2023
7821d9f
Merge branch 'pdb_manager' of https://github.com/amorehead/graphein i…
a-r-j Feb 27, 2023
59b95b2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 27, 2023
3b7d4c9
improve from_fasta documentation
a-r-j Feb 27, 2023
7797e29
Merge branch 'pdb_manager' of https://github.com/amorehead/graphein i…
a-r-j Feb 27, 2023
7a66e91
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 27, 2023
7a6ff72
Enable code reuse for length filters
amorehead Feb 27, 2023
49b594d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 27, 2023
624ed1b
Minor documentation changes to FASTA write-out function
amorehead Feb 27, 2023
a36c960
Add ability to perform most API calls for a subset of splits
amorehead Feb 27, 2023
88f5f14
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 27, 2023
45c5a6f
Update .gitignore
amorehead Feb 27, 2023
44663bb
Fix missing download call, and add more documentation to download fun…
amorehead Feb 27, 2023
ed6c3e4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 27, 2023
fd56f47
Fix small bug when merging different splits together
amorehead Feb 27, 2023
8df967c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 27, 2023
540662b
Fix bug in length filtering functions, fix print bugs in utils, and a…
amorehead Feb 28, 2023
0c4c39e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 28, 2023
9eac3e9
Fix string formatting
amorehead Feb 28, 2023
8462fa5
Update PDB write-out logic and documentation
amorehead Feb 28, 2023
0fb19cc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 28, 2023
2ba6c99
Add PDB download workaround for PDBs that can no longer be downloaded
amorehead Feb 28, 2023
587f900
Make exception more specific
amorehead Mar 1, 2023
055a192
Add TQDM for data split exporting
amorehead Mar 2, 2023
35d1064
Enable PDBManager root to be set to an arbitrary location
amorehead Mar 21, 2023
ff675bd
Merge branch 'a-r-j:pdb_manager' into pdb_manager
amorehead Mar 21, 2023
9ae9375
add initial tests
a-r-j Mar 21, 2023
054ff60
update changelog
a-r-j Mar 21, 2023
8bb6e63
add tutorial notebook
a-r-j Mar 21, 2023
3a316ba
Merge branch 'pdb_manager' of https://github.com/amorehead/graphein i…
a-r-j Mar 21, 2023
e1ac7b0
Allow all chains in a complex to be exported together
amorehead Mar 21, 2023
a4969d6
add module-level import
a-r-j Mar 22, 2023
efb3bc4
Remove old, unused PDBManager prototype file
amorehead Mar 22, 2023
7335baa
add parsing & checks for unavailable PDB structures
a-r-j Mar 25, 2023
f8ad185
fix download checker
a-r-j Mar 25, 2023
f8ae78e
actually fix download checker
a-r-j Mar 25, 2023
1436907
add availability filter
a-r-j Mar 25, 2023
c0cdaf0
Default to export model 1's chains only in PDBManager, and clean-up n…
amorehead Mar 27, 2023
b7226a0
Merge branch 'master' into pdb_manager
a-r-j Mar 27, 2023
5ff520e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 27, 2023
e65935d
add tutorial nblink
a-r-j Mar 27, 2023
0c96f38
add tutorial to datasets sections
a-r-j Mar 27, 2023
4d1007c
mv pdb data to ml API
a-r-j Mar 27, 2023
64ec09f
rm pyg dataset import
a-r-j Mar 27, 2023
96f3ac5
rm unused code
a-r-j Mar 27, 2023
618bcee
fix annotation
a-r-j Mar 27, 2023
8d5ca7a
add MMTF download format
a-r-j Mar 29, 2023
d8e5c62
refactor dependency utils
a-r-j Mar 29, 2023
142c8d9
refactor graphein.utils.utils.import_message
a-r-j Mar 29, 2023
25f576c
refactor graphein.protein.utils.is_tool
a-r-j Mar 29, 2023
bd4c60c
update .gitignore
a-r-j Mar 29, 2023
746ad26
ignore cif too
a-r-j Mar 29, 2023
98f0075
ignore cif too
a-r-j Mar 29, 2023
0a09b00
ignore foldcomp files
a-r-j Mar 29, 2023
5ad764c
catch straggling erroneous imports
a-r-j Mar 29, 2023
fe84d40
ignore mol2
a-r-j Mar 29, 2023
be63c6d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 29, 2023
dd7ce39
update folding utils
a-r-j Mar 29, 2023
9737fd6
add max batch option
a-r-j Mar 29, 2023
9b6c832
add foldcomp utils
a-r-j Mar 29, 2023
a899352
Merge branch 'pdb_manager' of https://github.com/amorehead/graphein i…
a-r-j Mar 29, 2023
2bf820d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 29, 2023
ea0812d
add notebook updates [WIP]
a-r-j Mar 29, 2023
c82a344
Merge branch 'pdb_manager' of https://github.com/amorehead/graphein i…
a-r-j Mar 29, 2023
4a3bf5c
move manager class into graphein.ml
a-r-j Mar 29, 2023
cdcce18
remove datasets init
a-r-j Mar 29, 2023
88c6d91
fix import util refactor I didn't catch
a-r-j Mar 29, 2023
9e3315e
add PDBmanager to __init__
a-r-j Mar 30, 2023
9b11370
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 30, 2023
4f47c6b
fix oligomeric filtering
a-r-j Mar 30, 2023
81d4359
update notebook
a-r-j Mar 30, 2023
86d5118
fix dataset init
a-r-j Mar 30, 2023
5f1c525
fix protein.coord renaming in tensor module
a-r-j Mar 30, 2023
e572f40
add try/except to pyg-related datasets
a-r-j Mar 30, 2023
eacff28
add try/except to pyg-related datasets
a-r-j Mar 30, 2023
8417a0e
add mmseqs to CI build
a-r-j Mar 30, 2023
d1d713d
rollback dssp install to conda
a-r-j Mar 30, 2023
d727656
ignore pdb manager notebook in minimal tests
a-r-j Mar 30, 2023
4115bd5
fix code smell
a-r-j Mar 30, 2023
51ca8f6
fix metrics
a-r-j Mar 30, 2023
33926b6
shorten line lengths
a-r-j Mar 31, 2023
6f7f0c3
add minimum scipy version
a-r-j Mar 31, 2023
4f25b5c
remove python 3.7 from CI
a-r-j Mar 31, 2023
dc6112d
Add Torch 2.0.0 to CI
a-r-j Mar 31, 2023
78379f9
add note about multiple split strategies
a-r-j Mar 31, 2023
52efc31
add torch cluster install to CI
a-r-j Mar 31, 2023
cdbfef3
Merge branch 'pdb_manager' of https://github.com/amorehead/graphein i…
a-r-j Mar 31, 2023
9aae6e9
update dockerfile to torch 2.0
a-r-j Mar 31, 2023
418669c
switch docker pytorch 1.13 for VMD python version conflict
a-r-j Mar 31, 2023
6a9a234
switch out torchtyping for jaxtyping
a-r-j Mar 31, 2023
37ab1ec
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 31, 2023
0a8affb
update tensor shape syntax for jaxtyping
a-r-j Mar 31, 2023
ef5b811
update tensor shape syntax for jaxtyping
a-r-j Mar 31, 2023
ae1a524
remove torch-dependent tests from minimal install testing
a-r-j Mar 31, 2023
160409b
update test ignores
a-r-j Mar 31, 2023
b19df0d
install dssp from apt, rather than conda in docker
a-r-j Mar 31, 2023
cd4db9e
update typing extensions version
a-r-j Mar 31, 2023
10 changes: 7 additions & 3 deletions .github/workflows/build.yaml
@@ -17,7 +17,7 @@ jobs:
strategy:
matrix:
python-version: [3.8, 3.9]
torch: [1.12.0, 1.13.0]
torch: [1.12.0, 1.13.0, 2.0.0]
#include:
# - torch: 1.6.0
# torchvision: 0.7.0
@@ -62,11 +62,15 @@ jobs:
# run: source activate graphein-dev
- name: Install DSSP
run: conda install -c salilab dssp
- name: Install mmseqs
run: mamba install -c conda-forge -c bioconda mmseqs2
- name: Install PyTorch
run: conda install -c pytorch pytorch==${{matrix.torch}} cpuonly
run: mamba install -c pytorch pytorch==${{matrix.torch}} cpuonly
#run: pip install torch==${{matrix.torch}}+cpu torchvision==${{matrix.torchvision}}+cpu -f https://download.pytorch.org/whl/torch_stable.html
- name: Install PyG
run: conda install -c pyg pyg
run: mamba install -c pyg pyg
- name: Install torch-cluster
run: mamba install pytorch-cluster -c pyg
- name: Install BLAST
run: sudo apt install ncbi-blast+
- name: Install Graphein
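
The widened `torch` entry multiplies the number of CI jobs, because a GitHub Actions matrix runs one job per combination of its values. A minimal sketch of that expansion, using the values from the updated matrix above (the variable names are illustrative and do not come from the workflow):

```python
from itertools import product

# Values copied from the updated matrix in build.yaml.
python_versions = ["3.8", "3.9"]
torch_versions = ["1.12.0", "1.13.0", "2.0.0"]

# A matrix strategy yields one job per (python, torch) combination.
jobs = list(product(python_versions, torch_versions))

for python_version, torch_version in jobs:
    print(f"python {python_version} / torch {torch_version}")

print(f"{len(jobs)} jobs in total")
```

Adding a third torch version therefore takes the build from four jobs to six, assuming no `include` or `exclude` rules are active (the `include` block in the workflow is commented out).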
6 changes: 3 additions & 3 deletions .github/workflows/minimal__install.yaml
@@ -25,7 +25,7 @@ jobs:

strategy:
matrix:
python-version: [3.7, 3.8, 3.9, 3.11]
python-version: [3.8, 3.9, 3.11]
steps:
- name: Checkout repository
uses: actions/checkout@v3
@@ -45,6 +45,6 @@ jobs:
- name: Install Dev Dependencies
run: pip install -r .requirements/dev.in
- name: Run unit tests and generate coverage report
run: pytest . --ignore-glob="tests/protein/tensor"
run: pytest . --ignore-glob="tests/protein/tensor" --ignore="tests/ml/test_conversion.py" --ignore="tests/ml/test_torch_geometric_dataset.py"
- name: Test notebook execution
run: pytest --nbval-lax notebooks/ --current-env --ignore-glob="notebooks/dataloader_tutorial.ipynb" --ignore-glob="notebooks/higher_order_graphs.ipynb" --ignore-glob="notebooks/protein_graph_analytics.ipynb" --ignore-glob="notebooks/subgraphing_tutorial.ipynb" --ignore-glob="notebooks/splitting_a_dataset.ipynb" --ignore-glob="notebooks/protein_tensors.ipynb" --ignore-glob="notebooks/datasets_and_dataloaders.ipynb" --ignore-glob="notebooks/foldcomp.ipynb"
run: pytest --nbval-lax notebooks/ --current-env --ignore-glob="notebooks/dataloader_tutorial.ipynb" --ignore-glob="notebooks/higher_order_graphs.ipynb" --ignore-glob="notebooks/protein_graph_analytics.ipynb" --ignore-glob="notebooks/subgraphing_tutorial.ipynb" --ignore-glob="notebooks/splitting_a_dataset.ipynb" --ignore-glob="notebooks/protein_tensors.ipynb" --ignore-glob="notebooks/datasets_and_dataloaders.ipynb" --ignore-glob="notebooks/foldcomp.ipynb" --ignore-glob="notebooks/creating_datasets_from_the_pdb.ipynb"
25 changes: 25 additions & 0 deletions .gitignore
@@ -145,6 +145,31 @@ dmypy.json
# Local test files
datasets/examples/*
*.ent
*.pdb
*.pt
*.dbn
*.cif
*.zip
*.mol2
datasets/regnetwork/human
notebooks/lightning_logs
pdb/
cc-to-pdb.tdd
entries.idx
pdb_cluster_all_seqs.fasta
pdb_cluster_cluster.tsv
pdb_cluster_rep_seq_id_*_c_*.fasta
pdb_bundle_index.txt
pdb_entry_type.txt
pdb_seqres.txt
pdb_seqres.txt.gz
pdb.fasta
resolu.idx
source.idx

# Foldcomp files
afdb_swissprot_v4
afdb_swissprot_v4.*

# Local test directories
tmp/
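
To get a rough idea of which local files the new glob patterns cover, they can be tried against sample names. This is only an approximation: it uses Python's `fnmatch`, whose rules are close to but not identical to Git's ignore rules (directory patterns such as `pdb/` are not modelled), and the filenames are made up for illustration.

```python
from fnmatch import fnmatch

# A few of the patterns added to .gitignore in this change.
patterns = [
    "*.cif",
    "*.mol2",
    "pdb_cluster_rep_seq_id_*_c_*.fasta",
    "afdb_swissprot_v4.*",
]


def is_ignored(filename: str) -> bool:
    """Return True if the bare filename matches any pattern (fnmatch semantics)."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


# Hypothetical filenames, chosen only to exercise the patterns.
for name in ["1abc.cif", "pdb_cluster_rep_seq_id_0.3_c_0.8.fasta", "notes.txt"]:
    print(name, is_ignored(name))
```

For an authoritative answer on a real checkout, `git check-ignore -v <path>` reports the rule that matched.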
6 changes: 3 additions & 3 deletions .requirements/base.in
@@ -15,9 +15,9 @@ rich-click
seaborn
pyyaml>=5.1,<6.0
scikit-learn
scipy
scipy>=1.8
tqdm
typing_extensions
typing_extensions==4.5.0
wget
xarray
torchtyping
jaxtyping
29 changes: 20 additions & 9 deletions CHANGELOG.md
@@ -1,25 +1,34 @@
### 1.6.1 - UNRELEASED

* `Protein` tensors have coordinates renamed from `Protein.x` to `Protein.coords`. [#272](https://github.com/a-r-j/graphein/pull/272)
* Tensor types are now defined using [`jaxtyping`](https://github.com/google/jaxtyping), removing the `torchtyping` dependency [#272](https://github.com/a-r-j/graphein/pull/272)
* Drops explicit Python 3.7 support. Colab now runs on 3.8+. [#272](https://github.com/a-r-j/graphein/pull/272)
* Dockerfile now builds from `pytorch/pytorch:1.13.0-cuda11.6-cudnn8-runtime` (replaces `pytorch/pytorch:1.9.1-cuda11.1-cudnn8-runtime`) [#272](https://github.com/a-r-j/graphein/pull/272)
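
For downstream code that still reads `Protein.x`, one possible bridge is a deprecated alias. A minimal sketch, assuming a hypothetical stand-in class `ProteinLike` (this is not Graphein's `Protein` class, and nothing in this change indicates that the library ships such an alias):

```python
import warnings


class ProteinLike:
    """Hypothetical stand-in used only to illustrate the rename."""

    def __init__(self, coords):
        # New attribute name, matching the changelog entry.
        self.coords = coords

    @property
    def x(self):
        # Old attribute name: still readable, but flagged as deprecated.
        warnings.warn(
            "`x` was renamed to `coords`", DeprecationWarning, stacklevel=2
        )
        return self.coords


protein = ProteinLike(coords=[[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
print(protein.coords)
```

In practice the simpler route is to replace every `.x` access on protein tensors with `.coords` when upgrading.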

#### New Features
* [FoldComp Dataset] - [#284](https://github.com/a-r-j/graphein/pull/284) - Create ML datasets from FoldComp databases.

* [PDBManager] - [#272](https://github.com/a-r-j/graphein/pull/272) Adds a utility for creating custom dataset splits from the PDB.
* [FoldComp Dataset] - [#284](https://github.com/a-r-j/graphein/pull/284) - Create ML datasets from FoldComp databases.
* [ESM] - [#284](https://github.com/a-r-j/graphein/pull/284) - Wrapper for ESMFold batch folding & embedding.
* [Downloads] MMTF downloading now supported in download utilities. [#272](https://github.com/a-r-j/graphein/pull/272)

### 1.6.0dev - UNRELEASED
### 1.6.0 - 18/03/2023

#### New Features

* [Metrics] - [#245](https://github.com/a-r-j/graphein/pull/221) Adds a selection of structural metrics relevant to protein structures.
* [Tensor Operations] - [#244](https://github.com/a-r-j/graphein/pull/244) Adds suite of utilities for working directly with tensor-based representations of proteins (graphein.protein.tensor).
* [Tensor Operations] - [#244](https://github.com/a-r-j/graphein/pull/244) Adds suite of utilities for working with ESMfold (graphein.protein.folding_utils).



#### Improvements

* [Feature] = [#277](https://github.com/a-r-j/graphein/pull/227) Adds support for pathlib paths for protein graph creation. [#269](https://github.com/a-r-j/graphein/issues/269)
* [Logging] - [#221](https://github.com/a-r-j/graphein/pull/221) Adds global control of logging with `graphein.verbose(enabled=False)`.
* [Logging] - [#242](https://github.com/a-r-j/graphein/pull/242) Adds control of protein graph construction logging. Resolves [#238](https://github.com/a-r-j/graphein/issues/238)

#### Protein
* [Bugfix] - [#222]https://github.com/a-r-j/graphein/pull/222) Fixes entrypoint for user-defined `df_processing_funcs` ([#216](https://github.com/a-r-j/graphein/issues/216))

* [Bugfix] - [#222]<https://github.com/a-r-j/graphein/pull/222)> Fixes entrypoint for user-defined `df_processing_funcs` ([#216](https://github.com/a-r-j/graphein/issues/216))
* [Feature] = [#263](https://github.com/a-r-j/graphein/pull/263) Adds control of Alt Loc selection strategy. N.b. Default `ProteinGraphConfig` changed to include insertions by default (`insertions=True`) and `alt_locs="max_occupancy"`.
* [Feature] - [#264](https://github.com/a-r-j/graphein/pull/264) Adds entrypoint to `graphein.protein.graphs.construct_graph` for passing in a BioPandas dataframe directly.
* [Feature] - [#229](https://github.com/a-r-j/graphein/pull/220) Adds support for filtering KNN edges based on self-loops and chain membership. Contribution by @anton-bushuiev.
@@ -36,31 +45,33 @@

* [Bugfix] - [#268](https://github.com/a-r-j/graphein/pull/268) Fixes 'sequence' metadata feature for atomistic graphs, removing duplicate residues. Contribution by @kamurani.


#### ML

* [Bugfix] - [#234](https://github.com/a-r-j/graphein/pull/234) - Fixes bugs and improves `conversion.convert_nx_to_pyg` and `visualisation.plot_pyg_data`. Removes distance matrix (`dist_mat`) from default set of features converted to tensor.

#### Utils

* [Improvement] - [#234](https://github.com/a-r-j/graphein/pull/234) - Adds `parse_aggregation_type` to retrieve aggregation functions.

#### RNA

* [Bugfix] - [#281](https://github.com/a-r-j/graphein/pull/234) - Bugfix for nx->PyG conversion for graphs containing edges without "kind" attributes. Contribution by @rg314.

#### Constants
* [Improvement] - [#234](https://github.com/a-r-j/graphein/pull/234) - Adds 1 to 3 mappings to `graphein.protein.resi_atoms`.

* [Improvement] - [#234](https://github.com/a-r-j/graphein/pull/234) - Adds 1 to 3 mappings to `graphein.protein.resi_atoms`.

#### Documentation

* [Tensor Module] - [#244](https://github.com/a-r-j/graphein/pull/244) Documents new graphein.protein.tensor module.
* [CI] - [#244](https://github.com/a-r-j/graphein/pull/244) Updates to intersphinx maps


#### Package

* [CI] - [#244](https://github.com/a-r-j/graphein/pull/244) CI now runs for python 3.8, 3.9 and torch 1.12.0 and 1.13.0
* [CI] - [#244](https://github.com/a-r-j/graphein/pull/244) Separate builds for core library and library with DL dependencies.
* [Licence] - [#244](https://github.com/a-r-j/graphein/pull/244) Bump to 2023


### 1.5.2 - 19/9/2022

#### Protein
7 changes: 5 additions & 2 deletions Dockerfile
@@ -1,4 +1,4 @@
FROM pytorch/pytorch:1.9.1-cuda11.1-cudnn8-runtime
FROM pytorch/pytorch:1.13.0-cuda11.6-cudnn8-runtime

RUN apt-get update \
&& apt-get -y install build-essential ffmpeg libsm6 libxext6 wget git \
@@ -12,6 +12,10 @@ RUN apt-get update && apt-get install -y iputils-ping && apt-get clean \
RUN apt-get update && apt-get install -y ncbi-blast+ && apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# Install DSSP
RUN apt-get update && apt-get install -y dssp && apt-get clean \
&& rm -rf /var/lib/apt/lists/*

ENV CONDA_ALWAYS_YES=true


@@ -41,7 +45,6 @@ ENV PATH /getcontacts:$PATH
RUN conda install -c fvcore -c iopath -c conda-forge fvcore iopath
RUN conda install -c pytorch3d pytorch3d
RUN conda install -c dglteam dgl
RUN conda install -c salilab dssp
RUN conda install -c conda-forge ipywidgets

RUN export CUDA=$(python -c "import torch; print('cu'+torch.version.cuda.replace('.',''))") \
1 change: 1 addition & 0 deletions docs/source/datasets.rst
@@ -13,6 +13,7 @@ Summaries

notebooks/dataloader_tutorial.nblink
notebooks/foldcomp.nblink
notebooks/creating_datasets_from_the_pdb.nblink
datasets/pscdb
notebooks/pscdb_processing.nblink
notebooks/pscdb_baselines.nblink
3 changes: 3 additions & 0 deletions docs/source/notebooks/creating_datasets_from_the_pdb.nblink
@@ -0,0 +1,3 @@
{
"path": "../../../notebooks/creating_datasets_from_the_pdb.ipynb"
}
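
The `path` value only works if it resolves to the notebook from where the `.nblink` file sits. A quick check, assuming the path is interpreted relative to the directory containing the `.nblink` file (which is how nbsphinx-link is generally described as behaving):

```python
import json
import posixpath

# Location of the new link file and its contents, as added in this change.
nblink_location = "docs/source/notebooks/creating_datasets_from_the_pdb.nblink"
nblink_text = '{"path": "../../../notebooks/creating_datasets_from_the_pdb.ipynb"}'

# Resolve the relative target against the directory that holds the link file.
relative_target = json.loads(nblink_text)["path"]
resolved = posixpath.normpath(
    posixpath.join(posixpath.dirname(nblink_location), relative_target)
)
print(resolved)
```

Three `..` segments climb from `docs/source/notebooks/` to the repository root, so the link lands on the top-level `notebooks/` directory.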
14 changes: 1 addition & 13 deletions graphein/grn/features/node_features.py
@@ -1,21 +1,9 @@
"""Node featurisation utilities for Gene Regulatory Networks."""
from typing import Any, Dict

from bioservices import HGNC, UniProt
from loguru import logger as log

from graphein.utils.utils import import_message

try:
from bioservices import HGNC, UniProt
except ImportError:
message = import_message(
submodule="graphein.grn.features.node_features",
package="bioservices",
conda_channel="bioconda",
pip_install=True,
)
log.warning(message)


def add_sequence_to_nodes(n: str, d: Dict[str, Any]):
"""
13 changes: 4 additions & 9 deletions graphein/ml/clustering.py
@@ -16,11 +16,8 @@

import networkx as nx
import numpy as np
import pandas as pd
from Bio import SeqIO

from graphein.protein.utils import is_tool


def build_fasta_file_from_mapping(
pdb_sequence_mapping: Dict[str, str],
@@ -46,7 +43,7 @@ def build_fasta_file_from_graphs(
if chains is None:
chains = ["A"] * len(graphs)
mapping = {
g.name + "_" + chain: g.graph[f"sequence_{chain}"]
f"{g.name}_{chain}": g.graph[f"sequence_{chain}"]
for g, chain in zip(graphs, chains)
}

@@ -104,9 +101,7 @@ def get_seq_records(
f"WARNING in {get_seq_records.__name__} sequence {record.seq.id} from file "
f"{filename} is not compatible with declared alphabet {str(alphabet)}\n"
)
if return_as_dictionary:
return SeqIO.to_dict(records)
return records
return SeqIO.to_dict(records) if return_as_dictionary else records


def create_pairs_for_clustering(
@@ -497,8 +492,8 @@ def generate_random_sets(
n = 0
in_other_tests = []
while n < number_of_sets:
train_set_name = train_set_key + f"_{n:02}"
test_set_name = test_set_key + f"_{n:02}"
train_set_name = f"{train_set_key}_{n:02}"
test_set_name = f"{test_set_key}_{n:02}"
with open(train_set_name, mode="w") as train:
with open(test_set_name, mode="w") as test:
ids_in_test = []
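
The f-string rewrite keeps the same file names as the old concatenation: the `:02` format spec zero-pads the counter to two digits. A small standalone sketch of the naming scheme (the helper `numbered_set_names` is illustrative and is not part of `clustering.py`):

```python
from typing import List, Tuple


def numbered_set_names(
    train_set_key: str, test_set_key: str, number_of_sets: int
) -> List[Tuple[str, str]]:
    """Build (train, test) file-name pairs with a zero-padded two-digit suffix."""
    return [
        (f"{train_set_key}_{n:02}", f"{test_set_key}_{n:02}")
        for n in range(number_of_sets)
    ]


print(numbered_set_names("train", "test", 3))
```

Zero-padding keeps the files in numeric order when sorted by name, as long as there are fewer than 100 sets.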
2 changes: 1 addition & 1 deletion graphein/ml/conversion.py
@@ -15,7 +15,7 @@
import torch
from loguru import logger as log

from graphein.utils.utils import import_message
from graphein.utils.dependencies import import_message

try:
import torch
15 changes: 10 additions & 5 deletions graphein/ml/datasets/__init__.py
@@ -1,5 +1,10 @@
from .torch_geometric_dataset import (
InMemoryProteinGraphDataset,
ProteinGraphDataset,
ProteinGraphListDataset,
)
from .pdb_data import PDBManager

try:
from .torch_geometric_dataset import (
InMemoryProteinGraphDataset,
ProteinGraphDataset,
ProteinGraphListDataset,
)
except (NameError, ImportError):
pass
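
Wrapping the PyG-backed imports in `try`/`except` lets `PDBManager` be imported on installs without the deep learning dependencies. The general pattern can be sketched with the standard library alone; the helper `optional_import` below is illustrative and is not part of Graphein:

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it is caught too.
        return None


available = optional_import("json")
missing = optional_import("a_module_that_does_not_exist_12345")
print(available is not None, missing is None)
```

The diff also catches `NameError`; the change does not state why, so this sketch handles `ImportError` only.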
4 changes: 2 additions & 2 deletions graphein/ml/datasets/foldcomp_dataset.py
@@ -22,13 +22,13 @@
from tqdm import tqdm

from graphein.protein.tensor import Protein
from graphein.utils.utils import import_message
from graphein.utils.dependencies import import_message

try:
import foldcomp
except ImportError:
message = import_message(
"graphein.ml.datasets.foldcomp", "foldcomp", None, True
"graphein.ml.datasets.foldcomp", "foldcomp", None, True, extras=True
)
log.warning(message)
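
The call above passes a submodule name, a package name, an optional conda channel, a pip flag and the new `extras` keyword. A rough sketch of how such a helper might assemble its warning text follows. `build_install_hint` is a hypothetical stand-in, not `graphein.utils.dependencies.import_message`; the parameter names mirror the keyword arguments visible in this diff, while the message wording and the `graphein[extras]` hint are assumptions.

```python
from typing import Optional


def build_install_hint(
    submodule: str,
    package: str,
    conda_channel: Optional[str] = None,
    pip_install: bool = False,
    extras: bool = False,
) -> str:
    """Assemble an install hint for a missing optional dependency (illustrative)."""
    parts = [f"To use {submodule}, install {package}."]
    if conda_channel is not None:
        parts.append(f"conda install -c {conda_channel} {package}")
    if pip_install:
        parts.append(f"pip install {package}")
    if extras:
        # Assumed wording: the real message text is not shown in this diff.
        parts.append("pip install graphein[extras]")
    return " | ".join(parts)


hint = build_install_hint(
    "graphein.ml.datasets.foldcomp", "foldcomp", None, True, extras=True
)
print(hint)
```

Logging the hint as a warning rather than raising keeps the module importable, so the error only surfaces when the missing package is actually used.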
