update PDB urls to avoid deprecated ftp (#364)
* update PDB urls to avoid deprecated ftp

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add CI for torch 2.2.0

* restrict CI action triggers to only code files

* install torch cluster from pip

* resolve float equality check

* revert to mamba install

* switch CI to pip installs for torch, pyg and deps.

* remove torch vision

* install cpu pyg deps.

* update changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Arian Jamasb <arian.jamasb@roche.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
3 people committed Feb 7, 2024
1 parent e1ab15f commit 53290a5
Showing 7 changed files with 46 additions and 43 deletions.
44 changes: 13 additions & 31 deletions .github/workflows/build.yaml
@@ -4,11 +4,17 @@ on:
push:
paths-ignore:
- "README.md"
- "CHANGELOG.md"
- "CONTRIBUTORS.md"
- "CONTRIBUTING.md"
- "docs/**"

pull_request:
paths-ignore:
- "README.md"
- "CHANGELOG.md"
- "CONTRIBUTORS.md"
- "CONTRIBUTING.md"
- "docs/*"

jobs:
@@ -17,25 +23,7 @@ jobs:
strategy:
matrix:
python-version: [3.8, 3.9, "3.10"]
torch: [1.13.0, 2.0.0, 2.1.0]
#include:
# - torch: 1.6.0
# torchvision: 0.7.0
# - torch: 1.7.0
# torchvision: 0.8.1
# - torch: 1.8.0
# torchvision: 0.9.0
# - torch: 1.9.0
# torchvision: 0.10.0
# - torch: 1.8.0
# torchvision: 0.9.0
# python-version: 3.9
# - torch: 1.9.0
# torchvision: 0.10.0
# python-version: 3.8
# - torch: 1.9.0
# torchvision: 0.10.0
# python-version: 3.9
torch: [1.13.0, 2.1.0, 2.2.0]
# https://github.com/marketplace/actions/setup-miniconda#use-a-default-shell
defaults:
run:
@@ -52,27 +40,21 @@ jobs:
channels: "conda-forge, salilab, pytorch, pyg"
python-version: ${{ matrix.python-version }}
use-mamba: true
#- name: Set up Python ${{ matrix.python-version }}
# uses: actions/setup-python@v2
# with:
#python-version: ${{ matrix.python-version }}
#- name: Setup conda environment
# run: conda env create -n graphein-dev python=${{ matrix.python-version }}
#- name: Activate Conda Environment
# run: source activate graphein-dev
- name: Install Boost 1.7.3 (for DSSP)
run: conda install -c anaconda libboost=1.73.0
- name: Install DSSP
run: conda install dssp -c salilab
- name: Install mmseqs
run: mamba install -c conda-forge -c bioconda mmseqs2
- name: Install PyTorch
run: mamba install -c pytorch pytorch==${{matrix.torch}} cpuonly
#run: pip install torch==${{matrix.torch}}+cpu torchvision==${{matrix.torchvision}}+cpu -f https://download.pytorch.org/whl/torch_stable.html
#run: mamba install -c pytorch pytorch==${{matrix.torch}} cpuonly
run: pip install torch==${{matrix.torch}}+cpu -f https://download.pytorch.org/whl/torch_stable.html
- name: Install PyG
run: mamba install -c pyg pyg
#run: mamba install -c pyg pyg
run: pip install torch_geometric
- name: Install torch-cluster
run: mamba install pytorch-cluster -c pyg
#run: mamba install pytorch-cluster -c pyg
run: pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-${{matrix.torch}}+cpu.html
- name: Install BLAST
run: sudo apt install ncbi-blast+
- name: Install Graphein
3 changes: 3 additions & 0 deletions .github/workflows/minimal__install.yaml
@@ -5,6 +5,9 @@ on:
paths-ignore:
- "README.md"
- "docs/**"
- "CHANGELOG.md"
- "CONTRIBUTORS.md"
- "CONTRIBUTING.md"

pull_request:
paths-ignore:
8 changes: 5 additions & 3 deletions CHANGELOG.md
@@ -1,9 +1,11 @@
### 1.7.6 - UNRELEASED

#### Bugfixes
- Remove hydrogen isotopes as well in `graphein.protein.graphs.deprotonate_structure`. [#337](https://github.com/a-r-j/graphein/pull/337)
- Fixes bug in sidechain torsion angle computation for structures containing `PYL`/other non-standard amino acids ([#357](https://github.com/a-r-j/graphein/pull/357)). Fixes [#356](https://github.com/a-r-j/graphein/issues/356).
- In Pandas 1.2.0 and later, The default value of regex for `Series.str.replace()` will change from `True` to False. So we need use regular expressions explicitly now, to suppress a FutureWarning. By @StevenAZy ([#359](https://www.github.com/a-r-j/graphein/pull/359))

* Remove hydrogen isotopes as well in `graphein.protein.graphs.deprotonate_structure`. [#337](https://github.com/a-r-j/graphein/pull/337)
* Fixes bug in sidechain torsion angle computation for structures containing `PYL`/other non-standard amino acids ([#357](https://github.com/a-r-j/graphein/pull/357)). Fixes [#356](https://github.com/a-r-j/graphein/issues/356).
* Replaces RCSB PDB FTP urls with new API. [#364](https://github.com/a-r-j/graphein/pull/364)
* In Pandas `1.2.0` and later, The default value of regex for `Series.str.replace()` will change from `True` to `False`. So we need use regular expressions explicitly now, to suppress a FutureWarning. By @StevenAZy ([#359](https://www.github.com/a-r-j/graphein/pull/359))


### 1.7.5 - 27/10/2024
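The changelog note about `Series.str.replace()` comes down to passing `regex` explicitly. A minimal sketch with made-up strings (not Graphein code) showing the two call styles:

```python
import pandas as pd

# Illustrative only: pandas changed the default of `regex` in
# Series.str.replace(), so passing it explicitly silences the FutureWarning
# and makes the intent unambiguous.
s = pd.Series(["GLY.A", "ALA.B"])

print(s.str.replace(".", ":", regex=False))      # literal "." -> GLY:A, ALA:B
print(s.str.replace(r"\.\w+$", "", regex=True))  # pattern match -> GLY, ALA
```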
16 changes: 12 additions & 4 deletions graphein/ml/datasets/foldcomp_dataset.py
@@ -204,7 +204,9 @@ def processed_file_names(self):
def download(self):
"""Downloads foldcomp database if not already downloaded."""

if not all(os.path.exists(self.root / f) for f in self._database_files):
if not all(
os.path.exists(self.root / f) for f in self._database_files
):
log.info(f"Downloading FoldComp dataset {self.database}...")
curr_dir = os.getcwd()
os.chdir(self.root)
@@ -249,7 +251,9 @@ def _get_indices(self):
]
# Sub sample
log.info(f"Sampling fraction: {self.fraction}...")
accessions = random.sample(accessions, int(len(accessions) * self.fraction))
accessions = random.sample(
accessions, int(len(accessions) * self.fraction)
)
self.ids = accessions
log.info("Creating index...")
indices = dict(enumerate(accessions))
@@ -382,7 +386,9 @@ def __init__(
self.val_split = val_split
self.test_split = test_split
self.transform = (
self._compose_transforms(transform) if transform is not None else None
self._compose_transforms(transform)
if transform is not None
else None
)

if (
@@ -421,7 +427,9 @@ def _get_indices(self):
self.ids = ds.ids
ds.db.close()

def _split_data(self, train_split: float, val_split: float, test_split: float):
def _split_data(
self, train_split: float, val_split: float, test_split: float
):
"""Split the database into non-overlapping train, validation and test"""
if not hasattr(self, "ids"):
self._get_indices()
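The `foldcomp_dataset.py` hunks are mostly line re-wrapping from pre-commit; the sub-sampling logic itself is unchanged. A rough standalone sketch of that fraction-based sampling, with hypothetical accession IDs and a seed added only for the example:

```python
import random

# Hypothetical accession IDs standing in for a FoldComp database index.
accessions = [f"AF-Q{i:05d}-F1-model_v4" for i in range(1000)]

fraction = 0.1
random.seed(0)  # seeding is this sketch's addition, not the dataset's behaviour
subset = random.sample(accessions, int(len(accessions) * fraction))
print(len(subset))  # 100
```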
5 changes: 3 additions & 2 deletions graphein/ml/datasets/pdb_data.py
@@ -1,4 +1,5 @@
import gzip
import math
import os
import shutil
import subprocess
@@ -122,8 +123,8 @@ def __init__(
assert len(splits) == len(
split_ratios
), f"Number of splits ({splits}) must match number of split ratios ({split_ratios})."
assert (
sum(split_ratios) == 1.0
assert math.isclose(
sum(split_ratios), 1.0
), f"Split ratios must sum to 1.0: {split_ratios}."
self.split_ratios = split_ratios
# Time-based splits
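The `pdb_data.py` hunk ("resolve float equality check" in the commit message) replaces an exact `==` test on the sum of split ratios with `math.isclose`. A small sketch of why, using example ratios rather than the class defaults:

```python
import math

split_ratios = [0.7, 0.2, 0.1]

total = sum(split_ratios)
print(total)                     # 0.9999999999999999 on typical IEEE-754 doubles
print(total == 1.0)              # False -> the old assertion could fail spuriously
print(math.isclose(total, 1.0))  # True  -> the intended check
```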
7 changes: 4 additions & 3 deletions graphein/protein/utils.py
@@ -10,7 +10,6 @@
import tempfile
from functools import lru_cache, partial
from pathlib import Path
from shutil import which
from typing import Any, Dict, List, Optional, Tuple, Type, Union
from urllib.error import HTTPError
from urllib.request import urlopen
@@ -48,7 +47,9 @@ def get_obsolete_mapping() -> Dict[str, str]:
"""
obs_dict: Dict[str, str] = {}

response = urlopen("ftp://ftp.wwpdb.org/pub/pdb/data/status/obsolete.dat")
response = urlopen(
"https://files.wwpdb.org/pub/pdb/data/status/obsolete.dat"
)
for line in response:
entry = line.split()
if len(entry) == 4:
@@ -146,7 +147,7 @@ def download_pdb(
Download PDB structure from PDB.
If no structure is found, we perform a lookup against the record of
obsolete PDB codes (ftp://ftp.wwpdb.org/pub/pdb/data/status/obsolete.dat)
obsolete PDB codes (https://files.wwpdb.org/pub/pdb/data/status/obsolete.dat)
:param pdb_code: 4 character PDB accession code.
:type pdb_code: str
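The `utils.py` hunk swaps the deprecated wwPDB FTP URL for the HTTPS mirror in `get_obsolete_mapping`. A self-contained sketch of fetching and parsing `obsolete.dat` that way; the byte decoding and lower-casing here are assumptions, since the rest of the function is truncated in the diff:

```python
from typing import Dict
from urllib.request import urlopen


def fetch_obsolete_mapping() -> Dict[str, str]:
    """Sketch: map obsolete PDB codes to the entries that superseded them."""
    mapping: Dict[str, str] = {}
    url = "https://files.wwpdb.org/pub/pdb/data/status/obsolete.dat"
    with urlopen(url) as response:
        for line in response:
            entry = line.split()
            # Four-field rows look like: OBSLTE  <date>  <old_id>  <new_id>
            if len(entry) == 4:
                mapping[entry[2].decode().lower()] = entry[3].decode().lower()
    return mapping
```

Called the way the new test below calls `get_obsolete_mapping()`, the returned dictionary should hold well over a hundred obsolete-to-current code pairs.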
6 changes: 6 additions & 0 deletions tests/protein/test_utils.py
@@ -11,6 +11,7 @@
from graphein.protein.utils import (
download_pdb,
download_pdb_multiprocessing,
get_obsolete_mapping,
save_graph_to_pdb,
save_pdb_df_to_pdb,
save_rgroup_df_to_pdb,
@@ -98,6 +99,11 @@ def test_download_structure_multi():
assert str(path).endswith("4hhb.pdb")


def test_download_obsolete_map():
mapping = get_obsolete_mapping()
assert len(mapping) > 100


if __name__ == "__main__":
test_save_graph_to_pdb()
test_save_pdb_df_to_pdb()
