Accounts for selenocysteine in sidechain torsion angle computation #316

Merged: 46 commits, May 10, 2023
46 commits
176d884
add PSW to nonstandard residues
a-r-j Apr 17, 2023
fa89a37
improve insertion and non-standard residue handling
a-r-j Apr 17, 2023
9855b9b
refactor chain selection
a-r-j Apr 17, 2023
f143719
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 17, 2023
3f3b3d9
remove unused verbosity arg
a-r-j Apr 17, 2023
09f05e5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 17, 2023
b7475df
fix chain selection in tests
a-r-j Apr 17, 2023
2e0a371
Merge branch 'tensor_fixes' of https://www.github.com/a-r-j/graphein …
a-r-j Apr 17, 2023
d2c1808
fix chain selection in tutorial notebook
a-r-j Apr 17, 2023
fc332c6
fix notebook chain selection
a-r-j Apr 17, 2023
4a67851
fix chain selection typehint
a-r-j Apr 17, 2023
5f648d2
Update changelog
a-r-j Apr 17, 2023
ab26d78
Add NLW to non-standard residues
a-r-j Apr 17, 2023
a449bba
Merge branch 'tensor_fixes' of https://www.github.com/a-r-j/graphein …
a-r-j Apr 17, 2023
afc0f8b
add .ent support
a-r-j Apr 20, 2023
258c94d
add entry for construction from dataframe
a-r-j Apr 20, 2023
c9856ae
add missing stage arg
a-r-j Apr 20, 2023
9e1191a
improve obsolete mapping retrieving to include entries with no replac…
a-r-j Apr 20, 2023
17c38ab
Merge branch 'master' into tensor_fixes
a-r-j Apr 20, 2023
7bf4ff3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 20, 2023
5af9e06
update changelog
a-r-j Apr 21, 2023
e00bdfb
add transforms to foldcomp datasets
a-r-j Apr 22, 2023
31018bc
fix jaxtyping syntax
a-r-j Apr 25, 2023
6e26455
Merge branch 'tensor_fixes' of https://www.github.com/a-r-j/graphein …
a-r-j Apr 25, 2023
3681714
Merge branch 'master' into tensor_fixes
a-r-j Apr 27, 2023
adbdbe1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 27, 2023
50ac31b
Update changelog
a-r-j Apr 27, 2023
088ae02
fix double application of transforms
a-r-j Apr 27, 2023
fb684af
improve foldcomp data loading performance
a-r-j May 1, 2023
a543a75
Merge branch 'tensor_fixes' of https://www.github.com/a-r-j/graphein …
a-r-j May 1, 2023
a00e2be
Merge branch 'master' into tensor_fixes
a-r-j May 1, 2023
ccf0437
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 1, 2023
7939a82
remove unused imports
a-r-j May 1, 2023
d72abf9
remove unused imports
a-r-j May 1, 2023
8b551c7
linting
a-r-j May 1, 2023
86bedcf
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 1, 2023
685d3db
Update changelog
a-r-j May 1, 2023
bebc3c4
add B factors to FC parsing output
a-r-j May 2, 2023
c973422
Merge branch 'tensor_fixes' of https://www.github.com/a-r-j/graphein …
a-r-j May 2, 2023
828af29
bugfix to alpha & kappa angle embedding
a-r-j May 7, 2023
c986df0
Merge branch 'master' into tensor_fixes
a-r-j May 7, 2023
6c48878
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 7, 2023
fc7657e
update changelog
a-r-j May 7, 2023
7192613
handle selenocysteine in sidechain torsion angle computation
a-r-j May 10, 2023
6a31729
Merge branch 'tensor_fixes' of https://www.github.com/a-r-j/graphein …
a-r-j May 10, 2023
bf9f4e9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 10, 2023
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -9,6 +9,7 @@
 * Ensures exporting groups of PDB chains with PDBManager selects the first model for multi-model structures. [#311](https://github.com/a-r-j/graphein/pull/311)
 * Fixes bug with exporting PDBs with only one splitting strategy in PDBManager [#311](https://github.com/a-r-j/graphein/pull/311)
 * Fixes incorrect jaxtyping syntax for variable size dimensions [#312](https://github.com/a-r-j/graphein/pull/312)
+* Fixes shape of angle embeddings for `graphein.protein.tensor.angles.alpha/kappa`. [#315](https://github.com/a-r-j/graphein/pull/315)

 #### Other Changes
 * Adds entry point for biopandas dataframes in `graphein.protein.tensor.io.protein_to_pyg`. [#310](https://github.com/a-r-j/graphein/pull/310)
@@ -18,7 +19,7 @@
 * Improved handling of non-standard residues in the `graphein.protein.tensor` module. [#307](https://github.com/a-r-j/graphein/pull/307)
 * Insertions retained by default in the `graphein.protein.tensor` module. I.e. `insertions=True` is now the default behaviour. [#307](https://github.com/a-r-j/graphein/pull/307)
 * Adds transform composition to FoldComp Dataset [#312](https://github.com/a-r-j/graphein/pull/312)
-* Improve FoldComp dataloading performance [#313](https://github.com/a-r-j/graphein/pull/313)
+* Improve FoldComp dataloading performance and include B factors (pLDDT) in output. [#313](https://github.com/a-r-j/graphein/pull/313) [#315](https://github.com/a-r-j/graphein/pull/315)

 ### 1.7.0 - UNRELEASED

2 changes: 2 additions & 0 deletions graphein/ml/datasets/foldcomp_dataset.py
@@ -292,13 +292,15 @@ def fc_to_pyg(data: Dict[str, Any], name: Optional[str] = None) -> Protein:

     res_idx = np.repeat(res_num, atom_counts)
     coords[res_idx, atom_idx, :] = np.array(data["coordinates"])
+    b_factor = np.array(data["b_factors"]) / 100

     return Protein(
         coords=torch.from_numpy(coords).float(),
         residues=res,
         residue_id=[f"A:{m}:{str(n)}" for m, n in zip(res, res_num)],
         chains=torch.zeros(len(res)),
         residue_type=residue_type.long(),
+        b_factor=torch.from_numpy(b_factor).float(),
         id=name,
     )

23 changes: 19 additions & 4 deletions graphein/protein/tensor/angles.py
@@ -70,10 +70,18 @@ def _extract_torsion_coords(
     res_atoms = []
     idxs = []

+    # Whether or not the protein contains selenocysteine
+    selenium = coords.shape[1] == 38
+
     # Iterate over residues and grab indices of the atoms for each Chi angle
     for i, res in enumerate(res_types):
         res_coords = []
-        for angle_coord_set in CHI_ANGLES_ATOMS[res]:
+
+        angle_groups = CHI_ANGLES_ATOMS[res]
+        if not selenium and res == "SEC":
+            angle_groups = []
+
+        for angle_coord_set in angle_groups:
             res_coords.append([ATOM_NUMBERING[i] for i in angle_coord_set])
             idxs.append(i)
         res_atoms.append(torch.tensor(res_coords, device=coords.device))
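Review note: the hunk above skips the chi-angle atom groups of `SEC` whenever the coordinate tensor has no selenium slot (second dimension not equal to 38). A minimal pure-Python sketch of that gating follows. The tables `CHI_ATOMS` and `ATOM_INDEX` and the function `chi_atom_indices` are hypothetical toy stand-ins, not graphein's `CHI_ANGLES_ATOMS` or `ATOM_NUMBERING`, and the index values are illustrative only.

```python
from typing import Dict, List, Tuple

# Toy lookup tables (hypothetical values, for illustration only).
CHI_ATOMS: Dict[str, List[List[str]]] = {
    "ALA": [],                          # no sidechain torsion angles
    "SER": [["N", "CA", "CB", "OG"]],   # one chi angle
    "SEC": [["N", "CA", "CB", "SE"]],   # one chi angle, needs the SE atom
}
ATOM_INDEX: Dict[str, int] = {"N": 0, "CA": 1, "CB": 3, "OG": 8, "SE": 37}


def chi_atom_indices(
    res_types: List[str], n_atom_slots: int
) -> Tuple[List[int], List[List[int]]]:
    """Return (owning residue index, atom indices) for every chi angle.

    SEC contributes no groups unless the coordinate layout has 38 atom
    slots, i.e. includes a selenium slot.
    """
    has_selenium = n_atom_slots == 38
    owners: List[int] = []
    groups_out: List[List[int]] = []
    for res_idx, res in enumerate(res_types):
        groups = CHI_ATOMS[res]
        if res == "SEC" and not has_selenium:
            groups = []  # the SE atom cannot be indexed in a 37-slot layout
        for group in groups:
            groups_out.append([ATOM_INDEX[name] for name in group])
            owners.append(res_idx)
    return owners, groups_out


owners_37, groups_37 = chi_atom_indices(["SER", "SEC", "ALA"], 37)
owners_38, groups_38 = chi_atom_indices(["SER", "SEC", "ALA"], 38)
```

With 37 slots only the serine group is returned; with 38 slots the selenocysteine group is included as well.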
@@ -115,6 +123,9 @@ def sidechain_torsion(
     :return: _description_
     :rtype: Union[TorsionTensor, Tuple[TorsionTensor, torch.Tensor]]
     """
+    # Whether or not the protein contains selenocysteine
+    selenium = coords.shape[1] == 38
+
     idxs, coords = _extract_torsion_coords(coords, res_types)
     angles = _dihedral_angle(
         coords[:, 0, :].unsqueeze(1),
@@ -139,7 +150,11 @@ def sidechain_torsion(
     res_types = copy.deepcopy(res_types)
     res_types.reverse()
     for res in res_types:
-        if res in ["ALA", "GLY", "UNK"]:
+        PAD_RESIDUES = ["ALA", "GLY", "UNK"]
+        if not selenium:
+            PAD_RESIDUES.append("SEC")
+
+        if res in PAD_RESIDUES:
             post_pad_len += 1
         else:
             break
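Review note: the loop above counts how many residues at the end of the chain have no sidechain torsion angles, treating `SEC` as one of them when no selenium slot is present. The sketch below is an equivalent formulation, not the PR's code: the function name `trailing_pad_len` is hypothetical, the pad set is built once rather than on every iteration, and the input list is iterated in reverse without being copied or mutated.

```python
from typing import List


def trailing_pad_len(res_types: List[str], has_selenium: bool) -> int:
    """Count the trailing residues that contribute no chi angles."""
    pad_residues = {"ALA", "GLY", "UNK"}
    if not has_selenium:
        # Without a selenium slot, SEC yields no torsion coordinates either.
        pad_residues.add("SEC")

    count = 0
    for res in reversed(res_types):
        if res not in pad_residues:
            break
        count += 1
    return count


chain = ["SER", "GLY", "SEC", "ALA"]
pad_without_se = trailing_pad_len(chain, has_selenium=False)
pad_with_se = trailing_pad_len(chain, has_selenium=True)
```

For this chain the count is 3 without a selenium slot (`GLY`, `SEC`, `ALA`) and 1 with one, because `SEC` then stops the scan.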
@@ -226,7 +241,7 @@ def kappa(
     angles = angles[mask]

     if embed:
-        angles = angle_to_unit_circle(angles)
+        angles = torch.stack([torch.cos(angles), torch.sin(angles)], dim=-1)

     return angles

@@ -288,7 +303,7 @@ def alpha(
     angles = angles[mask]

     if embed:
-        angles = angle_to_unit_circle(angles)
+        angles = torch.stack([torch.cos(angles), torch.sin(angles)], dim=-1)

     return angles

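Review note: both `kappa` and `alpha` now embed each angle as a `(cos, sin)` pair stacked along the last dimension, so a tensor of N angles becomes N rows of 2 values. A minimal pure-Python sketch of the same idea follows, using nested lists in place of tensors. The helper names `embed_angles` and `recover_angles` are hypothetical and not part of graphein.

```python
import math
from typing import List


def embed_angles(angles: List[float]) -> List[List[float]]:
    """Map each angle (radians) to a [cos, sin] pair on the unit circle."""
    return [[math.cos(a), math.sin(a)] for a in angles]


def recover_angles(pairs: List[List[float]]) -> List[float]:
    """Invert the embedding; results lie in (-pi, pi]."""
    return [math.atan2(s, c) for c, s in pairs]


angles = [0.0, math.pi / 2, -2.0]
pairs = embed_angles(angles)       # three rows, two columns
roundtrip = recover_angles(pairs)
```

The embedding removes the discontinuity at plus or minus pi, which is the usual reason for feeding `(cos, sin)` pairs rather than raw angles to a model.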