Add alignment string position support #44

alyst · 2020-09-22T19:11:00Z

Alignment string position support

Types of changes

This PR implements the following changes:

✨ New feature (A non-breaking change which adds functionality).
🐛 Bug fix (A non-breaking change, which fixes an issue).
💥 Breaking change (fix or feature that would cause existing functionality to change).

📋 Additional detail

Mapping positions from/to the input or reference sequence onto the common alignment string is quite useful (e.g. when comparing annotations attached to specific positions of the aligned strings). In addition to seq2ref() and ref2seq(), this PR adds 4 new functions to the public API: seq2agn, ref2agn, agn2seq, agn2ref, which allow to convert between input/reference/alignment positions in any direction.
Technically, keeping track of alignment positions is implemented as an AlignmentAnchor.agnpos field. Updating the pairwise alignment algorithms to support alignment positions was easy, only the common traceback utility macros had to be changed. It also should not affect the performance or memory consumption.
I also did a few minor cleanups (@inbounds were beneficial, refactored position conversion to be more generic and using Julia 1.x approach (avoid generated functions, rely on constant propagation, argument splatting)).
Provide a runnable example of use of your addition. This lets reviewers
and others try out the feature before it is merged or makes it's way to release.

☑️ Checklist

🎨 The changes implemented is consistent with the julia style guide.
📘 I have updated and added relevant docstrings, in a manner consistent with the documentation styleguide.
📘 I have added or updated relevant user and developer manuals/documentation in docs/src/.
🆗 There are unit tests that cover the code changes I have made.
🆗 The unit tests cover my code changes AND they pass.
📝 I have added an entry to the [UNRELEASED] section of the manually curated CHANGELOG.md file for this repository.
🆗 All changes should be compatible with the latest stable version of Julia.
💭 I have commented liberally for any complex pieces of internal code.

alyst · 2020-09-22T19:23:59Z

The failures due to Registry.toml problems, the testing even didn't start.
Also: is it possible to update CI to test on latest stable Julia and also on nightly (with allow_failures). I guess only LTS (is it 1.0 or 1.1?) Julia should be kept from the previous versions.

alyst · 2020-09-23T13:03:46Z

Some other notes on the PR:

In this PR I haven't yet updated the manual -- but if the maintainers think it's reasonable to include the alignment position functionality, I'll update it. The terminology (e.g. alignment position) also has to be clarified before committing to update the docs.
The PairwiseAlignment iterator interface could be updated to return the alignment position. But that would break the API, so I don't know what to do about that.
In addition to the alignment position, it would be nice to have API to query the input/reference sequence ranges aligned. Partly it exists as Alignment.firstref/lastref, but so far it's not public API and it doesn't cover input sequence positions.
Not a part of this PR, but something that I noticed: there Alignment, AlignedSequence and PairwiseAlignment types. Judging from the names, PariwiseAlignment and Alignment are expected to be from the same type hierarchy, but in fact these are very different types. Perhaps, it's possible to rename Alignment into something like AlignmentDefinition or AlignmentTransform (and fix the .jl filename and field names of that type accordingly (e.g. into transform -- right now in some parts of the code one may read aln.a.aln.anchors)) to avoid the confusion.

alyst · 2020-10-03T12:55:29Z

I don't know who would be the best to review, but since you were committing recently, maybe you can take a look, @benjward ? Or maybe you can suggest a more suitable reviewer? Thanks in advance!

alyst · 2021-03-03T11:30:40Z

@benjward @jakobnissen If you would have time, could you please review this PR?

jakobnissen

Sorry for just getting around to it now. I can't wrap my head around the logic, but despite a few minor issues, it looks good.

I'm not 100% sure what this functionality is useful for, though. Does the concept of an "alignment position" make sense?

jakobnissen · 2021-03-03T11:51:56Z

src/BioAlignments.jl

+    seq2agn,
+    ref2agn,
+    agn2seq,
+    agn2ref,


In the rest of the codebase, "alignment" is abbreviated aln. Perhaps a search-replace for "agn"->"aln" would be best.

jakobnissen · 2021-03-03T11:53:56Z

src/alignment.jl

-Map a position `i` from sequence to reference.
-"""
-function seq2ref(aln::Alignment, i::Integer)::Tuple{Int,Operation}
-    idx = findanchor(aln, i, Val{true})


Idiomatically, you usually use Val by doing findanchor(aln, i, Val(true)), and then in the type signature that uses Val, you say

findanchor(aln::Alignment, i::Integer, ::Val{my_bool}) where my_bool

, i.e. using ::Val{Foo} instead of ::Type{Val{Foo}}.

I was never sure whether Val(...) generates a code to allocate that object or not, so to avoid allocation I was using type approach, but I'll change it to Val().

There is no allocation - the two things are completely equivalent. It's just the norm to use instances instead of types whenever you have a zero-field struct. (e.g. Julia's Base and the rest of BioJulia does it).

Actually, that was not my code. %) In the refactored functions Val is replaced by passing function references.

src/alignment.jl

jakobnissen · 2021-03-03T11:56:57Z

src/alignment.jl

+
+Map a position `i` from sequence to reference.
+"""
+seq2ref(aln::Alignment, i::Integer) = pos2pos(aln, i, seqpos, refpos)


Same here - no variable named seqpos / refpos

jakobnissen · 2021-03-03T11:58:11Z

src/alignment.jl

-                return j
-            end
+        # invariant (activate this for debugging)
+        #@assert anchors[lo].$pos < i ≤ anchors[hi].$pos


Should this assert be re-enabled? Otherwise it should be deleted, I think.

I think it should be still relevant, because the function logic was not touched, I've just added support for aligned position + made it a normal function, not @generated.
I'd rather keep it here as a comment: it nicely documents the constraint of the algorithm, but the context might be performance-critical to enable it.
I'll update it to use the current way of getting the position (pos(anchors...))) and check internally (without pushing here) whether it doesn't throw in tests.

So I checked and tests do not throw if the line is enabled.

jakobnissen · 2021-03-03T12:00:17Z

src/operations.jl

@@ -113,7 +113,7 @@ function Base.convert(::Type{Char}, op::Operation)
 end

 function Base.print(io::IO, op::Operation)
-    write(io, convert(Char, op))
+    write(io, isvalid(op) ? convert(Char, op) : '?')


Should we support invalid operations? Perhaps it's a better design to not allow invalid operations to exist. I'm not sure though

The time has passed, I don't know why I've added it. Maybe when I'll be revisiting the PR, I'll see why.
Otherwise I agree, it could be removed.

I suggest to keep this change:

to be in sync with show() (see below) that doesn't throw

in general, if you cannot print incorrect object, it's hard to figure out what is the problem. If we just throw an exception the user would not see the context (sequence position).

src/alignment.jl

alyst · 2021-03-03T12:41:30Z

I'm not 100% sure what this functionality is useful for, though. Does the concept of an "alignment position" make sense?

For multiple sequence alignments it definitely does, because it's actually the column of MSA.

But it also does make sense for pairwise one. If you want to display the pairwise alignment, you may need to know what is the position of a specific letter from either sequence in the aligned sequence.
My use case was to correctly display the post-translational modifications (phosphorylation, ubiquitination etc) of inidividual amino acids on the pairwise-aligned homologous protein sequences.
There could be regions that are present only on the first sequence or regions that are present only on the second one. So you definitely need an aligned position, which preserves everything.

alyst · 2021-03-03T12:45:17Z

@jakobnissen It looks like at the moment CI seems broken. Should this package be updated to use GitHab actions for CI?
I can try submitting the PR for that, but maybe it would be better if someone who has better knowledge of BioJulia ecosystem is doing that.

jakobnissen · 2021-03-03T13:12:49Z

Yeah... BioJulia is strapped for dev manpower at the moment, so there is a lot of cleanup and housekeeping that needs to be done in all sorts of packages, and just haven't been done. If you make a PR, I'll merge it! You can look at e.g. BioSequences for how to do this.

otherwise Anchors with invalid operation (and these anchors are created temporarily during traceback) would throw upon printing

codecov · 2021-03-03T18:47:16Z

Codecov Report

Merging #44 (c8e434c) into master (3e3910c) will decrease coverage by 0.41%.
The diff coverage is 87.50%.

@@            Coverage Diff             @@
##           master      #44      +/-   ##
==========================================
- Coverage   88.82%   88.41%   -0.42%     
==========================================
  Files          17       17              
  Lines        1056     1079      +23     
==========================================
+ Hits          938      954      +16     
- Misses        118      125       +7

Impacted Files	Coverage Δ
src/BioAlignments.jl	`100.00% <ø> (ø)`
src/alignment.jl	`79.16% <76.92%> (-4.62%)`	⬇️
src/alignedseq.jl	`96.92% <100.00%> (+0.31%)`	⬆️
src/anchors.jl	`83.33% <100.00%> (+8.33%)`	⬆️
src/operations.jl	`90.32% <100.00%> (ø)`
src/pairwise/algorithms/common.jl	`91.17% <100.00%> (+4.21%)`	⬆️
src/pairwise/algorithms/hamming_distance.jl	`100.00% <100.00%> (ø)`
src/pairwise/alignment.jl	`99.06% <100.00%> (-0.05%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e3910c...cc8237c. Read the comment docs.

- add position getters for AlignmentAnchor - convert findanchor() from generated function to normal one with position getter as a param (in Julia constant propagation we trust) - generic pos2pos() function that gets called by seq2ref/ref2seq()

jakobnissen · 2021-03-03T19:13:23Z

@alyst Ping me when it's ready to be merged :)

- add alnpos to each anchor - add mapping to/from alnpos - add alnpos support to traceback

alyst · 2021-03-04T09:54:46Z

@jakobnissen Thank you! I think the PR is good to go.
I have addressed your suggestions, also I have added tests to the new aln2xxx/xxx2aln() methods that were not covered.
There are still new lines in the PR that are not covered, but they were derived from the existing exceptions that got more specific after alnpos was added.
They were not covered by the tests before. So I think it's fine to merge this PR, and address the coverage later by introducing more test cases.

jakobnissen · 2021-03-04T10:56:44Z

Thanks for putting in the work!

timholy · 2021-11-24T11:33:58Z

Docs need updating: https://github.com/BioJulia/BioAlignments.jl/blob/master/docs/src/alignments.md#representing-alignments

Downstream tests are failing because the breaking change from #44 was merged here.

alyst force-pushed the agn_pos branch from 251dc65 to 5ffb114 Compare October 3, 2020 13:02

alyst closed this Mar 3, 2021

alyst reopened this Mar 3, 2021

jakobnissen reviewed Mar 3, 2021

View reviewed changes

alyst mentioned this pull request Mar 3, 2021

New CI based on GH-actions #47

Merged

alyst added 4 commits March 3, 2021 19:23

print(op): don't throw Exception, print ? instead

be5b72b

otherwise Anchors with invalid operation (and these anchors are created temporarily during traceback) would throw upon printing

count(pairwisealn): add @inbounds

d920daa

AlnAnchor: drop Tuple-based ctor (use splatting)

8b0d70e

check_aln_anchors(): optimize anchors access

ecc9543

alyst force-pushed the agn_pos branch from 5ffb114 to cded341 Compare March 3, 2021 18:45

alyst added 2 commits March 3, 2021 19:53

refactor seq2ref/ref2seq()

eceb92d

- add position getters for AlignmentAnchor - convert findanchor() from generated function to normal one with position getter as a param (in Julia constant propagation we trust) - generic pos2pos() function that gets called by seq2ref/ref2seq()

Alignment: expand docstring

90513fb

alyst force-pushed the agn_pos branch from cded341 to 89fbe1a Compare March 3, 2021 19:02

alyst added 2 commits March 3, 2021 20:34

alignment position support (seq2aln, ref2aln etc)

0156de4

- add alnpos to each anchor - add mapping to/from alnpos - add alnpos support to traceback

count_aligned(): optimize with alnpos

cc8237c

alyst force-pushed the agn_pos branch from 89fbe1a to cc8237c Compare March 3, 2021 19:34

jakobnissen merged commit bc9ad52 into BioJulia:master Mar 4, 2021

alyst deleted the agn_pos branch March 4, 2021 11:07

alyst added a commit to alyst/BioAlignments.jl that referenced this pull request Nov 25, 2021

update docs with alignment pos introduced by BioJulia#44

f0ab859

alyst mentioned this pull request Nov 25, 2021

Update docs with alignment position #61

Merged

11 tasks

TransGirlCodes pushed a commit that referenced this pull request Nov 25, 2021

update docs with alignment pos introduced by #44 (#61)

36a5e75

CiaranOMara mentioned this pull request Apr 29, 2022

BioSequences v3 and BioSymbols v5 #72

Merged

jakobnissen mentioned this pull request May 6, 2022

Add support for 'meta' alignment operations (OP_PAD and OP_HARD_CLIP) #64

Merged

11 tasks

MillironX mentioned this pull request Jun 8, 2022

Hotfix/2.2.0 #76

Merged

MillironX mentioned this pull request Jul 26, 2022

Add getter methods (and forwarded getter methods) for alignment types #83

Merged

11 tasks

MillironX added a commit that referenced this pull request Oct 1, 2022

Merge pull request #85 from BioJulia/release/v2.3.0

b92ccf3

Downstream tests are failing because the breaking change from #44 was merged here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add alignment string position support #44

Add alignment string position support #44

alyst commented Sep 22, 2020

alyst commented Sep 22, 2020

alyst commented Sep 23, 2020

alyst commented Oct 3, 2020

alyst commented Mar 3, 2021

jakobnissen left a comment

jakobnissen Mar 3, 2021

jakobnissen Mar 3, 2021

alyst Mar 3, 2021

jakobnissen Mar 3, 2021

alyst Mar 3, 2021

jakobnissen Mar 3, 2021

jakobnissen Mar 3, 2021

alyst Mar 3, 2021 •

edited

alyst Mar 3, 2021

jakobnissen Mar 3, 2021

alyst Mar 3, 2021

alyst Mar 3, 2021

alyst commented Mar 3, 2021

alyst commented Mar 3, 2021

jakobnissen commented Mar 3, 2021

codecov bot commented Mar 3, 2021 •

edited

jakobnissen commented Mar 3, 2021

alyst commented Mar 4, 2021

jakobnissen commented Mar 4, 2021

timholy commented Nov 24, 2021

Add alignment string position support #44

Add alignment string position support #44

Conversation

alyst commented Sep 22, 2020

Alignment string position support

Types of changes

📋 Additional detail

☑️ Checklist

alyst commented Sep 22, 2020

alyst commented Sep 23, 2020

alyst commented Oct 3, 2020

alyst commented Mar 3, 2021

jakobnissen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alyst Mar 3, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alyst commented Mar 3, 2021

alyst commented Mar 3, 2021

jakobnissen commented Mar 3, 2021

codecov bot commented Mar 3, 2021 • edited

Codecov Report

jakobnissen commented Mar 3, 2021

alyst commented Mar 4, 2021

jakobnissen commented Mar 4, 2021

timholy commented Nov 24, 2021

alyst Mar 3, 2021 •

edited

codecov bot commented Mar 3, 2021 •

edited