Demultiplexer fails for Levenshtein distance > 1 #124

Closed
tp2750 opened this issue Oct 19, 2020 · 0 comments · Fixed by #126

tp2750 commented Oct 19, 2020

Expected Behavior

Demultiplexers can be generated for any Hamming distance, and for Levenshtein distance 1.
I would expect construction to also work for Levenshtein distances greater than 1.

These cases work:

julia> Demultiplexer(LongDNASeq.(["ATGGATGG"]), n_max_errors=1, distance=:hamming)
Demultiplexer{LongSequence{DNAAlphabet{4}}}:
  distance: hamming
  number of barcodes: 1
  number of correctable errors: 1

julia> Demultiplexer(LongDNASeq.(["ATGGATGG"]), n_max_errors=2, distance=:hamming)
Demultiplexer{LongSequence{DNAAlphabet{4}}}:
  distance: hamming
  number of barcodes: 1
  number of correctable errors: 2

julia> Demultiplexer(LongDNASeq.(["ATGGATGG"]), n_max_errors=1, distance=:levenshtein)
Demultiplexer{LongSequence{DNAAlphabet{4}}}:
  distance: levenshtein
  number of barcodes: 1
  number of correctable errors: 1

Current Behavior

This does not work:

julia> Demultiplexer(LongDNASeq.(["ATGGATGG"]), n_max_errors=2, distance=:levenshtein)
ERROR: MethodError: no method matching isless(::Nothing, ::Int64)
Closest candidates are:
  isless(::Missing, ::Any) at missing.jl:87
  isless(::AbstractFloat, ::Real) at operators.jl:167
  isless(::Real, ::Real) at operators.jl:355
  ...
Stacktrace:
 [1] <(::Nothing, ::Int64) at ./operators.jl:277
 [2] <=(::Nothing, ::Int64) at ./operators.jl:326
 [3] hamming_circle(::LongSequence{DNAAlphabet{4}}, ::Int64) at /opt/Julia/julia-1.5/packages/BioSequences/k4j4J/src/demultiplexer.jl:240
 [4] levenshtein_circle(::LongSequence{DNAAlphabet{4}}, ::Int64) at /opt/Julia/julia-1.5/packages/BioSequences/k4j4J/src/demultiplexer.jl:259
 [5] levenshtein_circle(::LongSequence{DNAAlphabet{4}}, ::Int64) at /opt/Julia/julia-1.5/packages/BioSequences/k4j4J/src/demultiplexer.jl:272
 [6] Demultiplexer(::Array{LongSequence{DNAAlphabet{4}},1}; n_max_errors::Int64, distance::Symbol) at /opt/Julia/julia-1.5/packages/BioSequences/k4j4J/src/demultiplexer.jl:162
 [7] top-level scope at REPL[81]:1

The problem appears to be in the levenshtein_circle function:

julia> BioSequences.levenshtein_circle(LongDNASeq("ATGGAGTC"), 1)
71-element Array{LongSequence{DNAAlphabet{4}},1}:
 AAGGAGTC
 AATGGAGTC
 ACGGAGTC
 ... truncated

julia> BioSequences.levenshtein_circle(LongDNASeq("ATGGAGTC"), 2)
ERROR: MethodError: no method matching isless(::Nothing, ::Int64)
Closest candidates are:
  isless(::Missing, ::Any) at missing.jl:87
  isless(::AbstractFloat, ::Real) at operators.jl:167
  isless(::Real, ::Real) at operators.jl:355
  ...
Stacktrace:
 [1] <(::Nothing, ::Int64) at ./operators.jl:277
 [2] <=(::Nothing, ::Int64) at ./operators.jl:326
 [3] hamming_circle(::LongSequence{DNAAlphabet{4}}, ::Int64) at /opt/Julia/julia-1.5/packages/BioSequences/k4j4J/src/demultiplexer.jl:240
 [4] levenshtein_circle(::LongSequence{DNAAlphabet{4}}, ::Int64) at /opt/Julia/julia-1.5/packages/BioSequences/k4j4J/src/demultiplexer.jl:259
 [5] levenshtein_circle(::LongSequence{DNAAlphabet{4}}, ::Int64) at /opt/Julia/julia-1.5/packages/BioSequences/k4j4J/src/demultiplexer.jl:272
 [6] top-level scope at REPL[100]:1
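
For comparison, the set of sequences within a given number of edits can be sketched independently of BioSequences. The following is a minimal illustration on plain strings, not the package's implementation: the names ALPHABET, edit1 and neighborhood are hypothetical, and the sketch returns every string within m edits (substitution, deletion or insertion) rather than reproducing the exact output of levenshtein_circle.

```julia
# Hypothetical helper names; plain ASCII strings stand in for DNA sequences.
const ALPHABET = "ACGT"

# All strings that differ from `s` by exactly one substitution, deletion, or insertion.
function edit1(s::String)
    out = Set{String}()
    n = length(s)
    for i in 1:n
        # deletion of position i
        push!(out, s[1:i-1] * s[i+1:end])
        # substitution at position i
        for c in ALPHABET
            c != s[i] && push!(out, s[1:i-1] * c * s[i+1:end])
        end
    end
    # insertion after position i (i = 0 inserts at the front)
    for i in 0:n, c in ALPHABET
        push!(out, s[1:i] * c * s[i+1:end])
    end
    delete!(out, s)
    return out
end

# All strings reachable from `s` with at most `m` single edits (breadth-first expansion).
function neighborhood(s::String, m::Int)
    seen = Set([s])
    frontier = Set([s])
    for _ in 1:m
        next = Set{String}()
        for t in frontier, u in edit1(t)
            if !(u in seen)
                push!(seen, u)
                push!(next, u)
            end
        end
        frontier = next
    end
    return seen
end

println(length(neighborhood("ATGGAGTC", 2)))
```

Because each round only expands strings over the fixed four-letter alphabet, every intermediate string stays inside that alphabet, so repeated expansion to distance 2 or more never meets an unknown symbol.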

Possible Solution / Implementation

This line in hamming_circle (demultiplexer.jl:240) apparently causes the error, but I cannot see how the findfirst result becomes nothing:

if findfirst(isequal(seq[p]), ACGT) ≤ r
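
The MethodError in the stack traces is consistent with findfirst returning nothing, which Julia does whenever no element matches. A small sketch of that mechanism, using plain Chars and a hypothetical BASES table in place of the package's ACGT; whether an intermediate sequence really contains a symbol outside ACGT is an assumption here, not something confirmed by the traces:

```julia
# Hypothetical stand-in for the package's ACGT lookup table (Chars, not DNA symbols).
const BASES = ['A', 'C', 'G', 'T']

# Same lookup pattern as the quoted line: index of `x` in BASES, or `nothing` if absent.
base_index(x) = findfirst(isequal(x), BASES)

println(base_index('G'))   # index of 'G' in BASES
println(base_index('N'))   # 'N' is not in BASES, so this prints "nothing"

# Comparing `nothing` with an integer has no method, which surfaces as a MethodError.
err = try
    base_index('N') <= 1
catch e
    e
end
println(err isa MethodError)

# A guarded lookup (hypothetical helper) fails early with a clearer message instead.
function checked_index(x)
    i = base_index(x)
    i === nothing && throw(ArgumentError("symbol $x is not one of $(join(BASES))"))
    return i
end
```

If this is what happens, printing the intermediate sequences passed from levenshtein_circle to hamming_circle should show which symbol is missing from the lookup table.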

Steps to Reproduce (for bugs)

Take the existing test example

@testset "Levenshtein distance" begin

and change n_max_errors from 1 to 2:

julia> barcodes = [dna"ATGG", dna"CAGA", dna"GGAA", dna"TACG"]
4-element Array{LongSequence{DNAAlphabet{4}},1}:
 ATGG
 CAGA
 GGAA
 TACG

julia> dplxr = Demultiplexer(barcodes, n_max_errors=1, distance=:levenshtein)
Demultiplexer{LongSequence{DNAAlphabet{4}}}:
  distance: levenshtein
  number of barcodes: 4
  number of correctable errors: 1

julia> dplxr = Demultiplexer(barcodes, n_max_errors=2, distance=:levenshtein)
ERROR: MethodError: no method matching isless(::Nothing, ::Int64)
Closest candidates are:
  isless(::Missing, ::Any) at missing.jl:87
  isless(::AbstractFloat, ::Real) at operators.jl:167
  isless(::Real, ::Real) at operators.jl:355
  ...
Stacktrace:
 [1] <(::Nothing, ::Int64) at ./operators.jl:277
 [2] <=(::Nothing, ::Int64) at ./operators.jl:326
 [3] hamming_circle(::LongSequence{DNAAlphabet{4}}, ::Int64) at /opt/Julia/julia-1.5/packages/BioSequences/k4j4J/src/demultiplexer.jl:240
 [4] levenshtein_circle(::LongSequence{DNAAlphabet{4}}, ::Int64) at /opt/Julia/julia-1.5/packages/BioSequences/k4j4J/src/demultiplexer.jl:259
 [5] levenshtein_circle(::LongSequence{DNAAlphabet{4}}, ::Int64) at /opt/Julia/julia-1.5/packages/BioSequences/k4j4J/src/demultiplexer.jl:272
 [6] Demultiplexer(::Array{LongSequence{DNAAlphabet{4}},1}; n_max_errors::Int64, distance::Symbol) at /opt/Julia/julia-1.5/packages/BioSequences/k4j4J/src/demultiplexer.jl:162
 [7] top-level scope at REPL[104]:1

Your Environment

  • Package Version used: BioSequences v2.0.5
  • Julia Version used: 1.5.0
  • Operating System and version (desktop or mobile): Linux (x86_64-pc-linux-gnu)
julia> versioninfo()
Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_DEPOT_PATH = /opt/Julia/julia-1.5

julia> Pkg.status("BioSequences")
  [7e6ae17a] BioSequences v2.0.5
