The following example walks through, in full detail, the calculations the Viterbi algorithm performs to determine the most likely path for a given sequence (read) to take through an assembly graph. The key point of this very simple example is that even when a relatively high error rate is assumed, no corrections are made (a true negative).

In [1]:
using Eisenia
using Random
using Dates



In [2]:
L = 10
Random.seed!(L)
reference_sequence = randdnaseq(L)
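# Note: L is a scalar, so length(L) == 1 and log10(1) == 0; the identifier is therefore 3 characters long.
reference_sequence_id = randstring(Int(round(log10(length(L)))+3))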
reference_FASTA_record = FASTA.Record(reference_sequence_id, reference_sequence)

BioSequences.FASTA.Record:
   identifier: Apx
  description: <missing>
     sequence: ACCAAACTAT

In [3]:
error_rate = 0.15
observations = [reference_FASTA_record]

1-element Array{BioSequences.FASTA.Record,1}:
 BioSequences.FASTA.Record:
   identifier: Apx
  description: <missing>
     sequence: ACCAAACTAT

In [4]:
k = 1
canonical_kmers = collect(keys(Eisenia.count_canonical_kmers(observations, k)))
stranded_kmer_graph = Eisenia.build_stranded_kmer_graph(canonical_kmers, observations)
filename = reference_sequence_id * "." * replace(string(Dates.now()), ':' => '.') * ".svg"
Eisenia.plot_stranded_kmer_graph(stranded_kmer_graph, filename=filename)
HTML("""
<img src="$filename" width="50%">
""")

In [5]:
Eisenia.viterbi_maximum_likelihood_traversals(stranded_kmer_graph, error_rate = error_rate, verbosity="debug");

computing kmer counts...
computing kmer state likelihoods...
STATE LIKELIHOODS:
	kmer	count	likelihood
	A	5	0.5
	C	3	0.3
	G	0	0.0
	T	2	0.2
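
The likelihood column above is consistent with each 1-mer's count divided by the total number of 1-mers in the read (5/10, 3/10, 0/10, 2/10). The following is a minimal sketch in plain Julia that reproduces the table without Eisenia. The helper name `kmer_state_likelihoods` is hypothetical, and whether Eisenia computes the values exactly this way is an assumption.

```julia
# Hypothetical helper (not part of Eisenia): tally stranded 1-mers in a read
# and normalise the counts so that they sum to 1.
function kmer_state_likelihoods(read::AbstractString; alphabet = ['A', 'C', 'G', 'T'])
    counts = Dict(base => 0 for base in alphabet)
    for base in read
        counts[base] += 1
    end
    total = sum(values(counts))
    likelihoods = Dict(base => counts[base] / total for base in alphabet)
    return counts, likelihoods
end

# The reference read generated in cell 2.
counts, likelihoods = kmer_state_likelihoods("ACCAAACTAT")
for base in ['A', 'C', 'G', 'T']
    println(base, '\t', counts[base], '\t', likelihoods[base])
end
```

G never occurs in the read, so its likelihood is 0.0. This is why node 3 is unreachable in the shortest-path listing and why its log likelihood is `-Inf` later in the output.
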
finding shortest paths between kmers...
	1	1	[1, 1]
	1	2	[1, 2]
	1	3	Int64[]
	1	4	[1, 4]
	2	1	[2, 1]
	2	2	[2, 2]
	2	3	Int64[]
	2	4	[2, 4]
	3	1	Int64[]
	3	2	Int64[]
	3	3	Int64[]
	3	4	Int64[]
	4	1	[4, 1]
	4	2	[4, 1, 2]
	4	3	Int64[]
	4	4	Int64[]
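
Reading the indices 1 through 4 as A, C, G, T (the row order of the likelihood table, which is an assumption about the node numbering), every two-node path listed above corresponds to a pair of adjacent bases in the read. The one longer path, `[4, 1, 2]`, is T → A → C, which is needed because T is never directly followed by C. The sketch below lists the adjacencies observed in the read; `observed_edges` is a hypothetical helper, not an Eisenia function.

```julia
# Hypothetical helper (not part of Eisenia): collect the directed edges between
# consecutive bases of a read, using indices A=1, C=2, G=3, T=4 (assumed to
# match the node numbering in the debug output).
function observed_edges(read::AbstractString)
    index = Dict('A' => 1, 'C' => 2, 'G' => 3, 'T' => 4)
    bases = collect(read)
    edges = Set{Tuple{Int,Int}}()
    for i in 1:(length(bases) - 1)
        push!(edges, (index[bases[i]], index[bases[i + 1]]))
    end
    return edges
end

edges = observed_edges("ACCAAACTAT")
for edge in sort(collect(edges))
    println(edge)
end
```

The log reports no path from node 4 to itself even though both T → A and A → T are observed, so round trips through another node do not appear to be counted as paths from a node to itself.
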
finding viterbi maximum likelihood paths for observed sequences...

evaluating sequence 1 of 1
	considering path state 1
		observed kmer A
		Initial state log likelihoods:
			4-element Array{Float64,1}:
			   -0.16251892949777494
			   -2.4079456086518722 
			 -Inf                  
			   -2.8134107167600364 
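
The initial log likelihoods above can be reproduced by weighting each state's likelihood by `1 - error_rate` when the state matches the observed base and by `error_rate` otherwise, then normalising. A gives 0.5 × 0.85 = 0.425, C gives 0.3 × 0.15 = 0.045, G gives 0, and T gives 0.2 × 0.15 = 0.03. Dividing by the sum (0.5) yields 0.85, 0.09, 0, and 0.06, whose natural logs are the four values printed. The first two entries of the second column in the matrix that follows (-2.75279 and -1.52901) are likewise consistent with adding a log transition term equal to the destination state's likelihood and a log emission term. The sketch below is inferred from the printed numbers, not taken from Eisenia's source, and all function names in it are hypothetical.

```julia
# Inferred from the printed values; not taken from Eisenia's source.

# Probability of observing base index `observed` while in state index `state`.
emission(state, observed, error_rate) =
    state == observed ? 1 - error_rate : error_rate

# Initial step: state likelihood times emission, normalised across states,
# then converted to natural-log space.
function initial_log_likelihoods(priors, observed, error_rate)
    weights = [priors[s] * emission(s, observed, error_rate) for s in eachindex(priors)]
    return log.(weights ./ sum(weights))
end

# One recurrence step for a single predecessor `from` and destination `to`,
# using the destination's state likelihood as the transition probability.
function viterbi_step(previous, priors, from, to, observed, error_rate)
    return previous[from] + log(priors[to]) + log(emission(to, observed, error_rate))
end

priors = [0.5, 0.3, 0.0, 0.2]   # A, C, G, T, from the state likelihood table
error_rate = 0.15

# The first observed base is A (index 1).
initial = initial_log_likelihoods(priors, 1, error_rate)
println(initial)

# The second observed base is C (index 2); A (index 1) is the highest-scoring
# state after the first base, so it is used as the predecessor here.
stay = viterbi_step(initial, priors, 1, 1, 2, error_rate)   # A -> A
move = viterbi_step(initial, priors, 1, 2, 2, error_rate)   # A -> C
println(stay)
println(move)
```

Under these assumptions, moving to C (about -1.5290) scores well above staying in A (about -2.7528), so the path follows the observed base rather than correcting it, even with an assumed error rate of 0.15.
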
	considering path state 2
		observed base C
		kmer log likelihoods
			4×10 Array{Float64,2}:
			   -0.162519    -2.75279  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
			   -2.40795     -1.52901  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
			 -Inf         -Inf        0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
			   -2.8