Skip to content
This repository has been archived by the owner on Aug 26, 2023. It is now read-only.

Genomic Distances using MASH #415

Merged
merged 19 commits into from Mar 24, 2017
Merged

Genomic Distances using MASH #415

merged 19 commits into from Mar 24, 2017

Conversation

kescobo
Copy link
Member

@kescobo kescobo commented Mar 13, 2017

This is a PR for an idea I brought up months ago in gitter. I'd like to add a small module to calculate the MASH distance between two genomes.

Briefly, the theory is that you generate a hash for all kmers of size k, but only store a "bottom sketch" of a particular genome of size s - that is the s smallest hashes. If you store the hashes in sorted order, you only need to check if each subsequent hash is larger than a single value, so this is quite fast. The genome distance for two genomes is then essentially the Jaccard index of the minhashes, with some additional modification to account for the size of the kmers used.

I have an implementation that I've validated and is quite fast - from two FASTA files of ~5 Mbp each to MASH distance takes ~2 sec on my 3 year old macbook pro. Most of this time is spent generating the sketch, but once a sketch is made, the comparison between two genomes takes less than a millisecond (so adding additional genomes scales very well).

I'm ready to start adding documentation and tests, but thought I should open a PR and get some guidance on where this fits in Bio.jl, and if I should add anything else to the module.

Minhash

  • Implement MinHash in Seq
  • MinHash Tests
  • Documentation

MASH Distance

  • Implement MASH distance in Var
  • MASH Tests
  • Documentation

Still to-do

  • fix CI errors

@TransGirlCodes
Copy link
Member

This is great @kescobo!

@TransGirlCodes
Copy link
Member

I think this probably fits with the methods I've been writing in Var for distance, where the first argument is a type which dictates the distance method used.

@TransGirlCodes
Copy link
Member

@kescobo My question to you and @kmsquire is: This and CountMin.jl both rely on Sketches, so are there commonalities between these two sets of code which can be "merged" within Bio and a consistent module/methods/types for dealing with hashes and sketches implemented that fulfils the needs of the two?

@bicycle1885
Copy link
Member

MinHash is use in many applications and I'd say +1 for adding this in Bio.jl. I think this will fit in Bio.Seq.

@kescobo
Copy link
Member Author

kescobo commented Mar 13, 2017

@Ward9250 I might be wrong, but I think CountMin.jl is doing something different. If I understand correctly, a Count-Min sketch gives counts of things, and basically serves as a frequency table. A MinHash sketch by contrast doesn't care about frequencies, it's just storing a snapshot of a subset of the things that are present, regardless of their frequency.

I looked at CountMin.jl and didn't see an obvious way that it would be used in this context, but would be happy for @kdmurray91 to take a look and let me know if I missed something.

@codecov-io
Copy link

codecov-io commented Mar 13, 2017

Codecov Report

Merging #415 into master will increase coverage by <.01%.
The diff coverage is 77.27%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #415      +/-   ##
==========================================
+ Coverage    77.6%   77.61%   +<.01%     
==========================================
  Files         129      131       +2     
  Lines        9480     9546      +66     
==========================================
+ Hits         7357     7409      +52     
- Misses       2123     2137      +14
Impacted Files Coverage Δ
src/seq/Seq.jl 0% <ø> (ø) ⬆️
src/seq/minhash.jl 69.04% <69.04%> (ø)
src/var/mash.jl 91.66% <91.66%> (ø)
src/align/alignment.jl 86.13% <0%> (-1%) ⬇️
src/seq/kmer.jl 86.04% <0%> (+2.32%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 032087b...5bc3281. Read the comment docs.

@TransGirlCodes
Copy link
Member

TransGirlCodes commented Mar 13, 2017

I think this definitely should go in Bio.jl, but I think the code may need to be split. There are several aspects of this PR, so I can imagine creating MinHashes of sequences goes in Seq as a MinHash is broadly useful, but then computation of the evolutionary distance from said MinHash goes in either Phylo or Var, as that aspect of the PR is computing measures of genetic variation or relatedness.

@kescobo I think you are right, I've also had a look at CountMin in the past 10 minutes and can see indeed it is different.

@kescobo
Copy link
Member Author

kescobo commented Mar 13, 2017

@Ward9250 That makes sense. To me, Phylo makes the most sense for MASH distance, but you have a much better sense of that package and Var, so I'll defer to your judgement there.

@TransGirlCodes
Copy link
Member

TransGirlCodes commented Mar 13, 2017

In that case, the first step to refinement of organisation in this PR is to move purely MinHash code to Seq. We can deliberate Var vs Phylo later :)

@kescobo
Copy link
Member Author

kescobo commented Mar 13, 2017

@Ward9250 upon further reflection and examination of what's in phylo and var, I changed my mind. I've added the MASH distance functions to var, but I'm not 100% sure how to incorporate it into what you've currently got going on in distances.jl

I know you're actively working on that, I'm wondering if I should let you handle that integration, and if so, wait on tests/docs.

@TransGirlCodes
Copy link
Member

I'm happy for you to continue with tests and docs and I can then deal with integration subsequently if you'd like - I'm easy 👍

@kmsquire
Copy link
Contributor

@kescobo My question to you and @kmsquire is: This and CountMin.jl both rely on Sketches, so are there commonalities between these two sets of code which can be "merged" within Bio and a consistent module/methods/types for dealing with hashes and sketches implemented that fulfils the needs of the two?

@Ward9250, I think you meant to ping @kdmurray91.

@TransGirlCodes
Copy link
Member

So I did! 😆

* .sketch: a sorted set of hashes
* .kmersize: the length of kmers used to generate the sketch
"""
type MinHashSketch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be immutable if possible.


Base.getindex(s::MinHashSketch, part) = getindex(s.sketch, part)
Base.length(s::MinHashSketch) = length(s.sketch)
Base.size(s::MinHashSketch) = (length(s), s.kmersize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks an abuse of overloading. Base.size should return the dimensions (or sizes) of an array-like data structure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there something that would be better (something like dims(), or should I just not bother?

# A seqence and its reverse complement should be the same, so take the smallest
# hash of a seq or its reverse complement.
function revcomphash(kmer::Kmer)
k = minimum((kmer, reverse_complement(kmer)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k = min(kmer, reverse_complement(kmer)) would be better.

end


function kmerminhash(seq::BioSequence, kmerset, kmerhashes::Vector{UInt64}, k::Int, s::Int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By passing k as a usual argument, you cannot enjoy the full optimization of Julia. kmerminhash{k}(::Type{DNAKmer{k}}, seq, kmerset, kmerhashes, s) may significantly improve the performance of this function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this function updates arguments, the name should end with !.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're misunderstanding this function or I'm misunderstanding you. The k argument here isn't a kmer, it's an argument to specify the length of kmers used to build the minhash sketch. And it doesn't modify the sequence, it generates all possible kmers from a sequence and builds a sketch using the smallest s hashes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it clear, I wrote a function that is ~10x faster than your implementation: https://gist.github.com/bicycle1885/23f15ac022cff46cb454c9be1a585987. In this function, k is passed as a type argument and it gives Julia more opportunities to generate specialized code that is specific to the length of kmers. Also, this function modifies the kmerset and kmerhashes arguments and hence this should be names like kmerminhash!.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bicycle1885 To your second point (kmerminhash!()) - I woke up in the middle of the night realizing what you meant! I think my python brain was in force and only thinking about the first argument.

To your first point, it's no surprise that it was me misunderstanding you. I'm not clear why that provides such a performance boost, but the evidence is clear. And that also makes point 2 moot, since you're not doing the weirdness with passing sets around. Thanks!

Copy link
Member Author

@kescobo kescobo Mar 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bicycle1885 Though now I'm going to modify the code, what about the minhash() functions that take groups of sequences rather than just one? Will it still be performant if I do something like:

function kmerminhash{k, T<:BioSequence}(::Type{DNAKmer{k}}, seqs::Vector{T}, s::Int)
    # generate first `s` kmers
    kmerhashes = UInt64[]
    iter = [each(DNAKmer{k}, seq) for seq in seqs]
    ...

?

Or would it make more sense to make a separate iterator function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the purpose of minhash functions taking multiple sequences is. I thought minhash is a kind of summary of a sequence.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bicycle1885 That's true, but if the sequence you want a summary of is something like a genome that has multiple contigs, then you need the minhash of all of those contigs together (in principle, you could calculate the minhash of each sequence, and then merge and take the bottom s hashes from that, but that seems inefficient).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it makes sense. So, what about having an update function like minhash!(sketch::MinHashSketch, seq::BioSequence) that adds a new sequence to sketch?

h = revcomphash(kmer[2])
if h < kmerhashes[end]
i = searchsortedlast(kmerhashes, h)
if i == 0 && i != kmerhashes[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i != kmerhashes[1] compares an index and a hash value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! That should be if i == 0 && h != kmerhashes[1].

src/var/mash.jl Outdated
# License is MIT: https://github.com/BioJulia/Bio.jl/blob/master/LICENSE.md

function jaccarddistance(sketch1::MinHashSketch, sketch2::MinHashSketch)
d = length(setdiff(sketch1.sketch, sketch2.sketch))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since sketches are sorted, I think d can be computed more efficiently (in O(M+N)) like the merge step of the merge sort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bicycle1885 I'm not sure about merge sort, but do you mean something like: Iterating through the arrays independently in a loop, and since you know they're sorted you know based on greater than, less than and so on, whether elements are in array A and are not in array B?

@kescobo Note that setdiff gets the values that are in sketch1 and not sketch2, but not the other way around, values in sketch2 that are not in sketch1 will not be returned. If this is what you want all is well, but I thought I'd mention it just in case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bicycle1885 I initially wrote something out to do something like what you're suggesting (you can look here starting at ln 117 for what I tried), but it didn't seem to be any faster. Probably my lack of knowledge for how to do this efficiently is to blame - happy to take guidance on that.

@Ward9250 the calculation is the length of the intersect over the length of the union. setdiff tells me how many items are in one but not the other, and since we know the size of each set, that's enough information to know how many are shared as well (we don't need to know what's shared, just the number).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kescobo I think the reason it may not have been much faster is the use of shift! and array copying? I think you can probably do the task you are tying to do in 117 without all the array manipulation, by just keeping track of two indicies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ward9250 Yeah, that makes sense. I tried a number of different data structures, and I think at one point I was trying to get a vector of matches out of that function, but later morphed into just a calculation of the number. I can try to re-work that code and see if it improves things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great if it's possible to eke out a bit more speed. I drew a draft version of a function that does the same job as setdiff but is specific to sorted arrays, when I checked out setdiff with @less the other day, I can make a gist out of it if it helps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kescobo That's it! idxes vs. shifts and slices ftw! 😆

"""
type MinHashSketch
sketch::Vector{UInt64}
kmersize::Int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if having kmer size as a parameter of the type (like array dimensions are a parameter) will afford any benefit here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expound on this? I'm not familiar with making something a parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean doing something like the following gist: https://gist.github.com/Ward9250/1c550e345b742f3287da4f9a2c2fda08

@kdm9
Copy link
Member

kdm9 commented Mar 15, 2017

Hi all,

Sorry, I am just checking in here. I'll review the comments (& code) here properly tomorrow, but the short answer is that @kescobo is right about CountMin.jl. This PR happens to be the essence of my whole PhD, so expect some verbosity from me in the (aussie) morning!

For now, you can take a look at kWIP for our approach to this problem using count-min sketches. I have a preliminary julia re-write of kWIP too, FWIW.

Cheers,
K

@kescobo
Copy link
Member Author

kescobo commented Mar 15, 2017

@kdmurray91 Awesome - would be great to have your insight. I'm a total newb at this :-)

@kescobo
Copy link
Member Author

kescobo commented Mar 16, 2017

Hey guys, I think this is close - just need some advice on how to move forward with generating sketches that span multiple sequences using @bicycle1885's speedy approach (see here). Does it make sense to extend the each() function for DNAKmer to operate on groups of sequences?

# A seqence and its reverse complement should be the same, so take the smallest
# hash of a seq or its reverse complement.
function revcomphash(kmer::Kmer)
k = min((kmer, reverse_complement(kmer)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k = min(kmer, reverse_complement(kmer))?

function MinHashSketch(sketch::Vector, kmersize::Int)
length(sketch) > 0 || error("Sketch cannot be empty")
kmersize > 0 || error("Kmersize must be greater than 0")
new(sketch, kmersize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use return when returning a value to clarify your intention.

Base.done(s::MinHashSketch, state) = done(s.sketch, state)

function Base.:(==)(a::MinHashSketch, b::MinHashSketch)
a.kmersize == b.kmersize && a.sketch == b.sketch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return


Generate a MinHash sketch of size `s` for kmers of length `k`.
"""
function minhash(seq::BioSequence, k::Int, s::Int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Integer for k and s would be better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also true for other places where Int is passed as the type parameter?

Also, if you don't mind explaining, is Integer better because it's more generic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is the reason. I'm sometimes frustrated by this kind of restrictions. Most integer values can be converted to Int, so I think it would be reasonable to make argument types as loose as possible. This is also described in the style guide: http://docs.julialang.org/en/stable/manual/style-guide/#avoid-writing-overly-specific-types.

src/var/mash.jl Outdated
end
end
end
matches == sketchlen ? (return 1.0) : return matches / (2 * sketchlen - matches)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the if-else clause would be more readable.

Copy link
Member Author

@kescobo kescobo Mar 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, though I thought using the ternary operator is more "standard" in julia.

It's worth noting that this syntax is currently in a bunch of places around the Bio.jl code. Admittedly in most cases, they're in quite short and clear places, but in others, not so much. Eg in banded_needleman_wunch.jl:

@update2_nextcol (i == m ? end_gap_init_a : middle_gap_init_a) (i == m ? end_gap_extend_a : middle_gap_extend_a)

As someone coming from python, I was really confused by this syntax initially, and it's not easy to find info about it. I fully support using if-else in most circumstances, but in that case we should probably be more consistent (and maybe add something to CONTRIBUTING.md if we're breaking from standard julia style).

@@ -18,12 +18,12 @@ hashes for kmers of a given length. The type contains two parameters:
"""
immutable MinHashSketch
sketch::Vector{UInt64}
kmersize::Int
kmersize::Integer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be kept as Int. Loosening the type of a field will propagate to other variables and result in performance degradation. In Julia, calling new automatically converts the types of arguments to those of fields. So, even if the constructor takes an Integer value, the value will be converted to Int when a sketch object is created.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see, but just to be sure, you're saying line 21 there should be Int, but the constructor on line 23 should be Integer, yes?

So as a general rule, type fields should be specific where possible, but function arguments should be more permissive?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and yes. Function arguments are for users, but field values are for internal use. Arguments should be generic for usability and fields should be specific (or parameterized by type parameters) for performance.

@kescobo
Copy link
Member Author

kescobo commented Mar 18, 2017

OK, I've implemented methods for minhash() to take multi-sequence (vectors or readers) inputs and generate sketches. This required a slight re-working of kmerminhash(), but I benchmarked it and it's the same speed as the old version on a ~4Mb genome making a 12mer sketch of 5000 (see here).

Oh, and when I opened this PR I said

I have an implementation that I've validated and is quite fast - from two FASTA files of ~5 Mbp each to MASH distance takes ~2 sec on my 3 year old macbook pro. Most of this time is spent generating the sketch, but once a sketch is made, the comparison between two genomes takes less than a millisecond (so adding additional genomes scales very well)

I just repeated these procedures, and now going from the FASTA files to a MASH distance is ~10x faster (~250ms) as is getting MASH distance from the sketch (~13 μs). You guys are great! (thanks especially @bicycle1885).

@kescobo
Copy link
Member Author

kescobo commented Mar 18, 2017

Weee! It works! @bicycle1885 - I think I managed to address all of your concerns, right?

@Ward9250 I played around a bit with your suggested to make k a type parameter for MinHashSketch, but the code you put in the gist didn't work (or at least I didn't get it to work).

@kdm9
Copy link
Member

kdm9 commented Mar 19, 2017

Sorry for the delayed response all...

Regarding the term "sketch": In my mind, this is a generic term for a data structure that holds the essence of some dataset using as few resources as possible. These include bloom filters for presence/absence queries, hyperloglogs for cardinaltiy queries, Count-min sketches (a la countmin.jl) for counting queries, and min-hash or bottom sketches for "summary" queries.

So no, the MinHash datastructure is not really a good fit for CountMin.jl, nor is CountMin.jl useful for this PR. However, one thing I've been thinking of is making a SketchingDataStructures.jl package, that would hold a bloom filter, hyperloglog, min-hash and count-min sketch, with a consistent API, and importantly with good serialisation support via HDF5. PhD-related time poverty has prevented this thus far, sorry!

As to how mash fits in to all this, MASH's key (IMHO) innovation is applying min-hash sketches to alignment free seq. comparison. They use a metric derived from AAF, which does a similar thing, but using jellyfish to count the entire dataset, not a subset. For full disclosure, I have a paper using a completely different metric derived from the D2 statistic, and using (what amount to) count-min sketches.

As I said, I'm currently writing a thesis on this field, so ping any Q's my way, and I'll try not to take so long to get back to you. I'll take a look through the code now, but it sounds as though @bicycle1885 as already been over it with a sharper eye than I could :)

Copy link
Member

@kdm9 kdm9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, though a couple of features seem duplicated.


# A seqence and its reverse complement should be the same, so take the smallest
# hash of a seq or its reverse complement.
function revcomphash(kmer::Kmer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that implented somewhere as canonical(kmer::Kmer)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, though now that I look at it, I realize that the original MASH paper did the lexicographic minimum - eg the lowest by alphabetical order. I didn't look at the min() function as it applies to kmers - might have to convert them to strings?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would take more memory and add conversion cost though :( Can we implement lexicographic sorting functionality for the kmers with methods that are used by the julia sort interface?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ward9250 I suspect we can- it would just require a custom sort order to make sure A < C < G < T.

But in any case, this is only necessary to make our sketches compatible with sketches generated by other software, and this is a bigger project that also requires the ability to select the specific hash function (see eg here). I don't think it's necessary to implement in this PR.

Copy link
Member

@kdm9 kdm9 Mar 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it just be hash(canonical(kmer))? The canonical kmer is the lexographically smaller of the kmer & revcomp(kmer), by definition.

see docs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kdmurray oh - well in that case, yes :-)

I'll take a look at it.

EDIT: Fixed!

# This file is a part of BioJulia.
# License is MIT: https://github.com/BioJulia/Bio.jl/blob/master/LICENSE.md

function jaccarddistance(sketch1::MinHashSketch, sketch2::MinHashSketch)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coudn't this be obtained from Distances.jl? Then it could be jaccarddistance(s1, s2) = Distances.jaccard(s1.sketch, s2.sketch) or whatever

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the way Distances.jl implements this is 1 - sum(min(x, y)) / sum(max(x, y)), which I think will be less efficient here. The version written here takes advantage of the fact that we know sketches are (1) sets, (2) the same length and (3) sorted. Besides, probably not worth adding another dependency for this one function right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kescobo I expected the Distances methods would be slower since your function is tailored to the type.

@kescobo
Copy link
Member Author

kescobo commented Mar 20, 2017

FYI - in light of #425, I've incorporated all of these changes into my original repo here.

@kescobo
Copy link
Member Author

kescobo commented Mar 22, 2017

Just to be clear, I think this PR is ready pending @bicycle1885 checking that I addressed the concerns he raised in his review.

@TransGirlCodes
Copy link
Member

Looks good @kescobo 👍

@kdm9
Copy link
Member

kdm9 commented Mar 24, 2017

LGTM 👍

Copy link
Member

@bicycle1885 bicycle1885 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still wondering whether there is a better interface for multiple sequences, but I think it's fine to merge this now. Can you update NEWS.md?

#
# This file is a part of BioJulia.
# License is MIT: https://github.com/BioJulia/Bio.jl/blob/master/LICENSE.md
using DataStructures: SortedSet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed?

@TransGirlCodes
Copy link
Member

The NEWS update would have caused a conflict with the ABIF News update, so I rebased and fixed.

@TransGirlCodes TransGirlCodes merged commit e49e343 into BioJulia:master Mar 24, 2017
@kescobo kescobo deleted the MASH branch March 24, 2017 17:38
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants