Squashed commit of the following:
commit 3ec3f4ddde836f9c1112072a91536d8fc03df2e8
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Mar 6 12:04:28 2023 +0100

    Added line number to error message.

commit 5f41e11788ce9ab09d0fd7e477b1bdeb0b16d952
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Mar 6 11:36:10 2023 +0100

    Fixed blastdb bug.

commit e3725268db31c8c8ef652d61fda610defc8056b1
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Mar 6 11:12:47 2023 +0100

    Fix cmake error.

commit b53a536c7e24ecba6778ef59ddf612aeb88c3c07
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Mar 6 10:54:55 2023 +0100

    Detect c++17.

commit 8da058753e7894eb005ddbd0afa98782f1d0a1ce
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Mar 6 10:33:27 2023 +0100

    Fixed prepdb error.

commit a2eb7c681cd2e34d404ec9c1821160d441c9e061
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Mar 3 17:00:20 2023 +0100

    Fixed error.

commit 4a530f31297f53b3d2c9d7c5586e566cb2adfeab
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Mar 3 16:48:26 2023 +0100

    Fixed error.

commit 6762d6c4f9ec277f8a8606f8dc1ac3069d95bbfa
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Mar 3 16:07:07 2023 +0100

    Fixed error.

commit 2aa45bbbbacf33a24568a38ddb9ae5bcba39aee3
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Mar 3 16:03:10 2023 +0100

    Added mapping table to file.

commit 44a8ffe6d2e08de4136e3c32265c646605285808
Merge: e6988560 2845fbc9
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Mar 3 13:34:26 2023 +0100

    Merge branch 'dev' of https://github.com/bbuchfink/diamond_dev into dev

commit e6988560ed5d66a4d8d5e64c415963837d4957fe
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Mar 3 13:34:22 2023 +0100

    Print message to log.

commit 2845fbc9e3f79dd7a7726fe79c19cccf7eac48fa
Author: Dimi99 <73211787+Dimi99@users.noreply.github.com>
Date:   Fri Mar 3 13:33:43 2023 +0100

    Dev (#11)

    * update remote

    * update remote

    * update remote

    * update remote

    * update remote

    * update remote

    * update remote

    * wfa library

    * added the wfa2 library

    * added compile flags

    * change parameters

    * cleanup

    * bugfix

    * fix

    * removed debug print

    * repair merging errors

    * minor

    * adjusted preprocessor directives

    * changed size_t to Loc

    * removed std::make_unique

    * changed size_t to Loc

    * removed std::make_unique

    * added newline to the end of file

    * applied changes

commit 2085a3a19acf923e32ef6e79f03b20ef2a29320f
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Mar 2 16:54:20 2023 +0100

    Added timers.

commit a8277297aca52369970b2ac015dc7cbc3599b4e5
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Mar 2 16:08:07 2023 +0100

    Added cut command.

commit cc564d9aa3ea309f6911d73a9de730b98feda8e0
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Mar 2 15:36:16 2023 +0100

    Added file size message.

commit e3fafe7fbaac267c3ab525628fada436b0be917b
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Mar 2 14:59:18 2023 +0100

    Added table building.

commit 9570f9ad5ff48e1c3de32124c75065487621ffe3
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Mar 2 10:40:22 2023 +0100

    Added table building.

commit c6038ab0b39f70936dbc0b2f07392f702481972a
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Mar 1 17:00:40 2023 +0100

    Added build helper.

commit e0a5028e0777b89fc551315c69356dc0440cfc7f
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Mar 1 15:34:32 2023 +0100

    optimized tsv reading.

commit 4db1c008ff05493364ebce7bcaaf9c9d87b0dc15
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Mar 1 12:18:00 2023 +0100

    Fixed bug.

commit 18ae3ed01ea86b0623119b20e7c4307277be7736
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Mar 1 12:05:58 2023 +0100

    Optimized read_tsv.

commit f0adbbd7c7d130726ef86f01fd5bf0f9ddf4dc24
Author: Dimi <dimitrios_K@gmx.de>
Date:   Wed Mar 1 09:32:23 2023 +0100

    Revert "Merge branch 'dev' into dev"

    This reverts commit b0063bcc9947fa25015b5305065363333c324dfb, reversing
    changes made to 2f0537d067b668b2d2c4bcd64a4aa70e6dace604.

commit b0063bcc9947fa25015b5305065363333c324dfb
Merge: 2f0537d0 a74ab968
Author: Dimi99 <73211787+Dimi99@users.noreply.github.com>
Date:   Wed Mar 1 09:20:32 2023 +0100

    Merge branch 'dev' into dev

commit 2f0537d067b668b2d2c4bcd64a4aa70e6dace604
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Feb 28 19:27:03 2023 +0100

    Added read size option.

commit 1bdef78d264061cd0416e9287f4a789587974351
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Feb 28 18:55:57 2023 +0100

    Added line counting.

commit fc15dee3f2218997eaad2b3c024305e36a7177da
Merge: 56950f13 dbd6bd2
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Feb 28 16:25:47 2023 +0100

    Merge branch 'master' into dev
bbuchfink committed Mar 6, 2023
1 parent dbd6bd2 commit 40e440f
Showing 316 changed files with 479,420 additions and 2,770 deletions.
413 changes: 197 additions & 216 deletions CMakeLists.txt

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions src/ChangeLog
@@ -1,3 +1,11 @@
[2.1.5]
- Disabled the use of frequency based seed masking when using the linear-time
search feature with respect to the targets.
- Fixed a bug that caused a `Database file is not a BLAST database` error message
for the `prepdb` workflow.
- Fixed a bug that caused a segmentation fault when using BLAST databases.
- Added line numbers for error messages when reading taxonomy mapping files.

[2.1.4]
- Leading spaces are now trimmed and tabulator characters escaped as `\t`
in sequence titles, and a warning message is produced.
9 changes: 5 additions & 4 deletions src/align/align.cpp
@@ -33,6 +33,7 @@ along with this program. If not, see <http://www.gnu.org/licenses/>.
#include "extend.h"
#ifdef WITH_DNA
#include "../dna/wfa2_test.h"
#include "../dna/ksw2_extension.h"
#endif
#include "../util/algo/radix_sort.h"
#include "target.h"
@@ -58,7 +59,7 @@ TextBuffer* pipeline_short(BlockId query, Search::Hit* begin, Search::Hit* end,

}

#ifndef EXTRA
#if !defined(EXTRA) || defined(WITH_DNA)
#define OLD
#endif

@@ -253,7 +254,7 @@ bool align_worker(HitIterator* hit_it, ThreadPool::TaskSet* task_set, Search::Co

pair<vector<Extension::Match>, Extension::Stats> matches =
#ifdef WITH_DNA
align_mode.mode == Align_mode::blastn ? WaveExtension::extend(*cfg, h->query) :
align_mode.mode == AlignMode::blastn ? Dna::extend(*cfg, cfg->query->seqs()[h->query]) :
#endif
Extension::extend(h->query, h->begin, h->end, *cfg, stat, parallel ? DP::Flags::PARALLEL : DP::Flags::NONE);
TextBuffer* buf = cfg->blocked_processing ? Extension::generate_intermediate_output(matches.first, h->query, *cfg) : Extension::generate_output(matches.first, matches.second, h->query, stat, *cfg);
@@ -312,7 +313,7 @@ void align_queries(Consumer* output_file, Search::Config& cfg)
cfg.seed_hit_buf->load(std::min(mem_limit - res_size, config.trace_pt_fetch_size));

if (res_size + last_size > mem_limit)
message_stream << "Warning: resident size (" << (res_size + last_size) << ") exceeds memory limit." << std::endl;
log_stream << "Warning: resident size (" << (res_size + last_size) << ") exceeds memory limit." << std::endl;

timer.go("Sorting trace points");
ips4o::parallel::sort(hit_buf->begin(), hit_buf->end(), std::less<Search::Hit>(), config.threads_);
@@ -338,7 +339,7 @@ void align_queries(Consumer* output_file, Search::Config& cfg)
};
#ifndef OLD
cfg.thread_pool.reset(new ThreadPool(task));
cfg.thread_pool->run(n_threads, !config.no_heartbeat);
cfg.thread_pool->run(n_threads, false);
cfg.thread_pool->join();
#else
cfg.thread_pool.reset(new ThreadPool ());
12 changes: 10 additions & 2 deletions src/basic/config.cpp
@@ -246,6 +246,8 @@ Config::Config(int argc, const char **argv, bool check_io, CommandLineParser& pa
.add_command("fetch-seq", "", FETCH_SEQ)
.add_command("blastn", "Align DNA query sequences against a DNA reference database", blastn)
.add_command("length-sort", "", LENGTH_SORT)
.add_command("wc", "", WORD_COUNT)
.add_command("cut", "", CUT)
#endif
;

@@ -334,7 +336,9 @@ Config::Config(int argc, const char **argv, bool check_io, CommandLineParser& pa
\t5 = BLAST XML\n\
\t6 = BLAST tabular\n\
\t100 = DIAMOND alignment archive (DAA)\n\
\t101 = SAM\n\n\
\t101 = SAM\n\
\t102 = Taxonomic classification\n\
\t103 = PAF\n\n\
\tValue 6 may be followed by a space-separated list of these keywords:\n\n\
\tqseqid means Query Seq - id\n\
\tqlen means Query sequence length\n\
@@ -401,7 +405,7 @@ Config::Config(int argc, const char **argv, bool check_io, CommandLineParser& pa

string algo_str;

auto& advanced = parser.add_group("Advanced options", { blastp, blastx, makeidx, CLUSTER_REASSIGN, regression_test, cluster, DEEPCLUST, LINCLUST });
auto& advanced = parser.add_group("Advanced options", { blastp, blastx, blastn, makeidx, CLUSTER_REASSIGN, regression_test, cluster, DEEPCLUST, LINCLUST });
advanced.add()
("algo", 0, "Seed search algorithm (0=double-indexed/1=query-indexed/ctg=contiguous-seed)", algo_str)
("bin", 0, "number of query bins for seed search", query_bins_)
@@ -445,6 +449,9 @@ Config::Config(int argc, const char **argv, bool check_io, CommandLineParser& pa
("log-evalue-scale", 0, "", log_evalue_scale, 1.0 / std::log(2.0))
("bootstrap", 0, "", bootstrap)
("mp-self", 0, "", mp_self)
#ifdef EXTRA
("zdrop", 'z', "zdrop for gapped dna alignment", zdrop, 40)
#endif
("query-or-subject-cover", 0, "", query_or_target_cover);

auto& view_options = parser.add_group("View options", { view, blastp, blastx });
@@ -646,6 +653,7 @@ Config::Config(int argc, const char **argv, bool check_io, CommandLineParser& pa
("recluster_bd", 0, "", recluster_bd)
("pipeline-short", 0, "", pipeline_short)
("graph-algo", 0, "", graph_algo, string("gvc"))
("tsv-read-size", 0, "", tsv_read_size, int64_t(GIGABYTES))
#ifndef KEEP_TARGET_ID
("kmer-ranking", 0, "Rank sequences based on kmer frequency in linear stage", kmer_ranking)
#endif
4 changes: 3 additions & 1 deletion src/basic/config.h
@@ -335,6 +335,8 @@ struct Config
bool pipeline_short;
string graph_algo;
bool linsearch;
int64_t tsv_read_size;
int zdrop;

SequenceType dbtype;

@@ -350,7 +352,7 @@ struct Config
match_file_stat = 14, model_seqs = 15, opt = 16, mask = 17, fastq2fasta = 18, dbinfo = 19, test_extra = 20, test_io = 21, db_annot_stats = 22, read_sim = 23, info = 24, seed_stat = 25,
smith_waterman = 26, cluster = 27, translate = 28, filter_blasttab = 29, show_cbs = 30, simulate_seqs = 31, split = 32, upgma = 33, upgma_mc = 34, regression_test = 35,
reverse_seqs = 36, compute_medoids = 37, mutate = 38, rocid = 40, makeidx = 41, find_shapes, prep_db, composition, JOIN, HASH_SEQS, LIST_SEEDS, CLUSTER_REALIGN,
GREEDY_VERTEX_COVER, INDEX_FASTA, FETCH_SEQ, CLUSTER_REASSIGN, blastn, RECLUSTER, LENGTH_SORT, MERGE_DAA, DEEPCLUST, LINCLUST
GREEDY_VERTEX_COVER, INDEX_FASTA, FETCH_SEQ, CLUSTER_REASSIGN, blastn, RECLUSTER, LENGTH_SORT, MERGE_DAA, DEEPCLUST, LINCLUST, WORD_COUNT, CUT
};
unsigned command;

11 changes: 9 additions & 2 deletions src/basic/hssp.cpp
@@ -286,15 +286,22 @@ void Hsp::push_gap(Edit_operation op, int length, const Letter *subject)
#endif
}

Hsp::Hsp(const IntermediateRecord &r, unsigned query_source_len, Loc qlen, Loc tlen, const OutputFormat* output_format) :
Hsp::Hsp(const IntermediateRecord &r, unsigned query_source_len, Loc qlen, Loc tlen, const OutputFormat* output_format, const Stats::Blastn_Score *dna_score_builder):
backtraced(!IntermediateRecord::stats_mode(output_format->hsp_values) && output_format->hsp_values != HspValues::NONE),
score(r.score),
evalue(r.evalue),
bit_score(score_matrix.bitscore(r.score)),
corrected_bit_score(score_matrix.bitscore_corrected(r.score, qlen, tlen)),
transcript(r.transcript)
{
subject_range.begin_ = r.subject_begin;
if(dna_score_builder == nullptr){
bit_score = score_matrix.bitscore(r.score);
corrected_bit_score = score_matrix.bitscore_corrected(r.score, qlen, tlen);
}
else
bit_score = dna_score_builder->blast_bit_Score(r.score);

subject_range.begin_ = r.subject_begin;
if (align_mode.mode == AlignMode::blastx) {
frame = r.frame(query_source_len, align_mode.mode);
set_translated_query_begin(r.query_begin, query_source_len);
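The constructor change above picks the bit score source at run time: the protein score matrix when no DNA scorer is passed, and the DNA scorer otherwise. A minimal sketch of that optional-pointer dispatch follows. `KarlinParams`, `DnaScorer`, and `bit_score_for` are hypothetical names, the parameter values are placeholders, and the formula is the standard Karlin-Altschul conversion S' = (lambda * S - ln K) / ln 2, not something confirmed from `Stats::Blastn_Score`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical statistical parameters; values are supplied by the caller.
struct KarlinParams {
    double lambda;
    double K;
    // Standard raw-score to bit-score conversion.
    double bit_score(int raw) const {
        return (lambda * raw - std::log(K)) / std::log(2.0);
    }
};

// Hypothetical stand-in for a DNA-specific scorer.
struct DnaScorer {
    KarlinParams params;
    double bit_score(int raw) const { return params.bit_score(raw); }
};

// A null scorer pointer selects the protein parameters, mirroring the
// defaulted `dna_score_builder = nullptr` argument in the diff.
inline double bit_score_for(int raw, const KarlinParams& protein,
                            const DnaScorer* dna = nullptr) {
    assert(raw >= 0);
    return dna == nullptr ? protein.bit_score(raw) : dna->bit_score(raw);
}
```

Defaulting the pointer to `nullptr` keeps existing call sites source-compatible, which appears to be the motivation for the defaulted argument in `match.h`.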
3 changes: 2 additions & 1 deletion src/basic/match.h
@@ -48,6 +48,7 @@ struct OutputFormat;

namespace Stats {
struct TargetMatrix;
struct Blastn_Score;
}

struct Hsp
@@ -93,7 +94,7 @@ struct Hsp
matrix(nullptr)
{}

Hsp(const IntermediateRecord &r, unsigned query_source_len, Loc qlen, Loc tlen, const OutputFormat* output_format);
Hsp(const IntermediateRecord &r, unsigned query_source_len, Loc qlen, Loc tlen, const OutputFormat* output_format, const Stats::Blastn_Score *dna_score_builder = nullptr);
Hsp(const ApproxHsp& h, Loc qlen, Loc tlen);

struct Iterator
74 changes: 37 additions & 37 deletions src/basic/packed_loc.h
@@ -3,7 +3,7 @@ DIAMOND protein aligner
Copyright (C) 2013-2020 Max Planck Society for the Advancement of Science e.V.
Benjamin Buchfink
Eberhard Karls Universitaet Tuebingen
Code developed by Benjamin Buchfink <benjamin.buchfink@tue.mpg.de>
This program is free software: you can redistribute it and/or modify
@@ -28,42 +28,42 @@ along with this program. If not, see <http://www.gnu.org/licenses/>.

struct packed_uint40_t
{
uint8_t high;
uint32_t low;
packed_uint40_t():
high (),
low ()
{ }
packed_uint40_t(uint64_t v):
high ((uint8_t)(v>>32)),
low ((uint32_t)(v&0xfffffffflu))
{ }
packed_uint40_t& operator=(uint32_t x) {
high = 0;
low = x;
return *this;
}
operator uint64_t() const
{ return (uint64_t(high) << 32) | low; }
operator int64_t() const {
return (int64_t(high) << 32) | (int64_t)low;
}
operator uint32_t() const {
return low;
}
operator int32_t() const {
return low;
}
bool operator==(const packed_uint40_t& rhs) const {
return high == rhs.high && low == rhs.low;
}
bool operator!=(const packed_uint40_t& rhs) const {
return high != rhs.high && low != rhs.low;
}
bool operator<(const packed_uint40_t &rhs) const
{ return high < rhs.high || (high == rhs.high && low < rhs.low); }
friend uint64_t operator-(const packed_uint40_t &x, const packed_uint40_t &y)
{ return (uint64_t)(x) - (uint64_t)(y); }
} PACKED_ATTRIBUTE ;

typedef packed_uint40_t PackedLoc;
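The struct above stores a 40-bit location as an 8-bit high part plus a 32-bit low part. As shown in the hunk, `operator!=` combines the two comparisons with `&&`, so two values that differ in only one part do not compare as unequal. The sketch below is one way to keep `==` and `!=` consistent by defining one in terms of the other. `Packed40` is a hypothetical name, the packing attribute is omitted, and this is not the code from the commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 40-bit value split into an 8-bit high and a 32-bit low part.
struct Packed40 {
    uint8_t high = 0;
    uint32_t low = 0;

    Packed40() = default;
    explicit Packed40(uint64_t v)
        : high(static_cast<uint8_t>(v >> 32)),
          low(static_cast<uint32_t>(v & 0xffffffffULL)) {
        assert(v < (1ULL << 40));  // value must fit into 40 bits
    }

    uint64_t value() const {
        return (static_cast<uint64_t>(high) << 32) | low;
    }

    bool operator==(const Packed40& rhs) const {
        return high == rhs.high && low == rhs.low;
    }
    // Defined as the negation of == so the two can never disagree.
    bool operator!=(const Packed40& rhs) const { return !(*this == rhs); }
    bool operator<(const Packed40& rhs) const {
        return high < rhs.high || (high == rhs.high && low < rhs.low);
    }
};
```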
3 changes: 3 additions & 0 deletions src/basic/value.h
@@ -71,6 +71,9 @@ struct ValueTraits
#define AMINO_ACID_ALPHABET "ARNDCQEGHILKMFPSTWYVBJZX*_"
#define AMINO_ACID_COUNT (int(sizeof(AMINO_ACID_ALPHABET) - 1))

#define NUCLEOTIDE_ALPHABET "ACGTN"
#define NUCLEOTIDE_COUNT (sizeof(NUCLEOTIDE_ALPHABET) -1)

constexpr Letter MASK_LETTER = 23;
constexpr Letter STOP_LETTER = 24;
constexpr Letter SUPER_HARD_MASK = 25;
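The two macros added above define a five-letter nucleotide alphabet. Unlike `AMINO_ACID_COUNT`, `NUCLEOTIDE_COUNT` is not cast to `int`, so it has type `size_t`. The sketch below shows one way such an alphabet could be used to map characters to indices. `encode_nucleotide` is a hypothetical helper, and mapping unknown characters to `N` is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#define NUCLEOTIDE_ALPHABET "ACGTN"
#define NUCLEOTIDE_COUNT (sizeof(NUCLEOTIDE_ALPHABET) - 1)

static_assert(NUCLEOTIDE_COUNT == 5, "alphabet has five letters");

// Hypothetical helper: returns the index of c in the alphabet, or the
// index of 'N' for any character that is not part of it (an assumption).
inline std::size_t encode_nucleotide(char c) {
    static const char alphabet[] = NUCLEOTIDE_ALPHABET;
    const char* p = (c != '\0') ? std::strchr(alphabet, c) : nullptr;
    const std::size_t index =
        p ? static_cast<std::size_t>(p - alphabet) : NUCLEOTIDE_COUNT - 1;
    assert(index < NUCLEOTIDE_COUNT);
    return index;
}
```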
2 changes: 1 addition & 1 deletion src/cluster/helpers.cpp
@@ -218,7 +218,7 @@ void init_thresholds() {
}

File* open_out_tsv() {
File* file = new File(Schema{ Type::STRING, Type::STRING }, config.output_file, Flags::WRITE | Flags::OVERWRITE);
File* file = new File(Schema{ Type::STRING, Type::STRING }, config.output_file, Flags::WRITE);
if (Blast_tab_format::header_format(::Config::cluster) == Header::SIMPLE)
file->write_record("centroid", "member");
return file;
10 changes: 6 additions & 4 deletions src/data/blastdb/blastdb.cpp
@@ -88,11 +88,13 @@ list<CRef<CSeq_id>>::const_iterator best_id(const list<CRef<CSeq_id>>& ids) {

BlastDB::BlastDB(const std::string& file_name, Metadata metadata, Flags flags, const ValueTraits& value_traits) :
SequenceFile(Type::BLAST, Alphabet::NCBI, flags, FormatFlags::TITLES_LAZY | FormatFlags::SEEKABLE | FormatFlags::LENGTH_LOOKUP, value_traits),
file_name_(file_name),
file_name_(file_name),
db_(new CSeqDBExpert(file_name, CSeqDB::eProtein)),
oid_(0),
long_seqids_(false),
flags_(flags)
flags_(flags),
sequence_count_(db_->GetNumOIDs()),
sparse_sequence_count_(db_->GetNumSeqs())
{
if (flag_any(metadata, Metadata::TAXON_NODES | Metadata::TAXON_MAPPING | Metadata::TAXON_SCIENTIFIC_NAMES | Metadata::TAXON_RANKS))
throw std::runtime_error("Taxonomy features are not supported for the BLAST database format.");
@@ -242,12 +244,12 @@ std::vector<Letter> BlastDB::dict_seq(DictId dict_id, const size_t ref_block) co

int64_t BlastDB::sequence_count() const
{
return db_->GetNumOIDs();
return sequence_count_;
}

int64_t BlastDB::sparse_sequence_count() const
{
return db_->GetNumSeqs();
return sparse_sequence_count_;
}

size_t BlastDB::letters() const
1 change: 1 addition & 0 deletions src/data/blastdb/blastdb.h
@@ -67,6 +67,7 @@ struct BlastDB : public SequenceFile {
int oid_;
const bool long_seqids_;
const Flags flags_;
int64_t sequence_count_, sparse_sequence_count_;
BitVector oid_filter_;

friend void load_blast_seqid();
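The two BlastDB hunks above read the sequence counts once in the constructor and return the cached values afterwards. Because C++ initializes members in declaration order, the cached fields have to be declared after the database handle they are read from. The sketch below illustrates that pattern. `FakeDb` and `CachedCounts` are hypothetical stand-ins and do not reflect the NCBI `CSeqDB` API.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical backend that counts how often it is queried.
struct FakeDb {
    FakeDb(int64_t oids, int64_t seqs) : oids_(oids), seqs_(seqs) {
        assert(oids >= 0 && seqs >= 0);
    }
    int64_t num_oids() const { ++calls; return oids_; }
    int64_t num_seqs() const { ++calls; return seqs_; }
    mutable int calls = 0;
private:
    int64_t oids_, seqs_;
};

class CachedCounts {
public:
    explicit CachedCounts(std::unique_ptr<FakeDb> db)
        : db_(std::move(db)),
          sequence_count_(db_->num_oids()),         // queried once here
          sparse_sequence_count_(db_->num_seqs())   // queried once here
    {}
    int64_t sequence_count() const { return sequence_count_; }
    int64_t sparse_sequence_count() const { return sparse_sequence_count_; }
    int backend_calls() const { return db_->calls; }
private:
    std::unique_ptr<FakeDb> db_;  // declared first, so initialized first
    int64_t sequence_count_;
    int64_t sparse_sequence_count_;
};
```

Repeated calls to the accessors then never touch the backend again, which also avoids calling into the database object after its state has changed.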
8 changes: 6 additions & 2 deletions src/data/enum_seeds.h
@@ -48,7 +48,10 @@ Search::SeedStats enum_seeds_minimizer(SequenceSet* seqs, F* f, unsigned begin,
continue;
seqs->convert_to_std_alph(i);
const Sequence seq = (*seqs)[i];
Reduction::reduce_seq(seq, buf);
if (align_mode.mode != AlignMode::blastn)
Reduction::reduce_seq(seq, buf);
else
buf = seq.copy();
for (size_t shape_id = cfg.shape_begin; shape_id < cfg.shape_end; ++shape_id) {
const Shape& sh = shapes[shape_id];
if (seq.length() < sh.length_) continue;
@@ -78,6 +81,7 @@ void enum_seeds_hashed(SequenceSet* seqs, F* f, unsigned begin, unsigned end, co
const Shape& sh = shapes[shape_id];
if (seq.length() < sh.length_) continue;
const uint64_t shape_mask = sh.long_mask();
//const __m128i shape_mask = sh.long_mask_sse_;
HashedSeedIterator<BITS> it(seq, sh);
Loc j = 0;
while (it.good()) {
@@ -103,7 +107,7 @@ void enum_seeds_contiguous(SequenceSet* seqs, F* f, unsigned begin, unsigned end
const Sequence seq = (*seqs)[i];
if (seq.length() < It::length()) continue;
It it(seq);
Loc j = 0;
size_t j = 0;
while (it.good()) {
if (it.get(key))
if (filter->contains(key, 0))
23 changes: 15 additions & 8 deletions src/data/seed_array.h
@@ -49,18 +49,25 @@ struct SeedArray
#else
value(pos)
#endif
{}
SeedOffset key;

struct GetKey {
uint32_t operator()(const Entry& e) const {
return e.key;
}
};
{}
bool operator<(const Entry& entry)const{
return this->key < entry.key;
}
bool operator==(const Entry& entry)const{
return this->key == entry.key && this->value == entry.value;
}
SeedOffset key;

struct GetKey {
uint32_t operator()(const Entry& e) const {
return e.key;
}
};

SeedLoc value;
using Key = decltype(key);
using Value = decltype(value);
using value_type = Entry;
} PACKED_ATTRIBUTE;

template<typename _filter>
1 change: 1 addition & 0 deletions src/data/sequence_file.cpp
@@ -842,6 +842,7 @@ void prep_db() {
else
FastaFile::prep_db(config.database);
#else
else
throw runtime_error("Database file is not a BLAST database");
#endif
}
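The one-line change above adds an `else` in front of the `throw` in the `#else` branch. Without it, the `throw` is a separate statement that runs after the preceding `if` regardless of its outcome, which matches the ChangeLog entry about the spurious `Database file is not a BLAST database` error. The sketch below reproduces the shape of the fix. The surrounding condition is not visible in the hunk, so `DbKind`, `prep_db_sketch`, and the `SKETCH_FASTA_SUPPORT` macro are all hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class DbKind { Blast, Other };

// Hypothetical reconstruction of an if/else chain split by the preprocessor.
inline std::string prep_db_sketch(DbKind kind) {
    std::string action;
    if (kind == DbKind::Blast)
        action = "prep blast";
#ifdef SKETCH_FASTA_SUPPORT
    else
        action = "prep fasta";
#else
    else  // without this `else`, the throw would run for every input
        throw std::runtime_error("Database file is not a BLAST database");
#endif
    return action;
}
```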
13 changes: 9 additions & 4 deletions src/data/taxon_list.cpp
@@ -64,10 +64,15 @@ static void load_mapping_file(ExternalSorter<pair<string, TaxId>>& sorter)
string accession, last;

while (!f.eof() && (f.getline(), !f.line.empty())) {
if (format == 0)
Util::String::Tokenizer(f.line, "\t") >> Util::String::Skip() >> accession >> taxid;
else
Util::String::Tokenizer(f.line, "\t") >> accession >> taxid;
try {
if (format == 0)
Util::String::Tokenizer(f.line, "\t") >> Util::String::Skip() >> accession >> taxid;
else
Util::String::Tokenizer(f.line, "\t") >> accession >> taxid;
}
catch (Util::String::TokenizerException&) {
throw std::runtime_error("Malformed input in line " + std::to_string(f.line_count));
}

if (accession.empty())
throw std::runtime_error("Empty accession field in line " + std::to_string(f.line_count));
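The hunk above wraps tokenization in a `try` block and rethrows parse failures with the current line number. The same idea can be sketched with the standard library only. `parse_mapping` is a hypothetical function, the two-column `accession<TAB>taxid` layout is an assumption for illustration, and line numbers are counted from 1.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parser for "accession<TAB>taxid" lines.
inline std::vector<std::pair<std::string, long>> parse_mapping(std::istream& in) {
    std::vector<std::pair<std::string, long>> out;
    std::string line;
    long line_count = 0;
    while (std::getline(in, line)) {
        ++line_count;
        if (line.empty())
            break;  // like the loop in the hunk, stop at an empty line
        try {
            const std::size_t tab = line.find('\t');
            if (tab == std::string::npos)
                throw std::invalid_argument("missing tab");
            const std::string field = line.substr(tab + 1);
            std::size_t used = 0;
            const long taxid = std::stol(field, &used);  // may throw
            if (used != field.size())
                throw std::invalid_argument("trailing characters");
            out.emplace_back(line.substr(0, tab), taxid);
        }
        catch (const std::logic_error&) {
            // Translate low-level parse errors into a message with context.
            throw std::runtime_error("Malformed input in line " + std::to_string(line_count));
        }
    }
    return out;
}
```

Catching the narrow parser exception and rethrowing keeps the low-level tokenizer unaware of file positions while still giving the user a line to look at.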