Some of the stats programs have features like k-mer reports (with -K) and probe-id
counting (with -D).
These can consume a lot of RAM (>10 GB) on very large files (>200 million reads),
even with the highly efficient sparsehash library.
A disk-backed key-value store, like LevelDB, could deliver hash-like performance
while also allowing growth past available RAM. I'm thinking that the code should
switch to a DB-backed store at the 200-million-record level. This would slow
things down by about 3x (from 1 million writes/sec to 300k writes/sec), but would
also allow unbounded growth. Enabling a large LRU cache could make it perform so
similarly that the sparse hash can be abandoned, especially if the DB remains an
insignificant fraction of the stats-collection process.
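A minimal sketch of the proposed switchover, not the ea-utils implementation: a counter that runs as an in-memory dict until the key count passes a threshold, then migrates once to a disk-backed store and frees the RAM copy. The class name and the use of the stdlib `dbm` module (standing in for LevelDB) are my assumptions.

```python
import dbm
import os
import tempfile

class SpillableCounter:
    """Count keys in RAM; spill to a disk-backed store past a threshold.

    Hypothetical sketch: stdlib dbm stands in for LevelDB, and the
    200-million-key threshold from the issue is just the default here.
    """

    def __init__(self, threshold=200_000_000):
        self.threshold = threshold
        self.mem = {}    # fast in-RAM phase (the sparsehash analogue)
        self.db = None   # disk-backed phase, opened only if needed
        self.path = None

    def incr(self, key):
        if self.db is None:
            self.mem[key] = self.mem.get(key, 0) + 1
            if len(self.mem) > self.threshold:
                self._spill()
        else:
            cur = int(self.db.get(key, b"0"))
            self.db[key] = str(cur + 1).encode()

    def _spill(self):
        # One-time migration: copy existing counts to the disk store,
        # then release the in-memory table.
        fd, self.path = tempfile.mkstemp()
        os.close(fd)
        os.unlink(self.path)
        self.db = dbm.open(self.path, "c")
        for k, v in self.mem.items():
            self.db[k] = str(v).encode()
        self.mem = {}

    def count(self, key):
        if self.db is None:
            return self.mem.get(key, 0)
        return int(self.db.get(key, b"0"))
```

An LRU cache, as suggested above, would sit in front of the `db` reads/writes in the spilled phase; it is omitted here to keep the sketch short.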
Original issue reported on code.google.com by earone...@gmail.com on 9 Jul 2014 at 2:26
Going to do this by a) allowing detection of pre-sorting by probe-id when run
with -D ... if detected ... RAM is freed and duplicate detection proceeds
without the need for a hash. Other hashes (like k-mers) can be switched to
some sort of counting bloom filter.
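The sorted-input path in a) can be sketched in a few lines, assuming the records arrive as an iterable of probe-ids: once the input is known to be sorted by probe-id, equal ids are adjacent, so a single previous-id variable replaces the whole hash.

```python
def count_duplicates(sorted_ids):
    """Count duplicate records in a probe-id-sorted stream.

    Sketch only: no hash table is needed because duplicates of a
    sorted key are always adjacent.
    """
    dups = 0
    prev = None
    for pid in sorted_ids:
        if pid == prev:
            dups += 1
        prev = pid
    return dups
```

This is O(1) in memory regardless of input size, which is why detecting pre-sorting lets the RAM be freed.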
Original comment by earone...@gmail.com on 8 Sep 2014 at 8:16
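For the counting bloom filter mentioned in the comment above, a minimal sketch (sizes, hash construction, and class name are illustrative assumptions, not from ea-utils): a fixed array of counters with several hash functions per key, so memory stays bounded at the cost of approximate counts.

```python
import hashlib

class CountingBloom:
    """Counting bloom filter sketch for bounded-RAM k-mer counting.

    Counts are approximate: the reported count is an upper bound on
    the true count (collisions can only inflate it, never shrink it).
    """

    def __init__(self, size=1_000_003, hashes=3):
        self.size = size
        self.hashes = hashes
        self.counters = [0] * size

    def _slots(self, key):
        # Derive several slot indices by salting one hash function.
        for i in range(self.hashes):
            h = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key):
        for s in self._slots(key):
            self.counters[s] += 1

    def count(self, key):
        # The minimum over a key's slots bounds its true count from above.
        return min(self.counters[s] for s in self._slots(key))
```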