Error 68 : Unfinished stream #42

adamant-pwn · 2024-02-09T17:27:53Z

Hi there! In jermp/sshash#39, @jermp suggested that I use ggcat to produce input datasets for sshash.

I tried using ggcat, but unfortunately something seems wrong:

$ gzip -d se.ust.k31.fa.gz 
$ ggcat build -k 31 -j 8 --eulertigs se.ust.k31.fa 
...
Final output saved to: output.fasta.lz4
$ lz4 -d output.fasta.lz4 
Decoding file output.fasta 
Error 68 : Unfinished stream

This uses se.ust.k31.fa.gz as the input dataset for ggcat. Ultimately, I want to use ggcat to compute eulertigs of Homo_sapiens.GRCh38.dna.toplevel.fa.gz for k=127, but with that dataset I also get Unfinished stream errors, and the resulting file is much smaller than I expected. Could you please advise whether I'm doing anything wrong here?

Guilucand · 2024-03-09T16:08:18Z

Hi! This problem has two separate causes:

1. The abundance cutoff (-s flag), which defaults to 2 and therefore removes all k-mers that appear fewer than 2 times.
2. A bug in the closing of FASTA files: empty output files were not properly terminated.

I just fixed the second problem, but probably all you want is to pass the flag -s 1 to ggcat to lower the cutoff.
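
For the session in the original report, the rerun with the lowered cutoff might look like the sketch below. The flags are the ones already used in this thread, with -s 1 added; the build_eulertigs wrapper and the GGCAT override variable are hypothetical conveniences, not part of ggcat.

```shell
# Hypothetical override: set GGCAT=echo to print the arguments instead of running ggcat.
GGCAT=${GGCAT:-ggcat}

# Hypothetical wrapper around the command from the report.
# $1 = input FASTA, $2 = k-mer length, $3 = abundance cutoff (defaults to 1 here)
build_eulertigs() {
    "$GGCAT" build -k "$2" -j 8 -s "${3:-1}" --eulertigs "$1"
}

# Usage:
#   build_eulertigs se.ust.k31.fa 31
#   GGCAT=echo build_eulertigs se.ust.k31.fa 31   # preview only
```

With a cutoff of 1, k-mers that occur only once are kept, so an input that is already a set of distinct k-mer strings should no longer produce an empty output.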

enricorox · 2024-05-07T16:43:21Z

Same problem here. It would be nice to print a warning in case the fasta file is empty.
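
Until ggcat prints such a warning itself, a user-side check is possible. The sketch below assumes the output has already been decompressed to a plain FASTA file; the function name check_fasta_has_records is hypothetical.

```shell
# Hypothetical helper: warn when a FASTA file contains no records.
# A file counts as non-empty if at least one line starts with '>'.
check_fasta_has_records() {
    if ! grep -q '^>' "$1"; then
        echo "warning: $1 contains no sequences; the abundance cutoff (-s) may be too high" >&2
        return 1
    fi
}

# Usage:
#   check_fasta_has_records output.fasta
```

Checking for a header line rather than the file size also catches files that contain only whitespace.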

adamant-pwn mentioned this issue on Feb 9, 2024 in jermp/sshash#39, "Support larger alphabets and k via generic kmer_t" (merged)

Guilucand closed this as completed in 3c5b0fa Mar 9, 2024
