Error 68 : Unfinished stream #42

Closed
adamant-pwn opened this issue Feb 9, 2024 · 2 comments

@adamant-pwn

Hi there! In jermp/sshash#39, @jermp suggested that I use ggcat to produce input datasets for sshash.

I tried using ggcat, but unfortunately something seems wrong:

$ gzip -d se.ust.k31.fa.gz 
$ ggcat build -k 31 -j 8 --eulertigs se.ust.k31.fa 
...
Final output saved to: output.fasta.lz4
$ lz4 -d output.fasta.lz4 
Decoding file output.fasta 
Error 68 : Unfinished stream 

This uses se.ust.k31.fa.gz as the input dataset for ggcat. Ultimately, I want to use ggcat to compute eulertigs of Homo_sapiens.GRCh38.dna.toplevel.fa.gz for k=127, but with that dataset I also get Unfinished stream errors, and the resulting file is much smaller than I expected. Could you please advise whether I'm doing anything wrong here?

@Guilucand
Collaborator

Hi! This problem has two separate causes:

  • the abundance cutoff (-s flag), which defaults to 2 and therefore removes all k-mers that do not appear at least 2 times
  • a bug in the closing of FASTA files, which fails to properly terminate empty output files

I just fixed the second problem, but probably all you need is to pass the flag -s 1 to ggcat to lower the cutoff.
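
To illustrate why an abundance cutoff of 2 can leave nothing to write, here is a minimal Python sketch of the idea. It is not ggcat's implementation: the helper names `kmers` and `filter_by_abundance` are hypothetical, the toy input is made up, and for simplicity it counts forward k-mers only (no reverse complements). It assumes an input in which every k-mer occurs exactly once, which is what one would expect from a file of unitig-like strings.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def filter_by_abundance(sequences, k, min_count):
    """Return the set of k-mers that occur at least min_count times overall.

    Hypothetical helper mimicking the effect of an abundance cutoff;
    forward strand only, unlike a real de Bruijn graph builder.
    """
    counts = Counter(km for seq in sequences for km in kmers(seq, k))
    return {km for km, c in counts.items() if c >= min_count}

# Toy input (made up): every 3-mer occurs exactly once.
toy = ["ACGTT", "GGCA"]

kept_default = filter_by_abundance(toy, k=3, min_count=2)  # cutoff of 2
kept_all = filter_by_abundance(toy, k=3, min_count=1)      # cutoff of 1

print(len(kept_default), len(kept_all))  # prints "0 5"
```

With a cutoff of 2 the filtered set is empty, which corresponds to the empty output file described above; with a cutoff of 1 all five k-mers are kept.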

@enricorox

Same problem here. It would be nice to print a warning when the output FASTA file is empty.
