Huge temporary files when running gretel on very large contigs #31

Open
SamStudio8 opened this issue Jan 7, 2020 · 7 comments

@SamStudio8
Owner

Although Gretel is not designed for recovering large haplotypes, it should at least try its best. Apparently, very large contigs cause Gretel to write very large temporary files, which can fill the disk and lead to an OSError:

[...]
  File "/home/epi_mher/miniconda2/envs/py3/lib/python3.5/multiprocessing/heap.py", line 231, in malloc
    (arena, start, stop) = self._malloc(size)
  File "/home/epi_mher/miniconda2/envs/py3/lib/python3.5/multiprocessing/heap.py", line 129, in _malloc
    arena = Arena(length)
  File "/home/epi_mher/miniconda2/envs/py3/lib/python3.5/multiprocessing/heap.py", line 81, in __init__
    assert f.tell() == size
OSError: [Errno 28] No space left on device

First reported by @mherold1 in #30.

@SamStudio8
Owner Author

Although this is not desired behaviour, it is not a high priority, as this is an off-label use of Gretel.

@jsgounot

jsgounot commented Nov 9, 2020

Hi. Do you think this behavior will be resolved or handled at some point? At the very least it should be documented: I almost crashed my computer trying Gretel just minutes ago. It would also be good to state that the VCF file has to be bgzipped; otherwise you get an uninformative PyVCF error. Thanks.

@SamStudio8
Owner Author

SamStudio8 commented Nov 9, 2020

Hi @jsgounot, thanks for the comment, and I'm sorry about locking up all your storage! I don't intend to resolve this any time soon, as Gretel is designed for local haplotyping on "short" regions (intuition here https://www.biorxiv.org/content/10.1101/2020.08.10.244848v1). I would love to find the time in future to reduce Hansel's storage requirements, which would help with this problem, but I can't promise anything. Locking up your machine is totally undesired behaviour, though, and I should try to catch this use case with a warning (perhaps one that can be overridden with --force or something). Out of interest, what was the size of the region you specified?
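
As a minimal sketch of the warning suggested above, assuming the region is given as 1-based inclusive start and end coordinates, a size guard that can be overridden with `--force` might look like the following. The threshold `MAX_REGION_BP`, the functions `check_region_size` and `main`, and the `-s`/`-e` flags are all hypothetical: they are not Gretel's actual CLI, and the limit is purely illustrative.

```python
import argparse
import sys

# Hypothetical limit: the thread gives no number, so this value is illustrative
# only and would need tuning against Hansel's real storage requirements.
MAX_REGION_BP = 10000


def check_region_size(start, end, force, limit=MAX_REGION_BP):
    """Return True if it is OK to proceed; warn (or refuse) for large regions."""
    length = end - start + 1  # assumes 1-based, inclusive coordinates
    if length <= limit:
        return True
    msg = ("Region spans {} bp, exceeding the suggested limit of {} bp. "
           "Gretel is designed for short regions and may exhaust temporary "
           "storage.".format(length, limit))
    if force:
        print("WARNING: " + msg + " Continuing because --force was given.",
              file=sys.stderr)
        return True
    print("ERROR: " + msg + " Re-run with --force to override.", file=sys.stderr)
    return False


def main(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical region-size guard")
    parser.add_argument("-s", "--start", type=int, required=True)
    parser.add_argument("-e", "--end", type=int, required=True)
    parser.add_argument("--force", action="store_true",
                        help="run even if the region looks too large")
    args = parser.parse_args(argv)
    if not check_region_size(args.start, args.end, args.force):
        return 1  # non-zero exit status: refused to run
    # ... haplotype recovery would start here ...
    return 0
```

A real entry point would call `sys.exit(main())`. Warning rather than refusing when `--force` is given keeps the off-label use possible while making the storage risk explicit up front.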

On your second point, I note the requirement is stressed in the README, but you are absolutely right that the CLI should raise an error if the file looks like the wrong format. Thanks. (#33)
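
A hypothetical sketch of the format check tracked in #33: a bgzip-compressed (BGZF) file starts with a gzip header whose FEXTRA flag is set and whose extra subfield is tagged `BC`, so a few header bytes are enough to tell it apart from plain text or ordinary gzip. The function names and error message are illustrative assumptions, not Gretel's code, and the check assumes `BC` is the first extra subfield, as `bgzip` writes it.

```python
def looks_like_bgzf(path):
    """Return True if the file starts with a BGZF (blocked gzip) header."""
    with open(path, "rb") as handle:
        header = handle.read(18)
    return (
        len(header) >= 16
        and header[0:4] == b"\x1f\x8b\x08\x04"  # gzip magic, DEFLATE, FEXTRA flag
        and header[12:14] == b"BC"               # BGZF extra subfield identifier
    )


def require_bgzipped_vcf(path):
    """Fail early with a clear message, before the VCF parser sees the file."""
    if not looks_like_bgzf(path):
        raise ValueError(
            "{} does not look bgzip-compressed; compress it with 'bgzip' "
            "and index it with 'tabix -p vcf' first.".format(path)
        )
```

A plain-gzip file fails the check because ordinary `gzip` does not set the FEXTRA flag, which is exactly the case that otherwise surfaces as an unhelpful parser error.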

@jsgounot

jsgounot commented Nov 9, 2020

Thanks for the reply. Well, I guess it far exceeded what could be called a short region. I will try again with a real, shorter one (I used a far too large, random test BAM file spanning hundreds of kb).

@SamStudio8
Owner Author

No problem - thanks for taking the time to report. Good luck!

@kangxiongbin

> Although Gretel is not designed for recovering large haplotypes, it should at least try its best. Apparently very large contigs will cause Gretel to write very large temporary files and lead to an OSError.

Hi @SamStudio8,
I would like to know how large a haplotype Gretel can recover. Can I use Gretel to recover bacterial genomes from metagenomic data? These bacteria may have genomes of roughly 2 to 7 Mb.

@SamStudio8
Owner Author

SamStudio8 commented Oct 19, 2021 via email
