
File locking blocks indefinitely in writePileUps #6

Closed · a-ludi opened this issue May 25, 2020 · 15 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

a-ludi commented May 25, 2020

Hi Arne,

[...] The job seems to be stuck in “dentist collect”. The file “workdir/pile-ups.db” was created but it is empty. The node that it is running on shows that 71G of memory is used and 52G available.

Here are the last entries in the “collect.log” file.

{"thread":140737354013504,"timestamp":637255135504311334,"numPileUps":260,"numAlignmentChains":3186}
{"thread":140737354013504,"timestamp":637255135504323016,"state":"exit","function":"dentist.commands.collectPileUps.PileUpCollector.buildPileUps","timeElapsed":25742682}
{"thread":140737354013504,"timestamp":637255135504332559,"state":"enter","function":"dentist.commands.collectPileUps.PileUpCollector.writePileUps"}

Do you have any ideas? Could there be a problem with the pipeline before it hit dentist collect? Watching it run is a thing of beauty!

Regards,

Randy

Originally posted by @BradleyRan in #3 (comment)

a-ludi self-assigned this May 25, 2020
a-ludi added the "bug" label May 25, 2020

a-ludi commented May 25, 2020

Hi Randy,

I don't have a clue so far. It looks like some kind of memory leak in the writer for the binary pile-ups format. Here are some questions to shed some light on the matter:

  1. Is the memory consumption rising over time?
  2. How big are the input .las files? numAlignmentChains is just 3186, so I guess they are rather small.
  3. Is it possible to share the input files? If yes, please drop them into my ownCloud so I can try to reproduce the bug.
  4. Can you please share the full command that is being executed (you may redact sensitive names from the paths)? You can make snakemake report the commands by running it with the options -np, as sketched below.
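For reference, the dry run can look like this (assuming snakemake is invoked from the pipeline's working directory):

```sh
# -n: dry run, i.e. build the job DAG but execute nothing
# -p: print the shell command of each job
snakemake -np
```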

a-ludi changed the title from "Hi Arne," to "High memory consumption in writePileUps" May 25, 2020

BradleyRan commented May 25, 2020 via email

BradleyRan commented May 27, 2020 via email

a-ludi commented May 28, 2020

Sure, perfect!

BradleyRan commented May 28, 2020 via email

a-ludi commented Jun 2, 2020

Hey Randy,

I got the files and they are OK. I will write to you once I have news.

a-ludi commented Jun 4, 2020

So, it worked in my setup. It took 8:55 hours with 2 CPUs and consumed a maximum RSS of 86.5 GB. Most of the time (6.7 h) was spent reading the large alignment file. The routine is not very optimized, I have to admit.

Now that you know how many resources are required, can you try again? I ran the job with a 24 h time limit and a maximum allowed RSS of 128 GB.

a-ludi added the "help wanted" label and removed the "bug" label Jun 4, 2020

BradleyRan commented Jun 7, 2020 via email

BradleyRan commented Jun 8, 2020 via email

BradleyRan commented Jun 11, 2020 via email

a-ludi added a commit that referenced this issue Jun 12, 2020
might resolve issue #6

a-ludi commented Jun 12, 2020

Hi Randy,

the write-protection is likely not the cause of your issue because it is applied by snakemake only after dentist has finished successfully. You can double-check this by verifying whether dentist is still active.
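For example, on the compute node (assuming pgrep is available):

```sh
# List running dentist processes with their full command lines;
# no output (exit status 1) means dentist is no longer running
pgrep -af dentist
```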

My guess is that it is file locking: dentist tries to lock the files it reads or writes via flockfile. If that fails with an error (as it does on our cluster, where file locking is not implemented), it simply opens the file without locking and continues. But if the lock can be requested yet never acquired, it blocks indefinitely.
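As a rough shell analogy using util-linux's flock(1) (this is not dentist's actual code, just the same blocking vs. non-blocking distinction):

```sh
# Non-blocking attempt: fail immediately if the lock is held, comparable
# to the error path where dentist continues without a lock
flock -n workdir/pile-ups.db -c 'echo acquired' || echo 'lock busy'

# Blocking attempt: wait indefinitely for the lock, which matches the
# observed hang in writePileUps
flock workdir/pile-ups.db -c 'echo acquired'
```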

I created another version of dentist that allows skipping the locking step entirely. I hope it just works, because I have not tested it at all.

dentist.v1.0.0-beta.1-6-gce64df8.x86_64.tar.gz

BradleyRan commented Jun 12, 2020 via email

BradleyRan commented Jun 13, 2020 via email

a-ludi commented Jun 13, 2020

Sorry, I forgot to mention: SKIP_FILE_LOCKING=1 should do the trick.
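That is, set it as an environment variable when launching the pipeline; roughly like this (the exact invocation depends on your setup):

```sh
# Export the variable so that every child process, including dentist,
# sees it; the snakemake invocation is only an example
export SKIP_FILE_LOCKING=1
snakemake --cores 2
```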

BradleyRan commented Jun 13, 2020 via email

a-ludi changed the title from "High memory consumption in writePileUps" to "File locking blocks indefinitely in writePileUps" Jul 23, 2020
a-ludi added the "bug" label Jul 23, 2020
a-ludi closed this as completed Jul 23, 2020
a-ludi added a commit that referenced this issue Jul 23, 2020
might resolve issue #6