Writing tmp_nonduplicates.##### #102

Open
BarryDigby opened this issue Mar 25, 2022 · 0 comments

Hi Tobias,

I've been running DCC on a dataset and have noticed that writing the tmp_nonduplicates.# file is taking an extremely long time. For context, here are the contents of _tmp_DCC/ after 23 hours of running:

total 1.1G
-rw-r--r-- 1 bdigby 373M Mar 24 14:53 fust1_1.Chimeric.out.junction.PLJSNR
-rw-r--r-- 1 bdigby  15M Mar 25 14:12 tmp_duplicates.8D5DA9
-rw-r--r-- 1 bdigby 248M Mar 24 14:53 tmp_merged
-rw-r--r-- 1 bdigby  78M Mar 25 14:47 tmp_nonduplicates.8D5DA9
-rw-r--r-- 1 bdigby 164M Mar 24 14:53 tmp_printcirclines.8D5DA9
-rw-r--r-- 1 bdigby 248M Mar 24 14:53 tmp_twochimera

The resources requested for this job are as follows:

#!/bin/bash
#SBATCH -D /data/bdigby/Projects/large_test_data/work/19/da9c1aa6ff81627fd501b664e03b81
#SBATCH -J nf-DCC_(fust1_1)
#SBATCH -o /data/bdigby/Projects/large_test_data/work/19/da9c1aa6ff81627fd501b664e03b81/.command.log
#SBATCH --no-requeue
#SBATCH -c 16
#SBATCH -t 72:00:00
#SBATCH --mem 112640M
#SBATCH -p highmem
# NEXTFLOW TASK: DCC (fust1_1)

Can you offer any insight into what might be limiting this step? For example, do you think increasing or reducing the available resources might speed up the process?

It would also be useful to get an idea of the final size of tmp_nonduplicates.#. Will it be similar in size to tmp_printcirclines.#? Knowing this would help me gauge an appropriate TimeLimit without resorting to trial and error.
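
As a rough way to turn the listing above into a TimeLimit guess, the sketch below extrapolates the observed growth of tmp_nonduplicates.8D5DA9. It rests on three assumptions that are not confirmed anywhere in this issue: that writing began around Mar 24 14:53 (the timestamp of tmp_printcirclines.8D5DA9), that the file grows at a constant rate, and that its final size matches tmp_printcirclines.8D5DA9 (164M). The function name `estimate_total_hours` is hypothetical.

```python
from datetime import datetime


def estimate_total_hours(start, last_write, current_mb, assumed_final_mb):
    """Linear extrapolation: hours needed to reach assumed_final_mb
    at the average write rate observed so far."""
    elapsed_hours = (last_write - start).total_seconds() / 3600
    rate_mb_per_hour = current_mb / elapsed_hours
    return assumed_final_mb / rate_mb_per_hour


# Timestamps and sizes taken from the directory listing.
# ASSUMPTION: writing started when tmp_printcirclines was last modified.
start = datetime(2022, 3, 24, 14, 53)
last_write = datetime(2022, 3, 25, 14, 47)

# ASSUMPTION: final size equals tmp_printcirclines (164M); this is the open question.
total_hours = estimate_total_hours(start, last_write, current_mb=78, assumed_final_mb=164)
print(f"Estimated total write time: {total_hours:.1f} h")
```

If the final file turns out to be larger than tmp_printcirclines, or the write rate slows as the file grows, this figure would be an underestimate.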


A further complication is that two of the six samples have stopped running but, bizarrely, did not produce an exit code or an error. The relevant line from the Nextflow log is below:

Mar-25 12:47:22.254 [Task monitor] DEBUG nextflow.executor.GridTaskHandler - Failed to get exit status for process TaskHandler[jobId: 6058404; id: 97; name: DCC (N2_1); status: RUNNING; exit: -; error: -; workDir: /data/bdigby/Projects/large_test_data/work/fd/6a0841a7f3d2471b4483b52d998f6e started: 1648130944677; exited: -; ] -- exitStatusReadTimeoutMillis: 270000; delta: 270018
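
The numeric fields in that log line can be decoded with a short sketch like the one below. The values are copied from the line; treating `started` as epoch milliseconds and reading `delta` as the time spent waiting for the exit status are interpretations of the field names, not something confirmed in this issue.

```python
from datetime import datetime, timezone

# Values copied from the Nextflow log line.
started_ms = 1648130944677            # ASSUMPTION: epoch milliseconds
exit_status_read_timeout_ms = 270000
delta_ms = 270018

# Convert the task start time to a readable UTC timestamp.
started_utc = datetime.fromtimestamp(started_ms / 1000, tz=timezone.utc)

# ASSUMPTION: delta is how long the exit status was waited for.
waited_past_timeout = delta_ms > exit_status_read_timeout_ms

print("Task started (UTC):", started_utc.isoformat(timespec="seconds"))
print("Timeout (minutes):", exit_status_read_timeout_ms / 60000)
print("Waited past timeout:", waited_past_timeout)
```

Under that reading, the message reports that no exit status could be read within the 4.5 minute window, which is different from the task itself reporting a failure.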

I contacted the system administrator but he was not able to see any evidence of resources being exceeded (nextflow would have also reported this).

Any insights as to why this step might fail would be extremely useful.


N.B. The analysis is on WBcel235. Having used DCC multiple times on human datasets, I am surprised by this behaviour with a relatively small reference genome.

Thanks in advance,

Barry
