
Request for ntCard option to specify directory for temporary files #5

Closed
mahesh-panchal opened this issue Aug 8, 2017 · 5 comments

@mahesh-panchal

Hi,

Could I request an option to write temporary files to a user-specified location, please?
I was just testing ntCard on my data and the job gave no output. The cluster reported that the hard-disk quota had been reached where my input files live, even though the output was supposed to be written to a folder in my home directory. I take this to mean that temporary files were being written to the project folder, which is at its quota limit.
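
For what it's worth, a plain df on the project directory (the path from the script below) is one way to get a first look at how full that filesystem is; a minimal sketch with standard tools, though the cluster's own quota tools may give a more accurate picture:

# Check usage of the filesystem holding the input data
df -h /proj/b2010042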

My script:

#! /bin/bash

#SBATCH -J "ntCard test"
#SBATCH -A b2010042
#SBATCH -t 7-00:00:00
#SBATCH -n 16
#SBATCH -p node
#SBATCH -e log.ntcard_Spruce.2017-08-08_13.00-%j.out
#SBATCH -o log.ntcard_Spruce.2017-08-08_13.00-%j.out

# A tilde is not expanded inside double quotes, so use $HOME here.
export PATH="$HOME/bin/ntCard/bin:$PATH"
FQDIR=/proj/b2010042/nobackup/douglas/fosmid-pool-data/raw-data
time ntcard -t "$SLURM_NPROCS" -p spruce_freq "$FQDIR"/*.fq.gz

Thank you for telling me about ntCard. It was nice to meet you at ISMB.
Regards,
Mahesh.

@mohamadi
Collaborator

mohamadi commented Aug 8, 2017

@mahesh-panchal Hi Mahesh, ntCard does not generate intermediate files. The output histogram files from ntCard are only a few hundred bytes.

From your script I see you're using .gz files as inputs, so the issue may be related to the OS temp space used by the gzip processes. Can you change your TMPDIR to somewhere with enough space, such as /var/tmp?

export TMPDIR=/var/tmp

Another solution could be to reduce the number of processes in your script, i.e. lower $SLURM_NPROCS.
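
For example (a sketch only; -t 4 is just an arbitrary lower thread count, everything else as in your script):

time ntcard -t 4 -p spruce_freq "$FQDIR"/*.fq.gz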

@mohamadi mohamadi self-assigned this Aug 8, 2017
@mahesh-panchal
Author

Hmm, interesting. The problem can't be TMPDIR, since that is set to the node's scratch disk (/scratch/<job_id>), which has quite a bit of space.

What is the reasoning behind reducing the number of cores used?

@mahesh-panchal
Author

mahesh-panchal commented Aug 9, 2017

I've now tried some other things too (including setting TMPDIR), like cd'ing to the output directory and symlinking the input files into it, but I'm still not getting output. The cluster hasn't reported an error this time either; the odd thing now is simply the absence of output. All the input files definitely exist and are not broken symlinks.

Is there supposed to be more written to the screen than this:

Runtime(sec): 4638.3723

real    77m18.667s
user    262m28.935s
sys     14m13.065s


@mohamadi
Collaborator

mohamadi commented Aug 9, 2017

@mahesh-panchal

What is the reasoning behind reducing the number of cores used?

Every thread works on a separate .gz file in parallel. Each thread forks a gzip process to read its fq.gz file, so the higher the number of threads, the more temp space the gzip processes need in total.
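
If you want to see this for yourself while a job is running, something like the following should list one gzip child per active thread (a rough sketch using standard Linux tools; pgrep -n just picks the newest ntcard process, and the process name may differ if your binary is named differently):

# List the gzip processes forked by the running ntcard instance
ps --ppid "$(pgrep -x -n ntcard)" -o pid,cmd | grep gzip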

Is there supposed to be more written to the screen than this:

I just realized you haven't specified the value(s) of k in your script. Please include it with the -k option. For example, for k=64 use:

time ntcard -k 64 -t $SLURM_NPROCS -p spruce_freq $FQDIR/*.fq.gz

By default the output is written to freq_k$k.hist in the current working directory. In your script you have specified spruce_freq as the output prefix, so you should see spruce_freq_k64.hist in the current (or specified) working directory.
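
Once the job finishes you can confirm the histogram is there (assuming the spruce_freq prefix and k=64 from the command above):

# The histogram should be a small text file in the working directory
ls -lh spruce_freq_k64.hist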

@mahesh-panchal
Author

Thank you, Hamid,

After including the -k option, it works and the output is there.

Thanks again for puzzling this through with me.

Regards,
Mahesh.
