
Parallelize single-threaded read_converter step and move *.seq files out of --tmp-dir after every input library is converted #67

Open
mmokrejs opened this issue Jan 21, 2018 · 9 comments


@mmokrejs

Hi,
although I provided 19 input files, the code ran in a single thread. To scale further, could it also do the conversion of each file in multiple chunks?

spades.py \
--only-assembler \
--pe1-1 /scratch/mygenome/paired_end_497bp_201709/HT5V3BCXY.2.tt_16D1C3L12.trimmomatic.paired.prinseq.minlen20.3091.3091.pairs_1.fastq \
--pe1-2 /scratch/mygenome/paired_end_497bp_201709/HT5V3BCXY.2.tt_16D1C3L12.trimmomatic.paired.prinseq.minlen20.3091.3091.pairs_2.fastq \
--pe2-1 /scratch/mygenome/paired_end_619bp/HKMHTBCXX.1.tt_16D1C3L12.trimmomatic.paired.prinseq.minlen20.19552.19552.pairs_1.fastq \
--pe2-2 /scratch/mygenome/paired_end_619bp/HKMHTBCXX.1.tt_16D1C3L12.trimmomatic.paired.prinseq.minlen20.19552.19552.pairs_2.fastq \
--pe3-1 /scratch/mygenome/mate_pairs_201709/HWFNLBCXY.2.tt_16D1C3L12.trimmomatic.bbduk.splitnextera.fragments_1.fastq \
--pe3-2 /scratch/mygenome/mate_pairs_201709/HWFNLBCXY.2.tt_16D1C3L12.trimmomatic.bbduk.splitnextera.fragments_2.fastq \
--pe3-s /scratch/mygenome/mate_pairs_201709/HWFNLBCXY.2.tt_16D1C3L12.trimmomatic.bbduk.splitnextera.singletons.fq \
--pe4-1 /scratch/mygenome/mate_pairs_201609/HFYJ5AFXX.5kb.trimmomatic.bbduk.splitnextera.fragments_1.fastq \
--pe4-2 /scratch/mygenome/mate_pairs_201609/HFYJ5AFXX.5kb.trimmomatic.bbduk.splitnextera.fragments_2.fastq \
--pe4-s /scratch/mygenome/mate_pairs_201609/HFYJ5AFXX.5kb.trimmomatic.bbduk.splitnextera.singletons.fq \
--pe5-1 /scratch/mygenome/mate_pairs_201609/HFYJ5AFXX.8kb.trimmomatic.bbduk.splitnextera.fragments_1.fastq \
--pe5-2 /scratch/mygenome/mate_pairs_201609/HFYJ5AFXX.8kb.trimmomatic.bbduk.splitnextera.fragments_2.fastq \
--pe5-s /scratch/mygenome/mate_pairs_201609/HFYJ5AFXX.8kb.trimmomatic.bbduk.splitnextera.singletons.fq \
--mp1-1 /scratch/mygenome/mate_pairs_201709/HWFNLBCXY.2.tt_16D1C3L12.trimmomatic.bbduk.splitnextera.lmp_1.fastq \
--mp1-2 /scratch/mygenome/mate_pairs_201709/HWFNLBCXY.2.tt_16D1C3L12.trimmomatic.bbduk.splitnextera.lmp_2.fastq \
--mp2-1 /scratch/mygenome/mate_pairs_201609/HFYJ5AFXX.5kb.trimmomatic.bbduk.splitnextera.lmp_1.fastq \
--mp2-2 /scratch/mygenome/mate_pairs_201609/HFYJ5AFXX.5kb.trimmomatic.bbduk.splitnextera.lmp_2.fastq \
--mp3-1 /scratch/mygenome/mate_pairs_201609/HFYJ5AFXX.8kb.trimmomatic.bbduk.splitnextera.lmp_1.fastq \
--mp3-2 /scratch/mygenome/mate_pairs_201609/HFYJ5AFXX.8kb.trimmomatic.bbduk.splitnextera.lmp_2.fastq \
--trusted-contigs /scratch/work/project/bio/open-9-41/assemblies/tadpole_k165/tt_16D1C3L12.tadpole.contigs.k165.fa \
-t 104 --nanopore /scratch/mygenome/OxfordNanopore/tt_16D1C3L12.OxNano.fastq -m 3000 -k 55,77,99,127 -o tt_16D1C3L12__SPAdes3.11.1_noecc

This probably won't happen soon, but let me open a feature request for it anyway. The current version is SPAdes 3.11.1. Thank you.
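For illustration, the requested parallelization could look roughly like the sketch below, with one worker per input library. `convert_library()` and the toy data are hypothetical stand-ins, not SPAdes code; since the step is I/O bound, any real win depends on the filesystem serving several streams at once.

```python
# A minimal sketch (NOT SPAdes code) of converting several libraries concurrently.
# convert_library() is a hypothetical stand-in for the FASTQ -> .seq conversion.
from concurrent.futures import ThreadPoolExecutor

def convert_library(lib):
    """Pretend to convert one library; returns (name, converted reads)."""
    name, reads = lib
    return name, [r.upper() for r in reads]  # trivial stand-in transform

libraries = [("pe1", ["acgt", "ttga"]), ("pe2", ["ggcc"]), ("mp1", ["atat"])]

# One worker per library; threads suffice here because the real work
# would be I/O bound, not CPU bound.
with ThreadPoolExecutor(max_workers=len(libraries)) as pool:
    converted = dict(pool.map(convert_library, libraries))
```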

@asl
Member

asl commented Jan 21, 2018

This part is normally I/O bound, so multiple threads would make the situation even worse.

@mmokrejs
Author

mmokrejs commented Jan 21, 2018

We have a parallel filesystem (LustreFS) served by, I think, 54 working slave machines, with InfiniBand in between. How the data are laid out over the many hosts and drives is user-configurable per directory or even per file. The stripe size is currently 1 MB, I think.

And if I could be sure the data fit into memory, I would use a ramdisk for the actual processing and then move the resulting files onto the storage filesystem. Oh yes, they do fit:

$ du -sh mygenome__SPAdes3.11.1_noecc/.bin_reads/
56G	mygenome__SPAdes3.11.1_noecc/.bin_reads/
$

The uncompressed input FASTQ files occupied 435.86 GB.

@mmokrejs
Author

mmokrejs commented Jan 21, 2018

Here you can see the "disc" traffic is 102 MB/s on average, with more reading than writing.

112 x86_64 Intel(R) Xeon(R) E5-4627 v2 @ 3.30GHz cores are available, with 3.2 TB of physical, local RAM.

[Figures: memory usage, CPU load, and filesystem usage during binary conversion]

@asl
Member

asl commented Jan 21, 2018

Here you can see the "disc" traffic is 102 MB/s on average, with more reading than writing.

This is how it should be: we read FASTQ (a text format) and convert it to the internal binary format. The read:write ratio of ~9:1 is very close to the size ratio between text FASTQ and the SPAdes binary format.
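For intuition on why the binary format is so much smaller: a FASTQ record spends roughly two bytes per base (sequence plus quality, not counting headers), while a 2-bit packed encoding needs only a quarter byte per base. A minimal sketch of such packing (illustrative only, NOT the actual SPAdes `.seq` layout):

```python
# Pack a DNA string into 2 bits per base (A=0, C=1, G=2, T=3).
# This mirrors the idea of a compact binary read format; it is NOT
# the real SPAdes on-disk format.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack(seq):
    out = bytearray()
    acc, nbits = 0, 0
    for base in seq:
        acc = (acc << 2) | CODE[base]  # append 2 bits for this base
        nbits += 2
        if nbits == 8:                 # a full byte accumulated
            out.append(acc)
            acc, nbits = 0, 0
    if nbits:
        out.append(acc << (8 - nbits))  # left-align the last partial byte
    return bytes(out)

packed = pack("ACGTACGT")  # 8 bases fit in 2 bytes
```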

@mmokrejs
Author

mmokrejs commented Jan 21, 2018

Here is what the filesystem can handle when applications are properly written to read/write in large chunks; a very efficient alternative. bamsort comes from https://github.com/gt1/biobambam2

# samtools sort of a 149GB BAM file takes 1.2TB RAM and uses only a single thread despite '-@ 15' argument
# samtools sort -@ $xthreads -m "$gb_mem_per_thread"G -O bam -T "$1" -o "$2".sorted.bam "$2".bam || exit 255
# 
# bamsort comes from https://github.com/gt1/biobambam2
LIBMAUS2_POSIXFDINPUT_BLOCKSIZE_OVERRIDE=1m
export LIBMAUS2_POSIXFDINPUT_BLOCKSIZE_OVERRIDE
bamsort SO=coordinate blockmb="$take_memory" inputthreads="$input_threads" outputthreads="$output_threads" level=9 index=1 I="$2".bam O="$2".sorted.bam

[Figure: bamsort LustreFS usage (stripe 54, 8 CPUs, no hugepage defrag, job 897845, isrv5)]

The currently running SPAdes process, executing read_converter.hpp/binary_converter.hpp, supposedly overloaded the LustreFS metadata servers, and after 40 minutes of attempts to flush buffers the kernel gave up (see the high system CPU load in red in the figures below). I see similar issues when applications append many too-small chunks to existing files. Running truss, strace, or a similar profiling tool should reveal the actual write sizes of the SPAdes binaries.

[Figures: CPU load and CPU usage while the SPAdes binary read conversion supposedly overloaded the LustreFS metadata servers]
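The small-append problem described above can often be mitigated in application code by coalescing writes into large blocks before they hit the filesystem. A generic sketch (the 1 MiB block size is an assumption here, unrelated to SPAdes internals):

```python
import os
import tempfile

BLOCK = 1 << 20  # 1 MiB: issue a few large write() syscalls instead of many tiny ones

def write_records(path, records, buffer_size=BLOCK):
    # BufferedWriter coalesces the small record-sized writes into
    # buffer_size-sized chunks before handing them to the OS.
    with open(path, "wb", buffering=buffer_size) as fh:
        for rec in records:
            fh.write(rec)

records = [b"x" * 100 for _ in range(10000)]  # 10k tiny records, ~1 MB total
path = os.path.join(tempfile.mkdtemp(), "out.bin")
write_records(path, records)
```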

@mmokrejs
Author

mmokrejs commented Jan 24, 2018

I cannot log in to the cluster node to verify this, but although I am running spades.py with --tmp-dir /ramdisk/$PBS_JOBID, it seems it is still reading and writing to LustreFS at the same pace (~100 kBps). I also do not see any improvement in how quickly spades.py moves through processing the many input FASTQ files.

And, while the log says now:

0:46:19.694 12M / 700M INFO General (read_converter.hpp : 84) Converting reads to binary format for library #6 (takes a while)

I should not see the paired_6_*.seq files on the networked filesystem until this step is complete, right? They should still be in --tmp-dir.

@asl
Member

asl commented Jan 24, 2018

These files will be in the output dir since they are reused across iterations (i.e., they are long-lived). Everything else will be on scratch.

@mmokrejs
Author

mmokrejs commented Jan 24, 2018

I don't understand. The paired_6_*.seq files have the same modification timestamp because they were continually updated for a while during processing of input library #6. This should have happened in --tmp-dir, and only then should the paired_6_*.seq files have been moved to tt_16D1C3L12__SPAdes3.11.1_noecc_ramdisk/.bin_reads/. But these files should not have existed in tt_16D1C3L12__SPAdes3.11.1_noecc_ramdisk/.bin_reads/ until library #7 processing started, so what am I missing?
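The workflow being asked for, do the write-heavy conversion in the fast --tmp-dir and only move the finished files to the network filesystem once a library is done, can be sketched generically as below. `convert_one()` and the file names are hypothetical stand-ins, not SPAdes code:

```python
import os
import shutil
import tempfile

def convert_one(tmp_path):
    # Hypothetical stand-in for converting one library into tmp_path.
    with open(tmp_path, "wb") as fh:
        fh.write(b"\x00" * 1024)

def convert_then_move(final_dir, name, tmp_dir):
    """Do all the I/O-heavy work in tmp_dir, then relocate the finished
    file to final_dir in one step (shutil.move copies across filesystems)."""
    tmp_path = os.path.join(tmp_dir, name)
    convert_one(tmp_path)
    os.makedirs(final_dir, exist_ok=True)
    return shutil.move(tmp_path, os.path.join(final_dir, name))

tmp_dir = tempfile.mkdtemp()    # stands in for --tmp-dir on the ramdisk
final_dir = tempfile.mkdtemp()  # stands in for .bin_reads/ on LustreFS
dest = convert_then_move(final_dir, "paired_6_0.seq", tmp_dir)
```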

@asl
Member

asl commented Jan 24, 2018

This is not how it is done currently. We may consider doing this in a future SPAdes version. Patches are always welcome, though.

@mmokrejs mmokrejs changed the title Parallelize single-threaded read_converter step Parallelize single-threaded read_converter step and move *.seq files out of --tmp-dir after every input library is converted Jan 24, 2018