In fastp v1.3.2, it appears that the -w argument does not bound total CPU usage; the process and its helpers collectively use significantly more CPUs than requested, whereas fastp v1.0.0 correctly respects -w.
```
fastp -w 8 --in1 /data/seq_raw_R1_001.fastq.gz --in2 /data/seq_raw_R2_001.fastq.gz --out1 data/seq_trim_R1.fastq.gz --out2 data/seq_trim_R2.fastq.gz
```
I have changed the file names for this issue, but the report otherwise reflects the actual results.
No matter the requested number of threads, whether the default (3) or 8 as above, the number of threads I see in htop is much higher.
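For reference, the thread count htop reports can also be checked from the command line. A minimal sketch, assuming Linux /proc; `$$` is a stand-in PID here, substitute fastp's actual PID (e.g. from `pgrep -x fastp`):

```shell
# Stand-in PID for illustration; use fastp's PID instead
pid=$$
ps -o nlwp= -p "$pid"          # NLWP = number of threads, the figure htop shows
ls "/proc/$pid/task" | wc -l   # equivalent count via /proc
```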
So I measured with /usr/bin/time -v and got the following report:
```
Read1 before filtering:
total reads: 82352888
total bases: 12352933200
Q20 bases: 12050099606(97.5485%)
Q30 bases: 11581709517(93.7568%)
Q40 bases: 11581709517(93.7568%)
Read2 before filtering:
total reads: 82352888
total bases: 12352933200
Q20 bases: 12116273303(98.0842%)
Q30 bases: 11594441971(93.8598%)
Q40 bases: 11594441971(93.8598%)
Read1 after filtering:
total reads: 81975844
total bases: 10237966871
Q20 bases: 10197281628(99.6026%)
Q30 bases: 10023375747(97.904%)
Q40 bases: 10023375747(97.904%)
Read2 after filtering:
total reads: 81975844
total bases: 10239358182
Q20 bases: 10186133760(99.4802%)
Q30 bases: 9987975154(97.5449%)
Q40 bases: 9987975154(97.5449%)
Filtering result:
reads passed filter: 163951688
reads failed due to low quality: 555552
reads failed due to too many N: 2946
reads failed due to too short: 195024
reads failed due to adapter dimer: 566
reads with adapter trimmed: 87029494
bases trimmed due to adapters: 3841774077
Duplication rate: 15.9739%
Insert size peak (evaluated by paired-end reads): 112
JSON report: fastp.json
HTML report: fastp.html
fastp -w 8 --in1 /data/seq_raw_R1_001.fastq.gz --in2 /data/seq_raw_R2_001.fastq.gz --out1 data/seq_trim_R1.fastq.gz --out2 data/seq_trim_R2.fastq.gz
fastp v1.3.2, time used: 172 seconds
Command being timed: "fastp -w 8 --in1 /data/seq_raw_R1_001.fastq.gz --in2 /data/seq_raw_R2_001.fastq.gz --out1 data/seq_trim_R1.fastq.gz --out2 data/seq_trim_R2.fastq.gz"
User time (seconds): 1777.35
System time (seconds): 2234.69
Percent of CPU this job got: 2332%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:51.98
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1318620
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 9
Minor (reclaiming a frame) page faults: 3654687
Voluntary context switches: 22138170
Involuntary context switches: 37415
Swaps: 0
File system inputs: 981088
File system outputs: 16435776
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
From htop and from the report above (2332% CPU), fastp is using roughly 23 threads rather than the 8 requested. Similarly, with the default -w value the effective thread count is about 15.
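The ~23-thread figure follows directly from the time(1) output: effective parallelism is (user + system CPU time) divided by wall-clock time, with the numbers copied from the v1.3.2 report above (2:51.98 = 171.98 s):

```shell
# (user + system) / wall-clock, values from the /usr/bin/time -v report
awk 'BEGIN { printf "%.1f\n", (1777.35 + 2234.69) / 171.98 }'
# prints 23.3, consistent with "Percent of CPU this job got: 2332%"
```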
When trying 12 or 16 threads, as suggested in previous issues/comments, the actual number of threads used increases further, which makes it difficult to plan ahead when running under Snakemake or otherwise in parallel.
However, when I download fastp v1.0.0 and run the same command, I get:
```
/usr/bin/time -v fastp -w 8 --in1 /data/seq_raw_R1_001.fastq.gz --in2 /data/seq_raw_R2_001.fastq.gz --out1 data/seq_trim_R1.fastq.gz --out2 data/seq_trim_R2.fastq.gz
Read1 before filtering:
total reads: 82352888
total bases: 12352933200
Q20 bases: 12050099606(97.5485%)
Q30 bases: 11581709517(93.7568%)
Q40 bases: 11581709517(93.7568%)
Read2 before filtering:
total reads: 82352888
total bases: 12352933200
Q20 bases: 12116273303(98.0842%)
Q30 bases: 11594441971(93.8598%)
Q40 bases: 11594441971(93.8598%)
Read1 after filtering:
total reads: 82063022
total bases: 10261844968
Q20 bases: 10219309314(99.5855%)
Q30 bases: 10039114627(97.8295%)
Q40 bases: 10039114627(97.8295%)
Read2 after filtering:
total reads: 82063022
total bases: 10261844968
Q20 bases: 10206334051(99.4591%)
Q30 bases: 10002769030(97.4753%)
Q40 bases: 10002769030(97.4753%)
Filtering result:
reads passed filter: 164126044
reads failed due to low quality: 576778
reads failed due to too many N: 2954
reads failed due to too short: 0
reads with adapter trimmed: 87028738
bases trimmed due to adapters: 4099873562
Duplication rate: 16.6401%
Insert size peak (evaluated by paired-end reads): 114
JSON report: fastp.json
HTML report: fastp.html
fastp -w 8 --in1 /data/seq_raw_R1_001.fastq.gz --in2 /data/seq_raw_R2_001.fastq.gz --out1 data/seq_trim_R1.fastq.gz --out2 data/seq_trim_R2.fastq.gz
fastp v1.0.0, time used: 190 seconds
Command being timed: "fastp -w 8 --in1 /data/seq_raw_R1_001.fastq.gz --in2 /data/seq_raw_R2_001.fastq.gz --out1 data/seq_trim_R1.fastq.gz --out2 data/seq_trim_R2.fastq.gz"
User time (seconds): 1651.50
System time (seconds): 22.39
Percent of CPU this job got: 882%
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:09.71
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1436024
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 3198393
Voluntary context switches: 3086040
Involuntary context switches: 2003
Swaps: 0
File system inputs: 1758664
File system outputs: 16363864
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
In v1.0.0, by contrast, the CPU usage (882%, i.e. roughly 9 threads) matches the requested -w 8 once one extra thread for I/O and file compression is accounted for.
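The same arithmetic applied to the v1.0.0 report (wall clock 3:09.71 = 189.71 s) confirms the ~9-thread figure:

```shell
# (user + system) / wall-clock for the v1.0.0 run
awk 'BEGIN { printf "%.1f\n", (1651.50 + 22.39) / 189.71 }'
# prints 8.8, consistent with "Percent of CPU this job got: 882%"
```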
With the newer version this makes it difficult or impossible to set correct threads: values in Snakemake or to request fair resources on shared HPC systems.
Best regards,
Rasmus