In fastp v1.3.2, it appears that the -w argument does not bound total CPU usage; the process and its helpers collectively use significantly more CPUs than requested, whereas fastp v1.0.0 correctly respects -w.
```
fastp -w 8 --in1 /data/seq_raw_R1_001.fastq.gz --in2 /data/seq_raw_R2_001.fastq.gz --out1 data/seq_trim_R1.fastq.gz --out2 data/seq_trim_R2.fastq.gz
```
I have changed the file names for this issue, but the report otherwise reflects the actual results.
No matter the requested number of threads, whether the default (3) or 8 as above, the number of threads I see in htop is much higher.
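For reference, the thread count htop reports can also be checked from the command line. A minimal sketch, assuming Linux /proc; `$$` is a stand-in PID here, substitute fastp's actual PID (e.g. from `pgrep -x fastp`):

```shell
# Stand-in PID for illustration; use fastp's PID instead
pid=$$
ps -o nlwp= -p "$pid"          # NLWP = number of threads, the figure htop shows
ls "/proc/$pid/task" | wc -l   # equivalent count via /proc
```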
So I measured with /usr/bin/time -v and got the following report:
```
Read1 before filtering:
total reads: 82352888
total bases: 12352933200
Q20 bases: 12050099606(97.5485%)
Q30 bases: 11581709517(93.7568%)
Q40 bases: 11581709517(93.7568%)
Read2 before filtering:
total reads: 82352888
total bases: 12352933200
Q20 bases: 12116273303(98.0842%)
Q30 bases: 11594441971(93.8598%)
Q40 bases: 11594441971(93.8598%)
Read1 after filtering:
total reads: 81975844
total bases: 10237966871
Q20 bases: 10197281628(99.6026%)
Q30 bases: 10023375747(97.904%)
Q40 bases: 10023375747(97.904%)
Read2 after filtering:
total reads: 81975844
total bases: 10239358182
Q20 bases: 10186133760(99.4802%)
Q30 bases: 9987975154(97.5449%)
Q40 bases: 9987975154(97.5449%)
Filtering result:
reads passed filter: 163951688
reads failed due to low quality: 555552
reads failed due to too many N: 2946
reads failed due to too short: 195024
reads failed due to adapter dimer: 566
reads with adapter trimmed: 87029494
bases trimmed due to adapters: 3841774077
Duplication rate: 15.9739%
Insert size peak (evaluated by paired-end reads): 112
JSON report: fastp.json
HTML report: fastp.html
fastp -w 8 --in1 /data/seq_raw_R1_001.fastq.gz --in2 /data/seq_raw_R2_001.fastq.gz --out1 data/seq_trim_R1.fastq.gz --out2 data/seq_trim_R2.fastq.gz
fastp v1.3.2, time used: 172 seconds
Command being timed: "fastp -w 8 --in1 /data/seq_raw_R1_001.fastq.gz --in2 /data/seq_raw_R2_001.fastq.gz --out1 data/seq_trim_R1.fastq.gz --out2 data/seq_trim_R2.fastq.gz"
User time (seconds): 1777.35
System time (seconds): 2234.69
Percent of CPU this job got: 2332%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:51.98
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1318620
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 9
Minor (reclaiming a frame) page faults: 3654687
Voluntary context switches: 22138170
Involuntary context switches: 37415
Swaps: 0
File system inputs: 981088
File system outputs: 16435776
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
From htop and from the report above (2332% CPU), fastp is using roughly 23 threads rather than the 8 requested. Similarly, with the default -w value the effective thread count is about 15.
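The ~23-thread figure follows directly from the time(1) output: effective parallelism is (user + system CPU time) divided by wall-clock time, with the numbers copied from the v1.3.2 report above (2:51.98 = 171.98 s):

```shell
# (user + system) / wall-clock, values from the /usr/bin/time -v report
awk 'BEGIN { printf "%.1f\n", (1777.35 + 2234.69) / 171.98 }'
# prints 23.3, consistent with "Percent of CPU this job got: 2332%"
```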
When trying 12 or 16 threads, as suggested in previous issues/comments, the actual number of threads used increases further, which makes it difficult to plan ahead when running under Snakemake or otherwise in parallel.
However, when I download fastp v1.0.0 and run the same command, I get:
```
/usr/bin/time -v fastp -w 8 --in1 /data/seq_raw_R1_001.fastq.gz --in2 /data/seq_raw_R2_001.fastq.gz --out1 data/seq_trim_R1.fastq.gz --out2 data/seq_trim_R2.fastq.gz
Read1 before filtering:
total reads: 82352888
total bases: 12352933200
Q20 bases: 12050099606(97.5485%)
Q30 bases: 11581709517(93.7568%)
Q40 bases: 11581709517(93.7568%)
Read2 before filtering:
total reads: 82352888
total bases: 12352933200
Q20 bases: 12116273303(98.0842%)
Q30 bases: 11594441971(93.8598%)
Q40 bases: 11594441971(93.8598%)
Read1 after filtering:
total reads: 82063022
total bases: 10261844968
Q20 bases: 10219309314(99.5855%)
Q30 bases: 10039114627(97.8295%)
Q40 bases: 10039114627(97.8295%)
Read2 after filtering:
total reads: 82063022
total bases: 10261844968
Q20 bases: 10206334051(99.4591%)
Q30 bases: 10002769030(97.4753%)
Q40 bases: 10002769030(97.4753%)
Filtering result:
reads passed filter: 164126044
reads failed due to low quality: 576778
reads failed due to too many N: 2954
reads failed due to too short: 0
reads with adapter trimmed: 87028738
bases trimmed due to adapters: 4099873562
Duplication rate: 16.6401%
Insert size peak (evaluated by paired-end reads): 114
JSON report: fastp.json
HTML report: fastp.html
fastp -w 8 --in1 /data/seq_raw_R1_001.fastq.gz --in2 /data/seq_raw_R2_001.fastq.gz --out1 data/seq_trim_R1.fastq.gz --out2 data/seq_trim_R2.fastq.gz
fastp v1.0.0, time used: 190 seconds
Command being timed: "fastp -w 8 --in1 /data/seq_raw_R1_001.fastq.gz --in2 /data/seq_raw_R2_001.fastq.gz --out1 data/seq_trim_R1.fastq.gz --out2 data/seq_trim_R2.fastq.gz"
User time (seconds): 1651.50
System time (seconds): 22.39
Percent of CPU this job got: 882%
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:09.71
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1436024
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 3198393
Voluntary context switches: 3086040
Involuntary context switches: 2003
Swaps: 0
File system inputs: 1758664
File system outputs: 16363864
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
In v1.0.0, by contrast, the CPU usage (882%, i.e. roughly 9 threads) matches the requested -w 8 once one extra thread for I/O and file compression is accounted for.
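The same arithmetic applied to the v1.0.0 report (wall clock 3:09.71 = 189.71 s) confirms the ~9-thread figure:

```shell
# (user + system) / wall-clock for the v1.0.0 run
awk 'BEGIN { printf "%.1f\n", (1651.50 + 22.39) / 189.71 }'
# prints 8.8, consistent with "Percent of CPU this job got: 882%"
```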
With the newer version this makes it difficult or impossible to set correct threads: values in Snakemake or to request fair resources on shared HPC systems.
Best regards,
Rasmus