Summary
Two related bugs in v1.1.0 cause the user-supplied blocklist to be silently
ignored during ATAC peak calling, even when correctly configured in the YAML.
Both reproduce on a single-species hg38 setup with the official ENCODE
blacklist v2 (https://github.com/Boyle-Lab/Blacklist).
Bug 1: blocklist emptied to 0 bytes in workflow: "full" with mixed: false
FORMAT_BLOCKLIST (modules/atac/format_blocklist.nf) prepends a "species"
prefix derived from the filename to every chromosome in the blocklist BED.
FILTER_BLOCKLISTS (modules/atac/filter_blocklists.nf) then filters those
prefixed chromosome names against the FASTA contigs.
In single-species mode (mixed: false), COMBINE_REFERENCES is skipped, so
the FASTA contigs are NOT prefixed. Result: zero matches, and the published
file <refdir>/filtered.<basename>.bed is 0 bytes. The pipeline runs to
completion without errors, but the blocklist has no effect on peak calling.
This is a silent failure: the report HTML shows peak/FRiP/TSS metrics
identical to a run without any blocklist.
Workaround: use workflow: "analysis" instead, which bypasses
FORMAT_BLOCKLIST and passes the user's BED directly to CLEAN_PEAKS.
Bug 2: atacAnnotateFragments.R crashes on standard 4-column ENCODE BED
The R script bin/atacAnnotateFragments.R reads the blocklist with:
bl <- fread(blocklist_file, col.names = c("chr", "start", "end"))
The official ENCODE blacklist v2 (hg38-blacklist.v2.bed) is a 4-column BED
(the 4th column is the category: High_Signal_Region / Low_Mappability).
fread receives 4 columns but only 3 column names and exits with status 1.
In workflow: "analysis", this causes ANNOTATE_FRAGMENTS to fail on every
chromosome of every sample (thousands of failures), collapsing the downstream
pipeline.
Workaround: strip the BED to 3 columns before passing it to Omnition:
cut -f1-3 hg38-blacklist.v2.bed > blocklist.bed
Suggested fixes
- In
format_blocklist.nf / filter_blocklists.nf, skip the species
prefixing logic entirely when params.atac.mixed == false, or apply the
prefix consistently to the FASTA contigs as well.
- In
atacAnnotateFragments.R, read only the first 3 columns of the BED
(e.g. fread(..., select = 1:3, col.names = c("chr","start","end"))),
making the script tolerant of standard 3+ column BED files.
Reproduction
- Omnition v1.1.0
- Reference: Homo_sapiens.GRCh38.dna.primary_assembly + GRCh38.106.gtf
(Ensembl naming)
- Blocklist: ENCODE GRCh38 blacklist v2 (Amemiya et al. 2019), converted to
Ensembl naming (sed 's/^chr//') and saved as filtered.blocklist.bed
- YAML:
mixed: false, workflow: "full", blocklist parameter set under
atac.reference.blocklist
Empirical confirmation
After fixing the BED to 3 columns and switching to workflow: "analysis",
verification with bedtools intersect -u on the final
<sample>.fixedwidthpeaks.bed returns 0 peaks overlapping the blocklist on
all samples, and the report shows "High Quality Pairs Remaining after
Blocklist Removal" finally lower than upstream "High-Quality Read Pairs"
(56.0% vs 56.4%, vs identical 56.4% = 56.4% in the buggy runs).
Summary
Two related bugs in v1.1.0 cause the user-supplied blocklist to be silently
ignored during ATAC peak calling, even when correctly configured in the YAML.
Both reproduce on a single-species hg38 setup with the official ENCODE
blacklist v2 (https://github.com/Boyle-Lab/Blacklist).
Bug 1: blocklist emptied to 0 bytes in
workflow: "full"withmixed: falseFORMAT_BLOCKLIST(modules/atac/format_blocklist.nf) prepends a "species"prefix derived from the filename to every chromosome in the blocklist BED.
FILTER_BLOCKLISTS(modules/atac/filter_blocklists.nf) then filters thoseprefixed chromosome names against the FASTA contigs.
In single-species mode (
mixed: false),COMBINE_REFERENCESis skipped, sothe FASTA contigs are NOT prefixed. Result: zero matches, and the published
file
<refdir>/filtered.<basename>.bedis 0 bytes. The pipeline runs tocompletion without errors, but the blocklist has no effect on peak calling.
This is a silent failure: the report HTML shows peak/FRiP/TSS metrics
identical to a run without any blocklist.
Workaround: use
workflow: "analysis"instead, which bypassesFORMAT_BLOCKLIST and passes the user's BED directly to CLEAN_PEAKS.
Bug 2:
atacAnnotateFragments.Rcrashes on standard 4-column ENCODE BEDThe R script
bin/atacAnnotateFragments.Rreads the blocklist with:The official ENCODE blacklist v2 (hg38-blacklist.v2.bed) is a 4-column BED
(the 4th column is the category:
High_Signal_Region/Low_Mappability).freadreceives 4 columns but only 3 column names and exits with status 1.In
workflow: "analysis", this causesANNOTATE_FRAGMENTSto fail on everychromosome of every sample (thousands of failures), collapsing the downstream
pipeline.
Workaround: strip the BED to 3 columns before passing it to Omnition:
cut -f1-3 hg38-blacklist.v2.bed > blocklist.bed
Suggested fixes
format_blocklist.nf/filter_blocklists.nf, skip the speciesprefixing logic entirely when
params.atac.mixed == false, or apply theprefix consistently to the FASTA contigs as well.
atacAnnotateFragments.R, read only the first 3 columns of the BED(e.g.
fread(..., select = 1:3, col.names = c("chr","start","end"))),making the script tolerant of standard 3+ column BED files.
Reproduction
(Ensembl naming)
Ensembl naming (sed 's/^chr//') and saved as
filtered.blocklist.bedmixed: false,workflow: "full", blocklist parameter set underatac.reference.blocklistEmpirical confirmation
After fixing the BED to 3 columns and switching to
workflow: "analysis",verification with
bedtools intersect -uon the final<sample>.fixedwidthpeaks.bedreturns 0 peaks overlapping the blocklist onall samples, and the report shows "High Quality Pairs Remaining after
Blocklist Removal" finally lower than upstream "High-Quality Read Pairs"
(56.0% vs 56.4%, vs identical 56.4% = 56.4% in the buggy runs).