Releases: carstenerickson/pileup-aadr
v0.5.0 — randomHaploid default + --calling-mode
Added
--calling-mode CLI option (randomHaploid default / randomDiploid /
majorityCall). Stage 3's genotype-calling mode is now user-selectable instead
of hardcoded. randomHaploid (the default) and majorityCall are pseudo-haploid
(0% het) and match the pseudo-haploid AADR 1240K panel, which makes them the
correct choice when projecting a modern WGS target onto ancient DNA.
randomDiploid is retained as a legacy escape hatch for callers who specifically
want diploid genotypes.
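As a rough illustration of the option described above, here is a minimal argparse sketch. The `build_parser` function and the help text are hypothetical and are not taken from pileup-aadr's source; only the option name, the three mode names, and the default come from these notes.

```python
import argparse

# Mode names as listed in the release notes; the tuple itself is illustrative.
CALLING_MODES = ("randomHaploid", "randomDiploid", "majorityCall")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the --calling-mode option."""
    parser = argparse.ArgumentParser(prog="pileup-aadr")
    parser.add_argument(
        "--calling-mode",
        choices=CALLING_MODES,
        default="randomHaploid",  # pseudo-haploid default, per the notes
        help="genotype-calling mode used in Stage 3",
    )
    return parser


# Parse an explicit argument list so the example does not read sys.argv.
default_args = build_parser().parse_args([])
explicit_args = build_parser().parse_args(["--calling-mode", "majorityCall"])
print(default_args.calling_mode, explicit_args.calling_mode)
```

Passing a value outside `choices` makes argparse exit with a usage error, so an unsupported mode is rejected before any stage runs.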
Changed
- The .pseudohaploid.json sidecar is now mode-honest. calling_mode records
the actual --calling-mode used, and pseudohaploid is 1 for the pseudo-haploid
modes but 0 for randomDiploid (which produces diploid het-bearing calls), so the
downstream f2 consumer (pgen-samplebind) never mistakes diploid data for
pseudo-haploid. Selecting --calling-mode randomDiploid also emits a stderr warning
(from both the CLI and the orchestrator, so programmatic callers see it too).
- --seed is forwarded for every mode: the random modes use it for read sampling,
and majorityCall uses it to break equal-allele-depth ties (so the run stays
reproducible).
- validate now probes the installed pileupCaller for all three mode flags.
- Modes are defined once on the CallingMode type (CALLING_MODES,
PSEUDOHAPLOID_MODES, mode_is_pseudohaploid): the CLI choices and the
validate probe derive from it, and the pseudo-haploid check is an allowlist
(a future mode is treated as diploid until explicitly added, so it fails closed).
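The allowlist behaviour and the mode-honest sidecar fields can be sketched as follows. This is an assumption-laden illustration, not the project's code: the names CallingMode, CALLING_MODES, PSEUDOHAPLOID_MODES, and mode_is_pseudohaploid come from the notes above, but representing CallingMode as a `Literal`, the `sidecar_payload` helper, and the warning wording are all hypothetical.

```python
import sys
from typing import Literal, get_args

# Assumption: the single source of truth is a Literal type; the real
# CallingMode may be defined differently (for example, as an Enum).
CallingMode = Literal["randomHaploid", "randomDiploid", "majorityCall"]
CALLING_MODES = get_args(CallingMode)

# Allowlist: only modes named here count as pseudo-haploid.
PSEUDOHAPLOID_MODES = frozenset({"randomHaploid", "majorityCall"})


def mode_is_pseudohaploid(mode: str) -> bool:
    """Fail closed: any mode not on the allowlist is treated as diploid."""
    return mode in PSEUDOHAPLOID_MODES


def sidecar_payload(mode: str) -> dict:
    """Hypothetical helper building the two mode-dependent sidecar fields."""
    if mode == "randomDiploid":
        # Illustrative wording; the actual warning text is not in the notes.
        print("warning: randomDiploid produces diploid calls", file=sys.stderr)
    return {
        "calling_mode": mode,
        "pseudohaploid": 1 if mode_is_pseudohaploid(mode) else 0,
    }
```

Because the check is membership in an allowlist rather than "anything except randomDiploid", a mode added later reports `"pseudohaploid": 0` until someone deliberately adds it to PSEUDOHAPLOID_MODES.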
Fixed
- Stage 3 now calls pileupCaller --randomHaploid, not --randomDiploid. This is
a correctness fix. pileup-aadr produces pseudo-haploid genotypes at AADR 1240K sites
to co-analyze a target with the AADR panel, which is itself pseudo-haploid (one random
read per site). --randomDiploid samples two reads → a diploid call (~13% het
on a modern WGS target), yet the .pseudohaploid.json sidecar labelled that output
"pseudohaploid": 1 "by construction", mislabeling het-bearing diploid data as
pseudo-haploid and creating a diploid-vs-pseudo-haploid data-type mismatch with the
panel in downstream f-statistics (reference-bias + artificial-drift heterogeneity;
Lazaridis et al. 2017, Souilmi et al. 2022). --randomHaploid (one random read → a
haploid call, 0% het) is the correct, panel-matching mode. The provenance sidecar
(calling_mode, note) and Stage-3 status line are updated accordingly. Verified on a
33× modern WGS target (track_e phase4): randomDiploid → 13% het vs randomHaploid → 0%
het, with qpAdm admixture weights invariant across calling modes.
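A heterozygosity check of the kind quoted above could look roughly like the sketch below. It assumes EIGENSTRAT-style genotype codes (0, 1, or 2 copies of the reference allele, with 1 meaning heterozygous and 9 meaning missing); whether the verification run used this format is not stated in these notes. The `het_rate` function is hypothetical, and the toy strings are made up for illustration, not taken from the 33× target.

```python
def het_rate(genotypes: str) -> float:
    """Fraction of non-missing calls that are heterozygous.

    `genotypes` is one sample's calls as a string of EIGENSTRAT-style
    codes: '0', '1', '2' for called sites and '9' for missing.
    """
    called = [g for g in genotypes if g in ("0", "1", "2")]
    if not called:
        return 0.0  # no usable sites, so report zero rather than divide by zero
    hets = sum(1 for g in called if g == "1")
    return hets / len(called)


# Toy inputs (illustrative only): a pseudo-haploid sample never carries a '1'.
diploid_like = "2021902"
pseudohaploid_like = "2029022"
print(het_rate(diploid_like), het_rate(pseudohaploid_like))
```

A nonzero rate on output labelled pseudo-haploid would point to the kind of mislabeling this release fixes.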