Releases: PoHsuanLai/gsem
v0.2.0
Added
- Ported read_fusion (gsem::io::fusion_reader + CLI gsem read-fusion).
  Reads raw FUSION TWAS .dat association files and converts each trait's
  TWAS.Z into a standardized gene-expression effect/SE (binary traits via
  the liability conversion effect/√(effect²·HSQ + π²/3), continuous via
  Z/√(N·HSQ)), supports the permutation-Z path, then inner-joins traits on
  Gene+Panel into the merged TWAS format that twas_reader / multiGene /
  userGWAS(TWAS=TRUE) consume. This is the on-ramp from raw FUSION output
  into the Rust TWAS path (previously only pre-merged files could be
  read). Validated against live R read_fusion.
- Ported subSV (gsem_matrix::vech::subset_sv). Subsets vech(S) and the V
  sampling-covariance block by a set of 1-based vech positions, for both
  the with-diagonal (TYPE="S"/"S_Stand") and off-diagonal (TYPE="R")
  numbering. Validated against R to 1e-12. (R's matrix-input path has an
  undefined-RMATRIX bug; the port matches the bug-free LDSC_OBJECT path.)
- Ported summaryGLSbands numeric core
  (gsem::stats::gls::summary_gls_bands). GLS fit plus the confidence-band
  data: predictor grid, fitted line, and ±BAND_SIZE·SE envelope at
  INTERVALS points, with INTERCEPT / QUAD / CONTROLVARS support.
  Validated against R to 1e-9. The ggplot rendering is not ported.
- Per-option R-equivalence coverage for the drop-in surface. New
  real-package fixtures and tests for previously-untested option branches:
  userGWAS estimation="ML" / GC={conserv,none} / Q_SNP / std.lv;
  ldsc stand=TRUE (S_Stand/V_Stand) / select="ODD" / chisq.max /
  liability-scale; sumstats ambig=TRUE (full numeric parity); a
  non-degenerate misspecified-model CFI/chisq fixture; the paLDSC diag
  branch; and bindings parity suites (R testthat + Python pytest).
- Coverage instrumentation in CI. A cargo-llvm-cov coverage job with a
  ratcheting floor, Codecov upload + README badge, and an advisory R-side
  covr job.
- R documentation site (pkgdown). A published docs site for the gsemr
  binding (reference index + Compatibility/Architecture articles generated
  from the repo-root docs), deployed to GitHub Pages, with a navbar link to
  the upstream GenomicSEM project.
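
The two conversions named in the read_fusion entry can be sketched as follows. This is a minimal illustration of the stated formulas only, not the gsem implementation: the function names are hypothetical, the binary branch takes effect as an already-computed input (the entry does not say how it is derived), and the SE conversion is omitted.

```rust
/// Continuous trait: standardized effect from the TWAS Z-score,
/// Z / sqrt(N * HSQ), as stated in the entry above.
pub fn continuous_effect(z: f64, n: f64, hsq: f64) -> f64 {
    z / (n * hsq).sqrt()
}

/// Binary trait: liability conversion effect / sqrt(effect^2 * HSQ + pi^2 / 3).
/// `effect` is assumed to be supplied by the caller (hypothetical input).
pub fn binary_liability_effect(effect: f64, hsq: f64) -> f64 {
    let logistic_var = std::f64::consts::PI.powi(2) / 3.0;
    effect / (effect * effect * hsq + logistic_var).sqrt()
}
```

For example, Z = 2 with N = 100 and HSQ = 0.04 gives 2/√4 = 1 under the continuous formula.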
Fixed
- std.lv left the factor scale unidentified. parse_model(std_lv=true)
  freed the auto-added latent variance instead of fixing it to 1 (lavaan's
  std.lv=TRUE convention: auto.fix.first=FALSE + latent variances fixed
  to 1). Any std.lv model (userGWAS / usermodel / rgmodel(std_lv=))
  estimated an unidentified factor scale and was ~12% off R. Now the
  auto-add fixes the latent variance under std_lv.
  (crates/gsem-sem/src/syntax.rs)
- Q_SNP heterogeneity statistic + LDSC intercept floor. compute_q_snp
  computed the wrong quadratic form, and the LDSC intercept diagonal was
  not floored at 1.0 before building the per-SNP V (R's userGWAS.R:156).
  Both fixed; Q_SNP now matches R to ~1e-9.
  (crates/gsem/src/gwas/{q_snp,gc_correction,user_gwas}.rs)
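
As a rough illustration of the intercept floor described above, the sketch below clamps the diagonal of a square intercept matrix to at least 1.0 and leaves the off-diagonal cells alone. The function name and the Vec-of-rows layout are assumptions made for the example; the real code lives in the files cited above.

```rust
/// Floor the diagonal of a square LDSC intercept matrix at 1.0, in place.
/// Off-diagonal (cross-trait) intercepts are left untouched.
pub fn floor_intercept_diagonal(intercepts: &mut [Vec<f64>]) {
    for (i, row) in intercepts.iter_mut().enumerate() {
        if let Some(diag) = row.get_mut(i) {
            *diag = diag.max(1.0);
        }
    }
}
```

A diagonal intercept of 0.97 becomes 1.0, while 1.05 is kept as estimated.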
The following entries accumulated on master after 0.1.3 and ship in 0.2.0:
Added
- R-equivalence coverage for multiSNP, multiGene, simLDSC, and hdl. New
  live-R fixtures and tests validate the four previously untested
  functions, closing the last function-coverage gaps. multiSNP and
  multiGene reproduce R's augmented S_Full / V_Full matrices to
  1e-12/1e-14; simLDSC's deterministic per-SNP Z covariance matches R's
  exact varZ / covZ algebra to 1e-9; hdl reproduces R's genetic-covariance
  matrix S to optimiser-level precision on a synthetic LD panel built in
  R's exact .rda / .bim format.
Changed
- hdl evaluated in the eigenspace (match R). gsem's HDL likelihood was
  fed raw LD scores as lam and the raw per-SNP bhat, whereas R
  GenomicSEM/HDL evaluates the likelihood in the LD-block eigenspace:
  lam = eigenvalues of the block correlation matrix and bstar = Vᵀ·bhat
  (projection onto eigenvectors). LdPiece now carries the per-piece
  eigenvalues/eigenvectors (read from the .rda panel, or the new
  chr*.eigen.tsv text file emitted by convert_hdl_panels), the per-trait
  reference N uses R's median(N), the likelihood floor matches R's
  exp(-18), and the per-piece optimiser is a projected-gradient method
  mirroring R's optim(L-BFGS-B). (crates/gsem-ldsc/src/hdl.rs)
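
The projection bstar = Vᵀ·bhat mentioned above can be sketched with plain vectors. This is an illustration only: the function name and the storage layout (one row per SNP, one column per eigenvector) are assumptions, and the actual LdPiece representation may differ.

```rust
/// Project per-SNP estimates onto eigenvectors: bstar = V^T * bhat.
/// Assumed layout: `eigvecs[i][j]` is the SNP-i component of eigenvector j,
/// so eigenvectors are the columns of V.
pub fn project_onto_eigenvectors(eigvecs: &[Vec<f64>], bhat: &[f64]) -> Vec<f64> {
    let n_vecs = eigvecs.first().map_or(0, |row| row.len());
    let mut bstar = vec![0.0; n_vecs];
    // Accumulate row by row: bstar[j] += V[i][j] * bhat[i].
    for (row, &b) in eigvecs.iter().zip(bhat) {
        for (acc, &v) in bstar.iter_mut().zip(row) {
            *acc += v * b;
        }
    }
    bstar
}
```

With an identity V the projection returns bhat unchanged, which is a quick sanity check on the orientation.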
Fixed
- simLDSC phenotypic-overlap term. The off-diagonal environmental
  contribution to the per-SNP Z covariance was
  rPheno·n_overlap·√(NᵢNⱼ)/n_snps, carrying a spurious √(NᵢNⱼ)/n_snps
  factor. R GenomicSEM uses rPheno_ij · N_ij/√(NᵢNⱼ) (= rPheno·n_overlap
  for scalar sample overlap). Fixed and exposed as the deterministic
  per_snp_z_cov, validated against R's construction.
  (crates/gsem/src/stats/simulation.rs)
- multiSNP / multiGene V_Full construction. run_multi_snp built V_Full as
  a diagonal approximation that ignored the LDSC intercept matrix. The new
  build_multi_snp_sv reproduces R's full sampling covariance (SNP-trait
  variances (SE·I_diag·varSNP)², cross-trait within-SNP, cross-SNP
  within-trait, and cross-SNP cross-trait blocks, plus the trait-trait
  V_LD block), matching R bit-for-bit.
  (crates/gsem/src/gwas/multi_snp.rs)
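
The corrected simLDSC overlap term, rPheno_ij · N_ij/√(NᵢNⱼ), is small enough to write out. The sketch below is a hypothetical helper, not the per_snp_z_cov function itself, and it assumes N_ij is the number of samples shared by traits i and j.

```rust
/// Environmental (phenotypic-overlap) contribution to the per-SNP Z
/// covariance between traits i and j: rPheno_ij * N_ij / sqrt(N_i * N_j).
/// `n_shared` is assumed to be the count of overlapping samples.
pub fn overlap_term(r_pheno: f64, n_shared: f64, n_i: f64, n_j: f64) -> f64 {
    r_pheno * n_shared / (n_i * n_j).sqrt()
}
```

With two traits of 10,000 samples each, 5,000 shared samples, and rPheno = 0.3, the term is 0.3 · 0.5 = 0.15, matching the rPheno·n_overlap form for a scalar overlap proportion of 0.5.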
Documented R bugs (gsem implements the correct behavior)
- multiSNP / multiGene constant cross-SNP cross-trait LD. R weights
  every cross-SNP cross-trait sampling-covariance cell by a single
  constant LD value (LD2[(f²−f)/2], the last lower-triangle entry)
  rather than the actual SNP-pair LD. gsem uses the correct per-pair
  LD[a,b]; the two coincide when the off-diagonal LD is constant (as in
  the equivalence fixtures). A unit test locks gsem's corrected path.
- multiGene aborts for k ≥ 2 traits. R's multiGene assigns into
  V_SNP[y,x], a variable that does not exist in multiGene (a typo for
  V_Gene), so the function errors with object 'V_SNP' not found
  whenever there is more than one trait. gsem implements the corrected
  algorithm (identical to multiSNP with gene heritabilities as the
  "variances"); the reference is generated from a minimally-patched
  multiGene.
- hdl intercept weak identification. The per-piece HDL intercept
  enters the likelihood only as int·lam/N and is weakly identified on
  small panels; R's optim drifts on some LD blocks while gsem's gradient
  method recovers the construction-true intercept. The genetic-covariance
  matrix S (HDL's primary output) is robustly identified and matches R.
- sumstats beta standardization. gsem's sumstats wrote the raw
  input betas/SEs instead of standardizing them like R GenomicSEM. Since
  userGWAS / commonfactorGWAS build cov(SNP, trait) = varSNP · beta,
  this propagated a sqrt(varSNP) scale error into per-SNP GWAS output.
  Now implements all of R's modes: OLS (beta = Z/√(N·varSNP)),
  linprob, se.logit, the default logistic "none" transform, and
  betas pass-through, reading the P and N columns and deriving Z from
  P. Validated formula-for-formula against R for every mode.
  (crates/gsem/src/sumstats.rs)
- sumstats ambiguous-SNP default. R GenomicSEM removes
  strand-ambiguous SNPs (A/T, C/G) only when ambig=TRUE; its default
  keeps them. The R and Python bindings (and the CLI) mapped
  keep_ambig = ambig, so their default dropped ~⅓ of SNPs where R keeps
  them. Fixed to keep_ambig = !ambig across the R binding, Python
  binding, and CLI (which gains an --ambig flag), matching R's default.
- rgmodel V_R dimension. rgmodel returned V_R as the full
  kstar × kstar sampling covariance of vech(R) (including the fixed
  diagonal-1 correlations, which have ~zero variance). R GenomicSEM returns
  V_R for the off-diagonal correlations only (k(k−1)/2 square). Fixed to
  extract the off-diagonal vech block, matching R's shape and ordering (so
  the bindings now return R's shape too). (crates/gsem-sem/src/rgmodel.rs)
- enrich model-based functional enrichment. gsem's enrich
  (enrichment_test) computed the classic LDSC heritability-enrichment ratio
  (prop_h²/prop_SNPs, the original Finucane statistic), not R GenomicSEM's
  model-based enrich. R fits a SEM to a baseline annotation, fixes the
  regressions/loadings (per fix) to their baseline estimates, re-fits each
  annotation freeing the remaining parameters, and reports per-parameter
  enrichment = (est_annot/est_baseline)/Prop with SE and a 1-sided p-value.
  Added gsem_sem::enrich_model::model_enrichment (+ a shared gsem_sem::fit
  helper) implementing this with the regressions/covariances/variances fix
  modes, wired through the R binding (enrich(s_covstruc, model, params, fix))
  and the Python binding (model_enrichment). Validated against live R
  GenomicSEM on a factor-variance covstruc. gsem's classic ratio is retained
  as enrichment_test. (crates/gsem-sem/src/enrich_model.rs)
- s_ldsc overlap-weighted partitioned heritability. Stratified LDSC
  returned per-annotation S as the raw coefficient contribution tau · M
  (R's S_Tau), missing R's overlap weighting. R computes each category's
  partitioned (co)heritability as S = overlap · (tau · M), where
  overlap[f,a] = M(f,a)/M_a and M(f,a) is the number of SNPs in both
  annotations f and a (from the .annot.gz membership + .frq MAF
  filter). These agree only for disjoint annotations and diverge for the
  realistic overlapping baselineLD model. Added a .annot.gz / .frq
  cross-product reader and applied the overlap transform to both S and the
  jackknife-based V; the result now exposes s_annot / v_annot (R's
  S / V) and s_tau / v_tau (R's S_Tau / V_Tau). Threaded through the R
  binding, Python binding, and CLI (new --frq flag). Validated against R
  on synt...
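
To make the sumstats ambiguous-SNP entry in the list above concrete, here is a minimal sketch of the corrected flag mapping. Both function names are hypothetical; the only facts taken from the entry are the ambiguous pairs (A/T, C/G) and the mapping keep_ambig = !ambig.

```rust
/// True for strand-ambiguous allele pairs (A/T or C/G, in either order).
pub fn is_strand_ambiguous(a1: &str, a2: &str) -> bool {
    let a1 = a1.to_ascii_uppercase();
    let a2 = a2.to_ascii_uppercase();
    matches!(
        (a1.as_str(), a2.as_str()),
        ("A", "T") | ("T", "A") | ("C", "G") | ("G", "C")
    )
}

/// R-style `ambig` flag: ambiguous SNPs are removed only when `ambig` is true.
pub fn keep_snp(a1: &str, a2: &str, ambig: bool) -> bool {
    let keep_ambig = !ambig; // corrected mapping; the bug was `keep_ambig = ambig`
    keep_ambig || !is_strand_ambiguous(a1, a2)
}
```

Under the default (ambig = false) an A/T SNP is kept; with ambig = true it is dropped, while an A/G SNP is kept either way.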
v0.1.3
Critical fix for cross-trait LDSC
Fixed
- LDSC cross-trait allele alignment. Z-scores were not flipped when A1 differed between traits, producing wrong-sign genetic covariance and ~10x inflated cross-trait intercepts. Diagonal (heritability) estimates were unaffected. All users running multi-trait LDSC should upgrade.
- LDSC per-trait chi² filtering now matches R GenomicSEM's pre-filter-then-merge behavior. Cross-trait SNP counts now agree exactly with R.
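
The allele-alignment fix can be pictured with a small sketch: when trait 2 reports the same SNP with A1 and A2 swapped relative to trait 1, its Z-score changes sign before the cross-trait product is formed. The function below is a hypothetical illustration, not the gsem-ldsc code; it ignores strand flips and simply rejects pairs that match in neither orientation.

```rust
/// Align trait 2's Z-score to trait 1's effect allele (A1).
/// Returns None when the allele pairs do not match in either orientation.
pub fn align_z(a1_t1: &str, a2_t1: &str, a1_t2: &str, a2_t2: &str, z_t2: f64) -> Option<f64> {
    if a1_t1.eq_ignore_ascii_case(a1_t2) && a2_t1.eq_ignore_ascii_case(a2_t2) {
        Some(z_t2) // same orientation: keep the sign
    } else if a1_t1.eq_ignore_ascii_case(a2_t2) && a2_t1.eq_ignore_ascii_case(a1_t2) {
        Some(-z_t2) // A1/A2 swapped: flip the sign
    } else {
        None // mismatched alleles (strand handling is out of scope here)
    }
}
```

Skipping the flip leaves the product z1·z2 with the wrong sign for every swapped SNP, which is consistent with the wrong-sign covariance described above.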
Full Changelog: v0.1.2...v0.1.3
v0.1.2
What's Changed
Fixed
- Equality constraints (a*V1 + a*V2 syntax) now correctly share a single free parameter. Previously each labelled term got its own independent parameter, silently ignoring the constraint.
- Labelled first-indicator loadings (e.g. F1 =~ a*V1 + a*V2) are now free, not fixed to 1. Matches lavaan semantics.
- Observed variable detection for covariance-only models (RL ~~ MDD) no longer fails with "0 observed variables".
- L-BFGS convergence flag: when the line search exhausts all step sizes, the optimizer is at a stationary point and now correctly reports converged = true.
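
One way to picture the equality-constraint fix is as a label-to-index map: every term carrying the same label resolves to the same free-parameter slot, while unlabelled terms each get their own. The sketch below is an assumption-laden illustration (the function name and the Option<&str> encoding of labels are hypothetical), not the gsem-sem parser.

```rust
use std::collections::HashMap;

/// Assign a free-parameter index to each term in order.
/// Terms with the same label share one index; unlabelled terms get a fresh one.
pub fn assign_param_indices(labels: &[Option<&str>]) -> Vec<usize> {
    let mut by_label: HashMap<&str, usize> = HashMap::new();
    let mut next = 0usize;
    let mut indices = Vec::with_capacity(labels.len());
    for label in labels {
        let idx = match label {
            // Reuse the index for a label seen before, else allocate a new one.
            Some(name) => *by_label.entry(*name).or_insert_with(|| {
                let i = next;
                next += 1;
                i
            }),
            None => {
                let i = next;
                next += 1;
                i
            }
        };
        indices.push(idx);
    }
    indices
}
```

For F1 =~ a*V1 + a*V2 the two loadings map to the same index, so the optimizer sees a single free parameter.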
Added
- Heywood case warnings: negative variance estimates are detected after every SEM fit and logged at WARN level via the log crate. Surfaces in R, Python, and CLI.
- New tests for equality constraints (shared params, value propagation, implied covariance, Jacobian) and negative variance detection.
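
The Heywood check itself amounts to scanning the fitted variance parameters for negative estimates. The sketch below is a hypothetical, standard-library-only version that returns the offending names; the real implementation logs them through the log crate, which is left out here to keep the example self-contained.

```rust
/// Names of variance parameters whose estimate is negative (Heywood cases).
/// Input is assumed to be (parameter name, estimate) pairs from a finished fit.
pub fn heywood_cases<'a>(variances: &[(&'a str, f64)]) -> Vec<&'a str> {
    variances
        .iter()
        .filter(|(_, est)| *est < 0.0)
        .map(|(name, _)| *name)
        .collect()
}
```

A caller would emit one warning per returned name after each fit.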
Full Changelog: v0.1.1...v0.1.2
v0.1.1
Second release of GenomicSEM-rs. Patch release on top of v0.1.0 with
pre-built R binaries, Windows R support, the CHANGELOG.md file, and
a handful of R CMD check fixes.
Full diff: v0.1.0...v0.1.1
Added
- Pre-built R binary packages for Linux, macOS, and Windows attached
  to every GitHub release. Users no longer need a Rust toolchain to
  install gsemr: pick the platform-native file from the release page
  and run install.packages(<url>, repos = NULL). Source tarball +
  remotes::install_github remain as the "from source" fallback for
  unmatched R versions and dev builds.
- R package now ships with full Windows build support: configure.win,
  src/Makevars.win, and -Wl,--export-all-symbols in the MinGW link
  line so R .Call("wrap__*_rust", ...) resolves at runtime on
  Windows. R CMD check is green on ubuntu / macos / windows.
- workflow_dispatch trigger on publish.yml with a release_tag
  input, so the R-only job path can be manually re-run against an
  existing release (to ship a late R binary or fix without burning a
  crates.io / PyPI version slot).
- readme = "README.md" and [project.urls] in
  bindings/python/pyproject.toml so the PyPI project page renders a
  description with links to the repo, issue tracker, and architecture
  docs. (The 0.1.0 wheels shipped before this landed, so the PyPI 0.1.0
  page will continue to show "no project description" until 0.1.1 is
  uploaded.)
Changed
- bindings/python/README.md: absolute GitHub URLs instead of
  relative ../../API_COMPAT.md / ../../ARCHITECTURE.md links
  (which don't render on PyPI); function list expanded from 10 to all
  17 exports and grouped as Core pipeline vs Advanced.
- Root README.md: R install section restructured, with binary
  packages as the primary path and source install demoted to a "From
  source" subsection. New Rust MSRV (1.88+) badge.
- R package internal cleanup:
  - Explicit importFrom(stats, pnorm, setNames) /
    importFrom(utils, read.table, write.table) in NAMESPACE,
    clearing the "no visible global function definition" notes.
  - @param out added to sumstats.R + regenerated sumstats.Rd
    (fixes an "Undocumented arguments in Rd file" WARNING).
  - {chr} in ldsc.R's @param ld description escaped to
    \code{<chr>.l2.ldscore.gz} (fixes "Lost braces" NOTE).
  - configure / configure.win print rustc --version /
    cargo --version before the build (fixes "No rustc version reported"
    WARNING).
  - configure / configure.win now rm -rf src/rust/target after
    copying out libgsemr.a, so R CMD check doesn't scan a duplicate
    rust/target/release/libgsemr.a and double-report the Rust
    stdlib's _exit / abort / exit symbols.
Removed
- PyLdscResult.to_json() and gsem.LdscResult.from_json(s), along with the
  json: String field and the conversions::json_to_ldsc helper that
  backed them. These were dead compat from the pre-NumPy era; every hot
  path has been reading s / v / i_mat through NumPy getters for a
  while. This is the only user-visible break in 0.1.1. Callers who
  were serializing LDSC results to disk should use pickle or pass the
  result object directly into downstream functions (all 17 exported
  functions accept it).
Fixed
- read_sumstats aborted on NA tokens, so .sumstats.gz files that
  worked in GenomicSEM::ldsc failed in gsemr::ldsc with a cryptic
  "gsemr::ldsc error: invalid N" long before sample.prev /
  population.prev were even evaluated. GenomicSEM tolerates these
  files because its reader calls na.omit(read_delim(...)); gsemr now
  matches that behaviour: rows whose N or Z field is an NA token
  (empty, ., NA, NaN, N/A, NULL; case-insensitive) are
  silently dropped, with a one-line INFO log naming the file and the
  dropped-row count. Genuine parse failures now report file path, line
  number, and the offending value, and load_trait_data wraps the
  error with the failing file path so multi-trait runs tell you which
  input went bad. (crates/gsem/src/io/gwas_reader.rs)
- publish.yml skip cascade: check-r-package and
  upload-r-to-release no longer inherit the implicit if: success()
  gate, which was evaluating to false on workflow_dispatch runs
  (because publish-crates is skipped there). They now use
  if: always() && needs.<upstream>.result == 'success' and proceed
  as long as their direct needs are green.
- Versioned workspace deps: root Cargo.toml internal deps now
  carry version = "0.1.0" alongside path, required by
  cargo publish for inter-crate references.
- R binding Cargo patch: bindings/r/src/rust/Cargo.toml declares
  internal crates at crates.io versions and uses [patch.crates-io]
  to redirect to the in-repo sources when available. configure
  strips the patch block when the local crates aren't reachable
  (tarball install), so cargo falls back to the published crates
  cleanly.
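
The NA-token rule from the read_sumstats entry in the list above can be sketched as a small classifier plus a field parser. Function names are hypothetical and trimming surrounding whitespace is an assumption; the token list (empty, ., NA, NaN, N/A, NULL, case-insensitive) comes from the entry.

```rust
/// True when a field is one of the NA tokens: empty, ".", NA, NaN, N/A, NULL
/// (case-insensitive). Whitespace trimming is an assumption of this sketch.
pub fn is_na_token(field: &str) -> bool {
    let t = field.trim();
    t.is_empty()
        || t == "."
        || ["na", "nan", "n/a", "null"]
            .iter()
            .any(|tok| t.eq_ignore_ascii_case(tok))
}

/// Parse a numeric field.
/// Ok(None): NA token, the row should be dropped.
/// Err(..): genuine parse failure, reported with the offending value.
pub fn parse_field(field: &str) -> Result<Option<f64>, String> {
    if is_na_token(field) {
        return Ok(None);
    }
    field
        .trim()
        .parse::<f64>()
        .map(Some)
        .map_err(|e| format!("invalid value {field:?}: {e}"))
}
```

Checking for NA tokens before parsing matters because Rust's f64 parser would otherwise accept "NaN" as a number rather than treating the row as missing.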
v0.1.0
First tagged release of GenomicSEM-rs.
A Rust rewrite of R GenomicSEM, shipped as a 6-crate workspace with
R (gsemr) and Python (genomicsem) bindings plus a standalone CLI
(gsem).
What's in this release
Core engines (crates.io)
- gsem-matrix: matrix utilities (nearest PD, half-vec, PSD smoothing)
- gsem-ldsc: LD Score Regression with block jackknife
- gsem-sem: SEM engine (DWLS/ML, lavaan syntax, sandwich SEs)
- gsem: pipeline + CLI binary (munge, ldsc, s_ldsc, hdl, sumstats,
  commonfactor, usermodel, userGWAS, commonfactorGWAS, rgmodel,
  write.model, paLDSC, enrich, multiSNP, simLDSC, summaryGLS)
Bindings
- gsemr: R package (source tarball attached below)
- genomicsem: Python package (PyPI wheels for Linux/macOS/Windows)
Highlights
- End-to-end drop-in compatibility with R GenomicSEM's public API on
  all 18 user-facing functions, with the same argument shapes, output
  layouts, and file formats.
- Significant wall-clock speedups on the PGC benchmark (see README
  comparison table).
- Every user-facing function on every surface (R, Python, CLI) ships
  with a worked Examples block in its help output.
See README.md for the installation and usage walk-through.