
feat(ptm): stratified AF-depletion analysis (hvantk ptm constraint)#89

Merged
enriquea merged 8 commits into dev from ptm-constraint
Apr 14, 2026

Conversation

@enriquea
Collaborator

Summary

  • Adds hvantk ptm constraint — a stratified allele-frequency depletion analysis that compares gnomAD AF distributions between PTM-proximal and non-PTM variants, grouped by tissue / cell-type / custom metadata fields. Turns the multi-dataset EDA prototyped in Notebooks E–I (GTEx RNA, GTEx protein, HCA adult, Farah mid-fetal, Asp early fetal) into a reusable CLI.
  • New modules under hvantk/ptm/: constraint.py (orchestrator + 5 statistical tests), constraint_expression.py (Hail MT / AnnData / tabular backend adapter), constraint_plots.py (4-panel figure renderer), constraint_report.py (self-contained HTML report). τ computation is delegated to the maintained tspex package via a thin wrapper in hvantk/utils/tissue_specificity.py.
  • Follows the library-preference policy from the design spec (prefer established ecosystem packages over reimplementing τ / aggregation / differential expression).

Test plan

  • pytest hvantk/tests/test_ptm_constraint.py — 6 smoke tests pass (tspex-vs-Yanai reference, tabular / AnnData adapter contracts, config validation, gene-feature computation, CLI help).
  • Full fast test suite: 289 tests pass.
  • hvantk ptm --help and hvantk ptm constraint --help render cleanly.
  • End-to-end synthetic run (no Hail): 500 variants × 30 genes × 6 tissues → 5 TSVs, 4 PNGs, 1 HTML report produced.
  • End-to-end run on a real PTM-annotated variant HT + GTEx / AnnData source (pending a larger fixture; not gated on this PR).

Refs: local/planning/2026-04-13-hvantk-ptm-constraint-implementation-plan.md, local/planning/2026-04-13-ptm-constraint-design.md.

🤖 Generated with Claude Code

enriquea and others added 6 commits April 13, 2026 23:48
The file was rewritten for AnnData during scverse migration but kept the
Hail-era filename. The leftover Hail histogram function had no live callers
(only @pytest.mark.hail tests against removed Hail builders) and a broken
``"ad.AnnData"`` annotation that triggered an F821 lint error.

- New hvantk/visualization/expression/anndata.py exposes
  visualize_expression_distribution(adata, ...) operating on adata.X.
- Old hail.py removed; visualization facade and subpackage doc updated.
- Test import switched to the new module.
…ze_expression_ad

Completes the cleanup planned in the scverse-integration plan (Task 10
step 5). The Hail-based functions in matrix_utils.py were superseded by
their _ad equivalents when expression I/O migrated to AnnData; their only
remaining callers were two skipped @pytest.mark.hail test files.

- Remove annotate_column_summary, describe_expression_mt, summarize_expression,
  summarize_matrix, filter_by_metadata, filter_by_gene_list, filter_by_expression,
  get_top_expressed_genes (and the optional hail import).
- Delete tests/test_matrix_utils.py and tests/test_summarize_expression.py.
- summarize_expression_ad: stop mutating adata.obs (was writing _group_label
  into the caller's object) and replace the per-row index.get_loc lookup
  with groupby indices for O(n_cells) splitting.
- gene_sets.py: refresh extract_marker_gene_sets docstring to point at the
  AnnData summarizer and note its long-format output requires pivot or
  scanpy rank_genes_groups for direct use.
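The groupby-indices splitting described above can be sketched as follows (toy data; the column name and shapes are illustrative, not the project's actual code). `groupby(...).indices` yields positional row arrays per group in one pass, avoiding a per-row `index.get_loc` lookup and leaving the caller's `obs` untouched:

```python
import numpy as np
import pandas as pd

# Toy cells-x-genes expression matrix and per-cell metadata
obs = pd.DataFrame({"tissue": ["liver", "brain", "liver", "brain", "liver"]})
X = np.arange(10, dtype=float).reshape(5, 2)

# dict mapping each group label to an ndarray of row positions,
# computed in a single pass over the column
group_idx = obs.groupby("tissue").indices

# O(n_cells) split: slice the matrix once per group
means = {label: X[rows].mean(axis=0) for label, rows in group_idx.items()}
```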
…hail_context

- converters.anndata_to_hail_mt previously built the long-format DataFrame
  with a Python double loop over n_obs * n_var entries (200M dict allocs
  for a modest 10k x 20k matrix). Replace with np.repeat / np.tile /
  X.ravel — single pass, no Python loop.
- The np.bool monkey-patch (Hail 0.2.x compat with NumPy >= 1.24) lived
  as a side-effect at the top of converters.py. Move it into hail_context.py
  so it runs before any Hail import regardless of which module pulls Hail
  in first. converters.py now imports hl via hail_context to guarantee the
  shim is applied.
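The vectorized long-format construction (and the alias shim's guard) can be sketched like this; the variable names are illustrative, not the converter's actual code:

```python
import numpy as np
import pandas as pd

# Hail 0.2.x compat shim: NumPy 1.24 removed the np.bool alias.
# Must run before any Hail import to take effect.
if not hasattr(np, "bool"):
    np.bool = bool

obs_names = np.array(["cell_a", "cell_b"])
var_names = np.array(["g1", "g2", "g3"])
X = np.arange(6, dtype=float).reshape(2, 3)

# One row per (obs, var) entry, built without a Python double loop:
long_df = pd.DataFrame({
    "obs": np.repeat(obs_names, len(var_names)),  # each cell repeated n_var times
    "var": np.tile(var_names, len(obs_names)),    # gene list tiled n_obs times
    "value": X.ravel(),                           # row-major flatten matches repeat/tile order
})
```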
test_dataset_download_returns_intermediate_tsv mocked
urllib.request.urlretrieve, but the production code calls requests.get.
The mock was a no-op, so the test attempted a real network download and
hung pytest -q indefinitely. Mock requests.get with a context-manager
fake response that streams the bundled fixture zip.
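The fake-response pattern the fix describes can be sketched with stdlib pieces only. In the real test the fake is installed with `unittest.mock.patch` on the downloader module's `requests.get` attribute (target path elided here); the class below just shows the context-manager plus streamed-read contract:

```python
import io
import zipfile

def make_fixture_zip() -> bytes:
    # Build a tiny in-memory zip standing in for the bundled fixture
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("data.tsv", "gene\tvalue\nTP53\t1\n")
    return buf.getvalue()

class FakeResponse:
    status_code = 200

    def __init__(self, payload: bytes):
        self._payload = payload

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def raise_for_status(self):
        return None

    def iter_content(self, chunk_size=8192):
        # Stream the fixture bytes in chunks, like a real streamed response
        for i in range(0, len(self._payload), chunk_size):
            yield self._payload[i : i + chunk_size]

with FakeResponse(make_fixture_zip()) as resp:
    payload = b"".join(resp.iter_content())
names = zipfile.ZipFile(io.BytesIO(payload)).namelist()
```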
- matrix_builders.py, cptac.py, expression_atlas.py: remove inner
  ``import anndata as ad`` repeats inside builder functions; module-level
  import already in scope.
- test_matrix_utils_anndata.py: replace the importlib.util workaround
  (which sidestepped hvantk.utils.__init__ over a non-existent Python
  3.10 syntax issue) with a plain import. Project requires Python >= 3.10
  where PEP 604 unions are valid.
Implements the `hvantk ptm constraint` CLI and `hvantk.ptm.constraint`
module from the 2026-04-13 implementation plan. Compares gnomAD allele-
frequency distributions between PTM-proximal and non-PTM variants,
stratified by tissue, cell type, or any categorical metadata field.

Modules added:
- hvantk/ptm/constraint.py — orchestrator, PTMConstraintConfig/Result,
  five statistical tests (per-group ranking, τ quartile, τ × LOEUF
  factorial, PTM category × group heatmap, within-gene Wilcoxon)
- hvantk/ptm/constraint_expression.py — unified backend adapter for
  Hail MatrixTable / AnnData / tabular expression sources
- hvantk/ptm/constraint_plots.py — four-panel figure renderer
- hvantk/ptm/constraint_report.py — self-contained HTML report
- hvantk/utils/tissue_specificity.py — thin tspex wrapper for τ / TSI
  / Gini / Shannon metrics

Wiring:
- hvantk/commands/ptm_cli.py — `ptm constraint` subcommand
- hvantk/ptm/__init__.py — lazy exports for the new API
- pyproject.toml — `constraint` extra (tspex, matplotlib, seaborn)

Docs:
- docs_site/tools/ptm-constraint.md
- docs_site/guide/usage.md (10-line recipe)
- docs_site/architecture.md (module index)

Tests: 6 smoke tests in hvantk/tests/test_ptm_constraint.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
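For reference, the Yanai τ metric that `tissue_specificity.py` delegates to tspex can be computed directly; a minimal numpy sketch (not the wrapper's code):

```python
import numpy as np

def tau(expression) -> float:
    # Yanai et al. (2005) tissue-specificity index:
    #   tau = sum(1 - x_i / max(x)) / (n - 1)
    # 0 = uniformly expressed, 1 = expressed in a single tissue
    x = np.asarray(expression, dtype=float)
    if x.max() == 0:
        return 0.0  # gene not expressed anywhere
    x_hat = x / x.max()
    return float((1.0 - x_hat).sum() / (len(x) - 1))
```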
@coderabbitai
Contributor

coderabbitai bot commented Apr 13, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f23d0862-d1b5-4a31-a1b8-413d71859eb7


@codacy-production

codacy-production bot commented Apr 13, 2026

Not up to standards ⛔

🔴 Issues: 24 high · 19 medium · 31 minor

Alerts:
⚠ 74 new issues (quality gate allows ≤ 0 issues of at least minor severity)

Results:
74 new issues

Category results:
  • UnusedCode: 2 medium
  • Documentation: 30 minor
  • ErrorProne: 5 high
  • Security: 3 medium · 19 high
  • CodeStyle: 1 minor
  • Complexity: 14 medium

View in Codacy

🟢 Metrics: 186 complexity · 5 duplication

Metric results:
  • Complexity: 186
  • Duplication: 5

View in Codacy


…ield

Code-review follow-ups to the initial constraint implementation:

- _anndata_value_column unconditionally returned "mean", so
  --expression-metric median and median_nonzero were silent no-ops on
  the AnnData backend. Replaced the value-column helper with a direct
  numpy aggregation (_aggregate_anndata_direct) that runs per-group
  np.median / np.nanmedian (for nonzero) over adata.X when the user
  requests a median metric.

- _load_variants silently skipped label filtering when the configured
  label_field was absent from the variants HT, leading to results that
  claim to be filtered to TN/TP but actually include every label. Now
  raises ValueError with a clear remediation hint unless the user
  explicitly passes --label-filter all.

- pyproject.toml + requirements.txt: added tspex so CI (which installs
  from requirements.txt) can import the wrapper and the constraint
  smoke tests do not fail with ImportError.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
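The direct aggregation described in the first bullet amounts to a per-group (nan)median over the expression matrix. An illustrative sketch (function name and signature hypothetical, mirroring what `_aggregate_anndata_direct` is described as doing):

```python
import numpy as np

def aggregate_median(X, group_labels, nonzero=False):
    # Per-group median over a dense cells-x-genes matrix.
    # nonzero=True masks zeros as NaN so nanmedian ignores them
    # (the "median_nonzero" metric).
    X = np.asarray(X, dtype=float)
    labels = np.asarray(group_labels)
    out = {}
    for label in np.unique(labels):
        block = X[labels == label]
        if nonzero:
            block = np.where(block == 0, np.nan, block)
            out[label] = np.nanmedian(block, axis=0)
        else:
            out[label] = np.median(block, axis=0)
    return out

X = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]])
groups = np.array(["a", "a", "b"])
medians = aggregate_median(X, groups)
medians_nz = aggregate_median(X, groups, nonzero=True)
```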
@enriquea
Collaborator Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance against the constraint feature files (hvantk/ptm/constraint*.py, hvantk/utils/tissue_specificity.py, hvantk/commands/ptm_cli.py, hvantk/ptm/__init__.py, pyproject.toml, added docs).

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

The Conda CI workflow has been failing since the AnnData migration
landed on dev because environment.yml never gained anndata/scanpy. The
constraint work also needs tspex; add it alongside so `python-package-conda.yml` can import every test module.

Added: anndata, scanpy, pysam, PyYAML, psutil, scikit-learn, tspex.
@enriquea enriquea merged commit e18b4ae into dev Apr 14, 2026
2 of 3 checks passed
@enriquea enriquea deleted the ptm-constraint branch April 14, 2026 07:54