feat(num_feat): add NumericalFeature.feature_matrix (#337) by breimanntools · Pull Request #347 · breimanntools/aaanalysis

breimanntools · 2026-07-04T21:01:40Z

Closes #337.

Summary

Adds NumericalFeature.feature_matrix(features, dict_num_parts, df_scales=..., n_jobs=1), the
numerical analog of SequenceFeature.feature_matrix: it turns CPP.run_num-selected features back
into a model matrix X, preserving the per-residue context that per-AA-averaged sequence features
discard.

Details

Values are reconstructed exactly the way CPP.run_num does — the SPLIT in each feature id is
re-applied to the part's 0-based residue axis (arange(L_part)), the SCALE selects the column, and
the selected residues are nanmean-averaged (round 5).
The df_feat positions column is a JMD-offset display numbering (e.g. 21..30 for a TMD), not a
tensor index, so it is deliberately not used for value lookup (documented in the method Notes).
Output is byte-identical to run_num's recompute_feature_matrix (verified for uniform and
variable-length parts); per-part real lengths are inferred from the NaN padding get_parts emits.
Heavy lifting lives in NumericalFeature's own _backend/num_feat/feature_matrix.py (reusing the
shared CPP split/parse helpers). @staticmethod, consistent with the class.
No __init__.py change (method on an already-exported class).
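
The reconstruction described above can be sketched in plain NumPy. This is an illustration only, not the code in _backend/num_feat/feature_matrix.py: the helper names (`infer_real_length`, `segment_positions`, `feature_value`) are hypothetical, the split is assumed to be a simple "i-th of n contiguous segments" split, padding is assumed to be trailing all-NaN rows, and "round 5" is read as rounding to 5 decimals.

```python
import numpy as np


def infer_real_length(part):
    """Count rows that are not entirely NaN (assumes padding is trailing all-NaN rows)."""
    is_real = ~np.all(np.isnan(part), axis=1)
    return int(is_real.sum())


def segment_positions(length, i_th, n_split):
    """Hypothetical segment split: the i-th (1-based) of n contiguous chunks
    of the 0-based residue axis arange(length)."""
    return np.array_split(np.arange(length), n_split)[i_th - 1]


def feature_value(part, positions, scale_col):
    """nanmean over the selected residues of one scale column, rounded to 5 decimals."""
    return round(float(np.nanmean(part[positions, scale_col])), 5)


# One part as an (L_max, D) array: 6 real residues, 2 padded rows, D = 2 scales.
part = np.array([
    [1.0, 0.1],
    [2.0, 0.2],
    [3.0, np.nan],   # a real residue with one missing value, not padding
    [4.0, 0.4],
    [5.0, 0.5],
    [6.0, 0.6],
    [np.nan, np.nan],
    [np.nan, np.nan],
])

length = infer_real_length(part)                 # real length, padding excluded
positions = segment_positions(length, 2, 3)      # 0-based tensor indices, not display positions
value_scale_0 = feature_value(part, positions, 0)
value_scale_1 = feature_value(part, positions, 1)
```

The indices returned by `segment_positions` are 0-based offsets into the part, which is why a display numbering such as the df_feat positions column cannot stand in for them.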

Ripple

numpydoc docstring (named Returns / Raises / Examples include)
executed example notebook examples/nf_feature_matrix.ipynb (every public parameter, display_df tables)
27 unit tests (per-parameter positive+negative, golden hand-computed means, run_num consistency, ragged parts)
release-notes Unreleased entry

Part of epic #336.

🤖 Generated with Claude Code

Add NumericalFeature.feature_matrix(features, dict_num_parts, df_scales=..., n_jobs=1), the numerical analog of SequenceFeature.feature_matrix: it turns CPP.run_num-selected features back into a model matrix X while preserving the per-residue context that per-AA-averaged sequence features discard.

Values are reconstructed exactly the way CPP.run_num does: the SPLIT in each feature id is re-applied to the part's 0-based residue axis (arange(L_part)), the SCALE selects the D column, and the selected residues are nanmean-averaged (round 5). The df_feat 'positions' column is a JMD-offset display numbering (e.g. 21..30 for a TMD), NOT a tensor index, so it is deliberately not used for value lookup; this is documented in the method Notes.

Output is byte-identical to run_num's recompute_feature_matrix (verified for uniform and variable-length parts). Per-part real lengths are inferred from the NaN padding get_parts emits. Heavy lifting lives in NumericalFeature's own _backend/num_feat/feature_matrix.py (reusing the shared cpp split/parse helpers).

Ripple: numpydoc docstring with named Returns / Raises / Examples include; executed examples notebook nf_feature_matrix.ipynb (every public parameter, display_df tables); 27 unit tests (per-parameter positive+negative, golden hand-computed means, run_num consistency, ragged parts); release-notes Unreleased entry.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

codecov · 2026-07-04T21:52:02Z

Codecov Report

❌ Patch coverage is 92.04545% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.83%. Comparing base (7dcc8d8) to head (a2d28f9).
⚠️ Report is 9 commits behind head on master.

Files with missing lines	Patch %	Lines
...analysis/feature_engineering/_numerical_feature.py	89.47%	3 Missing and 1 partial ⚠️
...re_engineering/_backend/num_feat/feature_matrix.py	94.00%	2 Missing and 1 partial ⚠️
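
As a cross-check, the headline patch figure can be recomputed from the two per-file rows. This is a sketch under two assumptions not stated in the report: partial lines count as not covered, and the per-file changed-line totals are 38 and 50 (inferred from the reported percentages).

```python
# (uncovered lines = missing + partial, assumed changed-line total) per file
files = {
    "_numerical_feature.py": (3 + 1, 38),
    "feature_matrix.py": (2 + 1, 50),
}

# Per-file patch coverage, rounded to two decimals as in the table
per_file_pct = {
    name: round(100 * (total - uncovered) / total, 2)
    for name, (uncovered, total) in files.items()
}

# Overall patch coverage across both files
uncovered_total = sum(uncovered for uncovered, _ in files.values())
lines_total = sum(total for _, total in files.values())
patch_pct = round(100 * (lines_total - uncovered_total) / lines_total, 5)
```

Under these assumptions the 7 uncovered lines out of 88 reproduce the reported 92.04545%.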

Additional details and impacted files

@@           Coverage Diff            @@
##           master     #347    +/-   ##
========================================
  Coverage   94.83%   94.83%            
========================================
  Files         196      197     +1     
  Lines       18767    18870   +103     
  Branches     3175     3196    +21     
========================================
+ Hits        17797    17895    +98     
- Misses        633      637     +4     
- Partials      337      338     +1

Files with missing lines	Coverage Δ
...re_engineering/_backend/num_feat/feature_matrix.py	`94.00% <94.00%> (ø)`
...analysis/feature_engineering/_numerical_feature.py	`96.26% <89.47%> (-3.74%)`	⬇️

... and 9 files with indirect coverage changes

Components	Coverage Δ
cpp_core	`94.95% <ø> (ø)`


breimanntools force-pushed the feat/337-numericalfeature-feature-matrix branch from 840b3af to a2d28f9 Compare July 4, 2026 21:24

breimanntools wants to merge 1 commit into master from feat/337-numericalfeature-feature-matrix
