
feat(num_feat): add NumericalFeature.feature_matrix (#337) #347

Draft
breimanntools wants to merge 1 commit into master from feat/337-numericalfeature-feature-matrix


@breimanntools

Owner

Closes #337.

Summary

Adds NumericalFeature.feature_matrix(features, dict_num_parts, df_scales=..., n_jobs=1), the
numerical analog of SequenceFeature.feature_matrix: it turns CPP.run_num-selected features back
into a model matrix X, preserving the per-residue context that per-AA-averaged sequence features
discard.

Details

  • Values are reconstructed exactly the way CPP.run_num does: the SPLIT in each feature id is
    re-applied to the part's 0-based residue axis (arange(L_part)), the SCALE selects the matching
    column, and the selected residues are nanmean-averaged and rounded to 5 decimals.
  • The df_feat positions column is a JMD-offset display numbering (e.g. 21..30 for a TMD), not a
    tensor index, so it is deliberately not used for value lookup (documented in the method Notes).
    Output is byte-identical to run_num's recompute_feature_matrix (verified for uniform and
    variable-length parts); per-part real lengths are inferred from the NaN padding get_parts emits.
  • The heavy lifting lives in NumericalFeature's own _backend/num_feat/feature_matrix.py (reusing the
    shared CPP split/parse helpers). The method is a @staticmethod, consistent with the rest of the class.
  • No __init__.py change (method on an already-exported class).
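For readers who want to see the arithmetic behind the first bullet, here is a minimal NumPy sketch of the reconstruction. It is not the code in _backend/num_feat/feature_matrix.py, and several details are assumptions: feature ids are taken to look like PART-Segment(i,n)-SCALE, only Segment splits are handled, np.array_split stands in for CPP's segment-boundary rule (which may round differently), and parts are passed as hypothetical unpadded per-sample (L_part, n_scales) arrays. The helper names are hypothetical.

```python
import re

import numpy as np

# ASSUMPTION: feature ids look like "PART-Segment(i,n)-SCALE"; only Segment splits are handled.
FEATURE_RE = re.compile(r"^(?P<part>[^-]+)-Segment\((?P<i>\d+),(?P<n>\d+)\)-(?P<scale>.+)$")


def segment_indices(real_len: int, i: int, n: int) -> np.ndarray:
    """Hypothetical Segment(i,n): i-th (1-based) of n contiguous chunks of arange(real_len).

    np.array_split is a stand-in boundary rule, not necessarily the one CPP uses.
    """
    return np.array_split(np.arange(real_len), n)[i - 1]


def feature_matrix_sketch(features, parts, scale_names):
    """Rebuild X (n_samples, n_features) from per-residue values.

    parts: dict part name -> list of per-sample arrays of shape (L_part, n_scales),
    already stripped of padding, so L_part is the real length and may differ per sample.
    """
    col = {name: k for k, name in enumerate(scale_names)}
    n_samples = len(next(iter(parts.values())))
    X = np.empty((n_samples, len(features)))
    for j, feat in enumerate(features):
        m = FEATURE_RE.match(feat)
        if m is None:
            raise ValueError(f"unsupported feature id: {feat!r}")
        i, n = int(m["i"]), int(m["n"])
        for s, values in enumerate(parts[m["part"]]):
            # Re-apply the SPLIT to the 0-based residue axis, never to display positions
            idx = segment_indices(values.shape[0], i, n)
            # SCALE picks the column; NaN residues are skipped; result rounded to 5 decimals
            X[s, j] = round(float(np.nanmean(values[idx, col[m["scale"]]])), 5)
    return X


parts = {
    "TMD": [
        np.array([[0.1, 1.0], [0.2, 2.0], [0.3, 3.0], [0.4, 4.0]]),  # 4 residues
        np.array([[1.0, 5.0], [np.nan, 7.0], [3.0, 9.0]]),           # 3 residues, one NaN
    ]
}
X = feature_matrix_sketch(["TMD-Segment(1,2)-s0", "TMD-Segment(2,2)-s1"], parts, ["s0", "s1"])
# X == [[0.15, 3.5], [1.0, 9.0]]
```

Because the split is applied to each sample's own arange(L_part), the shorter second sample gets its own segment boundaries ([0, 1] and [2]) instead of inheriting those of the longest sample.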

Ripple

  • numpydoc docstring (named Returns and Raises sections, Examples include)
  • executed example notebook examples/nf_feature_matrix.ipynb (every public parameter, display_df tables)
  • 27 unit tests (per-parameter positive+negative, golden hand-computed means, run_num consistency, ragged parts)
  • release-notes Unreleased entry

Part of epic #336.

🤖 Generated with Claude Code

Add NumericalFeature.feature_matrix(features, dict_num_parts, df_scales=...,
n_jobs=1), the numerical analog of SequenceFeature.feature_matrix: it turns
CPP.run_num-selected features back into a model matrix X while preserving the
per-residue context that per-AA-averaged sequence features discard.

Values are reconstructed exactly the way CPP.run_num does — the SPLIT in each
feature id is re-applied to the part's 0-based residue axis (arange(L_part)),
the SCALE selects the D column, and the selected residues are nanmean-averaged
(rounded to 5 decimals). The df_feat 'positions' column is a JMD-offset display numbering
(e.g. 21..30 for a TMD), NOT a tensor index, so it is deliberately not used for
value lookup; this is documented in the method Notes. Output is byte-identical
to run_num's recompute_feature_matrix (verified for uniform and variable-length
parts). Per-part real lengths are inferred from the NaN padding get_parts emits.
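The length inference in the last sentence can be sketched as follows. This assumes, which the PR does not state, that get_parts pads shorter samples with trailing rows that are NaN for every scale, producing one (n_samples, L_max, n_scales) array. Taking the last row that holds any value, rather than counting non-NaN rows, keeps an interior residue with missing values inside the real length. real_lengths and unpad are hypothetical helpers.

```python
import numpy as np


def real_lengths(padded: np.ndarray) -> np.ndarray:
    """Per-sample real length of a (n_samples, L_max, n_scales) NaN-padded array.

    ASSUMPTION: padding consists of trailing rows that are NaN for every scale.
    """
    has_value = ~np.all(np.isnan(padded), axis=2)  # (n_samples, L_max)
    L_max = padded.shape[1]
    # Position of the last row with any value, found by scanning the reversed rows
    last_plus_one = L_max - np.argmax(has_value[:, ::-1], axis=1)
    # A sample with no values at all gets length 0 (argmax alone would give L_max)
    return np.where(has_value.any(axis=1), last_plus_one, 0)


def unpad(padded: np.ndarray) -> list:
    """Split the padded array into per-sample arrays cut to their real length."""
    return [padded[s, :n] for s, n in enumerate(real_lengths(padded))]


padded = np.full((3, 4, 2), np.nan)
padded[0] = [[0.1, 1.0], [0.2, 2.0], [0.3, 3.0], [0.4, 4.0]]
padded[1, :3] = [[1.0, 5.0], [np.nan, 7.0], [3.0, 9.0]]
padded[2, :3] = [[2.0, 1.0], [np.nan, np.nan], [4.0, 2.0]]  # interior all-NaN residue is kept
print(real_lengths(padded).tolist())  # [4, 3, 3]
```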

Heavy lifting lives in NumericalFeature's own _backend/num_feat/feature_matrix.py
(reusing the shared CPP split/parse helpers). Ripple: numpydoc docstring with
named Returns and Raises sections and an Examples include; executed example notebook
nf_feature_matrix.ipynb (every public parameter, display_df tables); 27 unit
tests (per-parameter positive+negative, golden hand-computed means, run_num
consistency, ragged parts); release-notes Unreleased entry.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@breimanntools force-pushed the feat/337-numericalfeature-feature-matrix branch from 840b3af to a2d28f9 on July 4, 2026 21:24
@codecov

codecov Bot commented Jul 4, 2026


Codecov Report

❌ Patch coverage is 92.04545% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.83%. Comparing base (7dcc8d8) to head (a2d28f9).
⚠️ Report is 9 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...analysis/feature_engineering/_numerical_feature.py | 89.47% | 3 Missing and 1 partial ⚠️ |
| ...re_engineering/_backend/num_feat/feature_matrix.py | 94.00% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files


@@           Coverage Diff            @@
##           master     #347    +/-   ##
========================================
  Coverage   94.83%   94.83%            
========================================
  Files         196      197     +1     
  Lines       18767    18870   +103     
  Branches     3175     3196    +21     
========================================
+ Hits        17797    17895    +98     
- Misses        633      637     +4     
- Partials      337      338     +1     
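The headline percentages can be reproduced from the totals in the diff above. Codecov computes its coverage ratio as hits / (hits + misses + partials), so partially covered lines count against coverage; the sketch below applies that to the master and head columns. The patch-size figure is an inference, not a number from the report: 7 uncovered lines at 92.04545% patch coverage implies 88 coverable changed lines.

```python
def coverage(hits: int, misses: int, partials: int) -> float:
    """Codecov-style ratio: partial lines are counted as not covered."""
    return hits / (hits + misses + partials)


master = coverage(17797, 633, 337)  # 18767 lines on master
head = coverage(17895, 637, 338)    # 18870 lines on the PR head
print(f"{master:.2%} {head:.2%}")   # 94.83% 94.83%

# Inferred, not reported: patch size N with 7 uncovered lines and 92.04545% covered
n_patch = round(7 / (1 - 0.9204545))
print(n_patch, f"{(n_patch - 7) / n_patch:.5%}")  # 88 92.04545%
```

Both ratios round to 94.83%, which is why the project coverage shows no change even though the PR adds 98 hits.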
| Files with missing lines | Coverage Δ |
|---|---|
| ...re_engineering/_backend/num_feat/feature_matrix.py | 94.00% <94.00%> (ø) |
| ...analysis/feature_engineering/_numerical_feature.py | 96.26% <89.47%> (-3.74%) ⬇️ |

... and 9 files with indirect coverage changes

| Components | Coverage Δ |
|---|---|
| cpp_core | 94.95% <ø> (ø) |



Development

Successfully merging this pull request may close these issues.

feat: NumericalFeature.feature_matrix for numeric CPP (run_num) outputs
