
feat: implement feature extractors for issue #82 #120

Merged
crashfrog merged 2 commits into main from worktree-agent-a017fae4294a430a4
Jun 2, 2026

Conversation

@crashfrog
Member

Summary

Implements the three feature extractors required for issue #82:

  • extract_cigar_ops(cigar: &str) → usize: Counts M/I/D operations in CIGAR strings (complexity metric)
  • extract_allele_frequency(all_alleles: &HashMap<u8, usize>, allele: u8) → f64: Computes allele frequency from count distributions
  • extract_multi_map_fraction(position: u32, query_index: &QueryIndex) → f64: Calculates the fraction of reads with multiple alignments at a reference position

Implementation Details

extract_cigar_ops: Filters CIGAR string characters and counts matches (M), insertions (I), and deletions (D).
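
The character filter described above could look roughly like the following. This is a minimal sketch based only on the description in this PR, not the merged code; it assumes the function counts operation letters and ignores the run lengths that precede them.

```rust
/// Sketch: counts match (M), insertion (I), and deletion (D) operation
/// letters in a CIGAR string. Assumption: run lengths (the digits before
/// each letter) are ignored, so "10M2I5D" counts as three operations.
pub fn extract_cigar_ops(cigar: &str) -> usize {
    cigar
        .chars()
        .filter(|c| matches!(*c, 'M' | 'I' | 'D'))
        .count()
}
```

Under this reading, other operations such as soft clips (S) or skips (N) do not contribute to the complexity metric.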

extract_allele_frequency: Divides the allele's count by the sum of all counts; returns 0.0 for an empty map rather than dividing by zero.
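
A sketch of this calculation, using the signature from the summary. The handling of an allele that is absent from a non-empty map (treated here as a count of zero) is an assumption, not something the description states.

```rust
use std::collections::HashMap;

/// Sketch: frequency of `allele` among all observed allele counts.
/// Returns 0.0 when the total count is zero (for example, an empty map).
/// Assumption: an allele missing from the map has a count of zero.
pub fn extract_allele_frequency(all_alleles: &HashMap<u8, usize>, allele: u8) -> f64 {
    let total: usize = all_alleles.values().sum();
    if total == 0 {
        return 0.0;
    }
    let count = all_alleles.get(&allele).copied().unwrap_or(0);
    count as f64 / total as f64
}
```

For example, with counts of 3 for `b'A'` and 1 for `b'C'`, the frequency of `b'A'` is 3 / 4.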

extract_multi_map_fraction:

  • Counts reads that have at least one alignment at the target position
  • Among those reads, counts how many have multiple alignments overall (.len() > 1)
  • Returns the fraction; 0.0 for positions with no coverage
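
The three steps above can be sketched as follows. The real `QueryIndex` type is not shown in this PR, so the `Alignment` and `QueryIndex` structs below are hypothetical stand-ins (a map from read name to all of that read's alignment records, with half-open reference intervals); only the counting logic follows the description.

```rust
use std::collections::HashMap;

/// Hypothetical alignment record covering the half-open interval [start, end).
pub struct Alignment {
    pub start: u32,
    pub end: u32,
}

/// Hypothetical stand-in for the project's QueryIndex:
/// read name -> every alignment record for that read.
pub struct QueryIndex {
    pub reads: HashMap<String, Vec<Alignment>>,
}

/// Sketch: fraction of reads covering `position` that have more than one
/// alignment record overall. Returns 0.0 when no read covers the position.
pub fn extract_multi_map_fraction(position: u32, query_index: &QueryIndex) -> f64 {
    let mut covering = 0usize;
    let mut multi_mapped = 0usize;
    for alignments in query_index.reads.values() {
        // Step 1: does this read have at least one alignment at the position?
        let covers = alignments
            .iter()
            .any(|a| a.start <= position && position < a.end);
        if covers {
            covering += 1;
            // Step 2: a read counts as multi-mapping if it has several records overall.
            if alignments.len() > 1 {
                multi_mapped += 1;
            }
        }
    }
    // Step 3: guard against positions with no coverage.
    if covering == 0 {
        0.0
    } else {
        multi_mapped as f64 / covering as f64
    }
}
```

Note that the multi-mapping check looks at all of a read's records, not only those at the target position, which matches the `.len() > 1` condition described above.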

Test Results

All 24 feature extractor tests pass:

  • 8 CIGAR operation tests (single/multiple/complex/mixed operations)
  • 8 allele frequency tests (simple/four-base/missing allele/empty/single base/equal distribution)
  • 8 multi-mapping fraction tests (no multimap/partial/all/empty/not covered/mixed positions/single read/large dataset)

Closes #82

🤖 Generated with Claude Code

crashfrog and others added 2 commits June 2, 2026 15:30
Implement RED acceptance tests for derive metric computation:
- extract_cigar_ops: count M/I/D operations for CIGAR complexity
- extract_allele_frequency: compute frequency for specific allele
- extract_multi_map_fraction: fraction of reads multi-mapping at position

Tests cover happy path, edge cases (empty inputs), and error conditions.
All 24 tests registered with issue_82 prefix. 18 pass, 6 fail (RED).

The failing tests exercise the correct multi-mapping logic, which is not yet implemented.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- extract_cigar_ops: counts M/I/D operations from CIGAR strings
- extract_allele_frequency: computes allele frequency from count maps
- extract_multi_map_fraction: calculates proportion of reads with multiple alignments at a position

Fixed extract_multi_map_fraction logic: among the reads that align to the target position, it counts those with more than one alignment record overall (indicating multi-mapping) and returns that count as a fraction of all reads at the position.

All 24 feature extractor tests now pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@crashfrog crashfrog mentioned this pull request Jun 2, 2026
8 tasks
@crashfrog crashfrog merged commit 8728933 into main Jun 2, 2026
3 of 4 checks passed

Development

Successfully merging this pull request may close these issues.

Feature extractors
