Stage Module Enhancements:
Add multi-group data layout support so the Stage runs real statistical
tests instead of emitting placeholder notes. The Stage now handles four
data configurations:
- Single group: context.outputs or context.metrics for bootstrap
confidence intervals
- Two groups: context.control and context.treatment for the t-test and
Mann-Whitney U test
- Multiple groups: context.groups list for ANOVA and Kruskal-Wallis
- Paired groups: context.before and context.after for the paired t-test
and Wilcoxon signed-rank test
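A minimal sketch of how a Stage could pick among these layouts by pattern
matching on context keys. The module name, the data_type atoms, and the
assumption that every data field holds a list of numbers are hypothetical,
not taken from crucible_bench. Clauses run from most to least specific so a
context carrying comparison data is never mistaken for a single group.

```elixir
defmodule LayoutSketch do
  @moduledoc """
  Hypothetical sketch: choose a data layout from context keys.
  Not the crucible_bench implementation.
  """

  @type data_type :: :paired | :multi_group | :two_group | :single_group

  # Clauses are ordered from most to least specific layout.
  @spec detect(map()) :: {:ok, data_type()} | {:error, :no_data}
  def detect(%{before: b, after: a}) when is_list(b) and is_list(a), do: {:ok, :paired}
  def detect(%{groups: groups}) when is_list(groups), do: {:ok, :multi_group}
  def detect(%{control: c, treatment: t}) when is_list(c) and is_list(t), do: {:ok, :two_group}
  def detect(%{outputs: o}) when is_list(o), do: {:ok, :single_group}
  def detect(%{metrics: m}) when is_list(m), do: {:ok, :single_group}
  def detect(_context), do: {:error, :no_data}
end
```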
Merge statistical results automatically into context.metrics under
standardized keys: bench_n, bench_mean, bench_sd, bench_median, and
test-specific p-values such as bench_ttest_p_value.
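The merge step can be pictured as a plain Map.merge/2 chain. This sketch
assumes a non-empty list of numbers, a sample (n - 1) standard deviation,
and a caller-supplied map of test results; the module and function names
are hypothetical, and only the bench_* key names come from this commit.

```elixir
defmodule MetricsMergeSketch do
  @moduledoc """
  Hypothetical sketch: merge summary statistics and test results into
  context.metrics under bench_* keys. Not the crucible_bench code.
  """

  @spec merge(map(), [number(), ...], map()) :: map()
  def merge(context, values, test_results \\ %{}) when values != [] do
    summary = %{
      bench_n: length(values),
      bench_mean: mean(values),
      bench_sd: sample_sd(values),
      bench_median: median(values)
    }

    # Metrics already on the context survive; bench_* keys from this run
    # overwrite any stale values with the same name.
    merged =
      context
      |> Map.get(:metrics, %{})
      |> Map.merge(summary)
      |> Map.merge(test_results)

    Map.put(context, :metrics, merged)
  end

  defp mean(values), do: Enum.sum(values) / length(values)

  # Sample standard deviation (n - 1 denominator); undefined for one value.
  defp sample_sd([_single]), do: nil

  defp sample_sd(values) do
    m = mean(values)
    sum_sq = Enum.reduce(values, 0.0, fn x, acc -> acc + (x - m) * (x - m) end)
    :math.sqrt(sum_sq / (length(values) - 1))
  end

  defp median(values) do
    sorted = Enum.sort(values)
    n = length(sorted)
    mid = div(n, 2)

    if rem(n, 2) == 1 do
      Enum.at(sorted, mid)
    else
      (Enum.at(sorted, mid - 1) + Enum.at(sorted, mid)) / 2
    end
  end
end
```

Because the existing map is merged first, unrelated metrics such as a prior
accuracy score pass through untouched.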
Add a conditional behaviour declaration that enables compile-time
interface checking when crucible_framework is available as a dependency.
Add type specifications for all public Stage functions, including custom
types for context, opts, error_reason, and data_type.
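A common Elixir idiom for an optional behaviour is to wrap @behaviour in a
Code.ensure_loaded?/1 check evaluated at compile time. In this sketch the
behaviour module name Crucible.Stage, the run/2 signature, and the exact
shapes of the custom types are assumptions, not confirmed by this commit;
@impl is deliberately omitted because it warns when no behaviour is declared.

```elixir
defmodule StageSketch do
  @moduledoc """
  Hypothetical sketch of a conditional behaviour plus typespecs.
  Crucible.Stage is an assumed behaviour module name.
  """

  # Declare the behaviour only if crucible_framework's module is loadable at
  # compile time, so this module still compiles without the dependency.
  if Code.ensure_loaded?(Crucible.Stage) do
    @behaviour Crucible.Stage
  end

  @type context :: map()
  @type opts :: keyword()
  @type data_type :: :single_group | :two_group | :multi_group | :paired
  @type error_reason :: :no_data | {:invalid_data, term()}

  @data_keys [:outputs, :metrics, :control, :groups, :before]

  @spec run(context(), opts()) :: {:ok, context()} | {:error, error_reason()}
  def run(context, _opts \\ []) when is_map(context) do
    # Placeholder body: a real Stage would detect the data_type and run tests.
    if Enum.any?(@data_keys, &Map.has_key?(context, &1)) do
      {:ok, context}
    else
      {:error, :no_data}
    end
  end
end
```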
Code Quality Improvements:
Resolve all 18 issues reported by Credo in strict mode across the codebase:
- Format numeric literals with underscores (20700 becomes 20_700)
- Sort alias declarations alphabetically in 7 modules
- Replace Enum.map followed by Enum.join with Enum.map_join
- Reduce function arity in t_test.ex by passing map parameters
- Extract helper functions to reduce cyclomatic complexity in
distributions.ex, normality_tests.ex, and variance_tests.ex
- Reduce nesting depth in eval_log.ex and normality_tests.ex
- Replace length/1 checks with Enum.empty?/1 where appropriate
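Illustrative before/after pairs for several of these Credo fixes. They are
not excerpts from the changed files: the function names and the map keys
for the t-statistic (a Welch-style two-sample statistic) are hypothetical
stand-ins for whatever t_test.ex actually uses.

```elixir
defmodule CredoStyleSketch do
  # Number formatting: 20_700 is the same integer literal as 20700.
  @max_samples 20_700

  def max_samples, do: @max_samples

  # Before: items |> Enum.map(&format/1) |> Enum.join(", ")
  def labels(items), do: Enum.map_join(items, ", ", &format/1)

  # Before: length(values) == 0, which traverses the whole list.
  # Enum.empty?/1 only needs to check for a first element.
  def no_data?(values), do: Enum.empty?(values)

  # Before: t_stat(mean1, mean2, var1, var2, n1, n2) with six positional
  # arguments. After: a single map argument (hypothetical keys).
  def t_stat(%{mean1: m1, mean2: m2, var1: v1, var2: v2, n1: n1, n2: n2}) do
    (m1 - m2) / :math.sqrt(v1 / n1 + v2 / n2)
  end

  defp format(x), do: "g#{x}"
end
```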
Documentation:
Add documentation in the docs/20251225 directory:
- current_state.md: Complete module reference with line numbers
- gaps.md: Gap analysis identifying improvement opportunities
- implementation_prompt.md: Detailed guide for Stage enhancements
Update README.md with an Advanced Stage Configuration section showing
multi-group usage examples and a Metrics Merging section explaining
automatic pipeline integration.
Redesign crucible_bench.svg as a bell curve featuring statistical
symbols, significance threshold indicators, and test type markers.
Dependencies:
- Upgrade crucible_ir from 0.1.1 to 0.2.0
- Upgrade eval_ex from 0.1.2 to 0.1.4
- Add credo 1.7 as a dev/test dependency
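In mix.exs these changes would correspond to deps/0 entries roughly like the
fragment below. The "~>" requirement strings are assumptions inferred from
the version numbers above, not copied from the project; runtime: false
follows Credo's own installation recommendation.

```elixir
# mix.exs (excerpt, assumed requirement strings)
defp deps do
  [
    {:crucible_ir, "~> 0.2.0"},
    {:eval_ex, "~> 0.1.4"},
    # Linting only: not started or shipped at runtime.
    {:credo, "~> 1.7", only: [:dev, :test], runtime: false}
  ]
end
```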
Testing:
Add test suites covering:
- Two-group comparisons with t-test and Mann-Whitney U
- Multi-group comparisons with ANOVA and Kruskal-Wallis
- Paired comparisons with paired t-test and Wilcoxon
- Metrics merging with existing and new metrics maps
- Behaviour compliance verification