Phase 1 methodology: OB body-zone freshness (validated) by ArielB1980 · Pull Request #18 · ArielB1980/Kbot

ArielB1980 · 2026-04-18T08:22:55Z

Summary

Phase 1 methodology fidelity: level freshness + body/wick classification for OBs and FVGs
Validated on 400-day replay (N=626 signals, 5 symbols, FVG mode=full): OB body_freshness is a monotonic discriminator (untouched > partial > tested), wick-zone artifact explained and fixed, age effect strong at 10+ candles
Broader branch context: also carries prior audit-research-value-v3 work (scorer overhaul, falsification harness, replay state isolation, research evaluator updates) — see commit history

Validation Results (N=626)

OB body_freshness (Moneytaur institutional zone — what the live scorer now reads):

Grade	N	Mean 5b	Hit 5b	Mean 10b	Hit 10b
fully_untouched	265	+2.80%	76.2%	+2.37%	68.1%
partially_mitigated	90	+2.59%	80.0%	+2.71%	74.4%
fully_tested	271	+1.48%	56.8%	+1.40%	52.4%
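
The table above is a plain group-by over entered signals. A minimal sketch of that bucketing follows, assuming each signal row is a dict with the hypothetical keys `ob_body_freshness` and `fwd_ret_5b`, and assuming a "hit" means a positive forward return (the PR does not define hit rate). The committed scripts/bucket_freshness_analysis.py may work differently.

```python
from collections import defaultdict
from statistics import mean


def bucket_by_grade(signals, grade_key="ob_body_freshness", ret_key="fwd_ret_5b"):
    """Group signals by freshness grade and report N, mean return, hit rate.

    Key names are hypothetical; "hit" is assumed to mean return > 0.
    """
    buckets = defaultdict(list)
    for sig in signals:
        buckets[sig[grade_key]].append(sig[ret_key])

    summary = {}
    for grade, rets in buckets.items():
        summary[grade] = {
            "n": len(rets),
            "mean_ret": mean(rets),
            "hit_rate": sum(r > 0 for r in rets) / len(rets),
        }
    return summary


# Toy rows for illustration only, not the validation data.
sample = [
    {"ob_body_freshness": "fully_untouched", "fwd_ret_5b": 0.03},
    {"ob_body_freshness": "fully_untouched", "fwd_ret_5b": -0.01},
    {"ob_body_freshness": "fully_tested", "fwd_ret_5b": 0.01},
]
summary = bucket_by_grade(sample)
```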

OB wick_freshness (original, shown for artifact explanation): partial outperformed untouched (+3.34% vs +2.83%). The migration crosstab showed 39/92 wick-partial entries were actually body-untouched — a zone-definition artifact, not market-validation.
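
A migration crosstab of this kind counts how many signals fall into each (wick grade, body grade) pair. The sketch below uses hypothetical field names `ob_wick_freshness` and `ob_body_freshness` and toy rows; it is not the PR's analysis script.

```python
from collections import Counter


def migration_crosstab(signals):
    """Count signals per (wick-zone grade, body-zone grade) pair."""
    return Counter(
        (s["ob_wick_freshness"], s["ob_body_freshness"]) for s in signals
    )


# Toy rows: two wick-partial signals, one of which is body-untouched.
rows = [
    {"ob_wick_freshness": "partially_mitigated", "ob_body_freshness": "fully_untouched"},
    {"ob_wick_freshness": "partially_mitigated", "ob_body_freshness": "partially_mitigated"},
    {"ob_wick_freshness": "fully_untouched", "ob_body_freshness": "fully_untouched"},
]
crosstab = migration_crosstab(rows)

# Share of wick-partial entries that are really body-untouched.
wick_partial = sum(n for (wick, _), n in crosstab.items() if wick == "partially_mitigated")
migrated = crosstab[("partially_mitigated", "fully_untouched")]
migrated_share = migrated / wick_partial
```

On the real data the same ratio would be 39/92, roughly 42% of the wick-partial bucket.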

FVG freshness: non-monotonic (+2.24% / +1.98% / +2.32%) — noise. Dropped from the live blend but still captured in structure_info for Phase 2 multi-TF research.

Age effect within body-untouched:

age 0-5: +1.49%, 63.4% hit
age 5-10: +2.51%, 80.4% hit
age 10-20: +3.69%, 84.9% hit
age 20-50: +4.47%, 82.9% hit

Threshold moved from 20 → 10 candles since the lift begins at 10, not 20.

Scoring Changes Applied

_score_level_freshness reads ob.body_freshness (falls back to freshness for legacy structures)
OB-only blend (FVG weight 0.0, was 0.4)
Per-grade: untouched 1.0, partial 0.85 (was 0.5), tested 0.0
freshness_age_bonus_threshold default: 20 → 10
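
Taken together, the changes above amount to roughly the following scoring rule. This is a sketch, not the code in signal_scorer.py: the function signature and dict layout are assumptions, the 30-point scale and 1.2x age multiplier come from the Phase 1 commit message, and whether the real scorer caps the bonus at the maximum is not stated in the PR (this sketch does not cap).

```python
# Per-grade weights as listed in the PR.
GRADE_SCORE = {
    "fully_untouched": 1.0,
    "partially_mitigated": 0.85,
    "fully_tested": 0.0,
}


def score_level_freshness(ob, max_points=30.0, age_threshold=10, age_multiplier=1.2):
    """Hypothetical OB-only freshness score (FVG weight is 0.0)."""
    if ob is None:
        return 0.0
    # Prefer the body-zone grade; fall back to the legacy wick-based key.
    grade = ob.get("body_freshness") or ob.get("freshness")
    base = GRADE_SCORE.get(grade, 0.0)
    # Age bonus: assumed to multiply the grade weight once the OB is old enough.
    if ob.get("age_candles", 0) >= age_threshold:
        base *= age_multiplier
    return base * max_points


young_untouched = score_level_freshness({"body_freshness": "fully_untouched", "age_candles": 3})
aged_untouched = score_level_freshness({"body_freshness": "fully_untouched", "age_candles": 12})
legacy_partial = score_level_freshness({"freshness": "partially_mitigated", "age_candles": 2})
aged_tested = score_level_freshness({"body_freshness": "fully_tested", "age_candles": 40})
```

A fully_tested OB scores 0 regardless of age because the bonus is multiplicative in this sketch.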

Freshness-Specific Commits

54de9ea — Phase 1: constants, _classify_zone_touch, enriched OB/FVG dicts, scorer integration
b8d68e1 — body-zone scanning alongside wick-zone (resolves partial>untouched inversion)
e6d2286 — scorer reweight to body_freshness + OB-only + earlier age bonus

Backward Compatibility

phase_ad scorer unchanged (freshness weight 0.0)
Default FVG mitigation mode (touched): identical behavior, new fields additive only
OB selection logic unchanged — same OB picked, just enriched with metadata
Legacy structures without body_freshness fall back to wick-based freshness

Test Plan

Replay a known window on staging/sandbox to confirm score distribution shifts in the expected direction
Verify freshness grades appear in structure_info and score_breakdown on live signals
Confirm Grade A rate is similar or higher (untouched entries should dominate A/B)
Watch for any signal drought — if 100% OB-only blend is too selective, reintroduce FVG at lower weight
Existing test suite passes

🤖 Generated with Claude Code

The position_evaluator.py was untracked but imported by position_manager_v2, causing ModuleNotFoundError in CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

position_evaluator.py is imported by position_manager_v2 but was untracked, causing ModuleNotFoundError in CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Increase mutation step from 0.25% to 3% so optimizer actually explores
- Rebalance KPI scoring: penalize drawdown more, reward risk-adjusted returns
- Wire counterfactual twin into harness loop for live-tape validation
- Tighten TP targets (TP1: 1.0R→0.5R@60%, TP2: 2.5R→1.5R@25%) so trades book profit before auction rotation
- Add 8 time-dimension params to research allowlist (hold times, cooldowns, thesis decay, swap threshold) so optimizer can discover multi-day edges
- Expand research window from 30 days to full year (365d) with 2025 + recent evaluation windows
- Drop 1m from replay timeframes (Kraken caps at 12h, strategy uses 15m+)
- Coerce float→int in _apply_params for int-typed config fields
- Raise promotion bar to 50 trades (was 20) for statistical significance
- Widen Pydantic bounds for hold times to support 48h holds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Kraken's OHLC endpoint only returns the last ~720 candles regardless of the `since` parameter. This script fetches raw trades via the Trades endpoint (which supports full history) and aggregates them into OHLCV candles for 15m/1h/4h timeframes.

Features:
- Incremental saves every 200 pages to persist progress
- Handles all 3 core symbols (SOL, BTC, ETH)
- Single trade fetch per symbol, aggregated into multiple timeframes
- Proper Kraken rate limiting (1s delay between requests)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
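
The aggregation step can be sketched without any network access. The example below assumes trades arrive as time-sorted `(timestamp_sec, price, volume)` tuples, which is a simplification of whatever the script actually parses; fetching, paging and rate limiting are left out.

```python
def aggregate_trades(trades, interval_sec):
    """Fold time-sorted (timestamp_sec, price, volume) trades into OHLCV candles."""
    candles = {}
    for ts, price, vol in trades:
        # Floor the timestamp to the start of its candle.
        bucket = int(ts // interval_sec) * interval_sec
        candle = candles.get(bucket)
        if candle is None:
            candles[bucket] = {
                "ts": bucket, "open": price, "high": price,
                "low": price, "close": price, "volume": vol,
            }
        else:
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price  # last trade seen in the bucket
            candle["volume"] += vol
    return [candles[k] for k in sorted(candles)]


# One pass over the trades per timeframe; the trade list is fetched once.
TIMEFRAMES = {"15m": 900, "1h": 3600, "4h": 14400}
trades = [(0, 100.0, 1.0), (600, 102.0, 0.5), (900, 101.0, 2.0), (3700, 99.0, 1.0)]
by_tf = {name: aggregate_trades(trades, sec) for name, sec in TIMEFRAMES.items()}
```

Intervals with no trades produce no candle in this sketch; a real pipeline would need to decide whether to forward-fill them.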

SMC engine produces float entry_price/stop_loss values that cause TypeError when downstream code (risk_manager, execution_engine, backtest_engine) does arithmetic with Decimal values. Root-cause fix: coerce all numeric Signal fields to Decimal at construction. Also adds traceback logging to research evaluator exception handler so future crashes show the full stack trace. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
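
The failure mode and the fix can be reproduced in a few lines. The `Signal` class below is a stand-in dataclass with only two fields, not the project's model; converting through `str` is one way to avoid carrying binary float noise into the Decimal value.

```python
from dataclasses import dataclass
from decimal import Decimal


def to_decimal(value):
    """Coerce ints, floats and strings to Decimal; pass Decimals through."""
    if isinstance(value, Decimal):
        return value
    return Decimal(str(value))


@dataclass
class Signal:
    # Stand-in for the real Signal model (hypothetical, two fields only).
    entry_price: Decimal
    stop_loss: Decimal

    def __post_init__(self):
        # Coerce at construction so downstream arithmetic never mixes types.
        self.entry_price = to_decimal(self.entry_price)
        self.stop_loss = to_decimal(self.stop_loss)


# Mixing float and Decimal raises TypeError, which is the bug described above.
try:
    1.5 * Decimal("2")
    mixed_arithmetic_failed = False
except TypeError:
    mixed_arithmetic_failed = True

sig = Signal(entry_price=101.25, stop_loss=100.4)
risk_per_unit = sig.entry_price - sig.stop_loss
```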

Config defines starting_equity as float, causing account_equity to be float throughout the backtest. This triggers TypeError at risk_manager.py:223 (buying_power = account_equity * requested_leverage) when float multiplies Decimal. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…/BTC

Round 1 produced only 5-7 trades for SOL/BTC because:
1. higher_tf_penalty_outside_zone (-4 pts) was not mutable
2. promotion_min_signal_trades=50 rejected all candidates with <50 trades
3. Mutation step too conservative (3%) to escape local minima

Round 2 fixes:
- Add higher_tf_penalty_outside_zone to allowlist (bounds: -10 to 0)
- Lower promotion_min_signal_trades to 5
- Raise mutate_step to 5%, params_per_candidate to 8
- 80 iterations instead of 50
- Fresh warm-start

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Runs research on BTC, ETH, SOL, XRP, ADA, DOGE, AVAX, DOT, LINK, SUI in 4 batches of 3 to manage API rate limits. Uses R2 settings: promotion_min_signal_trades=5, mutate_step=5%, 60 iterations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… MoneyTaur methodology

Fixes signal drought caused by stacked hard gates and broken KPI function that rewarded ultra-rare 2-5 trade setups. Restructures scoring around volume, structure confirmation, and multi-TF Fibonacci confluence per MoneyTaur/EmperorBTC.

- Phase 0: Fix KPI trade_count_weight (0.01→0.15), add min_trade_floor=30
- Phase 1: Kill weekly Fib -18 penalty, wire RSI divergence into scoring (+10pts)
- Phase 2: Replace EMA slope with volume confirmation (0-15pts), ADX with structure confirmation (0-12pts)
- Phase 3: Add 1H Fib confluence bonus (0-8pts) via multi-TF overlap detection
- Phase 4: Collapse tight_smc/wide_structure into unified regime (60/65 thresholds)
- Phase 5: Reduce hard gates from 5 to 3 (fib_hard_gate_enabled=False)
- Phase 6: Replace 12h thesis time decay with structural invalidation (zone breach)

All changes behind independent feature flags for rollback. New scoring max ~130pts (was ~115). Research harness promotion gate raised to 30 trades minimum.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Variable was only initialized inside `if confirmed:` but referenced in Step 6 scoring which can execute on unconfirmed paths too. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The variable was inside `if ms_change:` but referenced after the block closes in Step 6 scoring. Move to parent scope so it's always defined. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
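
Both of these fixes address the same Python pitfall: a local assigned only inside a conditional block is unbound on the paths that skip the block. A stripped-down reproduction follows; the function names and the `"bullish"` value are illustrative, only `rsi_divergence_state` and `ms_change` come from the commit messages.

```python
def evaluate_buggy(ms_change):
    if ms_change:
        rsi_divergence_state = "bullish"  # only bound on this path
    # Raises UnboundLocalError when ms_change is falsy.
    return rsi_divergence_state


def evaluate_fixed(ms_change):
    # Initialise in the parent scope so every path sees a defined value.
    rsi_divergence_state = None
    if ms_change:
        rsi_divergence_state = "bullish"
    return rsi_divergence_state


try:
    evaluate_buggy(False)
    buggy_raised = False
except UnboundLocalError:
    buggy_raised = True
```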

RSI divergence was only checked when a new structure change was detected AND confirmed — extremely rare. Now runs unconditionally on every signal evaluation when enabled, matching MoneyTaur methodology. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The YAML config was overriding the Python default to false, preventing RSI divergence from contributing to signal scores. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The score_breakdown dict logged by the risk manager and attached to signals was missing volume, structure, rsi_div, and fib_1h fields. Replaced deprecated adx/ema_slope keys with the active components. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…allowlist

- KPI min_trade_floor: 30 → 5 (realistic for per-symbol 90d windows)
- Harness promotion_min_signal_trades: 50 → 5
- wide_structure_max_distortion_pct: 0.15 → 0.25 (unblocks BTC/SOL signals)
- Field upper bound raised to 0.40 so harness can explore higher tolerance
- Added risk.wide_structure_max_distortion_pct to research allowlist + bounds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…and bounds The YAML config and StrategyConfig bounds were still capping at 0.15/0.25, overriding the RiskConfig default change. Now consistent at 0.25 default with 0.40 upper bound for research exploration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…f hardcoded 30 The _promotion_gate() was ignoring the configurable promotion_min_signal_trades setting and always requiring 30 trades, blocking per-symbol research results. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… fib_1h bonus, add missing allowlist entries

- Lower min_score_smc_aligned 60→45, min_score_smc_neutral 65→50 (signals typically score 28-53, making 60 unreachable)
- Set fib_1h_confluence_bonus floor to ge=4.0 (optimizer was eroding it to 0.0006)
- Add strategy.adx_threshold and strategy.ema_slope_bonus to research allowlist (XRP/ADA/LINK were rejected because baseline config includes these params)
- Add corresponding bounds to PARAMETER_BOUNDS

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…, fee edge fix

Tier 1 (unlock dead points):
- HTF alignment: add bias-aligned-outside-zone tier (+12 pts) between full aligned (+20) and neutral (+10). Most trending signals were scoring 0.
- ADX gradient bonus: 25-35 ADX = +3, 35+ = +6. Free dynamic range from data already computed for the hard gate.

Tier 2 (fix score distribution shape):
- SMC quality: continuous scaling by displacement ratio (OB), gap size (FVG), and confirmation strength (BOS) instead of flat +10/+8/+7.
- Cost efficiency: linear 0-20 scale (0 bps=20, 50 bps=0) instead of stepped 20/15/10/5/0 that clustered signals at boundaries.

Tier 3 (fix post-score silent kills):
- Lower fee_edge_multiple_k from 5.0 to 3.0. 5x was appropriate for spot but too aggressive for perps where momentum edge is captured.
- Tighten research bounds for fee_edge_multiple_k to (1.5, 5.0).

Supporting:
- Add inside_weekly_zone field to Signal model for HTF tier scoring.
- Add adx_gradient field to SignalScore and score_breakdown dicts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
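
The two numeric rules in this commit are simple enough to write out. The sketch below follows the stated endpoints (0 bps = 20 points, 50 bps = 0 points; ADX 25-35 = +3, 35+ = +6). The function names, the clamping outside 0-50 bps, and treating exactly 25 as inside the lower tier are assumptions.

```python
def cost_efficiency_points(cost_bps, max_points=20.0, zero_at_bps=50.0):
    """Linear cost score: max_points at 0 bps, falling to 0 at zero_at_bps."""
    clamped = min(max(cost_bps, 0.0), zero_at_bps)  # clamping is assumed
    return max_points * (1.0 - clamped / zero_at_bps)


def adx_gradient_bonus(adx):
    """Tiered ADX bonus: +3 for 25-35, +6 for 35 and above, else 0."""
    if adx >= 35:
        return 6
    if adx >= 25:
        return 3
    return 0


points_at_25bps = cost_efficiency_points(25.0)
```

Under the old stepped scale, small cost differences near a step boundary moved a signal a full 5 points; the linear form changes the score by 0.4 points per basis point.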

Aligned signals now score 56.8 and pass the score gate, but the R:R distortion gate (25%) was killing them — fees+funding at 29-30% on 0.84% stops. Raise to 35% to let these signals through. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

With zero drawdown on 4/5 accepted symbols, shift to less defensive TP:
- TP1: 0.5R/60% → 0.75R/50% (more room per trade)
- Runner: 15% → 25% (let trends run with trailing stop)
- Drop ADA (can't clear min trades) and XRP (weak fib confluence, fee/funding kills edge on tight stops)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…analysis

- Lower min_score_smc gates (45→30 aligned, 50→35 neutral) for ~100pt range
- Zero rsi_divergence_score_bonus (IC anti-predictive on BTC/SOL/LINK)
- Introduce scorer_version="structure_primary" with cost-only gates (10/12)
- Refactor signal_scorer.py for structure hard-gate + cost scoring
- Add scripts/alpha_combination_analysis.py for IC/IR feature analysis

This is the WIP baseline already running on prod; committing so Phase 1 freshness work has a clean parent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Closes BRIEF_METHODOLOGY_FIDELITY Gap 1 (level freshness) and Gap 4 (body close vs wick distinction).

smc_engine:
- TOUCH_{WICK_ONLY,BODY_PARTIAL,BODY_FULL} + FRESHNESS_{UNTOUCHED,PARTIAL,TESTED}
- _classify_zone_touch(): body-vs-wick zone interaction classifier
- _compute_freshness(): 3-tier grade from touch history
- _find_order_block() enriched with freshness, touch_count, touch_types, age_candles, body bounds
- _find_fair_value_gap() replaces binary mitigation with graduated classification honoring fvg_mitigation_mode ("touched"/"partial"/"full") and REPLAY_OVERRIDE_FVG_MITIGATION_MODE env var
- freshness added to all three score_breakdown dicts

signal_scorer:
- SCORER_WEIGHTS: freshness weight 1.0 in structure_primary, 0.0 in phase_ad
- SignalScore.level_freshness field; 30pt grade scale for structure_primary
- _score_level_freshness() with age bonus (20-candle threshold, 1.2x) and 60/40 OB/FVG blend (pending reweight to 100/0 post-validation)

config:
- freshness_scoring_enabled, freshness_max_points, freshness_age_bonus_*

tooling:
- scripts/alpha_combination_analysis.py captures ob_/fvg_ freshness grades and touch counts into decision JSONL
- scripts/run_freshness_validation.sh runs 5 parallel symbol replays against the droplet DB (used for Phase 1 validation)

Validated on 400-day × 5-symbol replay (FVG mode=full): 621 entered signals show OB freshness discriminates forward returns monotonically (untouched/partial ~2x return, +20pp hit rate vs fully_tested). FVG freshness shows no discrimination; it will be reweighted to 0 in a follow-up commit after the partial>untouched inversion is investigated against body-zone freshness.

Backward compatible: phase_ad weight=0, default fvg_mitigation_mode "touched" preserves binary behavior, gate thresholds unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
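
One plausible reading of the touch classifier and the 3-tier grade is sketched below. The constant names and grade strings appear in this PR, but the constants' string values, the function bodies, and the exact rules (for example, that a body spanning the whole zone counts as BODY_FULL, and that any BODY_FULL touch makes the zone fully_tested) are assumptions rather than the logic in smc_engine.

```python
TOUCH_WICK_ONLY = "wick_only"        # string values are assumed
TOUCH_BODY_PARTIAL = "body_partial"
TOUCH_BODY_FULL = "body_full"

FRESHNESS_UNTOUCHED = "fully_untouched"
FRESHNESS_PARTIAL = "partially_mitigated"
FRESHNESS_TESTED = "fully_tested"


def classify_zone_touch(candle, zone_low, zone_high):
    """Classify how one candle interacts with a price zone (assumed rules)."""
    if candle["high"] < zone_low or candle["low"] > zone_high:
        return None  # candle never reached the zone
    body_low = min(candle["open"], candle["close"])
    body_high = max(candle["open"], candle["close"])
    if body_high < zone_low or body_low > zone_high:
        return TOUCH_WICK_ONLY  # only the wick entered the zone
    if body_low <= zone_low and body_high >= zone_high:
        return TOUCH_BODY_FULL  # body spans the whole zone
    return TOUCH_BODY_PARTIAL


def compute_freshness(touch_types):
    """Collapse a touch history into a 3-tier grade (assumed rules)."""
    if not touch_types:
        return FRESHNESS_UNTOUCHED
    if TOUCH_BODY_FULL in touch_types:
        return FRESHNESS_TESTED
    return FRESHNESS_PARTIAL


zone = (100.0, 102.0)
history = [
    {"open": 105.0, "high": 106.0, "low": 101.0, "close": 104.0},  # wick dips in
    {"open": 103.0, "high": 103.5, "low": 100.5, "close": 101.0},  # body enters
]
touches = [classify_zone_touch(c, *zone) for c in history]
grade = compute_freshness([t for t in touches if t is not None])
```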

…r OBs

Phase 1 used wick-zone scanning (cand.high/cand.low) for OB freshness classification. 400-day validation revealed a partial>untouched inversion (+3.34% vs +2.83% 5-bar return) that turned out to be a zone-definition artifact: 39 of 92 wick-partial entries were actually body-untouched (wicks poked into the OB but the body never reached).

This commit adds parallel body-zone scanning: classify_zone_touch is invoked against min/max(open, close) as well as high/low. The OB dict now carries both freshness (wick) and body_freshness (institutional zone per Moneytaur). Signal capture records both for research.

Re-bucketing on the same 626 signals with body_freshness dissolves the inversion:
- fully_untouched (N=265): +2.80% / 76.2% hit
- partially_mitigated (N=90): +2.59% / 80.0% hit
- fully_tested (N=271): +1.48% / 56.8% hit

Monotonic mean-return ordering is restored and the partial tier retains some predictive power (consistent with Moneytaur's thesis).

Added scripts/bucket_freshness_analysis.py for the bucketing/crosstab. No scoring behavior change yet; a follow-up commit will switch the scorer to body_freshness + reweight.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
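
The artifact is easy to reproduce with a single later candle that reaches the OB's wick range but stops short of its body. The sketch below only counts range overlaps against the two zone definitions; the helper names and dict layout are hypothetical and the grading itself is omitted.

```python
def zone_bounds(ob_candle):
    """Return the wick zone (low..high) and body zone (open/close range) of an OB candle."""
    body_low = min(ob_candle["open"], ob_candle["close"])
    body_high = max(ob_candle["open"], ob_candle["close"])
    return {
        "wick": (ob_candle["low"], ob_candle["high"]),
        "body": (body_low, body_high),
    }


def reaches(candle, zone):
    """True if the candle's full range overlaps the zone."""
    zone_low, zone_high = zone
    return candle["low"] <= zone_high and candle["high"] >= zone_low


def touch_counts(ob_candle, later_candles):
    """Count later candles reaching each zone definition."""
    zones = zone_bounds(ob_candle)
    return {
        name: sum(reaches(c, zone) for c in later_candles)
        for name, zone in zones.items()
    }


# OB candle with body 100-102 and wicks out to 99 and 103.
ob = {"open": 102.0, "high": 103.0, "low": 99.0, "close": 100.0}
# A later candle pulls back to 102.5: inside the wick zone, above the body zone.
later = [{"open": 105.0, "high": 106.0, "low": 102.5, "close": 104.0}]
counts = touch_counts(ob, later)
```

Here the wick-zone scan records one touch while the body-zone scan records none, so the same OB would be graded as touched under the original definition and untouched under the body definition.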

…e bonus

Apply 400-day validation findings to the live scorer:

1. Read OB body_freshness (Moneytaur institutional zone) instead of wick-based freshness. Wick-zone classification produced a spurious partial>untouched inversion; body-zone restores monotonic ordering (untouched +2.80% / partial +2.59% / tested +1.48% on 5-bar forward return, N=626).
2. Drop FVG from the freshness blend (was 40% of the combined score). FVG freshness showed no monotonic signal across the 626-sample set (+2.24% / +1.98% / +2.32%), i.e. noise. FVG freshness is still captured in structure_info for Phase 2 multi-TF research but contributes 0 to the live score.
3. Per-grade scoring adjusted: fully_untouched 1.0, partially_mitigated 0.85 (up from 0.5, since partial retains most of untouched's lift when measured against the body zone), fully_tested 0.0.
4. Lower age-bonus threshold from 20 → 10 candles. Within body_freshness=fully_untouched, mean 5-bar return climbs from +1.49% at age 0-5 to +3.69% at age 10-20 and +4.47% at age 20-50. The lift begins at 10+ candles, not 20+.

Fallback: if body_freshness is absent (legacy structures), the scorer reads the old wick-based freshness key so this change is compatible with previously captured decision data.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ignal)

Per-symbol IC check on the 626-signal validation set revealed that OB body_freshness is not uniformly predictive:

XRP/USD   IC +0.533   untouched +4.86%   tested -2.05%   (strong)
LINK/USD  IC +0.384   untouched +2.82%   tested -0.14%   (strong)
ETH/USD   IC +0.226   untouched +3.97%   tested +1.74%   (moderate)
BTC/USD   IC +0.134   untouched +2.82%   tested +1.47%   (weak)
SOL/USD   IC -0.064   untouched +1.81%   tested +3.07%   (INVERTED)

SOL is 32% of the sample. Four symbols pass the u>t check; SOL fails and inverts. Ship the edge on the four validated symbols, hold SOL at neutral until diagnosed.

Implementation:
- New config field freshness_disabled_symbols: List[str] = ["SOL/USD"]
- _score_level_freshness() returns 0.0 for symbols in the disable list
- scorer passes signal.symbol through the call chain
- Kept symbol parameter Optional for backward-compat with callers that don't have a Signal (e.g. replay bucket analysis)

SOL diagnosis threads for later:
- Volatility/zone-width interaction: SOL's OBs may be proportionally narrower relative to its ATR, so wick touches classify differently.
- Age distribution: if SOL's "untouched" OBs are systematically younger (recent, not yet reached) vs "tested" (older, established), the causal story is inverted: untouched = no opportunity, tested = proven demand.

Script scripts/per_symbol_freshness_ic.py is committed for future re-runs (e.g. to re-check SOL once diagnosed, or to add new symbols to the bot).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

ArielB1980 · 2026-04-18T08:40:50Z

Update: per-symbol IC check — SOL kill-switch added

Ran the per-symbol breakout on the same 626-signal validation set. The aggregate edge is not uniform.

Symbol	N	Untouched	Tested	IC (5b)	u > t
XRP/USD	48	+4.86%	−2.05%	+0.533	✅
LINK/USD	117	+2.82%	−0.14%	+0.384	✅
ETH/USD	80	+3.97%	+1.74%	+0.226	✅
BTC/USD	182	+2.82%	+1.47%	+0.134	✅
SOL/USD	199	+1.81%	+3.07%	−0.064	❌
ALL	626	+2.80%	+1.48%	+0.118	✅
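
The PR does not say how IC is computed. A common choice is the Spearman rank correlation between the freshness score and the forward return, which is what the standard-library sketch below assumes; the row keys and toy symbols are hypothetical, and scripts/per_symbol_freshness_ic.py may use a different definition.

```python
GRADE_SCORE = {"fully_untouched": 1.0, "partially_mitigated": 0.85, "fully_tested": 0.0}


def average_ranks(values):
    """1-based ranks, with ties sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # undefined correlation, reported as 0 in this sketch
    return cov / (vx * vy) ** 0.5


def spearman_ic(scores, returns):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(scores), average_ranks(returns))


def per_symbol_ic(rows):
    by_symbol = {}
    for row in rows:
        by_symbol.setdefault(row["symbol"], []).append(row)
    return {
        sym: spearman_ic(
            [GRADE_SCORE[r["grade"]] for r in group],
            [r["ret"] for r in group],
        )
        for sym, group in by_symbol.items()
    }


# Toy rows: AAA behaves as expected, BBB is inverted.
rows = [
    {"symbol": "AAA", "grade": "fully_untouched", "ret": 0.04},
    {"symbol": "AAA", "grade": "partially_mitigated", "ret": 0.02},
    {"symbol": "AAA", "grade": "fully_tested", "ret": -0.01},
    {"symbol": "BBB", "grade": "fully_untouched", "ret": -0.01},
    {"symbol": "BBB", "grade": "partially_mitigated", "ret": 0.02},
    {"symbol": "BBB", "grade": "fully_tested", "ret": 0.04},
]
ics = per_symbol_ic(rows)
```

A negative IC for one symbol, as with the toy BBB here, is the pattern that motivated the SOL check.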

SOL is 32% of the sample and inverts the signal — untouched OBs underperform tested ones there. Commit c7ab549 adds a per-symbol kill-switch: freshness_disabled_symbols: ["SOL/USD"]. SOL signals now score 0 on freshness (neutral) rather than being mis-scored.
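
In outline, the kill-switch is a membership check ahead of the normal score. The sketch below is illustrative only: the function name and the `grade_points` argument are hypothetical, while the default list and the Optional symbol parameter follow the commit description.

```python
from typing import Optional, Sequence

# Mirrors the config default described in the commit.
FRESHNESS_DISABLED_SYMBOLS = ("SOL/USD",)


def freshness_with_kill_switch(
    grade_points: float,
    symbol: Optional[str] = None,
    disabled_symbols: Sequence[str] = FRESHNESS_DISABLED_SYMBOLS,
) -> float:
    """Return 0.0 for disabled symbols, otherwise pass the score through."""
    # symbol stays Optional so callers without a Signal keep working.
    if symbol is not None and symbol in disabled_symbols:
        return 0.0
    return grade_points


sol_score = freshness_with_kill_switch(30.0, symbol="SOL/USD")
btc_score = freshness_with_kill_switch(30.0, symbol="BTC/USD")
no_symbol_score = freshness_with_kill_switch(30.0)
```

Re-enabling SOL after diagnosis would then be a config change (an empty list) rather than a code change.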

SOL diagnosis threads for a follow-up session:

Volatility/zone-width interaction: SOL's OBs may be proportionally narrower relative to ATR, causing wick touches to classify as body_full when the same candle pattern would be body_partial on BTC.
Age distribution: if SOL's "untouched" OBs are systematically younger (recent, not yet reached) while "tested" are older (established, proven), the causal story inverts — untouched = no opportunity yet, tested = proven demand.

Script committed at scripts/per_symbol_freshness_ic.py for re-checks after diagnosis.

Phase 2 scoping will bake in symbol-conditional logic from the start.

Companion memo to PR #18. Scopes the follow-on work:
- 2A-1: add OB/FVG detection on 1D + 1W (zero data-pipeline work; both are already available). Log stacking depth, HTF freshness, containment vs. overlap, and bias-conflict. Score 0 in v1; calibrate from validation data like Phase 1 did.
- 2A-2: extend candle pipeline to 12H + 8H; gated on 2A-1 validation.

Design decisions threaded through from Phase 1 lessons:
- Symbol-conditional from the start (stacking_disabled_symbols list)
- Body zones only; log wick for diagnostics
- Two-mode overlap (contained scored, overlapping logged)
- Conflicting-bias stacks logged for Phase 2B thesis logic
- HTF freshness × stacking joint distribution is the most important validation analysis; design logging for it from day one

Implementation scope estimate: 2-3 days build + 1 day validation replay.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
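
The "two-mode overlap" decision could be logged with an interval comparison like the one below. This is a guess at the intended semantics: "contained" is taken to mean the lower-timeframe zone sits entirely inside the higher-timeframe zone, and stacking depth is taken to be the number of containing HTF zones. Neither definition is spelled out in the memo summary.

```python
def classify_overlap(ltf_zone, htf_zone):
    """Compare a lower-timeframe zone to a higher-timeframe zone.

    Zones are (low, high) tuples. Returns "contained", "overlapping" or "disjoint".
    """
    ltf_low, ltf_high = ltf_zone
    htf_low, htf_high = htf_zone
    if ltf_high < htf_low or ltf_low > htf_high:
        return "disjoint"
    if htf_low <= ltf_low and ltf_high <= htf_high:
        return "contained"
    return "overlapping"


def stacking_depth(ltf_zone, htf_zones):
    """Number of HTF zones that fully contain the LTF zone (assumed definition)."""
    return sum(classify_overlap(ltf_zone, z) == "contained" for z in htf_zones)


ltf_body_zone = (100.0, 102.0)
htf_zones = [
    (98.0, 105.0),   # e.g. a 1D zone that contains the LTF zone
    (101.0, 110.0),  # e.g. a 1W zone that only overlaps it
    (120.0, 125.0),  # unrelated zone
]
modes = [classify_overlap(ltf_body_zone, z) for z in htf_zones]
depth = stacking_depth(ltf_body_zone, htf_zones)
```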

ArielB1980 and others added 27 commits April 5, 2026 21:27

fix(ci): add missing position_evaluator module

60699b2

ci: trigger workflow re-run

29b4b79

fix(ci): add missing position_evaluator module

453b86e

fix(smc): move rsi_divergence_state init outside confirmed block

cef06f4

fix(smc): move rsi_divergence_state init before ms_change block

498bb34

fix(config): enable rsi_divergence in YAML config

236ead0

ArielB1980 wants to merge 27 commits into main from ArielB1980/audit-research-value-v3
