Phase 1 methodology: OB body-zone freshness (validated) #18
Open
ArielB1980 wants to merge 27 commits into main from
The position_evaluator.py was untracked but imported by position_manager_v2, causing ModuleNotFoundError in CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
position_evaluator.py is imported by position_manager_v2 but was untracked, causing ModuleNotFoundError in CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Increase mutation step from 0.25% to 3% so the optimizer actually explores
- Rebalance KPI scoring: penalize drawdown more, reward risk-adjusted returns
- Wire counterfactual twin into harness loop for live-tape validation
- Tighten TP targets (TP1: 1.0R→0.5R@60%, TP2: 2.5R→1.5R@25%) so trades book profit before auction rotation
- Add 8 time-dimension params to research allowlist (hold times, cooldowns, thesis decay, swap threshold) so the optimizer can discover multi-day edges
- Expand research window from 30 days to full year (365d) with 2025 + recent evaluation windows
- Drop 1m from replay timeframes (Kraken caps at 12h, strategy uses 15m+)
- Coerce float→int in _apply_params for int-typed config fields
- Raise promotion bar to 50 trades (was 20) for statistical significance
- Widen Pydantic bounds for hold times to support 48h holds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
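The float→int coercion in `_apply_params` can be sketched as follows. This is a minimal illustration, not the repository's code: the real config lives in Pydantic models, while this sketch uses a frozen dataclass, and `ExampleConfig`, its fields, and `apply_params` are hypothetical. The point is that a mutated int-typed value (a cooldown, a hold time) can come back as a float, so it is rounded before being written to an int field.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict, get_type_hints


@dataclass(frozen=True)
class ExampleConfig:
    # Hypothetical stand-in for a config with mixed int/float fields.
    max_hold_hours: int = 12
    cooldown_candles: int = 4
    mutate_step: float = 0.03


def apply_params(cfg: ExampleConfig, params: Dict[str, Any]) -> ExampleConfig:
    """Return a copy of cfg with params applied, rounding floats for int fields."""
    hints = get_type_hints(type(cfg))
    valid = {f.name for f in fields(cfg)}
    updates: Dict[str, Any] = {}
    for name, value in params.items():
        if name not in valid:
            raise KeyError(f"unknown config field: {name}")
        # A mutated int field (e.g. 12 -> 12.36) must go back to an int.
        if hints[name] is int and isinstance(value, float):
            value = int(round(value))
        updates[name] = value
    return replace(cfg, **updates)
```

Rounding (rather than truncating) keeps a mutation of 11.8 at 12 instead of silently dropping it to 11.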
Kraken's OHLC endpoint only returns the last ~720 candles regardless of the `since` parameter. This script fetches raw trades via the Trades endpoint (which supports full history) and aggregates them into OHLCV candles for 15m/1h/4h timeframes.

Features:
- Incremental saves every 200 pages to persist progress
- Handles all 3 core symbols (SOL, BTC, ETH)
- Single trade fetch per symbol, aggregated into multiple timeframes
- Proper Kraken rate limiting (1s delay between requests)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
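The trade-to-candle aggregation step can be sketched like this. It is a minimal version under stated assumptions, not the committed script: trades are assumed to be already parsed into `(price, volume, unix_time)` floats in chronological order, and `Candle`, `aggregate_trades`, and `TIMEFRAMES_SECONDS` are hypothetical names. One pass over the trades can be re-bucketed for each timeframe, which matches the "single trade fetch per symbol" design.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

TIMEFRAMES_SECONDS = {"15m": 15 * 60, "1h": 60 * 60, "4h": 4 * 60 * 60}


@dataclass
class Candle:
    open_time: int  # bucket start, unix seconds
    open: float
    high: float
    low: float
    close: float
    volume: float


def aggregate_trades(trades: Iterable[Tuple[float, float, float]],
                     interval_s: int) -> List[Candle]:
    """Bucket chronologically ordered (price, volume, unix_time) trades into OHLCV."""
    candles: Dict[int, Candle] = {}
    for price, volume, ts in trades:
        bucket = int(ts // interval_s) * interval_s
        c = candles.get(bucket)
        if c is None:
            # First trade in the bucket sets the open (and initial high/low/close).
            candles[bucket] = Candle(bucket, price, price, price, price, volume)
        else:
            c.high = max(c.high, price)
            c.low = min(c.low, price)
            c.close = price  # last trade seen so far is the close
            c.volume += volume
    return [candles[k] for k in sorted(candles)]
```

Intervals with no trades produce no candle here; a pipeline that needs a dense series would have to forward-fill them.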
SMC engine produces float entry_price/stop_loss values that cause TypeError when downstream code (risk_manager, execution_engine, backtest_engine) does arithmetic with Decimal values. Root-cause fix: coerce all numeric Signal fields to Decimal at construction. Also adds traceback logging to the research evaluator exception handler so future crashes show the full stack trace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
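Coercing at construction can be sketched with a dataclass `__post_init__`. This is an illustration under assumptions: `SignalSketch` and `to_decimal` are hypothetical, and if the real `Signal` is a Pydantic model the same idea would live in a field validator instead. Converting through `str()` keeps `0.1` as `Decimal("0.1")` rather than the float's full binary expansion, and the test also shows the `float * Decimal` TypeError that both this commit and the `starting_equity` fix address.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Union

Number = Union[int, float, str, Decimal]


def to_decimal(value: Optional[Number]) -> Optional[Decimal]:
    """Convert to Decimal via str() so 0.1 stays exactly 0.1."""
    if value is None or isinstance(value, Decimal):
        return value
    return Decimal(str(value))


@dataclass
class SignalSketch:
    # Hypothetical subset of Signal's numeric fields.
    symbol: str
    entry_price: Decimal
    stop_loss: Decimal
    take_profit: Optional[Decimal] = None

    def __post_init__(self) -> None:
        # Coerce once here so downstream Decimal arithmetic never sees a float.
        self.entry_price = to_decimal(self.entry_price)
        self.stop_loss = to_decimal(self.stop_loss)
        self.take_profit = to_decimal(self.take_profit)
```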
Config defines starting_equity as float, causing account_equity to be float throughout the backtest. This triggers TypeError at risk_manager.py:223 (buying_power = account_equity * requested_leverage) when a float is multiplied by a Decimal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…/BTC
Round 1 produced only 5-7 trades for SOL/BTC because:
1. higher_tf_penalty_outside_zone (-4 pts) was not mutable
2. promotion_min_signal_trades=50 rejected all candidates with <50 trades
3. Mutation step too conservative (3%) to escape local minima

Round 2 fixes:
- Add higher_tf_penalty_outside_zone to allowlist (bounds: -10 to 0)
- Lower promotion_min_signal_trades to 5
- Raise mutate_step to 5%, params_per_candidate to 8
- 80 iterations instead of 50
- Fresh warm-start

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
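A bounded mutation step of this kind can be sketched as follows. The commit does not say whether `mutate_step` is relative to the current value or to the parameter's allowed range; this sketch assumes the range, because a purely multiplicative step could never move a parameter sitting at 0.0 (such as a penalty at its upper bound). `mutate` is hypothetical, the first and third bounds come from this PR, and the lower bound for `wide_structure_max_distortion_pct` is a placeholder.

```python
import random
from typing import Dict, Optional, Tuple

Bounds = Dict[str, Tuple[float, float]]

PARAMETER_BOUNDS: Bounds = {
    "higher_tf_penalty_outside_zone": (-10.0, 0.0),
    "wide_structure_max_distortion_pct": (0.05, 0.40),  # lower bound hypothetical
    "fee_edge_multiple_k": (1.5, 5.0),
}


def mutate(params: Dict[str, float], bounds: Bounds, step: float = 0.05,
           params_per_candidate: int = 8,
           rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Perturb up to params_per_candidate params by +/- step of their range."""
    rng = rng or random.Random()
    candidate = dict(params)  # never mutate the incumbent in place
    names = sorted(candidate)  # sorted so a seeded rng is reproducible
    for name in rng.sample(names, min(params_per_candidate, len(names))):
        lo, hi = bounds[name]
        delta = rng.uniform(-step, step) * (hi - lo)
        # Clamp so the candidate always respects the research bounds.
        candidate[name] = min(hi, max(lo, candidate[name] + delta))
    return candidate
```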
Runs research on 10 symbols (BTC, ETH, SOL, XRP, ADA, DOGE, AVAX, DOT, LINK, SUI) in 4 batches of up to 3 to manage API rate limits. Uses R2 settings: promotion_min_signal_trades=5, mutate_step=5%, 60 iterations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
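Ten symbols in batches of three means the last batch holds a single symbol. A minimal sketch of the split, with `batches` and `SYMBOLS` as hypothetical names; how each batch is launched and how long the script waits between batches are not described in the commit and are not shown.

```python
from typing import Iterator, List, Sequence

SYMBOLS = ["BTC", "ETH", "SOL", "XRP", "ADA", "DOGE", "AVAX", "DOT", "LINK", "SUI"]


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```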
… MoneyTaur methodology
Fixes signal drought caused by stacked hard gates and a broken KPI function that rewarded ultra-rare 2-5 trade setups. Restructures scoring around volume, structure confirmation, and multi-TF Fibonacci confluence per MoneyTaur/EmperorBTC.

Phase 0: Fix KPI trade_count_weight (0.01→0.15), add min_trade_floor=30
Phase 1: Kill weekly Fib -18 penalty, wire RSI divergence into scoring (+10pts)
Phase 2: Replace EMA slope with volume confirmation (0-15pts), ADX with structure confirmation (0-12pts)
Phase 3: Add 1H Fib confluence bonus (0-8pts) via multi-TF overlap detection
Phase 4: Collapse tight_smc/wide_structure into unified regime (60/65 thresholds)
Phase 5: Reduce hard gates from 5 to 3 (fib_hard_gate_enabled=False)
Phase 6: Replace 12h thesis time decay with structural invalidation (zone breach)

All changes behind independent feature flags for rollback. New scoring max ~130pts (was ~115). Research harness promotion gate raised to 30 trades minimum.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Variable was only initialized inside `if confirmed:` but referenced in Step 6 scoring, which can execute on unconfirmed paths too.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The variable was inside `if ms_change:` but referenced after the block closes in Step 6 scoring. Move to parent scope so it's always defined.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
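Both scoping fixes follow one pattern: bind the variable in the enclosing scope before the conditional, so every later read sees a defined value. A minimal sketch; the function names, flags, and the 6-point partial value are hypothetical (the 12-point maximum mirrors the 0-12pt structure-confirmation range from the MoneyTaur commit). The buggy variant raises `UnboundLocalError` whenever the branch is skipped.

```python
def structure_points_buggy(ms_change: bool, confirmed: bool) -> float:
    if ms_change:
        points = 12.0 if confirmed else 6.0
    return points  # UnboundLocalError when ms_change is False


def structure_points(ms_change: bool, confirmed: bool) -> float:
    # Bind a default in the parent scope before any branch, so the later
    # scoring step can read it on every path.
    points = 0.0
    if ms_change:
        points = 12.0 if confirmed else 6.0
    return points
```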
RSI divergence was only checked when a new structure change was detected AND confirmed, which is extremely rare. Now runs unconditionally on every signal evaluation when enabled, matching MoneyTaur methodology.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The YAML config was overriding the Python default to false, preventing RSI divergence from contributing to signal scores.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The score_breakdown dict logged by the risk manager and attached to signals was missing volume, structure, rsi_div, and fib_1h fields. Replaced deprecated adx/ema_slope keys with the active components.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…allowlist
- KPI min_trade_floor: 30 → 5 (realistic for per-symbol 90d windows)
- Harness promotion_min_signal_trades: 50 → 5
- wide_structure_max_distortion_pct: 0.15 → 0.25 (unblocks BTC/SOL signals)
- Field upper bound raised to 0.40 so harness can explore higher tolerance
- Added risk.wide_structure_max_distortion_pct to research allowlist + bounds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…and bounds
The YAML config and StrategyConfig bounds were still capping at 0.15/0.25, overriding the RiskConfig default change. Now consistent at 0.25 default with 0.40 upper bound for research exploration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…f hardcoded 30
The _promotion_gate() was ignoring the configurable promotion_min_signal_trades setting and always requiring 30 trades, blocking per-symbol research results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… fib_1h bonus, add missing allowlist entries
- Lower min_score_smc_aligned 60→45, min_score_smc_neutral 65→50 (signals typically score 28-53, making 60 unreachable)
- Set fib_1h_confluence_bonus floor to ge=4.0 (optimizer was eroding it to 0.0006)
- Add strategy.adx_threshold and strategy.ema_slope_bonus to research allowlist (XRP/ADA/LINK were rejected because baseline config includes these params)
- Add corresponding bounds to PARAMETER_BOUNDS

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…, fee edge fix

Tier 1 (unlock dead points):
- HTF alignment: add bias-aligned-outside-zone tier (+12 pts) between full aligned (+20) and neutral (+10). Most trending signals were scoring 0.
- ADX gradient bonus: 25-35 ADX = +3, 35+ = +6. Free dynamic range from data already computed for the hard gate.

Tier 2 (fix score distribution shape):
- SMC quality: continuous scaling by displacement ratio (OB), gap size (FVG), and confirmation strength (BOS) instead of flat +10/+8/+7.
- Cost efficiency: linear 0-20 scale (0 bps=20, 50 bps=0) instead of stepped 20/15/10/5/0 that clustered signals at boundaries.

Tier 3 (fix post-score silent kills):
- Lower fee_edge_multiple_k from 5.0 to 3.0. 5x was appropriate for spot but too aggressive for perps where momentum edge is captured.
- Tighten research bounds for fee_edge_multiple_k to (1.5, 5.0).

Supporting:
- Add inside_weekly_zone field to Signal model for HTF tier scoring.
- Add adx_gradient field to SignalScore and score_breakdown dicts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
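The three tier-1/tier-2 scoring rules can be sketched as small pure functions. This is an illustration, not `signal_scorer.py`: the function names are hypothetical, and three details are assumptions the commit does not pin down: an ADX of exactly 35 falls in the +6 tier, ADX below 25 earns no bonus, and opposed HTF bias earns 0 points.

```python
from typing import Literal

Bias = Literal["aligned", "neutral", "opposed"]


def htf_alignment_points(bias: Bias, inside_weekly_zone: bool) -> float:
    """+20 aligned inside the weekly zone, +12 aligned outside it, +10 neutral."""
    if bias == "aligned":
        return 20.0 if inside_weekly_zone else 12.0
    if bias == "neutral":
        return 10.0
    return 0.0  # assumption: opposed bias earns no HTF points


def adx_gradient_bonus(adx: float) -> float:
    """+3 for ADX in [25, 35), +6 for ADX >= 35, otherwise 0."""
    if adx >= 35.0:
        return 6.0
    if adx >= 25.0:
        return 3.0
    return 0.0


def cost_efficiency_points(cost_bps: float, max_points: float = 20.0,
                           zero_at_bps: float = 50.0) -> float:
    """Linear from max_points at 0 bps down to 0 at zero_at_bps, clamped."""
    fraction = 1.0 - cost_bps / zero_at_bps
    return max_points * min(1.0, max(0.0, fraction))
```

A linear cost score gives a 25 bps signal 10 points instead of snapping it to whichever step boundary it sits nearest, which is the clustering the commit describes.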
Aligned signals now score 56.8 and pass the score gate, but the R:R distortion gate (25%) was killing them (fees+funding at 29-30% on 0.84% stops). Raise to 35% to let these signals through.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
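A minimal sketch of the gate, assuming "R:R distortion" means fees plus expected funding expressed as a fraction of the stop distance; the commit gives the numbers but not the formula, and both function names are hypothetical. Under that reading, costs of about 0.25% of price against a 0.84% stop are a distortion of roughly 30%, which fails a 25% gate and passes a 35% one.

```python
def rr_distortion(cost_pct: float, stop_pct: float) -> float:
    """Fraction of the stop distance consumed by fees + funding."""
    if stop_pct <= 0:
        raise ValueError("stop distance must be positive")
    return cost_pct / stop_pct


def passes_distortion_gate(cost_pct: float, stop_pct: float,
                           max_distortion: float = 0.35) -> bool:
    return rr_distortion(cost_pct, stop_pct) <= max_distortion
```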
With zero drawdown on 4/5 accepted symbols, shift to less defensive TP:
- TP1: 0.5R/60% → 0.75R/50% (more room per trade)
- Runner: 15% → 25% (let trends run with trailing stop)
- Drop ADA (can't clear min trades) and XRP (weak fib confluence, fee/funding kills edge on tight stops)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…analysis
- Lower min_score_smc gates (45→30 aligned, 50→35 neutral) for ~100pt range
- Zero rsi_divergence_score_bonus (IC anti-predictive on BTC/SOL/LINK)
- Introduce scorer_version="structure_primary" with cost-only gates (10/12)
- Refactor signal_scorer.py for structure hard-gate + cost scoring
- Add scripts/alpha_combination_analysis.py for IC/IR feature analysis

This is the WIP baseline already running on prod; committing so Phase 1 freshness work has a clean parent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes BRIEF_METHODOLOGY_FIDELITY Gap 1 (level freshness) and Gap 4
(body close vs wick distinction).
smc_engine:
- TOUCH_{WICK_ONLY,BODY_PARTIAL,BODY_FULL} + FRESHNESS_{UNTOUCHED,PARTIAL,TESTED}
- _classify_zone_touch(): body-vs-wick zone interaction classifier
- _compute_freshness(): 3-tier grade from touch history
- _find_order_block() enriched with freshness, touch_count, touch_types,
age_candles, body bounds
- _find_fair_value_gap() replaces binary mitigation with graduated
classification honoring fvg_mitigation_mode ("touched"/"partial"/"full")
and REPLAY_OVERRIDE_FVG_MITIGATION_MODE env var
- freshness added to all three score_breakdown dicts
signal_scorer:
- SCORER_WEIGHTS: freshness weight 1.0 in structure_primary, 0.0 in phase_ad
- SignalScore.level_freshness field; 30pt grade scale for structure_primary
- _score_level_freshness() with age bonus (20-candle threshold, 1.2x)
and 60/40 OB/FVG blend (pending reweight to 100/0 post-validation)
config:
- freshness_scoring_enabled, freshness_max_points, freshness_age_bonus_*
tooling:
- scripts/alpha_combination_analysis.py captures ob_/fvg_ freshness grades
and touch counts into decision JSONL
- scripts/run_freshness_validation.sh runs 5 parallel symbol replays
against the droplet DB (used for Phase 1 validation)
Validated on 400-day × 5-symbol replay (FVG mode=full): 621 entered
signals show OB freshness discriminates forward returns monotonically
(untouched/partial ~2x return, +20pp hit rate vs fully_tested).
FVG freshness shows no discrimination — will be reweighted to 0 in a
follow-up commit after the partial>untouched inversion is investigated
against body-zone freshness.
Backward compatible: phase_ad weight=0, default fvg_mitigation_mode
"touched" preserves binary behavior, gate thresholds unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
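The touch classifier and three-tier grade from the Phase 1 commit can be sketched as below. The commit names the constants but not the exact rules, so these are assumptions: a candle whose range reaches the zone while its body stays outside is a wick-only touch; a body spanning the whole zone is a full touch; anything in between is partial. Any full touch grades the zone as tested, and any other touch as partial. The grade strings follow the names in the validation results; the function names drop the leading underscore to mark them as illustrations, not the repository's code.

```python
from typing import Iterable, Optional, Sequence, Tuple

TOUCH_WICK_ONLY = "wick_only"
TOUCH_BODY_PARTIAL = "body_partial"
TOUCH_BODY_FULL = "body_full"

FRESHNESS_UNTOUCHED = "fully_untouched"
FRESHNESS_PARTIAL = "partially_mitigated"
FRESHNESS_TESTED = "fully_tested"


def classify_zone_touch(o: float, h: float, l: float, c: float,
                        zone_low: float, zone_high: float) -> Optional[str]:
    if h < zone_low or l > zone_high:
        return None  # candle never reached the zone
    body_low, body_high = min(o, c), max(o, c)
    if body_high < zone_low or body_low > zone_high:
        return TOUCH_WICK_ONLY
    if body_low <= zone_low and body_high >= zone_high:
        return TOUCH_BODY_FULL
    return TOUCH_BODY_PARTIAL


def compute_freshness(touches: Sequence[str]) -> str:
    if not touches:
        return FRESHNESS_UNTOUCHED
    if TOUCH_BODY_FULL in touches:
        return FRESHNESS_TESTED
    return FRESHNESS_PARTIAL


def freshness_for_zone(candles: Iterable[Tuple[float, float, float, float]],
                       zone_low: float, zone_high: float) -> str:
    """Grade a zone from the (o, h, l, c) candles printed after it formed."""
    touches = [t for t in (classify_zone_touch(*cd, zone_low, zone_high)
                           for cd in candles) if t is not None]
    return compute_freshness(touches)
```

The same scan can be pointed at the OB candle's full range (high/low) or only its body (min/max of open and close). For an OB candle with a 95-104 range and a 100-102 body, a later candle whose low dips to 103 grades the wick zone as partial but leaves the body zone untouched, which is the kind of disagreement the body-zone commit investigates.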
…r OBs
Phase 1 used wick-zone scanning (cand.high/cand.low) for OB freshness classification. 400-day validation revealed a partial>untouched inversion (+3.34% vs +2.83% 5-bar return) that turned out to be a zone-definition artifact: 39 of 92 wick-partial entries were actually body-untouched (wicks poked into the OB but the body never reached).

This commit adds parallel body-zone scanning: classify_zone_touch is invoked against min/max(open, close) as well as high/low. The OB dict now carries both freshness (wick) and body_freshness (institutional zone per Moneytaur). Signal capture records both for research.

Re-bucketing the same 626 signals by body_freshness dissolves the inversion:
- fully_untouched (N=265): +2.80% / 76.2% hit
- partially_mitigated (N=90): +2.59% / 80.0% hit
- fully_tested (N=271): +1.48% / 56.8% hit

Monotonic mean-return ordering is restored, and the partial tier retains some predictive power (consistent with Moneytaur's thesis).

Added scripts/bucket_freshness_analysis.py for the bucketing/crosstab.

No scoring behavior change yet; a follow-up commit will switch the scorer to body_freshness + reweight.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
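The bucketing and crosstab behind these numbers can be sketched as follows. This is not scripts/bucket_freshness_analysis.py; `bucket_by_grade` and `migration_crosstab` are hypothetical, and a "hit" is assumed to mean a positive forward return already signed in the trade's direction.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def bucket_by_grade(rows: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[int, float, float]]:
    """Group (grade, 5-bar forward return %) rows into (N, mean return, hit rate)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for grade, fwd_return in rows:
        groups[grade].append(fwd_return)
    summary: Dict[str, Tuple[int, float, float]] = {}
    for grade, returns in groups.items():
        hits = sum(1 for r in returns if r > 0)  # assumption: hit = return > 0
        summary[grade] = (len(returns), mean(returns), hits / len(returns))
    return summary


def migration_crosstab(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count (wick_grade, body_grade) pairs, e.g. wick-partial -> body-untouched."""
    return Counter(pairs)
```

The crosstab is what exposes the artifact: a large count in the (partially_mitigated, fully_untouched) cell means many "partial" signals only looked partial because of the wick-based zone.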
…e bonus
Apply 400-day validation findings to the live scorer:

1. Read OB body_freshness (Moneytaur institutional zone) instead of wick-based freshness. Wick-zone classification produced a spurious partial>untouched inversion; body-zone restores monotonic ordering (untouched +2.80% / partial +2.59% / tested +1.48% on 5-bar forward return, N=626).
2. Drop FVG from the freshness blend (was 40% of the combined score). FVG freshness showed no monotonic signal across the 626-sample set (+2.24% / +1.98% / +2.32%), i.e. noise. FVG freshness is still captured in structure_info for Phase 2 multi-TF research but contributes 0 to the live score.
3. Per-grade scoring adjusted: fully_untouched 1.0, partially_mitigated 0.85 (up from 0.5; partial retains most of untouched's lift when measured against the body zone), fully_tested 0.0.
4. Lower age-bonus threshold from 20 → 10 candles. Within body_freshness=fully_untouched, mean 5-bar return climbs from +1.49% at age 0-5 to +3.69% at age 10-20 and +4.47% at age 20-50. The lift begins at 10+ candles, not 20+.

Fallback: if body_freshness is absent (legacy structures), the scorer reads the old wick-based freshness key, so this change is compatible with previously captured decision data.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
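A minimal sketch of the reweighted rule, assuming the OB arrives as a plain dict with `body_freshness`, `freshness`, and `age_candles` keys and that `freshness_max_points` is the 30-point scale from Phase 1. The function name, its defaults, and the multiplicative form of the 1.2x age bonus are assumptions; the commits do not say whether the result is capped at the maximum, and this sketch does not cap.

```python
from typing import Any, Mapping, Optional

# Per-grade weights from this commit (partial raised from 0.5 to 0.85).
GRADE_WEIGHTS = {
    "fully_untouched": 1.0,
    "partially_mitigated": 0.85,
    "fully_tested": 0.0,
}


def score_level_freshness(ob: Optional[Mapping[str, Any]],
                          max_points: float = 30.0,
                          age_bonus_threshold: int = 10,
                          age_bonus_multiplier: float = 1.2) -> float:
    """OB-only freshness score; FVG freshness contributes nothing."""
    if not ob:
        return 0.0
    # Prefer the body-zone grade; fall back to the wick grade for legacy data.
    grade = ob.get("body_freshness") or ob.get("freshness")
    weight = GRADE_WEIGHTS.get(grade, 0.0)
    points = max_points * weight
    if weight > 0 and ob.get("age_candles", 0) >= age_bonus_threshold:
        points *= age_bonus_multiplier  # older unmitigated zones earn more
    return points
```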
…ignal)
Per-symbol IC check on the 626-signal validation set revealed that OB
body_freshness is not uniformly predictive:
XRP/USD IC +0.533 untouched +4.86% tested -2.05% (strong)
LINK/USD IC +0.384 untouched +2.82% tested -0.14% (strong)
ETH/USD IC +0.226 untouched +3.97% tested +1.74% (moderate)
BTC/USD IC +0.134 untouched +2.82% tested +1.47% (weak)
SOL/USD IC -0.064 untouched +1.81% tested +3.07% (INVERTED)
SOL is 32% of the sample. Four symbols pass the untouched > tested check; SOL fails
and inverts. Ship the edge on the four validated symbols and hold SOL at
neutral until diagnosed.
Implementation:
- New config field freshness_disabled_symbols: List[str] = ["SOL/USD"]
- _score_level_freshness() returns 0.0 for symbols in the disable list
- scorer passes signal.symbol through the call chain
- Kept symbol parameter Optional for backward-compat with callers that
don't have a Signal (e.g. replay bucket analysis)
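The kill-switch can be sketched as a gate applied after the freshness score is computed. This is a minimal illustration: `FreshnessConfigSketch` and `gate_freshness` are hypothetical names. The commit declares the field as `List[str] = ["SOL/USD"]`, presumably on a Pydantic model; a plain dataclass like this sketch needs `default_factory` for a list default instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FreshnessConfigSketch:
    freshness_scoring_enabled: bool = True
    # A dataclass rejects a bare list default at class creation, so each
    # instance gets its own list via default_factory.
    freshness_disabled_symbols: List[str] = field(default_factory=lambda: ["SOL/USD"])


def gate_freshness(points: float, cfg: FreshnessConfigSketch,
                   symbol: Optional[str] = None) -> float:
    if not cfg.freshness_scoring_enabled:
        return 0.0
    # symbol stays Optional: callers without a Signal (e.g. replay bucket
    # analysis) pass None and are never disabled.
    if symbol is not None and symbol in cfg.freshness_disabled_symbols:
        return 0.0
    return points
```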
SOL diagnosis threads for later:
- Volatility/zone-width interaction: SOL's OBs may be proportionally
narrower relative to its ATR, so wick touches classify differently.
- Age distribution: if SOL's "untouched" OBs are systematically younger
(recent, not yet reached) vs "tested" (older, established), the causal
story is inverted — untouched = no opportunity, tested = proven demand.
Script scripts/per_symbol_freshness_ic.py is committed for future re-runs
(e.g. to re-check SOL once diagnosed, or to add new symbols to the bot).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
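A per-symbol IC of the kind tabulated in this commit can be sketched as a Spearman rank correlation between the freshness grade and the 5-bar forward return, computed separately for each symbol. The commit does not define IC precisely, so the Spearman choice and the ordinal grade encoding (tested=0, partial=1, untouched=2) are assumptions, and this is not scripts/per_symbol_freshness_ic.py. A positive IC means fresher zones preceded better returns; SOL's negative value is the inversion.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Ordinal encoding (assumption): higher = fresher.
GRADE_RANK = {"fully_tested": 0, "partially_mitigated": 1, "fully_untouched": 2}


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (vx * vy) ** 0.5


def spearman_ic(grades: Sequence[str], returns: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks([GRADE_RANK[g] for g in grades]),
                   average_ranks(returns))


def per_symbol_ic(rows: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """rows: (symbol, body_freshness grade, 5-bar forward return)."""
    by_symbol: Dict[str, Tuple[List[str], List[float]]] = defaultdict(lambda: ([], []))
    for symbol, grade, fwd_return in rows:
        grades, returns = by_symbol[symbol]
        grades.append(grade)
        returns.append(fwd_return)
    return {s: spearman_ic(g, r) for s, (g, r) in by_symbol.items()}
```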
Update: per-symbol IC check (SOL kill-switch added)
Ran the per-symbol breakdown on the same 626-signal validation set. The aggregate edge is not uniform.
SOL is 32% of the sample and inverts the signal: untouched OBs underperform tested ones there. SOL diagnosis threads are committed for a follow-up session:
Script committed (scripts/per_symbol_freshness_ic.py). Phase 2 scoping will bake in symbol-conditional logic from the start.
ArielB1980 added a commit that referenced this pull request Apr 19, 2026
Companion memo to PR #18. Scopes the follow-on work:
- 2A-1: add OB/FVG detection on 1D + 1W (zero data-pipeline work; both are already available). Log stacking depth, HTF freshness, containment vs. overlap, and bias-conflict. Score 0 in v1; calibrate from validation data like Phase 1 did.
- 2A-2: extend candle pipeline to 12H + 8H; gated on 2A-1 validation.

Design decisions threaded through from Phase 1 lessons:
- Symbol-conditional from the start (stacking_disabled_symbols list)
- Body zones only; log wick for diagnostics
- Two-mode overlap (contained scored, overlapping logged)
- Conflicting-bias stacks logged for Phase 2B thesis logic
- HTF freshness × stacking joint distribution is the most important validation analysis; design logging for it from day one

Implementation scope estimate: 2-3 days build + 1 day validation replay.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
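The memo's two-mode overlap could be logged with a classification like the following. This is only a sketch of the described design; `zone_relation`, `stacking_depth`, and the `(low, high)` body-bound tuples are hypothetical. An LTF zone fully inside an HTF zone counts as contained (the mode the memo would eventually score), a partial intersection counts as overlapping (logged only), and v1 scores neither.

```python
from typing import Iterable, Tuple

Zone = Tuple[float, float]  # (low, high) body bounds


def zone_relation(ltf: Zone, htf: Zone) -> str:
    """Classify an LTF zone against one HTF zone."""
    lo, hi = ltf
    h_lo, h_hi = htf
    if hi < h_lo or lo > h_hi:
        return "disjoint"
    if lo >= h_lo and hi <= h_hi:
        return "contained"
    return "overlapping"


def stacking_depth(ltf: Zone, htf_zones: Iterable[Zone]) -> Tuple[int, int]:
    """Return (#HTF zones fully containing ltf, #HTF zones only overlapping it)."""
    contained = overlapping = 0
    for zone in htf_zones:
        relation = zone_relation(ltf, zone)
        if relation == "contained":
            contained += 1
        elif relation == "overlapping":
            overlapping += 1
    return contained, overlapping
```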
Summary
Validation Results (N=626)
OB body_freshness (Moneytaur institutional zone — what the live scorer now reads):
OB wick_freshness (original, shown for artifact explanation): partial outperformed untouched (+3.34% vs +2.83%). The migration crosstab showed 39/92 wick-partial entries were actually body-untouched — a zone-definition artifact, not market-validation.
FVG freshness: non-monotonic (+2.24% / +1.98% / +2.32%), i.e. noise. Dropped from the live blend but still captured in `structure_info` for Phase 2 multi-TF research.

Age effect within body-untouched:
Threshold moved from 20 → 10 candles since the lift begins at 10, not 20.
Scoring Changes Applied
- `_score_level_freshness` reads `ob.body_freshness` (falls back to `freshness` for legacy structures)
- `freshness_age_bonus_threshold` default: 20 → 10

Freshness-Specific Commits
- `54de9ea`: Phase 1: constants, `_classify_zone_touch`, enriched OB/FVG dicts, scorer integration
- `b8d68e1`: body-zone scanning alongside wick-zone (resolves partial>untouched inversion)
- `e6d2286`: scorer reweight to body_freshness + OB-only + earlier age bonus

Backward Compatibility
- `phase_ad` scorer unchanged (freshness weight 0.0)
- Default `fvg_mitigation_mode` (`touched`): identical behavior, new fields additive only
- Structures without `body_freshness` fall back to wick-based `freshness`

Test Plan
`structure_info` and `score_breakdown` on live signals

🤖 Generated with Claude Code