# Bayesian Sidebet Optimization Analysis

**Goal**: Optimize sidebet placement timing using probabilistic methods

**Focus**:
- 70%: WHEN to place a sidebet (optimal tick)
- 30%: WHETHER to place at all (game selection)

**Data**: 568 deduplicated games with complete tick-by-tick price data

**Key Finding**: Late-game bets (tick 200-500) are nearly break-even at 19.9% win rate (vs 20% breakeven for 5x payout)
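
The break-even arithmetic behind this finding can be checked with a few lines. The sketch below assumes a "5x payout" returns five times the stake including the stake itself (net odds of 4, as in the Kelly and reward sections). `breakeven_p` and `ev_per_unit` are hypothetical stand-ins, not the functions imported from `bayesian_sidebet_analysis`.

```python
def breakeven_p(payout_multiplier):
    """Win probability at which EV is zero, assuming a win returns payout_multiplier x stake."""
    return 1.0 / payout_multiplier


def ev_per_unit(p_win, payout_multiplier):
    """Expected profit per unit staked: win net odds with p_win, lose the stake otherwise."""
    net_odds = payout_multiplier - 1
    return p_win * net_odds - (1.0 - p_win)


print(breakeven_p(5))         # break-even win rate for a 5x payout
print(ev_per_unit(0.199, 5))  # EV per unit stake at the observed 19.9% win rate
```

Under that assumption, a 19.9% win rate loses roughly 0.005 units per unit staked, which is why the late-game window is described as nearly break-even rather than profitable.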

In [1]:
# Import the analysis module
import sys

sys.path.insert(0, "/home/devops/Desktop/VECTRA-PLAYER/notebooks")

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from bayesian_sidebet_analysis import (
    BayesianSurvivalModel,
    breakeven_probability,
    build_conditional_probability_matrix,
    calculate_information_gain,
    conditional_rug_probability,
    create_training_dataset,
    expected_value,
    extract_features,
    kelly_criterion,
    load_game_data,
)

%matplotlib inline
plt.rcParams["figure.figsize"] = (14, 6)

ImportError: cannot import name 'build_conditional_probability_matrix' from 'bayesian_sidebet_analysis' (/home/devops/Desktop/VECTRA-PLAYER/notebooks/bayesian_sidebet_analysis.py)

## 1. Load Data and Create Training Set

In [None]:
# Load complete game data
print("Loading game data...")
games_df = load_game_data(min_ticks=10)

print(f"Loaded {len(games_df)} games")
print(f"Median ticks: {games_df['tick_count'].median():.0f}")
print(f"Range: {games_df['tick_count'].min():.0f} - {games_df['tick_count'].max():.0f}")

# Show first few games
games_df[["game_id", "tick_count", "peak_multiplier", "rug_tick"]].head()

In [None]:
# Create training dataset with features
print("Creating training dataset...")
training_df = create_training_dataset(games_df, sidebet_window=40)

print(f"Generated {len(training_df)} training samples")
print(f"Base rug rate (40-tick window): {training_df['rug_in_window'].mean():.1%}")

# Show sample features
training_df.head(10)

## 2. Bayesian Survival Analysis

**Goal**: Model P(rug in next N ticks | game_age)

**Method**:
- Compute baseline hazard function h(t) = P(rug at tick t | survived to t)
- Derive survival function S(t) = P(game survives past tick t)
- Use Bayesian updates with feature conditioning
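
The first two steps can be sketched with plain NumPy. This is a minimal empirical (non-Bayesian) version under the assumption that every game is observed until it rugs. The function names and the toy `rug_ticks` values are illustrative and are not taken from `BayesianSurvivalModel`.

```python
import numpy as np


def empirical_hazard(rug_ticks, max_tick):
    """h(t) = (# games rugging at t) / (# games still alive at t)."""
    rug_ticks = np.asarray(rug_ticks)
    ticks = np.arange(max_tick + 1)
    at_risk = (rug_ticks[None, :] >= ticks[:, None]).sum(axis=1)
    events = (rug_ticks[None, :] == ticks[:, None]).sum(axis=1)
    # Leave the hazard at 0 where no games remain, to avoid dividing by zero.
    return np.divide(events, at_risk, out=np.zeros(len(ticks)), where=at_risk > 0)


def survival_curve(hazard):
    """S(t) = product over u <= t of (1 - h(u)), the chance of surviving past t."""
    return np.cumprod(1.0 - hazard)


def p_rug_within(survival, tick, window):
    """P(rug in (tick, tick + window] | survived past tick) = 1 - S(tick + window) / S(tick)."""
    end = min(tick + window, len(survival) - 1)
    if survival[tick] == 0:
        return float("nan")  # conditional probability is undefined once no game survives
    return 1.0 - survival[end] / survival[tick]


rug_ticks = [10, 20, 30, 40]  # made-up game lengths, not from the dataset
hazard = empirical_hazard(rug_ticks, max_tick=40)
survival = survival_curve(hazard)
print(p_rug_within(survival, tick=10, window=10))
```

In the toy data, three games survive past tick 10 and one of them rugs by tick 20, so the conditional probability is one third. A Bayesian variant would replace the raw ratio with a smoothed estimate, for example by adding prior pseudo-counts to `events` and `at_risk`.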

In [None]:
# Fit Bayesian survival model
print("Fitting Bayesian survival model...")
survival_model = BayesianSurvivalModel(games_df)

# Plot hazard and survival functions
fig, ax = plt.subplots(1, 2, figsize=(14, 5))
survival_model.plot_hazard_and_survival(ax)
plt.savefig("/home/devops/rugs_data/analysis/survival_functions.png", dpi=150, bbox_inches="tight")
plt.show()

print("Saved: survival_functions.png")

### Interpretation:

**Hazard Rate h(t)**:
- Shows instantaneous rug risk at each tick
- Spikes indicate high-risk periods
- Early spikes = "instarug" risk
- Late increases = "long games are due to rug"

**Survival Function S(t)**:
- Probability game survives past tick t
- Steep drops = high rug rate periods
- Flattening = game entering "extended" phase

In [None]:
# Test predictions at different ticks
test_ticks = [50, 100, 150, 200, 250, 300, 400, 500]

predictions = []
for tick in test_ticks:
    p_win = survival_model.predict_rug_probability(tick, window=40)
    ev_5x = expected_value(p_win, payout_multiplier=5)
    kelly_5x = kelly_criterion(p_win, payout_multiplier=5)

    predictions.append(
        {
            "tick": tick,
            "p_win_40_ticks": p_win,
            "EV_5x_payout": ev_5x,
            "kelly_fraction_5x": kelly_5x,
            "positive_EV": ev_5x > 0,
        }
    )

pred_df = pd.DataFrame(predictions)
pred_df

In [None]:
# Visualize prediction evolution
fig, ax = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

# Win probability
breakeven_5x = breakeven_probability(5)
ax[0].plot(pred_df["tick"], pred_df["p_win_40_ticks"], "o-", label="P(win)", color="blue")
ax[0].axhline(breakeven_5x, color="red", linestyle="--", label=f"Breakeven ({breakeven_5x:.1%})")
ax[0].fill_between(
    pred_df["tick"],
    breakeven_5x,
    pred_df["p_win_40_ticks"],
    where=pred_df["p_win_40_ticks"] > breakeven_5x,
    alpha=0.3,
    color="green",
)
ax[0].set_ylabel("Win Probability")
ax[0].set_title("P(rug in next 40 ticks) vs Game Age")
ax[0].legend()
ax[0].grid(True, alpha=0.3)

# Expected value
ax[1].plot(pred_df["tick"], pred_df["EV_5x_payout"], "o-", label="EV (5x payout)", color="purple")
ax[1].axhline(0, color="red", linestyle="--", label="Break-even")
ax[1].fill_between(
    pred_df["tick"],
    0,
    pred_df["EV_5x_payout"],
    where=pred_df["EV_5x_payout"] > 0,
    alpha=0.3,
    color="green",
    label="Positive EV",
)
ax[1].fill_between(
    pred_df["tick"],
    0,
    pred_df["EV_5x_payout"],
    where=pred_df["EV_5x_payout"] <= 0,
    alpha=0.3,
    color="red",
    label="Negative EV",
)
ax[1].set_xlabel("Tick")
ax[1].set_ylabel("Expected Value (SOL per 0.001 bet)")
ax[1].set_title("Expected Value vs Game Age")
ax[1].legend()
ax[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig("/home/devops/rugs_data/analysis/ev_by_tick.png", dpi=150, bbox_inches="tight")
plt.show()

## 3. Feature Importance Analysis

**Goal**: Which features best predict rug timing?

**Method**: Information gain (reduction in entropy)
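
As a rough illustration of the method, information gain for a binary target is the entropy of the target minus the weighted entropy within each feature bin. The sketch below assumes quantile binning and a feature column without missing values. `information_gain` is a hypothetical helper and may differ from the imported `calculate_information_gain`.

```python
import numpy as np
import pandas as pd


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))


def information_gain(df, feature, target, n_bins=5):
    """H(target) - H(target | binned feature), using quantile bins."""
    bins = pd.qcut(df[feature], q=n_bins, duplicates="drop")
    base = binary_entropy(df[target].mean())
    grouped = df.groupby(bins, observed=True)[target]
    weights = grouped.size() / len(df)  # share of samples in each bin
    conditional = sum(w * binary_entropy(p) for w, p in zip(weights, grouped.mean()))
    return base - conditional


# Toy data: the feature separates the two classes perfectly.
toy = pd.DataFrame({"x": [1, 2, 3, 4], "y": [0, 0, 1, 1]})
print(information_gain(toy, "x", "y", n_bins=2))
```

A perfectly separating feature recovers the full entropy of the target (1 bit in the toy data), while a feature unrelated to the target scores near zero.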

In [None]:
# Calculate feature importance (this may take a minute)
feature_cols = [
    "age",
    "distance_from_peak",
    "volatility_5",
    "volatility_10",
    "momentum_3",
    "momentum_5",
    "price_acceleration",
    "ticks_since_peak",
    "mean_reversion",
    "price",
]

print("Computing feature importance (information gain)...")
importance_results = []
for feature in feature_cols:
    ig = calculate_information_gain(training_df, feature, n_bins=5)
    importance_results.append({"feature": feature, "information_gain": ig})

importance_df = pd.DataFrame(importance_results)
importance_df = importance_df.sort_values("information_gain", ascending=False)
importance_df["rank"] = range(1, len(importance_df) + 1)

print("\nTop 5 Features:")
print(importance_df.head().to_string(index=False))

In [None]:
# Visualize feature importance
plt.figure(figsize=(10, 6))
plt.barh(importance_df["feature"], importance_df["information_gain"])
plt.xlabel("Information Gain (bits)")
plt.title("Feature Importance for Rug Prediction")
plt.gca().invert_yaxis()
plt.tight_layout()
plt.savefig("/home/devops/rugs_data/analysis/feature_importance.png", dpi=150, bbox_inches="tight")
plt.show()

## 4. Conditional Probability Analysis

**Goal**: P(rug | age, price, volatility, etc.)

**Use Case**: Identify high-probability windows for sidebet placement
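
One way to build such a table is to bin both features and average the outcome within each cell. The sketch below assumes equal-width bins and the `age`, `price`, and `rug_in_window` columns of the training set. `conditional_matrix` is a hypothetical helper, not the `build_conditional_probability_matrix` function that failed to import above.

```python
import numpy as np
import pandas as pd


def conditional_matrix(df, age_bins=20, price_bins=15):
    """Mean of rug_in_window per (age bin, price bin) cell; empty cells are NaN."""
    age_cut = pd.cut(df["age"], bins=age_bins)
    price_cut = pd.cut(df["price"], bins=price_bins)
    cell_means = df.groupby([age_cut, price_cut], observed=False)["rug_in_window"].mean()
    return cell_means.unstack()  # rows: age bins, columns: price bins


# Toy samples, not taken from the dataset.
toy = pd.DataFrame(
    {
        "age": [0, 10, 90, 100],
        "price": [1.0, 1.1, 1.9, 2.0],
        "rug_in_window": [0, 1, 1, 1],
    }
)
matrix = conditional_matrix(toy, age_bins=2, price_bins=2)
print(matrix)
```

Cells with few samples give noisy estimates, so it is worth checking per-cell counts before acting on any single value in the heatmap.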

In [None]:
# Build age × price conditional probability matrix
print("Building conditional probability matrix...")
prob_matrix = build_conditional_probability_matrix(training_df, age_bins=20, price_bins=15)

# Plot heatmap
plt.figure(figsize=(14, 10))
sns.heatmap(
    prob_matrix,
    annot=True,
    fmt=".3f",
    cmap="RdYlGn_r",
    center=0.20,
    cbar_kws={"label": "P(rug in 40 ticks)"},
)
plt.title("Conditional Rug Probability: P(rug in 40 ticks | age, price)")
plt.xlabel("Price Range")
plt.ylabel("Age (ticks)")
plt.tight_layout()
plt.savefig(
    "/home/devops/rugs_data/analysis/conditional_probability_heatmap.png",
    dpi=150,
    bbox_inches="tight",
)
plt.show()

print("\nInterpretation:")
print("- GREEN cells: Low rug probability (< 20% = negative EV for 5x)")
print("- YELLOW cells: Near breakeven (~20%)")
print("- RED cells: High rug probability (> 20% = positive EV for 5x)")

In [None]:
# Find best conditions for sidebet placement
best_conditions = []

# Test various age ranges
age_ranges = [(0, 100), (100, 200), (200, 300), (300, 500)]
for age_min, age_max in age_ranges:
    p_rug = conditional_rug_probability(training_df, {"age": (age_min, age_max)})
    ev = expected_value(p_rug, 5)
    best_conditions.append(
        {
            "condition": f"age {age_min}-{age_max}",
            "p_win": p_rug,
            "EV_5x": ev,
            "positive_EV": ev > 0,
        }
    )

# Test volatility conditions
vol_ranges = [(0, 0.05), (0.05, 0.1), (0.1, 0.2), (0.2, 1.0)]
for vol_min, vol_max in vol_ranges:
    p_rug = conditional_rug_probability(training_df, {"volatility_10": (vol_min, vol_max)})
    ev = expected_value(p_rug, 5)
    best_conditions.append(
        {
            "condition": f"vol10 {vol_min}-{vol_max}",
            "p_win": p_rug,
            "EV_5x": ev,
            "positive_EV": ev > 0,
        }
    )

conditions_df = pd.DataFrame(best_conditions)
conditions_df = conditions_df.sort_values("EV_5x", ascending=False)

print("\nBest Conditions for Sidebet Placement (5x payout):")
print(conditions_df.to_string(index=False))

## 5. Optimal Tick Finding

**Goal**: Given a specific game's price history, when should we place a sidebet?

**Method**: Scan through ticks, compute EV at each, find maximum

In [None]:
# Pick a representative game
sample_game_idx = 100
sample_game = games_df.iloc[sample_game_idx]

print(f"Analyzing game: {sample_game['game_id'][-8:]}")
print(f"  Tick count: {sample_game['tick_count']}")
print(f"  Peak: {sample_game['peak_multiplier']:.2f}x")
print(f"  Rug tick: {sample_game['rug_tick']}")

# Compute EV at every tick
prices = sample_game["prices"]
ticks = range(50, min(500, len(prices)))
evs = []
p_wins = []

for tick in ticks:
    features = extract_features(prices, tick)
    p_win = survival_model.predict_rug_probability(tick, window=40, features=features)
    ev = expected_value(p_win, 5)
    evs.append(ev)
    p_wins.append(p_win)

# Find optimal tick
best_idx = np.argmax(evs)
best_tick = ticks[best_idx]
best_ev = evs[best_idx]

print("\nOptimal sidebet placement:")
print(f"  Tick: {best_tick}")
print(f"  P(win): {p_wins[best_idx]:.2%}")
print(f"  Expected Value: {best_ev:.6f} SOL")
print(f"  Price at placement: {prices[best_tick]:.4f}x")
print(f"  Actual rug tick: {sample_game['rug_tick']}")
print(f"  Would have won: {best_tick < sample_game['rug_tick'] < best_tick + 40}")

In [None]:
# Visualize EV evolution for this game
fig, ax = plt.subplots(3, 1, figsize=(14, 10), sharex=True)

# Price history
ax[0].plot(prices, label="Price", color="black", linewidth=1)
ax[0].axvline(sample_game["rug_tick"], color="red", linestyle="--", label="Rug", linewidth=2)
ax[0].axvline(best_tick, color="green", linestyle="--", label="Optimal Entry", linewidth=2)
ax[0].axvspan(best_tick, best_tick + 40, alpha=0.2, color="green", label="Sidebet Window")
ax[0].set_ylabel("Price (multiplier)")
ax[0].set_title(f"Game {sample_game['game_id'][-8:]} - Optimal Sidebet Timing")
ax[0].legend()
ax[0].grid(True, alpha=0.3)

# Win probability
breakeven = breakeven_probability(5)
ax[1].plot(ticks, p_wins, label="P(win)", color="purple", linewidth=1.5)
ax[1].axhline(breakeven, color="orange", linestyle="--", label=f"Breakeven ({breakeven:.1%})")
ax[1].axvline(best_tick, color="green", linestyle="--", alpha=0.5)
ax[1].fill_between(
    ticks, breakeven, p_wins, where=np.array(p_wins) > breakeven, alpha=0.3, color="green"
)
ax[1].set_ylabel("Win Probability")
ax[1].legend()
ax[1].grid(True, alpha=0.3)

# Expected value
ax[2].plot(ticks, evs, label="EV (5x)", color="blue", linewidth=1.5)
ax[2].axhline(0, color="red", linestyle="--")
ax[2].axvline(
    best_tick, color="green", linestyle="--", label=f"Optimal ({best_ev:.6f})", linewidth=2
)
ax[2].fill_between(ticks, 0, evs, where=np.array(evs) > 0, alpha=0.3, color="green")
ax[2].fill_between(ticks, 0, evs, where=np.array(evs) <= 0, alpha=0.3, color="red")
ax[2].set_xlabel("Tick")
ax[2].set_ylabel("Expected Value (SOL)")
ax[2].legend()
ax[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(
    "/home/devops/rugs_data/analysis/optimal_sidebet_timing.png", dpi=150, bbox_inches="tight"
)
plt.show()

## 6. Kelly Criterion for Bet Sizing

**Goal**: How much of our bankroll should we bet?

**Kelly Formula**: f* = (p × b - q) / b
- f* = optimal fraction of bankroll
- p = win probability
- q = 1 - p
- b = net payout odds (4 for 5x payout)
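
A worked instance of the formula, as a minimal sketch: `kelly_fraction` is a hypothetical helper that assumes net odds of `payout_multiplier - 1` and clips negative results to zero, which may differ from the imported `kelly_criterion`.

```python
def kelly_fraction(p_win, payout_multiplier):
    """f* = (p * b - q) / b with b = payout_multiplier - 1, floored at 0 (no bet)."""
    b = payout_multiplier - 1
    q = 1.0 - p_win
    return max(0.0, (p_win * b - q) / b)


full = kelly_fraction(0.25, 5)  # (0.25 * 4 - 0.75) / 4
quarter = 0.25 * full           # fractional Kelly trades growth for lower variance
print(full, quarter)
```

At a 25% win probability and a 5x payout, full Kelly stakes 6.25% of bankroll and quarter Kelly about 1.56%. At or below the 20% break-even point the fraction is zero.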

In [None]:
# Kelly sizing at different win probabilities
p_range = np.linspace(0.10, 0.40, 50)
kelly_5x = [kelly_criterion(p, 5) for p in p_range]
kelly_10x = [kelly_criterion(p, 10) for p in p_range]

plt.figure(figsize=(12, 6))
plt.plot(p_range, kelly_5x, label="5x payout", linewidth=2)
plt.plot(p_range, kelly_10x, label="10x payout", linewidth=2)
plt.axvline(breakeven_probability(5), color="red", linestyle="--", label="5x breakeven", alpha=0.5)
plt.axvline(
    breakeven_probability(10), color="orange", linestyle="--", label="10x breakeven", alpha=0.5
)
plt.axhline(0, color="black", linestyle="-", alpha=0.3)
plt.xlabel("Win Probability")
plt.ylabel("Kelly Fraction (fraction of bankroll)")
plt.title("Optimal Bet Size by Kelly Criterion")
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("/home/devops/rugs_data/analysis/kelly_criterion.png", dpi=150, bbox_inches="tight")
plt.show()

print("\nKelly Interpretation:")
print("- Fraction of 0 = Don't bet (negative EV)")
print("- Fraction of 0.05 = Bet 5% of bankroll")
print("- Fraction > 0.20 = Very high confidence bet")
print("\nPractical tip: Use 1/2 Kelly or 1/4 Kelly to reduce variance")

## 7. Summary & Actionable Strategy

Based on the analysis above, here are concrete rules for an RL agent:

In [None]:
print("=" * 70)
print("SIDEBET PLACEMENT STRATEGY")
print("=" * 70)

# Historical performance by tick range
print("\n1. WIN RATES BY GAME AGE (Historical):")
for tick_min, tick_max in [(0, 100), (100, 200), (200, 300), (300, 500)]:
    filtered = training_df[(training_df["age"] >= tick_min) & (training_df["age"] < tick_max)]
    if len(filtered) > 0:
        win_rate = filtered["rug_in_window"].mean()
        ev = expected_value(win_rate, 5)
        print(f"  Ticks {tick_min:3d}-{tick_max:3d}: {win_rate:.1%} win rate, EV = {ev:+.6f} SOL")

# Breakeven points
print("\n2. BREAKEVEN THRESHOLDS:")
for mult in [5, 10, 20]:
    be = breakeven_probability(mult)
    print(f"  {mult:2d}x payout: Need P(win) > {be:.2%} for positive EV")

# Feature thresholds
print("\n3. HIGH-RISK INDICATORS (Bayesian adjustments):")
print("  - rapid_fall = True: 2.0x rug probability")
print("  - volatility_10 > 0.1: 1.5x rug probability")
print("  - ticks_since_peak > 20: 1.3x rug probability")
print("  - distance_from_peak > 0.3: 1.2x rug probability")

print("\n4. RECOMMENDED STRATEGY:")
print("  ✅ PLACE SIDEBET when:")
print("     - Game age > 200 ticks")
print("     - P(win) > 20% (use survival model)")
print("     - High volatility OR long time since peak")
print("  ")
print("  ❌ AVOID SIDEBET when:")
print("     - Game age < 100 ticks (too early, low win rate)")
print("     - Rapid rise in price (momentum suggests continued pump)")
print("     - P(win) < 20% (negative EV for 5x)")
print("  ")
print("  💰 BET SIZING:")
print("     - Use 1/4 Kelly criterion")
print("     - If P(win) = 25%, full Kelly is 6.25% of bankroll, so 1/4 Kelly is ~1.56%")
print("     - Conservative: Use 1.5-2% of bankroll for 5x sidebets")

print("\n" + "=" * 70)

## 8. Next Steps for RL Integration

**Observation Space** (from these features):
```python
obs = [
    game_age,              # Ticks since start
    current_price,         # Current multiplier
    distance_from_peak,    # % below peak
    volatility_10,         # 10-tick volatility
    ticks_since_peak,      # Time since peak
    p_rug_40_ticks,        # From survival model
]
```

**Action Space**:
```python
actions = {
    0: "HOLD",       # don't place sidebet
    1: "PLACE_5X",   # place 5x sidebet
    2: "PLACE_10X",  # place 10x sidebet
}
```

**Reward Function**:
```python
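HOLD, PLACE_5X, PLACE_10X = 0, 1, 2  # Action ids from the action space above

def sidebet_reward(action, rug_in_next_40_ticks, bet_amount):
    """Net profit for one step; only the 5x sidebet is modeled here."""
    if action == PLACE_5X:
        if rug_in_next_40_ticks:
            return 4 * bet_amount  # Won 5x payout (net of the stake)
        return -1 * bet_amount  # Lost bet
    return 0  # No action (PLACE_10X reward is not specified yet)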
```

**Training Approach**:
1. Use historical games for offline RL (PPO or DQN)
2. Initialize Q-values using expected values from this analysis
3. Fine-tune policy with exploration in live environment
4. Use Kelly criterion for position sizing
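
Step 2 could look roughly like the sketch below, which seeds a tabular Q-function with the per-bet expected value for each game-age bucket. The bucket labels, the `init_q_table` helper, and the probabilities passed in are placeholders for illustration, not results from this analysis.

```python
HOLD, PLACE_5X = 0, 1  # subset of the action ids listed above


def init_q_table(p_win_by_bucket, net_odds=4.0, bet_amount=1.0):
    """Seed Q(bucket, action): 0 for HOLD, expected profit for PLACE_5X."""
    q = {}
    for bucket, p_win in p_win_by_bucket.items():
        q[(bucket, HOLD)] = 0.0
        q[(bucket, PLACE_5X)] = bet_amount * (p_win * net_odds - (1.0 - p_win))
    return q


# Placeholder probabilities; substitute the estimates from the survival model.
q_init = init_q_table({"age_0_100": 0.10, "age_200_300": 0.25})
print(q_init)
```

Seeding with expected values means the agent starts out preferring HOLD wherever the estimated win probability is below break-even, and exploration then only has to correct the estimates rather than discover them from scratch.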