# Previews データ予測モデル

対象：2016年～2025年の過去データから、当日のPreviewsデータを予測

## 実装内容
1. **展示タイム予測モデル**（GradientBoostingRegressor） ★★★★★
2. **進入コース予測モデル**（GradientBoostingClassifier） ★★★★★
3. **スタート展示予測モデル**（GradientBoostingRegressor） ★★★★☆
4. **チルト調整予測モデル**（GradientBoostingRegressor） ★★★☆☆

## 出力
予測Previewsデータを `data/prediction-preview/YYYY/MM/DD.csv` に保存

## 目標精度
- 展示タイム: MAE < 0.05秒
- 進入コース: 的中率 > 70%
- スタート展示: MAE < 0.10秒
- チルト調整: 的中率 > 60%

## セットアップ

In [1]:
from pathlib import Path
import calendar
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, accuracy_score
import pickle
import warnings
warnings.filterwarnings('ignore')

cwd = Path.cwd()
repo_root = cwd if (cwd / 'data').exists() else cwd.parent.parent

print(f'Repository root: {repo_root}')
print('Setup complete')

Repository root: /Users/mahiguch/dev/boatrace/data
Setup complete


## データ変形関数の定義

In [2]:
def reshape_programs(df):
    """
    Programs を艇単位に変形
    Programs の枠 (1枠_～) を艇番として扱う
    """
    frames = []
    race_cols = ['レースコード', 'レース日', 'レース場', 'レース回']
    
    for frame in range(1, 7):
        prefix = f'{frame}枠_'
        cols = [c for c in df.columns if c.startswith(prefix)]
        if cols:
            tmp = df[race_cols + cols].copy()
            tmp.columns = race_cols + [c[len(prefix):] for c in cols]
            tmp['艇番'] = frame  # 枠番号 = 艇番
            frames.append(tmp)
    
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

def reshape_previews(df):
    """
    Previews を艇単位に変形
    各艇の情報を1行に集約
    """
    frames = []
    
    race_cols = ['レースコード', 'レース日', 'レース場', 'レース回']
    race_attrs = ['風速(m)', '風向', '波の高さ(cm)', '天候', '気温(℃)', '水温(℃)']
    
    for boat in range(1, 7):
        prefix = f'艇{boat}_'
        boat_cols = [c for c in df.columns if c.startswith(prefix)]
        if boat_cols:
            tmp = df[race_cols + race_attrs + boat_cols].copy()
            boat_col_names = [c[len(prefix):] for c in boat_cols]
            tmp.columns = race_cols + race_attrs + boat_col_names
            tmp['艇番'] = boat
            frames.append(tmp)
    
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

def reshape_results(df):
    """
    Results を艇単位に変形
    着順情報を艇番とマッチング
    """
    result_list = []
    
    for idx, row in df.iterrows():
        race_code = row['レースコード']
        
        for place in range(1, 7):
            boat_col = f'{place}着_艇番'
            if boat_col in df.columns and pd.notna(row[boat_col]):
                try:
                    boat_num = int(row[boat_col])
                    if boat_num < 1 or boat_num > 6:
                        continue
                    result_list.append({
                        'レースコード': race_code,
                        '艇番': boat_num,
                        '着順': place
                    })
                except (ValueError, TypeError):
                    continue
    
    return pd.DataFrame(result_list) if result_list else pd.DataFrame()

print('Reshape functions ready')

Reshape functions ready


## 2016～2025年の過去データで特徴量を抽出

### 1. データ読み込み

In [3]:
# Load data for 2025 only
all_data = {}
year = '2025'

for month in range(1, 13):
    _, max_day = calendar.monthrange(int(year), month)
    for day in range(1, max_day + 1):
        month_str = f'{month:02d}'
        day_str = f'{day:02d}'
        prog_path = repo_root / 'data' / 'programs' / year / month_str / f'{day_str}.csv'
        prev_path = repo_root / 'data' / 'previews' / year / month_str / f'{day_str}.csv'
        res_path = repo_root / 'data' / 'results' / year / month_str / f'{day_str}.csv'
        
        if prog_path.exists() and prev_path.exists() and res_path.exists():
            date_key = f'{year}-{month_str}-{day_str}'
            try:
                all_data[date_key] = {
                    'programs': pd.read_csv(prog_path),
                    'previews': pd.read_csv(prev_path),
                    'results': pd.read_csv(res_path)
                }
            except Exception as e:
                pass

print(f'✓ Loaded {len(all_data)} days (2025 only)')

✓ Loaded 365 days (2025 only)


### 2. Stadium name to number mapping

In [4]:
STADIUM_NAME_TO_NUMBER = {
    'ボートレース桐生': 1,
    'ボートレース戸田': 2,
    'ボートレース江戸川': 3,
    'ボートレース平和島': 4,
    'ボートレース多摩川': 5,
    'ボートレース浜名湖': 6,
    'ボートレース蒲郡': 7,
    'ボートレース常滑': 8,
    'ボートレース津': 9,
    'ボートレース三国': 10,
    'ボートレースびわこ': 11,
    'ボートレース琵琶湖': 11,
    'ボートレース住之江': 12,
    'ボートレース尼崎': 13,
    'ボートレース鳴門': 14,
    'ボートレース丸亀': 15,
    'ボートレース児島': 16,
    'ボートレース宮島': 17,
    'ボートレース徳山': 18,
    'ボートレース下関': 19,
    'ボートレース若松': 20,
    'ボートレース芦屋': 21,
    'ボートレース福岡': 22,
    'ボートレース唐津': 23,
    'ボートレース大村': 24,
}

def map_stadium_name_to_number(stadium_name):
    if pd.isna(stadium_name):
        return np.nan
    stadium_name = str(stadium_name).strip()
    return STADIUM_NAME_TO_NUMBER.get(stadium_name, np.nan)

print('Stadium mapping ready')

Stadium mapping ready


### 3. データ統合

In [5]:
# Combine programs, previews, and results
# Using exact logic from stadium.ipynb
combined_data = []

for date_str, data in all_data.items():
    try:
        prog = reshape_programs(data['programs'])
        prev = reshape_previews(data['previews'])
        res = reshape_results(data['results'])
        
        if prev.empty or prog.empty or res.empty:
            continue
        
        # Step 1: Merge previews + programs
        # Handle overlapping columns
        prog_cols = set(prog.columns)
        prev_cols = set(prev.columns)
        overlap_cols = prog_cols & prev_cols - {'レースコード', '艇番'}
        
        # Remove overlapping columns from programs (keep previews version)
        prog_to_merge = prog.drop(columns=list(overlap_cols))
        
        merged = prev.merge(
            prog_to_merge,
            on=['レースコード', '艇番'],
            how='left'
        )
        
        if merged.empty:
            continue
        
        # Step 2: Merge with results
        merged = merged.merge(
            res[['レースコード', '艇番', '着順']],
            on=['レースコード', '艇番'],
            how='left'
        )
        
        merged['日付'] = date_str
        combined_data.append(merged)
        
        # Count features (columns not in metadata)
        metadata_cols = {'レースコード', 'レース日', 'レース場', 'レース回', '艇番', '日付', '着順'}
        feature_count = len([c for c in merged.columns if c not in metadata_cols])
        print(f'✓ {date_str}: {merged.shape} (features: {feature_count})')
    except Exception as e:
        print(f'✗ {date_str}: {type(e).__name__}: {str(e)[:80]}')

if combined_data:
    final_data = pd.concat(combined_data, ignore_index=True)
    print(f'\nBefore filtering: {final_data.shape}')
    
    # Remove abnormal exhibition times (0 is invalid)
    initial_count = len(final_data)
    final_data = final_data[final_data['展示タイム'] != 0].reset_index(drop=True)
    removed_count = initial_count - len(final_data)
    
    print(f'Removed rows with 展示タイム = 0: {removed_count}')
    print(f'After filtering: {final_data.shape}')
    print(f'Dates: {final_data["日付"].nunique()}')
    print(f'Stadiums: {final_data["レース場"].nunique()}')
else:
    print('No data merged')

✓ 2025-01-01: (1002, 46) (features: 39)


✓ 2025-01-02: (1146, 46) (features: 39)
✓ 2025-01-03: (1303, 46) (features: 39)
✓ 2025-01-04: (1284, 46) (features: 39)


✓ 2025-01-05: (1146, 46) (features: 39)


✓ 2025-01-06: (1008, 46) (features: 39)
✓ 2025-01-07: (1074, 46) (features: 39)
✓ 2025-01-08: (1002, 46) (features: 39)


✓ 2025-01-09: (1002, 46) (features: 39)


✓ 2025-01-10: (858, 46) (features: 39)
✓ 2025-01-11: (936, 46) (features: 39)
✓ 2025-01-12: (864, 46) (features: 39)


✓ 2025-01-13: (1008, 46) (features: 39)


✓ 2025-01-14: (936, 46) (features: 39)
✓ 2025-01-15: (792, 46) (features: 39)


✓ 2025-01-16: (864, 46) (features: 39)


✓ 2025-01-17: (1074, 46) (features: 39)


✓ 2025-01-18: (1074, 46) (features: 39)
✓ 2025-01-19: (1140, 46) (features: 39)
✓ 2025-01-20: (930, 46) (features: 39)


✓ 2025-01-21: (858, 46) (features: 39)


✓ 2025-01-22: (858, 46) (features: 39)
✓ 2025-01-23: (930, 46) (features: 39)
✓ 2025-01-24: (931, 46) (features: 39)


✓ 2025-01-25: (936, 46) (features: 39)


✓ 2025-01-26: (1080, 46) (features: 39)
✓ 2025-01-27: (864, 46) (features: 39)
✓ 2025-01-28: (942, 46) (features: 39)


✓ 2025-01-29: (792, 46) (features: 39)


✓ 2025-01-30: (936, 46) (features: 39)
✓ 2025-01-31: (1152, 46) (features: 39)
✓ 2025-02-01: (1074, 46) (features: 39)


✓ 2025-02-02: (1074, 46) (features: 39)


✓ 2025-02-03: (1074, 46) (features: 39)
✓ 2025-02-04: (930, 46) (features: 39)
✓ 2025-02-05: (720, 46) (features: 39)
✓ 2025-02-06: (786, 46) (features: 39)


✓ 2025-02-07: (720, 46) (features: 39)


✓ 2025-02-08: (792, 46) (features: 39)
✓ 2025-02-09: (1008, 46) (features: 39)
✓ 2025-02-10: (1074, 46) (features: 39)


✓ 2025-02-11: (858, 46) (features: 39)


✓ 2025-02-12: (786, 46) (features: 39)
✓ 2025-02-13: (786, 46) (features: 39)


✓ 2025-02-14: (918, 46) (features: 39)


✓ 2025-02-15: (1080, 46) (features: 39)
✓ 2025-02-16: (1008, 46) (features: 39)
✓ 2025-02-17: (978, 46) (features: 39)


✓ 2025-02-18: (1003, 46) (features: 39)


✓ 2025-02-19: (1080, 46) (features: 39)
✓ 2025-02-20: (858, 46) (features: 39)
✓ 2025-02-21: (930, 46) (features: 39)


✓ 2025-02-22: (1068, 46) (features: 39)


✓ 2025-02-23: (1134, 46) (features: 39)
✓ 2025-02-24: (1056, 46) (features: 39)
✓ 2025-02-25: (1080, 46) (features: 39)


✓ 2025-02-26: (1008, 46) (features: 39)


✓ 2025-02-27: (1008, 46) (features: 39)
✓ 2025-02-28: (864, 46) (features: 39)
✓ 2025-03-01: (1008, 46) (features: 39)


✓ 2025-03-02: (930, 46) (features: 39)


✓ 2025-03-03: (948, 46) (features: 39)
✓ 2025-03-04: (840, 46) (features: 39)
✓ 2025-03-05: (852, 46) (features: 39)
✓ 2025-03-06: (912, 46) (features: 39)


✓ 2025-03-07: (918, 46) (features: 39)


✓ 2025-03-08: (1056, 46) (features: 39)
✓ 2025-03-09: (930, 46) (features: 39)
✓ 2025-03-10: (864, 46) (features: 39)


✓ 2025-03-11: (792, 46) (features: 39)


✓ 2025-03-12: (864, 46) (features: 39)
✓ 2025-03-13: (1014, 46) (features: 39)
✓ 2025-03-14: (1224, 46) (features: 39)
✓ 2025-03-15: (1218, 46) (features: 39)


✓ 2025-03-16: (1146, 46) (features: 39)
✓ 2025-03-17: (930, 46) (features: 39)
✓ 2025-03-18: (642, 46) (features: 39)
✓ 2025-03-19: (774, 46) (features: 39)
✓ 2025-03-20: (1002, 46) (features: 39)


✓ 2025-03-21: (930, 46) (features: 39)
✓ 2025-03-22: (870, 46) (features: 39)
✓ 2025-03-23: (936, 46) (features: 39)
✓ 2025-03-24: (936, 46) (features: 39)


✓ 2025-03-25: (930, 46) (features: 39)
✓ 2025-03-26: (858, 46) (features: 39)
✓ 2025-03-27: (990, 46) (features: 39)
✓ 2025-03-28: (786, 46) (features: 39)
✓ 2025-03-29: (858, 46) (features: 39)


✓ 2025-03-30: (858, 46) (features: 39)
✓ 2025-03-31: (792, 46) (features: 39)
✓ 2025-04-01: (720, 46) (features: 39)
✓ 2025-04-02: (792, 46) (features: 39)
✓ 2025-04-03: (864, 46) (features: 39)


✓ 2025-04-04: (864, 46) (features: 39)
✓ 2025-04-05: (912, 46) (features: 39)
✓ 2025-04-06: (840, 46) (features: 39)
✓ 2025-04-07: (780, 46) (features: 39)
✓ 2025-04-08: (852, 46) (features: 39)


✓ 2025-04-09: (864, 46) (features: 39)
✓ 2025-04-10: (780, 46) (features: 39)
✓ 2025-04-11: (780, 46) (features: 39)
✓ 2025-04-12: (780, 46) (features: 39)


✓ 2025-04-13: (852, 46) (features: 39)
✓ 2025-04-14: (708, 46) (features: 39)
✓ 2025-04-15: (852, 46) (features: 39)
✓ 2025-04-16: (720, 46) (features: 39)
✓ 2025-04-17: (720, 46) (features: 39)


✓ 2025-04-18: (936, 46) (features: 39)
✓ 2025-04-19: (852, 46) (features: 39)
✓ 2025-04-20: (852, 46) (features: 39)
✓ 2025-04-21: (780, 46) (features: 39)
✓ 2025-04-22: (852, 46) (features: 39)


✓ 2025-04-23: (930, 46) (features: 39)
✓ 2025-04-24: (990, 46) (features: 39)
✓ 2025-04-25: (924, 46) (features: 39)


✓ 2025-04-26: (996, 46) (features: 39)
✓ 2025-04-27: (1062, 46) (features: 39)
✓ 2025-04-28: (936, 46) (features: 39)
✓ 2025-04-29: (996, 46) (features: 39)


✓ 2025-04-30: (852, 46) (features: 39)
✓ 2025-05-01: (924, 46) (features: 39)
✓ 2025-05-02: (926, 46) (features: 39)
✓ 2025-05-03: (1218, 46) (features: 39)


✓ 2025-05-04: (1357, 46) (features: 39)
✓ 2025-05-05: (1286, 46) (features: 39)
✓ 2025-05-06: (1290, 46) (features: 39)
✓ 2025-05-07: (1080, 46) (features: 39)


✓ 2025-05-08: (924, 46) (features: 39)
✓ 2025-05-09: (822, 46) (features: 39)
✓ 2025-05-10: (997, 46) (features: 39)


✓ 2025-05-11: (996, 46) (features: 39)
✓ 2025-05-12: (1008, 46) (features: 39)
✓ 2025-05-13: (1008, 46) (features: 39)
✓ 2025-05-14: (936, 46) (features: 39)


✓ 2025-05-15: (936, 46) (features: 39)
✓ 2025-05-16: (792, 46) (features: 39)
✓ 2025-05-17: (1008, 46) (features: 39)
✓ 2025-05-18: (1153, 46) (features: 39)


✓ 2025-05-19: (1140, 46) (features: 39)
✓ 2025-05-20: (1140, 46) (features: 39)
✓ 2025-05-21: (1069, 46) (features: 39)
✓ 2025-05-22: (996, 46) (features: 39)


✓ 2025-05-23: (925, 46) (features: 39)
✓ 2025-05-24: (930, 46) (features: 39)
✓ 2025-05-25: (1008, 46) (features: 39)
✓ 2025-05-26: (864, 46) (features: 39)
✓ 2025-05-27: (708, 46) (features: 39)


✓ 2025-05-28: (852, 46) (features: 39)
✓ 2025-05-29: (852, 46) (features: 39)
✓ 2025-05-30: (708, 46) (features: 39)
✓ 2025-05-31: (780, 46) (features: 39)
✓ 2025-06-01: (1008, 46) (features: 39)
✓ 2025-06-02: (1008, 46) (features: 39)


✓ 2025-06-03: (1008, 46) (features: 39)
✓ 2025-06-04: (936, 46) (features: 39)
✓ 2025-06-05: (936, 46) (features: 39)
✓ 2025-06-06: (792, 46) (features: 39)
✓ 2025-06-07: (936, 46) (features: 39)
✓ 2025-06-08: (942, 46) (features: 39)


✓ 2025-06-09: (1002, 46) (features: 39)
✓ 2025-06-10: (1068, 46) (features: 39)
✓ 2025-06-11: (996, 46) (features: 39)
✓ 2025-06-12: (924, 46) (features: 39)
✓ 2025-06-13: (936, 46) (features: 39)
✓ 2025-06-14: (1140, 46) (features: 39)


✓ 2025-06-15: (1140, 46) (features: 39)
✓ 2025-06-16: (900, 46) (features: 39)
✓ 2025-06-17: (852, 46) (features: 39)
✓ 2025-06-18: (936, 46) (features: 39)
✓ 2025-06-19: (1008, 46) (features: 39)
✓ 2025-06-20: (1062, 46) (features: 39)


✓ 2025-06-21: (1062, 46) (features: 39)
✓ 2025-06-22: (984, 46) (features: 39)
✓ 2025-06-23: (924, 46) (features: 39)
✓ 2025-06-24: (924, 46) (features: 39)
✓ 2025-06-25: (996, 46) (features: 39)
✓ 2025-06-26: (864, 46) (features: 39)


✓ 2025-06-27: (864, 46) (features: 39)
✓ 2025-06-28: (1008, 46) (features: 39)
✓ 2025-06-29: (1069, 46) (features: 39)
✓ 2025-06-30: (852, 46) (features: 39)
✓ 2025-07-01: (846, 46) (features: 39)
✓ 2025-07-02: (1063, 46) (features: 39)


✓ 2025-07-03: (1068, 46) (features: 39)
✓ 2025-07-04: (1068, 46) (features: 39)
✓ 2025-07-05: (1146, 46) (features: 39)
✓ 2025-07-06: (1140, 46) (features: 39)
✓ 2025-07-07: (997, 46) (features: 39)


✓ 2025-07-08: (936, 46) (features: 39)
✓ 2025-07-09: (936, 46) (features: 39)
✓ 2025-07-10: (936, 46) (features: 39)
✓ 2025-07-11: (936, 46) (features: 39)
✓ 2025-07-12: (1008, 46) (features: 39)


✓ 2025-07-13: (1152, 46) (features: 39)
✓ 2025-07-14: (936, 46) (features: 39)
✓ 2025-07-15: (996, 46) (features: 39)
✓ 2025-07-16: (984, 46) (features: 39)
✓ 2025-07-17: (984, 46) (features: 39)


✓ 2025-07-18: (1201, 46) (features: 39)
✓ 2025-07-19: (1208, 46) (features: 39)
✓ 2025-07-20: (1284, 46) (features: 39)
✓ 2025-07-21: (1008, 46) (features: 39)
✓ 2025-07-22: (1008, 46) (features: 39)
✓ 2025-07-23: (864, 46) (features: 39)


✓ 2025-07-24: (996, 46) (features: 39)
✓ 2025-07-25: (1068, 46) (features: 39)
✓ 2025-07-26: (1026, 46) (features: 39)
✓ 2025-07-27: (996, 46) (features: 39)
✓ 2025-07-28: (990, 46) (features: 39)
✓ 2025-07-29: (1146, 46) (features: 39)


✓ 2025-07-30: (900, 46) (features: 39)
✓ 2025-07-31: (936, 46) (features: 39)
✓ 2025-08-01: (1080, 46) (features: 39)
✓ 2025-08-02: (1051, 46) (features: 39)
✓ 2025-08-03: (1123, 46) (features: 39)
✓ 2025-08-04: (1116, 46) (features: 39)


✓ 2025-08-05: (1038, 46) (features: 39)
✓ 2025-08-06: (1056, 46) (features: 39)
✓ 2025-08-07: (864, 46) (features: 39)
✓ 2025-08-08: (864, 46) (features: 39)
✓ 2025-08-09: (792, 46) (features: 39)


✓ 2025-08-10: (1068, 46) (features: 39)
✓ 2025-08-11: (1224, 46) (features: 39)
✓ 2025-08-12: (1134, 46) (features: 39)
✓ 2025-08-13: (1128, 46) (features: 39)


✓ 2025-08-14: (1206, 46) (features: 39)
✓ 2025-08-15: (1266, 46) (features: 39)
✓ 2025-08-16: (1206, 46) (features: 39)


✓ 2025-08-17: (1080, 46) (features: 39)
✓ 2025-08-18: (1008, 46) (features: 39)
✓ 2025-08-19: (1002, 46) (features: 39)
✓ 2025-08-20: (858, 46) (features: 39)
✓ 2025-08-21: (858, 46) (features: 39)
✓ 2025-08-22: (931, 46) (features: 39)


✓ 2025-08-23: (786, 46) (features: 39)
✓ 2025-08-24: (1062, 46) (features: 39)
✓ 2025-08-25: (930, 46) (features: 39)
✓ 2025-08-26: (996, 46) (features: 39)
✓ 2025-08-27: (864, 46) (features: 39)
✓ 2025-08-28: (792, 46) (features: 39)
✓ 2025-08-29: (792, 46) (features: 39)


✓ 2025-08-30: (1008, 46) (features: 39)
✓ 2025-08-31: (936, 46) (features: 39)
✓ 2025-09-01: (792, 46) (features: 39)
✓ 2025-09-02: (794, 46) (features: 39)
✓ 2025-09-03: (792, 46) (features: 39)
✓ 2025-09-04: (864, 46) (features: 39)
✓ 2025-09-05: (780, 46) (features: 39)


✓ 2025-09-06: (846, 46) (features: 39)
✓ 2025-09-07: (780, 46) (features: 39)
✓ 2025-09-08: (708, 46) (features: 39)
✓ 2025-09-09: (780, 46) (features: 39)
✓ 2025-09-10: (924, 46) (features: 39)
✓ 2025-09-11: (870, 46) (features: 39)
✓ 2025-09-12: (864, 46) (features: 39)


✓ 2025-09-13: (936, 46) (features: 39)
✓ 2025-09-14: (924, 46) (features: 39)
✓ 2025-09-15: (924, 46) (features: 39)
✓ 2025-09-16: (925, 46) (features: 39)
✓ 2025-09-17: (852, 46) (features: 39)
✓ 2025-09-18: (924, 46) (features: 39)
✓ 2025-09-19: (720, 46) (features: 39)


✓ 2025-09-20: (792, 46) (features: 39)
✓ 2025-09-21: (1008, 46) (features: 39)
✓ 2025-09-22: (1069, 46) (features: 39)
✓ 2025-09-23: (924, 46) (features: 39)
✓ 2025-09-24: (924, 46) (features: 39)
✓ 2025-09-25: (996, 46) (features: 39)
✓ 2025-09-26: (924, 46) (features: 39)


✓ 2025-09-27: (720, 46) (features: 39)
✓ 2025-09-28: (792, 46) (features: 39)
✓ 2025-09-29: (936, 46) (features: 39)
✓ 2025-09-30: (786, 46) (features: 39)
✓ 2025-10-01: (786, 46) (features: 39)
✓ 2025-10-02: (858, 46) (features: 39)


✓ 2025-10-03: (714, 46) (features: 39)
✓ 2025-10-04: (858, 46) (features: 39)
✓ 2025-10-05: (858, 46) (features: 39)
✓ 2025-10-06: (720, 46) (features: 39)
✓ 2025-10-07: (793, 46) (features: 39)


✓ 2025-10-08: (720, 46) (features: 39)
✓ 2025-10-09: (648, 46) (features: 39)
✓ 2025-10-10: (738, 46) (features: 39)
✓ 2025-10-11: (996, 46) (features: 39)
✓ 2025-10-12: (930, 46) (features: 39)
✓ 2025-10-13: (852, 46) (features: 39)


✓ 2025-10-14: (792, 46) (features: 39)
✓ 2025-10-15: (864, 46) (features: 39)
✓ 2025-10-16: (792, 46) (features: 39)
✓ 2025-10-17: (720, 46) (features: 39)
✓ 2025-10-18: (720, 46) (features: 39)
✓ 2025-10-19: (936, 46) (features: 39)
✓ 2025-10-20: (792, 46) (features: 39)
✓ 2025-10-21: (834, 46) (features: 39)


✓ 2025-10-22: (768, 46) (features: 39)
✓ 2025-10-23: (840, 46) (features: 39)
✓ 2025-10-24: (978, 46) (features: 39)
✓ 2025-10-25: (864, 46) (features: 39)
✓ 2025-10-26: (864, 46) (features: 39)
✓ 2025-10-27: (720, 46) (features: 39)
✓ 2025-10-28: (792, 46) (features: 39)


✓ 2025-10-29: (720, 46) (features: 39)
✓ 2025-10-30: (924, 46) (features: 39)
✓ 2025-10-31: (984, 46) (features: 39)
✓ 2025-11-01: (990, 46) (features: 39)
✓ 2025-11-02: (996, 46) (features: 39)
✓ 2025-11-03: (792, 46) (features: 39)
✓ 2025-11-04: (720, 46) (features: 39)


✓ 2025-11-05: (720, 46) (features: 39)
✓ 2025-11-06: (721, 46) (features: 39)
✓ 2025-11-07: (792, 46) (features: 39)
✓ 2025-11-08: (792, 46) (features: 39)
✓ 2025-11-09: (792, 46) (features: 39)
✓ 2025-11-10: (720, 46) (features: 39)
✓ 2025-11-11: (720, 46) (features: 39)
✓ 2025-11-12: (648, 46) (features: 39)


✓ 2025-11-13: (648, 46) (features: 39)
✓ 2025-11-14: (792, 46) (features: 39)
✓ 2025-11-15: (732, 46) (features: 39)
✓ 2025-11-16: (1008, 46) (features: 39)
✓ 2025-11-17: (792, 46) (features: 39)
✓ 2025-11-18: (756, 46) (features: 39)
✓ 2025-11-19: (864, 46) (features: 39)


✓ 2025-11-20: (721, 46) (features: 39)
✓ 2025-11-21: (792, 46) (features: 39)
✓ 2025-11-22: (864, 46) (features: 39)
✓ 2025-11-23: (864, 46) (features: 39)
✓ 2025-11-24: (792, 46) (features: 39)
✓ 2025-11-25: (726, 46) (features: 39)
✓ 2025-11-26: (864, 46) (features: 39)


✓ 2025-11-27: (792, 46) (features: 39)
✓ 2025-11-28: (793, 46) (features: 39)
✓ 2025-11-29: (930, 46) (features: 39)
✓ 2025-11-30: (925, 46) (features: 39)
✓ 2025-12-01: (792, 46) (features: 39)
✓ 2025-12-02: (720, 46) (features: 39)


✓ 2025-12-03: (720, 46) (features: 39)
✓ 2025-12-04: (864, 46) (features: 39)
✓ 2025-12-05: (936, 46) (features: 39)
✓ 2025-12-06: (864, 46) (features: 39)
✓ 2025-12-07: (792, 46) (features: 39)
✓ 2025-12-08: (792, 46) (features: 39)
✓ 2025-12-09: (864, 46) (features: 39)


✓ 2025-12-10: (864, 46) (features: 39)
✓ 2025-12-11: (792, 46) (features: 39)
✓ 2025-12-12: (792, 46) (features: 39)
✓ 2025-12-13: (936, 46) (features: 39)
✓ 2025-12-14: (996, 46) (features: 39)
✓ 2025-12-15: (1008, 46) (features: 39)
✓ 2025-12-16: (846, 46) (features: 39)


✓ 2025-12-17: (984, 46) (features: 39)
✓ 2025-12-18: (864, 46) (features: 39)
✓ 2025-12-19: (864, 46) (features: 39)
✓ 2025-12-20: (924, 46) (features: 39)
✓ 2025-12-21: (852, 46) (features: 39)
✓ 2025-12-22: (925, 46) (features: 39)
✓ 2025-12-23: (924, 46) (features: 39)


✓ 2025-12-24: (1080, 46) (features: 39)
✓ 2025-12-25: (984, 46) (features: 39)
✓ 2025-12-26: (1008, 46) (features: 39)
✓ 2025-12-27: (1128, 46) (features: 39)
✓ 2025-12-28: (1134, 46) (features: 39)
✓ 2025-12-29: (1272, 46) (features: 39)


✓ 2025-12-30: (1344, 46) (features: 39)
✓ 2025-12-31: (1140, 46) (features: 39)



Before filtering: (338623, 46)


Removed rows with 展示タイム = 0: 0
After filtering: (338623, 46)
Dates: 365
Stadiums: 24


## 展示タイム予測モデル

### 1. 特徴量準備

In [6]:
# Check columns and prepare features
print('=== データ準備 ===\n')
print(f'Final data shape: {final_data.shape}')
print(f'Total columns: {len(final_data.columns)}')

# Check for target variables (Previews data)
target_cols = ['展示タイム', 'コース', 'スタート展示', 'チルト調整']
print(f'\nTarget columns (to predict):')
for col in target_cols:
    if col in final_data.columns:
        non_null = final_data[col].notna().sum()
        total = len(final_data)
        print(f'  ✓ {col}: {final_data[col].dtype}, {non_null}/{total} ({non_null/total*100:.1f}%)')
    else:
        print(f'  ✗ {col}: NOT FOUND')

# Select features from Programs + Environment
exclude_cols = {
    'レースコード', 'レース日', 'レース場', 'レース回', 'タイトル',
    '艇番', '登録番号', '選手名', '支部',
    '着順',  # Result data
    '風向', '天候',  # Categorical - need encoding
    '展示タイム', 'コース', 'スタート展示', 'チルト調整',  # Target variables
    '体重(kg)', '体重調整(kg)',  # Preview-only data (target metadata)
}

feature_cols = [col for col in final_data.columns if col not in exclude_cols]
print(f'\nFeatures for prediction ({len(feature_cols)}):')
for i, col in enumerate(sorted(feature_cols)[:15], 1):  # Show first 15
    print(f'  {i:2d}. {col}')
if len(feature_cols) > 15:
    print(f'  ... and {len(feature_cols) - 15} more')

=== データ準備 ===

Final data shape: (338623, 46)
Total columns: 46

Target columns (to predict):
  ✓ 展示タイム: float64, 334712/338623 (98.8%)
  ✓ コース: float64, 334744/338623 (98.9%)
  ✓ スタート展示: float64, 334708/338623 (98.8%)
  ✓ チルト調整: float64, 334999/338623 (98.9%)

Features for prediction (29):
   1. ボート2連対率
   2. ボート番号
   3. モーター2連対率
   4. モーター番号
   5. 今節成績_1-1
   6. 今節成績_1-2
   7. 今節成績_2-1
   8. 今節成績_2-2
   9. 今節成績_3-1
  10. 今節成績_3-2
  11. 今節成績_4-1
  12. 今節成績_4-2
  13. 今節成績_5-1
  14. 今節成績_5-2
  15. 今節成績_6-1
  ... and 14 more


### 2. 展示タイム予測モデル構築

In [7]:
# Prepare features for exhibition time prediction
print('=== 展示タイム予測モデル準備 ===\n')

if '展示タイム' not in final_data.columns:
    print('ERROR: 展示タイム not found!')
    raise KeyError('展示タイム column missing')

# Prepare X and y
X = final_data[feature_cols].copy()
y = final_data['展示タイム'].copy()

# Convert to numeric and fill NaN
for col in X.columns:
    X[col] = pd.to_numeric(X[col], errors='coerce')

y = pd.to_numeric(y, errors='coerce')

# Fill NaN with column medians
for col in X.columns:
    median_val = X[col].median()
    X[col].fillna(median_val if pd.notna(median_val) else 0, inplace=True)

# Remove rows with missing target
valid = y.notna()
X = X[valid].reset_index(drop=True)
y = y[valid].reset_index(drop=True)

print(f'Features: {len(X.columns)}')
print(f'Samples: {len(X)}')
print(f'Target (展示タイム) stats:')
print(f'  Mean: {y.mean():.3f}s')
print(f'  Std: {y.std():.3f}s')
print(f'  Min: {y.min():.3f}s')
print(f'  Max: {y.max():.3f}s')

=== 展示タイム予測モデル準備 ===



Features: 29
Samples: 334712
Target (展示タイム) stats:
  Mean: 6.823s
  Std: 0.114s
  Min: 6.340s
  Max: 8.670s


### 3. 展示タイム予測モデル - レース場別訓練

In [8]:
# Train exhibition time models per stadium
print('\n=== 展示タイム予測モデル訓練 ===\n')

stadiums = sorted(final_data['レース場'].dropna().unique())
exhibition_models = {}
results_summary = []

for stadium in stadiums:
    stadium_mask = final_data['レース場'] == stadium
    X_std = X[stadium_mask].reset_index(drop=True)
    y_std = y[stadium_mask].reset_index(drop=True)
    
    if len(X_std) < 100:
        print(f'Stadium {int(stadium):2d}: insufficient data ({len(X_std)} samples) - SKIP')
        continue
    
    X_train, X_test, y_train, y_test = train_test_split(X_std, y_std, test_size=0.2, random_state=42)
    
    scaler = StandardScaler()
    X_train_s = scaler.fit_transform(X_train)
    X_test_s = scaler.transform(X_test)
    
    # Train GBRegressor
    gbr = lgb.LGBMRegressor(n_estimators=150, max_depth=6, learning_rate=0.05, subsample=0.8, verbose=-1, random_state=42)
    gbr.fit(X_train_s, y_train)
    
    y_pred = gbr.predict(X_test_s)
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    
    exhibition_models[stadium] = {'model': gbr, 'scaler': scaler, 'features': list(X.columns)}
    results_summary.append({'stadium': int(stadium), 'samples': len(X_std), 'mae': mae, 'rmse': rmse})
    
    status = '✓' if mae < 0.05 else '⚠' if mae < 0.10 else '✗'
    print(f'{status} Stadium {int(stadium):2d}: MAE={mae:.4f}s, RMSE={rmse:.4f}s ({len(X_std)} samples)')

if results_summary:
    results_df = pd.DataFrame(results_summary)
    print(f'\n=== Summary ===')
    print(f'Models trained: {len(results_df)}/{len(stadiums)}')
    print(f'Average MAE: {results_df["mae"].mean():.4f}s')
    print(f'Average RMSE: {results_df["rmse"].mean():.4f}s')
else:
    print('ERROR: No models trained!')


=== 展示タイム予測モデル訓練 ===



⚠ Stadium  1: MAE=0.0743s, RMSE=0.0968s (19115 samples)


⚠ Stadium  2: MAE=0.0748s, RMSE=0.0961s (14688 samples)


⚠ Stadium  3: MAE=0.0746s, RMSE=0.0961s (14244 samples)


⚠ Stadium  4: MAE=0.0768s, RMSE=0.0988s (11868 samples)


⚠ Stadium  5: MAE=0.0771s, RMSE=0.0971s (13956 samples)


⚠ Stadium  6: MAE=0.0787s, RMSE=0.1009s (14328 samples)


⚠ Stadium  7: MAE=0.0744s, RMSE=0.0942s (14172 samples)


⚠ Stadium  8: MAE=0.0758s, RMSE=0.0967s (15192 samples)


⚠ Stadium  9: MAE=0.0750s, RMSE=0.0976s (15134 samples)


⚠ Stadium 10: MAE=0.0747s, RMSE=0.0948s (14028 samples)


⚠ Stadium 11: MAE=0.0739s, RMSE=0.0942s (13380 samples)


⚠ Stadium 12: MAE=0.0779s, RMSE=0.0990s (14892 samples)


⚠ Stadium 13: MAE=0.0762s, RMSE=0.0968s (12078 samples)


⚠ Stadium 14: MAE=0.0758s, RMSE=0.0972s (13032 samples)


⚠ Stadium 15: MAE=0.0780s, RMSE=0.0995s (14880 samples)


⚠ Stadium 16: MAE=0.0711s, RMSE=0.0906s (11497 samples)


⚠ Stadium 17: MAE=0.0742s, RMSE=0.0948s (14232 samples)


⚠ Stadium 18: MAE=0.0744s, RMSE=0.0938s (13584 samples)


⚠ Stadium 19: MAE=0.0762s, RMSE=0.0966s (13092 samples)


⚠ Stadium 20: MAE=0.0754s, RMSE=0.0963s (14508 samples)


⚠ Stadium 21: MAE=0.0750s, RMSE=0.0953s (13968 samples)


⚠ Stadium 22: MAE=0.0758s, RMSE=0.0976s (13308 samples)


⚠ Stadium 23: MAE=0.0763s, RMSE=0.1001s (10932 samples)


⚠ Stadium 24: MAE=0.0749s, RMSE=0.0962s (14604 samples)

=== Summary ===
Models trained: 24/24
Average MAE: 0.0755s
Average RMSE: 0.0965s


## 進入コース予測モデル

In [9]:
# Prepare enhanced features for course entry prediction
print('\n=== 進入コース予測モデル準備（強化特徴量版）===\n')

if 'コース' not in final_data.columns:
    print('ERROR: コース not found!')
    raise KeyError('コース column missing')

# Start with base features (same as exhibition time)
X_course = final_data[feature_cols].copy()
y_course = final_data['コース'].copy()

# Add frame number as a feature (枠番 = boat number)
X_course['枠番'] = final_data['艇番']

# Add player's past course entry tendency
# Calculate the most common course for each player
player_course_tendency = final_data.groupby('登録番号')['コース'].agg(lambda x: x.mode()[0] if len(x.mode()) > 0 else x.median()).reset_index()
player_course_tendency.columns = ['登録番号', 'プレイヤー進入傾向コース']
X_course_temp = final_data[['登録番号']].copy()
X_course_temp = X_course_temp.merge(player_course_tendency, on='登録番号', how='left')
X_course['プレイヤー進入傾向'] = X_course_temp['プレイヤー進入傾向コース']

# Add stadium-level average course pattern
# For each stadium, calculate average course per frame position
stadium_frame_course = final_data.groupby(['レース場', '艇番'])['コース'].mean().reset_index()
stadium_frame_course.columns = ['レース場', '艇番', 'スタジアム枠別平均コース']
X_course_temp = final_data[['レース場', '艇番']].copy()
X_course_temp = X_course_temp.merge(stadium_frame_course, on=['レース場', '艇番'], how='left')
X_course['スタジアム枠別平均'] = X_course_temp['スタジアム枠別平均コース']

# Add player win rate features (interaction with frame)
X_course['全国勝率×枠'] = final_data['全国勝率'].fillna(0) * final_data['艇番']
X_course['当地勝率×枠'] = final_data['当地勝率'].fillna(0) * final_data['艇番']

# Convert to numeric
for col in X_course.columns:
    X_course[col] = pd.to_numeric(X_course[col], errors='coerce')

y_course = pd.to_numeric(y_course, errors='coerce')

# Fill NaN
for col in X_course.columns:
    median_val = X_course[col].median()
    X_course[col].fillna(median_val if pd.notna(median_val) else 0, inplace=True)

# Remove rows with missing target
valid_course = y_course.notna()
X_course = X_course[valid_course].reset_index(drop=True)
y_course = y_course[valid_course].reset_index(drop=True)
y_course = y_course.astype(int)

print(f'Base features: {len(feature_cols)}')
print(f'Added features: 5 (枠番, プレイヤー進入傾向, スタジアム枠別平均, 全国勝率×枠, 当地勝率×枠)')
print(f'Total features: {len(X_course.columns)}')
print(f'Samples: {len(X_course)}')
print(f'Target (コース) distribution:')
for course in sorted(y_course.unique()):
    count = (y_course == course).sum()
    pct = count / len(y_course) * 100
    print(f'  Course {int(course)}: {count} ({pct:.1f}%)')


=== 進入コース予測モデル準備（強化特徴量版）===



Base features: 29
Added features: 5 (枠番, プレイヤー進入傾向, スタジアム枠別平均, 全国勝率×枠, 当地勝率×枠)
Total features: 34
Samples: 334744
Target (コース) distribution:
  Course 1: 55911 (16.7%)
  Course 2: 55890 (16.7%)
  Course 3: 55893 (16.7%)
  Course 4: 55892 (16.7%)
  Course 5: 55885 (16.7%)
  Course 6: 55273 (16.5%)


### 進入コース予測モデル - レース場別訓練

In [10]:
# Train course entry models per stadium - improved hyperparameters
print('\n=== 進入コース予測モデル訓練（改善版）===\n')

course_models = {}
results_summary_course = []

for stadium in stadiums:
    stadium_mask = final_data['レース場'] == stadium
    X_std = X_course[stadium_mask].reset_index(drop=True)
    y_std = y_course[stadium_mask].reset_index(drop=True)
    
    if len(X_std) < 100:
        print(f'Stadium {int(stadium):2d}: insufficient data ({len(X_std)} samples) - SKIP')
        continue
    
    X_train, X_test, y_train, y_test = train_test_split(X_std, y_std, test_size=0.2, random_state=42)
    
    scaler = StandardScaler()
    X_train_s = scaler.fit_transform(X_train)
    X_test_s = scaler.transform(X_test)
    
    # Train GBClassifier with optimized hyperparameters
    # Increased n_estimators, max_depth, and learning_rate for better feature interaction
    gbc = lgb.LGBMClassifier(
        n_estimators=250,
        max_depth=8,
        learning_rate=0.1,
        verbose=-1,
        random_state=42,
    )
    
    gbc.fit(X_train_s, y_train)
    
    y_pred = gbc.predict(X_test_s)
    acc = accuracy_score(y_test, y_pred)
    
    course_models[stadium] = {'model': gbc, 'scaler': scaler, 'features': list(X_course.columns)}
    results_summary_course.append({'stadium': int(stadium), 'samples': len(X_std), 'accuracy': acc})
    
    status = '✓' if acc > 0.70 else '⚠' if acc > 0.50 else '✗'
    print(f'{status} Stadium {int(stadium):2d}: Accuracy={acc:.1%} ({len(X_std)} samples)')

if results_summary_course:
    results_course_df = pd.DataFrame(results_summary_course)
    print(f'\n=== Summary ===')
    print(f'Models trained: {len(results_course_df)}/{len(stadiums)}')
    print(f'Average Accuracy: {results_course_df["accuracy"].mean():.1%}')
    print(f'Min Accuracy: {results_course_df["accuracy"].min():.1%}')
    print(f'Max Accuracy: {results_course_df["accuracy"].max():.1%}')
else:
    print('ERROR: No course models trained!')


=== 進入コース予測モデル訓練（改善版）===



✓ Stadium  1: Accuracy=91.2% (19115 samples)


✓ Stadium  2: Accuracy=92.6% (14688 samples)


✓ Stadium  3: Accuracy=91.2% (14256 samples)


✓ Stadium  4: Accuracy=91.8% (11880 samples)


✓ Stadium  5: Accuracy=91.3% (13964 samples)


✓ Stadium  6: Accuracy=91.8% (14328 samples)


✓ Stadium  7: Accuracy=91.9% (14172 samples)


✓ Stadium  8: Accuracy=91.9% (15192 samples)


✓ Stadium  9: Accuracy=90.7% (15134 samples)


✓ Stadium 10: Accuracy=92.2% (14028 samples)


✓ Stadium 11: Accuracy=90.8% (13380 samples)


✓ Stadium 12: Accuracy=91.9% (14892 samples)


✓ Stadium 13: Accuracy=91.1% (12078 samples)


✓ Stadium 14: Accuracy=91.0% (13032 samples)


✓ Stadium 15: Accuracy=90.1% (14880 samples)


✓ Stadium 16: Accuracy=90.1% (11497 samples)


✓ Stadium 17: Accuracy=90.2% (14232 samples)


✓ Stadium 18: Accuracy=90.4% (13584 samples)


✓ Stadium 19: Accuracy=91.1% (13092 samples)


✓ Stadium 20: Accuracy=91.4% (14508 samples)


✓ Stadium 21: Accuracy=91.1% (13968 samples)


✓ Stadium 22: Accuracy=90.8% (13308 samples)


✓ Stadium 23: Accuracy=91.1% (10932 samples)


✓ Stadium 24: Accuracy=90.1% (14604 samples)

=== Summary ===
Models trained: 24/24
Average Accuracy: 91.2%
Min Accuracy: 90.1%
Max Accuracy: 92.6%


## スタート展示予測モデル

### 1. 特徴量準備

In [11]:
# Prepare enhanced features for start timing prediction
print('\n=== スタート展示予測モデル準備（強化特徴量版）===\n')

if 'スタート展示' not in final_data.columns:
    print('ERROR: スタート展示 not found!')
    raise KeyError('スタート展示 column missing')

# Start with base features
X_start = final_data[feature_cols].copy()
y_start = final_data['スタート展示'].copy()

# Add frame number as a feature
X_start['枠番'] = final_data['艇番']

# Add player's past start timing tendency
player_start_tendency = final_data.groupby('登録番号')['スタート展示'].agg('mean').reset_index()
player_start_tendency.columns = ['登録番号', 'プレイヤースタート平均']
X_start_temp = final_data[['登録番号']].copy()
X_start_temp = X_start_temp.merge(player_start_tendency, on='登録番号', how='left')
X_start['プレイヤースタート傾向'] = X_start_temp['プレイヤースタート平均']

# Add stadium-level average start timing per frame
stadium_frame_start = final_data.groupby(['レース場', '艇番'])['スタート展示'].mean().reset_index()
stadium_frame_start.columns = ['レース場', '艇番', 'スタジアム枠別平均スタート']
X_start_temp = final_data[['レース場', '艇番']].copy()
X_start_temp = X_start_temp.merge(stadium_frame_start, on=['レース場', '艇番'], how='left')
X_start['スタジアム枠別平均'] = X_start_temp['スタジアム枠別平均スタート']

# Add age interaction features
if '年齢' in final_data.columns:
    X_start['年齢×枠'] = final_data['年齢'].fillna(0) * final_data['艇番']
else:
    X_start['年齢×枠'] = 0
if '経験年数' in final_data.columns:
    X_start['経験年数×枠'] = final_data['経験年数'].fillna(0) * final_data['艇番']
else:
    X_start['経験年数×枠'] = 0

# Convert to numeric
for col in X_start.columns:
    X_start[col] = pd.to_numeric(X_start[col], errors='coerce')

y_start = pd.to_numeric(y_start, errors='coerce')

# Fill NaN
for col in X_start.columns:
    median_val = X_start[col].median()
    X_start[col].fillna(median_val if pd.notna(median_val) else 0, inplace=True)

# Remove rows with missing target
valid_start = y_start.notna()
X_start = X_start[valid_start].reset_index(drop=True)
y_start = y_start[valid_start].reset_index(drop=True)

print(f'Base features: {len(feature_cols)}')
print(f'Added features: 5 (枠番, プレイヤースタート傾向, スタジアム枠別平均, 年齢×枠, 経験年数×枠)')
print(f'Total features: {len(X_start.columns)}')
print(f'Samples: {len(X_start)}')
print(f'Target (スタート展示) stats:')
print(f'  Mean: {y_start.mean():.3f}s')
print(f'  Std: {y_start.std():.3f}s')
print(f'  Min: {y_start.min():.3f}s')
print(f'  Max: {y_start.max():.3f}s')


=== スタート展示予測モデル準備（強化特徴量版）===



Base features: 29
Added features: 5 (枠番, プレイヤースタート傾向, スタジアム枠別平均, 年齢×枠, 経験年数×枠)
Total features: 34
Samples: 334708
Target (スタート展示) stats:
  Mean: 0.080s
  Std: 0.107s
  Min: -0.490s
  Max: 0.990s


### 2. スタート展示予測モデル - レース場別訓練

In [12]:
# Train start timing models per stadium
print('\n=== スタート展示予測モデル訓練 ===\n')

start_timing_models = {}
results_summary_start = []

for stadium in stadiums:
    stadium_mask = final_data['レース場'] == stadium
    X_std = X_start[stadium_mask].reset_index(drop=True)
    y_std = y_start[stadium_mask].reset_index(drop=True)
    
    if len(X_std) < 100:
        print(f'Stadium {int(stadium):2d}: insufficient data ({len(X_std)} samples) - SKIP')
        continue
    
    X_train, X_test, y_train, y_test = train_test_split(X_std, y_std, test_size=0.2, random_state=42)
    
    scaler = StandardScaler()
    X_train_s = scaler.fit_transform(X_train)
    X_test_s = scaler.transform(X_test)
    
    # Train GBRegressor for start timing
    gbr_start = lgb.LGBMRegressor(
        n_estimators=150,
        max_depth=6,
        learning_rate=0.05,
        subsample=0.8,
        verbose=-1,
        random_state=42,
    )
    gbr_start.fit(X_train_s, y_train)
    
    y_pred = gbr_start.predict(X_test_s)
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    
    start_timing_models[stadium] = {'model': gbr_start, 'scaler': scaler, 'features': list(X_start.columns)}
    results_summary_start.append({'stadium': int(stadium), 'samples': len(X_std), 'mae': mae, 'rmse': rmse})
    
    status = '✓' if mae < 0.10 else '⚠' if mae < 0.15 else '✗'
    print(f'{status} Stadium {int(stadium):2d}: MAE={mae:.4f}s, RMSE={rmse:.4f}s ({len(X_std)} samples)')

if results_summary_start:
    results_start_df = pd.DataFrame(results_summary_start)
    print(f'\n=== Summary ===')
    print(f'Models trained: {len(results_start_df)}/{len(stadiums)}')
    print(f'Average MAE: {results_start_df["mae"].mean():.4f}s')
    print(f'Average RMSE: {results_start_df["rmse"].mean():.4f}s')
else:
    print('ERROR: No start timing models trained!')


=== スタート展示予測モデル訓練 ===



✓ Stadium  1: MAE=0.0810s, RMSE=0.1056s (19115 samples)


✓ Stadium  2: MAE=0.0793s, RMSE=0.1036s (14684 samples)


✓ Stadium  3: MAE=0.0812s, RMSE=0.1032s (14244 samples)


✓ Stadium  4: MAE=0.0797s, RMSE=0.1024s (11868 samples)


✓ Stadium  5: MAE=0.0804s, RMSE=0.1052s (13956 samples)


✓ Stadium  6: MAE=0.0818s, RMSE=0.1065s (14328 samples)


✓ Stadium  7: MAE=0.0810s, RMSE=0.1066s (14172 samples)


✓ Stadium  8: MAE=0.0798s, RMSE=0.1029s (15192 samples)


✓ Stadium  9: MAE=0.0813s, RMSE=0.1069s (15134 samples)


✓ Stadium 10: MAE=0.0820s, RMSE=0.1060s (14028 samples)


✓ Stadium 11: MAE=0.0801s, RMSE=0.1037s (13380 samples)


✓ Stadium 12: MAE=0.0822s, RMSE=0.1064s (14892 samples)


✓ Stadium 13: MAE=0.0815s, RMSE=0.1052s (12078 samples)


✓ Stadium 14: MAE=0.0795s, RMSE=0.1029s (13032 samples)


✓ Stadium 15: MAE=0.0793s, RMSE=0.1022s (14880 samples)


✓ Stadium 16: MAE=0.0786s, RMSE=0.1016s (11497 samples)


✓ Stadium 17: MAE=0.0791s, RMSE=0.1030s (14232 samples)


✓ Stadium 18: MAE=0.0796s, RMSE=0.1029s (13584 samples)


✓ Stadium 19: MAE=0.0789s, RMSE=0.1027s (13092 samples)


✓ Stadium 20: MAE=0.0794s, RMSE=0.1036s (14508 samples)


✓ Stadium 21: MAE=0.0802s, RMSE=0.1059s (13968 samples)


✓ Stadium 22: MAE=0.0815s, RMSE=0.1072s (13308 samples)


✓ Stadium 23: MAE=0.0814s, RMSE=0.1056s (10932 samples)


✓ Stadium 24: MAE=0.0808s, RMSE=0.1065s (14604 samples)

=== Summary ===
Models trained: 24/24
Average MAE: 0.0804s
Average RMSE: 0.1045s


## チルト調整予測モデル

### 1. 特徴量準備

In [13]:
# Prepare enhanced features for tilt adjustment prediction
print('\n=== チルト調整予測モデル準備（分類タスク）===\n')

if 'チルト調整' not in final_data.columns:
    print('ERROR: チルト調整 not found!')
    raise KeyError('チルト調整 column missing')

# Start with base features
X_tilt = final_data[feature_cols].copy()
y_tilt = final_data['チルト調整'].copy()

# Add frame number
X_tilt['枠番'] = final_data['艇番']

# Add player's past tilt adjustment tendency
player_tilt_tendency = final_data.groupby('登録番号')['チルト調整'].agg('mean').reset_index()
player_tilt_tendency.columns = ['登録番号', 'プレイヤーチルト平均']
X_tilt_temp = final_data[['登録番号']].copy()
X_tilt_temp = X_tilt_temp.merge(player_tilt_tendency, on='登録番号', how='left')
X_tilt['プレイヤーチルト傾向'] = X_tilt_temp['プレイヤーチルト平均']

# Add stadium-level average tilt adjustment per frame
stadium_frame_tilt = final_data.groupby(['レース場', '艇番'])['チルト調整'].mean().reset_index()
stadium_frame_tilt.columns = ['レース場', '艇番', 'スタジアム枠別平均チルト']
X_tilt_temp = final_data[['レース場', '艇番']].copy()
X_tilt_temp = X_tilt_temp.merge(stadium_frame_tilt, on=['レース場', '艇番'], how='left')
X_tilt['スタジアム枠別平均'] = X_tilt_temp['スタジアム枠別平均チルト']

# Add performance interaction
X_tilt['全国勝率×枠'] = final_data['全国勝率'].fillna(0) * final_data['艇番']
X_tilt['当地勝率×枠'] = final_data['当地勝率'].fillna(0) * final_data['艇番']

# Convert to numeric
for col in X_tilt.columns:
    X_tilt[col] = pd.to_numeric(X_tilt[col], errors='coerce')

y_tilt = pd.to_numeric(y_tilt, errors='coerce')

# Round target to nearest 0.5 for classification
# Create categorical labels: -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, etc.
y_tilt_rounded = (y_tilt * 2).round() / 2  # Round to nearest 0.5

# Fill NaN
for col in X_tilt.columns:
    median_val = X_tilt[col].median()
    X_tilt[col].fillna(median_val if pd.notna(median_val) else 0, inplace=True)

# Remove rows with missing target
valid_tilt = y_tilt_rounded.notna()
X_tilt = X_tilt[valid_tilt].reset_index(drop=True)
y_tilt_rounded = y_tilt_rounded[valid_tilt].reset_index(drop=True)

print(f'Base features: {len(feature_cols)}')
print(f'Added features: 5 (枠番, プレイヤーチルト傾向, スタジアム枠別平均, 全国勝率×枠, 当地勝率×枠)')
print(f'Total features: {len(X_tilt.columns)}')
print(f'Samples: {len(X_tilt)}')
print(f'Target (チルト調整) classes (rounded to 0.5):')
for val in sorted(y_tilt_rounded.unique()):
    count = (y_tilt_rounded == val).sum()
    pct = count / len(y_tilt_rounded) * 100
    print(f'  {val:+.1f}: {count} ({pct:.1f}%)')


=== チルト調整予測モデル準備（分類タスク）===



Base features: 29
Added features: 5 (枠番, プレイヤーチルト傾向, スタジアム枠別平均, 全国勝率×枠, 当地勝率×枠)
Total features: 34
Samples: 334999
Target (チルト調整) classes (rounded to 0.5):
  -0.5: 185339 (55.3%)
  +0.0: 131682 (39.3%)
  +0.5: 13470 (4.0%)
  +1.0: 2412 (0.7%)
  +1.5: 543 (0.2%)
  +2.0: 610 (0.2%)
  +2.5: 34 (0.0%)
  +3.0: 909 (0.3%)


### 2. チルト調整予測モデル - レース場別訓練

In [14]:
# Train tilt adjustment models per stadium
print('\n=== チルト調整予測モデル訓練 ===\n')

tilt_adjustment_models = {}
results_summary_tilt = []

for stadium in stadiums:
    stadium_mask = final_data['レース場'] == stadium
    X_std = X_tilt[stadium_mask].reset_index(drop=True)
    y_std = y_tilt_rounded[stadium_mask].reset_index(drop=True)
    
    if len(X_std) < 100:
        print(f'Stadium {int(stadium):2d}: insufficient data ({len(X_std)} samples) - SKIP')
        continue
    
    X_train, X_test, y_train, y_test = train_test_split(X_std, y_std, test_size=0.2, random_state=42)
    # Convert float labels to string for LGBMClassifier compatibility
    y_train = y_train.astype(str)
    y_test = y_test.astype(str)
    
    scaler = StandardScaler()
    X_train_s = scaler.fit_transform(X_train)
    X_test_s = scaler.transform(X_test)
    
    # Train GBClassifier for tilt adjustment
    gbc_tilt = lgb.LGBMClassifier(
        n_estimators=150,
        max_depth=6,
        learning_rate=0.1,
        verbose=-1,
        random_state=42,
    )
    gbc_tilt.fit(X_train_s, y_train)
    
    y_pred = gbc_tilt.predict(X_test_s)
    acc = accuracy_score(y_test, y_pred)
    
    tilt_adjustment_models[stadium] = {'model': gbc_tilt, 'scaler': scaler, 'features': list(X_tilt.columns)}
    results_summary_tilt.append({'stadium': int(stadium), 'samples': len(X_std), 'accuracy': acc})
    
    status = '✓' if acc > 0.60 else '⚠' if acc > 0.45 else '✗'
    print(f'{status} Stadium {int(stadium):2d}: Accuracy={acc:.1%} ({len(X_std)} samples)')

if results_summary_tilt:
    results_tilt_df = pd.DataFrame(results_summary_tilt)
    print(f'\n=== Summary ===')
    print(f'Models trained: {len(results_tilt_df)}/{len(stadiums)}')
    print(f'Average Accuracy: {results_tilt_df["accuracy"].mean():.1%}')
    print(f'Min Accuracy: {results_tilt_df["accuracy"].min():.1%}')
    print(f'Max Accuracy: {results_tilt_df["accuracy"].max():.1%}')
else:
    print('ERROR: No tilt adjustment models trained!')


=== チルト調整予測モデル訓練 ===



⚠ Stadium  1: Accuracy=59.7% (19138 samples)


⚠ Stadium  2: Accuracy=58.1% (14700 samples)


✓ Stadium  3: Accuracy=61.0% (14268 samples)


✓ Stadium  4: Accuracy=67.1% (11892 samples)


✓ Stadium  5: Accuracy=63.8% (13980 samples)


⚠ Stadium  6: Accuracy=58.2% (14340 samples)


✓ Stadium  7: Accuracy=65.6% (14184 samples)


⚠ Stadium  8: Accuracy=57.2% (15204 samples)


⚠ Stadium  9: Accuracy=59.4% (15134 samples)


✓ Stadium 10: Accuracy=61.4% (14028 samples)


✓ Stadium 11: Accuracy=65.6% (13404 samples)


✓ Stadium 12: Accuracy=61.6% (14905 samples)


⚠ Stadium 13: Accuracy=56.5% (12090 samples)


✓ Stadium 14: Accuracy=61.2% (13032 samples)


✓ Stadium 15: Accuracy=66.8% (14880 samples)


✓ Stadium 16: Accuracy=63.1% (11509 samples)


✓ Stadium 17: Accuracy=66.2% (14232 samples)


⚠ Stadium 18: Accuracy=57.2% (13596 samples)


✓ Stadium 19: Accuracy=64.5% (13116 samples)


✓ Stadium 20: Accuracy=60.1% (14508 samples)


⚠ Stadium 21: Accuracy=59.4% (13968 samples)


⚠ Stadium 22: Accuracy=58.5% (13321 samples)


✓ Stadium 23: Accuracy=66.1% (10942 samples)


✓ Stadium 24: Accuracy=62.6% (14628 samples)

=== Summary ===
Models trained: 24/24
Average Accuracy: 61.7%
Min Accuracy: 56.5%
Max Accuracy: 67.1%


## モデル保存

In [15]:
# Save models
models_data = {
    'exhibition_time': exhibition_models,
    'course_entry': course_models,
    'start_timing': start_timing_models,
    'tilt_adjustment': tilt_adjustment_models
}

model_save_path = repo_root / 'models' / 'preview_models.pkl'
model_save_path.parent.mkdir(parents=True, exist_ok=True)

with open(model_save_path, 'wb') as f:
    pickle.dump(models_data, f)

print(f'✓ Saved models to {model_save_path}')
print(f'  Exhibition time models: {len(exhibition_models)}')
print(f'  Course entry models: {len(course_models)}')
print(f'  Start timing models: {len(start_timing_models)}')
print(f'  Tilt adjustment models: {len(tilt_adjustment_models)}')

✓ Saved models to /Users/mahiguch/dev/boatrace/data/models/preview_models.pkl
  Exhibition time models: 24
  Course entry models: 24
  Start timing models: 24
  Tilt adjustment models: 24


## 2026年1月のテストデータで予測

### 1. テストデータ読み込み

In [16]:
# Load test data for 2026-01
test_data_list = []

year_test = '2026'
month_test = '01'
month_num = int(month_test)
year_num = int(year_test)

_, max_day = calendar.monthrange(year_num, month_num)

for day in range(1, max_day + 1):
    day_str = f'{day:02d}'
    prog_path = repo_root / 'data' / 'programs' / year_test / month_test / f'{day_str}.csv'
    
    if prog_path.exists():
        try:
            prog_test = pd.read_csv(prog_path)
            test_data_list.append((day_str, prog_test))
        except Exception as e:
            print(f'Error loading {year_test}-{month_test}-{day_str}: {e}')

print(f'✓ Loaded {len(test_data_list)} days of test data for 2026-01')

✓ Loaded 31 days of test data for 2026-01


### 2. テストデータの変形

In [17]:
# Reshape test programs and add environment info from previews
test_programs_list = []

for day, prog_test in test_data_list:
    prog_reshaped = reshape_programs(prog_test)
    
    if not prog_reshaped.empty:
        # Map stadium
        prog_reshaped['レース場'] = prog_reshaped['レース場'].apply(map_stadium_name_to_number)
        prog_reshaped = prog_reshaped[prog_reshaped['レース場'].notna()].reset_index(drop=True)
        
        # Load previews for this day to get environment info
        prev_path = repo_root / 'data' / 'previews' / '2026' / '01' / f'{day}.csv'
        if prev_path.exists():
            try:
                prev_test = pd.read_csv(prev_path)
                prev_reshaped = reshape_previews(prev_test)
                
                # Extract environment columns from previews
                environment_cols = ['レースコード', '風速(m)', '波の高さ(cm)', '気温(℃)', '水温(℃)']
                available_env = [c for c in environment_cols if c in prev_reshaped.columns]
                
                if available_env:
                    prev_env = prev_reshaped[available_env].drop_duplicates()
                    prog_reshaped = prog_reshaped.merge(prev_env, on='レースコード', how='left')
            except Exception as e:
                pass
        
        # Add date column
        prog_reshaped['日付'] = day
        
        if not prog_reshaped.empty:
            test_programs_list.append(prog_reshaped)

if test_programs_list:
    test_programs = pd.concat(test_programs_list, ignore_index=True)
    print(f'✓ Test programs reshaped: {test_programs.shape}')
else:
    print('✗ No test programs')

✓ Test programs reshaped: (31350, 37)


### 3. 特徴量準備と予測

In [18]:
# Load trained models and prepare test features with enhanced features
with open(model_save_path, 'rb') as f:
    models_data = pickle.load(f)

exhibition_models = models_data['exhibition_time']
course_models = models_data['course_entry']
start_timing_models = models_data['start_timing']
tilt_adjustment_models = models_data['tilt_adjustment']

print(f'✓ Loaded exhibition time models: {len(exhibition_models)}')
print(f'✓ Loaded course entry models: {len(course_models)}')
print(f'✓ Loaded start timing models: {len(start_timing_models)}')
print(f'✓ Loaded tilt adjustment models: {len(tilt_adjustment_models)}')

# Get expected feature lists from trained models
expected_course_features = course_models[list(course_models.keys())[0]]['features']
expected_start_features = start_timing_models[list(start_timing_models.keys())[0]]['features']
expected_tilt_features = tilt_adjustment_models[list(tilt_adjustment_models.keys())[0]]['features']

print(f'Expected course features: {len(expected_course_features)}')
print(f'Expected start timing features: {len(expected_start_features)}')
print(f'Expected tilt adjustment features: {len(expected_tilt_features)}')

# Prepare test features for start timing prediction
X_test_start = test_programs[feature_cols].copy()
X_test_start['枠番'] = test_programs['艇番']

# Add player's start timing tendency
player_start_tendency = final_data.groupby('登録番号')['スタート展示'].agg('mean').reset_index()
player_start_tendency.columns = ['登録番号', 'プレイヤースタート平均']
X_start_temp = test_programs[['登録番号']].copy()
X_start_temp = X_start_temp.merge(player_start_tendency, on='登録番号', how='left')
X_test_start['プレイヤースタート傾向'] = X_start_temp['プレイヤースタート平均']

# Add stadium-frame average start timing
stadium_frame_start = final_data.groupby(['レース場', '艇番'])['スタート展示'].mean().reset_index()
stadium_frame_start.columns = ['レース場', '艇番', 'スタジアム枠別平均スタート']
X_start_temp = test_programs[['レース場', '艇番']].copy()
X_start_temp = X_start_temp.merge(stadium_frame_start, on=['レース場', '艇番'], how='left')
X_test_start['スタジアム枠別平均'] = X_start_temp['スタジアム枠別平均スタート']

# Add age interactions
if '年齢' in test_programs.columns:
    X_test_start['年齢×枠'] = test_programs['年齢'].fillna(0) * test_programs['艇番']
else:
    X_test_start['年齢×枠'] = 0
if '経験年数' in test_programs.columns:
    X_test_start['経験年数×枠'] = test_programs['経験年数'].fillna(0) * test_programs['艇番']
else:
    X_test_start['経験年数×枠'] = 0

# Convert to numeric and fill NaN
for col in X_test_start.columns:
    X_test_start[col] = pd.to_numeric(X_test_start[col], errors='coerce')

for col in X_test_start.columns:
    if col in X_start.columns:
        median_val = X_start[col].median()
    else:
        median_val = np.nan
    
    if pd.isna(median_val):
        X_test_start[col].fillna(0, inplace=True)
    else:
        X_test_start[col].fillna(median_val, inplace=True)

# Prepare test features for tilt adjustment prediction
X_test_tilt = test_programs[feature_cols].copy()
X_test_tilt['枠番'] = test_programs['艇番']

# Add player's tilt tendency
player_tilt_tendency = final_data.groupby('登録番号')['チルト調整'].agg('mean').reset_index()
player_tilt_tendency.columns = ['登録番号', 'プレイヤーチルト平均']
X_tilt_temp = test_programs[['登録番号']].copy()
X_tilt_temp = X_tilt_temp.merge(player_tilt_tendency, on='登録番号', how='left')
X_test_tilt['プレイヤーチルト傾向'] = X_tilt_temp['プレイヤーチルト平均']

# Add stadium-frame average tilt
stadium_frame_tilt = final_data.groupby(['レース場', '艇番'])['チルト調整'].mean().reset_index()
stadium_frame_tilt.columns = ['レース場', '艇番', 'スタジアム枠別平均チルト']
X_tilt_temp = test_programs[['レース場', '艇番']].copy()
X_tilt_temp = X_tilt_temp.merge(stadium_frame_tilt, on=['レース場', '艇番'], how='left')
X_test_tilt['スタジアム枠別平均'] = X_tilt_temp['スタジアム枠別平均チルト']

# Add performance interactions
X_test_tilt['全国勝率×枠'] = test_programs['全国勝率'].fillna(0) * test_programs['艇番']
X_test_tilt['当地勝率×枠'] = test_programs['当地勝率'].fillna(0) * test_programs['艇番']

# Convert to numeric and fill NaN
for col in X_test_tilt.columns:
    X_test_tilt[col] = pd.to_numeric(X_test_tilt[col], errors='coerce')

for col in X_test_tilt.columns:
    if col in X_tilt.columns:
        median_val = X_tilt[col].median()
    else:
        median_val = np.nan
    
    if pd.isna(median_val):
        X_test_tilt[col].fillna(0, inplace=True)
    else:
        X_test_tilt[col].fillna(median_val, inplace=True)

# Prepare for exhibition time and course prediction (use original features)
X_test_exhibition = test_programs[feature_cols].copy()
for col in X_test_exhibition.columns:
    X_test_exhibition[col] = pd.to_numeric(X_test_exhibition[col], errors='coerce')

for col in X_test_exhibition.columns:
    if col in X.columns:
        median_val = X[col].median()
    else:
        median_val = np.nan
    
    if pd.isna(median_val):
        X_test_exhibition[col].fillna(0, inplace=True)
    else:
        X_test_exhibition[col].fillna(median_val, inplace=True)

# Prepare for course prediction (uses enhanced features)
X_test_course = test_programs[feature_cols].copy()
X_test_course['枠番'] = test_programs['艇番']

# Add player's course entry tendency
player_course_tendency = final_data.groupby('登録番号')['コース'].agg(lambda x: x.mode()[0] if len(x.mode()) > 0 else x.median()).reset_index()
player_course_tendency.columns = ['登録番号', 'プレイヤー進入傾向コース']
X_course_temp = test_programs[['登録番号']].copy()
X_course_temp = X_course_temp.merge(player_course_tendency, on='登録番号', how='left')
X_test_course['プレイヤー進入傾向'] = X_course_temp['プレイヤー進入傾向コース']

# Add stadium-frame average course
stadium_frame_course = final_data.groupby(['レース場', '艇番'])['コース'].mean().reset_index()
stadium_frame_course.columns = ['レース場', '艇番', 'スタジアム枠別平均コース']
X_course_temp = test_programs[['レース場', '艇番']].copy()
X_course_temp = X_course_temp.merge(stadium_frame_course, on=['レース場', '艇番'], how='left')
X_test_course['スタジアム枠別平均'] = X_course_temp['スタジアム枠別平均コース']

# Add player win rate interactions
X_test_course['全国勝率×枠'] = test_programs['全国勝率'].fillna(0) * test_programs['艇番']
X_test_course['当地勝率×枠'] = test_programs['当地勝率'].fillna(0) * test_programs['艇番']

# Convert to numeric and fill NaN for course
for col in X_test_course.columns:
    X_test_course[col] = pd.to_numeric(X_test_course[col], errors='coerce')

for col in X_test_course.columns:
    if col in X_course.columns:
        median_val = X_course[col].median()
    else:
        median_val = np.nan
    
    if pd.isna(median_val):
        X_test_course[col].fillna(0, inplace=True)
    else:
        X_test_course[col].fillna(median_val, inplace=True)

print(f'✓ Test features prepared for exhibition time: {X_test_exhibition.shape}')
print(f'✓ Test features prepared for course entry: {X_test_course.shape}')
print(f'✓ Test features prepared for start timing: {X_test_start.shape}')
print(f'✓ Test features prepared for tilt adjustment: {X_test_tilt.shape}')

✓ Loaded exhibition time models: 24
✓ Loaded course entry models: 24
✓ Loaded start timing models: 24
✓ Loaded tilt adjustment models: 24
Expected course features: 34
Expected start timing features: 34
Expected tilt adjustment features: 34


✓ Test features prepared for exhibition time: (31350, 29)
✓ Test features prepared for course entry: (31350, 34)
✓ Test features prepared for start timing: (31350, 34)
✓ Test features prepared for tilt adjustment: (31350, 34)


### 4. 展示タイム予測

In [19]:
# Predict exhibition times
exhibition_predictions = []
success_count = 0
error_count = 0

for idx, row in test_programs.iterrows():
    stadium = row['レース場']
    
    if stadium not in exhibition_models:
        exhibition_predictions.append(np.nan)
        continue
    
    model_info = exhibition_models[stadium]
    model = model_info['model']
    scaler = model_info['scaler']
    
    try:
        X_row = X_test_exhibition.iloc[idx:idx+1]
        X_scaled = scaler.transform(X_row)
        pred = model.predict(X_scaled)[0]
        
        # Clamp to reasonable range (e.g., 5.0 to 8.0 seconds)
        pred = max(5.0, min(8.0, pred))
        
        exhibition_predictions.append(pred)
        success_count += 1
    except Exception as e:
        exhibition_predictions.append(np.nan)
        error_count += 1

test_programs['展示タイム'] = exhibition_predictions

print(f'✓ Exhibition time predictions: {success_count} successful, {error_count} errors')
print(f'  Valid predictions: {test_programs["展示タイム"].notna().sum()}/{len(test_programs)}')
if test_programs["展示タイム"].notna().sum() > 0:
    print(f'  Mean: {test_programs["展示タイム"].mean():.3f}s, Std: {test_programs["展示タイム"].std():.3f}s')

✓ Exhibition time predictions: 31350 successful, 0 errors
  Valid predictions: 31350/31350
  Mean: 6.794s, Std: 0.049s


### 5. 進入コース予測

In [20]:
# Predict course entries
course_predictions = []
success_count = 0
error_count = 0

for idx, row in test_programs.iterrows():
    stadium = row['レース場']
    
    if stadium not in course_models:
        course_predictions.append(np.nan)
        continue
    
    model_info = course_models[stadium]
    model = model_info['model']
    scaler = model_info['scaler']
    
    try:
        X_row = X_test_course.iloc[idx:idx+1]
        X_scaled = scaler.transform(X_row)
        pred = model.predict(X_scaled)[0]
        
        # Ensure it's in valid range (1-6)
        pred = max(1, min(6, int(pred)))
        
        course_predictions.append(pred)
        success_count += 1
    except Exception as e:
        course_predictions.append(np.nan)
        error_count += 1

test_programs['コース'] = course_predictions

print(f'✓ Course entry predictions: {success_count} successful, {error_count} errors')
print(f'  Valid predictions: {test_programs["コース"].notna().sum()}/{len(test_programs)}')
if test_programs["コース"].notna().sum() > 0:
    print(f'  Distribution: {test_programs["コース"].value_counts().sort_index().to_dict()}')

✓ Course entry predictions: 31350 successful, 0 errors
  Valid predictions: 31350/31350
  Distribution: {1: 5225, 2: 5226, 3: 5224, 4: 5222, 5: 5099, 6: 5354}


### 6. スタート展示予測

In [21]:
# Predict start timings
start_timing_predictions = []
success_count = 0
error_count = 0

for idx, row in test_programs.iterrows():
    stadium = row['レース場']
    
    if stadium not in start_timing_models:
        start_timing_predictions.append(np.nan)
        continue
    
    model_info = start_timing_models[stadium]
    model = model_info['model']
    scaler = model_info['scaler']
    
    try:
        X_row = X_test_start.iloc[idx:idx+1]
        X_scaled = scaler.transform(X_row)
        pred = model.predict(X_scaled)[0]
        
        # Clamp to reasonable range (e.g., -0.5 to 1.0 seconds)
        pred = max(-0.5, min(1.0, pred))
        
        start_timing_predictions.append(pred)
        success_count += 1
    except Exception as e:
        start_timing_predictions.append(np.nan)
        error_count += 1

test_programs['スタート展示'] = start_timing_predictions

print(f'✓ Start timing predictions: {success_count} successful, {error_count} errors')
print(f'  Valid predictions: {test_programs["スタート展示"].notna().sum()}/{len(test_programs)}')
if test_programs["スタート展示"].notna().sum() > 0:
    print(f'  Mean: {test_programs["スタート展示"].mean():.3f}s, Std: {test_programs["スタート展示"].std():.3f}s')

✓ Start timing predictions: 31350 successful, 0 errors
  Valid predictions: 31350/31350
  Mean: 0.075s, Std: 0.032s


### 7. チルト調整予測

In [22]:
# Predict tilt adjustments
tilt_adjustment_predictions = []
success_count = 0
error_count = 0

for idx, row in test_programs.iterrows():
    stadium = row['レース場']
    
    if stadium not in tilt_adjustment_models:
        tilt_adjustment_predictions.append(np.nan)
        continue
    
    model_info = tilt_adjustment_models[stadium]
    model = model_info['model']
    scaler = model_info['scaler']
    
    try:
        X_row = X_test_tilt.iloc[idx:idx+1]
        X_scaled = scaler.transform(X_row)
        pred = model.predict(X_scaled)[0]
        
        # Pred is already rounded to nearest 0.5 during training
        # Clamp to reasonable range (e.g., -1.0 to 2.5)
        pred = max(-1.0, min(2.5, float(pred)))
        
        tilt_adjustment_predictions.append(pred)
        success_count += 1
    except Exception as e:
        tilt_adjustment_predictions.append(np.nan)
        error_count += 1

test_programs['チルト調整'] = tilt_adjustment_predictions

print(f'✓ Tilt adjustment predictions: {success_count} successful, {error_count} errors')
print(f'  Valid predictions: {test_programs["チルト調整"].notna().sum()}/{len(test_programs)}')
if test_programs["チルト調整"].notna().sum() > 0:
    print(f'  Distribution:')
    dist = test_programs["チルト調整"][test_programs["チルト調整"].notna()].round(1).value_counts().sort_index()
    for val, count in dist.items():
        print(f'    {val:+.1f}: {count}')

✓ Tilt adjustment predictions: 31350 successful, 0 errors
  Valid predictions: 31350/31350
  Distribution:
    -0.5: 20005
    +0.0: 9326
    +0.5: 920
    +1.0: 323
    +1.5: 184
    +2.0: 194
    +2.5: 398


## 予測Previews形式での出力

In [23]:
# Prepare predicted previews output
# Output format: one row per race, with columns for each boat

output_data = {}

for race_code in test_programs['レースコード'].unique():
    race_programs = test_programs[test_programs['レースコード'] == race_code]
    
    if len(race_programs) == 0:
        continue
    
    # Get race-level info from first boat
    first_row = race_programs.iloc[0]
    race_info = {
        'レースコード': race_code,
        'レース日': first_row['レース日'],
        'レース場': int(first_row['レース場']),
        'レース回': first_row['レース回']
    }
    
    # Add boat-specific predictions for all 4 models
    for _, row in race_programs.iterrows():
        boat_num = int(row['艇番'])
        race_info[f'艇{boat_num}_展示タイム'] = row['展示タイム']
        race_info[f'艇{boat_num}_コース'] = row['コース']
        race_info[f'艇{boat_num}_スタート展示'] = row['スタート展示']
        race_info[f'艇{boat_num}_チルト調整'] = row['チルト調整']
    
    output_data[race_code] = race_info

# Convert to DataFrame
output_df = pd.DataFrame(list(output_data.values()))

# Sort columns: race info first, then by boat and prediction type
cols_first = ['レースコード', 'レース日', 'レース場', 'レース回']
cols_other = sorted([c for c in output_df.columns if c not in cols_first])
output_df = output_df[cols_first + cols_other]

print(f'✓ Prepared output: {output_df.shape}')
print(f'  Sample:')
print(output_df.head(3))

✓ Prepared output: (5128, 28)
  Sample:
         レースコード        レース日  レース場 レース回  艇1_コース  艇1_スタート展示  艇1_チルト調整  艇1_展示タイム  \
0  202601012301  2026-01-01    23   1R       1   0.043981      -0.5  6.871648   
1  202601012302  2026-01-01    23   2R       1   0.054987      -0.5  6.891597   
2  202601012303  2026-01-01    23   3R       1   0.092303      -0.5  6.773284   

   艇2_コース  艇2_スタート展示  ...  艇4_チルト調整  艇4_展示タイム  艇5_コース  艇5_スタート展示  艇5_チルト調整  \
0       2   0.081678  ...      -0.5  6.891336       5   0.047530      -0.5   
1       2   0.038337  ...      -0.5  6.871106       5   0.084583      -0.5   
2       2   0.112911  ...       0.0  6.795244       5   0.019846      -0.5   

   艇5_展示タイム  艇6_コース  艇6_スタート展示  艇6_チルト調整  艇6_展示タイム  
0  6.876201       6   0.026988      -0.5  6.849877  
1  6.878187       6   0.054066      -0.5  6.883296  
2  6.786157       6   0.064287       0.0  6.794635  

[3 rows x 28 columns]


## CSV出力（日別）

In [24]:
# Save predictions by date
output_dir = repo_root / 'data' / 'prediction-preview' / '2026' / '01'
output_dir.mkdir(parents=True, exist_ok=True)

# Group by date
for date_str in test_programs['レース日'].unique():
    if pd.isna(date_str):
        continue
    
    date_programs = test_programs[test_programs['レース日'] == date_str]
    
    # Create date key from race code
    race_codes = date_programs['レースコード'].unique()
    if len(race_codes) > 0:
        first_race_code = str(race_codes[0])
        if len(first_race_code) >= 8:
            year = first_race_code[:4]
            month = first_race_code[4:6]
            day = first_race_code[6:8]
            
            # Filter output for this date
            date_output = output_df[output_df['レース日'] == date_str]
            
            # Save
            output_path = output_dir / f'{day}.csv'
            date_output.to_csv(output_path, index=False)
            print(f'✓ Saved {output_path} ({len(date_output)} races)')

print('\nAll done!')

✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/01.csv (154 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/02.csv (189 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/03.csv (213 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/04.csv (228 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/05.csv (180 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/06.csv (180 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/07.csv (180 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/08.csv (165 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/09.csv (144 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/10.csv (144 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026

✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/18.csv (190 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/19.csv (176 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/20.csv (156 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/21.csv (156 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/22.csv (156 races)


✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/23.csv (156 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/24.csv (156 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/25.csv (156 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/26.csv (191 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/27.csv (168 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/28.csv (133 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/29.csv (130 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/30.csv (129 races)
✓ Saved /Users/mahiguch/dev/boatrace/data/data/prediction-preview/2026/01/31.csv (140 races)

All done!
