# 08. Integrated Prophet Analysis: Rolling Ensemble System

---

## 아키텍처

```
Data Loader -> Feature Engineering (RSI, MA 등)
  ↓
[Rolling Loop: 2021..2025]
  ↓
Train (Prophet + XGBoost Residual)
  ↓
Predict (Hybrid: Prophet + XGB Correction)
  ↓
Scenario Adjustment (Bull/Neutral/Bear)
  ↓
Evaluation (MAPE, Spearman, Top-K Hit Rate)
```

## 핵심 설계 원칙

### Rule 1) 통일된 데이터 포맷

### Rule 2) 타겟 y는 log(Sector_Index)
섹터별 일간 평균 수익률 -> 지수 생성 (Base=100) -> log 변환

### Rule 3) Rolling Loop에서 스케일러 재학습
매년 루프에서 train만 가지고 scaler fit, test는 transform만 (데이터 누수 방지)

### Rule 4) 앙상블은 잔차 방식
1. Prophet으로 yhat_prophet 생성
2. resid = y - yhat_prophet (train 구간)
3. XGBoost로 resid 예측
4. yhat_hybrid = yhat_prophet + resid_hat

### Rule 5) Scenario는 예측 후 적용
시나리오 분류는 train_end까지의 정보만 사용

### Rule 6) 평가는 랭킹 중심
- MAPE/RMSE/MAE: 참고
- **핵심**: Spearman Rank Correlation, Top-3/5 Hit Rate, 근접성공(|rank diff|≤2)

---

## Section 1. 환경 설정

In [82]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
import seaborn as sns
from pathlib import Path
import warnings
from datetime import datetime, timedelta
import os
from prophet import Prophet
import xgboost as xgb

# 미국 공휴일
import holidays

from scipy.stats import spearmanr
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error
from sklearn.preprocessing import StandardScaler

from enhanced_features import (
      load_and_prepare_macro,
      merge_macro_to_sector_panel,
      add_regime_features,
      add_momentum_features,
      calculate_dynamic_cps
  )

warnings.filterwarnings('ignore')

# 시각화 스타일
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 7)
plt.rcParams['font.size'] = 10

# 한글 폰트
import koreanize_matplotlib
pkg_dir = os.path.dirname(koreanize_matplotlib.__file__)
font_path = os.path.join(pkg_dir, 'fonts', 'NanumGothic.ttf')

fm.fontManager.addfont(font_path)
font_name = fm.FontProperties(fname=font_path).get_name()
plt.rcParams['font.family'] = font_name
plt.rcParams['font.sans-serif'] = [font_name] + plt.rcParams['font.sans-serif']
plt.rcParams['axes.unicode_minus'] = False

print(f"Font: {font_name}")

Font: NanumGothic


In [64]:
PROJECT_ROOT = Path('.').resolve()
DATA_DIR = PROJECT_ROOT / 'Data_set'
OUTPUT_DIR = DATA_DIR / 'Integrated_Prophet_Results'
TABLEAU_DIR = DATA_DIR / 'Tableau_Csv'

OUTPUT_DIR.mkdir(exist_ok=True)

print(f"프로젝트 경로: {PROJECT_ROOT}")
print(f"결과 저장: {OUTPUT_DIR}")

프로젝트 경로: /Users/yu_seok/Documents/workspace/nbCamp/Project/Yahoo Finance
결과 저장: /Users/yu_seok/Documents/workspace/nbCamp/Project/Yahoo Finance/Data_set/Integrated_Prophet_Results


---

## Section 2. 데이터 로드 및 섹터 인덱스 생성

### 2.1 원본 데이터 로드

In [65]:
print("=" * 60)
print("데이터 로드")
print("=" * 60)

df = pd.read_csv(DATA_DIR / 'stock_features_clean.csv')
df['Date'] = pd.to_datetime(df['Date'])

# 결측치 및 Unknown 섹터 제거
df = df.dropna(subset=['Date', 'Close', 'Sector', 'Daily_Return'])
df = df[df['Sector'] != 'Unknown']

print(f"\n로드 완료")
print(f"  Shape: {df.shape}")
print(f"  기간: {df['Date'].min().date()} ~ {df['Date'].max().date()}")
print(f"  기업 수: {df['Company'].nunique()}")
print(f"  섹터 수: {df['Sector'].nunique()}")

# 섹터 목록
print("\n섹터 목록:")
sector_counts = df.groupby('Sector')['Company'].nunique().sort_values(ascending=False)
for i, (sector, count) in enumerate(sector_counts.items(), 1):
    print(f"  {i:2d}. {sector} ({count}개 기업)")

데이터 로드

로드 완료
  Shape: (603359, 41)
  기간: 2020-11-27 ~ 2026-01-09
  기업 수: 481
  섹터 수: 11

섹터 목록:
   1. Financial Services (82개 기업)
   2. Technology (77개 기업)
   3. Healthcare (56개 기업)
   4. Industrials (56개 기업)
   5. Consumer Cyclical (45개 기업)
   6. Energy (38개 기업)
   7. Consumer Defensive (31개 기업)
   8. Communication Services (27개 기업)
   9. Basic Materials (26개 기업)
  10. Utilities (24개 기업)
  11. Real Estate (19개 기업)


### 2.2 섹터 인덱스 생성 (수익률 기반)

**계산 방식**:
1. 섹터 내 모든 종목의 일간 수익률 평균 계산
2. 이를 100에서 시작하는 누적 지수로 환산

**장점**: 
- 고가 주식 편향 제거
- Prophet이 선형 추세 잘 포착
- XGBoost 잔차 모델도 안정적

In [66]:
def calculate_sector_index_log(df):
    """
    수익률 기반 섹터 인덱스 계산 (Vectorized & Log Consistency)
    """
    df_clean = df.copy()

    #  1. 이상치 클리핑
    # 개별 종목의 수익률이 상위 1% / 하위 1%를 벗어나면 경계값으로 치환
    upper_limit = df_clean['Daily_Return'].quantile(0.99)
    lower_limit = df_clean['Daily_Return'].quantile(0.01)
    
    # 예: 상위 1%가 +25%라면, +50%인 종목을 +25%로 깎음
    df_clean['Daily_Return'] = df_clean['Daily_Return'].clip(lower=lower_limit, upper=upper_limit)

    # 2. 섹터/일자별 집계
    sector_daily = df_clean.groupby(['Date', 'Sector'], as_index=False).agg({
        'Daily_Return': 'mean',
        'Volume': 'sum',
        'Company': 'count'
    })
    
    # 날짜순 정렬
    sector_daily = sector_daily.sort_values(['Sector', 'Date'])
    
    # 결측치 처리
    sector_daily['Daily_Return'] = sector_daily['Daily_Return'].fillna(0)
    
    # 3. 지수 계산
    # 각 섹터별로 (1+r)의 누적곱 계산
    sector_daily['Index'] = sector_daily.groupby('Sector')['Daily_Return'].transform(
        lambda x: 100 * (1 + x).cumprod()
    )
    
    # 4. 로그 변환
    sector_daily['y'] = np.log(sector_daily['Index'])
    
    # Prophet 입력용 컬럼
    sector_daily['ds'] = sector_daily['Date']
    
    return sector_daily

# 실행
sector_df = calculate_sector_index_log(df)

print(f"\n섹터 인덱스 생성 완료: {sector_df.shape}")
print(f"컬럼: {sector_df.columns.tolist()}")
print(f"\n샘플 데이터:")
print(sector_df.head(10))


섹터 인덱스 생성 완료: (14135, 8)
컬럼: ['Date', 'Sector', 'Daily_Return', 'Volume', 'Company', 'Index', 'y', 'ds']

샘플 데이터:
         Date           Sector  Daily_Return     Volume  Company       Index  \
0  2020-11-27  Basic Materials      0.003054   68568783       24  100.305401   
11 2020-11-30  Basic Materials     -0.009052  132420442       24   99.397448   
22 2020-12-01  Basic Materials      0.022079  126777499       24  101.592009   
33 2020-12-02  Basic Materials     -0.000330  123427550       24  101.558435   
44 2020-12-03  Basic Materials      0.003853  107596029       24  101.949698   
55 2020-12-04  Basic Materials      0.019749  127260672       24  103.963087   
66 2020-12-07  Basic Materials     -0.000743  105091051       24  103.885852   
77 2020-12-08  Basic Materials      0.003298   82921991       24  104.228511   
88 2020-12-09  Basic Materials     -0.005910   87466621       24  103.612512   
99 2020-12-10  Basic Materials     -0.001343  101516656       24  103.473313   

    

---

## Section 3. Feature Engineering (기술적 지표)

### 3.1 기술적 지표 계산

**Prophet 외부 변수로 사용할 기술적 지표**:
- **RSI_14**: 상대강도지수 (과매수/과매도)
- **Volatility_20d**: 20일 변동성
- **BB_Width**: 볼린저 밴드 폭 (변동성 스퀴즈)
- **Vol_Z_Score**: 거래량 Z-Score
- **Gap_Pct**: 전일 대비 갭 비율

In [67]:
import pandas as pd
import numpy as np

def build_sector_panel_from_stock_df(
    df: pd.DataFrame,
    sector_index_df: pd.DataFrame,
    key_columns: list,
    *,
    # 집계/가공 옵션
    mean_cols: list | None = None,
    risk_cols_q80: list | None = None,
    vol_z_q90_col: str = "Vol_Z_Score",
    prefix: str = "Avg_",
    # 파생 컬럼 생성 옵션
    add_scaled_cols: bool = True,
    add_log1p_cols: bool = True,
    # 결측치 처리
    fill_method: str = "ffill_bfill",   # "ffill_bfill" | "ffill_zero" | "none"
) -> pd.DataFrame:

    # 0) 안전장치 / 표준화
    df = df.copy()
    df["Date"] = pd.to_datetime(df["Date"])

    sector_index_df = sector_index_df.copy()
    if "Date" not in sector_index_df.columns and "ds" in sector_index_df.columns:
        sector_index_df["Date"] = pd.to_datetime(sector_index_df["ds"])
    else:
        sector_index_df["Date"] = pd.to_datetime(sector_index_df["Date"])

    if "ds" not in sector_index_df.columns:
        sector_index_df["ds"] = sector_index_df["Date"]

    # 1) 기본 집계 대상 컬럼 정의
    default_mean_cols = [
        "Daily_Return_raw", "Cum_Return",
        "Return_1M", "Return_3M", "Return_6M",
        "MA_5", "MA_20", "MA_60",
        "Volatility_20d",
        "Vol_MA_20", "Vol_Ratio", "Log_Volume_W",
        "RSI_14", "BB_Width",
    ]
    default_risk_cols_q80 = ["Drawdown", "MDD", "DD_Short"]

    if mean_cols is None:
        mean_cols = default_mean_cols
    if risk_cols_q80 is None:
        risk_cols_q80 = default_risk_cols_q80

    mean_cols = [c for c in mean_cols if (c in key_columns and c in df.columns)]
    risk_cols_q80 = [c for c in risk_cols_q80 if (c in key_columns and c in df.columns)]
    use_vol_z_q90 = (vol_z_q90_col in key_columns and vol_z_q90_col in df.columns)

    # 2) 집계 실행
    # mean 집계
    frames = []
    if mean_cols:
        mean_agg = (
            df.groupby(["Date", "Sector"], as_index=False)[mean_cols]
              .mean()
        )
        frames.append(mean_agg)

    # q80(보수적) 집계 - 리스크 컬럼
    if risk_cols_q80:
        q80_agg = (
            df.groupby(["Date", "Sector"], as_index=False)[risk_cols_q80]
              .quantile(0.80)
        )
        frames.append(q80_agg)

    # q90 집계 - Vol_Z_Score (섹터 내 '급증 종목 존재' 신호 살리기)
    if use_vol_z_q90:
        q90_agg = (
            df.groupby(["Date", "Sector"], as_index=False)[[vol_z_q90_col]]
              .quantile(0.90)
        )
        frames.append(q90_agg)

    if not frames:
        sector_panel = sector_index_df.sort_values(["Sector", "ds"]).copy()
        return sector_panel

    # 집계 결과 병합 (Date, Sector 기준)
    sector_feats = frames[0]
    for f in frames[1:]:
        sector_feats = sector_feats.merge(f, on=["Date", "Sector"], how="outer")

    # 3) 컬럼명 접두어(Avg_) 적용
    rename_map = {}
    for c in mean_cols + risk_cols_q80:
        if c in sector_feats.columns:
            rename_map[c] = f"{prefix}{c}"

    if use_vol_z_q90 and vol_z_q90_col in sector_feats.columns:
        rename_map[vol_z_q90_col] = f"{prefix}{vol_z_q90_col}"

    sector_feats = sector_feats.rename(columns=rename_map)

    # 4) 섹터 지수(타깃 y)와 결합
    sector_panel = sector_index_df.merge(
        sector_feats,
        on=["Date", "Sector"],
        how="left"
    )

    # 5) 모델 입력용 파생 컬럼 (스케일/로그)
    if add_scaled_cols and f"{prefix}RSI_14" in sector_panel.columns:
        sector_panel[f"{prefix}RSI_14_scaled"] = sector_panel[f"{prefix}RSI_14"] / 100.0

    if add_log1p_cols:
        for base_col in ["Volatility_20d", "Vol_Ratio", "Log_Volume_W"]:
            c = f"{prefix}{base_col}"
            if c in sector_panel.columns:
                sector_panel[f"{c}_log1p"] = np.log1p(sector_panel[c])

    # 6) 정렬 + 결측 처리
    sector_panel = sector_panel.sort_values(["Sector", "ds"])

    if fill_method == "ffill_bfill":
        sector_panel = sector_panel.groupby("Sector", group_keys=False).apply(
            lambda x: x.ffill().bfill()
        )
    elif fill_method == "ffill_zero":
        sector_panel = sector_panel.groupby("Sector", group_keys=False).apply(
            lambda x: x.ffill()
        ).fillna(0)
    elif fill_method == "none":
        pass
    else:
        raise ValueError("fill_method must be one of: 'ffill_bfill', 'ffill_zero', 'none'")

    return sector_panel



---

## Section 4. 미국 시장 휴장일 설정

In [68]:
def get_market_holidays_simple(start_year, end_year):
    """
    미국 NYSE 휴장일 데이터 생성
    """
    nyse_holidays = holidays.NYSE(years=range(start_year, end_year + 1))
    
    holiday_data = []
    for date, name in nyse_holidays.items():
        holiday_data.append({
            'holiday': 'market_closed',
            'ds': pd.to_datetime(date),
            'lower_window': 0,
            'upper_window': 0
        })
        
    return pd.DataFrame(holiday_data)

US_HOLIDAYS = get_market_holidays_simple(2018, 2026)
print(f"미국 시장 휴장일: {len(US_HOLIDAYS)}개")
print(f"\n2024년 휴장일 샘플:")
print(US_HOLIDAYS[US_HOLIDAYS['ds'].dt.year == 2024].sort_values('ds'))

미국 시장 휴장일: 87개

2024년 휴장일 샘플:
          holiday         ds  lower_window  upper_window
56  market_closed 2024-01-01             0             0
57  market_closed 2024-01-15             0             0
58  market_closed 2024-02-19             0             0
59  market_closed 2024-03-29             0             0
60  market_closed 2024-05-27             0             0
61  market_closed 2024-06-19             0             0
62  market_closed 2024-07-04             0             0
63  market_closed 2024-09-02             0             0
64  market_closed 2024-11-28             0             0
65  market_closed 2024-12-25             0             0


---

## Section 5. 모델 함수 정의

### 5.1 Prophet 모델 함수

In [69]:
def create_prophet_model(holidays_df, cps=0.05):
    """
    Prophet 모델 생성
    """
    return Prophet(
        growth='linear',
        changepoint_prior_scale=cps,
        seasonality_mode='additive',
        yearly_seasonality=True,
        weekly_seasonality=False,
        daily_seasonality=False,
        holidays=holidays_df,
        interval_width=0.95
    )
    
    # 기술적 지표를 외부 변수로 추가
    regressors = ['Volatility_20d', 'RSI_14', 'Vol_Z_Score', 'BB_Width', 'Gap_Pct']
    for r in regressors:
        model.add_regressor(r)
    
    return model

print(" Prophet 모델 함수 정의 완료")

 Prophet 모델 함수 정의 완료


### 5.2 XGBoost 잔차 모델 함수

**핵심 로직**:
1. Prophet으로 yhat_prophet 생성
2. residual = y - yhat_prophet 계산
3. XGBoost로 residual 예측
4. yhat_hybrid = yhat_prophet + residual_predicted

In [70]:
def make_xgb_features(df: pd.DataFrame, *, lags=(1, 5, 20), roll_window=20) -> pd.DataFrame:
    """
    잔차 모델용 시계열 피처 생성
    """
    df = df.sort_values(['Sector', 'ds']).copy()

    # lag features (과거값)
    for lag in lags:
        df[f'y_lag_{lag}'] = df.groupby('Sector')['y'].shift(lag)

    # rolling stats (과거값 기반: shift(1) 후 rolling)
    y_shift_1 = df.groupby('Sector')['y'].shift(1)
    df[f'y_roll_mean_{roll_window}'] = (
        y_shift_1.groupby(df['Sector']).rolling(roll_window).mean().reset_index(level=0, drop=True)
    )
    df[f'y_roll_std_{roll_window}'] = (
        y_shift_1.groupby(df['Sector']).rolling(roll_window).std().reset_index(level=0, drop=True)
    )

    return df


def train_xgb_residual_model(
    train_data: pd.DataFrame,
    prophet_forecast: pd.DataFrame,
    feature_cols: list | None = None,
    *,
    horizon: int = 1,              
    lags=(1, 5, 20),
    roll_window: int = 20,
    scale: bool = True,
    verbose: bool = True,
):
    """
    XGBoost 잔차 모델 학습
    """

    # 1) 병합
    pf = prophet_forecast[['ds', 'Sector', 'yhat']].copy()
    merged = train_data.merge(pf, on=['ds', 'Sector'], how='inner')

    # 2) 잔차 계산
    merged['residual'] = merged['y'] - merged['yhat']

    # 3) 시계열 피처 추가
    merged = make_xgb_features(merged, lags=lags, roll_window=roll_window)

    # 4) target을 미래로 이동   
    if horizon < 0:
        raise ValueError("horizon must be >= 0")
    if horizon > 0:
        merged['target'] = merged.groupby('Sector')['residual'].shift(-horizon)
    else:
        merged['target'] = merged['residual']

    # 5) 기본 피처 세트
    if feature_cols is None:
        feature_cols = [
            'yhat',  # Prophet 예측 자체도 강력한 설명 변수
            'y_lag_1', 'y_lag_5', 'y_lag_20',
            f'y_roll_std_{roll_window}',
            'Avg_Volatility_20d_log1p',
            'Avg_RSI_14_scaled',
            'Avg_Vol_Ratio_log1p',
            'Avg_Vol_Z_Score',     # q90
            'Avg_BB_Width',
            'Avg_DD_Short',        # q80
            'Avg_Return_3M',
        ]

    feature_cols_used = [c for c in feature_cols if c in merged.columns]
    if not feature_cols_used:
        raise ValueError(f"유효한 feature가 없습니다. feature_cols={feature_cols}")

    # 6) 결측 제거
    model_df = merged.dropna(subset=feature_cols_used + ['target']).copy()

    X_train = model_df[feature_cols_used].values
    y_train = model_df['target'].values

    # 7) 스케일링 (train only)
    scaler = None
    if scale:
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)

    # 8) XGB 모델
    xgb_model = xgb.XGBRegressor(
        n_estimators=600,
        learning_rate=0.05,
        max_depth=3,
        subsample=0.8,
        colsample_bytree=0.8,
        reg_alpha=0.0,
        reg_lambda=1.0,
        random_state=42,
        objective='reg:squarederror',
        n_jobs=-1
    )
    xgb_model.fit(X_train, y_train)

    meta = {
        "horizon": horizon,
        "n_rows_train": int(len(model_df)),
        "features": feature_cols_used,
        "lags": list(lags),
        "roll_window": roll_window,
    }

    if verbose:
        print(f"XGB residual model trained | horizon={horizon} | rows={meta['n_rows_train']:,} | features={len(feature_cols_used)}")

    return xgb_model, scaler, feature_cols_used, meta

---

## Section 6. Rolling Loop: 년도별 학습 및 예측

### 6.1 년도별 예측 함수

In [71]:
def predict_sector_year_hybrid_final(
    sector_data: pd.DataFrame,
    year: int,
    holidays_df: pd.DataFrame | None,
    *,
    prophet_regressors=None,
    xgb_feature_cols=None,
    horizon: int = 1,
):
    """
    year에 대해 Prophet + XGB(residual) 하이브리드 예측
    """

    # 0. 안전장치
    sectors = sector_data['Sector'].dropna().unique()
    if len(sectors) != 1:
        raise ValueError(f"sector_data must contain exactly 1 sector, got: {sectors}")
    sector_name = sectors[0]

    df = sector_data.copy()
    df['ds'] = pd.to_datetime(df['ds'])
    df = df.sort_values('ds').reset_index(drop=True)

    split_date = pd.Timestamp(f"{year}-01-01")
    next_year_date = pd.Timestamp(f"{year+1}-01-01")

    train = df[df['ds'] < split_date].copy()
    test  = df[(df['ds'] >= split_date) & (df['ds'] < next_year_date)].copy()

    if len(train) == 0 or len(test) == 0:
        return None

    # 1. Prophet (Trend)
    prophet_model = create_prophet_model(holidays_df)

    # Prophet은 ds, y만 사용
    prophet_model.fit(train[['ds', 'y']])

    # 전체 timeline에 대해 trend 예측
    fcst = prophet_model.predict(df[['ds']])

    fcst = fcst[['ds', 'yhat']].copy()
    fcst['Sector'] = sector_name

    # 2. Prophet 결과 병합 + residual 계산 (FULL timeline)
    full = df.merge(fcst, on=['ds', 'Sector'], how='left')
    full['residual'] = full['y'] - full['yhat']

    # 3. XGB feature 생성 (FULL timeline에서 1회)
    #    → 연말-연초 lag/rolling 정합 완벽
    full_feat = make_xgb_features(full, lags=(1, 5, 20), roll_window=20)

    # 4. XGB 학습 데이터 구성 (Direct strategy)
    #    target = residual(t + horizon)
    full_feat['target'] = full_feat.groupby('Sector')['residual'].shift(-horizon)

    train_feat = full_feat[full_feat['ds'] < split_date].copy()
    test_feat  = full_feat[(full_feat['ds'] >= split_date) & (full_feat['ds'] < next_year_date)].copy()

    if xgb_feature_cols is None:
        xgb_feature_cols = [
            'yhat',
            'y_lag_1', 'y_lag_5', 'y_lag_20',
            'y_roll_std_20',
            'resid_lag_1', 'resid_lag_5',
            'resid_roll_std_20',
            'Avg_Volatility_20d_log1p',
            'Avg_RSI_14_scaled',
            'Avg_Vol_Z_Score',
            'Avg_Return_3M',
        ]

    xgb_feature_cols = [c for c in xgb_feature_cols if c in train_feat.columns]

    train_feat = train_feat.dropna(subset=xgb_feature_cols + ['target'])
    test_feat  = test_feat.dropna(subset=xgb_feature_cols)

    # 5. XGBoost 잔차 모델 학습
    scaler = StandardScaler()
    X_train = scaler.fit_transform(train_feat[xgb_feature_cols].values)
    y_train = train_feat['target'].values

    xgb_model = xgb.XGBRegressor(
        n_estimators=600,
        learning_rate=0.03,
        max_depth=3,
        subsample=0.8,
        colsample_bytree=0.8,
        reg_lambda=2.0,
        min_child_weight=5,
        random_state=42,
        n_jobs=-1,
        objective='reg:squarederror'
    )
    xgb_model.fit(X_train, y_train)

    # 6. Test residual 예측
    X_test = scaler.transform(test_feat[xgb_feature_cols].values)
    resid_hat = xgb_model.predict(X_test)

    test_feat['yhat_xgb_resid_h'] = resid_hat

    # 7. 최종 결과
    #    yhat_hybrid(t+h) = yhat_prophet(t+h) + resid_hat(t+h)
    result = test_feat[['ds', 'Sector', 'y', 'yhat', 'yhat_xgb_resid_h']].copy()
    result = result.rename(columns={'y': 'y_actual', 'yhat': 'yhat_prophet'})
    result['yhat_hybrid_h'] = result['yhat_prophet'] + result['yhat_xgb_resid_h']

    return result

### 6.2 Rolling Loop 실행 (2021-2025)

In [72]:
key_columns = [
    'Daily_Return_raw','Cum_Return',
    'Return_1M','Return_3M','Return_6M',
    'MA_5','MA_20','MA_60',
    'Volatility_20d',
    'Vol_MA_20','Vol_Ratio','Log_Volume_W',
    'RSI_14','BB_Width',
    'Drawdown','MDD','DD_Short', 'Vol_Z_Score'
]

sector_final = build_sector_panel_from_stock_df(
    df=df,                    
    sector_index_df=sector_df,
    key_columns=key_columns,    
    prefix="Avg_",            
    add_scaled_cols=True,     
    add_log1p_cols=True,      
    fill_method="ffill_bfill" 
)

# 1. 거시경제 데이터 로드
macro_df = load_and_prepare_macro(DATA_DIR)

# 2. 섹터 패널에 거시경제 데이터 병합
sector_final = merge_macro_to_sector_panel(sector_final, macro_df)

# 3. 레짐 피처 추가
sector_final = add_regime_features(sector_final)

# 4. 모멘텀 피처 추가
sector_final = add_momentum_features(sector_final)

print(f" 생성 완료: sector_final")
print(f"   Shape: {sector_final.shape}")
print(f"   컬럼 확인: {[c for c in sector_final.columns if 'Avg_' in c][:5]} ...")

 생성 완료: sector_final
   Shape: (14135, 30)
   컬럼 확인: ['Avg_Daily_Return_raw', 'Avg_Cum_Return', 'Avg_Return_1M', 'Avg_Return_3M', 'Avg_Return_6M'] ...


In [73]:
# Rolling Loop 설정 (예측 대상 연도)
TEST_YEARS = [2022, 2023, 2024, 2025]   # 보통 이렇게가 실용적
sectors = sector_final['Sector'].unique()

print("=" * 70)
print(f"Rolling Loop 실행: {TEST_YEARS[0]} ~ {TEST_YEARS[-1]}")
print("=" * 70)

all_predictions = []

for year in TEST_YEARS:
    print(f"\n{'='*70}")
    print(f"Test Year {year} 예측 시작 (Train <= {year-1})")
    print(f"{'='*70}")
    
    year_predictions = []
    
    for i, sector in enumerate(sectors, 1):
        print(f"  [{i}/{len(sectors)}] {sector:25s} ", end="")
        
        sector_data = sector_final[sector_final['Sector'] == sector].copy()
        
        try:
            pred = predict_sector_year_hybrid_final(
                sector_data=sector_data,
                year=year,
                holidays_df=US_HOLIDAYS,
                horizon=1
            )
            
            if pred is not None and len(pred) > 0:
                pred['test_year'] = year
                pred['train_end_year'] = year - 1
                year_predictions.append(pred)
                print(f"{len(pred)}일")
            else:
                print("데이터 부족")
                
        except Exception as e:
            print(f"오류: {str(e)}")
    
    if year_predictions:
        year_df = pd.concat(year_predictions, ignore_index=True)
        all_predictions.append(year_df)
        print(f"\n {year}년 예측 완료: {len(year_df):,} 레코드")

if all_predictions:
    df_predictions = pd.concat(all_predictions, ignore_index=True)
    print(f"\n{'='*70}")
    print("전체 Rolling Loop 완료")
    print(f"{'='*70}")
    print(f"총 예측 레코드: {len(df_predictions):,}")
    print(f"섹터 수: {df_predictions['Sector'].nunique()}")
    print(f"년도: {sorted(df_predictions['test_year'].unique().tolist())}")
else:
    print("\n예측 실패")

17:25:15 - cmdstanpy - INFO - Chain [1] start processing
17:25:15 - cmdstanpy - INFO - Chain [1] done processing


Rolling Loop 실행: 2022 ~ 2025

Test Year 2022 예측 시작 (Train <= 2021)
  [1/11] Basic Materials           

17:25:15 - cmdstanpy - INFO - Chain [1] start processing
17:25:15 - cmdstanpy - INFO - Chain [1] done processing


251일
  [2/11] Communication Services    

17:25:16 - cmdstanpy - INFO - Chain [1] start processing
17:25:16 - cmdstanpy - INFO - Chain [1] done processing


251일
  [3/11] Consumer Cyclical         

17:25:16 - cmdstanpy - INFO - Chain [1] start processing
17:25:16 - cmdstanpy - INFO - Chain [1] done processing


251일
  [4/11] Consumer Defensive        

17:25:17 - cmdstanpy - INFO - Chain [1] start processing
17:25:17 - cmdstanpy - INFO - Chain [1] done processing


251일
  [5/11] Energy                    

17:25:17 - cmdstanpy - INFO - Chain [1] start processing
17:25:17 - cmdstanpy - INFO - Chain [1] done processing


251일
  [6/11] Financial Services        

17:25:18 - cmdstanpy - INFO - Chain [1] start processing
17:25:18 - cmdstanpy - INFO - Chain [1] done processing


251일
  [7/11] Healthcare                

17:25:19 - cmdstanpy - INFO - Chain [1] start processing
17:25:19 - cmdstanpy - INFO - Chain [1] done processing


251일
  [8/11] Industrials               

17:25:19 - cmdstanpy - INFO - Chain [1] start processing
17:25:19 - cmdstanpy - INFO - Chain [1] done processing


251일
  [9/11] Real Estate               

17:25:20 - cmdstanpy - INFO - Chain [1] start processing
17:25:20 - cmdstanpy - INFO - Chain [1] done processing


251일
  [10/11] Technology                

17:25:20 - cmdstanpy - INFO - Chain [1] start processing
17:25:20 - cmdstanpy - INFO - Chain [1] done processing


251일
  [11/11] Utilities                 

17:25:21 - cmdstanpy - INFO - Chain [1] start processing
17:25:21 - cmdstanpy - INFO - Chain [1] done processing


251일

 2022년 예측 완료: 2,761 레코드

Test Year 2023 예측 시작 (Train <= 2022)
  [1/11] Basic Materials           

17:25:21 - cmdstanpy - INFO - Chain [1] start processing
17:25:22 - cmdstanpy - INFO - Chain [1] done processing


250일
  [2/11] Communication Services    

17:25:22 - cmdstanpy - INFO - Chain [1] start processing
17:25:22 - cmdstanpy - INFO - Chain [1] done processing


250일
  [3/11] Consumer Cyclical         

17:25:23 - cmdstanpy - INFO - Chain [1] start processing
17:25:23 - cmdstanpy - INFO - Chain [1] done processing


250일
  [4/11] Consumer Defensive        

17:25:23 - cmdstanpy - INFO - Chain [1] start processing
17:25:24 - cmdstanpy - INFO - Chain [1] done processing


250일
  [5/11] Energy                    

17:25:24 - cmdstanpy - INFO - Chain [1] start processing


250일
  [6/11] Financial Services        

17:25:24 - cmdstanpy - INFO - Chain [1] done processing
17:25:25 - cmdstanpy - INFO - Chain [1] start processing


250일
  [7/11] Healthcare                

17:25:25 - cmdstanpy - INFO - Chain [1] done processing
17:25:26 - cmdstanpy - INFO - Chain [1] start processing
17:25:26 - cmdstanpy - INFO - Chain [1] done processing


250일
  [8/11] Industrials               

17:25:26 - cmdstanpy - INFO - Chain [1] start processing
17:25:26 - cmdstanpy - INFO - Chain [1] done processing


250일
  [9/11] Real Estate               

17:25:27 - cmdstanpy - INFO - Chain [1] start processing
17:25:27 - cmdstanpy - INFO - Chain [1] done processing


250일
  [10/11] Technology                

17:25:27 - cmdstanpy - INFO - Chain [1] start processing
17:25:28 - cmdstanpy - INFO - Chain [1] done processing


250일
  [11/11] Utilities                 

17:25:28 - cmdstanpy - INFO - Chain [1] start processing
17:25:28 - cmdstanpy - INFO - Chain [1] done processing


250일

 2023년 예측 완료: 2,750 레코드

Test Year 2024 예측 시작 (Train <= 2023)
  [1/11] Basic Materials           

17:25:29 - cmdstanpy - INFO - Chain [1] start processing


252일
  [2/11] Communication Services    

17:25:29 - cmdstanpy - INFO - Chain [1] done processing
17:25:30 - cmdstanpy - INFO - Chain [1] start processing


252일
  [3/11] Consumer Cyclical         

17:25:30 - cmdstanpy - INFO - Chain [1] done processing
17:25:30 - cmdstanpy - INFO - Chain [1] start processing


252일
  [4/11] Consumer Defensive        

17:25:31 - cmdstanpy - INFO - Chain [1] done processing
17:25:31 - cmdstanpy - INFO - Chain [1] start processing


252일
  [5/11] Energy                    

17:25:31 - cmdstanpy - INFO - Chain [1] done processing
17:25:32 - cmdstanpy - INFO - Chain [1] start processing
17:25:32 - cmdstanpy - INFO - Chain [1] done processing


252일
  [6/11] Financial Services        

17:25:33 - cmdstanpy - INFO - Chain [1] start processing


252일
  [7/11] Healthcare                

17:25:33 - cmdstanpy - INFO - Chain [1] done processing
17:25:33 - cmdstanpy - INFO - Chain [1] start processing


252일
  [8/11] Industrials               

17:25:34 - cmdstanpy - INFO - Chain [1] done processing
17:25:34 - cmdstanpy - INFO - Chain [1] start processing


252일
  [9/11] Real Estate               

17:25:34 - cmdstanpy - INFO - Chain [1] done processing
17:25:35 - cmdstanpy - INFO - Chain [1] start processing


252일
  [10/11] Technology                

17:25:35 - cmdstanpy - INFO - Chain [1] done processing
17:25:36 - cmdstanpy - INFO - Chain [1] start processing


252일
  [11/11] Utilities                 

17:25:36 - cmdstanpy - INFO - Chain [1] done processing
17:25:36 - cmdstanpy - INFO - Chain [1] start processing


252일

 2024년 예측 완료: 2,772 레코드

Test Year 2025 예측 시작 (Train <= 2024)
  [1/11] Basic Materials           

17:25:37 - cmdstanpy - INFO - Chain [1] done processing
17:25:37 - cmdstanpy - INFO - Chain [1] start processing


250일
  [2/11] Communication Services    

17:25:37 - cmdstanpy - INFO - Chain [1] done processing
17:25:38 - cmdstanpy - INFO - Chain [1] start processing


250일
  [3/11] Consumer Cyclical         

17:25:38 - cmdstanpy - INFO - Chain [1] done processing
17:25:39 - cmdstanpy - INFO - Chain [1] start processing


250일
  [4/11] Consumer Defensive        

17:25:39 - cmdstanpy - INFO - Chain [1] done processing
17:25:40 - cmdstanpy - INFO - Chain [1] start processing


250일
  [5/11] Energy                    

17:25:40 - cmdstanpy - INFO - Chain [1] done processing
17:25:41 - cmdstanpy - INFO - Chain [1] start processing


250일
  [6/11] Financial Services        

17:25:41 - cmdstanpy - INFO - Chain [1] done processing
17:25:41 - cmdstanpy - INFO - Chain [1] start processing


250일
  [7/11] Healthcare                

17:25:42 - cmdstanpy - INFO - Chain [1] done processing
17:25:42 - cmdstanpy - INFO - Chain [1] start processing


250일
  [8/11] Industrials               

17:25:43 - cmdstanpy - INFO - Chain [1] done processing
17:25:43 - cmdstanpy - INFO - Chain [1] start processing


250일
  [9/11] Real Estate               

17:25:43 - cmdstanpy - INFO - Chain [1] done processing
17:25:44 - cmdstanpy - INFO - Chain [1] start processing


250일
  [10/11] Technology                

17:25:44 - cmdstanpy - INFO - Chain [1] done processing
17:25:45 - cmdstanpy - INFO - Chain [1] start processing


250일
  [11/11] Utilities                 

17:25:45 - cmdstanpy - INFO - Chain [1] done processing


250일

 2025년 예측 완료: 2,750 레코드

전체 Rolling Loop 완료
총 예측 레코드: 11,033
섹터 수: 11
년도: [2022, 2023, 2024, 2025]


---

## Section 7. Scenario Adjustment (시나리오 보정)

### 7.1 시나리오 분류 함수

In [74]:
def classify_market_scenario_quantile(sector_df, year, lookback=60, base_window=252):
    hist = sector_df[sector_df['ds'] < f"{year}-01-01"].sort_values('ds')
    if len(hist) < max(lookback, 120):
        return "Neutral"

    recent = hist.tail(lookback)
    base = hist.tail(base_window)

    recent_vol = recent['Avg_Volatility_20d'].mean()
    recent_rsi = recent['Avg_RSI_14_scaled'].mean()

    vol_low, vol_high = base['Avg_Volatility_20d'].quantile([0.30, 0.70])
    rsi_low, rsi_high = base['Avg_RSI_14_scaled'].quantile([0.30, 0.70])

    if (recent_rsi >= rsi_high) and (recent_vol <= vol_low):
        return "Bullish"
    elif (recent_rsi <= rsi_low) or (recent_vol >= vol_high):
        return "Bearish"
    else:
        return "Neutral"

### 7.2 시나리오 보정 적용

In [75]:
# 시나리오별 보정 계수
SCENARIO_ADJ = {'Bullish': 1.10, 'Neutral': 1.00, 'Bearish': 0.90}

# base 예측 컬럼 자동 선택
base_col = 'yhat_hybrid' if 'yhat_hybrid' in df_predictions.columns else 'yhat_hybrid_h'

# 기본값 세팅 (누락 방지)
df_predictions['scenario'] = 'Neutral'
df_predictions['yhat_final'] = df_predictions[base_col]

scenario_cache = {}

for year in df_predictions['test_year'].unique():
    for sector in df_predictions['Sector'].unique():

        key = (year, sector)
        if key not in scenario_cache:
            sector_data = sector_final[sector_final['Sector'] == sector]
            scenario_cache[key] = classify_market_scenario_quantile(sector_data, year)

        scenario = scenario_cache[key]

        mask = (df_predictions['test_year'] == year) & (df_predictions['Sector'] == sector)
        df_predictions.loc[mask, 'scenario'] = scenario
        df_predictions.loc[mask, 'yhat_final'] = df_predictions.loc[mask, base_col] * SCENARIO_ADJ[scenario]

print("시나리오 보정 완료")
print("\n시나리오 분포:")
print(df_predictions.groupby(['test_year', 'scenario']).size().unstack(fill_value=0))

시나리오 보정 완료

시나리오 분포:
scenario   Bearish  Neutral
test_year                  
2022          1004     1757
2023           500     2250
2024           756     2016
2025          2000      750


---

## Section 8. 평가 (Ranking 중심)

### 8.1 평가 지표 함수

In [78]:

def calculate_metrics(actual, predicted):
    """기본 예측 정확도 지표"""
    mae = mean_absolute_error(actual, predicted)
    rmse = np.sqrt(mean_squared_error(actual, predicted))
    mape = mean_absolute_percentage_error(actual, predicted) * 100
    return {'MAE': mae, 'RMSE': rmse, 'MAPE': mape}

In [76]:
def calculate_ranking_metrics(df, year, pred_col='yhat_final'):
    """
    랭킹 기반 평가 지표
    Returns: Spearman, Top-K Hit Rate, 근접 성공률 + sector_summary
    """
    year_data = df[df['test_year'] == year].copy()
    if len(year_data) == 0:
        return None

    year_data = year_data.sort_values(['Sector', 'ds'])

    # 섹터별 연간 누적수익률 계산
    def log_return(x):
        x = x.dropna()
        if len(x) < 2:
            return np.nan
        return np.exp(x.iloc[-1] - x.iloc[0]) - 1.0

    sector_summary = (
        year_data.groupby('Sector')
        .agg(
            Actual_Return=('y_actual', log_return),
            Predicted_Return=(pred_col, log_return),
        )
        .reset_index()
        .dropna(subset=['Actual_Return', 'Predicted_Return'])
    )

    # 순위
    sector_summary['Actual_Rank'] = sector_summary['Actual_Return'].rank(ascending=False, method='min')
    sector_summary['Predicted_Rank'] = sector_summary['Predicted_Return'].rank(ascending=False, method='min')
    sector_summary['Rank_Diff'] = (sector_summary['Actual_Rank'] - sector_summary['Predicted_Rank']).abs()

    # Spearman
    spearman_corr, spearman_pvalue = spearmanr(
        sector_summary['Predicted_Return'],
        sector_summary['Actual_Return']
    )

    # Top-K Hit Rate
    def top_k_hit_rate(summary_df, k):
        k = min(k, len(summary_df))
        pred_top_k = set(summary_df.nsmallest(k, 'Predicted_Rank')['Sector'])
        actual_top_k = set(summary_df.nsmallest(k, 'Actual_Rank')['Sector'])
        return len(pred_top_k & actual_top_k) / k * 100 if k > 0 else np.nan

    top3_hit = top_k_hit_rate(sector_summary, 3)
    top5_hit = top_k_hit_rate(sector_summary, 5)

    # 근접 성공률
    close_success_rate = (sector_summary['Rank_Diff'] <= 2).mean() * 100 if len(sector_summary) else np.nan

    return {
        'Spearman': spearman_corr,
        'Spearman_PValue': spearman_pvalue,
        'Top3_Hit_Rate': top3_hit,
        'Top5_Hit_Rate': top5_hit,
        'Close_Success_Rate': close_success_rate,
        'Sector_Summary': sector_summary
    }

### 8.2 년도별 평가 실행

In [81]:
print("=" * 90)
print("년도별 평가 결과 (Ranking 중심)")
print("=" * 90)

evaluation_results = []

for year in sorted(df_predictions['test_year'].unique()):
    print(f"\n{'='*90}")
    print(f"Year {year}")
    print(f"{'='*90}")
    
    # 랭킹 지표
    ranking_metrics = calculate_ranking_metrics(df_predictions, year, 'yhat_final')
    if ranking_metrics is None:
        print("  랭킹 평가 불가(데이터 부족)")
        continue
    
    # 기본 지표
    year_data = df_predictions[df_predictions['test_year'] == year].sort_values(['Sector','ds'])

    basic_metrics = calculate_metrics(
        np.exp(year_data['y_actual']),
        np.exp(year_data['yhat_final'])
    )
    
    print(f"\n[기본 지표 - 참고용]")
    print(f"  MAPE: {basic_metrics['MAPE']:.2f}%")
    print(f"  RMSE: {basic_metrics['RMSE']:.4f}")
    print(f"  MAE:  {basic_metrics['MAE']:.4f}")
    
    print(f"\n[핵심 랭킹 지표]")
    print(f"  Spearman Correlation: {ranking_metrics['Spearman']:.4f} (p={ranking_metrics['Spearman_PValue']:.4f})")
    print(f"  Top-3 Hit Rate: {ranking_metrics['Top3_Hit_Rate']:.1f}%")
    print(f"  Top-5 Hit Rate: {ranking_metrics['Top5_Hit_Rate']:.1f}%")
    print(f"  근접 성공률 (|diff|≤2): {ranking_metrics['Close_Success_Rate']:.1f}%")
    
    # 섹터 순위 테이블
    sector_summary = ranking_metrics['Sector_Summary'].dropna(
        subset=['Actual_Rank','Predicted_Rank']
    )
    print(f"\n[섹터별 순위]")
    print(f"{'Sector':25s} {'실제순위':>10s} {'예측순위':>10s} {'차이':>8s} {'실제수익률':>12s}")
    print("-" * 70)
    for _, row in sector_summary.iterrows():
        print(f"{row['Sector']:25s} {int(row['Actual_Rank']):>10d} {int(row['Predicted_Rank']):>10d} "
              f"{int(row['Rank_Diff']):>8d} {row['Actual_Return']*100:+11.2f}%")
    
    # 결과 저장
    evaluation_results.append({
        'Year': year,
        'MAPE': basic_metrics['MAPE'],
        'Spearman': ranking_metrics['Spearman'],
        'Top3_Hit': ranking_metrics['Top3_Hit_Rate'],
        'Top5_Hit': ranking_metrics['Top5_Hit_Rate'],
        'Close_Success': ranking_metrics['Close_Success_Rate']
    })

# 전체 평균
df_eval_summary = pd.DataFrame(evaluation_results)

print(f"\n{'='*90}")
print("전체 평균 성능")
print(f"{'='*90}")
print(f"  평균 MAPE: {df_eval_summary['MAPE'].mean():.2f}%")
print(f"  평균 Spearman: {df_eval_summary['Spearman'].dropna().mean():.4f}")
print(f"  평균 Top-3 Hit: {df_eval_summary['Top3_Hit'].mean():.1f}%")
print(f"  평균 Top-5 Hit: {df_eval_summary['Top5_Hit'].mean():.1f}%")
print(f"  평균 근접 성공률: {df_eval_summary['Close_Success'].mean():.1f}%")

년도별 평가 결과 (Ranking 중심)

Year 2022

[기본 지표 - 참고용]
  MAPE: 27.41%
  RMSE: 44.7103
  MAE:  32.5723

[핵심 랭킹 지표]
  Spearman Correlation: 0.3636 (p=0.2716)
  Top-3 Hit Rate: 33.3%
  Top-5 Hit Rate: 80.0%
  근접 성공률 (|diff|≤2): 54.5%

[섹터별 순위]
Sector                          실제순위       예측순위       차이        실제수익률
----------------------------------------------------------------------
Basic Materials                    6          9        3       -3.68%
Communication Services            10         11        1      -27.15%
Consumer Cyclical                  8         10        2      -18.45%
Consumer Defensive                 2          7        5       +4.46%
Energy                             1          4        3      +44.12%
Financial Services                 7          8        1       -6.03%
Healthcare                         4          3        1       +0.03%
Industrials                        5          5        0       -1.61%
Real Estate                        9          1        8      -2

---

## Section 9. 시각화 및 결과 저장

### 9.1 년도별 Spearman Correlation 추이

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

ax.plot(df_eval_summary['Year'], df_eval_summary['Spearman'], 
        marker='o', linewidth=2, markersize=8, color='steelblue')

ax.axhline(0, color='gray', linestyle='--', linewidth=1)
ax.axhline(0.4, color='orange', linestyle='--', linewidth=1, label='중간 상관 기준 (0.4)')
ax.axhline(0.7, color='green', linestyle='--', linewidth=1, label='강한 상관 기준 (0.7)')

ax.set_xlabel('Year', fontsize=12)
ax.set_ylabel('Spearman Correlation', fontsize=12)
ax.set_title('년도별 Spearman Rank Correlation 추이', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(OUTPUT_DIR / '01_spearman_yearly.png', dpi=300, bbox_inches='tight')
plt.show()

### 9.2 Top-K Hit Rate 비교

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(len(df_eval_summary))
width = 0.35

bars1 = ax.bar(x - width/2, df_eval_summary['Top3_Hit'], width, 
               label='Top-3 Hit Rate', color='coral', alpha=0.8)
bars2 = ax.bar(x + width/2, df_eval_summary['Top5_Hit'], width, 
               label='Top-5 Hit Rate', color='steelblue', alpha=0.8)

ax.set_xlabel('Year', fontsize=12)
ax.set_ylabel('Hit Rate (%)', fontsize=12)
ax.set_title('년도별 Top-K Hit Rate', fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(df_eval_summary['Year'])
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

# 값 표시
for bars in [bars1, bars2]:
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height,
                f'{height:.0f}%', ha='center', va='bottom', fontsize=9)

plt.tight_layout()
plt.savefig(OUTPUT_DIR / '02_topk_hit_rate.png', dpi=300, bbox_inches='tight')
plt.show()

### 9.3 결과 저장

In [None]:
# 1. 전체 예측 결과
df_predictions.to_csv(OUTPUT_DIR / 'integrated_predictions_all.csv', index=False)
print(f"✅ 전체 예측 결과 저장: integrated_predictions_all.csv")

# 2. 평가 요약
df_eval_summary.to_csv(OUTPUT_DIR / 'evaluation_summary.csv', index=False)
print(f"✅ 평가 요약 저장: evaluation_summary.csv")

# 3. Tableau용 (최종 예측만)
df_tableau = df_predictions[['ds', 'Sector', 'test_year', 'y_actual', 'yhat_final', 'scenario']].copy()
df_tableau.to_csv(TABLEAU_DIR / 'integrated_prophet_forecast.csv', index=False)
print(f"✅ Tableau용 저장: integrated_prophet_forecast.csv")

print(f"\n{'='*70}")
print("통합 분석 완료")
print(f"{'='*70}")

---

## 최종 요약

### 아키텍처 특징

1. **Walk-Forward Validation**: 매년 재학습으로 현재성 반영
2. **Residual Ensemble**: Prophet(추세) + XGBoost(패턴 보정)
3. **Scenario Adjustment**: 시장 상황별 보정
4. **Ranking-Based Evaluation**: 실전 투자 관점의 평가

### 핵심 성과 지표

- **Spearman Correlation**: 순위 예측력
- **Top-K Hit Rate**: 상위 섹터 적중률
- **근접 성공률**: 실전 허용 오차 내 성공률

### 다음 단계

- **2026년 예측**: 최신 데이터로 2026년 섹터 순위 예측
- **K-Means Clustering**: 상위 섹터 내 종목 군집화
- **포트폴리오 최적화**: 선정 종목 기반 최적 배분