# Core10_10 — Experiment Results & Operational Metrics

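## Purpose
Summarize the counterfactual experiment results
**quantitatively, using three operational metrics**.

This notebook does not compare model performance;
its goal is to express operational stability and re-design pressure as numbers.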

## Fixed Operational Metrics (3)
1. Re-design request time (step at which re-design is first requested)
2. SoMS degradation rate (Δ SoMS / step)
3. Re-design request frequency (fraction of steps with a request)

## Inputs
- core10_09_experiment_plan_table.csv
- core10_07_operation_rollout_log.csv (expected to include experiment_id)
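
Because the metrics depend on a few specific columns of the rollout log, a quick schema check before any computation can be useful. The sketch below is illustrative only: `missing_columns` is a hypothetical helper, the toy frame stands in for the real log, and the required column list is an assumption based on the columns this notebook reads.

```python
import pandas as pd


def missing_columns(df, required):
    """Hypothetical helper: return the required column names absent from df."""
    return [col for col in required if col not in df.columns]


# Toy stand-in for the rollout log (values are made up for illustration).
toy_rollout = pd.DataFrame({
    "run_id": ["core10_sim_001"] * 3,
    "case_id": ["B_GOVERNED"] * 3,
    "step": [0, 1, 2],
    "SoMS": [0.0, 0.4, 0.7],
})

# Assumed minimal schema: the grouping key plus the two metric inputs.
required = ["experiment_id", "step", "SoMS"]
missing = missing_columns(toy_rollout, required)
print("missing columns:", missing)
```

With this toy frame the check reports `experiment_id` as missing, which is the situation the fallback logic later in the notebook has to handle.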

## Outputs
- core10_10_metrics_summary.csv
- 2 key summary tables

In [13]:
from pathlib import Path
import numpy as np
import pandas as pd

ART_CORE10 = Path("../artifact/core10")

PLAN_PATH   = ART_CORE10 / "core10_09_experiment_plan_table.csv"
ROLLOUT_PATH = ART_CORE10 / "core10_07_operation_rollout_log.csv"

assert PLAN_PATH.exists(), "Missing experiment plan table"
assert ROLLOUT_PATH.exists(), "Missing rollout log"

plan = pd.read_csv(PLAN_PATH)
rollout = pd.read_csv(ROLLOUT_PATH)

print("experiments:", len(plan))
print("rollout rows:", len(rollout))

plan.head(3), rollout.head(3)

experiments: 9
rollout rows: 60


(  experiment_id       policy_key policy_type          policy_description  \
 0       EXP_000  BASELINE_RANDOM      random  Random allocation baseline   
 1       EXP_001  BASELINE_RANDOM      random  Random allocation baseline   
 2       EXP_002  BASELINE_RANDOM      random  Random allocation baseline   
 
   scenario_key                                    scenario_config        seed  \
 0         COLD  {"drift_enabled": false, "drift_per_step": 0.0...  1458256497   
 1          HOT  {"drift_enabled": true, "drift_per_step": 0.03...  2865804044   
 2  OSCILLATION  {"drift_enabled": true, "drift_per_step": 0.01...  1780719939   
 
    n_steps  ttl_steps  cooldown_steps  
 0       60          8               3  
 1       60          8               3  
 2       60          8               3  ,
            run_id     case_id  step selected_antibody_id selected_signature  \
 0  core10_sim_001  B_GOVERNED     0            GDPa1-060       IgG1|Kappa|0   
 1  core10_sim_001  B_GOVERNED     

## Metric Contracts (fixed definitions)

### 1. Re-design request time
- Definition: the **first step** at which the hazard-based flag “replacement needed” (want_switch=1) occurs
- If it never occurs, record T_STEPS

### 2. SoMS degradation rate
- Definition: slope of the linear regression SoMS ~ step
- Unit: Δ SoMS / step
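
For reference, the regression slope can be written out explicitly. Writing $t_i$ for the step and $s_i$ for the SoMS value of row $i$ (symbols introduced here for illustration, not used elsewhere in the notebook), the ordinary least-squares slope of a degree-1 fit is:

```latex
\hat{\beta}
  = \frac{\sum_{i=1}^{n} (t_i - \bar{t})\,(s_i - \bar{s})}
         {\sum_{i=1}^{n} (t_i - \bar{t})^{2}},
\qquad
\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i,
\quad
\bar{s} = \frac{1}{n}\sum_{i=1}^{n} s_i
```

The denominator is zero when all steps are identical, which is why the slope is only meaningful with at least two distinct steps.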

### 3. Re-design request frequency
- Definition: fraction of steps where want_switch == 1
- Unit: fraction ∈ [0,1]
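
To make the three contracts concrete, here is a minimal worked example on a five-step toy rollout. The data and the horizon `T_STEPS = 5` are assumptions chosen for illustration, not values from the experiments.

```python
import numpy as np
import pandas as pd

T_STEPS = 5  # assumed horizon for the toy data

# Toy rollout for a single experiment (made-up values).
toy = pd.DataFrame({
    "step": [0, 1, 2, 3, 4],
    "SoMS": [1.0, 1.5, 2.0, 2.5, 3.0],
    "want_switch": [0, 0, 1, 0, 1],
})

# 1. Re-design request time: first step with want_switch == 1, else T_STEPS.
hits = toy.loc[toy["want_switch"] == 1, "step"]
request_time = int(hits.min()) if not hits.empty else T_STEPS

# 2. SoMS degradation rate: slope of a degree-1 least-squares fit.
slope = float(np.polyfit(toy["step"], toy["SoMS"], 1)[0])

# 3. Re-design request frequency: share of steps with want_switch == 1.
frequency = float(toy["want_switch"].mean())

print(request_time, round(slope, 3), frequency)
```

For this toy input the first request falls on step 2, SoMS changes by 0.5 per step, and a request is raised on 2 of 5 steps (0.4).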

In [14]:
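def redesign_request_time(df):
    """
    First step where want_switch == 1.

    If no request occurs, falls back to the last observed step
    (df["step"].max()), which equals T_STEPS - 1 when steps are 0-indexed.
    """
    hits = df[df["want_switch"] == 1]
    if hits.empty:
        return int(df["step"].max())
    return int(hits["step"].min())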


def soms_degradation_rate(df):
    """
    Linear slope of SoMS over time
    """
    if df["step"].nunique() < 2:
        return 0.0
    slope = np.polyfit(df["step"], df["SoMS"], 1)[0]
    return float(slope)


def redesign_request_frequency(df):
    """
    Fraction of steps where want_switch == 1
    """
    return float(df["want_switch"].mean())

In [15]:
rollout2 = rollout.copy()

# 0) Ensure experiment_id exists
if "experiment_id" not in rollout2.columns:
    # Candidate key columns, in priority order
    candidate_keys = [
        col for col in ("run_id", "case_id", "scenario_key", "policy_key")
        if col in rollout2.columns
    ]

    # If none exist, create at least a placeholder case_id
    if not candidate_keys:
        rollout2["case_id"] = "CASE_UNKNOWN"
        candidate_keys = ["case_id"]

    # experiment_id = the key values joined with '::'
    rollout2["experiment_id"] = rollout2[candidate_keys].astype(str).agg("::".join, axis=1)

# 1) Ensure policy_key / scenario_key exist
if "policy_key" not in rollout2.columns:
    rollout2["policy_key"] = rollout2["case_id"].astype(str) if "case_id" in rollout2.columns else "POLICY_UNKNOWN"

if "scenario_key" not in rollout2.columns:
    # A drift on/off column could serve here if present; otherwise fall back to case_id
    rollout2["scenario_key"] = rollout2["case_id"].astype(str) if "case_id" in rollout2.columns else "SCENARIO_UNKNOWN"

# 2) Guard the switched column name (core10_07 may only provide allocation_changed)
if "switched" not in rollout2.columns and "allocation_changed" in rollout2.columns:
    rollout2["switched"] = rollout2["allocation_changed"]

# 3) Guard want_switch (if absent, re-derive it from hazard_rule)
if "want_switch" not in rollout2.columns:
    if "hazard_rule" not in rollout2.columns:
        if "proxy_survivability_score" not in rollout2.columns:
            raise KeyError("rollout has none of want_switch, hazard_rule, proxy_survivability_score")
        # hazard = 1 - survivability; unparseable scores become hazard 0
        rollout2["hazard_rule"] = (1.0 - pd.to_numeric(rollout2["proxy_survivability_score"], errors="coerce")).fillna(0).clip(0, 1)
    REPLACE_HAZARD_TH = 0.55
    rollout2["want_switch"] = (pd.to_numeric(rollout2["hazard_rule"], errors="coerce").fillna(0.0) >= REPLACE_HAZARD_TH).astype(int)

# 4) Guard SoMS
if "SoMS" not in rollout2.columns:
    raise KeyError("rollout has no SoMS column. Check the core10_07 log schema.")

rows = []

for exp_id, g in rollout2.groupby("experiment_id"):
    g = g.sort_values("step")

    rows.append({
        "experiment_id": exp_id,
        "policy_key": str(g["policy_key"].iloc[0]),
        "scenario_key": str(g["scenario_key"].iloc[0]),

        # Metrics
        "redesign_request_time": redesign_request_time(g),
        "soms_degradation_rate": soms_degradation_rate(g),
        "redesign_request_frequency": redesign_request_frequency(g),

        # Supporting
        "n_steps": int(g["step"].nunique()),
        "final_SoMS": float(g["SoMS"].iloc[-1]),
        "total_switches": int(pd.to_numeric(g["switched"], errors="coerce").fillna(0).sum()),
    })

metrics = pd.DataFrame(rows)

print("✅ derived experiment units:", metrics["experiment_id"].nunique())
metrics

✅ derived experiment units: 1


|   | experiment_id | policy_key | scenario_key | redesign_request_time | soms_degradation_rate | redesign_request_frequency | n_steps | final_SoMS | total_switches |
|---|---|---|---|---|---|---|---|---|---|
| 0 | core10_sim_001::B_GOVERNED | B_GOVERNED | B_GOVERNED | 36 | 0.364133 | 0.4 | 60 | 21.903764 | 3 |
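
The `want_switch` fallback in the cell above can be isolated into a small function to see how the threshold behaves. This is a sketch under stated assumptions: `derive_want_switch` is a hypothetical name, the toy scores are made up, and only the threshold 0.55 and the `1 - proxy_survivability_score` rule come from the notebook.

```python
import pandas as pd

REPLACE_HAZARD_TH = 0.55  # threshold taken from the notebook cell


def derive_want_switch(df, threshold=REPLACE_HAZARD_TH):
    """Hypothetical helper: return a copy with hazard_rule and want_switch filled in."""
    out = df.copy()
    if "hazard_rule" not in out.columns:
        if "proxy_survivability_score" not in out.columns:
            raise KeyError("need hazard_rule or proxy_survivability_score")
        score = pd.to_numeric(out["proxy_survivability_score"], errors="coerce")
        # hazard = 1 - survivability, clipped to [0, 1]; unparseable -> 0
        out["hazard_rule"] = (1.0 - score).fillna(0.0).clip(0, 1)
    hazard = pd.to_numeric(out["hazard_rule"], errors="coerce").fillna(0.0)
    out["want_switch"] = (hazard >= threshold).astype(int)
    return out


# Toy scores, including one unparseable entry.
toy = pd.DataFrame({"proxy_survivability_score": [0.9, 0.5, 0.4, "bad", 0.1]})
result = derive_want_switch(toy)
print(result["want_switch"].tolist())
```

Note one consequence of the `fillna(0)` choice: a row whose score cannot be parsed is treated as hazard 0 and therefore never raises a re-design request.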


In [16]:
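# Rebuild the per-experiment metrics table (same metrics as the previous cell,
# without the numeric coercion on `switched`)
rows = []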

for exp_id, g in rollout2.groupby("experiment_id"):
    g = g.sort_values("step")

    rows.append({
        "experiment_id": exp_id,
        "policy_key": g["policy_key"].iloc[0] if "policy_key" in g.columns else "NA",
        "scenario_key": g["scenario_key"].iloc[0] if "scenario_key" in g.columns else "NA",

        # Metrics
        "redesign_request_time": redesign_request_time(g),
        "soms_degradation_rate": soms_degradation_rate(g),
        "redesign_request_frequency": redesign_request_frequency(g),

        # Supporting
        "n_steps": int(g["step"].nunique()),
        "final_SoMS": float(g["SoMS"].iloc[-1]),
        "total_switches": int(g["switched"].sum()),
    })

metrics = pd.DataFrame(rows)
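
The run above derived a single experiment unit, while the plan table lists 9 experiments. A set comparison between planned and derived ids makes such a gap explicit before the tables are aggregated. The frames below are toy stand-ins whose ids only mirror the formats visible in this notebook's output.

```python
import pandas as pd

# Toy stand-ins for the plan table and the metrics table.
toy_plan = pd.DataFrame({"experiment_id": ["EXP_000", "EXP_001", "EXP_002"]})
toy_metrics = pd.DataFrame({"experiment_id": ["core10_sim_001::B_GOVERNED"]})

planned = set(toy_plan["experiment_id"])
derived = set(toy_metrics["experiment_id"])

uncovered = sorted(planned - derived)  # planned experiments with no metrics row
unplanned = sorted(derived - planned)  # metrics rows matching no planned id

print("planned experiments without metrics:", uncovered)
print("metric units not in the plan:", unplanned)
```

If either list is non-empty, the per-policy and per-scenario means that follow do not describe the planned experiment grid.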

## Table 1 — Policy × Scenario Operational Metrics

In [17]:
table_policy_scenario = (
    metrics
    .groupby(["policy_key", "scenario_key"], as_index=False)
    .agg(
        redesign_request_time_mean=("redesign_request_time", "mean"),
        soms_degradation_rate_mean=("soms_degradation_rate", "mean"),
        redesign_request_frequency_mean=("redesign_request_frequency", "mean"),
        total_switches_mean=("total_switches", "mean"),
    )
    .sort_values(["scenario_key", "policy_key"])
)

table_policy_scenario

|   | policy_key | scenario_key | redesign_request_time_mean | soms_degradation_rate_mean | redesign_request_frequency_mean | total_switches_mean |
|---|---|---|---|---|---|---|
| 0 | B_GOVERNED | B_GOVERNED | 36.0 | 0.364133 | 0.4 | 3.0 |
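
When several policies and scenarios are present, the long-form table can also be read as a policy × scenario matrix, one metric at a time. A minimal sketch with made-up values; the key names are borrowed from this notebook purely for illustration.

```python
import pandas as pd

# Toy long-form Table 1 (values are made up).
toy_table = pd.DataFrame({
    "policy_key": ["BASELINE_RANDOM", "BASELINE_RANDOM", "B_GOVERNED", "B_GOVERNED"],
    "scenario_key": ["COLD", "HOT", "COLD", "HOT"],
    "redesign_request_time_mean": [40.0, 12.0, 55.0, 30.0],
})

# One row per policy, one column per scenario.
wide = toy_table.pivot(
    index="policy_key",
    columns="scenario_key",
    values="redesign_request_time_mean",
)
print(wide)
```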


## Table 2 — Policy Ranking (Scenario-agnostic)

With all environments pooled,
this table shows which policy **demands re-design less overall**.

In [18]:
table_policy_rank = (
    metrics
    .groupby("policy_key", as_index=False)
    .agg(
        redesign_request_time_mean=("redesign_request_time", "mean"),
        soms_degradation_rate_mean=("soms_degradation_rate", "mean"),
        redesign_request_frequency_mean=("redesign_request_frequency", "mean"),
    )
    .sort_values([
        "redesign_request_time_mean",
        "soms_degradation_rate_mean",
    ])
)

table_policy_rank

|   | policy_key | redesign_request_time_mean | soms_degradation_rate_mean | redesign_request_frequency_mean |
|---|---|---|---|---|
| 0 | B_GOVERNED | 36.0 | 0.364133 | 0.4 |
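
The sort in the cell above is ascending on both keys, so the policy with the earliest re-design request is listed first. If the ranking is meant to put the least demanding policy on top, the sort direction can be stated per key. The sketch below assumes that a later request time and a smaller degradation slope are both "better"; the policy names and values are hypothetical.

```python
import pandas as pd

# Hypothetical policies with made-up metric means.
toy_rank = pd.DataFrame({
    "policy_key": ["POLICY_A", "POLICY_B", "POLICY_C"],
    "redesign_request_time_mean": [20.0, 45.0, 45.0],
    "soms_degradation_rate_mean": [0.50, 0.30, 0.10],
})

ranked = (
    toy_rank
    .sort_values(
        ["redesign_request_time_mean", "soms_degradation_rate_mean"],
        # Assumption: later request first, then slower degradation first.
        ascending=[False, True],
    )
    .reset_index(drop=True)
)
ranked["rank"] = ranked.index + 1
print(ranked[["rank", "policy_key"]])
```

Here POLICY_B and POLICY_C tie on request time, and the slope breaks the tie in favor of POLICY_C. Whether a smaller slope is actually preferable depends on the sign convention of SoMS, which this sketch does not settle.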


In [19]:
OUT_PATH = ART_CORE10 / "core10_10_metrics_summary.csv"
metrics.to_csv(OUT_PATH, index=False)

print("✅ Exported metrics summary:")
print(OUT_PATH.resolve())
print("rows:", len(metrics))

✅ Exported metrics summary:
/Users/mac/Desktop/De/Developability_Data/core/artifact/core10/core10_10_metrics_summary.csv
rows: 1
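
A round-trip read is a cheap way to confirm the exported file matches the in-memory table. The sketch below writes a toy frame to a temporary directory rather than the real artifact path; the file name and the single row are illustrative only.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy metrics row for illustration.
toy_metrics = pd.DataFrame({
    "experiment_id": ["EXP_000"],
    "redesign_request_time": [36],
    "soms_degradation_rate": [0.364133],
})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "metrics_summary.csv"
    toy_metrics.to_csv(out_path, index=False)  # index=False: no extra index column
    reloaded = pd.read_csv(out_path)

same_columns = list(reloaded.columns) == list(toy_metrics.columns)
same_rows = len(reloaded) == len(toy_metrics)
print("columns match:", same_columns, "| rows match:", same_rows)
```

Writing with `index=False`, as the export cell does, is what keeps an `Unnamed: 0` column from appearing when the CSV is read back.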


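## Review Usage (fits on a one-page summary)

- Table 1:
  → “Even in the Hot / Oscillation environments, the Proposed policy
     requests re-design later and less often”

- Table 2:
  → “The prohibited score-based baseline
     has the fastest SoMS degradation”

These are the statements the tables are meant to support; with the single derived unit in the run above (B_GOVERNED), they are not yet demonstrated.

From these two tables alone:
- Claims about model performance ❌
- Claims about operational policy superiority ✅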