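# 2-Trait vs 5-Trait Steering: Comparative Analysis

**Experiment date**: 2026-01-08

**Objective**: Test whether optimizing over R2 and R4 alone converges to larger weights and thereby improves the steering effect.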

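## 1. Background and Hypothesis

### Problem
- 5-Trait Steering (R1-R5) has **no effect on 84.6% of personas**
- The weights are small (L2 norm < 4), so the steering does not take effect

### Effect analysis by trait
- **R4 (Trait 4)**: p < 0.001, diff = +3.049 → **highly significant**
- **R2 (Trait 2)**: p = 0.0012, diff = +2.344 → **significant**
- **R3 (Trait 3)**: p = 0.1944, diff = +1.082 → marginal (not significant at the 0.05 level)
- **R5 (Trait 5)**: p = 0.4427, diff = +0.480 → **no significant effect**
- **R1 (Trait 1)**: p = 0.6156, diff = +0.177 → **no significant effect**

### Hypothesis
**Removing the unneeded traits (R1, R3, R5) and optimizing over R2 and R4 alone will converge to larger weights and improve the steering effect.**

### Expected effects
1. **More efficient search**: going from 5 dimensions to 2 makes the search easier
2. **Larger L2 norm**: from ~7.18 with 5-Trait to > 8.0 expected with 2-Trait
3. **Better convergence**: with no unneeded dimensions, the optimizer converges to larger weights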
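
The L2-norm expectation above can be restated as a relative increase. The following is a minimal arithmetic sketch that uses only the ~7.18 baseline and the 8.0 target quoted in the list; the variable names are illustrative and not part of the original notebook.

```python
# Relative increase needed to move the 5-Trait mean L2 norm to the 8.0 target.
baseline_l2 = 7.18  # approximate 5-Trait mean L2 norm quoted above
target_l2 = 8.0     # L2 norm hoped for in the 2-Trait run

required_increase_pct = (target_l2 - baseline_l2) / baseline_l2 * 100
print(f"Required increase: {required_increase_pct:+.1f}%")
```

An increase of roughly 11% is therefore what the 2-Trait run would need to deliver on average, which is presumably the origin of the "+11% or more" target used in the conclusion.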

In [1]:
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

# Matplotlib style
plt.style.use('default')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

## 2. Loading the Data

In [2]:
# 2-Trait optimization results (extracted manually from the logs)
two_trait_results = {
    "episode-184019_A": {"R2": 6.643, "R4": -1.315, "fitness": 0.1442},
    "episode-118328_B": {"R2": 4.636, "R4": 0.300, "fitness": 0.1377},
    "episode-239427_A": {"R2": 6.631, "R4": -0.624, "fitness": 0.1601},
    "episode-225888_A": {"R2": 0.537, "R4": 1.148, "fitness": 0.1534}
}

# 5-Trait optimization results
five_trait_results = {}
personas = ["episode-184019_A", "episode-118328_B", "episode-239427_A", "episode-225888_A"]

for persona in personas:
    result_file = f"../optimization_results_26personas/gpu0/{persona}_optimization.json"
    with open(result_file, 'r') as f:
        five_trait_results[persona] = json.load(f)

print("✅ Data loaded successfully")
print(f"Personas: {len(personas)}")

✅ Data loaded successfully
Personas: 4


## 3. Computing the L2 Norm

In [3]:
# Collect the results into a DataFrame
comparison_data = []

for persona in personas:
    # 5-Trait
    weights_5t = five_trait_results[persona]["best_weights"]
    l2_5t = np.sqrt(sum(w**2 for w in weights_5t.values()))
    score_5t = five_trait_results[persona]["best_score"]
    
    # 2-Trait
    r2 = two_trait_results[persona]["R2"]
    r4 = two_trait_results[persona]["R4"]
    l2_2t = np.sqrt(r2**2 + r4**2)
    fitness_2t = two_trait_results[persona]["fitness"]
    
    comparison_data.append({
        "Persona": persona,
        "5T_L2": l2_5t,
        "2T_L2": l2_2t,
        "L2_Change_%": (l2_2t - l2_5t) / l2_5t * 100,
        "5T_Score": score_5t,
        "2T_Fitness": fitness_2t,
        "2T_R2": r2,
        "2T_R4": r4
    })

df_comparison = pd.DataFrame(comparison_data)
df_comparison

| | Persona | 5T_L2 | 2T_L2 | L2_Change_% | 5T_Score | 2T_Fitness | 2T_R2 | 2T_R4 |
|---|---|---|---|---|---|---|---|---|
| 0 | episode-184019_A | 7.670985 | 6.771903 | -11.720548 | 5.0 | 0.1442 | 6.643 | -1.315 |
| 1 | episode-118328_B | 7.999642 | 4.645697 | -41.926196 | 5.0 | 0.1377 | 4.636 | 0.3 |
| 2 | episode-239427_A | 5.073708 | 6.660296 | 31.270768 | 5.0 | 0.1601 | 6.631 | -0.624 |
| 3 | episode-225888_A | 7.970762 | 1.267388 | -84.099534 | 5.0 | 0.1534 | 0.537 | 1.148 |


In [4]:
# Summary statistics
print("=" * 70)
print("L2 norm comparison")
print("=" * 70)
print(f"5-Trait mean L2 norm: {df_comparison['5T_L2'].mean():.3f}")
print(f"2-Trait mean L2 norm: {df_comparison['2T_L2'].mean():.3f}")
print(f"Mean per-persona change: {df_comparison['L2_Change_%'].mean():.1f}%")
print()
print("Target: > 8.0")
print(f"Achieved: {'✅' if df_comparison['2T_L2'].mean() > 8.0 else '❌'}")
print("=" * 70)

L2 norm comparison
5-Trait mean L2 norm: 7.179
2-Trait mean L2 norm: 4.836
Mean per-persona change: -26.6%

Target: > 8.0
Achieved: ❌
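
The -26.6% figure reported above is the average of the four per-persona percentage changes, which is not the same quantity as the percentage change between the two mean norms. The sketch below computes both aggregates from the L2 norms in the comparison output of section 3; the array names are illustrative.

```python
import numpy as np

# L2 norms copied from the comparison output above, one entry per persona.
l2_5t = np.array([7.670985, 7.999642, 5.073708, 7.970762])
l2_2t = np.array([6.771903, 4.645697, 6.660296, 1.267388])

# Aggregate 1: average of the per-persona percentage changes
# (this is what the summary cell reports).
mean_of_changes = float(np.mean((l2_2t - l2_5t) / l2_5t * 100))

# Aggregate 2: percentage change between the two mean L2 norms.
change_of_means = float((l2_2t.mean() - l2_5t.mean()) / l2_5t.mean() * 100)

print(f"Mean of per-persona changes: {mean_of_changes:.1f}%")
print(f"Change of the means:         {change_of_means:.1f}%")
```

Both aggregates are negative, so the conclusion does not depend on which one is used, but the report should state which one it quotes.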


## 4. Visualization: L2 Norm Comparison

In [5]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Bar chart
x = np.arange(len(personas))
width = 0.35

axes[0].bar(x - width/2, df_comparison['5T_L2'], width, label='5-Trait', alpha=0.8)
axes[0].bar(x + width/2, df_comparison['2T_L2'], width, label='2-Trait', alpha=0.8)
axes[0].axhline(y=8.0, color='r', linestyle='--', label='Target (8.0)')
axes[0].set_xlabel('Persona')
axes[0].set_ylabel('L2 Norm')
axes[0].set_title('L2 Norm: 5-Trait vs 2-Trait')
axes[0].set_xticks(x)
axes[0].set_xticklabels([p.split('-')[1].split('_')[0] for p in personas], rotation=45)
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Right: Change percentage
colors = ['green' if c > 0 else 'red' for c in df_comparison['L2_Change_%']]
axes[1].barh(x, df_comparison['L2_Change_%'], color=colors, alpha=0.7)
axes[1].axvline(x=0, color='black', linestyle='-', linewidth=0.8)
axes[1].set_xlabel('Change (%)')
axes[1].set_title('L2 Norm Change: 2-Trait vs 5-Trait')
# Fix the tick positions before labeling them; calling set_yticklabels
# without set_yticks triggers a Matplotlib UserWarning.
axes[1].set_yticks(x)
axes[1].set_yticklabels([p.split('-')[1].split('_')[0] for p in personas])
axes[1].grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.show()

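print(f"\nMean per-persona change: {df_comparison['L2_Change_%'].mean():.1f}%")
print(f"Personas with an increased L2 norm: {sum(df_comparison['L2_Change_%'] > 0)}/{len(df_comparison)}")


Mean per-persona change: -26.6%
Personas with an increased L2 norm: 1/4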




## 5. Breakdown of the 5-Trait Weights

In [6]:
# Extract each of the 5-Trait weights
weights_5t_data = []

for persona in personas:
    weights = five_trait_results[persona]["best_weights"]
    weights_5t_data.append({
        "Persona": persona,
        "R1": weights["R1"],
        "R2": weights["R2"],
        "R3": weights["R3"],
        "R4": weights["R4"],
        "R5": weights["R5"]
    })

df_5t_weights = pd.DataFrame(weights_5t_data)
df_5t_weights

| | Persona | R1 | R2 | R3 | R4 | R5 |
|---|---|---|---|---|---|---|
| 0 | episode-184019_A | -1.51595 | -5.568616 | -1.639567 | -4.648191 | -1.114699 |
| 1 | episode-118328_B | -0.993613 | -5.305302 | -2.877873 | -5.071375 | 0.927243 |
| 2 | episode-239427_A | -2.039684 | 0.380808 | 0.903378 | -4.355167 | -1.285929 |
| 3 | episode-225888_A | -0.618463 | -3.140009 | 4.53817 | -4.161748 | -3.921193 |


In [7]:
# Heatmap: 5-Trait weights (Matplotlib version)
plt.figure(figsize=(10, 6))
weights_matrix = df_5t_weights.set_index('Persona')[["R1", "R2", "R3", "R4", "R5"]]

# Create heatmap using imshow
im = plt.imshow(weights_matrix.values, cmap='RdBu_r', aspect='auto', 
                vmin=-6, vmax=6)

# Add colorbar
cbar = plt.colorbar(im, label='Weight')

# Set ticks and labels
plt.xticks(range(len(weights_matrix.columns)), weights_matrix.columns)
plt.yticks(range(len(weights_matrix.index)), 
           [p.split('-')[1].split('_')[0] for p in weights_matrix.index])

# Add text annotations
for i in range(len(weights_matrix.index)):
    for j in range(len(weights_matrix.columns)):
        text = plt.text(j, i, f'{weights_matrix.values[i, j]:.2f}',
                       ha="center", va="center", color="black", fontsize=9)

plt.title('5-Trait Optimal Weights Heatmap')
plt.xlabel('Trait')
plt.ylabel('Persona')
plt.tight_layout()
plt.show()

## 6. Key Finding: Detailed Analysis of episode-225888_A

In [8]:
# Compare the 5-Trait and 2-Trait weights for episode-225888_A
persona = "episode-225888_A"

print("=" * 70)
print(f"Detailed analysis: {persona}")
print("=" * 70)

# 5-Trait
weights_5t = five_trait_results[persona]["best_weights"]
l2_5t = np.sqrt(sum(w**2 for w in weights_5t.values()))

print("\n5-Trait optimum:")
for trait, weight in weights_5t.items():
    print(f"  {trait}: {weight:>6.2f} (|w|={abs(weight):.2f})")
print(f"  L2 norm: {l2_5t:.2f}")

# 2-Trait
r2 = two_trait_results[persona]["R2"]
r4 = two_trait_results[persona]["R4"]
l2_2t = np.sqrt(r2**2 + r4**2)

print("\n2-Trait optimum:")
print(f"  R2: {r2:>6.2f}")
print(f"  R4: {r4:>6.2f}")
print(f"  L2 norm: {l2_2t:.2f}")

print("\nChange:")
print(f"  L2 norm: {l2_5t:.2f} → {l2_2t:.2f} ({(l2_2t-l2_5t)/l2_5t*100:+.1f}%)")
print("\n⚠️ For this persona, the dropped traits R3 and R5 carry large weights (R3 is the largest)!")
print(f"   |R3| weight: {abs(weights_5t['R3']):.2f}")
print(f"   |R5| weight: {abs(weights_5t['R5']):.2f}")
print("=" * 70)

Detailed analysis: episode-225888_A

5-Trait optimum:
  R1:  -0.62 (|w|=0.62)
  R2:  -3.14 (|w|=3.14)
  R3:   4.54 (|w|=4.54)
  R4:  -4.16 (|w|=4.16)
  R5:  -3.92 (|w|=3.92)
  L2 norm: 7.97

2-Trait optimum:
  R2:   0.54
  R4:   1.15
  L2 norm: 1.27

Change:
  L2 norm: 7.97 → 1.27 (-84.1%)

⚠️ For this persona, the dropped traits R3 and R5 carry large weights (R3 is the largest)!
   |R3| weight: 4.54
   |R5| weight: 3.92


## 7. Visualizing the Importance of Each Trait

In [9]:
# Absolute trait weights for each persona
fig, ax = plt.subplots(figsize=(12, 6))

traits = ["R1", "R2", "R3", "R4", "R5"]
x = np.arange(len(personas))
width = 0.15

for i, trait in enumerate(traits):
    weights_abs = [abs(five_trait_results[p]["best_weights"][trait]) for p in personas]
    ax.bar(x + i*width, weights_abs, width, label=trait, alpha=0.8)

ax.set_xlabel('Persona')
ax.set_ylabel('Absolute Weight')
ax.set_title('Absolute Trait Weights per Persona (5-Trait Optimization)')
ax.set_xticks(x + width * 2)
ax.set_xticklabels([p.split('-')[1].split('_')[0] for p in personas])
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

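print("\nObservations:")
print("- episode-184019_A, episode-118328_B: R2 and R4 dominate")
print("- episode-239427_A: R4 dominates; R2 is close to zero")
print("- episode-225888_A: R3 is the largest weight, and R5 is nearly as large as R4")
print("- The important traits differ from persona to persona!")


Observations:
- episode-184019_A, episode-118328_B: R2 and R4 dominate
- episode-239427_A: R4 dominates; R2 is close to zero
- episode-225888_A: R3 is the largest weight, and R5 is nearly as large as R4
- The important traits differ from persona to persona!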
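
One way to put a number on these observations is the fraction of each persona's squared L2 norm that lies in the R2 and R4 components of its 5-Trait optimum. The sketch below uses the weights from the table in section 5; `kept_share` is a hypothetical helper, not a function from the original notebook.

```python
import numpy as np

traits = ["R1", "R2", "R3", "R4", "R5"]

# 5-Trait optimal weights copied from section 5, in the order R1..R5.
weights_5t = {
    "episode-184019_A": [-1.51595, -5.568616, -1.639567, -4.648191, -1.114699],
    "episode-118328_B": [-0.993613, -5.305302, -2.877873, -5.071375, 0.927243],
    "episode-239427_A": [-2.039684, 0.380808, 0.903378, -4.355167, -1.285929],
    "episode-225888_A": [-0.618463, -3.140009, 4.53817, -4.161748, -3.921193],
}

# Column indices of the traits kept in the 2-Trait run.
kept_idx = [traits.index("R2"), traits.index("R4")]


def kept_share(weights, idx):
    """Fraction of the squared L2 norm carried by the selected components."""
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w[idx] ** 2) / np.sum(w ** 2))


shares = {p: kept_share(w, kept_idx) for p, w in weights_5t.items()}
for persona, share in shares.items():
    print(f"{persona}: {share:.1%} of the squared norm lies in R2 + R4")
```

For episode-225888_A the share is below one half, while for the other three personas it is above 70%. This matches the persona for which the 2-Trait L2 norm dropped the most.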


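## 8. Conclusion

In [10]:
print("=" * 70)
print("Summary of experimental results")
print("=" * 70)

print("\nHypothesis: removing the unneeded traits (R1, R3, R5) increases the L2 norm")
print("\nResult: ❌ hypothesis not supported")

print("\nL2 norm change:")
print(f"  5-Trait mean: {df_comparison['5T_L2'].mean():.2f}")
print(f"  2-Trait mean: {df_comparison['2T_L2'].mean():.2f}")
print(f"  Mean per-persona change: {df_comparison['L2_Change_%'].mean():.1f}% (target: +11% or more)")

print("\nPer-persona results:")
for _, row in df_comparison.iterrows():
    status = "✅" if row['L2_Change_%'] > 0 else "❌"
    print(f"  {status} {row['Persona']}: {row['L2_Change_%']:+.1f}%")

print("\nKey findings:")
print("  1. Statistical significance across personas ≠ importance for an individual persona")
print("  2. The important traits differ from persona to persona")
print("  3. For episode-225888_A, the dropped traits R3 and R5 carry large weights (R3 is the largest)")

print("\nRecommendations:")
print("  ❌ Apply 2-Trait to all personas")
print("  ✅ Select the important traits per persona")
print("  ✅ Consider Adaptive Trait Selection")

print("=" * 70)

Summary of experimental results

Hypothesis: removing the unneeded traits (R1, R3, R5) increases the L2 norm

Result: ❌ hypothesis not supported

L2 norm change:
  5-Trait mean: 7.18
  2-Trait mean: 4.84
  Mean per-persona change: -26.6% (target: +11% or more)

Per-persona results:
  ❌ episode-184019_A: -11.7%
  ❌ episode-118328_B: -41.9%
  ✅ episode-239427_A: +31.3%
  ❌ episode-225888_A: -84.1%

Key findings:
  1. Statistical significance across personas ≠ importance for an individual persona
  2. The important traits differ from persona to persona
  3. For episode-225888_A, the dropped traits R3 and R5 carry large weights (R3 is the largest)

Recommendations:
  ❌ Apply 2-Trait to all personas
  ✅ Select the important traits per persona
  ✅ Consider Adaptive Trait Selection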


## 9. Next Steps

### Option A: Adaptive Trait Selection
Automatically select the most effective combination of traits for each persona
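
As a rough illustration of what per-persona selection could look like, the sketch below keeps the k traits with the largest absolute weight in a persona's 5-Trait optimum. The selection rule and the function `select_top_traits` are assumptions made for this example; the notebook does not specify how the selection would be done, and weight magnitude is only a proxy for steering effectiveness.

```python
def select_top_traits(weights, k=2):
    """Return the k trait names with the largest absolute weight.

    Hypothetical selection rule: the magnitude of the 5-Trait optimum
    is used as a stand-in for how much a trait matters to a persona.
    """
    ranked = sorted(weights, key=lambda trait: abs(weights[trait]), reverse=True)
    return ranked[:k]


# 5-Trait optimal weights copied from section 5.
weights_184019 = {"R1": -1.51595, "R2": -5.568616, "R3": -1.639567,
                  "R4": -4.648191, "R5": -1.114699}
weights_225888 = {"R1": -0.618463, "R2": -3.140009, "R3": 4.53817,
                  "R4": -4.161748, "R5": -3.921193}

print("episode-184019_A:", select_top_traits(weights_184019))
print("episode-225888_A:", select_top_traits(weights_225888))
```

Under this rule episode-184019_A keeps R2 and R4, the same pair used in the 2-Trait run, whereas episode-225888_A keeps R3 and R4.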

### Option B: Persona Clustering
Cluster personas with similar characteristics and determine the optimal trait set for each cluster
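
A possible first step toward clustering is a pairwise similarity between personas. The sketch below assumes, purely for illustration, that the 5-Trait optimal weight vector is the feature describing a persona and that cosine similarity is the measure; neither choice comes from the original notebook, and four personas are too few for a meaningful clustering.

```python
import numpy as np

personas = ["episode-184019_A", "episode-118328_B",
            "episode-239427_A", "episode-225888_A"]

# 5-Trait optimal weights from section 5, one row per persona (R1..R5).
W = np.array([
    [-1.51595, -5.568616, -1.639567, -4.648191, -1.114699],
    [-0.993613, -5.305302, -2.877873, -5.071375, 0.927243],
    [-2.039684, 0.380808, 0.903378, -4.355167, -1.285929],
    [-0.618463, -3.140009, 4.53817, -4.161748, -3.921193],
])

# Cosine similarity: normalize each row to unit length, then take dot products.
unit = W / np.linalg.norm(W, axis=1, keepdims=True)
cos_sim = unit @ unit.T

# Most similar other persona: mask the diagonal so a persona cannot match itself.
masked = cos_sim.copy()
np.fill_diagonal(masked, -np.inf)
nearest = {personas[i]: personas[int(np.argmax(masked[i]))]
           for i in range(len(personas))}

for persona, neighbor in nearest.items():
    print(f"{persona} -> {neighbor}")
```

With these weights, episode-184019_A and episode-118328_B are each other's nearest neighbor, which agrees with the observation that R2 and R4 dominate for both.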

### Option C: Multi-objective Optimization
Search for Pareto-optimal solutions that maximize both the L2 norm and the fitness
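
The notion of Pareto optimality can be sketched with a simple non-dominated filter over (L2 norm, fitness) pairs. In a real run the candidates would be different weight vectors evaluated for a single persona; here the four 2-Trait results from section 3 serve only as stand-in points, and `pareto_front` is a hypothetical helper.

```python
def pareto_front(points):
    """Return the names of points not dominated by any other point.

    Both objectives are maximized. A point is dominated if another point
    is at least as good on both objectives and strictly better on one.
    """
    front = []
    for name, (l2, fit) in points.items():
        dominated = any(
            o_l2 >= l2 and o_fit >= fit and (o_l2 > l2 or o_fit > fit)
            for other, (o_l2, o_fit) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# (L2 norm, fitness) pairs from the 2-Trait results, used as stand-in candidates.
candidates = {
    "episode-184019_A": (6.771903, 0.1442),
    "episode-118328_B": (4.645697, 0.1377),
    "episode-239427_A": (6.660296, 0.1601),
    "episode-225888_A": (1.267388, 0.1534),
}

front = pareto_front(candidates)
print("Non-dominated candidates:", front)
```

The quadratic scan is adequate for a handful of candidates; a dedicated multi-objective optimizer would be needed to search the weight space itself.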