# 83. ITQ Confidence-Based Multi-Probe

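## Purpose
- Use the absolute value of ITQ's internal projection (`Z = V @ rotation_matrix`) as a per-bit "confidence" score
- Evaluate a multi-probe scheme that flips low-confidence bits first
- Compare confidence order against random order (the core comparison)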

## Key Insight
```python
Z = V @ self.rotation_matrix   # (n, 128) real-valued projections
B = (Z > 0).astype(np.uint8)   # quantize by sign
# Small |Z[:, i]| = low confidence in bit i
# → probes that flip this bit are expected to be the most effective
```
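
The ordering idea can be made concrete with a small sketch. `probes_by_confidence` is a hypothetical helper written only for illustration; it is not the `confidence_multiprobe` function imported from `dflsh` below, whose implementation is not shown in this notebook. It generates single-bit-flip probes, lowest `|z|` first, on made-up projection values.

```python
import numpy as np

def probes_by_confidence(z, max_probes):
    """Hypothetical helper: build single-bit-flip probes, least confident bit first."""
    base = (z > 0).astype(np.uint8)                        # sign quantization
    order = np.argsort(np.abs(z), kind="stable")[:max_probes]
    probes = np.repeat(base[None, :], len(order), axis=0)  # one copy of the code per probe
    probes[np.arange(len(order)), order] ^= 1              # flip exactly one bit in each copy
    return base, probes, order

# Toy 8-dimensional projection vector (made-up values)
z = np.array([0.30, -0.02, 0.10, -0.25, 0.01, 0.40, -0.15, 0.05])
base, probes, order = probes_by_confidence(z, max_probes=3)
print("base code :", base)
print("flip order:", order)
print("probes:\n", probes)
```

Each probe differs from the base code in exactly one position, and the positions follow ascending `|z|`. A band-based index would then look up the band containing the flipped bit.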

## 0. Setup

In [1]:
import sys
import numpy as np
import time
from pathlib import Path
from sklearn.metrics.pairwise import cosine_similarity
import warnings
warnings.filterwarnings('ignore')

sys.path.insert(0, '../src')
from itq_lsh import ITQLSH, hamming_distance, hamming_distance_batch
from dflsh import build_band_index, band_filter, confidence_multiprobe

DATA_DIR = Path('../data')
np.random.seed(42)

N_QUERIES = 100
TOP_K = 10
print(f'Configuration: N_QUERIES={N_QUERIES}, TOP_K={TOP_K}')

Configuration: N_QUERIES=100, TOP_K=10


## 1. Data Loading

In [2]:
# English
en_emb = np.load(DATA_DIR / '10k_e5_base_en_embeddings.npy')
en_hashes = np.load(DATA_DIR / '10k_e5_base_en_hashes_128bits.npy')
en_pivot_dist = np.load(DATA_DIR / '10k_e5_base_en_pivot_distances.npy')
en_pivots = np.load(DATA_DIR / 'pivots_8_e5_base_en.npy')

# Japanese
ja_emb = np.load(DATA_DIR / '10k_e5_base_ja_embeddings.npy')
ja_hashes = np.load(DATA_DIR / '10k_e5_base_ja_hashes_128bits.npy')
ja_pivot_dist = np.load(DATA_DIR / '10k_e5_base_ja_pivot_distances.npy')
ja_pivots = np.load(DATA_DIR / 'pivots_8_e5_base_ja.npy')

# MiniLM
minilm_emb = np.load(DATA_DIR / '10k_minilm_embeddings.npy')
minilm_hashes = np.load(DATA_DIR / '10k_minilm_hashes_128bits.npy')
minilm_pivot_dist = np.load(DATA_DIR / '10k_minilm_pivot_distances.npy')
minilm_pivots = np.load(DATA_DIR / 'pivots_8_minilm.npy')

# ITQ models
itq = ITQLSH.load(str(DATA_DIR / 'itq_e5_base_128bits.pkl'))
itq_minilm = ITQLSH.load(str(DATA_DIR / 'itq_minilm_128bits.pkl'))

print(f'English: {en_emb.shape}, Japanese: {ja_emb.shape}, MiniLM: {minilm_emb.shape}')

English: (10000, 768), Japanese: (10000, 768), MiniLM: (10000, 384)


## 2. Confidence Distribution Analysis

In [3]:
# Inspect the confidence distribution of the ITQ projection values
rng = np.random.default_rng(42)
sample_indices = rng.choice(len(en_emb), 1000, replace=False)

# English
_, en_projections = itq.transform_with_confidence(en_emb[sample_indices])
en_confidence = np.abs(en_projections)

# Japanese
_, ja_projections = itq.transform_with_confidence(ja_emb[sample_indices])
ja_confidence = np.abs(ja_projections)

# MiniLM
_, minilm_projections = itq_minilm.transform_with_confidence(minilm_emb[sample_indices])
minilm_confidence = np.abs(minilm_projections)

print('=== Confidence Distribution ===')
print(f'\n--- English E5-base ---')
print(f'Mean confidence: {en_confidence.mean():.4f}')
print(f'Median confidence: {np.median(en_confidence):.4f}')
print(f'Min confidence: {en_confidence.min():.6f}')
print(f'Percentiles: 10%={np.percentile(en_confidence, 10):.4f}, '
      f'25%={np.percentile(en_confidence, 25):.4f}, '
      f'50%={np.percentile(en_confidence, 50):.4f}, '
      f'75%={np.percentile(en_confidence, 75):.4f}, '
      f'90%={np.percentile(en_confidence, 90):.4f}')

# Fraction of low-confidence bits
for threshold in [0.1, 0.2, 0.5]:
    frac = (en_confidence < threshold).mean()
    print(f'  Bits with confidence < {threshold}: {frac*100:.1f}%')

print(f'\n--- Japanese E5-base ---')
print(f'Mean confidence: {ja_confidence.mean():.4f}')
print(f'Median confidence: {np.median(ja_confidence):.4f}')
for threshold in [0.1, 0.2, 0.5]:
    frac = (ja_confidence < threshold).mean()
    print(f'  Bits with confidence < {threshold}: {frac*100:.1f}%')

print(f'\n--- MiniLM ---')
print(f'Mean confidence: {minilm_confidence.mean():.4f}')
print(f'Median confidence: {np.median(minilm_confidence):.4f}')
for threshold in [0.1, 0.2, 0.5]:
    frac = (minilm_confidence < threshold).mean()
    print(f'  Bits with confidence < {threshold}: {frac*100:.1f}%')

=== Confidence Distribution ===

--- English E5-base ---
Mean confidence: 0.0280
Median confidence: 0.0237
Min confidence: 0.000000
Percentiles: 10%=0.0043, 25%=0.0110, 50%=0.0237, 75%=0.0404, 90%=0.0577
  Bits with confidence < 0.1: 99.6%
  Bits with confidence < 0.2: 100.0%
  Bits with confidence < 0.5: 100.0%

--- Japanese E5-base ---
Mean confidence: 0.0288
Median confidence: 0.0255
  Bits with confidence < 0.1: 99.8%
  Bits with confidence < 0.2: 100.0%
  Bits with confidence < 0.5: 100.0%

--- MiniLM ---
Mean confidence: 0.0647
Median confidence: 0.0572
  Bits with confidence < 0.1: 79.4%
  Bits with confidence < 0.2: 99.4%
  Bits with confidence < 0.5: 100.0%
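
The fixed thresholds (0.1, 0.2, 0.5) saturate in the output above: nearly every E5-base bit falls below 0.1, so they say little about which bits are uncertain relative to the rest. One possible alternative is a scale-free measure. The sketch below uses synthetic Gaussian projections, with scales 0.03 and 0.07 chosen only to loosely mimic the different magnitudes seen above; `relative_confidence` is a hypothetical helper, not part of `itq_lsh`.

```python
import numpy as np

def relative_confidence(projections):
    """Hypothetical helper: |Z| divided by its median, so 1.0 means 'typical'."""
    conf = np.abs(projections)
    return conf / np.median(conf)

rng = np.random.default_rng(0)
# Synthetic stand-ins for two models with different projection scales (assumed values)
z_small = rng.normal(scale=0.03, size=(1000, 128))
z_large = rng.normal(scale=0.07, size=(1000, 128))

fractions = {}
for name, z in [("small-scale", z_small), ("large-scale", z_large)]:
    raw_frac = (np.abs(z) < 0.1).mean()               # absolute threshold
    rel_frac = (relative_confidence(z) < 0.5).mean()  # threshold relative to the median
    fractions[name] = (raw_frac, rel_frac)
    print(f"{name}: |z| < 0.1 -> {raw_frac:.3f}, relative < 0.5 -> {rel_frac:.3f}")
```

Under these assumptions the absolute-threshold fraction changes with the projection scale, while the relative fraction stays roughly the same, which makes it easier to compare models.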


## 3. Evaluation Function Definitions

In [4]:
def get_ground_truth(embeddings, qi, top_k=10):
    """Return the ground-truth top-k neighbors by cosine similarity (query excluded)."""
    cos_sims = cosine_similarity(embeddings[qi:qi+1], embeddings)[0]
    cos_sims[qi] = -1
    return set(np.argsort(cos_sims)[-top_k:])


def evaluate_multiprobe(
    embeddings, hashes, itq_model,
    label, band_width=16, max_probes_list=(0, 4, 8, 16, 32),
    order='confidence', candidate_limit=500
):
    """Evaluate multi-probe retrieval (band lookup, Hamming sort, cosine rerank)."""
    rng = np.random.default_rng(42)
    query_indices = rng.choice(len(embeddings), N_QUERIES, replace=False)
    
    bi = build_band_index(hashes, band_width)
    
    # Precompute projection values for all vectors; queries are drawn from them
    _, all_projections = itq_model.transform_with_confidence(embeddings)
    
    results = []
    for max_probes in max_probes_list:
        filter_recalls = []
        final_recalls = []
        cand_counts = []
        times = []
        
        for qi in query_indices:
            gt = get_ground_truth(embeddings, qi, TOP_K)
            query_hash = hashes[qi]
            query_proj = all_projections[qi]
            
            start = time.time()
            
            # Confidence multi-probe
            cands = confidence_multiprobe(
                query_hash, query_proj, bi, band_width,
                max_probes=max_probes, order=order
            )
            cands = cands[cands != qi]
            
            cand_counts.append(len(cands))
            filter_recalls.append(len(gt & set(cands)) / TOP_K)
            
            # Hamming sort + Cosine rerank
            if len(cands) > 0:
                h_dists = hamming_distance_batch(query_hash, hashes[cands])
                top_idx = np.argsort(h_dists)[:candidate_limit]
                final_cands = cands[top_idx]
                
                cand_cos = cosine_similarity(embeddings[qi:qi+1], embeddings[final_cands])[0]
                top_in_cand = final_cands[np.argsort(cand_cos)[-TOP_K:]]
                final_recalls.append(len(gt & set(top_in_cand)) / TOP_K)
            else:
                final_recalls.append(0.0)
            
            times.append(time.time() - start)
        
        results.append({
            'label': label,
            'order': order,
            'band_width': band_width,
            'max_probes': max_probes,
            'candidates': np.mean(cand_counts),
            'candidates_std': np.std(cand_counts),
            'filter_recall': np.mean(filter_recalls),
            'recall_at_k': np.mean(final_recalls),
            'time_ms': np.mean(times) * 1000,
        })
    
    return results


def evaluate_multiprobe_with_pivot(
    embeddings, hashes, pivot_distances, pivots, itq_model,
    label, band_width=16, max_probes_list=(0, 4, 8, 16),
    pivot_threshold=20, order='confidence', candidate_limit=500
):
    """Evaluate multi-probe combined with the pivot filter."""
    rng = np.random.default_rng(42)
    query_indices = rng.choice(len(embeddings), N_QUERIES, replace=False)
    
    bi = build_band_index(hashes, band_width)
    _, all_projections = itq_model.transform_with_confidence(embeddings)
    
    results = []
    for max_probes in max_probes_list:
        filter_recalls = []
        final_recalls = []
        band_counts = []
        pivot_counts = []
        times = []
        
        for qi in query_indices:
            gt = get_ground_truth(embeddings, qi, TOP_K)
            query_hash = hashes[qi]
            query_proj = all_projections[qi]
            
            start = time.time()
            
            # Stage 1: Confidence multi-probe
            band_cands = confidence_multiprobe(
                query_hash, query_proj, bi, band_width,
                max_probes=max_probes, order=order
            )
            band_cands = band_cands[band_cands != qi]
            band_counts.append(len(band_cands))
            
            # Stage 2: Pivot filter
            if len(band_cands) > 0:
                query_pivot_dists = np.array([
                    hamming_distance(query_hash, p) for p in pivots
                ])
                cand_pivot_dists = pivot_distances[band_cands]
                mask = np.ones(len(band_cands), dtype=bool)
                for i in range(len(pivots)):
                    lower = query_pivot_dists[i] - pivot_threshold
                    upper = query_pivot_dists[i] + pivot_threshold
                    mask &= (cand_pivot_dists[:, i] >= lower) & (cand_pivot_dists[:, i] <= upper)
                pivot_cands = band_cands[mask]
            else:
                pivot_cands = band_cands
            
            pivot_counts.append(len(pivot_cands))
            filter_recalls.append(len(gt & set(pivot_cands)) / TOP_K)
            
            # Stage 3: Hamming sort + Cosine rerank
            if len(pivot_cands) > 0:
                h_dists = hamming_distance_batch(query_hash, hashes[pivot_cands])
                top_idx = np.argsort(h_dists)[:candidate_limit]
                final_cands = pivot_cands[top_idx]
                
                cand_cos = cosine_similarity(embeddings[qi:qi+1], embeddings[final_cands])[0]
                top_in_cand = final_cands[np.argsort(cand_cos)[-TOP_K:]]
                final_recalls.append(len(gt & set(top_in_cand)) / TOP_K)
            else:
                final_recalls.append(0.0)
            
            times.append(time.time() - start)
        
        results.append({
            'label': label,
            'order': order,
            'band_width': band_width,
            'max_probes': max_probes,
            'pivot_threshold': pivot_threshold,
            'band_candidates': np.mean(band_counts),
            'pivot_candidates': np.mean(pivot_counts),
            'filter_recall': np.mean(filter_recalls),
            'recall_at_k': np.mean(final_recalls),
            'time_ms': np.mean(times) * 1000,
        })
    
    return results
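
Both functions end with the same two-step rerank: candidates are sorted by Hamming distance, truncated to `candidate_limit`, and the survivors are rescored by cosine similarity. Because the ground truth is the global cosine top-k, any ground-truth item that survives the truncation also lands in the final top-k (barring ties), so a gap between filter recall and Recall@k can only come from the Hamming truncation. A minimal sketch on synthetic data follows; `hamming_then_cosine` is a hypothetical helper standing in for the `itq_lsh` and scikit-learn calls used above, and the random-hyperplane codes are an assumed substitute for ITQ codes.

```python
import numpy as np

def hamming_then_cosine(q_vec, q_code, cand_ids, vectors, codes,
                        top_k=10, candidate_limit=500):
    """Hypothetical helper: Hamming-sort, truncate, then cosine-rerank."""
    if len(cand_ids) == 0:
        return cand_ids
    h = np.count_nonzero(codes[cand_ids] != q_code, axis=1)
    keep = cand_ids[np.argsort(h, kind="stable")[:candidate_limit]]
    v = vectors[keep]
    sims = (v @ q_vec) / (np.linalg.norm(v, axis=1) * np.linalg.norm(q_vec))
    return keep[np.argsort(sims)[::-1][:top_k]]

rng = np.random.default_rng(0)
vectors = rng.normal(size=(2000, 32))
# Random-hyperplane codes as a stand-in for ITQ codes (assumption)
codes = (vectors @ rng.normal(size=(32, 64)) > 0).astype(np.uint8)

qi = 0
cand_ids = np.arange(1, len(vectors))  # every vector except the query
sims_all = vectors[1:] @ vectors[qi] / (
    np.linalg.norm(vectors[1:], axis=1) * np.linalg.norm(vectors[qi]))
gt = set(cand_ids[np.argsort(sims_all)[-10:]].tolist())

for limit in (len(cand_ids), 200):
    top = hamming_then_cosine(vectors[qi], codes[qi], cand_ids, vectors, codes,
                              candidate_limit=limit)
    print(f"candidate_limit={limit}: recall@10 = {len(gt & set(top.tolist())) / 10:.1f}")
```

With no truncation the rerank recovers the full ground truth. How much a tighter `candidate_limit` costs depends on how well Hamming distance tracks cosine similarity for the codes in use.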

## 4. Confidence Order vs. Random Order (English)

In [5]:
print('='*80)
print('Confidence vs Random Probe Order - English E5-base')
print('='*80)

for bw in [8, 16]:
    print(f'\n--- band_width={bw} ({128//bw} bands) ---')
    
    # Confidence order
    conf_results = evaluate_multiprobe(
        en_emb, en_hashes, itq,
        label=f'EN bw={bw}', band_width=bw,
        max_probes_list=[0, 4, 8, 16, 32],
        order='confidence'
    )
    
    # Random order
    rand_results = evaluate_multiprobe(
        en_emb, en_hashes, itq,
        label=f'EN bw={bw}', band_width=bw,
        max_probes_list=[0, 4, 8, 16, 32],
        order='random'
    )
    
    print(f'\n{"Probes":>6} | {"--- Confidence ---":^36} | {"--- Random ---":^36}')
    print(f'{"":>6} | {"Cands":>8} {"Reduction":>10} {"FiltRcl":>8} {"R@10":>8} | '
          f'{"Cands":>8} {"Reduction":>10} {"FiltRcl":>8} {"R@10":>8}')
    print('-' * 85)
    for c, r in zip(conf_results, rand_results):
        c_red = (1 - c['candidates'] / 10000) * 100
        r_red = (1 - r['candidates'] / 10000) * 100
        print(f'{c["max_probes"]:>6} | '
              f'{c["candidates"]:>7.0f} {c_red:>9.1f}% {c["filter_recall"]*100:>7.1f}% {c["recall_at_k"]*100:>7.1f}% | '
              f'{r["candidates"]:>7.0f} {r_red:>9.1f}% {r["filter_recall"]*100:>7.1f}% {r["recall_at_k"]*100:>7.1f}%')

Confidence vs Random Probe Order - English E5-base

--- band_width=8 (16 bands) ---



Probes |          --- Confidence ---          |            --- Random ---           
       |    Cands  Reduction  FiltRcl     R@10 |    Cands  Reduction  FiltRcl     R@10
-------------------------------------------------------------------------------------
     0 |    2181      78.2%    68.9%    66.0% |    2181      78.2%    68.9%    66.0%
     4 |    2446      75.5%    71.3%    68.1% |    2607      73.9%    75.0%    71.5%
     8 |    2793      72.1%    75.6%    71.4% |    2975      70.3%    79.0%    73.9%
    16 |    3757      62.4%    85.1%    78.0% |    3757      62.4%    85.1%    78.0%
    32 |    3757      62.4%    85.1%    78.0% |    3757      62.4%    85.1%    78.0%

--- band_width=16 (8 bands) ---



Probes |          --- Confidence ---          |            --- Random ---           
       |    Cands  Reduction  FiltRcl     R@10 |    Cands  Reduction  FiltRcl     R@10
-------------------------------------------------------------------------------------
     0 |      33      99.7%     7.9%     7.8% |      33      99.7%     7.9%     7.8%
     4 |      40      99.6%     9.0%     8.9% |      48      99.5%    10.5%    10.4%
     8 |      56      99.4%    11.7%    11.6% |      56      99.4%    11.7%    11.6%
    16 |      56      99.4%    11.7%    11.6% |      56      99.4%    11.7%    11.6%
    32 |      56      99.4%    11.7%    11.6% |      56      99.4%    11.7%    11.6%
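
The candidate counts above depend strongly on `band_width`: 8-bit bands return about 2,181 candidates at zero probes, 16-bit bands only about 33. The sketch below shows one common way a band index can work, assuming codes are stored as `(n, 128)` arrays of 0/1 `uint8` values. Both function names are hypothetical, and the real `build_band_index` in `dflsh` may be implemented differently.

```python
from collections import defaultdict
import numpy as np

def build_band_index_sketch(codes, band_width):
    """Hypothetical band index: one dict per band, keyed by the band's bit pattern."""
    n_bands = codes.shape[1] // band_width
    index = [defaultdict(list) for _ in range(n_bands)]
    for row, code in enumerate(codes):
        for b in range(n_bands):
            key = code[b * band_width:(b + 1) * band_width].tobytes()
            index[b][key].append(row)
    return index

def band_candidates_sketch(code, index, band_width):
    """Rows that share at least one full band with `code`."""
    hits = set()
    for b, table in enumerate(index):
        key = code[b * band_width:(b + 1) * band_width].tobytes()
        hits.update(table.get(key, ()))
    return hits

rng = np.random.default_rng(0)
codes = rng.integers(0, 2, size=(2000, 128), dtype=np.uint8)  # synthetic codes
cands = {}
for bw in (8, 16):
    index = build_band_index_sketch(codes, bw)
    cands[bw] = band_candidates_sketch(codes[0], index, bw)
    print(f"band_width={bw}: {len(cands[bw])} candidates (query included)")
```

Since every 16-bit band is made of two adjacent 8-bit bands, a match on a 16-bit band implies a match on both 8-bit bands. In this sketch the `band_width=16` candidates are therefore always a subset of the `band_width=8` candidates.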


## 5. Multi-Probe Evaluation on Japanese Data

In [6]:
print('='*80)
print('Confidence vs Random Probe Order - Japanese E5-base')
print('='*80)

for bw in [8, 16]:
    print(f'\n--- band_width={bw} ({128//bw} bands) ---')
    
    conf_results_ja = evaluate_multiprobe(
        ja_emb, ja_hashes, itq,
        label=f'JA bw={bw}', band_width=bw,
        max_probes_list=[0, 4, 8, 16, 32],
        order='confidence'
    )
    
    rand_results_ja = evaluate_multiprobe(
        ja_emb, ja_hashes, itq,
        label=f'JA bw={bw}', band_width=bw,
        max_probes_list=[0, 4, 8, 16, 32],
        order='random'
    )
    
    print(f'\n{"Probes":>6} | {"--- Confidence ---":^36} | {"--- Random ---":^36}')
    print(f'{"":>6} | {"Cands":>8} {"Reduction":>10} {"FiltRcl":>8} {"R@10":>8} | '
          f'{"Cands":>8} {"Reduction":>10} {"FiltRcl":>8} {"R@10":>8}')
    print('-' * 85)
    for c, r in zip(conf_results_ja, rand_results_ja):
        c_red = (1 - c['candidates'] / 10000) * 100
        r_red = (1 - r['candidates'] / 10000) * 100
        print(f'{c["max_probes"]:>6} | '
              f'{c["candidates"]:>7.0f} {c_red:>9.1f}% {c["filter_recall"]*100:>7.1f}% {c["recall_at_k"]*100:>7.1f}% | '
              f'{r["candidates"]:>7.0f} {r_red:>9.1f}% {r["filter_recall"]*100:>7.1f}% {r["recall_at_k"]*100:>7.1f}%')

Confidence vs Random Probe Order - Japanese E5-base

--- band_width=8 (16 bands) ---



Probes |          --- Confidence ---          |            --- Random ---           
       |    Cands  Reduction  FiltRcl     R@10 |    Cands  Reduction  FiltRcl     R@10
-------------------------------------------------------------------------------------
     0 |     694      93.1%    63.9%    63.9% |     694      93.1%    63.9%    63.9%
     4 |     844      91.6%    67.4%    67.4% |     865      91.3%    70.7%    70.7%
     8 |    1002      90.0%    72.4%    72.4% |    1016      89.8%    76.5%    76.5%
    16 |    1318      86.8%    82.9%    82.7% |    1318      86.8%    82.9%    82.7%
    32 |    1318      86.8%    82.9%    82.7% |    1318      86.8%    82.9%    82.7%

--- band_width=16 (8 bands) ---



Probes |          --- Confidence ---          |            --- Random ---           
       |    Cands  Reduction  FiltRcl     R@10 |    Cands  Reduction  FiltRcl     R@10
-------------------------------------------------------------------------------------
     0 |       5     100.0%     3.8%     3.8% |       5     100.0%     3.8%     3.8%
     4 |       6      99.9%     5.4%     5.4% |       6      99.9%     5.4%     5.4%
     8 |       8      99.9%     7.3%     7.3% |       8      99.9%     7.3%     7.3%
    16 |       8      99.9%     7.3%     7.3% |       8      99.9%     7.3%     7.3%
    32 |       8      99.9%     7.3%     7.3% |       8      99.9%     7.3%     7.3%


## 6. Combined Multi-Probe + Pivot Evaluation

In [7]:
print('='*80)
print('Confidence Multi-probe + Pivot Filter Integration')
print('='*80)

# English
print('\n--- English E5-base ---')
for bw in [8, 16]:
    for pt in [20, 25]:
        results_cp = evaluate_multiprobe_with_pivot(
            en_emb, en_hashes, en_pivot_dist, en_pivots, itq,
            label=f'EN bw={bw} pt={pt}', band_width=bw,
            max_probes_list=[0, 4, 8, 16],
            pivot_threshold=pt, order='confidence'
        )
        
        print(f'\n  band_width={bw}, pivot_threshold={pt}:')
        print(f'  {"Probes":>6} {"Band→":>8} {"→Pivot":>8} {"Reduction":>10} {"FiltRcl":>8} {"R@10":>8} {"Time":>8}')
        print(f'  ' + '-' * 65)
        for r in results_cp:
            total_red = (1 - r['pivot_candidates'] / 10000) * 100
            print(f'  {r["max_probes"]:>6} {r["band_candidates"]:>7.0f} {r["pivot_candidates"]:>7.0f} '
                  f'{total_red:>9.1f}% {r["filter_recall"]*100:>7.1f}% '
                  f'{r["recall_at_k"]*100:>7.1f}% {r["time_ms"]:>7.2f}')

# Japanese
print('\n--- Japanese E5-base ---')
for bw in [8, 16]:
    for pt in [20, 25]:
        results_cp = evaluate_multiprobe_with_pivot(
            ja_emb, ja_hashes, ja_pivot_dist, ja_pivots, itq,
            label=f'JA bw={bw} pt={pt}', band_width=bw,
            max_probes_list=[0, 4, 8, 16],
            pivot_threshold=pt, order='confidence'
        )
        
        print(f'\n  band_width={bw}, pivot_threshold={pt}:')
        print(f'  {"Probes":>6} {"Band→":>8} {"→Pivot":>8} {"Reduction":>10} {"FiltRcl":>8} {"R@10":>8} {"Time":>8}')
        print(f'  ' + '-' * 65)
        for r in results_cp:
            total_red = (1 - r['pivot_candidates'] / 10000) * 100
            print(f'  {r["max_probes"]:>6} {r["band_candidates"]:>7.0f} {r["pivot_candidates"]:>7.0f} '
                  f'{total_red:>9.1f}% {r["filter_recall"]*100:>7.1f}% '
                  f'{r["recall_at_k"]*100:>7.1f}% {r["time_ms"]:>7.2f}')

Confidence Multi-probe + Pivot Filter Integration

--- English E5-base ---



  band_width=8, pivot_threshold=20:
  Probes    Band→   →Pivot  Reduction  FiltRcl     R@10     Time
  -----------------------------------------------------------------
       0    2181    2039      79.6%    68.4%    66.1%    2.73
       4    2446    2282      77.2%    70.8%    67.9%    3.08
       8    2793    2601      74.0%    75.1%    71.0%    3.26
      16    3757    3489      65.1%    84.6%    77.9%    3.72



  band_width=8, pivot_threshold=25:
  Probes    Band→   →Pivot  Reduction  FiltRcl     R@10     Time
  -----------------------------------------------------------------
       0    2181    2158      78.4%    68.8%    66.0%    2.80
       4    2446    2419      75.8%    71.2%    67.9%    3.13
       8    2793    2761      72.4%    75.5%    71.2%    3.30
      16    3757    3713      62.9%    85.0%    77.8%    3.77



  band_width=16, pivot_threshold=20:
  Probes    Band→   →Pivot  Reduction  FiltRcl     R@10     Time
  -----------------------------------------------------------------
       0      33      32      99.7%     7.9%     7.9%    0.81
       4      40      39      99.6%     9.0%     8.9%    0.92
       8      56      55      99.5%    11.7%    11.6%    1.04
      16      56      55      99.5%    11.7%    11.6%    1.06



  band_width=16, pivot_threshold=25:
  Probes    Band→   →Pivot  Reduction  FiltRcl     R@10     Time
  -----------------------------------------------------------------
       0      33      33      99.7%     7.9%     7.9%    0.75
       4      40      40      99.6%     9.0%     8.9%    0.94
       8      56      56      99.4%    11.7%    11.6%    1.01
      16      56      56      99.4%    11.7%    11.6%    0.99

--- Japanese E5-base ---



  band_width=8, pivot_threshold=20:
  Probes    Band→   →Pivot  Reduction  FiltRcl     R@10     Time
  -----------------------------------------------------------------
       0     694     560      94.4%    63.0%    63.0%    2.06
       4     844     676      93.2%    66.5%    66.5%    2.34
       8    1002     798      92.0%    71.5%    71.5%    2.48
      16    1318    1047      89.5%    81.9%    81.9%    2.74



  band_width=8, pivot_threshold=25:
  Probes    Band→   →Pivot  Reduction  FiltRcl     R@10     Time
  -----------------------------------------------------------------
       0     694     646      93.5%    63.8%    63.8%    2.32
       4     844     783      92.2%    67.3%    67.3%    2.57
       8    1002     928      90.7%    72.3%    72.3%    2.60
      16    1318    1221      87.8%    82.8%    82.6%    2.73



  band_width=16, pivot_threshold=20:
  Probes    Band→   →Pivot  Reduction  FiltRcl     R@10     Time
  -----------------------------------------------------------------
       0       5       4     100.0%     3.6%     3.6%    0.61
       4       6       6      99.9%     5.2%     5.2%    0.80
       8       8       8      99.9%     7.1%     7.1%    0.90
      16       8       8      99.9%     7.1%     7.1%    0.86



  band_width=16, pivot_threshold=25:
  Probes    Band→   →Pivot  Reduction  FiltRcl     R@10     Time
  -----------------------------------------------------------------
       0       5       4     100.0%     3.7%     3.7%    0.65
       4       6       6      99.9%     5.3%     5.3%    0.84
       8       8       8      99.9%     7.2%     7.2%    0.89
      16       8       8      99.9%     7.2%     7.2%    0.88
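
The pivot stage relies on the triangle inequality for Hamming distance: for any pivot `p`, `|d(q, p) - d(c, p)| <= d(q, c)`. A candidate is therefore discarded only if its Hamming distance to the query exceeds `pivot_threshold`, which means the filter is lossless only for neighbors inside that radius. The sketch below is a vectorized equivalent of the per-pivot loop in `evaluate_multiprobe_with_pivot`, run on synthetic codes. It assumes `pivot_distances[i, j]` holds the Hamming distance from code `i` to pivot `j`, and `pivot_filter_mask` is a hypothetical name.

```python
import numpy as np

def pivot_filter_mask(query_pivot_dists, cand_pivot_dists, threshold):
    """Keep candidates whose distance to every pivot is within `threshold` of the query's."""
    q = np.asarray(query_pivot_dists, dtype=np.int64)  # signed, avoids unsigned wrap-around
    c = np.asarray(cand_pivot_dists, dtype=np.int64)
    return np.all(np.abs(c - q[None, :]) <= threshold, axis=1)

def hamming(a, b):
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
pivots = rng.integers(0, 2, size=(8, 128), dtype=np.uint8)
query = rng.integers(0, 2, size=128, dtype=np.uint8)

# 50 near neighbors (0..24 flipped bits) followed by 150 unrelated random codes
near = np.repeat(query[None, :], 50, axis=0)
for i in range(len(near)):
    near[i, rng.choice(128, size=i % 25, replace=False)] ^= 1
codes = np.vstack([near, rng.integers(0, 2, size=(150, 128), dtype=np.uint8)])

query_pivot = np.array([hamming(query, p) for p in pivots])
cand_pivot = np.array([[hamming(c, p) for p in pivots] for c in codes])
true_dist = np.array([hamming(query, c) for c in codes])

mask = pivot_filter_mask(query_pivot, cand_pivot, threshold=20)
print(f"kept {mask.sum()} of {len(codes)} candidates")
print("all codes within distance 20 kept:", bool(mask[true_dist <= 20].all()))
```

Casting to a signed integer type before subtracting matters if the stored distances are unsigned, because unsigned subtraction would wrap around instead of going negative.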


## 7. Marginal Recall Improvement per Probe (Diminishing-Returns Analysis)

In [8]:
print('='*80)
print('Marginal Recall Improvement per Probe (English, band_width=8)')
print('='*80)

# Evaluate at a fine-grained range of probe counts
fine_probes = list(range(0, 17))
fine_results = evaluate_multiprobe(
    en_emb, en_hashes, itq,
    label='EN bw=8', band_width=8,
    max_probes_list=fine_probes,
    order='confidence'
)

print(f'\n{"Probes":>6} {"Candidates":>12} {"FilterRecall":>13} {"R@10":>8} {"ΔR@10":>8} {"ΔCands":>8}')
print('-' * 65)
prev_recall = 0.0
prev_cands = 0.0
for r in fine_results:
    delta_r = r['recall_at_k'] * 100 - prev_recall
    delta_c = r['candidates'] - prev_cands
    print(f'{r["max_probes"]:>6} {r["candidates"]:>10.0f} '
          f'{r["filter_recall"]*100:>12.1f}% {r["recall_at_k"]*100:>7.1f}% '
          f'{delta_r:>+7.1f}% {delta_c:>+7.0f}')
    prev_recall = r['recall_at_k'] * 100
    prev_cands = r['candidates']

Marginal Recall Improvement per Probe (English, band_width=8)



Probes   Candidates  FilterRecall     R@10    ΔR@10   ΔCands
-----------------------------------------------------------------
     0       2181         68.9%    66.0%   +66.0%   +2181
     1       2235         69.2%    66.3%    +0.3%     +54
     2       2303         70.1%    67.0%    +0.7%     +69
     3       2369         70.8%    67.7%    +0.7%     +66
     4       2446         71.3%    68.1%    +0.4%     +77
     5       2521         72.6%    69.2%    +1.1%     +75
     6       2612         73.3%    69.6%    +0.4%     +91
     7       2698         74.3%    70.3%    +0.7%     +86
     8       2793         75.6%    71.4%    +1.1%     +95
     9       2891         76.5%    72.2%    +0.8%     +98
    10       3002         77.5%    72.8%    +0.6%    +111
    11       3109         78.4%    73.4%    +0.6%    +107
    12       3242         80.0%    74.3%    +0.9%    +133
    13       3356         81.7%    75.5%    +1.2%    +115
    14       3491         82.9%    76.6%    +1.1%    +135
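
To judge diminishing returns, it can help to normalize each probe's recall gain by the candidates it adds. The sketch below recomputes the deltas from the rounded values printed in the table (probes 0 to 14 only, since those are the rows shown). Because the table's ΔCands column was computed from unrounded means, the differences here can be off by one in places.

```python
import numpy as np

# Values copied from the table above (probes 0..14)
cands = np.array([2181, 2235, 2303, 2369, 2446, 2521, 2612, 2698,
                  2793, 2891, 3002, 3109, 3242, 3356, 3491])
recall = np.array([66.0, 66.3, 67.0, 67.7, 68.1, 69.2, 69.6, 70.3,
                   71.4, 72.2, 72.8, 73.4, 74.3, 75.5, 76.6])

d_cands = np.diff(cands)     # extra candidates added by each probe
d_recall = np.diff(recall)   # extra R@10 points gained by each probe
gain_per_100 = 100 * d_recall / d_cands

for p, (dc, dr, g) in enumerate(zip(d_cands, d_recall, gain_per_100), start=1):
    print(f"probe {p:>2}: +{dc:>3} cands, {dr:+.1f} pts, {g:.2f} pts per 100 cands")
```

A steadily falling `gain_per_100` would indicate diminishing returns, while a flat or noisy series would suggest that later probes are about as useful per candidate as earlier ones.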
  

## 8. MiniLM Verification

In [9]:
print('='*80)
print('MiniLM Verification')
print('='*80)

for bw in [8, 16]:
    print(f'\n--- band_width={bw} ---')
    
    conf_results_m = evaluate_multiprobe(
        minilm_emb, minilm_hashes, itq_minilm,
        label=f'MiniLM bw={bw}', band_width=bw,
        max_probes_list=[0, 4, 8, 16],
        order='confidence'
    )
    
    rand_results_m = evaluate_multiprobe(
        minilm_emb, minilm_hashes, itq_minilm,
        label=f'MiniLM bw={bw}', band_width=bw,
        max_probes_list=[0, 4, 8, 16],
        order='random'
    )
    
    print(f'\n{"Probes":>6} | {"--- Confidence ---":^36} | {"--- Random ---":^36}')
    print(f'{"":>6} | {"Cands":>8} {"Reduction":>10} {"FiltRcl":>8} {"R@10":>8} | '
          f'{"Cands":>8} {"Reduction":>10} {"FiltRcl":>8} {"R@10":>8}')
    print('-' * 85)
    for c, r in zip(conf_results_m, rand_results_m):
        c_red = (1 - c['candidates'] / 10000) * 100
        r_red = (1 - r['candidates'] / 10000) * 100
        print(f'{c["max_probes"]:>6} | '
              f'{c["candidates"]:>7.0f} {c_red:>9.1f}% {c["filter_recall"]*100:>7.1f}% {c["recall_at_k"]*100:>7.1f}% | '
              f'{r["candidates"]:>7.0f} {r_red:>9.1f}% {r["filter_recall"]*100:>7.1f}% {r["recall_at_k"]*100:>7.1f}%')

MiniLM Verification

--- band_width=8 ---



Probes |          --- Confidence ---          |            --- Random ---           
       |    Cands  Reduction  FiltRcl     R@10 |    Cands  Reduction  FiltRcl     R@10
-------------------------------------------------------------------------------------
     0 |     654      93.5%    50.9%    50.9% |     654      93.5%    50.9%    50.9%
     4 |     806      91.9%    54.4%    54.4% |     810      91.9%    57.6%    57.6%
     8 |     955      90.5%    59.5%    59.5% |     961      90.4%    63.2%    63.2%
    16 |    1250      87.5%    71.4%    71.4% |    1250      87.5%    71.4%    71.4%

--- band_width=16 ---



Probes |          --- Confidence ---          |            --- Random ---           
       |    Cands  Reduction  FiltRcl     R@10 |    Cands  Reduction  FiltRcl     R@10
-------------------------------------------------------------------------------------
     0 |       2     100.0%     3.2%     3.2% |       2     100.0%     3.2%     3.2%
     4 |       3     100.0%     4.0%     4.0% |       4     100.0%     4.4%     4.4%
     8 |       5     100.0%     5.5%     5.5% |       5     100.0%     5.5%     5.5%
    16 |       5     100.0%     5.5%     5.5% |       5     100.0%     5.5%     5.5%


## 9. Summary

In [10]:
print('='*80)
print('Confidence Multi-probe Summary')
print('='*80)

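print('\n[Evaluation of confidence-based multi-probe]')
print('1. Confidence distribution: whether |ITQ projection| works as a confidence measure')
print('2. Confidence order vs. random order: whether confidence-ordered probing has an advantage')
print('3. Probe count vs. recall: degree of diminishing returns')
print('4. Pivot integration: combined effect of multi-probe and pivot pruning')
print('\nExperiment 84 compares all pipelines under a unified benchmark.')

Confidence Multi-probe Summary

[Evaluation of confidence-based multi-probe]
1. Confidence distribution: whether |ITQ projection| works as a confidence measure
2. Confidence order vs. random order: whether confidence-ordered probing has an advantage
3. Probe count vs. recall: degree of diminishing returns
4. Pivot integration: combined effect of multi-probe and pivot pruning

Experiment 84 compares all pipelines under a unified benchmark.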
