# Chain‑of‑Thought Faithfulness Experiments
This notebook reproduces the four planned analyses:
1. Category frequency & sequence patterns  
2. Length & entropy metrics  
3. Self‑consistency & backtracking correlation  
4. Explain‑then‑predict (XTP) classification  

Run the notebook top‑to‑bottom. Before running, make sure the **cot_analysis** package is importable and the segmented JSON data directory exists at the path assigned to `data_dir` in the loading cell.

## Install dependencies

In [None]:
!pip install -q pandas numpy seaborn matplotlib scikit-learn nltk textstat tqdm

## Imports & data loading

In [None]:
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from cot_analysis import (
    data_utils, metrics_utils as met, visualization_utils as viz, model_utils as mod
)

sns.set_theme(style='whitegrid')
data_dir = Path('g_cot_cluster/outputs/mmlu/DeepSeek-R1-Distill-Llama-8B')
df = data_utils.load_segmented_directory(data_dir)
seq_df = data_utils.sequence_dataframe(df)
print(f'Loaded {len(df):,} segments from {seq_df.question_id.nunique():,} questions.')

## 1  · Category frequency & sequence patterns

In [None]:
# Category frequencies
freq_df = met.category_frequencies(seq_df)
viz.bar_category_freq(freq_df);


In [None]:
# First‑order transition matrices
mats = met.markov_transition_matrix(seq_df, data_utils.CATEGORY_ORDER)
for hint, mat in mats.items():
    viz.heatmap_transition(mat, f'Transition probabilities — {hint}');
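
For readers without access to the `cot_analysis` source, a first‑order transition matrix can be sketched in plain Python as below. This is an illustration under assumptions, not the package's implementation: `transition_matrix` and the toy category labels are hypothetical, and each row is normalised to sum to 1 (or left as zeros when a category never appears as a source).

```python
from collections import Counter


def transition_matrix(sequences, categories):
    """Row-normalised first-order transition probabilities (hypothetical helper)."""
    counts = Counter()
    for seq in sequences:
        # Count every adjacent (current, next) pair in the chain.
        counts.update(zip(seq, seq[1:]))
    matrix = {}
    for src in categories:
        row_total = sum(counts[(src, dst)] for dst in categories)
        matrix[src] = {
            dst: (counts[(src, dst)] / row_total if row_total else 0.0)
            for dst in categories
        }
    return matrix


# Toy category labels, invented for illustration only.
toy_categories = ["setup", "compute", "verify"]
toy_sequences = [["setup", "compute", "verify"], ["setup", "compute", "compute"]]
toy_matrix = transition_matrix(toy_sequences, toy_categories)
print(toy_matrix["compute"])
```

In the toy data, `compute` is followed once by `verify` and once by `compute`, so that row splits evenly between the two.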


In [None]:
# Jensen–Shannon divergence between bigram distributions
bigram_counts = met.bigram_distributions(seq_df)
js_mat = met.js_divergence_matrix(bigram_counts, data_utils.CATEGORY_ORDER)
viz.heatmap_js(js_mat);
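
The Jensen–Shannon divergence itself can be sketched as follows for two bigram count tables. This is a minimal illustration, assuming base‑2 logarithms (so the value lies in [0, 1]) and non‑empty count tables; `js_divergence` is a hypothetical helper and the package's function may differ in base or smoothing.

```python
import math


def js_divergence(counts_p, counts_q):
    """Jensen-Shannon divergence (base 2) between two count dictionaries."""
    keys = set(counts_p) | set(counts_q)
    total_p = sum(counts_p.values())
    total_q = sum(counts_q.values())
    p = {k: counts_p.get(k, 0) / total_p for k in keys}
    q = {k: counts_q.get(k, 0) / total_q for k in keys}
    # Mixture distribution M = (P + Q) / 2.
    m = {k: 0.5 * (p[k] + q[k]) for k in keys}

    def kl(a, b):
        # Terms with a[k] == 0 contribute nothing to the KL sum.
        return sum(a[k] * math.log2(a[k] / b[k]) for k in keys if a[k] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy bigram counts, invented for illustration only.
js_identical = js_divergence({("a", "b"): 3}, {("a", "b"): 7})
js_disjoint = js_divergence({("a", "b"): 3}, {("b", "a"): 7})
print(js_identical, js_disjoint)
```

Identical distributions give 0 and distributions with no shared bigrams give the maximum of 1, which is a useful sanity check when reading the heatmap.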

## 2  · Length & entropy metrics

In [None]:
metrics_df = met.length_entropy_metrics(seq_df)
viz.dist_length(metrics_df);
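
As a rough illustration of how an entropy metric differs from raw length, the sketch below computes the Shannon entropy (in bits) of the category distribution within one chain. `category_entropy` is a hypothetical helper and the labels are invented; the exact definition used by `length_entropy_metrics` is not shown in this notebook and may differ.

```python
import math
from collections import Counter


def category_entropy(sequence):
    """Shannon entropy in bits of the category labels in one chain."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A long but repetitive chain versus a short but varied one (toy labels).
long_repetitive = ["compute"] * 8
short_varied = ["setup", "compute"]
entropy_long = category_entropy(long_repetitive)
entropy_short = category_entropy(short_varied)
print(len(long_repetitive), entropy_long)
print(len(short_varied), entropy_short)
```

The eight‑step chain has zero entropy while the two‑step chain has one bit, so length and entropy can rank chains in opposite orders.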

## 3  · Self‑consistency & backtracking

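Place a CSV named **accuracy.csv** in the working directory, with columns `question_id` (int) and `accuracy` (0/1). This section correlates the amount of backtracking in each chain with final‑answer correctness; if the file is missing, the analysis is skipped.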

In [None]:
acc_path = Path('accuracy.csv')
if acc_path.exists():
    accuracy = pd.read_csv(acc_path).set_index('question_id')['accuracy']
    r, p = met.backtracking_accuracy_correlation(metrics_df, accuracy)
    print(f'Point‑biserial r = {r:.3f}   p‑value = {p:.4g}')
    viz.scatter_backtracking(metrics_df, accuracy);
else:
    print('accuracy.csv not found — skipping correlation analysis.')
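
The point‑biserial coefficient is the Pearson correlation between a binary variable and a continuous one, so it can be sketched without any library. The helper `point_biserial` and the toy numbers are hypothetical, the inputs are assumed to have non‑zero variance, and the p‑value reported above is not reproduced here.

```python
import math


def point_biserial(binary, values):
    """Pearson correlation between a 0/1 variable and a numeric variable."""
    n = len(values)
    mean_b = sum(binary) / n
    mean_v = sum(values) / n
    cov = sum((b - mean_b) * (v - mean_v) for b, v in zip(binary, values))
    var_b = sum((b - mean_b) ** 2 for b in binary)
    var_v = sum((v - mean_v) ** 2 for v in values)
    return cov / math.sqrt(var_b * var_v)


# Toy data: correctness flags and backtracking counts per question.
toy_correct = [1, 1, 0, 0]
toy_backtracks = [0, 1, 2, 3]
toy_r = point_biserial(toy_correct, toy_backtracks)
print(f"r = {toy_r:.3f}")
```

A negative value, as in this toy data, means questions with more backtracking were answered correctly less often.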

## 4  · Explain‑then‑predict (XTP)

In [None]:
if 'accuracy' in locals():
    X, y = mod.prepare_xy(seq_df, accuracy)
    clf, fpr, tpr, auc_val = mod.train_xtp_logreg(X, y)
    viz.plot_roc(fpr, tpr, f'LogReg  AUC={auc_val:.3f}');
else:
    print('accuracy labels unavailable — skipping XTP experiment.')
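
To help interpret the AUC in the plot title: it equals the probability that a randomly chosen correct example receives a higher score than a randomly chosen incorrect one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; `roc_auc` and the toy scores are hypothetical, both classes are assumed to be present, and this is not how `train_xtp_logreg` is known to compute its value.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores, invented for illustration only.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)
```

An AUC of 0.5 corresponds to chance‑level ranking, which is the baseline to compare the logistic regression against.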

## Summary & next steps
- **Visualise** differences between hint conditions to spot shorter chains or shifts in category usage.
- **Entropy metrics** complement raw length: shorter chains are not necessarily less informative.
- **Backtracking signal** may indicate uncertainty: an r close to 0 would suggest that backtracking has no linear association with correctness.
- A high **XTP AUC** would suggest the CoT structure alone predicts correctness, a useful faithfulness cue.