# Manuscript Claims Registry (Evidence Control)

**Objective**: enforce traceability from each paper claim to the concrete evidence artifacts that support it (logs, tables, figures, checkpoints, code references).

Use this notebook as the single source of truth for what you can safely claim in the paper.


## 1) Why This Exists

A claim is publishable only when it has:
1. an exact metric definition,
2. an exact data split,
3. an exact model, config, and checkpoint,
4. an exact artifact path proving it,
5. a reproducible extraction route.

If any of these is missing, the claim remains **Draft** and must not be promoted to final manuscript language.
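
The five requirements above can be checked mechanically against a registry row. The following is a minimal sketch, not part of the notebook's helpers: the mapping from each requirement to the registry fields of Section 3 is an assumption, and `REQUIREMENT_FIELDS` and `missing_requirements` are hypothetical names. It assumes field values are strings (or `None`).

```python
# Hypothetical mapping from each publishability requirement to the
# registry fields (Section 3) that would satisfy it. This is an assumption.
REQUIREMENT_FIELDS = {
    "metric definition": ["metric_name", "metric_unit"],
    "data split": ["data_split"],
    "model/config/checkpoint": ["model_variant", "checkpoint_ref", "run_metadata_json"],
    "artifact path": ["evidence_paths"],
    "extraction route": ["code_refs", "repro_steps"],
}


def missing_requirements(row):
    """Return the requirements for which at least one mapped field is blank."""
    missing = []
    for requirement, fields in REQUIREMENT_FIELDS.items():
        # A field counts as blank if it is absent, None, or whitespace only.
        if any(not str(row.get(f) or "").strip() for f in fields):
            missing.append(requirement)
    return missing


example_row = {
    "metric_name": "sharpe_ratio",
    "metric_unit": "ratio",
    "data_split": "test_oos",
    "model_variant": "TCN",
    "checkpoint_ref": "",   # not filled in yet
    "evidence_paths": "",   # not filled in yet
    "code_refs": "src/notebook_helpers/tcn_phase1.py::evaluate_experiment6_checkpoint",
    "repro_steps": "Load checkpoint, run deterministic mean eval, extract summary metrics.",
}
print(missing_requirements(example_row))
```

A row that returns a non-empty list would stay at **Draft** under the rule above.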


## 2) Claim Quality Levels

- **Draft**: idea-level claim, evidence not yet locked.
- **Supported**: evidence exists, but reproducibility check not completed.
- **Verified**: evidence + reproducibility + consistency checks passed.
- **Rejected**: claim contradicted by current evidence.

Only **Verified** claims should appear as definitive results in the paper.


## 3) Required Registry Fields

Each claim row must include:

- `claim_id` (stable unique key)
- `paper_section` (e.g., Introduction, Methods, Results)
- `claim_text` (exact sentence-level claim)
- `claim_type` (`method`, `performance`, `robustness`, `implementation`, `risk`)
- `model_variant` (`TCN`, `TCN_ATTENTION`, `TCN_FUSION`)
- `data_split` (`train`, `test_oos`, `stochastic_oos`)
- `eval_track` (`det_mode`, `det_mean`, `stochastic`)
- `metric_name`, `metric_value`, `metric_unit`
- `comparator_name`, `comparator_value` (if relative claim)
- `checkpoint_ref` (episode/checkpoint prefix)
- `run_metadata_json`
- `evidence_paths` (semicolon-separated artifact paths)
- `code_refs` (source files/functions used to compute metric)
- `repro_steps` (brief extraction instructions)
- `status` (`Draft`, `Supported`, `Verified`, `Rejected`)
- `owner`, `last_updated`, `notes`
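
Several of the fields above take values from a fixed list, so a typo such as `det_avg` can be caught before a row is saved. The sketch below assumes the values listed above are exhaustive; `ALLOWED_VALUES` and `vocabulary_errors` are hypothetical names, not existing helpers.

```python
# Controlled vocabularies copied from the field list above.
# Treating them as exhaustive is an assumption.
ALLOWED_VALUES = {
    "claim_type": {"method", "performance", "robustness", "implementation", "risk"},
    "model_variant": {"TCN", "TCN_ATTENTION", "TCN_FUSION"},
    "data_split": {"train", "test_oos", "stochastic_oos"},
    "eval_track": {"det_mode", "det_mean", "stochastic"},
    "status": {"Draft", "Supported", "Verified", "Rejected"},
}


def vocabulary_errors(row):
    """Return one message per field whose value is outside its allowed set."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        value = str(row.get(field) or "").strip()
        # A blank value is also reported, since these fields are required.
        if value not in allowed:
            errors.append(f"{field}={value!r} is not one of {sorted(allowed)}")
    return errors


row = {
    "claim_type": "performance",
    "model_variant": "TCN",
    "data_split": "test_oos",
    "eval_track": "det_avg",  # typo for det_mean
    "status": "Draft",
}
for message in vocabulary_errors(row):
    print(message)
```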


## 4) Recommended Folder Convention

- Registry CSV: `tcn_documentation/claims_registry/claims_registry.csv`
- Snapshots: `tcn_documentation/claims_registry/snapshots/`
- Derived tables for paper: `tcn_documentation/claims_registry/exports/`

Keep artifact paths relative to project root where possible.


In [None]:
from pathlib import Path
import pandas as pd
from datetime import datetime

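PROJECT_ROOT = Path.cwd()
if PROJECT_ROOT.name != 'adaptive_portfolio_rl':
    # Running from a subfolder or from the parent directory: look for the
    # project root among the parents first, then as a direct child of the cwd.
    parents = [p for p in PROJECT_ROOT.parents if p.name == 'adaptive_portfolio_rl']
    candidate = PROJECT_ROOT / 'adaptive_portfolio_rl'
    if parents:
        PROJECT_ROOT = parents[0]
    elif candidate.exists():
        PROJECT_ROOT = candidate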

REGISTRY_DIR = PROJECT_ROOT / 'tcn_documentation' / 'claims_registry'
SNAPSHOT_DIR = REGISTRY_DIR / 'snapshots'
EXPORT_DIR = REGISTRY_DIR / 'exports'

for d in [REGISTRY_DIR, SNAPSHOT_DIR, EXPORT_DIR]:
    d.mkdir(parents=True, exist_ok=True)

REGISTRY_CSV = REGISTRY_DIR / 'claims_registry.csv'
print('Project root:', PROJECT_ROOT)
print('Registry CSV:', REGISTRY_CSV)


In [None]:
CLAIM_COLUMNS = [
    'claim_id',
    'paper_section',
    'claim_text',
    'claim_type',
    'model_variant',
    'data_split',
    'eval_track',
    'metric_name',
    'metric_value',
    'metric_unit',
    'comparator_name',
    'comparator_value',
    'checkpoint_ref',
    'run_metadata_json',
    'evidence_paths',
    'code_refs',
    'repro_steps',
    'status',
    'owner',
    'last_updated',
    'notes',
]

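def init_registry(path=REGISTRY_CSV):
    """Load the registry CSV, or start an empty one with the expected columns."""
    if path.exists():
        # Read every field as text and keep blank cells as '' so they do not
        # become NaN (which str() would later turn into the string 'nan').
        df = pd.read_csv(path, dtype=str, keep_default_na=False)
        for c in CLAIM_COLUMNS:
            if c not in df.columns:
                df[c] = ''
        df = df[CLAIM_COLUMNS]
    else:
        df = pd.DataFrame(columns=CLAIM_COLUMNS)
    return df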

claims_df = init_registry()
print('Rows:', len(claims_df))
claims_df.head(3)


In [None]:
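def upsert_claim(df, row_dict):
    """Insert a claim row, or overwrite the existing row with the same claim_id.

    Fields missing from row_dict are written as '' (an update replaces the
    whole row), and last_updated is always set to the current local time.
    """
    row = {c: row_dict.get(c, '') for c in CLAIM_COLUMNS}
    row['last_updated'] = datetime.now().isoformat(timespec='seconds')
    claim_id = row['claim_id']
    if not claim_id:
        raise ValueError('claim_id is required')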

    mask = df['claim_id'] == claim_id
    if mask.any():
        df.loc[mask, CLAIM_COLUMNS] = [row[c] for c in CLAIM_COLUMNS]
    else:
        df.loc[len(df)] = [row[c] for c in CLAIM_COLUMNS]
    return df

# Example placeholder claim
claims_df = upsert_claim(claims_df, {
    'claim_id': 'RES-TCN-001',
    'paper_section': 'Results',
    'claim_text': 'TCN outperforms baseline on OOS Sharpe in det_mean evaluation.',
    'claim_type': 'performance',
    'model_variant': 'TCN',
    'data_split': 'test_oos',
    'eval_track': 'det_mean',
    'metric_name': 'sharpe_ratio',
    'metric_value': '',
    'metric_unit': 'ratio',
    'comparator_name': 'baseline_sharpe',
    'comparator_value': '',
    'checkpoint_ref': '',
    'run_metadata_json': '',
    'evidence_paths': '',
    'code_refs': 'src/notebook_helpers/tcn_phase1.py::evaluate_experiment6_checkpoint',
    'repro_steps': 'Load checkpoint, run deterministic mean eval, extract summary metrics.',
    'status': 'Draft',
    'owner': 'Owner',
    'notes': 'Fill after full variant campaign.'
})
claims_df.tail(1)


In [None]:
def validate_evidence_paths(df, project_root=PROJECT_ROOT):
    rows = []
    for _, r in df.iterrows():
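        claim_id = str(r.get('claim_id', ''))
        raw = r.get('evidence_paths', '')
        # Blank CSV cells can load as NaN; treat them as empty, not as the path 'nan'.
        evidence = '' if pd.isna(raw) else str(raw).strip()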
        if not evidence:
            rows.append({'claim_id': claim_id, 'path': '', 'exists': False, 'note': 'missing evidence_paths'})
            continue
        for p in [x.strip() for x in evidence.split(';') if x.strip()]:
            path = Path(p)
            if not path.is_absolute():
                path = project_root / path
            rows.append({'claim_id': claim_id, 'path': str(path), 'exists': path.exists(), 'note': ''})
    return pd.DataFrame(rows)

validation_df = validate_evidence_paths(claims_df)
validation_df.head(20)
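
The validation table has one row per evidence path, so a per-claim summary is often easier to act on. A minimal sketch, assuming the records have the `claim_id`, `path`, `exists`, and `note` keys produced above (for example via `validation_df.to_dict('records')`); `claims_with_missing_evidence` is a hypothetical helper and the second claim ID is made up for illustration.

```python
from collections import defaultdict


def claims_with_missing_evidence(records):
    """Group missing or absent evidence paths by claim_id."""
    missing = defaultdict(list)
    for rec in records:
        if not rec["exists"]:
            # Fall back to the note when no path was given at all.
            missing[rec["claim_id"]].append(rec["path"] or rec["note"])
    return dict(missing)


# Illustrative records in the same shape as validation_df rows.
records = [
    {"claim_id": "RES-TCN-001", "path": "", "exists": False, "note": "missing evidence_paths"},
    {"claim_id": "RES-TCN-002", "path": "/tmp/table.csv", "exists": True, "note": ""},
]
print(claims_with_missing_evidence(records))
```

Any claim that appears in the result is not ready to move past **Draft**.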


In [None]:
# Persist registry + timestamped snapshot
claims_df = claims_df[CLAIM_COLUMNS].copy()
claims_df.to_csv(REGISTRY_CSV, index=False)

snapshot_path = SNAPSHOT_DIR / f"claims_registry_snapshot_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
claims_df.to_csv(snapshot_path, index=False)

print('Saved registry:', REGISTRY_CSV)
print('Saved snapshot:', snapshot_path)


In [None]:
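# Export per-status views (only the verified view is manuscript-ready)
# Normalize status so blank (NaN) cells, stray whitespace, or casing do not break the filter.
status = claims_df['status'].fillna('').astype(str).str.strip().str.lower()
verified = claims_df[status == 'verified'].copy()
draft = claims_df[status == 'draft'].copy()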

verified_out = EXPORT_DIR / 'verified_claims.csv'
draft_out = EXPORT_DIR / 'draft_claims.csv'

verified.to_csv(verified_out, index=False)
draft.to_csv(draft_out, index=False)

print('Verified export:', verified_out)
print('Draft export:', draft_out)


## 5) Operating Rule for Writing

Before adding any numeric statement to the manuscript:
1. Add/Update a claim row here.
2. Attach evidence artifacts.
3. Mark the claim as `Verified` only after a rerun/repro check has passed.
4. Use the `claim_id` as an inline comment in the paper drafting notes.

This process prevents accidental over-claiming and keeps results publication-defensible.
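
Step 4 makes it possible to audit a draft automatically. The sketch below assumes drafting notes tag claims with a LaTeX-style comment of the form `% claim: <claim_id>`; that tag format, the `CLAIM_TAG` pattern, the `unverified_references` helper, and the second and third claim IDs are all hypothetical.

```python
import re

# Assumed tag format in drafting notes: "% claim: RES-TCN-001"
CLAIM_TAG = re.compile(r"%\s*claim:\s*([A-Za-z0-9_-]+)")


def unverified_references(draft_text, status_by_id):
    """Return (claim_id, status) for every tagged claim that is not Verified."""
    problems = []
    for claim_id in CLAIM_TAG.findall(draft_text):
        status = status_by_id.get(claim_id, "missing from registry")
        if status.strip().lower() != "verified":
            problems.append((claim_id, status))
    return problems


draft = """
Our model improves OOS Sharpe. % claim: RES-TCN-001
Training converges reliably.   % claim: MET-TCN-002
Latency is unchanged.          % claim: IMP-TCN-003
"""
statuses = {"RES-TCN-001": "Draft", "MET-TCN-002": "Verified"}
print(unverified_references(draft, statuses))
```

With the registry loaded, `status_by_id` could be built as `dict(zip(claims_df['claim_id'], claims_df['status']))`.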
