# IBM Smoke Test (Phase A) Notebook

This notebook reproduces the two-qubit Bell-state smoke test described in the experimental plan.
It gives you an interactive way to validate the IBM hardware path before running larger studies.
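
As a reference point for the numbers this notebook produces, the ideal (noise-free) expectation values of the Bell state can be checked with a few lines of NumPy. This is a minimal sketch that is independent of Quartumse and Qiskit; the names `PAULIS`, `bell`, and `ideal_expectation` are illustrative and do not come from the repository.

```python
import numpy as np

# Single-qubit Pauli matrices.
PAULIS = {
    'I': np.eye(2, dtype=complex),
    'X': np.array([[0, 1], [1, 0]], dtype=complex),
    'Y': np.array([[0, -1j], [1j, 0]], dtype=complex),
    'Z': np.array([[1, 0], [0, -1]], dtype=complex),
}

# |Phi+> = (|00> + |11>) / sqrt(2), the state prepared by H on qubit 0 followed by CX(0, 1).
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)


def ideal_expectation(pauli_str, state):
    """Return <state| P |state> for a two-character Pauli string."""
    # The Bell state is symmetric under qubit exchange, so the tensor-factor
    # ordering does not affect the values computed here.
    operator = np.kron(PAULIS[pauli_str[0]], PAULIS[pauli_str[1]])
    return float(np.real(np.vdot(state, operator @ state)))


for label in ('ZZ', 'XX', 'YY'):
    print(label, round(ideal_expectation(label, bell), 6))
```

On an ideal device ZZ and XX both equal +1 and YY equals -1, which is why the quality checks later in the notebook compare ZZ and XX against 1.0.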

## Prerequisites

1. Install the Quartumse package in the same environment as this notebook (this repository already provides it).
2. Make sure Qiskit is installed and up to date.
3. Set the `QISKIT_IBM_TOKEN` environment variable (the token check also accepts `QISKIT_RUNTIME_API_TOKEN`) to a valid IBM Quantum Platform API token **before** you run any hardware cells.
4. Ensure you have runtime minutes available on the `ibm_torino` backend.

> ⚠️ The notebook will raise an error if the IBM token is missing.

### Option: Set the IBM token in-notebook

If you did not export `QISKIT_IBM_TOKEN` before launching the notebook server, you can set it from inside the notebook instead.
Run the snippet below in a code cell before the token verification cell, with your token string pasted between the quotes.

```python
import os
os.environ["QISKIT_IBM_TOKEN"] = "paste-your-token"
```

In [None]:
# Load environment variables from .env file (if it exists)
try:
    from dotenv import load_dotenv
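    # load_dotenv() returns False when no variables were loaded (for example, no .env file)
    if load_dotenv():
        print('✓ Environment variables loaded from .env file')
    else:
        print('⚠️ No variables loaded from a .env file; relying on the existing environment.')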
except ImportError:
    print('⚠️ python-dotenv not installed. Install with: pip install python-dotenv')
    print('   Alternatively, set environment variables manually before starting Jupyter.')
except Exception as e:
    print(f'⚠️ Could not load .env file: {e}')

# Verify IBM token is available
import os
if not os.environ.get('QISKIT_IBM_TOKEN') and not os.environ.get('QISKIT_RUNTIME_API_TOKEN'):
    raise EnvironmentError(
        'QISKIT_IBM_TOKEN or QISKIT_RUNTIME_API_TOKEN is not set.\n'
        'Options:\n'
        '  1. Create a .env file with: QISKIT_RUNTIME_API_TOKEN=your-token-here\n'
        '  2. Set environment variable before starting Jupyter\n'
        '  3. Set os.environ["QISKIT_IBM_TOKEN"] in a cell above this one (see the in-notebook option)\n'
        'Get your token from: https://quantum.ibm.com'
    )

# Show which token variable was found
token_var = 'QISKIT_IBM_TOKEN' if os.environ.get('QISKIT_IBM_TOKEN') else 'QISKIT_RUNTIME_API_TOKEN'
print(f'✓ IBM token detected from {token_var} (value not shown for security)')

## Imports and helpers

The helpers mirror the standalone smoke-test script so results line up with the plan.

In [None]:
from pathlib import Path
import numpy as np
from qiskit import QuantumCircuit, transpile

from quartumse.connectors import resolve_backend, is_ibm_runtime_backend, create_runtime_sampler
from quartumse.shadows.core import Observable
from quartumse import ShadowEstimator
from quartumse.shadows import ShadowConfig
from quartumse.shadows.config import ShadowVersion
from quartumse.reporting.manifest import MitigationConfig

In [None]:
TRANSPILE_OPTIONS = {
    'optimization_level': 1,
    'seed_transpiler': 314159,
}


def bell_circuit():
    qc = QuantumCircuit(2)
    qc.h(0)
    qc.cx(0, 1)
    return qc


def measure_in_basis(qc, pauli_str):
    circuit = qc.copy()
    for i, axis in enumerate(pauli_str):
        if axis == 'X':
            circuit.h(i)
        elif axis == 'Y':
            circuit.sdg(i)
            circuit.h(i)
    circuit.measure_all()
    return circuit


def parity_from_counts(counts, pauli_str, shots):
    def bit_value(bitstring, index):
        return int(bitstring[-(index + 1)])

    expectation = 0.0
    for bitstring, ct in counts.items():
        weight = ct / shots
        parity = 1.0
        for idx, axis in enumerate(pauli_str):
            if axis != 'I':
                parity *= 1 - 2 * bit_value(bitstring, idx)
        expectation += weight * parity
    return expectation
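
The bit-ordering convention in `parity_from_counts` is easy to get wrong, so here is a standalone sketch of the same idea on hand-written counts. The helper names `parity` and `expectation_from_counts` and the counts are illustrative only, not hardware data. Note that in this notebook's helpers the character at position `i` of the Pauli string refers to qubit `i`, while Qiskit prints qubit 0 as the rightmost character of a bitstring.

```python
def parity(bitstring, pauli_str):
    """Return +1 or -1 for one measured bitstring; pauli_str[i] refers to qubit i."""
    sign = 1
    for qubit, axis in enumerate(pauli_str):
        if axis != 'I':
            # Qiskit prints qubit 0 as the rightmost character of the bitstring.
            sign *= 1 - 2 * int(bitstring[-(qubit + 1)])
    return sign


def expectation_from_counts(counts, pauli_str):
    """Average the per-bitstring parity, weighted by how often each bitstring occurred."""
    shots = sum(counts.values())
    return sum(ct * parity(bits, pauli_str) for bits, ct in counts.items()) / shots


# Illustrative counts: ideal Bell-state outcomes in the Z basis.
ideal_counts = {'00': 125, '11': 125}
print(expectation_from_counts(ideal_counts, 'ZZ'))  # prints 1.0

# '01' means qubit 0 = 1 and qubit 1 = 0.
print(parity('01', 'ZI'), parity('01', 'IZ'))  # prints -1 1
```

For the symmetric strings used in this smoke test (ZZ and XX) the ordering makes no difference, but it matters as soon as you add observables such as XZ.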


## Connect to IBM hardware

This resolves the IBM backend descriptor and prepares the output directory used by the smoke test.

In [None]:
backend, snapshot = resolve_backend('ibm:ibm_torino')
print('Connected to backend:', backend.name)
print('Backend version snapshot:', snapshot)
Path('validation_data').mkdir(exist_ok=True)

## Run direct measurement baselines

These replicate the 250-shot ZZ/XX parity checks. The results feed into the comparison table at the end.

In [None]:
observables = [Observable('ZZ', 1.0), Observable('XX', 1.0)]
direct_shots = {
    'ZZ': 250,
    'XX': 250,
}

direct_results = {}
qc = bell_circuit()

sampler = None
execution_mode = 'backend.run'
if is_ibm_runtime_backend(backend):
    sampler = create_runtime_sampler(backend)
    if sampler is not None:
        execution_mode = 'runtime_sampler'

for obs in observables:
    pauli = obs.pauli_string
    shots = direct_shots[pauli]
    circuit = measure_in_basis(qc, pauli)
    compiled = transpile(circuit, backend, **TRANSPILE_OPTIONS)

    if sampler is not None:
        job = sampler.run([compiled], shots=shots)
        result = job.result()
        counts = result[0].data.meas.get_counts()
    else:
        job = backend.run(compiled, shots=shots)
        counts = job.result().get_counts()

    expectation = parity_from_counts(counts, pauli, shots)

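    try:
        # QuantumCircuit.qasm() was removed in Qiskit 1.0; qasm2.dumps is the replacement
        from qiskit import qasm2
        compiled_qasm = qasm2.dumps(compiled)
    except Exception:
        compiled_qasm = None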

    direct_results[pauli] = {
        'expectation': float(expectation),
        'shots': shots,
        'counts': dict(counts),
        'compiled_circuit_qasm': compiled_qasm,
        'execution_mode': execution_mode,
        'transpile_options': dict(TRANSPILE_OPTIONS),
    }
    print(f"[Direct] {pauli}: expectation={expectation:.3f}, shots={shots}")
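
To judge whether a direct estimate is plausibly limited by shot noise, it helps to know how large the sampling error is at 250 shots. The sketch below uses the standard binomial result for ±1-valued outcomes: the variance of a single outcome is 1 - E², so the standard error of the mean is sqrt((1 - E²) / shots). The function name `parity_standard_error` is illustrative and not part of Quartumse.

```python
import math


def parity_standard_error(expectation, shots):
    """Standard error of a mean of +/-1 outcomes with the given expectation value."""
    variance = max(0.0, 1.0 - expectation ** 2)
    return math.sqrt(variance / shots)


for value in (0.0, 0.8, 0.95):
    se = parity_standard_error(value, shots=250)
    print(f"E={value:+.2f}: standard error ~ {se:.3f}, 95% half-width ~ {1.96 * se:.3f}")
```

Even in the worst case (E = 0) the standard error at 250 shots is about 0.063, well below the 0.4 warning threshold used in the quality checks, so a deviation that large is unlikely to come from sampling noise alone.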


In [None]:
  # Retrieve the completed job results manually
  from qiskit_ibm_runtime import QiskitRuntimeService
  import os

  token = os.environ.get('QISKIT_IBM_TOKEN') or os.environ.get('QISKIT_RUNTIME_API_TOKEN')
  service = QiskitRuntimeService(token=token)

  # Get the most recent completed jobs
  jobs = service.jobs(limit=5, backend_name='ibm_torino')
  for job in jobs:
      status = job.status()
      print(f"Job {job.job_id()}: {status}")

      # Check if status is string 'DONE' or has .name attribute
      status_str = status if isinstance(status, str) else status.name

      if status_str == 'DONE':
          print(f"  ✓ Job completed successfully")
          try:
              result = job.result()
              print(f"  Result type: {type(result)}")

              # Extract counts from SamplerV2 result
              counts = result[0].data.meas.get_counts()
              print(f"  Counts: {counts}")
              print(f"  Total shots: {sum(counts.values())}")

              # For the Bell state both the ZZ and the XX runs should show mostly '00' and '11',
              # so the correlated fraction below is a sanity check and does not
              # distinguish the two observables
              if '00' in counts and '11' in counts:
                  correlation = (counts.get('00', 0) + counts.get('11', 0)) / sum(counts.values())
                  print(f"  Correlation (00+11): {correlation:.3f}")

          except Exception as e:
              print(f"  Error extracting result: {e}")
              import traceback
              traceback.print_exc()

      print()  # Blank line between jobs

## Classical shadows v0 baseline

This cell uses the default (noise-agnostic) shadow workflow with 500 random measurements (`shadow_size=500`).

In [None]:
shadow_v0 = ShadowEstimator(
    backend='ibm:ibm_torino',
    shadow_config=ShadowConfig(
        version=ShadowVersion.V0_BASELINE,
        shadow_size=500,
        random_seed=42,
    ),
    data_dir='validation_data',
)

result_v0 = shadow_v0.estimate(qc, observables, save_manifest=True)
print('Shadow v0 manifest saved to:', result_v0.manifest_path)
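
For intuition about what a shadow estimate is, the sketch below simulates the textbook random-Pauli classical-shadows estimator on an ideal Bell state with NumPy. It is an assumption-laden illustration, not a description of how `ShadowEstimator` is implemented: the helper names (`sample_snapshots`, `estimate_pauli`) are made up, there is no noise, and each snapshot is a single shot in a uniformly random X/Y/Z basis per qubit.

```python
import numpy as np

# Rotations that map the X, Y, or Z eigenbasis onto the computational basis.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S_DAG = np.array([[1, 0], [0, -1j]], dtype=complex)
ROTATIONS = {'X': H, 'Y': H @ S_DAG, 'Z': np.eye(2, dtype=complex)}


def sample_snapshots(state, n_snapshots, rng):
    """Measure a 2-qubit state once per snapshot in uniformly random Pauli bases."""
    snapshots = []
    for _ in range(n_snapshots):
        bases = rng.choice(list('XYZ'), size=2)  # bases[i] is the basis for qubit i
        # Amplitude index is 2*q1 + q0, so qubit 1 is the left Kronecker factor.
        rotation = np.kron(ROTATIONS[bases[1]], ROTATIONS[bases[0]])
        probs = np.abs(rotation @ state) ** 2
        outcome = rng.choice(4, p=probs / probs.sum())
        bits = (int(outcome) & 1, (int(outcome) >> 1) & 1)  # (qubit 0, qubit 1)
        snapshots.append((tuple(bases), bits))
    return snapshots


def estimate_pauli(snapshots, pauli):
    """Average the single-shot estimates; pauli[i] acts on qubit i, 'I' is identity."""
    total = 0.0
    for bases, bits in snapshots:
        value = 1.0
        for axis, basis, bit in zip(pauli, bases, bits):
            if axis == 'I':
                continue
            if axis != basis:
                value = 0.0  # snapshot carries no information about this Pauli
                break
            value *= 3 * (1 - 2 * bit)  # inverse-channel factor 3 per matching qubit
        total += value
    return total / len(snapshots)


bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rng = np.random.default_rng(42)
snapshots = sample_snapshots(bell, 500, rng)
for label in ('ZZ', 'XX', 'YY'):
    print(label, round(estimate_pauli(snapshots, label), 3))
```

Only about one snapshot in nine measures both qubits in the basis a given two-qubit Pauli needs, which is why shadow estimates at a few hundred snapshots are noticeably noisier than a dedicated direct measurement with a similar shot budget. The benefit is that the same snapshots can be reused for other observables.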

## Classical shadows v1 with measurement error mitigation

This run matches the plan's MEM-assisted configuration, using 4×128 calibration shots: 128 shots (`mem_shots`) for each of the four two-qubit computational basis states, 512 in total.

In [None]:
mem_shots = 128
shadow_v1 = ShadowEstimator(
    backend='ibm:ibm_torino',
    shadow_config=ShadowConfig(
        version=ShadowVersion.V1_NOISE_AWARE,
        shadow_size=200,
        random_seed=43,
        apply_inverse_channel=True,
    ),
    mitigation_config=MitigationConfig(techniques=[], parameters={'mem_shots': mem_shots}),
    data_dir='validation_data',
)

result_v1 = shadow_v1.estimate(qc, observables, save_manifest=True)
print('Shadow v1 manifest saved to:', result_v1.manifest_path)
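
The idea behind measurement error mitigation (MEM) can be sketched in a few lines: prepare each computational basis state, record what the device reports, and use the resulting confusion matrix to correct later measurements. The code below is a generic illustration under stated assumptions, not Quartumse's implementation. The function names and all counts are invented for the example, and the correction uses a plain linear solve followed by clipping.

```python
import numpy as np

STATES = ['00', '01', '10', '11']


def confusion_matrix(calibration_counts, shots):
    """Column j holds P(measured state i | prepared state j)."""
    matrix = np.zeros((4, 4))
    for j, prepared in enumerate(STATES):
        for i, measured in enumerate(STATES):
            matrix[i, j] = calibration_counts[prepared].get(measured, 0) / shots
    return matrix


def mitigate(counts, matrix):
    """Return the corrected probability vector for the given raw counts."""
    shots = sum(counts.values())
    noisy = np.array([counts.get(state, 0) / shots for state in STATES])
    corrected = np.linalg.solve(matrix, noisy)
    # A plain inverse can produce small negative entries; clip and renormalise.
    corrected = np.clip(corrected, 0.0, None)
    return corrected / corrected.sum()


# Invented calibration data: 128 shots per prepared basis state (4 x 128 in total).
calibration_counts = {
    '00': {'00': 122, '01': 3, '10': 3},
    '01': {'01': 120, '00': 5, '11': 3},
    '10': {'10': 120, '00': 5, '11': 3},
    '11': {'11': 116, '01': 6, '10': 6},
}
A = confusion_matrix(calibration_counts, shots=128)

# Invented raw counts from a Z-basis Bell measurement.
raw_counts = {'00': 118, '01': 7, '10': 6, '11': 119}
print(dict(zip(STATES, np.round(mitigate(raw_counts, A), 3))))
```

With only 128 shots per calibration state the matrix entries themselves carry sampling noise, so the correction removes the average readout bias but does not reduce statistical scatter.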

## Compare the results

The cell below prints expectation values and (when available) the 95% confidence intervals pulled from the estimator outputs.

In [None]:
def format_ci(ci):
  if ci is None:
      return 'N/A'
  return f"[{ci[0]:.3f}, {ci[1]:.3f}]"

summary_rows = []
for obs in observables:
  obs_key = str(obs)  # Estimator results are keyed by the full string representation
  pauli = obs.pauli_string  # Direct results are keyed by the bare Pauli string
  direct = direct_results[pauli]['expectation']
  v0_stats = result_v0.observables[obs_key]
  v1_stats = result_v1.observables[obs_key]

  summary_rows.append({
      'Observable': pauli,
      'Direct expectation': f"{direct:.3f}",
      'Shadows v0 expectation': f"{v0_stats['expectation_value']:.3f}",
      'Shadows v0 CI95': format_ci(v0_stats.get('ci_95')),
      'Shadows v1 expectation': f"{v1_stats['expectation_value']:.3f}",
      'Shadows v1 CI95': format_ci(v1_stats.get('ci_95')),
  })

import pandas as pd
summary_df = pd.DataFrame(summary_rows)
summary_df

In [None]:
# Debug helper: inspect result_v0 and the keys it uses for observables
print("result_v0 status:")
print(f"  Type: {type(result_v0)}")
print(f"  Has observables? {hasattr(result_v0, 'observables')}")

if hasattr(result_v0, 'observables'):
  print(f"  Observables keys: {list(result_v0.observables.keys())}")
  print(f"  Observables data: {result_v0.observables}")
else:
  print("  No observables attribute!")

# Check what observables were requested
print("\nRequested observables:")
for obs in observables:
  print(f"  Observable: {obs}")
  print(f"  pauli_string: {obs.pauli_string}")
  print(f"  str(obs): {str(obs)}")


## Validation artifacts

The smoke test writes manifest files and raw shot data under `validation_data/`. Use the cell below to confirm that these artifacts were written to disk.

In [None]:
validation_entries = [
    ('Shadow v0 manifest', Path(result_v0.manifest_path)),
    ('Shadow v0 shot data', Path(result_v0.shot_data_path)),
    ('Shadow v1 manifest', Path(result_v1.manifest_path)),
    ('Shadow v1 shot data', Path(result_v1.shot_data_path)),
]

if result_v1.mitigation_confusion_matrix_path:
    validation_entries.append((
        'Shadow v1 MEM confusion matrix',
        Path(result_v1.mitigation_confusion_matrix_path),
    ))

def _candidate_paths(raw_path):
    path_obj = Path(raw_path)
    yield path_obj
    if not path_obj.is_absolute():
        yield Path('validation_data') / path_obj
        yield Path(result_v0.manifest_path).parent / path_obj
        yield Path(result_v1.manifest_path).parent / path_obj

for label, raw_path in validation_entries:
    candidates = list(_candidate_paths(raw_path))
    resolved = next((p for p in candidates if p.exists()), None)
    if resolved is None:
        raise FileNotFoundError(f"Expected artifact missing: {label} -> {raw_path}")
    size_kb = resolved.stat().st_size / 1024
    print(f"\u2713 {label}: {resolved} ({size_kb:.1f} KB)")


In [None]:
import json
import platform
import subprocess
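from datetime import datetime, timezone
UTC = timezone.utc  # datetime.UTC only exists on Python 3.11+; this alias also works on older versions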
from pathlib import Path
import numpy as np

import qiskit
import quartumse

# Custom JSON encoder to handle numpy types
class NumpyEncoder(json.JSONEncoder):
  def default(self, obj):
      if isinstance(obj, np.integer):
          return int(obj)
      if isinstance(obj, np.floating):
          return float(obj)
      if isinstance(obj, np.ndarray):
          return obj.tolist()
      if isinstance(obj, np.bool_):
          return bool(obj)
      return super().default(obj)

EXPECTED_VALUES = {
  'ZZ': 1.0,
  'XX': 1.0,
}
CATASTROPHIC_DELTA = 0.8
WARNING_DELTA = 0.4

quality_checks = []
for obs in observables:
  pauli = obs.pauli_string
  obs_key = str(obs)
  expected = EXPECTED_VALUES.get(pauli, 0.0)
  direct = direct_results[pauli]['expectation']
  v0_stats = result_v0.observables[obs_key]
  v1_stats = result_v1.observables[obs_key]

  direct_delta = abs(direct - expected)
  v0_delta = abs(v0_stats['expectation_value'] - expected)
  v1_delta = abs(v1_stats['expectation_value'] - expected)

  if direct_delta > CATASTROPHIC_DELTA:
      raise AssertionError(
          f"Direct measurement for {pauli} deviates from expectation by {direct_delta:.3f} (> {CATASTROPHIC_DELTA})"
      )
  if v0_delta > CATASTROPHIC_DELTA:
      raise AssertionError(
          f"Shadows v0 result for {pauli} deviates from expectation by {v0_delta:.3f} (> {CATASTROPHIC_DELTA})"
      )
  if v1_delta > CATASTROPHIC_DELTA:
      raise AssertionError(
          f"Shadows v1 result for {pauli} deviates from expectation by {v1_delta:.3f} (> {CATASTROPHIC_DELTA})"
      )

  v0_ci = v0_stats.get('ci_95')
  v1_ci = v1_stats.get('ci_95')
  v0_ci_contains_expected = None if v0_ci is None else (v0_ci[0] <= expected <= v0_ci[1])
  v1_ci_contains_expected = None if v1_ci is None else (v1_ci[0] <= expected <= v1_ci[1])

  quality_checks.append({
      'observable': pauli,
      'expected_value': expected,
      'direct_expectation': direct,
      'shadows_v0_expectation': v0_stats['expectation_value'],
      'shadows_v1_expectation': v1_stats['expectation_value'],
      'direct_delta': direct_delta,
      'shadows_v0_delta': v0_delta,
      'shadows_v1_delta': v1_delta,
      'direct_within_warning_delta': direct_delta <= WARNING_DELTA,
      'shadows_v0_within_warning_delta': v0_delta <= max(WARNING_DELTA, v0_stats.get('ci_width', 0.0)),
      'shadows_v1_within_warning_delta': v1_delta <= max(WARNING_DELTA, v1_stats.get('ci_width', 0.0)),
      'shadows_v0_ci_95': v0_ci,
      'shadows_v1_ci_95': v1_ci,
      'shadows_v0_ci_contains_expected': v0_ci_contains_expected,
      'shadows_v1_ci_contains_expected': v1_ci_contains_expected,
      'direct_vs_v0_delta': abs(direct - v0_stats['expectation_value']),
      'direct_vs_v1_delta': abs(direct - v1_stats['expectation_value']),
  })

  status_flags = []
  status_flags.append('direct OK' if direct_delta <= WARNING_DELTA else 'direct ⚠')
  status_flags.append('v0 OK' if v0_delta <= WARNING_DELTA else 'v0 ⚠')
  status_flags.append('v1 OK' if v1_delta <= WARNING_DELTA else 'v1 ⚠')
  print(f"Quality check {pauli}: " + ', '.join(status_flags))

def _git_output(args):
  try:
      return subprocess.check_output(['git', *args], text=True).strip()
  except Exception:
      return None

git_commit = _git_output(['rev-parse', 'HEAD'])
git_status = _git_output(['status', '--short'])
if git_status == '':
  git_status = 'clean'

backend_snapshot_payload = getattr(snapshot, 'model_dump', None)
if callable(backend_snapshot_payload):
  backend_snapshot_dict = snapshot.model_dump(mode='json')
else:
  backend_snapshot_dict = {
      'backend_name': snapshot.backend_name,
      'backend_version': snapshot.backend_version,
      'num_qubits': snapshot.num_qubits,
      'calibration_timestamp': str(getattr(snapshot, 'calibration_timestamp', None)),
  }

smoke_test_results = {
  'metadata': {
      'test_name': 'preliminary_smoke_test',
      'timestamp': datetime.now(UTC).isoformat(),
      'backend': backend.name,
      'backend_snapshot': backend_snapshot_dict,
      'runtime_sampler_used': sampler is not None,
      'runtime_execution_mode': execution_mode,
      'git_commit': git_commit,
      'git_status': git_status,
      'software_versions': {
          'quartumse': quartumse.__version__,
          'qiskit': qiskit.__version__,
          'python': platform.python_version(),
      },
      'transpile_options': TRANSPILE_OPTIONS,
      'circuit': {
          'type': 'Bell state',
          'num_qubits': 2,
      },
      'observables': [str(obs) for obs in observables],
  },
  'direct_measurements': {
      'method': 'Direct Pauli measurement with basis rotations',
      'total_shots': sum(direct_results[k]['shots'] for k in direct_results),
      'results': direct_results,
  },
  'shadows_v0': {
      'method': 'Classical Shadows v0 (baseline)',
      'experiment_id': result_v0.experiment_id,
      'manifest_path': result_v0.manifest_path,
      'shot_data_path': result_v0.shot_data_path,
      'shadow_size': 500,
      'execution_time': result_v0.execution_time,
      'results': {
          str(obs): {
              'expectation_value': result_v0.observables[str(obs)]['expectation_value'],
              'variance': result_v0.observables[str(obs)]['variance'],
              'ci_95': result_v0.observables[str(obs)].get('ci_95'),
              'ci_width': result_v0.observables[str(obs)]['ci_width'],
          }
          for obs in observables
      },
  },
  'shadows_v1': {
      'method': 'Classical Shadows v1 (noise-aware with MEM)',
      'experiment_id': result_v1.experiment_id,
      'manifest_path': result_v1.manifest_path,
      'shot_data_path': result_v1.shot_data_path,
      'shadow_size': 200,
      'mem_shots': mem_shots,
      'execution_time': result_v1.execution_time,
      'mitigation_confusion_matrix_path': result_v1.mitigation_confusion_matrix_path,
      'results': {
          str(obs): {
              'expectation_value': result_v1.observables[str(obs)]['expectation_value'],
              'variance': result_v1.observables[str(obs)]['variance'],
              'ci_95': result_v1.observables[str(obs)].get('ci_95'),
              'ci_width': result_v1.observables[str(obs)]['ci_width'],
          }
          for obs in observables
      },
  },
  'comparison': {
      'description': 'Side-by-side comparison of all three methods',
      'expected_values': EXPECTED_VALUES,
      'table': summary_rows,
  },
  'analysis': {
      'notes': [
          'Bell state should show ZZ = XX = +1 (perfect correlation)',
          'Noise and finite sampling cause deviations from ideal',
          'v1 (noise-aware) should show reduced error on noisy hardware',
          'All methods should give consistent results within confidence intervals',
      ],
      'quality_checks': quality_checks,
      'thresholds': {
          'catastrophic_delta': CATASTROPHIC_DELTA,
          'warning_delta': WARNING_DELTA,
      },
  },
}

results_filename = f"smoke_test_results_{datetime.now(UTC).strftime('%Y%m%d_%H%M%S')}.json"
results_path = Path('validation_data') / results_filename

with open(results_path, 'w') as f:
  json.dump(smoke_test_results, f, indent=2, cls=NumpyEncoder)

if not results_path.exists():
  raise FileNotFoundError(f'Failed to create results JSON at {results_path}')

print(f"✓ Complete smoke test results saved to: {results_path}")
print(f"  File size: {results_path.stat().st_size / 1024:.1f} KB")
print(f"  Git commit: {git_commit}")
print(f"  Runtime sampler used: {sampler is not None}")
print('Quality checks recorded for analysis JSON.')
print(f"  Thresholds -> catastrophic: {CATASTROPHIC_DELTA}, warning: {WARNING_DELTA}")

## Summary: What Gets Saved

**When you run this notebook, the following data is automatically saved to `validation_data/`:**

### 1. Shadow Estimator Manifests (JSON)
- **Location**: `validation_data/manifests/{experiment_id}.json`
- **Contains**: Complete provenance including circuit, backend calibration, mitigation config, software versions, results
- **Files**: 2 manifests (one for v0, one for v1)
- **Size**: ~10 KB each

### 2. Shot Data (Parquet)
- **Location**: `validation_data/shots/{experiment_id}.parquet`
- **Contains**: All raw measurement outcomes (basis, bitstring) for replay
- **Files**: 2 parquet files (one for v0, one for v1)
- **Size**: ~50-100 KB each

### 3. MEM Calibration Artifacts
- **Location**: `validation_data/mem/{experiment_id}.npz`
- **Contains**: Confusion matrix required for noise-aware replays
- **Files**: 1 NPZ file when MEM is enabled
- **Size**: ~5-20 KB

### 4. Smoke Test Summary (JSON)
- **Location**: `validation_data/smoke_test_results_{timestamp}.json`
- **Contains**: All results from all three methods (direct, v0, v1), git metadata, transpilation options, quality checks
- **Files**: 1 summary file per notebook run
- **Size**: ~5-15 KB

### 5. Direct Measurement Raw Counts
- **Location**: Included in smoke test summary JSON
- **Contains**: Full count dictionaries, compiled circuit QASM, execution mode

**Total storage per smoke test run**: ~130-255 KB (the sum of the per-artifact estimates above)
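
The size estimates above are approximate. To check them against an actual run, you can list what is on disk. This is a small standard-library sketch; `artifact_sizes` is an illustrative helper, not part of Quartumse.

```python
from pathlib import Path


def artifact_sizes(root='validation_data'):
    """Return {relative path: size in KB} for every file under root."""
    root_path = Path(root)
    return {
        path.relative_to(root_path).as_posix(): path.stat().st_size / 1024
        for path in sorted(root_path.rglob('*'))
        if path.is_file()
    }


# Only report when the directory exists, so the cell is safe to run before the smoke test.
if Path('validation_data').is_dir():
    sizes = artifact_sizes()
    for name, kb in sizes.items():
        print(f'{name}: {kb:.1f} KB')
    print(f'Total: {sum(sizes.values()):.1f} KB')
```

The total covers every run stored in the directory, so divide by the number of runs (or filter by experiment ID) when comparing against the per-run estimate.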

**To review later**:
```python
# Load the smoke test summary (replace the placeholder with a real timestamped filename)
import json
from quartumse import ShadowEstimator
from quartumse.reporting.manifest import ProvenanceManifest
from quartumse.shadows.core import Observable

with open('validation_data/smoke_test_results_YYYYMMDD_HHMMSS.json', 'r') as f:
    results = json.load(f)

# Results are keyed by str(observable), matching how the notebook stored them
zz_key = str(Observable('ZZ', 1.0))
print(results['shadows_v1']['results'][zz_key]['expectation_value'])

# Load the full provenance manifest
manifest = ProvenanceManifest.from_json(results['shadows_v1']['manifest_path'])

# Replay with new observables (handles noise-aware MEM replays)
new_observables = [Observable('YY'), Observable('XZ')]
estimator = ShadowEstimator(backend='aer_simulator', data_dir='validation_data')
new_result = estimator.replay_from_manifest(
    results['shadows_v1']['manifest_path'],
    observables=new_observables,
)
print(new_result.observables[str(new_observables[0])]['expectation_value'])
```


## Save Complete Results for Later Review

Export all results (direct, v0, v1, comparison) to JSON for later analysis.