# Tutorial 7 — Working with `qcom.data`

This notebook shows how to use the utilities in **`qcom.data`**:

- `qcom.data.ops` — normalize counts → probabilities, truncate small entries, and print top states.
- `qcom.data.sampling` — resample (generate synthetic datasets) and combine datasets safely.
- `qcom.data.noise` — add classical readout noise (Monte Carlo) and optionally **mitigate** with `mthree` using simple per-qubit error rates.

The examples use small toy datasets, so you can run everything quickly.

In [1]:
# Imports
from qcom.data.ops import normalize_to_probabilities, truncate_probabilities, print_most_probable_data
from qcom.data.sampling import sample_data, combine_datasets
from qcom.data.noise import introduce_error, m3_mitigate_counts_from_rates

import random
from pprint import pprint

# For reproducibility in sampling/noise demos
random.seed(1234)

## 1) Start with a small counts dictionary
Assume you obtained integer **counts** from an experiment or simulation.

In [2]:
counts = {
    "00": 510,
    "01": 230,
    "10": 220,
    "11": 40,
}
total_count = sum(counts.values())
print("Total shots:", total_count)
pprint(counts)

Total shots: 1000
{'00': 510, '01': 230, '10': 220, '11': 40}


## 2) Normalize to probabilities & inspect the most likely states

In [3]:
probs = normalize_to_probabilities(counts, total_count)
print("Sum of probabilities:", sum(probs.values()))
print_most_probable_data(probs, n=4)

Sum of probabilities: 1.0
Top 4 Most probable bit strings:
1.  Bit string: 00, Probability: 0.51000000
2.  Bit string: 01, Probability: 0.23000000
3.  Bit string: 10, Probability: 0.22000000
4.  Bit string: 11, Probability: 0.04000000
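
To make the arithmetic explicit, here is a minimal pure-Python sketch of what normalizing and ranking amount to. The helper names `normalize_counts` and `top_states` are hypothetical and are not part of `qcom`; the library's own implementation may differ.

```python
def normalize_counts(counts, total=None):
    """Divide every count by the total number of shots (hypothetical helper)."""
    if total is None:
        total = sum(counts.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {state: n / total for state, n in counts.items()}


def top_states(probs, n):
    """Return the n (state, probability) pairs with the highest probability."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:n]


example_counts = {"00": 510, "01": 230, "10": 220, "11": 40}
example_probs = normalize_counts(example_counts)
for rank, (state, p) in enumerate(top_states(example_probs, 2), start=1):
    print(f"{rank}. {state}: {p:.8f}")
```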


## 3) Truncate small probabilities (no renormalization)
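You might want to ignore a tail of tiny entries during reporting or plotting. Truncation drops entries below the threshold without rescaling the rest, so the kept values can sum to less than 1.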

In [4]:
truncated = truncate_probabilities(probs, threshold=0.05)
print("Kept entries ≥ 0.05:")
pprint(truncated)
print("Sum after truncation (no renorm):", sum(truncated.values()))

Kept entries ≥ 0.05:
{'00': 0.51, '01': 0.23, '10': 0.22}
Sum after truncation (no renorm): 0.96
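
If you need the kept entries to sum to 1 again, you can rescale them yourself. The sketch below assumes the same "keep entries ≥ threshold" rule printed above. Both helper names are hypothetical and not taken from `qcom`.

```python
def truncate(probs, threshold):
    """Keep only entries whose probability is at least `threshold`."""
    return {state: p for state, p in probs.items() if p >= threshold}


def renormalize(probs):
    """Rescale the remaining probabilities so that they sum to 1."""
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("nothing left to renormalize")
    return {state: p / total for state, p in probs.items()}


toy_probs = {"00": 0.51, "01": 0.23, "10": 0.22, "11": 0.04}
kept = truncate(toy_probs, threshold=0.05)
rescaled = renormalize(kept)
print(sorted(kept))            # states that survive the cut
print(sum(rescaled.values()))  # close to 1 after rescaling
```

Whether to renormalize depends on the use: for plotting the raw values are usually fine, while anything that treats the result as a distribution (entropy, sampling) generally needs the rescaled version.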


## 4) Sampling: generate a synthetic dataset from counts
Use `sample_data` to resample a new integer-count dataset of a desired size.

In [5]:
sample_size = 2000
sampled_counts = sample_data(counts, total_count=total_count, sample_size=sample_size, show_progress=False)
print("Total sampled:", sum(sampled_counts.values()))
pprint(sampled_counts)

Total sampled: 2000
{'00': 994, '01': 465, '10': 464, '11': 77}
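
Conceptually, resampling draws `sample_size` bitstrings with replacement, weighted by the original counts. The sketch below shows that idea with the standard library only. `resample_counts` is a hypothetical name, and `sample_data` may be implemented differently, so the exact numbers will not match the output above.

```python
import random
from collections import Counter


def resample_counts(counts, sample_size, seed=None):
    """Draw `sample_size` bitstrings with replacement, weighted by `counts`."""
    rng = random.Random(seed)  # local generator, leaves the global seed alone
    states = list(counts)
    weights = [counts[state] for state in states]
    draws = rng.choices(states, weights=weights, k=sample_size)
    return dict(Counter(draws))


toy_counts = {"00": 510, "01": 230, "10": 220, "11": 40}
resampled = resample_counts(toy_counts, sample_size=2000, seed=1234)
print(sum(resampled.values()))
```

States that happen to receive no draws are simply absent from the result, which matters for rare bitstrings and small sample sizes.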


## 5) Combine datasets safely
You can merge **two probability dictionaries** (the result is renormalized) **or** merge **two counts dictionaries** directly. Mixing a probability dictionary with a counts dictionary raises an error to prevent silent mistakes.

In [6]:
# Merge two counts dicts
more_counts = {"00": 50, "01": 40, "10": 10, "11": 0}
merged_counts = combine_datasets(counts, more_counts, show_progress=False)
print("Merged counts total:", sum(merged_counts.values()))
pprint(merged_counts)

# Merge two probability dicts
probs_a = normalize_to_probabilities(counts, total_count)
probs_b = normalize_to_probabilities(more_counts, sum(more_counts.values()))
merged_probs = combine_datasets(probs_a, probs_b, show_progress=False)
print("Merged probs sum:", sum(merged_probs.values()))
print_most_probable_data(merged_probs, n=4)

Merged counts total: 1100
{'00': 560, '01': 270, '10': 230, '11': 40}
Merged probs sum: 1.0
Top 4 Most probable bit strings:
1.  Bit string: 00, Probability: 0.50500000
2.  Bit string: 01, Probability: 0.31500000
3.  Bit string: 10, Probability: 0.16000000
4.  Bit string: 11, Probability: 0.02000000
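
The merged values above are consistent with summing the two dictionaries key by key and, for probabilities, dividing by the new total (for example, (0.51 + 0.50) / 2 = 0.505 for `00`). The sketch below reproduces that arithmetic. The function names and the "sums to 1" heuristic for detecting probabilities are assumptions for illustration, not the checks `combine_datasets` actually performs.

```python
import math


def looks_like_probabilities(data, tol=1e-6):
    """Heuristic (an assumption): treat a dict as probabilities if it sums to 1."""
    return math.isclose(sum(data.values()), 1.0, abs_tol=tol)


def combine(a, b):
    """Add two datasets key by key and renormalize if both are probabilities."""
    a_is_prob = looks_like_probabilities(a)
    b_is_prob = looks_like_probabilities(b)
    if a_is_prob != b_is_prob:
        raise ValueError("cannot mix a probability dict with a counts dict")
    merged = {key: a.get(key, 0) + b.get(key, 0) for key in sorted(set(a) | set(b))}
    if a_is_prob:
        total = sum(merged.values())
        merged = {key: value / total for key, value in merged.items()}
    return merged


counts_a = {"00": 510, "01": 230, "10": 220, "11": 40}
counts_b = {"00": 50, "01": 40, "10": 10, "11": 0}
probs_a = {key: value / 1000 for key, value in counts_a.items()}
probs_b = {key: value / 100 for key, value in counts_b.items()}

merged_counts_sketch = combine(counts_a, counts_b)
merged_probs_sketch = combine(probs_a, probs_b)
print(merged_counts_sketch)
```

Note that merging probabilities weights both datasets equally, while merging counts weights each dataset by its number of shots, so the two routes generally give different distributions.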


## 6) Add classical readout noise (Monte Carlo)
`introduce_error` simulates independent per-qubit flips **on the measured bitstrings**.

- `ground_rate`: probability that a measured `0` flips to `1`.
- `excited_rate`: probability that a measured `1` flips to `0`.

You can pass a **single float** (same for all qubits), a **list/tuple of per-qubit rates**, or a **{index → rate} mapping** for sparse overrides.
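
One way to picture the three accepted forms is to expand each of them into a plain per-qubit list. The helper below is a hypothetical sketch, not `qcom` code. In particular, the `default` used for qubits missing from a mapping is an assumption made explicit as a parameter, and the library may choose a different fallback.

```python
def expand_rates(spec, n_qubits, default=0.0):
    """Expand a scalar, sequence, or {index: rate} mapping into a per-qubit list."""
    if isinstance(spec, (int, float)):
        # Scalar: the same rate for every qubit.
        return [float(spec)] * n_qubits
    if isinstance(spec, dict):
        # Mapping: start from the default and override the listed qubits.
        rates = [float(default)] * n_qubits
        for index, rate in spec.items():
            if not 0 <= index < n_qubits:
                raise IndexError(f"qubit index {index} out of range")
            rates[index] = float(rate)
        return rates
    # Sequence: one rate per qubit, length must match.
    rates = [float(rate) for rate in spec]
    if len(rates) != n_qubits:
        raise ValueError("expected one rate per qubit")
    return rates


print(expand_rates(0.01, 2))
print(expand_rates([0.02, 0.0], 2))
print(expand_rates({1: 0.1}, 3))
```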

In [7]:
# Global scalar rates
noisy_counts_global = introduce_error(counts, ground_rate=0.01, excited_rate=0.08, seed=1234)
print("Noisy counts (global rates) total:", sum(noisy_counts_global.values()))
pprint(noisy_counts_global)

# Per-qubit rates (here 2 qubits)
noisy_counts_per_qubit = introduce_error(counts, ground_rate=[0.02, 0.00], excited_rate=[0.05, 0.10], seed=1234)
print("Noisy counts (per-qubit rates) total:", sum(noisy_counts_per_qubit.values()))
pprint(noisy_counts_per_qubit)

Noisy counts (global rates) total: 1000
{'00': 542, '01': 212, '10': 207, '11': 39}
Noisy counts (per-qubit rates) total: 1000
{'00': 544, '01': 201, '10': 216, '11': 39}
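
The Monte Carlo idea can be sketched in a few lines: visit every shot, and flip each bit independently with the rate that matches its measured value. This is an illustrative stand-in with scalar rates only. `flip_bits` is a hypothetical name, and because `introduce_error` may draw its random numbers differently, the counts will not reproduce the output above.

```python
import random
from collections import Counter


def flip_bits(counts, ground_rate, excited_rate, seed=None):
    """Apply independent readout flips to every shot in a counts dict."""
    rng = random.Random(seed)
    noisy = Counter()
    for bitstring, shots in counts.items():
        for _ in range(shots):
            out = []
            for bit in bitstring:
                # A measured 0 flips with ground_rate, a measured 1 with excited_rate.
                rate = ground_rate if bit == "0" else excited_rate
                if rng.random() < rate:
                    bit = "1" if bit == "0" else "0"
                out.append(bit)
            noisy["".join(out)] += 1
    return dict(noisy)


toy_counts = {"00": 510, "01": 230, "10": 220, "11": 40}
noisy_sketch = flip_bits(toy_counts, ground_rate=0.01, excited_rate=0.08, seed=1234)
print(sum(noisy_sketch.values()))
```

Flips move shots between bitstrings but never create or destroy them, which is why the totals printed above stay at 1000.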


## 7) Optional: Readout **mitigation** with `mthree`
If you have the `mthree` package installed (`pip install qiskit-addon-mthree`), you can mitigate readout error using only two per-qubit numbers: `ground_rate` (0→1) and `excited_rate` (1→0). Under the hood, we build the confusion matrices and apply `M3Mitigation`.

> Internally, `mthree` produces **quasi-probabilities**, which may include tiny negative values. For convenience, the function clips those values and renormalizes, so the returned dictionary is a proper probability distribution.

In [8]:
try:
    import mthree  # noqa: F401
    have_m3 = True
except Exception:
    have_m3 = False

if have_m3:
    mitigated = m3_mitigate_counts_from_rates(
        noisy_counts_global,
        ground_rate=0.01,
        excited_rate=0.08,
        qubits=None,  # defaults to [0..N-1]
    )
    print("Mitigated distribution (sum=", sum(mitigated.values()), ")")
    print_most_probable_data(mitigated, n=4)
else:
    print("mthree not installed; skipping mitigation demo. Install with: pip install qiskit-addon-mthree")

Mitigated distribution (sum= 0.9999999999999999 )
Top 4 Most probable bit strings:
1.  Bit string: 00, Probability: 0.51703898
2.  Bit string: 01, Probability: 0.22362034
3.  Bit string: 10, Probability: 0.21812584
4.  Bit string: 11, Probability: 0.04121483
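
For intuition about what the two rates encode, the sketch below builds the per-qubit confusion matrix, applies it to an ideal distribution, and then undoes it by direct 2×2 inversion on every bit. This is plain matrix inversion written for illustration, not the algorithm `mthree` uses, and all function names are hypothetical. Because the same matrix is applied to every bit position, the bit-ordering convention does not matter here.

```python
def confusion_matrix(ground_rate, excited_rate):
    """2x2 matrix indexed as [measured bit][true bit]."""
    return [
        [1.0 - ground_rate, excited_rate],
        [ground_rate, 1.0 - excited_rate],
    ]


def inverse_confusion_matrix(ground_rate, excited_rate):
    """Closed-form inverse of the 2x2 confusion matrix."""
    det = 1.0 - ground_rate - excited_rate
    if abs(det) < 1e-12:
        raise ValueError("confusion matrix is singular")
    return [
        [(1.0 - excited_rate) / det, -excited_rate / det],
        [-ground_rate / det, (1.0 - ground_rate) / det],
    ]


def apply_to_every_bit(dist, matrix):
    """Apply the same 2x2 matrix along each bit position of a distribution."""
    n_bits = len(next(iter(dist)))
    current = dict(dist)
    for pos in range(n_bits):
        updated = {}
        for state, weight in current.items():
            old_bit = int(state[pos])
            for new_bit in (0, 1):
                factor = matrix[new_bit][old_bit]
                if factor == 0.0:
                    continue
                new_state = state[:pos] + str(new_bit) + state[pos + 1:]
                updated[new_state] = updated.get(new_state, 0.0) + factor * weight
        current = updated
    return current


def clip_and_renormalize(dist):
    """Clip negative quasi-probabilities to zero and rescale to sum to 1."""
    clipped = {state: max(p, 0.0) for state, p in dist.items()}
    total = sum(clipped.values())
    return {state: p / total for state, p in clipped.items()}


ideal = {"00": 0.51, "01": 0.23, "10": 0.22, "11": 0.04}
noisy_dist = apply_to_every_bit(ideal, confusion_matrix(0.01, 0.08))
quasi = apply_to_every_bit(noisy_dist, inverse_confusion_matrix(0.01, 0.08))
recovered = clip_and_renormalize(quasi)
print({state: round(p, 6) for state, p in sorted(recovered.items())})
```

With exact expected probabilities the inversion recovers the ideal distribution. With finite-shot data, as in the cell above, the result only approximates it, and the intermediate values can dip slightly below zero before clipping.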


## 8) Where to go next
- Use `qcom.io` (JSON/text/Parquet) to load/save datasets, then process them with the functions shown here.
- Try your own per-qubit error maps to stress-test mitigation.
- For larger bitstrings, prefer working with truncated or sparse dicts when printing/plotting.