# QCOM I/O Tutorial

This tutorial shows how to **load** measurement and probability data from a few common formats and (optionally) **save** them back out using QCOM's I/O helpers:

- `qcom.io.aquila.parse_json` — read **QuEra Aquila** JSON job results
- `qcom.io.parquet.parse_parquet` / `save_dict_to_parquet` — read/write **Parquet**
- `qcom.io.text.parse_file` / `save_data` — read/write **plaintext** files (`<bitstring> <value>`)  

> The examples below assume your repository layout has an `example_data` folder at the project root. Paths are set accordingly. If your paths differ, just edit `DATA_DIR` below.

**Note:** The full module paths above show where each function lives. Each function is also exposed at the top level of the package, so you can call it directly as `qcom.<function>` (for example `qc.parse_json` after `import qcom as qc`).


## Contents
1) [Setup](#setup)
2) [Load Aquila JSON](#aquila)
3) [Load Parquet](#parquet)
4) [Load Plaintext](#text)
5) [Normalize to probabilities](#normalize)
6) [Save examples (commented)](#save)


## 1) Setup <a id='setup'></a>
Install (if needed):
```bash
pip install pyarrow pandas
```
Now import the I/O helpers and set paths to your local example files.

```python
import qcom as qc
```

```python
# --- Paths (edit if your layout differs) ---
DATA_DIR = "../example_data"
AQUILA_JSON = f"{DATA_DIR}/16_atom_aquila.json"
PARQUET_FILE = f"{DATA_DIR}/1_billion_counts_3.0_4_rungs.parquet"
TEXT_FILE = f"{DATA_DIR}/1_billion_3.0_4_rungs.txt"

print("Data paths:\n ", AQUILA_JSON, "\n ", PARQUET_FILE, "\n ", TEXT_FILE)
```

```text
Data paths:
  ../example_data/16_atom_aquila.json
  ../example_data/1_billion_counts_3.0_4_rungs.parquet
  ../example_data/1_billion_3.0_4_rungs.txt
```
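Before loading anything, it can help to confirm which input files are actually present. The sketch below uses only `pathlib`. `check_paths` is a hypothetical helper for illustration and is not part of QCOM. The file names are the ones set above.

```python
from pathlib import Path

# Hypothetical helper (not part of QCOM): report which expected input files exist.
def check_paths(data_dir: str, names: list[str]) -> dict[str, bool]:
    base = Path(data_dir)
    return {name: (base / name).is_file() for name in names}

status = check_paths("../example_data", [
    "16_atom_aquila.json",
    "1_billion_counts_3.0_4_rungs.parquet",
    "1_billion_3.0_4_rungs.txt",
])
for name, ok in status.items():
    print(f"{'found  ' if ok else 'MISSING'}  {name}")
```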


## 2) Load Aquila JSON <a id='aquila'></a>
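`qc.parse_json` returns a **counts dict** (bitstring → count) and a **total count**.
With `sorted=True` (the default), it drops shots with missing atoms. It also inverts the `postSequence` bits (0↔1) to match QCOM's historical convention. `print_most_probable_data` labels every value "Probability", so the numbers it prints for this cell are raw shot counts.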
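The filtering and bit inversion described above can be sketched in plain Python. This sketch makes several assumptions. Each shot is modelled as a dict holding `preSequence` and `postSequence` lists of 0/1 values. A 0 in `preSequence` is taken to mean a missing atom. The real JSON nesting that `qc.parse_json` reads is not reproduced here. `count_shots` is a hypothetical name and is not QCOM's implementation.

```python
from collections import Counter

# Hypothetical sketch (not QCOM's parser). Shots are assumed to be dicts with
# per-site 0/1 lists under "preSequence" and "postSequence".
def count_shots(shots, drop_missing=True, invert=True):
    counts = Counter()
    for shot in shots:
        pre, post = shot["preSequence"], shot["postSequence"]
        # Assumption: a 0 in preSequence marks a site whose atom was missing.
        if drop_missing and 0 in pre:
            continue
        # Flip 0 <-> 1 so that "1" marks an excited site.
        bits = [1 - b if invert else b for b in post]
        counts["".join(map(str, bits))] += 1
    return dict(counts), float(sum(counts.values()))

shots = [
    {"preSequence": [1, 1, 1], "postSequence": [0, 1, 1]},
    {"preSequence": [1, 1, 1], "postSequence": [0, 1, 1]},
    {"preSequence": [1, 0, 1], "postSequence": [1, 1, 0]},  # missing atom: dropped
    {"preSequence": [1, 1, 1], "postSequence": [1, 1, 0]},
]
counts, total = count_shots(shots)
print(counts, total)  # {'100': 2, '001': 1} 3.0
```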

```python
try:
    aquila_counts, aquila_total = qc.parse_json(AQUILA_JSON, sorted=True, show_progress=False)
    print(f"Loaded Aquila JSON: {len(aquila_counts)} unique states, total shots = {aquila_total:.0f}")

    # Show the 10 most frequent states
    qc.print_most_probable_data(aquila_counts, 10)
except FileNotFoundError:
    print("Aquila JSON not found at:", AQUILA_JSON)
    print("Edit DATA_DIR if your path differs.")
    aquila_counts, aquila_total = {}, 0.0
```

```text
Loaded Aquila JSON: 313 unique states, total shots = 812
Top 10 Most probable bit strings:
 1.  Bit string: 1001001000010010, Probability: 34.00000000
 2.  Bit string: 1001001001001001, Probability: 34.00000000
 3.  Bit string: 0100100001100001, Probability: 34.00000000
 4.  Bit string: 0110000100100001, Probability: 31.00000000
 5.  Bit string: 1000011000010010, Probability: 27.00000000
 6.  Bit string: 1000010010010010, Probability: 27.00000000
 7.  Bit string: 1000010010000110, Probability: 25.00000000
 8.  Bit string: 0100100001001001, Probability: 23.00000000
 9.  Bit string: 0110000110000110, Probability: 21.00000000
10.  Bit string: 0100100100100001, Probability: 18.00000000
```
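The listing above ranks states by their stored value. You can produce a comparable top-k ranking from any plain dict using `heapq.nlargest`. `most_probable` is a hypothetical helper and is not how QCOM implements `print_most_probable_data`. Ties may also be ordered differently than in QCOM's output.

```python
import heapq

# Hypothetical helper: return the k (state, value) pairs with the largest values.
def most_probable(data, k=10):
    return heapq.nlargest(k, data.items(), key=lambda item: item[1])

example = {"0110": 5, "1001": 12, "0000": 1, "1111": 7}
for rank, (state, value) in enumerate(most_probable(example, k=3), start=1):
    print(f"{rank:2d}.  Bit string: {state}, Value: {value}")
```

`heapq.nlargest` avoids sorting the whole dict. This matters little for 313 states but helps when a file holds many distinct bitstrings.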


## 3) Load Parquet <a id='parquet'></a>
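`qc.parse_parquet` returns a single dict (bitstring → stored value) without a separate total. The values in this example file are raw counts rather than probabilities. They sum to 10,000,000, which the cell prints under the label "total probability". The total is therefore taken as the sum of the values, and the dict is normalized in Section 5.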
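Because a loaded dict may hold either counts or probabilities, a quick sanity check is to test whether its values sum to 1. `looks_normalized` is a hypothetical helper, and its tolerances are arbitrary choices.

```python
import math

# Hypothetical check: True if the values already sum to 1 (a probabilities dict).
def looks_normalized(data, rel_tol=1e-9, abs_tol=1e-9):
    return math.isclose(sum(data.values()), 1.0, rel_tol=rel_tol, abs_tol=abs_tol)

counts = {"01000010": 3, "10000001": 1}
probs = {"01000010": 0.75, "10000001": 0.25}
print(looks_normalized(counts), looks_normalized(probs))  # False True
```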

```python
try:
    parq_counts = qc.parse_parquet(PARQUET_FILE, show_progress=True)
    # This file stores raw counts, so the total is the sum of the values
    parq_total = sum(parq_counts.values())
    print(f"Loaded Parquet: {len(parq_counts)} states; total probability = {parq_total:.6f}")
    # Peek at the 10 most frequent states
    qc.print_most_probable_data(parq_counts, 10)
except FileNotFoundError:
    print("Parquet file not found at:", PARQUET_FILE)
    print("Edit DATA_DIR if your path differs.")
    # Reset the same names used above so later cells still run
    parq_counts, parq_total = {}, 0.0
```

```text
Starting: Parsing Parquet file...
Completed: Parsing Parquet file. Elapsed time: 0.05 seconds.
Loaded Parquet: 68 states; total probability = 10000000.000000
Top 10 Most probable bit strings:
 1.  Bit string: 01000010, Probability: 3892973.00000000
 2.  Bit string: 10000001, Probability: 3891684.00000000
 3.  Bit string: 10000010, Probability: 487991.00000000
 4.  Bit string: 01000001, Probability: 487430.00000000
 5.  Bit string: 00000001, Probability: 205958.00000000
 6.  Bit string: 10000000, Probability: 205637.00000000
 7.  Bit string: 00000010, Probability: 205552.00000000
 8.  Bit string: 01000000, Probability: 205038.00000000
 9.  Bit string: 10000100, Probability: 49952.00000000
10.  Bit string: 00100001, Probability: 49793.00000000
```


## 4) Load Plaintext <a id='text'></a>
Plaintext files are simple whitespace-delimited lines: `<state> <value>`.
The reader returns a **counts dict** and a **total count**.
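A reader for this `<state> <value>` layout fits in a few lines. The sketch below is a hypothetical `parse_counts_lines`, not QCOM's `parse_file`. It assumes that blank lines are skipped and that a state appearing on several lines has its values added together. QCOM's handling of duplicates may differ.

```python
import io

# Hypothetical reader (not QCOM's parse_file): one "<bitstring> <value>" pair per line.
def parse_counts_lines(lines):
    counts = {}
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected '<bitstring> <value>', got {line!r}")
        state, value = parts[0], float(parts[1])
        # Assumption: repeated states are accumulated
        counts[state] = counts.get(state, 0.0) + value
    return counts, sum(counts.values())

sample = io.StringIO("01000010 389\n10000001 389\n\n01000001 48\n")
counts, total = parse_counts_lines(sample)
print(len(counts), total)  # 3 826.0
```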

```python
try:
    txt_counts, txt_total = qc.parse_file(TEXT_FILE, show_progress=True)
    print(f"Loaded TXT: {len(txt_counts)} unique states; total = {txt_total:.0f}")
    # Peek at the 10 most frequent states
    qc.print_most_probable_data(txt_counts, 10)
except FileNotFoundError:
    print("Text file not found at:", TEXT_FILE)
    print("Edit DATA_DIR if your path differs.")
    txt_counts, txt_total = {}, 0.0
```

```text
Starting: Parsing file...
Completed: Parsing file. Elapsed time: 2.04 seconds.
Loaded TXT: 105 unique states; total = 1000000000
Top 10 Most probable bit strings:
 1.  Bit string: 01000010, Probability: 389245564.00000000
 2.  Bit string: 10000001, Probability: 389223579.00000000
 3.  Bit string: 01000001, Probability: 48778624.00000000
 4.  Bit string: 10000010, Probability: 48777628.00000000
 5.  Bit string: 01000000, Probability: 20562997.00000000
 6.  Bit string: 00000001, Probability: 20561460.00000000
 7.  Bit string: 00000010, Probability: 20550353.00000000
 8.  Bit string: 10000000, Probability: 20549831.00000000
 9.  Bit string: 10000100, Probability: 4983517.00000000
10.  Bit string: 00100001, Probability: 4981922.00000000
```


## 5) Normalize to probabilities <a id='normalize'></a>
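`qc.normalize_to_probabilities(counts, total)` converts a **counts dict** into a **probabilities dict** by dividing each count by `total`, so the resulting values sum to 1.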
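The conversion is a per-state division. For example, the top Aquila state has 34 of 812 shots, which gives 34 / 812 ≈ 0.04187192. A minimal sketch follows. `normalize` is a hypothetical name, not QCOM's function, and rejecting a non-positive total is this sketch's own choice.

```python
# Hypothetical sketch of counts -> probabilities (not QCOM's implementation).
def normalize(counts, total):
    if total <= 0:
        raise ValueError("total must be positive")
    # Divide each count by the total number of shots.
    return {state: c / total for state, c in counts.items()}

probs = normalize({"1001": 34, "0110": 31, "0000": 747}, 812)
print(round(probs["1001"], 8))  # 0.04187192
```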

```python
# Normalize counts for Aquila
aquila_probs = qc.normalize_to_probabilities(aquila_counts, aquila_total)
print("Aquila JSON:")
qc.print_most_probable_data(aquila_probs, 10)

# Normalize counts for Parquet
parq_probs = qc.normalize_to_probabilities(parq_counts, parq_total)
print("\nParquet:")
qc.print_most_probable_data(parq_probs, 10)

# Normalize counts for text
txt_probs = qc.normalize_to_probabilities(txt_counts, txt_total)
print("\nText:")
qc.print_most_probable_data(txt_probs, 10)
```

```text
Aquila JSON:
Top 10 Most probable bit strings:
 1.  Bit string: 1001001000010010, Probability: 0.04187192
 2.  Bit string: 1001001001001001, Probability: 0.04187192
 3.  Bit string: 0100100001100001, Probability: 0.04187192
 4.  Bit string: 0110000100100001, Probability: 0.03817734
 5.  Bit string: 1000011000010010, Probability: 0.03325123
 6.  Bit string: 1000010010010010, Probability: 0.03325123
 7.  Bit string: 1000010010000110, Probability: 0.03078818
 8.  Bit string: 0100100001001001, Probability: 0.02832512
 9.  Bit string: 0110000110000110, Probability: 0.02586207
10.  Bit string: 0100100100100001, Probability: 0.02216749

Parquet:
Top 10 Most probable bit strings:
 1.  Bit string: 01000010, Probability: 0.38929730
 2.  Bit string: 10000001, Probability: 0.38916840
 3.  Bit string: 10000010, Probability: 0.04879910
 4.  Bit string: 01000001, Probability: 0.04874300
 5.  Bit string: 00000001, Probability: 0.02059580
 6.  Bit string: 10000000, Probability: 0.02056370
...
```

## 6) Save examples (commented) <a id='save'></a>
Here are examples for saving. They're **commented out** so running the notebook won't write files by default.
Uncomment to save to Parquet or plaintext.
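To see the plaintext `<bitstring> <value>` layout without touching the project folder, the sketch below writes a small dict to a temporary directory and reads it back. `write_counts_text` and `read_counts_text` are hypothetical helpers, not QCOM's `save_data` / `parse_file`, and QCOM's exact number formatting may differ.

```python
import tempfile
from pathlib import Path

# Hypothetical writer: one "<bitstring> <value>" pair per line.
def write_counts_text(data, path):
    with open(path, "w", encoding="utf-8") as fh:
        for state, value in data.items():
            fh.write(f"{state} {value}\n")

# Hypothetical reader for the same layout; blank lines are ignored.
def read_counts_text(path):
    with open(path, encoding="utf-8") as fh:
        return {s: float(v) for s, v in (line.split() for line in fh if line.strip())}

# The temporary directory is deleted on exit, so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "aquila_counts.txt"
    write_counts_text({"1001": 34, "0110": 31}, out)
    restored = read_counts_text(out)
print(restored)  # {'1001': 34.0, '0110': 31.0}
```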


```python
# --- Save probabilities to Parquet ---
# OUT_PARQUET = "./aquila_probs.parquet"
# if aquila_probs:
#     qc.save_dict_to_parquet(aquila_probs, OUT_PARQUET)

# --- Save counts to text ---
# OUT_TEXT = "./aquila_counts.txt"
# if aquila_counts:
#     qc.save_data(aquila_counts, OUT_TEXT)

print("Save examples are commented out — uncomment to write files.")
```

```text
Save examples are commented out — uncomment to write files.
```


---
**Notes**
- Aquila loader inverts `postSequence` bits to follow QCOM's convention (1 = excitation).
- The `sorted=True` option filters out shots with an incomplete `preSequence`. Set `sorted=False` to keep all shots.
- Parquet I/O relies on `pyarrow` under the hood via pandas.
- Plaintext is handy for quick inspection and version diffs.
