# Startup Guide

This guide explains how to reproduce the PE-LiNN quantum error mitigation experiments shipped in this repository. It covers the environment assumptions, the commands to run, the files each step produces, and tips for adapting the workflow.

## 1. Repository Overview

- `src/` – core quantum dataset generation logic (`dataset_generator.py`).
- `Experiments/dataset_saving.py` – entry point that builds a demo dataset via `GenerateQuantumDataset`.
- `Experiments/scripts/train_pelinn_from_npz.py` – trains the PE-LiNN model from a NumPy dataset.
- `Experiments/pelinn/` – model definition (`model.py`) and supporting utilities.
- `Experiments/data/` – default location for generated datasets (`demo_dataset.npz`).
- `Experiments/artifacts_*/` – folders for training outputs (loss curves, prediction plots, JSON summaries).

## 2. Prerequisites

- Python 3.10 or newer (the project was validated with 3.10).
- Windows PowerShell 5.1 (default terminal) with permissions to run local scripts (`Set-ExecutionPolicy -Scope Process RemoteSigned` if needed).
- Optional CUDA-capable GPU; training runs fine on CPU but may take longer.
- Virtual environment located at `.venv/` in the repository root (create it with `python -m venv .venv` if it does not exist yet).

### Python packages

Install these once inside the virtual environment:

```powershell
python -m pip install --upgrade pip
pip install qiskit qiskit-aer qiskit-ibm-runtime torch matplotlib numpy pandas scipy tqdm
```

> The generator relies on `qiskit_aer` for simulation. If IBM Quantum Runtime access is available, configure credentials separately (`qiskit-ibm-runtime` is already included above).
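
To confirm that the installation succeeded, a short check along the following lines can be run inside the activated environment. It is a minimal sketch: the helper name `report_versions` is illustrative and not part of the repository, and the distribution names are simply copied from the `pip install` line above.

```python
from importlib import metadata

# Distribution names copied from the pip install command in this guide.
REQUIRED = [
    "qiskit", "qiskit-aer", "qiskit-ibm-runtime", "torch",
    "matplotlib", "numpy", "pandas", "scipy", "tqdm",
]


def report_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name:20s} {version or 'NOT INSTALLED'}")
```

Any line reporting `NOT INSTALLED` points to a package that was installed outside the virtual environment or not at all.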

## 3. Environment Setup

All commands below assume the repository root `QAMP_project_group_14` as the working directory.

1. Activate the virtual environment:
   ```powershell
   .\.venv\Scripts\Activate.ps1
   ```
2. (Optional) Confirm Python and pip resolve to the environment:
   ```powershell
   python --version
   pip --version
   ```

Stay in the activated shell for the remaining steps.
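
If there is any doubt about which interpreter is active, the check can also be done from Python itself. The snippet below is a small sketch (the function name `in_virtualenv` is made up for this guide); it relies on the standard behaviour that `sys.prefix` and `sys.base_prefix` differ inside a `venv` environment.

```python
import sys
from pathlib import Path


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment and sys.base_prefix
    # at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", Path(sys.executable))
    print("virtual environment active:", in_virtualenv())
```

When the environment is active, the interpreter path should sit under `.venv`.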

## 4. Generate the Quantum Dataset

The demo workflow first produces a NumPy dataset that captures noisy vs. noiseless observable expectations for variational circuits.

1. Move into the `Experiments` directory so relative paths align with the script defaults:
   ```powershell
   Set-Location Experiments  # or: cd Experiments
   ```
2. Run the dataset generator:
   ```powershell
   python dataset_saving.py
   ```

### What the script does

- Imports `GenerateQuantumDataset` from `pelinn.data.dataset_generator`.
- Builds 150 samples of 3-qubit variational circuits with depth 4.
- Uses `SparsePauliOp("ZIZ")` as the observable and computes both the noisy and the noiseless expectation value for each circuit with 1024 shots (`noise_list` is empty by default).
- Persists the dataset to `data/demo_dataset.npz` in NumPy archive format via `QuantumDataset.save_dataset(..., format="numpy")`.
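
For intuition about what a shot-based expectation value of `ZIZ` means, the sketch below estimates it from measurement counts in plain Python. It is not the repository's implementation, which delegates to Qiskit; the function name `pauli_z_expectation` and the example counts are invented for illustration. It assumes the label contains only `I` and `Z` and that bitstrings and label use the same qubit ordering.

```python
def pauli_z_expectation(counts, label):
    """Estimate <label> from measurement counts for a label made of I and Z.

    Each bitstring contributes +1 or -1 depending on the parity of the bits
    sitting under a Z; the estimate is the count-weighted average.
    """
    if set(label) - {"I", "Z"}:
        raise ValueError("only I/Z labels can be read off computational-basis counts")
    shots = sum(counts.values())
    total = 0
    for bits, n in counts.items():
        if len(bits) != len(label):
            raise ValueError(f"bitstring {bits!r} does not match label {label!r}")
        parity = sum(1 for b, p in zip(bits, label) if p == "Z" and b == "1") % 2
        total += (-1) ** parity * n
    return total / shots


# Hypothetical counts from 1024 shots on 3 qubits.
counts = {"000": 512, "101": 256, "100": 128, "001": 128}
print(pauli_z_expectation(counts, "ZIZ"))  # 0.5
```

Because the estimate is an average over a finite number of shots, it carries a statistical error of at most 1/sqrt(shots), roughly 0.03 for 1024 shots, even before any device noise is added.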

### Deliverables

- `Experiments/data/demo_dataset.npz` containing:
  - `X`: noisy expectation values (`shape = (150, 1)`).
  - `Y`: noiseless expectation values (same shape as `X`).
  - `metadata`: global dataset metadata (`n_samples`, `n_qubits`, `circuit_type`, `shots`, etc.).
- Console log confirming the save path.
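
A quick way to sanity-check the archive is to load it with NumPy and look at the array shapes. The helper below is a sketch under stated assumptions: `summarise_dataset` is a hypothetical name, it only touches the `X` and `Y` keys listed above, and it assumes the two arrays share a shape. How `metadata` is serialised is not specified here, so the sketch lists the key without reading it.

```python
from pathlib import Path

import numpy as np


def summarise_dataset(path):
    """Return the stored keys plus the shapes of the X and Y arrays."""
    # allow_pickle stays False; X and Y are expected to be plain numeric arrays.
    with np.load(path, allow_pickle=False) as data:
        keys = list(data.files)
        x, y = data["X"], data["Y"]
    return {
        "keys": keys,
        "X_shape": x.shape,
        "Y_shape": y.shape,
        # Average distance between noisy and noiseless values
        # (assumes X and Y share a shape, as described above).
        "mean_abs_gap": float(np.mean(np.abs(x - y))),
    }


if __name__ == "__main__":
    path = Path("data/demo_dataset.npz")  # relative to Experiments/
    if path.exists():
        print(summarise_dataset(path))
```

For the demo configuration, both shapes should come back as `(150, 1)`.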

> To explore alternative circuit families, adjust `circuit_type` and `circuit_params` in `dataset_saving.py` before running the command. Supported types include `random`, `random_clifford`, `qaoa`, and `variational` (see `src/dataset_generator.py`).

## 5. Train the PE-LiNN Model

With the dataset in place, launch the training script (still within `Experiments/`).

```powershell
python scripts/train_pelinn_from_npz.py --dataset data/demo_dataset.npz --epochs 50 --batch-size 32 --lr 3e-4 --weight-decay 1e-2 --hid-dim 96 --steps 6 --dt 0.25 --output-dir artifacts_new --val-fraction 0.2 --log-level INFO
```

### Important arguments

- `--dataset`: Path to the `.npz` file created earlier.
- `--epochs`: Total number of training epochs (default 40; the example above uses 50).
- `--batch-size`, `--lr`, `--weight-decay`: Optimisation hyperparameters for AdamW.
- `--hid-dim`, `--steps`, `--dt`: Architecture controls for the liquid neural network (`PELiNNQEM`).
- `--tanh-head`: Optional flag that passes predictions through `tanh`, bounding them to the interval (-1, 1); omit for raw outputs.
- `--val-fraction`: Fraction of data reserved for validation (set to 0 to disable).
- `--no-normalise`: Disable feature normalisation if desired.
- `--log-level`: Verbosity (`INFO` recommended for progress tracking).
- `--seed`: Ensures reproducibility across dataset splits and PyTorch initialisation.
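
To illustrate how `--val-fraction` and `--seed` interact, here is one common way to build a reproducible train/validation split. This is a sketch only and may differ from what `train_pelinn_from_npz.py` does internally; the function name `split_indices` is hypothetical.

```python
import random


def split_indices(n_samples, val_fraction, seed):
    """Shuffle indices with a fixed seed and hold out the last val_fraction of them."""
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the split independent of global RNG state.
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_samples * val_fraction))
    n_train = n_samples - n_val
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices(150, 0.2, seed=0)
print(len(train_idx), len(val_idx))  # 120 30
```

Calling the function twice with the same seed returns the same split, which is what makes validation metrics comparable across runs.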

The script auto-detects CUDA; if unavailable it falls back to CPU.
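
To see which device would be chosen on a given machine before starting a long run, the usual PyTorch idiom can be tried on its own. The snippet is a sketch and not copied from the training script; `pick_device` is a hypothetical helper, and it reports `cpu` when PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch missing: nothing to detect, report CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```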

### Training deliverables (saved under `--output-dir`)

- `loss_curve.png`: Training (and validation) loss trajectory across epochs.
- `pred_vs_true.png`: Scatter plot of predicted vs. true validation targets (only when validation data exists).
- `training_summary.json`: Structured summary containing model configuration, training hyperparameters, dataset stats, and final metrics (MAE, RMSE).

Execution also logs per-epoch metrics in the console. The JSON summary is designed for experiment tracking; integrate it with your logging solution as needed.
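
For reference, the two final metrics follow their standard definitions: MAE is the mean of the absolute errors and RMSE is the square root of the mean squared error. The sketch below computes both in plain Python on made-up numbers; the training script may compute them differently (for example with PyTorch tensors).

```python
import math


def mae(pred, true):
    """Mean absolute error between two equally long sequences."""
    if len(pred) != len(true):
        raise ValueError("pred and true must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    """Root mean squared error between two equally long sequences."""
    if len(pred) != len(true):
        raise ValueError("pred and true must have the same length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


# Hypothetical predictions and targets, for illustration only.
pred = [0.5, -0.25, 0.0, 1.0]
true = [0.0, -0.25, 0.5, 0.5]
print(mae(pred, true))   # 0.375
print(rmse(pred, true))  # about 0.433
```

RMSE is never smaller than MAE, and a large gap between the two indicates that a few samples dominate the error.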

## 6. Extending the Workflow

- **Custom noise models**: Update `noise_config` in `dataset_saving.py` with entries supported by `NoiseModelFactory` (see `src/dataset_generator.py`).
- **Alternate observables**: Supply additional `SparsePauliOp` entries to the `observables` list for multi-target datasets.
- **Batch experiments**: Duplicate the dataset + training commands with modified arguments and direct each run to a unique `--output-dir`.
- **Notebook exploration**: `tutorials/Dataset_generation.ipynb` provides an interactive view of dataset creation if you prefer a Jupyter workflow.
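
Batch experiments can be scripted instead of copied by hand. The sketch below builds one training command per hyperparameter combination and gives each run its own output folder. It uses only flags documented in this guide; the helper names and the `artifacts_lr..._hid...` folder naming are assumptions made for the example.

```python
import itertools
import subprocess
import sys


def build_command(dataset, output_dir, lr, hid_dim, epochs=50):
    """Assemble the training command as an argument list (no shell quoting needed)."""
    return [
        sys.executable, "scripts/train_pelinn_from_npz.py",
        "--dataset", str(dataset),
        "--epochs", str(epochs),
        "--lr", str(lr),
        "--hid-dim", str(hid_dim),
        "--output-dir", str(output_dir),
    ]


def sweep_commands(dataset, learning_rates, hidden_dims):
    """Yield one command per (lr, hid_dim) pair, each with its own output folder."""
    for lr, hid in itertools.product(learning_rates, hidden_dims):
        yield build_command(dataset, f"artifacts_lr{lr}_hid{hid}", lr, hid)


def run_sweep(commands, dry_run=True):
    """Print each command; launch it as well when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Run from within Experiments/ so the relative paths resolve.
    run_sweep(sweep_commands("data/demo_dataset.npz", [3e-4, 1e-3], [64, 96]))
```

The default `dry_run=True` only prints the commands, which makes it easy to review a sweep before committing compute time to it.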

## 7. Troubleshooting Checklist

- **`ModuleNotFoundError`**: Re-check virtual environment activation and that dependencies are installed inside it.
- **`FileNotFoundError: Dataset not found`**: Ensure `--dataset` points to the `.npz` file relative to the current working directory or provide an absolute path.
- **`Dataset generation returned 0 samples`**: Investigate the circuit parameters or noise configuration; the generator raises this error to catch invalid setups early.
- **GPU memory errors**: Reduce `--batch-size`, or run on CPU by hiding the GPU from PyTorch before invoking the training script, for example with `$env:CUDA_VISIBLE_DEVICES = "-1"` in PowerShell.
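
Most `FileNotFoundError` reports come down to running the command from a different directory than expected. The following sketch shows how a relative `--dataset` value resolves against the current working directory; `resolve_dataset` is a hypothetical helper, not something the training script exposes.

```python
from pathlib import Path


def resolve_dataset(path_arg, cwd=None):
    """Resolve a --dataset value against cwd and report whether the file exists."""
    base = Path(cwd) if cwd is not None else Path.cwd()
    candidate = Path(path_arg)
    # Absolute paths are used as given; relative ones are joined to the base directory.
    resolved = candidate if candidate.is_absolute() else base / candidate
    return resolved, resolved.is_file()


if __name__ == "__main__":
    resolved, found = resolve_dataset("data/demo_dataset.npz")
    print(f"{resolved} -> {'found' if found else 'missing'}")
```

If the output says `missing`, either change into `Experiments/` first or pass the absolute path printed by the dataset generator.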

## 8. Next Steps

- Commit experiment artefacts (`artifacts_*` folders) to your tracking repository if needed.
- Compare runs by diffing `training_summary.json` files.
- Integrate automated sweeps by wrapping the commands in PowerShell scripts or invoking from CI.
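
Comparing two `training_summary.json` files can also be done programmatically. The sketch below flattens nested JSON objects and reports only the entries that differ. The key names in the example are hypothetical, since the exact schema of the summary is not reproduced in this guide.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} pairs; lists are kept as values."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def diff_summaries(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    fa, fb = flatten(a), flatten(b)
    return {
        key: (fa.get(key), fb.get(key))
        for key in sorted(fa.keys() | fb.keys())
        if fa.get(key) != fb.get(key)
    }


def diff_files(path_a, path_b):
    """Load two JSON summaries from disk and diff them."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        return diff_summaries(json.load(fa), json.load(fb))


# Hypothetical summaries; the real schema may use different keys.
run_a = {"training": {"lr": 3e-4, "epochs": 50}, "metrics": {"mae": 0.05}}
run_b = {"training": {"lr": 1e-3, "epochs": 50}, "metrics": {"mae": 0.04}}
print(diff_summaries(run_a, run_b))
# {'metrics.mae': (0.05, 0.04), 'training.lr': (0.0003, 0.001)}
```

In practice, `diff_files("artifacts_a/training_summary.json", "artifacts_b/training_summary.json")` would be the entry point, with the folder names replaced by the actual output directories.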

With this workflow, any teammate can activate the environment, generate datasets, and train the PE-LiNN model reproducibly.
