# CLI Dataset Management Walkthrough

This notebook demonstrates how to use the command-line interface for the CNN dataset annotation tool to list, add, update, and remove dataset entries stored in parquet files.


## Overview

Each section introduces a CLI command, explains what it does, and then runs it so you can review the resulting output.


## 1. Environment setup

Ensure the project dependencies (including PySide6) are installed before running the commands below.


In [None]:
# Optional: install project dependencies in this environment
# %pip install -r ../requirements.txt
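Before running the CLI cells, you can confirm that the kernel can actually import PySide6. This minimal sketch uses the standard library's `importlib.util.find_spec`. `missing_modules` is a hypothetical helper, not part of the tool, and it only checks that each top-level package can be found, not which version is installed.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not importable.
    return [name for name in names if importlib.util.find_spec(name) is None]


# PySide6 is the only dependency named above; add other top-level
# packages from requirements.txt if you want to check them too.
missing = missing_modules(['PySide6'])
if missing:
    print('Missing modules:', ', '.join(missing))
else:
    print('All checked modules are importable.')
```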


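We work from a scratch dataset at `../datasets/cli_demo.parquet`. The next cell deletes any copy left over from a previous run, so the walkthrough always starts from an empty dataset and can be rerun at any time.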


In [None]:
from pathlib import Path

DATASET_PATH = Path('../datasets/cli_demo.parquet')
# Start from a clean slate by deleting any dataset left over from a previous run.
if DATASET_PATH.exists():
    DATASET_PATH.unlink()

DATASET_PATH


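For convenience, we define a helper, `run_cli`, that runs the CLI as a subprocess with `--dataset` pointing at the scratch file, echoes the command, prints the CLI's stdout and stderr, and raises an error if the command exits with a non-zero status.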


In [None]:
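import shlex
import subprocess
import sys

def run_cli(*extra_args):
    """Run the annotation-tool CLI against DATASET_PATH and echo its output."""
    # sys.executable runs the CLI under the notebook kernel's own interpreter,
    # which may differ from whatever `python` resolves to on PATH.
    cmd = [sys.executable, '-m', 'cnn_dataset_annotation_tool.cli', '--dataset', str(DATASET_PATH)]
    cmd.extend(extra_args)
    # shlex.join quotes arguments such as 'notes=baseline capture' so the
    # echoed line can be pasted into a shell as-is.
    print('$', shlex.join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.stdout:
        print(result.stdout, end='')
    if result.stderr:
        print(result.stderr, end='', file=sys.stderr)
    if result.returncode != 0:
        raise RuntimeError(f'Command exited with status {result.returncode}')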


### Sample assets

The cell below lists the source images (`*.png`) and label files (`*.tif`) used throughout the walkthrough.


In [None]:
from pprint import pprint

image_files = sorted(Path('../datasets/images').glob('*.png'))
label_files = sorted(Path('../datasets/labels').glob('*.tif'))
pprint(image_files)
pprint(label_files)
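In this walkthrough each image and its label share a file stem (for example `ID17_1_Image.png` and `ID17_1_Image.tif`). A quick way to spot assets that are missing a partner is to pair files by stem. This is a sketch that assumes stem equality is the pairing rule, as it is for the demo data; `pair_assets` is a hypothetical helper, not part of the tool.

```python
from pathlib import Path


def pair_assets(image_dir, label_dir, image_glob='*.png', label_glob='*.tif'):
    """Match image and label files that share a file stem.

    Returns (pairs, unmatched_images, unmatched_labels), where pairs maps
    stem -> (image_path, label_path).
    """
    images = {p.stem: p for p in Path(image_dir).glob(image_glob)}
    labels = {p.stem: p for p in Path(label_dir).glob(label_glob)}
    shared = sorted(images.keys() & labels.keys())
    pairs = {stem: (images[stem], labels[stem]) for stem in shared}
    unmatched_images = sorted(images.keys() - labels.keys())
    unmatched_labels = sorted(labels.keys() - images.keys())
    return pairs, unmatched_images, unmatched_labels


pairs, lonely_images, lonely_labels = pair_assets('../datasets/images', '../datasets/labels')
print(f'{len(pairs)} matched pairs')
print('Images without labels:', lonely_images)
print('Labels without images:', lonely_labels)
```

Each matched pair supplies the image and label paths an `add` command needs, with the stem as a natural candidate for the entry ID.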


## 2. Inspect an empty dataset

Because the setup cell deleted any previous copy of the dataset, listing entries at this point should report that no data is present yet.


In [None]:
run_cli('list')


## 3. Add image/label pairs

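The `add` command takes an entry ID, an image path, and a label path, followed by any number of `-m key=value` options that attach metadata such as the data split and free-form notes. The paths below point at the existing sample assets.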


In [None]:
run_cli(
    'add',
    'ID17_1_Image',
    '../datasets/images/ID17_1_Image.png',
    '../datasets/labels/ID17_1_Image.tif',
    '-m', 'split=train',
    '-m', 'notes=baseline capture',
)


In [None]:
run_cli(
    'add',
    'ID22_295_Image',
    '../datasets/images/ID22_295_Image.png',
    '../datasets/labels/ID22_295_Image.tif',
    '-m', 'split=val',
    '-m', 'notes=quality check',
)
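Each `-m` option carries one `key=value` pair. How the CLI parses these is not shown in this notebook; the sketch below, a hypothetical `parse_metadata`, illustrates one plausible interpretation: split on the first `=` so values may contain `=` or spaces, and let a repeated key keep its last value.

```python
def parse_metadata(items):
    """Turn repeated 'key=value' strings into a dict; a repeated key keeps its last value."""
    metadata = {}
    for item in items:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = item.partition('=')
        key = key.strip()
        if not sep or not key:
            raise ValueError(f'expected key=value, got {item!r}')
        metadata[key] = value
    return metadata


print(parse_metadata(['split=train', 'notes=baseline capture']))
# {'split': 'train', 'notes': 'baseline capture'}
```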


After adding the entries, list the dataset again to confirm they were written to the parquet file.


In [None]:
run_cli('list')
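Reading the listing by eye works for a demo, but a variant of the helper that returns stdout lets you assert on it instead. This sketch, `capture_output` (a hypothetical name), uses `subprocess.run(..., check=True)` so a failing command raises `subprocess.CalledProcessError`. It is demonstrated on a stand-in command because the exact format of the CLI's `list` output is not shown here; checking that an entry ID appears in the real listing assumes the CLI prints entry IDs.

```python
import subprocess
import sys


def capture_output(cmd):
    """Run cmd and return its stdout as text; raise CalledProcessError on a non-zero exit."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Stand-in command that just prints an ID. For the real check, pass the CLI argv,
# e.g. [sys.executable, '-m', 'cnn_dataset_annotation_tool.cli',
#       '--dataset', str(DATASET_PATH), 'list'].
listing = capture_output([sys.executable, '-c', "print('ID17_1_Image')"])
assert 'ID17_1_Image' in listing
```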


## 4. Update metadata or assets

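The `update` command revises an existing entry's metadata and can also swap its underlying image or label files. The first cell below changes only the `notes` field of `ID17_1_Image`. The second passes `--replace-metadata`, which, as the name suggests, replaces the entry's metadata wholesale instead of merging into it, so `ID22_295_Image` should end up with only `split=test` and `operator=Alice`.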


In [None]:
run_cli(
    'update',
    'ID17_1_Image',
    '-m', 'notes=field verification complete',
)


In [None]:
run_cli(
    'update',
    'ID22_295_Image',
    '--replace-metadata',
    '-m', 'split=test',
    '-m', 'operator=Alice',
)
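The difference between the two updates can be modeled with plain dictionaries. This is a sketch under the assumption, suggested by the flag's name but not confirmed here, that a plain `update` merges new `key=value` pairs into the existing metadata while `--replace-metadata` discards the old metadata first; `apply_update` is hypothetical and not part of the tool.

```python
def apply_update(existing, updates, replace=False):
    """Return an entry's new metadata without modifying the input dict.

    Assumed semantics: by default, updates are merged over the existing keys;
    with replace=True (mirroring --replace-metadata), they become the whole set.
    """
    if replace:
        return dict(updates)
    merged = dict(existing)
    merged.update(updates)
    return merged


id17 = {'split': 'train', 'notes': 'baseline capture'}
id22 = {'split': 'val', 'notes': 'quality check'}

print(apply_update(id17, {'notes': 'field verification complete'}))
# {'split': 'train', 'notes': 'field verification complete'}
print(apply_update(id22, {'split': 'test', 'operator': 'Alice'}, replace=True))
# {'split': 'test', 'operator': 'Alice'}
```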


List the dataset again to verify that the metadata changes were applied.


In [None]:
run_cli('list')


## 5. Remove entries

The `remove` command deletes an entry, identified by its ID, from the dataset.


In [None]:
run_cli('remove', 'ID22_295_Image')


A final listing should show only `ID17_1_Image`, carrying the metadata from its earlier update.


In [None]:
run_cli('list')
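To know what the final listing ought to contain, you can replay the walkthrough against a toy, dictionary-backed model. `InMemoryDataset` is hypothetical: it does not read or write parquet, and it assumes updates merge by default while `--replace-metadata` replaces the whole metadata set. Under those assumptions, only `ID17_1_Image` survives, with its updated notes.

```python
class InMemoryDataset:
    """Toy, dict-backed stand-in for the parquet dataset (hypothetical, not the tool's API)."""

    def __init__(self):
        self.entries = {}

    def add(self, entry_id, image, label, **metadata):
        self.entries[entry_id] = {'image': image, 'label': label, 'metadata': dict(metadata)}

    def update(self, entry_id, replace_metadata=False, **metadata):
        entry = self.entries[entry_id]  # raises KeyError for an unknown ID
        if replace_metadata:
            entry['metadata'] = dict(metadata)  # assumed --replace-metadata behavior
        else:
            entry['metadata'].update(metadata)  # assumed default: merge

    def remove(self, entry_id):
        del self.entries[entry_id]  # raises KeyError for an unknown ID

    def list_ids(self):
        return sorted(self.entries)


# Replay the walkthrough's commands in order.
ds = InMemoryDataset()
ds.add('ID17_1_Image', '../datasets/images/ID17_1_Image.png',
       '../datasets/labels/ID17_1_Image.tif', split='train', notes='baseline capture')
ds.add('ID22_295_Image', '../datasets/images/ID22_295_Image.png',
       '../datasets/labels/ID22_295_Image.tif', split='val', notes='quality check')
ds.update('ID17_1_Image', notes='field verification complete')
ds.update('ID22_295_Image', replace_metadata=True, split='test', operator='Alice')
ds.remove('ID22_295_Image')

print(ds.list_ids())
# ['ID17_1_Image']
print(ds.entries['ID17_1_Image']['metadata'])
# {'split': 'train', 'notes': 'field verification complete'}
```

If the real listing disagrees with this prediction, the CLI's update semantics differ from the assumptions above.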
